Nebari LLM Serving Pack

Tags: Kubernetes · vLLM · llm-d · Envoy AI Gateway · Keycloak · Helm

Most organisations that want to run their own language models end up building the same plumbing from scratch: serving infrastructure, access control, rate limiting, audit logging. The Nebari LLM Serving Pack packages that plumbing as a first-class NKP software pack.

Deploy a model by creating an LLMModel custom resource. The pack handles the rest: llm-d scheduling, vLLM serving pods, per-model RBAC enforced by Keycloak so teams can only reach models they're authorised for, token-based rate limiting to prevent runaway costs, and complete audit logging through Envoy AI Gateway, all integrated with the platform's existing SSO and GitOps infrastructure.
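As a sketch of what that deployment flow might look like, here is a hypothetical LLMModel manifest. The field names shown (model source, replicas, rate-limit budget, Keycloak group bindings) are illustrative assumptions, not the pack's documented schema:

```yaml
# Hypothetical LLMModel custom resource.
# All field names below are assumptions for illustration,
# not the pack's actual CRD schema.
apiVersion: nebari.dev/v1alpha1
kind: LLMModel
metadata:
  name: team-chat-model
spec:
  model: my-org/example-8b-instruct   # model to serve via vLLM (placeholder name)
  replicas: 2                          # scheduled by llm-d
  rateLimit:
    tokensPerMinute: 100000            # token-based budget to cap costs
  access:
    keycloakGroups:
      - ml-team                        # only these groups can reach the model
```

Applying a resource like this with `kubectl apply` (or committing it through the GitOps pipeline) would be the entire deployment step; the serving, auth, and gateway wiring follow from it.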

The self-service API key manager lets users generate their own keys for models they’re authorised to use, without involving platform engineering.
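Once a user has generated a key, clients would present it as a bearer token on requests to the gateway. The sketch below only *builds* such a request so it can be inspected without a live deployment; the gateway URL, key format, and OpenAI-style request body are assumptions for illustration:

```python
import json
import urllib.request

# Hypothetical gateway endpoint and self-service key -- assumptions, not real values.
GATEWAY_URL = "https://llm.nebari.example.com/v1/chat/completions"
API_KEY = "nbr-example-key"

payload = json.dumps({
    "model": "team-chat-model",
    "messages": [{"role": "user", "content": "Hello"}],
}).encode()

# Build the request; it is not sent, so this runs offline.
req = urllib.request.Request(
    GATEWAY_URL,
    data=payload,
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)

print(req.get_header("Authorization"))  # the key the gateway would check
print(req.full_url)
```

Because the gateway enforces per-model RBAC and rate limits itself, a leaked or over-budget key is rejected centrally rather than in each serving pod.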