# Containerized Deployment
This unified guide helps you quickly run Semantic Router locally (Docker Compose) or in a cluster (Kubernetes) and explains when to choose each path. Both share the same configuration concepts: Docker Compose is ideal for rapid iteration and demos, while Kubernetes is suited for long-running workloads, elasticity, and upcoming Operator / CRD scenarios.
## Choosing a Path
Docker Compose path = semantic-router + Envoy proxy + optional mock vLLM (testing profile) + Prometheus + Grafana. It gives you an end-to-end local playground with minimal friction.
Kubernetes path (current manifests) = ONLY the semantic-router Deployment (gRPC + metrics), a PVC for model cache, its ConfigMap, and two Services (gRPC + metrics). It does NOT yet bundle Envoy, a real LLM inference backend, Istio, or any CRDs/Operator.
| Scenario / Goal | Recommended Path | Why |
|---|---|---|
| Local dev, quickest iteration, hacking code | Docker Compose | One command starts router + Envoy + (optionally) mock vLLM + observability stack |
| Demo with dashboard quickly | Docker Compose (testing profile) | Bundled Prometheus + Grafana + mock responses |
| Team shared staging / pre-prod | Kubernetes | Declarative config, rolling upgrades, persistent model volume |
| Performance, scalability, autoscaling | Kubernetes | HPA, scheduling, resource isolation |
| Future Operator / CRD driven config | Kubernetes | Native controller pattern |
You can seamlessly reuse the same configuration concepts in both paths.
## Common Prerequisites

- Docker Engine: see more in Docker Engine Installation
- Clone the repo:

  ```bash
  git clone https://github.com/vllm-project/semantic-router.git
  cd semantic-router
  ```

- Download classification models (≈1.5GB, first run only; see the sanity check after this list):

  ```bash
  make download-models
  ```

  This downloads the classification models used by the router:

  - Category classifier (ModernBERT-base)
  - PII classifier (ModernBERT-base)
  - Jailbreak classifier (ModernBERT-base)
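As a quick sanity check after the download, list the cache directory. The `models/` path below is an assumption about the default download location; adjust it if your checkout stores models elsewhere.

```bash
# Sanity check: confirm the classifier models landed on disk.
# Assumes the default download location `models/` at the repo root.
ls -lh models/
du -sh models/  # total should be roughly 1.5GB
```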
## Path A: Docker Compose Quick Start

### Requirements

- Docker Compose v2 (the `docker compose` command, not the legacy `docker-compose`). Install the Docker Compose plugin if missing; see more in Docker Compose Plugin Installation:

  ```bash
  # For Debian / Ubuntu
  sudo apt-get update
  sudo apt-get install -y docker-compose-plugin

  # For RHEL / CentOS / Fedora
  sudo yum update -y
  sudo yum install -y docker-compose-plugin

  # Verify
  docker compose version
  ```
- Ensure ports 8801, 50051, 19000, 3000, and 9090 are free (see the check below)
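A minimal pre-flight sketch for the port check, assuming a Linux host with `ss` available (on macOS, `lsof -i :PORT` is the usual substitute):

```bash
# Warn about any required port that is already bound.
for port in 8801 50051 19000 3000 9090; do
  ss -ltn | grep -q ":$port " && echo "Port $port is already in use"
done
```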
### Start Services

```bash
# Core (router + envoy)
docker compose up --build

# Detached (recommended once OK)
docker compose up -d --build

# Include mock vLLM + testing profile (points router to mock endpoint)
CONFIG_FILE=/app/config/config.testing.yaml \
  docker compose --profile testing up --build
```
### Verify

- gRPC: `localhost:50051`
- Envoy HTTP: `http://localhost:8801`
- Envoy Admin: `http://localhost:19000`
- Prometheus: `http://localhost:9090`
- Grafana: `http://localhost:3000` (`admin`/`admin` for first login)
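Beyond opening the UIs, each component exposes a standard health endpoint you can probe from a shell:

```bash
# Quick liveness probes for the local stack.
curl -s http://localhost:19000/ready        # Envoy admin: prints LIVE when up
curl -s http://localhost:9090/-/healthy     # Prometheus health check
curl -s http://localhost:3000/api/health    # Grafana health JSON
```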
### Common Operations

```bash
# View service status
docker compose ps

# Follow logs for the router service
docker compose logs -f semantic-router

# Exec into the router container
docker compose exec semantic-router bash

# Recreate after config change
docker compose up -d --build

# Stop and clean up containers
docker compose down
```
## Path B: Kubernetes Quick Start

### Requirements

- Kubernetes cluster with `kubectl` access (CLI)
- Optional: Prometheus metrics stack (e.g. Prometheus Operator)
- (Planned / not yet merged) Service Mesh or advanced gateway:
  - Separate deployment of Envoy (or another gateway) + real LLM endpoints (follow the Installation guide).
  - Replace placeholder IPs in `deploy/kubernetes/config.yaml` once services exist.
### Deploy (Kustomize)

```bash
kubectl apply -k deploy/kubernetes/

# Wait for pod
kubectl -n semantic-router get pods
```

The manifests create:

- Deployment (main container + init model downloader)
- Service `semantic-router` (gRPC 50051)
- Service `semantic-router-metrics` (metrics 9190)
- ConfigMap (base config)
- PVC (model cache)
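To confirm those objects exist after the apply, a quick listing plus a rollout wait:

```bash
# Verify the created resources and wait for the Deployment to become ready.
kubectl -n semantic-router get deploy,svc,cm,pvc
kubectl -n semantic-router rollout status deploy/semantic-router
```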
### Port Forward (Ad-hoc)

```bash
kubectl -n semantic-router port-forward svc/semantic-router 50051:50051 &
kubectl -n semantic-router port-forward svc/semantic-router-metrics 9190:9190 &
```
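With the forwards in place you can spot-check the metrics port; the `/metrics` path is the conventional Prometheus exposition path and is an assumption here:

```bash
# Spot-check the forwarded metrics port (path is assumed to be /metrics).
curl -s http://localhost:9190/metrics | head -n 20
```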
### Observability (Summary)

- Add a `ServiceMonitor` or a static scrape rule (see the sketch below)
- Import `deploy/llm-router-dashboard.json` (see `observability.md`)
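A minimal `ServiceMonitor` sketch for the first option, assuming the Prometheus Operator CRDs are installed; the `app: semantic-router` selector label and the `metrics` port name are assumptions, so check them against the Service in `deploy/kubernetes/` before applying:

```bash
# Hedged sketch: scrape the metrics Service via the Prometheus Operator.
# Selector label and port name are assumptions; verify against the manifests.
kubectl apply -f - <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: semantic-router
  namespace: semantic-router
spec:
  selector:
    matchLabels:
      app: semantic-router
  endpoints:
    - port: metrics
      interval: 30s
EOF
```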
### Updating Config

After editing `deploy/kubernetes/config.yaml`:

```bash
kubectl apply -k deploy/kubernetes/
kubectl -n semantic-router rollout restart deploy/semantic-router
```
### Typical Customizations

| Goal | Change |
|---|---|
| Scale horizontally | `kubectl scale deploy/semantic-router --replicas=N` |
| Resource tuning | Edit `resources:` in `deployment.yaml` |
| Add HTTP readiness | Switch the TCP probe to HTTP `/health` (port 8080) |
| PVC size | Adjust the storage request in the PVC manifest |
| Metrics scraping | Add a ServiceMonitor / scrape rule |
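For scaling beyond a fixed replica count, the HPA mentioned in the comparison below can be created imperatively; the thresholds here are illustrative, and CPU requests must be set in `deployment.yaml` for the HPA to act:

```bash
# Illustrative CPU-based autoscaling; requires resource requests on the pod.
kubectl -n semantic-router autoscale deploy/semantic-router \
  --cpu-percent=70 --min=1 --max=5
```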
## Feature Comparison

| Capability | Docker Compose | Kubernetes |
|---|---|---|
| Startup speed | Fast (seconds) | Depends on cluster/image pull |
| Config reload | Manual recreate | Rolling restart / future Operator / hot reload |
| Model caching | Host volume/bind | PVC persistent across pods |
| Observability | Bundled stack | Integrate existing stack |
| Autoscaling | Manual | HPA / custom metrics |
| Isolation / multi-tenant | Single host network | Namespaces / RBAC |
| Rapid hacking | Minimal friction | YAML overhead |
| Production lifecycle | Basic | Full (probes, rollout, scaling) |
## Troubleshooting (Unified)

### HF model download failure / DNS errors

Log example: `Dns Failed: resolve huggingface.co`. See solutions in Network Tips.
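To see which resolver the container is actually using, inspect its resolv.conf (plain `cat`, so no extra tooling inside the image is assumed):

```bash
# Inspect the DNS configuration handed to the router container.
docker compose exec semantic-router cat /etc/resolv.conf
```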
### Port conflicts

Make sure 8801, 50051, 19000 (and 3000/9090 for the observability stack) are not bound by other processes. Adjust the external port mappings in `docker-compose.yml`, or free the local ports.
Extra tip: if you use the testing profile, also pass the testing config so the router targets the mock service:

```bash
CONFIG_FILE=/app/config/config.testing.yaml docker compose --profile testing up --build
```
### Envoy/Router up but requests fail

- Ensure `mock-vllm` is healthy (testing profile only): `docker compose ps` should show mock-vllm healthy, and its logs should show 200 on `/health`.
- Verify the router config in use: router logs print `Starting vLLM Semantic Router ExtProc with config: ...`. If it shows `/app/config/config.yaml` while testing, you forgot `CONFIG_FILE`.
- Basic smoke test via Envoy (OpenAI-compatible): send a POST to `http://localhost:8801/v1/chat/completions` with `{"model":"auto", "messages":[{"role":"user","content":"hi"}]}` and check that the mock responds with `[mock-openai/gpt-oss-20b]` content when the testing profile is active (see the curl sketch below).
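The same smoke test as a concrete command:

```bash
# OpenAI-compatible smoke test through Envoy (testing profile active).
curl -s http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"auto", "messages":[{"role":"user","content":"hi"}]}'
# Expect reply content tagged with "[mock-openai/gpt-oss-20b]" from the mock.
```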
### DNS problems inside containers

If DNS is flaky in your Docker environment, add DNS servers to the `semantic-router` service in `docker-compose.yml`:

```yaml
services:
  semantic-router:
    # ...
    dns:
      - 1.1.1.1
      - 8.8.8.8
```

For corporate proxies, set `http_proxy`, `https_proxy`, and `no_proxy` in the service `environment`.