Skip to main content

Observability

This page focuses solely on collecting and visualizing metrics for Semantic Router using Prometheus and Grafanaβ€”deployment method (Docker Compose vs Kubernetes) is covered in docker-quickstart.md.


1. Metrics & Endpoints Summary​

ComponentEndpointNotes
Router metrics:9190/metricsPrometheus format (flag: --metrics-port)
Router health (future probe):8080/healthHTTP readiness/liveness candidate
Envoy metrics (optional):19000/stats/prometheusIf you enable Envoy

Dashboard JSON: deploy/llm-router-dashboard.json.

Primary source file exposing metrics: src/semantic-router/cmd/main.go (uses promhttp).


2. Docker Compose Observability​

Compose bundles: prometheus, grafana, semantic-router, (optional) envoy, mock-vllm.

Key files:

  • config/prometheus.yaml
  • config/grafana/datasource.yaml
  • config/grafana/dashboards.yaml
  • deploy/llm-router-dashboard.json

Start (with testing profile example):

CONFIG_FILE=/app/config/config.testing.yaml docker compose --profile testing up --build

Access:

Expected Prometheus targets:

  • semantic-router:9190
  • envoy-proxy:19000 (optional)

3. Kubernetes Observability​

This guide adds a production-ready Prometheus + Grafana stack to the existing Semantic Router Kubernetes deployment. It includes manifests for collectors, dashboards, data sources, RBAC, and ingress so you can monitor routing performance in any cluster.

Namespace – All manifests default to the vllm-semantic-router-system namespace to match the core deployment. Override it with Kustomize if you use a different namespace.

What Gets Installed​

ComponentPurposeKey Files
PrometheusScrapes Semantic Router metrics and stores them with persistent retentionprometheus/ (rbac.yaml, configmap.yaml, deployment.yaml, pvc.yaml, service.yaml)
GrafanaVisualizes metrics using the bundled LLM Router dashboard and a pre-configured Prometheus datasourcegrafana/ (secret.yaml, configmap-*.yaml, deployment.yaml, pvc.yaml, service.yaml)
Ingress (optional)Exposes the UIs outside the clusteringress.yaml
Dashboard provisioningAutomatically loads deploy/llm-router-dashboard.json into Grafanagrafana/configmap-dashboard.yaml

Prometheus is configured to discover the semantic-router-metrics service (port 9190) automatically. Grafana provisions the same LLM Router dashboard that ships with the Docker Compose stack.

1. Prerequisites​

  • Deployed Semantic Router workload via deploy/kubernetes/
  • A Kubernetes cluster (managed, on-prem, or kind)
  • kubectl v1.23+
  • Optional: an ingress controller (NGINX, ALB, etc.) if you want external access

2. Directory Layout​

deploy/kubernetes/observability/
β”œβ”€β”€ README.md
β”œβ”€β”€ kustomization.yaml # (created in the next step)
β”œβ”€β”€ ingress.yaml # optional HTTPS ingress examples
β”œβ”€β”€ prometheus/
β”‚ β”œβ”€β”€ configmap.yaml # Scrape config (Kubernetes SD)
β”‚ β”œβ”€β”€ deployment.yaml
β”‚ β”œβ”€β”€ pvc.yaml
β”‚ β”œβ”€β”€ rbac.yaml # SA + ClusterRole + binding
β”‚ └── service.yaml
└── grafana/
β”œβ”€β”€ configmap-dashboard.yaml # Bundled LLM router dashboard
β”œβ”€β”€ configmap-provisioning.yaml # Datasource + provider config
β”œβ”€β”€ deployment.yaml
β”œβ”€β”€ pvc.yaml
β”œβ”€β”€ secret.yaml # Admin credentials (override in prod)
└── service.yaml

3. Prometheus Configuration Highlights​

  • Uses kubernetes_sd_configs to enumerate endpoints in vllm-semantic-router-system
  • Keeps 15 days of metrics by default (--storage.tsdb.retention.time=15d)
  • Stores metrics in a PersistentVolumeClaim named prometheus-data
  • RBAC rules grant read-only access to Services, Endpoints, Pods, Nodes, and EndpointSlices

Scrape configuration snippet​

scrape_configs:
- job_name: semantic-router
kubernetes_sd_configs:
- role: endpoints
namespaces:
names:
- vllm-semantic-router-system
relabel_configs:
- source_labels: [__meta_kubernetes_service_name]
regex: semantic-router-metrics
action: keep
- source_labels: [__meta_kubernetes_endpoint_port_name]
regex: metrics
action: keep

Modify the namespace or service name if you changed them in your primary deployment.

4. Grafana Configuration Highlights​

  • Stateful deployment backed by the grafana-storage PVC
  • Datasource provisioned automatically pointing to http://prometheus:9090
  • Dashboard provider watches /var/lib/grafana-dashboards
  • Bundled llm-router-dashboard.json is identical to deploy/llm-router-dashboard.json
  • Admin credentials pulled from the grafana-admin secret (default admin/admin – change this!)

Updating credentials​

kubectl create secret generic grafana-admin \
--namespace vllm-semantic-router-system \
--from-literal=admin-user=monitor \
--from-literal=admin-password='pick-a-strong-password' \
--dry-run=client -o yaml | kubectl apply -f -

Remove or overwrite the committed secret.yaml when you adopt a different secret management approach.

5. Deployment Steps​

5.1. Create the Kustomization​

Create deploy/kubernetes/observability/kustomization.yaml (see below) to assemble all manifests. This guide assumes you keep Prometheus & Grafana in the same namespace as the router.

5.2. Apply manifests​

kubectl apply -k deploy/kubernetes/observability/

Verify pods:

kubectl get pods -n vllm-semantic-router-system

You should see prometheus-... and grafana-... pods in Running state.

5.3. Integration with the core deployment​

  1. Deploy or update Semantic Router (kubectl apply -k deploy/kubernetes/).

  2. Deploy observability stack (kubectl apply -k deploy/kubernetes/observability/).

  3. Confirm the metrics service (semantic-router-metrics) has endpoints:

    kubectl get endpoints semantic-router-metrics -n vllm-semantic-router-system
  4. Prometheus target should transition to UP within ~15 seconds.

5.4. Accessing the UIs​

Optional Ingress – If you prefer to keep the stack private, delete ingress.yaml from kustomization.yaml before applying.

  • Port-forward (quick check)

    kubectl port-forward svc/prometheus 9090:9090 -n vllm-semantic-router-system
    kubectl port-forward svc/grafana 3000:3000 -n vllm-semantic-router-system

    Prometheus β†’ http://localhost:9090, Grafana β†’ http://localhost:3000

  • Ingress (production) – Customize ingress.yaml with real domains, TLS secrets, and your ingress class before applying. Replace *.example.com and configure HTTPS certificates via cert-manager or your provider.

6. Verifying Metrics Collection​

  1. Open Prometheus (port-forward or ingress) β†’ Status β–Έ Targets β†’ ensure semantic-router job is green.
  2. Query rate(llm_model_completion_tokens_total[5m]) – should return data after traffic.
  3. Open Grafana, log in with the admin credentials, and confirm the LLM Router Metrics dashboard exists under the Semantic Router folder.
  4. Generate traffic to Semantic Router (classification or routing requests). Key panels should start populating:
    • Prompt Category counts
    • Token usage rate per model
    • Routing modifications between models
    • Latency histograms (TTFT, completion p95)

7. Dashboard Customization​

  • Duplicate the provisioned dashboard inside Grafana to make changes while keeping the original as a template.
  • Update Grafana provisioning (grafana/configmap-provisioning.yaml) to point to alternate folders or add new providers.
  • Add additional dashboards by extending grafana/configmap-dashboard.yaml or mounting a different ConfigMap.
  • Incorporate Kubernetes cluster metrics (CPU/memory) by adding another datasource or deploying kube-state-metrics + node exporters.

8. Best Practices​

Resource Sizing​

  • Prometheus: increase CPU/memory with higher scrape cardinality or retention > 15 days.
  • Grafana: start with 500m CPU / 1Gi RAM; scale replicas horizontally when concurrent viewers exceed a few dozen.

Storage​

  • Use SSD-backed storage classes for Prometheus when retention/window is large.
  • Increase prometheus/pvc.yaml (default 20Gi) and grafana/pvc.yaml (default 10Gi) to match retention requirements.
  • Enable volume snapshots or backups for dashboards and alert history.

Security​

  • Replace the demo grafana-admin secret with credentials stored in your preferred secret manager.
  • Restrict ingress access with network policies, OAuth proxies, or SSO integrations.
  • Enable Grafana role-based access control and API keys for automation.
  • Scope Prometheus RBAC to only the namespaces you need. If metrics run in multiple namespaces, list them in the scrape config.

Maintenance​

  • Monitor Prometheus disk usage; prune retention or scale PVC before it fills up.
  • Back up Grafana dashboards or store them in Git (already done through this ConfigMap).
  • Roll upgrades separately: update Prometheus and Grafana images via kustomization.yaml patches.
  • Consider adopting the Prometheus Operator (ServiceMonitor + PodMonitor) if you already run kube-prometheus-stack. A sample ServiceMonitor is in website/docs/tutorials/observability/observability.md.

4. Key Metrics (Sample)​

MetricTypeDescription
llm_category_classifications_countcounterNumber of category classification operations
llm_model_completion_tokens_totalcounterTokens emitted per model
llm_model_routing_modifications_totalcounterModel switch / routing adjustments
llm_model_completion_latency_secondshistogramCompletion latency distribution
process_cpu_seconds_total / process_resident_memory_bytesstandardRuntime resource usage

Use typical PromQL patterns:

rate(llm_model_completion_tokens_total[5m])
histogram_quantile(0.95, sum by (le) (rate(llm_model_completion_latency_seconds_bucket[5m])))

5. Troubleshooting​

SymptomLikely CauseCheckFix
Target DOWN (Docker)Service name mismatchPrometheus /targetsEnsure semantic-router container running
Target DOWN (K8s)Label/selectors mismatchkubectl get ep semantic-router-metricsAlign labels or ServiceMonitor selector
No new tokens metricsNo trafficGenerate chat/completions via EnvoySend test requests
Dashboard emptyDatasource URL wrongGrafana datasource settingsPoint to http://prometheus:9090 (Docker) or cluster Prometheus
Large 5xx spikesBackend model unreachableRouter logsVerify vLLM endpoints configuration