OpenTelemetry
All em-runtime services export traces, metrics, and logs via OpenTelemetry using the vendor-neutral OTLP protocol. The platform ships with built-in instrumentation — no application code changes are needed.
Architecture
The OTLP collector is not included in the em-runtime Helm chart. Deploy your preferred collector separately. Any OTLP-compatible backend works: Grafana Cloud, Datadog, Splunk, New Relic, Honeycomb, or self-hosted.
OpenTelemetry vs Langfuse
These two observability systems are complementary, not overlapping:
|  | OpenTelemetry | Langfuse |
|---|---|---|
| Instruments | HTTP requests, DB queries, Redis, service health | LLM API calls (model, prompt, tokens, cost, quality) |
| Answers | "Is the service healthy? Is it slow?" | "Is the AI producing good output? What does it cost?" |
| Integration | OTLP exporter (always active) | LiteLLM callbacks (when LANGFUSE_HOST is set) |
| Storage | PromQL-compatible metrics backend / Tempo / Loki | Langfuse's own PostgreSQL + ClickHouse |
| Visualization | Grafana | Langfuse UI |
Use both together: OTel for infrastructure health, Langfuse for LLM quality and cost. See the Langfuse setup guide for LLM-specific observability.
Configuration
Telemetry is controlled via environment variables in each service’s env block in the Helm values:
| Variable | Default | Description |
|---|---|---|
| OTEL_ENABLED | "true" | Master switch for all telemetry. |
| OTEL_EXPORTER_OTLP_ENDPOINT | "http://otel-collector:4317" | OTLP collector gRPC endpoint. |
| OTEL_TRACES_ENABLED | "true" | Enable distributed tracing. |
| OTEL_METRICS_ENABLED | "true" | Enable metrics export. |
| OTEL_LOGS_ENABLED | "true" | Enable log record export via OTLP. |
| OTEL_TRACE_SAMPLE_RATE | "1.0" | Trace sampling ratio (0.0-1.0). Default is 100%; override via Helm values for production. |
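As a rough illustration of how a service might interpret these variables, the sketch below reads them with the documented defaults. `TelemetrySettings` and `load_settings` are hypothetical names for this example, not part of em-runtime. The boolean parsing, the range check, and the rule that the master switch overrides the per-signal flags are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TelemetrySettings:
    """Hypothetical container for the documented telemetry variables."""
    enabled: bool
    endpoint: str
    traces: bool
    metrics: bool
    logs: bool
    sample_rate: float


def _flag(env: Mapping[str, str], name: str, default: str = "true") -> bool:
    # Assumption: only the string "true" (case-insensitive) enables a flag.
    return env.get(name, default).strip().lower() == "true"


def load_settings(env: Optional[Mapping[str, str]] = None) -> TelemetrySettings:
    """Build settings from an environment mapping (os.environ by default)."""
    env = os.environ if env is None else env
    rate = float(env.get("OTEL_TRACE_SAMPLE_RATE", "1.0"))
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"OTEL_TRACE_SAMPLE_RATE must be within 0.0-1.0, got {rate}")
    enabled = _flag(env, "OTEL_ENABLED")
    return TelemetrySettings(
        enabled=enabled,
        endpoint=env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://otel-collector:4317"),
        # Assumption: the master switch overrides the per-signal flags.
        traces=enabled and _flag(env, "OTEL_TRACES_ENABLED"),
        metrics=enabled and _flag(env, "OTEL_METRICS_ENABLED"),
        logs=enabled and _flag(env, "OTEL_LOGS_ENABLED"),
        sample_rate=rate,
    )
```

Calling `load_settings({"OTEL_TRACE_SAMPLE_RATE": "0.1"})` would model a production override that keeps every signal enabled while sampling 10% of traces.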
Point to Your Collector
Override the endpoint in your values file:
em-runtime-governance:
  env:
    OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector.monitoring:4317"
em-runtime-assets:
  env:
    OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector.monitoring:4317"
em-runtime-utils:
  env:
    OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector.monitoring:4317"
Disable Telemetry
For test environments without a collector:
em-runtime-governance:
  env:
    OTEL_ENABLED: "false"
Individual signals can also be toggled independently with OTEL_TRACES_ENABLED, OTEL_METRICS_ENABLED, and OTEL_LOGS_ENABLED.
Telemetry initialization is non-fatal. If the collector is unreachable, services log a warning and continue operating normally.
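The non-fatal behavior follows a common pattern: wrap exporter setup in a guard that logs a warning and moves on. A minimal sketch, in which `init_telemetry` and the `setup` callable are hypothetical stand-ins for the platform's real initialization code:

```python
import logging
from typing import Callable

logger = logging.getLogger("telemetry")


def init_telemetry(setup: Callable[[], None]) -> bool:
    """Run a telemetry setup step without letting its failure stop the service.

    `setup` is a hypothetical callable that would configure the OTLP exporters.
    Returns True when setup succeeded and False when it failed and was skipped.
    """
    try:
        setup()
    except Exception as exc:  # deliberately broad: telemetry must never be fatal
        logger.warning("Telemetry setup failed, continuing without it: %s", exc)
        return False
    return True


def unreachable_collector() -> None:
    # Simulates a collector that cannot be reached.
    raise ConnectionError("otel-collector:4317 is unreachable")


if __name__ == "__main__":
    active = init_telemetry(unreachable_collector)
    print("telemetry active:", active)  # the service keeps running either way
```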
Auto-Instrumented Libraries
The following libraries are automatically instrumented with no code changes:
| Library | What It Captures |
|---|---|
| FastAPI | Inbound HTTP request spans (excludes /health) |
| SQLAlchemy | Database query spans and connection metrics |
| Redis | Redis command spans |
| httpx | Outbound HTTP request spans (inter-service SDK calls) |
Telemetry Signals
Traces
Distributed traces follow the W3C Trace Context format. Traces propagate across service boundaries automatically via httpx instrumentation.
Key trace fields:
service.name — identifies the emitting service
http.method, http.url, http.status_code — HTTP span attributes
db.system, db.statement — database query details
trace_id — correlates logs and traces
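W3C Trace Context carries the trace across services in a `traceparent` HTTP header of the form `version-traceid-parentid-flags`. The sketch below parses and rebuilds that header using only the standard library. It is meant to show the wire format, not how the instrumentation is implemented; `TraceContext`, `parse_traceparent`, and `format_traceparent` are hypothetical names, and only version `00` is handled.

```python
import re
from typing import NamedTuple, Optional

# version "-" trace-id "-" parent-id "-" trace-flags, all lowercase hex
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceContext(NamedTuple):
    trace_id: str   # 32 hex characters
    parent_id: str  # 16 hex characters (the caller's span ID)
    sampled: bool   # lowest bit of trace-flags


def parse_traceparent(header: str) -> Optional[TraceContext]:
    """Parse a version-00 traceparent header; return None if it is invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # All-zero IDs are invalid; other versions are out of scope for this sketch.
    if version != "00" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return TraceContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def format_traceparent(ctx: TraceContext) -> str:
    """Serialize a context back into a version-00 traceparent header."""
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{ctx.parent_id}-{flags}"


if __name__ == "__main__":
    incoming = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    ctx = parse_traceparent(incoming)
    print(ctx.trace_id, ctx.sampled)
```

The `trace_id` extracted here is the same value that appears in log records, which is what makes log-to-trace correlation possible.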
Metrics
Application metrics are exported via OTLP and include:
HTTP request duration histograms
HTTP request counts by status code
Database connection pool utilization
Redis command latency
Logs
Structured JSON logs are written to stdout and optionally exported via OTLP:
| Field | Description |
|---|---|
| level | Log level (DEBUG, INFO, WARNING, ERROR) |
| message | Log message |
| timestamp | ISO 8601 timestamp |
| service | Service name |
| trace_id | W3C trace ID for log-trace correlation |
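To make the log shape concrete, here is a minimal sketch of a formatter that emits these five fields as one JSON object per line on stdout. `JsonFormatter` is a hypothetical class for illustration, not the platform's logger, and passing the trace ID through `extra=` is an assumption about how it reaches the record.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter emitting the documented fields as JSON."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # ISO 8601 timestamp in UTC, derived from the record creation time
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "service": self.service,
            # Assumption: the trace ID is attached to the record via `extra=`.
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(payload)


if __name__ == "__main__":
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter(service="em-runtime-assets"))
    log = logging.getLogger("demo")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("asset stored", extra={"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736"})
```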
OTel Collector Configuration
Deploy the OpenTelemetry Collector to receive, process, and export telemetry data.
Recommended Setup
# otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
processors:
  batch:
    timeout: 5s
    send_batch_size: 1024
exporters:
  # Traces
  otlp/tempo:
    endpoint: tempo.monitoring:4317
    tls:
      insecure: true
  # Metrics
  prometheusremotewrite:
    endpoint: http://prometheus.monitoring:9090/api/v1/write
  # Logs
  loki:
    endpoint: http://loki.monitoring:3100/loki/api/v1/push
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp/tempo]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheusremotewrite]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [loki]
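Every component named in a pipeline must also be declared in the matching top-level section, otherwise the collector refuses to start. A quick pre-deployment check could look like the sketch below. `find_undefined_components` is a hypothetical helper, and the configuration is written as a Python dict (component settings omitted) so the example needs no YAML library.

```python
from typing import Any, Dict, List


def find_undefined_components(config: Dict[str, Any]) -> List[str]:
    """Return one message per pipeline entry that is not declared at top level."""
    problems: List[str] = []
    pipelines = config.get("service", {}).get("pipelines", {})
    for pipeline_name, pipeline in pipelines.items():
        for section in ("receivers", "processors", "exporters"):
            declared = config.get(section, {})
            for component in pipeline.get(section, []):
                if component not in declared:
                    kind = section[:-1]  # "receivers" -> "receiver", etc.
                    problems.append(f"{pipeline_name}: {kind} '{component}' is not defined")
    return problems


# The recommended setup, reduced to its component names.
CONFIG: Dict[str, Any] = {
    "receivers": {"otlp": {}},
    "processors": {"batch": {}},
    "exporters": {"otlp/tempo": {}, "prometheusremotewrite": {}, "loki": {}},
    "service": {
        "pipelines": {
            "traces": {"receivers": ["otlp"], "processors": ["batch"], "exporters": ["otlp/tempo"]},
            "metrics": {"receivers": ["otlp"], "processors": ["batch"], "exporters": ["prometheusremotewrite"]},
            "logs": {"receivers": ["otlp"], "processors": ["batch"], "exporters": ["loki"]},
        }
    },
}

if __name__ == "__main__":
    for problem in find_undefined_components(CONFIG):
        print(problem)  # prints nothing when every reference resolves
```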
Helm Installation
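helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
# The chart requires mode and image.repository, and reads collector settings from the config: key of the values file.
helm install otel-collector open-telemetry/opentelemetry-collector \
  --namespace monitoring \
  --create-namespace \
  --set mode=deployment \
  --set image.repository=otel/opentelemetry-collector-contrib \
  -f otel-collector-config.yaml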
Metric Sources
| Source | Endpoint | Protocol |
|---|---|---|
| Application services (OTel) | OTLP gRPC (4317) / HTTP (4318) | OpenTelemetry |
| Keycloak | /keycloak/metrics (port 8080) | Prometheus exposition format |
| Kubernetes | kube-state-metrics, node-exporter | Prometheus exposition format |
| PostgreSQL | pg_exporter (optional) | Prometheus exposition format |
| Redis | redis_exporter (optional) | Prometheus exposition format |
Backend Options
| Component | Purpose |
|---|---|
| Prometheus | Metrics storage and querying |
| Grafana Tempo | Distributed trace storage |
| Loki | Log aggregation |
| Grafana | Dashboards and visualization |
| Alertmanager | Alert routing and notification |
Deploy via the Grafana LGTM Helm chart or individual component charts.
Point the OTel Collector exporters to Grafana Cloud endpoints:
exporters:
  otlp/grafana:
    endpoint: tempo-<region>.grafana.net:443
    headers:
      Authorization: "Basic <base64-credentials>"
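The `<base64-credentials>` placeholder is an HTTP Basic credential: a `user:password` pair encoded with Base64. A small sketch for producing the header value follows. It assumes the pair is your Grafana Cloud instance ID and an access token, so confirm this against your stack's connection details; `basic_auth_header` is a hypothetical helper name.

```python
import base64


def basic_auth_header(instance_id: str, token: str) -> str:
    """Build an HTTP Basic Authorization header value.

    Assumption: Grafana Cloud expects "<instance-id>:<token>" as the credential pair.
    """
    credentials = f"{instance_id}:{token}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


if __name__ == "__main__":
    # Placeholder values only; never commit real tokens to a values file.
    print(basic_auth_header("<instance-id>", "<token>"))
```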
Use the Datadog Agent as an OTLP receiver:
exporters:
  datadog:
    api:
      key: ${DD_API_KEY}
Use the AWS Distro for OpenTelemetry (ADOT) Collector:
exporters:
  awsxray: {}
  awsemf: {}
Next Steps
Helm Configuration: Telemetry environment variables in Helm values.
Prerequisites: Infrastructure requirements for the observability stack.