Documentation Index
Fetch the complete documentation index at: https://docs.emergence.ai/llms.txt
Use this file to discover all available pages before exploring further.
Access LLMs
Solutions never call OpenAI, Anthropic, or Gemini directly. Instead, point an OpenAI-compatible SDK at the platform’s LiteLLM gateway. You get provider-agnostic access, central API-key management, per-project cost attribution, rate-limit enforcement, and Langfuse traces — for free. The canonical reference implementation is in Data Insights › Text-to-SQL; this page extracts the patterns every solution needs.Why a gateway
| Without a gateway | With LiteLLM gateway |
|---|---|
| Each solution holds its own API keys | One key per provider, held centrally |
| Switching from OpenAI → Anthropic = code change | Switching is an env var change |
| Per-solution cost attribution requires custom telemetry | Free via gateway metadata + Langfuse |
| Rate limits enforced per-key (poorly) | Enforced centrally, per project |
| Adding observability requires per-call wrapping | Auto-instrumented via Langfuse hook |
provider/model to the chosen upstream and auto-traces every call into Langfuse. The metadata={"project_id": ..., "solution": ...} you pass on every call is what makes per-project cost attribution possible.
Configure your service
Three env vars (plus Langfuse credentials when you want tracing):charts/<solution>/values.yaml
Code patterns
Minimal call (litellm)
litellm speaks the OpenAI completions wire format and works against any compatible endpoint. Install with uv add litellm.
packages/api/src/api/llm.py
Per-call model override
The default model is configurable at deploy time, but any call site can override it:provider/model prefix. Any model the gateway is configured to serve is callable; ask the platform operator for the current list.
Streaming (SSE)
StreamingResponse to expose to your UI as Server-Sent Events.
Observability — Langfuse
WhenLANGFUSE_HOST, LANGFUSE_PUBLIC_KEY, and LANGFUSE_SECRET_KEY are set, litellm auto-emits traces. Each call shows up as a trace with the metadata you passed (project, solution, latency, token counts, model, prompt+completion). See Guides › Langfuse Setup and Deployment › Observability › Langfuse.
Add litellm.success_callback = ["langfuse"] once at startup (the gateway’s reference implementation does this in commons/llm/__init__.py of em-talk2data).
For LLM-specific observability patterns, see Deployment › Observability › LLM Observability.
Cost attribution
Themetadata={"project_id": ..., "solution": ...} you pass on each call is the only thing that gives the platform per-project cost attribution. Always pass it. Without it, the call rolls up to a generic bucket and the cost dashboard cannot tell you which project caused the spike.
If you wrap litellm.acompletion in a service helper (recommended), make project_id a required parameter so it’s impossible to call without it.
Failure modes
429 Too Many Requests
429 Too Many Requests
The gateway enforces per-project rate limits. Back off exponentially (
tenacity with wait_random_exponential) and surface a friendly message to the user. Do not retry forever — let the user see the rate limit.503 Service Unavailable
503 Service Unavailable
401 Unauthorized
401 Unauthorized
Model unavailable
Model unavailable
Cost spike — no attribution
Cost spike — no attribution
A spike with no project attribution means a code path is calling without
metadata. Grep for litellm.acompletion( and confirm every call passes metadata={"project_id": ..., "solution": ...}.Verification
Next steps
Langfuse setup
Stand up Langfuse and wire it to your service.
LLM observability
The platform-side observability stack.
LLM observability deep-dive
Trace inspection, cost dashboards, model comparison.
Data Insights › Text-to-SQL
Reference implementation that uses this pattern in production.

