Documentation Index
Fetch the complete documentation index at: https://docs.emergence.ai/llms.txt
Use this file to discover all available pages before exploring further.
Text-to-SQL Agent
The Text2SQL Agent is a specialized A2A agent that converts natural-language questions into SQL queries. It uses schema-aware generation, SQL validation viasqlglot, and safe execution against customer databases.
Pipeline versions
Two Text2SQL pipelines are available, controlled by a single configuration switch:- Legacy pipeline (default, in production): the generate → validate → execute flow described below. A single schema is fetched per question, validated, and executed.
- Multi-schema pipeline (in rollout, behind a flag): introduces schema selection and enrichment steps so questions spanning multiple schemas are handled without requiring the user to pre-select a schema. The architecture is decided (ADRs 004/005) and the scaffolding is in place; production rollout is in progress.
text2sql.pipeline.use_new_pipeline in config.yaml. The legacy pipeline is the supported path in current deployments.
The multi-schema pipeline introduces:
- Schema table selection (ADR 004): a schema-selection step narrows the catalog to the schemas relevant to the question, using semantic search over the data catalog.
- Raw DDL storage (ADR 005): column-level DDL is stored and retrieved for the selected schemas, giving the LLM richer type and constraint context without fetching the entire catalog.
CatalogSearchResultmodel and pipeline state fields (pruned_schema,enriched_schema, etc.) carrying multi-schema context across steps.- A new
allow_multi_schemaflag onCatalogSchemaProviderthat bypasses the single-schema guard when the pipeline is active.
Pipeline Overview
The Text2SQL agent follows a generate, validate, execute pipeline built on thecommons.pipeline.Pipeline framework:
Schema retrieval
The agent fetches the target database schema (tables, columns, types, relationships, foreign keys) from the data connection. This schema context is included in the LLM prompt for accurate SQL generation.
SQL generation
The LLM generates a SQL query based on the natural-language question and schema context. The prompt includes:
- Database schema (tables, columns, types)
- Table relationships and foreign keys
- Previous conversation context (for follow-up questions)
- Database dialect (PostgreSQL)
SQL validation
The generated SQL is validated using
sqlglot:- Syntax validation: Ensures the SQL is syntactically correct
- Dialect validation: Confirms compatibility with the target database dialect
- Safety checks: Prevents destructive operations (DROP, DELETE, TRUNCATE, ALTER)
- If validation fails, the agent retries generation with the error feedback
Query execution
The validated SQL is executed against the customer database via the data connection. The agent uses a read-only connection with query timeouts to prevent runaway queries.
Schema-Aware Generation
The quality of generated SQL depends heavily on the schema context provided to the LLM. The Text2SQL agent builds a comprehensive schema representation:SQL Validation
The agent usessqlglot for multi-layer validation:
| Check | Purpose |
|---|---|
| Parse | Verify the SQL is syntactically valid |
| Transpile | Confirm compatibility with the PostgreSQL dialect |
| Safety | Block destructive statements (DDL, DML mutations) |
| Column validation | Verify referenced columns exist in the schema |
Safety Rules
The Text2SQL agent enforces strict safety rules:| Allowed | Blocked |
|---|---|
SELECT queries | INSERT, UPDATE, DELETE |
WITH (CTEs) | DROP, ALTER, TRUNCATE |
| Aggregate functions | CREATE, GRANT, REVOKE |
| Window functions | Raw EXECUTE, dynamic SQL |
| Subqueries | Statements that modify data or schema |
Pipeline Framework
The Text2SQL agent uses thecommons.pipeline.Pipeline state machine for managing the generate-validate-execute flow:
Transition objects that specify the next step (goto), with "break" signaling pipeline completion and "error" signaling failure. Built-in cooperative cancellation allows long-running pipelines to be interrupted gracefully.
Event Streaming
The Text2SQL agent emits A2A events at each pipeline step for real-time progress:| Step | Event | Message |
|---|---|---|
| Schema retrieval | TaskStatusUpdateEvent | ”Analyzing database schema…” |
| SQL generation | TaskStatusUpdateEvent | ”Generating SQL query…” |
| Validation | TaskStatusUpdateEvent | ”Validating query…” |
| Execution | TaskStatusUpdateEvent | ”Executing query…” |
| Results | TaskArtifactUpdateEvent | Query results as DataPart |
Error Handling
The agent uses typed error hierarchies for recoverable failures:| Error Type | Handling |
|---|---|
SQLValidationError | Retry SQL generation with validation error context |
QueryExecutionError | Return user-friendly error message with the failed SQL |
SchemaRetrievalError | Check data connection health and credentials |
QueryTimeoutError | Suggest query simplification or filtering |
| Unexpected errors | Wrapped in StepError by the pipeline framework |
Agent Card
The Text2SQL agent exposes its capabilities via the standard A2A Agent Card:Sample Data Preview
The Talk2Data Service exposes aPOST /talk2data/v1/sample endpoint that returns a preview of rows from a database table without running a full analytical query. This is used by the frontend to let users inspect table contents before composing a natural-language question.
Request fields:
| Field | Type | Description |
|---|---|---|
table_fqn | string | Fully qualified table name: database.schema.table |
resource_uri | string | Data connection resource URI (from the Assets service) |
limit | integer | Number of rows to return (default varies by connection type) |
| Status | Cause |
|---|---|
400 | Invalid table_fqn format (must be database.schema.table) |
404 | Data connection not found for the given resource_uri |
501 | Connection type does not support row sampling |
503 | EM_RUNTIME_ASSETS_URL not configured |
MCP API changes (2026-05-13)
Two breaking changes shipped in em-talk2data v3.2.0 (commitecdc18a):
generate-sql: schema field is now required
GenerateSQLRequest changed from an optional list to a required single object:
- The JSON field name is
schema(notschema_— that is the Python attribute alias) - The field is required; omitting it returns
422 Unprocessable Entitywith anHTTPValidationErrorbody
execute-query: 422 responses now include error detail
execute-query previously returned a generic 422 with no body schema on SQL validation failures. It now returns a structured HTTPValidationError body, making it easier to surface validation failure reasons in MCP clients.
Deployment: missing env vars in local/CI Helm values
EM_RUNTIME_ASSETS_URL and EM_RUNTIME_UTILS_URL were missing from the talk2data-service env block in values.local.yaml and values.ci.yaml. This caused execute_query calls to return 503. Update local and CI overrides to include:
values.yaml), the URLs use the cross-namespace service DNS (em-runtime-assets.em-runtime.svc.cluster.local) — these were already correct before this fix.
Next Steps
Chat With Data
See how Text2SQL fits into the full conversational analytics pipeline.
Analysis Agent
Learn how the Insights Agent orchestrates analysis beyond SQL.
Visualizations
Understand how query results become interactive charts.
Data Source Setup
Connect a database for Text2SQL to query.

