Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.emergence.ai/llms.txt

Use this file to discover all available pages before exploring further.

Text-to-SQL Agent

The Text2SQL Agent is a specialized A2A agent that converts natural-language questions into SQL queries. It uses schema-aware generation, SQL validation via sqlglot, and safe execution against customer databases.

Pipeline versions

Two Text2SQL pipelines are available, controlled by a single configuration switch:
  • Legacy pipeline (default, in production): the generate → validate → execute flow described below. A single schema is fetched per question, validated, and executed.
  • Multi-schema pipeline (in rollout, behind a flag): introduces schema selection and enrichment steps so questions spanning multiple schemas are handled without requiring the user to pre-select a schema. The architecture is decided (ADRs 004/005) and the scaffolding is in place; production rollout is in progress.
Switch via text2sql.pipeline.use_new_pipeline in config.yaml. The legacy pipeline is the supported path in current deployments. The multi-schema pipeline introduces:
  • Schema table selection (ADR 004): a schema-selection step narrows the catalog to the schemas relevant to the question, using semantic search over the data catalog.
  • Raw DDL storage (ADR 005): column-level DDL is stored and retrieved for the selected schemas, giving the LLM richer type and constraint context without fetching the entire catalog.
  • CatalogSearchResult model and pipeline state fields (pruned_schema, enriched_schema, etc.) carrying multi-schema context across steps.
  • A new allow_multi_schema flag on CatalogSchemaProvider that bypasses the single-schema guard when the pipeline is active.
The rest of this page describes the legacy pipeline, which remains the default.

Pipeline Overview

The Text2SQL agent follows a generate, validate, execute pipeline built on the commons.pipeline.Pipeline framework:
1

Schema retrieval

The agent fetches the target database schema (tables, columns, types, relationships, foreign keys) from the data connection. This schema context is included in the LLM prompt for accurate SQL generation.
2

SQL generation

The LLM generates a SQL query based on the natural-language question and schema context. The prompt includes:
  • Database schema (tables, columns, types)
  • Table relationships and foreign keys
  • Previous conversation context (for follow-up questions)
  • Database dialect (PostgreSQL)
3

SQL validation

The generated SQL is validated using sqlglot:
  • Syntax validation: Ensures the SQL is syntactically correct
  • Dialect validation: Confirms compatibility with the target database dialect
  • Safety checks: Prevents destructive operations (DROP, DELETE, TRUNCATE, ALTER)
  • If validation fails, the agent retries generation with the error feedback
4

Query execution

The validated SQL is executed against the customer database via the data connection. The agent uses a read-only connection with query timeouts to prevent runaway queries.
5

Result formatting

Query results are formatted as structured data (DataPart) and returned to the calling agent via A2A events.

Schema-Aware Generation

The quality of generated SQL depends heavily on the schema context provided to the LLM. The Text2SQL agent builds a comprehensive schema representation:
# Schema context includes:
# - Table names and descriptions
# - Column names, types, and constraints
# - Primary and foreign key relationships
# - Sample values for enum-like columns
# - Table-level comments and documentation
For better SQL generation quality, add comments and descriptions to your database schema. The Text2SQL agent uses these to understand the semantic meaning of tables and columns.

SQL Validation

The agent uses sqlglot for multi-layer validation:
CheckPurpose
ParseVerify the SQL is syntactically valid
TranspileConfirm compatibility with the PostgreSQL dialect
SafetyBlock destructive statements (DDL, DML mutations)
Column validationVerify referenced columns exist in the schema

Safety Rules

The Text2SQL agent enforces strict safety rules:
AllowedBlocked
SELECT queriesINSERT, UPDATE, DELETE
WITH (CTEs)DROP, ALTER, TRUNCATE
Aggregate functionsCREATE, GRANT, REVOKE
Window functionsRaw EXECUTE, dynamic SQL
SubqueriesStatements that modify data or schema

Pipeline Framework

The Text2SQL agent uses the commons.pipeline.Pipeline state machine for managing the generate-validate-execute flow:
from commons.pipeline import Pipeline, Transition, StepError

class Text2SQLPipeline(Pipeline):
    async def generate(self, context):
        sql = await self.llm.generate_sql(context.question, context.schema)
        return Transition(goto="validate", data={"sql": sql})

    async def validate(self, context):
        try:
            validated = sqlglot.transpile(context.sql, read="postgres")
            return Transition(goto="execute", data={"sql": validated})
        except sqlglot.errors.ParseError as e:
            # Retry generation with error feedback
            return Transition(goto="generate", data={"error": str(e)})

    async def execute(self, context):
        results = await self.db.execute(context.sql)
        return Transition(goto="break", data={"results": results})
Steps return Transition objects that specify the next step (goto), with "break" signaling pipeline completion and "error" signaling failure. Built-in cooperative cancellation allows long-running pipelines to be interrupted gracefully.

Event Streaming

The Text2SQL agent emits A2A events at each pipeline step for real-time progress:
StepEventMessage
Schema retrievalTaskStatusUpdateEvent”Analyzing database schema…”
SQL generationTaskStatusUpdateEvent”Generating SQL query…”
ValidationTaskStatusUpdateEvent”Validating query…”
ExecutionTaskStatusUpdateEvent”Executing query…”
ResultsTaskArtifactUpdateEventQuery results as DataPart

Error Handling

The agent uses typed error hierarchies for recoverable failures:
Error TypeHandling
SQLValidationErrorRetry SQL generation with validation error context
QueryExecutionErrorReturn user-friendly error message with the failed SQL
SchemaRetrievalErrorCheck data connection health and credentials
QueryTimeoutErrorSuggest query simplification or filtering
Unexpected errorsWrapped in StepError by the pipeline framework

Agent Card

The Text2SQL agent exposes its capabilities via the standard A2A Agent Card:
{
  "name": "text2sql",
  "description": "Converts natural-language questions to SQL queries",
  "url": "http://text2sql:8001",
  "skills": [
    {
      "id": "nl-to-sql",
      "name": "Natural Language to SQL",
      "description": "Generate and execute SQL from natural-language questions",
      "tags": ["sql", "database", "analytics"]
    }
  ]
}

Sample Data Preview

The Talk2Data Service exposes a POST /talk2data/v1/sample endpoint that returns a preview of rows from a database table without running a full analytical query. This is used by the frontend to let users inspect table contents before composing a natural-language question. Request fields:
FieldTypeDescription
table_fqnstringFully qualified table name: database.schema.table
resource_uristringData connection resource URI (from the Assets service)
limitintegerNumber of rows to return (default varies by connection type)
Error responses:
StatusCause
400Invalid table_fqn format (must be database.schema.table)
404Data connection not found for the given resource_uri
501Connection type does not support row sampling
503EM_RUNTIME_ASSETS_URL not configured
The endpoint authenticates via the standard platform JWT and respects project-scoped data connection access.

MCP API changes (2026-05-13)

Two breaking changes shipped in em-talk2data v3.2.0 (commit ecdc18a):

generate-sql: schema field is now required

GenerateSQLRequest changed from an optional list to a required single object:
{
  "question": "How many orders shipped last month?",
  "database": "analytics",
  "schemas": [{ "tables": [...] }]
}
  • The JSON field name is schema (not schema_ — that is the Python attribute alias)
  • The field is required; omitting it returns 422 Unprocessable Entity with an HTTPValidationError body

execute-query: 422 responses now include error detail

execute-query previously returned a generic 422 with no body schema on SQL validation failures. It now returns a structured HTTPValidationError body, making it easier to surface validation failure reasons in MCP clients.

Deployment: missing env vars in local/CI Helm values

EM_RUNTIME_ASSETS_URL and EM_RUNTIME_UTILS_URL were missing from the talk2data-service env block in values.local.yaml and values.ci.yaml. This caused execute_query calls to return 503. Update local and CI overrides to include:
talk2data-service:
  env:
    EM_RUNTIME_ASSETS_URL: "http://em-runtime-assets:8000"   # local: subchart
    EM_RUNTIME_UTILS_URL:  "http://em-runtime-utils:8000"    # local: subchart
In GKE (values.yaml), the URLs use the cross-namespace service DNS (em-runtime-assets.em-runtime.svc.cluster.local) — these were already correct before this fix.

Next Steps

Chat With Data

See how Text2SQL fits into the full conversational analytics pipeline.

Analysis Agent

Learn how the Insights Agent orchestrates analysis beyond SQL.

Visualizations

Understand how query results become interactive charts.

Data Source Setup

Connect a database for Text2SQL to query.