Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.emergence.ai/llms.txt

Use this file to discover all available pages before exploring further.

Data Insights Overview

Ask natural-language questions about your data and get instant SQL-generated answers, visualizations, and insights. Data Insights coordinates a team of AI agents that generate SQL, execute queries, analyze results, and produce visualizations — all in real time via streaming.
Data Insights is built on the A2A (Agent-to-Agent) protocol, an open standard for inter-agent communication using JSON-RPC 2.0 over Server-Sent Events (SSE). Each agent is independently deployable and discoverable via Agent Cards.

Who Is It For

Business Users & Knowledge Workers

Ask questions about your data in plain language and get instant answers, visualizations, and reports — no SQL or technical skills required.

Data Analysts

Explore data through multi-turn conversations, generate visualizations, and export results. The AI handles SQL generation so you can focus on analysis.

Business Leaders

Get AI-powered insights from enterprise data to support strategic decisions. Generate on-demand reports and discover trends through natural-language queries.

Data Engineers

Validate SQL generation, review query plans, and configure data connections. Monitor agent performance and query execution through observability tooling.

How It Works

1

User Asks a Question

A user types a natural-language question in the chat interface, such as “What were the top 10 products by revenue last quarter?”
2

Talk2Data Service Routes the Request

The Talk2Data Service (REST + SSE gateway) creates a session, routes the message to the Insights Agent, and establishes an SSE stream back to the client.
3

Insights Agent Orchestrates

The Insights Agent runs an agentic loop: it reasons about the question, selects tools, and delegates SQL generation to the Text2SQL Agent.
4

Text2SQL Agent Generates and Validates SQL

The Text2SQL Agent analyzes the database schema, generates SQL using an LLM, validates it with sqlglot, and executes the query against the user’s data connection.
5

Results Stream Back

Query results, analysis, and visualizations stream back to the user in real time via A2A events (TaskStatusUpdateEvent for progress, TaskArtifactUpdateEvent for results).

Architecture

Building a similar solution? Use the Solution Developer Guide › Starter Templates — the Tier 3 template mirrors this layered agent shape with copy-pasteable scaffolds.
The solution follows a layered agent architecture:
Client / React UI
    |
Talk2Data Service (REST + SSE gateway, port 8080)
    |
Insights Agent (agentic loop, reasoning, tool orchestration, port 8002)
    |--- Text2SQL Agent (NL-to-SQL generation, validation, execution, port 8001)
    |--- Coding Agent (LLM-generated Python code execution, port 8004)
    |--- MCP Plotly Server (visualization tools via MCP, port 8000)
Text2SQL and Coding agents communicate with the Insights Agent via A2A, while the MCP Plotly Server is accessed via MCP. The Insights Agent dynamically discovers both sub-agents and MCP tools on each request.

A2A Protocol Integration

All inter-agent communication uses the A2A protocol:
Each agent publishes a JSON manifest at /.well-known/agent-card.json that describes its identity, skills, and capabilities. Agent Cards enable automatic discovery by the Talk2Data service and Insights Agent at request time.Key fields include:
  • name — human-readable agent name
  • skills — list of capabilities with input/output schemas
  • capabilities — supported features (streaming, multi-turn, etc.)
  • endpoint — the agent’s A2A service URL
A2A messages contain typed parts:
Part TypeUsage
TextPartNatural-language text (questions, analysis, explanations)
DataPartStructured data (datasource configs, query parameters)
FilePartBinary artifacts (charts, exported files)
Agents emit events for real-time frontend updates:
EventPurpose
TaskStatusUpdateEventProgress messages (“Analyzing schema…”, “Generating SQL…”)
TaskArtifactUpdateEventFinal results (data tables, charts, text analysis)
Every pipeline step emits a status event before starting work, providing real-time visibility into the agent’s reasoning process.
The A2A context_id maps to the session ID for multi-turn conversation state. This enables agents to maintain context across multiple questions in the same conversation.

Pipeline Framework

Agent workflows are built on the commons.pipeline.Pipeline state-machine framework:
  • Steps are named functions that perform a unit of work
  • Each step returns a Transition object with a goto target (next step, break, or error)
  • The pipeline supports cooperative cancellation for graceful shutdown
  • Unexpected exceptions are wrapped in StepError for structured error handling
The Text2SQL agent uses this framework for its generate-validate-execute flow:
generate_sql -> validate_sql -> execute_query -> format_results

LLM Integration

Data Insights uses LiteLLM for provider-agnostic LLM access:
FeatureDetails
Clientcommons.llm.LLMClient wrapping LiteLLM
Model formatprovider/model (e.g., gemini/gemini-2.0-flash, gpt-4o)
ObservabilityLangfuse LLM tracing auto-enabled when LANGFUSE_HOST is set. Chat dispatch and the pipeline executor are auto-instrumented with @observe decorators, A2A trace context propagates across services so a single conversation produces one unified trace, trace IDs are seeded deterministically from turn_id, and per-iteration spans are emitted inside agent loops. See Langfuse Setup for configuration and Langfuse Overview for the full tracing model.
ConfigurationPer-service LLM env vars (TALK2DATA_TEXT2SQL_LLM_MODEL, TALK2DATA_INSIGHTS_LLM_MODEL, etc.) via pydantic_settings.BaseSettings
Credentials are never hardcoded. All API keys and connection strings are loaded from environment variables or .env files via pydantic_settings.BaseSettings.

Database Schema

The talk2data database schema stores conversation state and artifacts:
TablePurpose
sessionsChat sessions with user and project context
conversation_messagesIndividual messages within a session
artifactsGenerated outputs (SQL queries, results, visualizations)
feedbackUser feedback on agent responses
  • Primary keys: UUID strings
  • Timestamps: DateTime(timezone=True) with UTC
  • ORM: SQLAlchemy 2.0+ async with asyncpg driver
  • Migrations: Alembic in packages/common-db/

REST Endpoints

The Talk2Data Service exposes the following REST and SSE endpoints:
EndpointPurpose
GET /talk2data/chat/sessionsList active chat sessions
POST /talk2data/chat/*Start or continue a chat session (SSE streaming)
POST /talk2data/v1/samplePreview rows from a data connection table (no LLM involved)
The /talk2data/v1/sample endpoint is useful for data exploration before composing a question — it fetches a configurable number of rows from a named table via an existing data connection, using a fully qualified table name (database.schema.table). See Text-to-SQL for request and error details.

Platform Integration

Data Insights integrates with the platform layer for:
CapabilityPlatform Service
AuthenticationGovernance (JWT validation via the platform identity provider)
AuthorizationGovernance (permission checks via the authorization service SDK)
| Data connections | Assets (configured database connections) | | Asset management | Assets (data connections, artifacts, files, models) | The integration uses auto-generated Python SDKs from the platform’s OpenAPI specs. Data Insights never implements its own permission checks.

Next Steps

Chat with Data

Learn how to use the conversational interface for data analysis.

Text-to-SQL

Understand how natural-language questions are converted to SQL queries.

Agent Registry

See how Data Insights agents are registered and discovered.

Data Source Setup

Configure data connections for your databases.