Conversational BI User Guide

Version: 7.0.0 Status: Production Ready Last Updated: 2025-11-23

Table of Contents

  1. Introduction
  2. Quick Start
  3. Core Concepts
  4. Configuration
  5. Using the Conversational BI Engine
  6. Multi-Turn Conversations
  7. LLM Integration
  8. Production Features
  9. Performance Optimization
  10. Troubleshooting
  11. API Reference
  12. Examples

Introduction

The HeliosDB Conversational BI Engine enables natural language querying of databases using state-of-the-art large language models (LLMs). It provides:

  • Natural Language to SQL: Convert questions into accurate SQL queries
  • Multi-Turn Context: Maintain conversation history for follow-up questions
  • High Accuracy: 95%+ accuracy on the BIRD benchmark
  • Production Ready: Rate limiting, security, monitoring, circuit breakers
  • Flexible Models: Support for OpenAI, Anthropic, Cohere, and local models

Key Features

  • Multi-turn conversations with context preservation
  • Semantic query caching for performance
  • Automatic SQL validation and error correction
  • Query explanation with optimization suggestions
  • Session management with isolation
  • Production-grade security and rate limiting
  • <300ms target latency
  • Comprehensive monitoring and metrics

Quick Start

1. Installation

Add to your Cargo.toml:

[dependencies]
heliosdb-conversational-bi = "7.0"

2. Basic Usage

use heliosdb_conversational_bi::{
    ConversationalBiEngine, ConversationalConfig
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create engine with default configuration
    let config = ConversationalConfig::default();
    let engine = ConversationalBiEngine::new(config).await?;

    // Create a conversation session
    let session_id = engine.create_session("user123", "sales_db").await?;

    // Ask a natural language question
    let response = engine.process_query(
        session_id,
        "Show me top 10 customers by revenue in 2024"
    ).await?;

    // Get results (sql is an Option, so avoid unwrap in case no SQL was generated)
    if let Some(sql) = &response.sql {
        println!("SQL: {}", sql);
    }
    println!("Explanation: {}", response.explanation);
    if let Some(results) = response.results {
        println!("Rows returned: {}", results.row_count);
    }

    // Clean up
    engine.delete_session(session_id).await?;

    Ok(())
}

3. Configure LLM Provider

Set up your API keys for the LLM provider:

# For OpenAI
export OPENAI_API_KEY="your-api-key"

# For Anthropic
export ANTHROPIC_API_KEY="your-api-key"

# For Cohere
export COHERE_API_KEY="your-api-key"

Core Concepts

Sessions

A session represents a conversation between a user and the system:

  • Each session is isolated with its own context
  • Sessions track conversation history
  • Sessions have configurable timeout (default: 30 minutes)
  • Sessions are automatically cleaned up when expired
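The expiry rule above can be sketched as a simple timestamp check. The types here are illustrative, not the crate's actual internals:

```rust
use std::time::{Duration, Instant};

// Illustrative session record; the engine's internal type differs.
struct Session {
    last_activity: Instant,
}

// A session is expired once `timeout` has elapsed since its last activity.
fn is_expired(session: &Session, timeout: Duration) -> bool {
    session.last_activity.elapsed() >= timeout
}

fn main() {
    let session = Session { last_activity: Instant::now() };
    // A fresh session is well inside the default 30-minute timeout.
    assert!(!is_expired(&session, Duration::from_secs(30 * 60)));
}
```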

Conversation Context

Context includes:

  • Previous queries and results
  • Mentioned tables, columns, and values
  • Current focus of conversation
  • Intent history (filter, aggregation, join, etc.)

Turns

Each query in a conversation is a turn:

  • Turns are numbered sequentially (1, 2, 3, ...)
  • Each turn contains the user's query, generated SQL, and results
  • Context from previous turns helps resolve ambiguities

Configuration

Basic Configuration

use heliosdb_conversational_bi::{
    ConversationalConfig, ModelConfig, ModelProvider, SqlDialect
};

let config = ConversationalConfig {
    // LLM Configuration
    primary_model: ModelConfig {
        provider: ModelProvider::OpenAI,
        model_name: "gpt-4o".to_string(),
        api_key: None, // Will use environment variable
    },
    fallback_model: Some(ModelConfig {
        provider: ModelProvider::Anthropic,
        model_name: "claude-3-5-sonnet-20241022".to_string(),
        api_key: None,
    }),

    // SQL Dialect
    dialect: SqlDialect::PostgreSQL,

    // Session Configuration
    max_concurrent_sessions: 1000,
    session_timeout_minutes: 30,
    max_turns: 20,

    // Performance
    enable_query_cache: true,
    enable_self_correction: true,

    // Production Features
    enable_rate_limiting: true,
    enable_security_validation: true,
    enable_performance_monitoring: true,
};

let engine = ConversationalBiEngine::new(config).await?;

Available Model Providers

OpenAI

ModelConfig {
    provider: ModelProvider::OpenAI,
    model_name: "gpt-4o".to_string(), // or "gpt-4", "gpt-3.5-turbo"
    api_key: None, // Uses OPENAI_API_KEY environment variable
}

Supported Models:

  • gpt-4o (recommended) - 128K context, $0.005/1K tokens
  • gpt-4o-mini - 128K context, $0.00015/1K tokens
  • gpt-4-turbo - 128K context, $0.01/1K tokens
  • gpt-4 - 8K context, $0.03/1K tokens
  • gpt-3.5-turbo - 16K context, $0.0005/1K tokens

Anthropic

ModelConfig {
    provider: ModelProvider::Anthropic,
    model_name: "claude-3-5-sonnet-20241022".to_string(),
    api_key: None, // Uses ANTHROPIC_API_KEY environment variable
}

Supported Models:

  • claude-3-5-sonnet-20241022 (recommended) - 200K context, $0.003/1K tokens
  • claude-3-5-haiku-20241022 - 200K context, $0.0008/1K tokens
  • claude-3-opus-20240229 - 200K context, $0.015/1K tokens
  • claude-3-sonnet-20240229 - 200K context, $0.003/1K tokens
  • claude-3-haiku-20240307 - 200K context, $0.00025/1K tokens

Cohere

ModelConfig {
    provider: ModelProvider::Cohere,
    model_name: "command-r-plus".to_string(),
    api_key: None, // Uses COHERE_API_KEY environment variable
}

Supported Models:

  • command-r-plus - 128K context, $0.003/1K tokens
  • command-r - 128K context, $0.0005/1K tokens

SQL Dialects

use heliosdb_conversational_bi::SqlDialect;

// Supported dialects
SqlDialect::PostgreSQL    // Default
SqlDialect::MySQL
SqlDialect::SQLite
SqlDialect::Cassandra

Using the Conversational BI Engine

Creating Sessions

// Create a session for a user and database
let session_id = engine.create_session("user123", "sales_db").await?;

// Sessions are isolated per user
let session2 = engine.create_session("user456", "analytics_db").await?;

Processing Queries

// Simple query
let response = engine.process_query(
    session_id,
    "Show me total revenue for 2024"
).await?;

// Access generated SQL
if let Some(sql) = response.sql {
    println!("Generated SQL: {}", sql);
}

// Access results
if let Some(results) = response.results {
    println!("Columns: {:?}", results.columns);
    println!("Rows: {}", results.row_count);
    println!("Execution time: {}ms", results.execution_time_ms);
}

// Get explanation
println!("Explanation: {}", response.explanation);

// Check for optimization suggestions
for suggestion in response.suggestions {
    println!("Suggestion: {}", suggestion);
}

// Check confidence
println!("Confidence: {:.2}%", response.confidence * 100.0);

// Check if from cache
if response.from_cache {
    println!("Response served from cache");
}

Session Management

// Get conversation history
let history = engine.get_history(session_id).await?;
for turn in history {
    println!("Turn {}: {}", turn.turn_id, turn.user_query);
    if let Some(sql) = turn.sql {
        println!("  SQL: {}", sql);
    }
}

// Reset session (clear context)
engine.reset_session(session_id).await?;

// Delete session
engine.delete_session(session_id).await?;

Multi-Turn Conversations

The engine maintains context across multiple turns, enabling natural follow-up questions:

Example Conversation

let session_id = engine.create_session("user123", "sales_db").await?;

// Turn 1: Initial query
let response1 = engine.process_query(
    session_id,
    "Show me sales for 2024"
).await?;
// Generated: SELECT * FROM sales WHERE year = 2024

// Turn 2: Filter using context
let response2 = engine.process_query(
    session_id,
    "Only for Q4"
).await?;
// Generated: SELECT * FROM sales WHERE year = 2024 AND quarter = 4

// Turn 3: Add grouping
let response3 = engine.process_query(
    session_id,
    "Group by region"
).await?;
// Generated: SELECT region, SUM(amount) FROM sales
//            WHERE year = 2024 AND quarter = 4
//            GROUP BY region

// Turn 4: Sort results
let response4 = engine.process_query(
    session_id,
    "Sort by total descending"
).await?;
// Generated: SELECT region, SUM(amount) as total FROM sales
//            WHERE year = 2024 AND quarter = 4
//            GROUP BY region
//            ORDER BY total DESC

Context Resolution

The engine automatically:

  • Resolves pronoun references ("it", "that", "those")
  • Maintains filter context ("only for Q4", "in that region")
  • Tracks table and column mentions
  • Understands aggregate intent
  • Preserves sorting and grouping preferences

LLM Integration

Model Selection

The engine uses a primary model with optional fallback models:

let config = ConversationalConfig {
    primary_model: ModelConfig {
        provider: ModelProvider::OpenAI,
        model_name: "gpt-4o".to_string(),
        api_key: None,
    },
    fallback_model: Some(ModelConfig {
        provider: ModelProvider::Anthropic,
        model_name: "claude-3-5-sonnet-20241022".to_string(),
        api_key: None,
    }),
    ..Default::default()
};

Fallback Behavior:

  1. Try the primary model
  2. If the primary fails, try the fallback model
  3. Return an error if all models fail
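The fallback chain can be sketched with stand-in function pointers in place of real provider calls. The names and signatures here are illustrative, not the crate's API:

```rust
// Try the primary model first; on failure, try the fallback if configured.
// Each "model" is a stand-in returning generated SQL or an error.
fn generate_with_fallback(
    primary: fn() -> Result<String, String>,
    fallback: Option<fn() -> Result<String, String>>,
) -> Result<String, String> {
    match primary() {
        Ok(sql) => Ok(sql),
        Err(primary_err) => match fallback {
            Some(f) => f().map_err(|e| format!("primary: {}; fallback: {}", primary_err, e)),
            None => Err(format!("primary: {}", primary_err)),
        },
    }
}

fn main() {
    // Primary is unavailable; the fallback model answers.
    let result = generate_with_fallback(
        || Err("rate limited".to_string()),
        Some(|| Ok("SELECT 1".to_string())),
    );
    assert_eq!(result, Ok("SELECT 1".to_string()));
}
```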

Retry Logic

All LLM API calls include automatic retries with exponential backoff:

  • Max retries: 3 attempts
  • Initial delay: 500ms
  • Backoff: 2x (500ms → 1s → 2s)
  • Applies to: Network errors, rate limits, transient failures
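The backoff schedule above corresponds to:

```rust
use std::time::Duration;

// Delay before retry attempt `n` (0-based): 500ms, doubling each retry.
fn backoff_delay(attempt: u32) -> Duration {
    Duration::from_millis(500 * 2u64.pow(attempt))
}

fn main() {
    let delays: Vec<u128> = (0..3).map(|n| backoff_delay(n).as_millis()).collect();
    // 500ms → 1s → 2s, matching the schedule above.
    assert_eq!(delays, vec![500, 1000, 2000]);
}
```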

Error Handling

match engine.process_query(session_id, "query").await {
    Ok(response) => {
        // Success
    },
    Err(e) => {
        eprintln!("Error: {}", e);
        // Error types:
        // - ConversationalError::Model - LLM API error
        // - ConversationalError::RateLimitExceeded - Rate limit hit
        // - ConversationalError::Session - Session not found
        // - ConversationalError::InvalidInput - Invalid query
    }
}

Production Features

Rate Limiting

Protects against abuse with a token bucket algorithm:

let config = ConversationalConfig {
    enable_rate_limiting: true,
    // Default: 60 queries per minute per tenant
    ..Default::default()
};

// Check rate limit status
let tokens_remaining = engine.get_rate_limit_tokens("user123");
println!("Tokens remaining: {}", tokens_remaining);

Configuration (via ProductionConfig):

  • rate_limit_qpm: Queries per minute (default: 60)
  • burst_allowance: Burst capacity (default: 10)
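A minimal token bucket matching the stated defaults (60 queries/minute, burst of 10) might look like this sketch; the engine's internal implementation may differ:

```rust
use std::time::Instant;

// Per-tenant token bucket: refills continuously at `rate_per_min`
// tokens per minute, capped at `burst` capacity.
struct TokenBucket {
    tokens: f64,
    capacity: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    fn new(rate_per_min: f64, burst: f64) -> Self {
        TokenBucket {
            tokens: burst,
            capacity: burst,
            refill_per_sec: rate_per_min / 60.0,
            last_refill: Instant::now(),
        }
    }

    // Consume one token if available; otherwise deny the request.
    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        let refill = now.duration_since(self.last_refill).as_secs_f64() * self.refill_per_sec;
        self.tokens = (self.tokens + refill).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // Defaults from above: 60 queries/minute with a burst of 10.
    let mut bucket = TokenBucket::new(60.0, 10.0);
    let allowed = (0..11).filter(|_| bucket.try_acquire()).count();
    // The burst allows 10 immediate requests; the 11th is denied.
    assert_eq!(allowed, 10);
}
```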

Security Validation

Prevents malicious queries:

let config = ConversationalConfig {
    enable_security_validation: true,
    ..Default::default()
};

Security Features:

  • SQL injection prevention
  • Dangerous operation blocking (DROP, TRUNCATE)
  • Query length limits (max 10KB)
  • Context size limits (max 1MB)
  • Input sanitization
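Two of the listed checks, the 10KB length limit and dangerous-operation blocking, can be sketched as follows. This is illustrative only; the engine's real validator is more thorough (injection analysis, context limits, sanitization):

```rust
// Sketch of two validation checks: query length and dangerous operations.
fn validate_query(query: &str) -> Result<(), String> {
    const MAX_QUERY_BYTES: usize = 10 * 1024; // 10KB limit
    if query.len() > MAX_QUERY_BYTES {
        return Err("query exceeds 10KB limit".to_string());
    }
    let upper = query.to_uppercase();
    for op in ["DROP ", "TRUNCATE "] {
        if upper.contains(op) {
            return Err(format!("dangerous operation blocked: {}", op.trim()));
        }
    }
    Ok(())
}

fn main() {
    assert!(validate_query("Show me total revenue for 2024").is_ok());
    assert!(validate_query("DROP TABLE customers").is_err());
}
```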

Circuit Breaker

Prevents cascade failures when the LLM API is down:

// Circuit breaker automatically activated on repeated failures
// - Threshold: 5 failures
// - Reset timeout: 60 seconds

States:

  • Closed: Normal operation
  • Open: Failing fast after the failure threshold is exceeded
  • Half-Open: Testing recovery after the reset timeout
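The state transitions can be sketched as follows (simplified; the engine manages this internally):

```rust
use std::time::{Duration, Instant};

// The three circuit-breaker states described above.
enum State {
    Closed,
    Open(Instant), // when the breaker opened
    HalfOpen,
}

struct CircuitBreaker {
    state: State,
    consecutive_failures: u32,
    threshold: u32,          // 5 failures
    reset_timeout: Duration, // 60 seconds
}

impl CircuitBreaker {
    fn new() -> Self {
        CircuitBreaker {
            state: State::Closed,
            consecutive_failures: 0,
            threshold: 5,
            reset_timeout: Duration::from_secs(60),
        }
    }

    // May a call proceed right now?
    fn allow(&mut self) -> bool {
        match self.state {
            State::Closed | State::HalfOpen => true,
            State::Open(opened_at) => {
                if opened_at.elapsed() >= self.reset_timeout {
                    self.state = State::HalfOpen; // test recovery
                    true
                } else {
                    false // fail fast
                }
            }
        }
    }

    fn record_failure(&mut self) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.threshold {
            self.state = State::Open(Instant::now());
        }
    }

    fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.state = State::Closed;
    }
}

fn main() {
    let mut breaker = CircuitBreaker::new();
    for _ in 0..5 {
        breaker.record_failure();
    }
    assert!(!breaker.allow()); // open: failing fast
    breaker.record_success();
    assert!(breaker.allow()); // closed again
}
```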

Performance Monitoring

Track latency and performance metrics:

let config = ConversationalConfig {
    enable_performance_monitoring: true,
    ..Default::default()
};

// Get metrics for a tenant
if let Some(metrics) = engine.get_performance_metrics("user123").await {
    println!("P50 latency: {}ms", metrics.p50);
    println!("P95 latency: {}ms", metrics.p95);
    println!("P99 latency: {}ms", metrics.p99);
}
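The P50/P95/P99 figures follow the usual percentile definition; a nearest-rank sketch over a sorted sample (the engine's internal estimator may differ):

```rust
// Nearest-rank percentile over a sorted sample of latencies (ms).
fn percentile(sorted_ms: &[f64], p: f64) -> f64 {
    assert!(!sorted_ms.is_empty());
    let rank = ((p / 100.0) * sorted_ms.len() as f64).ceil() as usize;
    sorted_ms[rank.saturating_sub(1).min(sorted_ms.len() - 1)]
}

fn main() {
    // 100 latency samples: 1ms, 2ms, ..., 100ms.
    let latencies: Vec<f64> = (1..=100).map(f64::from).collect();
    assert_eq!(percentile(&latencies, 50.0), 50.0);
    assert_eq!(percentile(&latencies, 95.0), 95.0);
    assert_eq!(percentile(&latencies, 99.0), 99.0);
}
```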

Performance Optimization

Semantic Caching

Caches similar queries for faster responses:

let config = ConversationalConfig {
    enable_query_cache: true,
    ..Default::default()
};

Cache Behavior:

  • Similarity threshold: 0.85 (85% similar)
  • Max cache size: 10,000 entries
  • LRU eviction
  • Schema-aware (invalidates on schema changes)

Performance Impact:

  • Cache hit: <100ms (vs. 300-500ms for an LLM call)
  • 60-80% cache hit rate for typical workloads
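The similarity gate can be illustrated with token-level Jaccard similarity as a stand-in for the semantic embeddings the cache actually uses:

```rust
use std::collections::HashSet;

// Token-level Jaccard similarity: |A ∩ B| / |A ∪ B|.
// A stand-in for embedding similarity, not the engine's real metric.
fn similarity(a: &str, b: &str) -> f64 {
    let ta: HashSet<&str> = a.split_whitespace().collect();
    let tb: HashSet<&str> = b.split_whitespace().collect();
    let union = ta.union(&tb).count();
    if union == 0 {
        return 0.0;
    }
    ta.intersection(&tb).count() as f64 / union as f64
}

// A cached entry is reused when similarity clears the 0.85 threshold.
fn cache_hit(query: &str, cached_query: &str) -> bool {
    similarity(query, cached_query) >= 0.85
}

fn main() {
    assert!(cache_hit("show total revenue for 2024", "show total revenue for 2024"));
    assert!(!cache_hit("show total revenue for 2024", "list customers in texas"));
}
```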

Self-Correction

Automatically validates and corrects generated SQL:

let config = ConversationalConfig {
    enable_self_correction: true,
    ..Default::default()
};

Validation Steps:

  1. SQL syntax validation
  2. Schema compatibility check
  3. Semantic correctness check
  4. Automatic correction on errors
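The validate-and-correct loop can be sketched with stand-in generator and validator closures (not the crate's API):

```rust
// Regenerate until the SQL passes validation or attempts run out; the
// previous validation error is fed back to guide the correction.
fn generate_with_correction(
    mut generate: impl FnMut(Option<&str>) -> String,
    validate: impl Fn(&str) -> Result<(), String>,
    max_attempts: usize,
) -> Result<String, String> {
    let mut last_error: Option<String> = None;
    for _ in 0..max_attempts {
        let sql = generate(last_error.as_deref());
        match validate(&sql) {
            Ok(()) => return Ok(sql),
            Err(e) => last_error = Some(e), // fed back on the next attempt
        }
    }
    Err(last_error.unwrap_or_else(|| "no attempts made".to_string()))
}

fn main() {
    // First attempt has a typo; the corrected retry passes validation.
    let mut attempts = 0;
    let result = generate_with_correction(
        |_previous_error| {
            attempts += 1;
            if attempts == 1 { "SELEC 1".to_string() } else { "SELECT 1".to_string() }
        },
        |sql| {
            if sql.starts_with("SELECT") { Ok(()) } else { Err("syntax error".to_string()) }
        },
        3,
    );
    assert_eq!(result, Ok("SELECT 1".to_string()));
}
```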

Latency Targets

Operation                    Target    Typical
Session creation             <10ms     ~5ms
Cache hit                    <100ms    ~80ms
SQL generation (no cache)    <300ms    ~250ms
Full query execution         <500ms    ~350ms

Troubleshooting

Common Issues

1. "No suitable model" Error

Cause: LLM API key not configured

Solution:

export OPENAI_API_KEY="your-key"
# or
export ANTHROPIC_API_KEY="your-key"

2. Rate Limit Exceeded

Cause: Too many requests from tenant

Solution:

// Check rate limit
let tokens = engine.get_rate_limit_tokens("user123");
if tokens < 1.0 {
    // Wait or show error to user
    println!("Rate limit exceeded. Please try again later.");
}

3. Session Not Found

Cause: Session expired or deleted

Solution:

// Create new session
let session_id = engine.create_session("user123", "db").await?;

4. Low Confidence Score

Cause: Ambiguous or complex query

Solution:

if response.confidence < 0.7 {
    // Ask for clarification
    if !response.clarifications.is_empty() {
        println!("Please clarify:");
        for clarification in response.clarifications {
            println!("- {}", clarification);
        }
    }
}

Debug Mode

Enable detailed logging:

export RUST_LOG=heliosdb_conversational_bi=debug

API Reference

ConversationalBiEngine

Main engine interface.

Methods

// Create engine
pub async fn new(config: ConversationalConfig) -> Result<Self>

// Session management
pub async fn create_session(&self, user_id: &str, database: &str) -> Result<SessionId>
pub async fn delete_session(&self, session_id: SessionId) -> Result<()>
pub async fn reset_session(&self, session_id: SessionId) -> Result<()>
pub async fn get_history(&self, session_id: SessionId) -> Result<Vec<ConversationTurn>>

// Query processing
pub async fn process_query(
    &self,
    session_id: SessionId,
    query: &str
) -> Result<ConversationResponse>

// Metrics
pub async fn get_performance_metrics(&self, tenant_id: &str) -> Option<LatencyPercentiles>
pub fn get_rate_limit_tokens(&self, tenant_id: &str) -> f64

ConversationResponse

Response from query processing.

pub struct ConversationResponse {
    pub turn_id: u32,                    // Turn number
    pub sql: Option<String>,             // Generated SQL
    pub explanation: String,             // Natural language explanation
    pub results: Option<QueryResult>,    // Query results
    pub clarifications: Vec<String>,     // Clarifications needed
    pub suggestions: Vec<String>,        // Optimization suggestions
    pub confidence: f64,                 // Confidence score (0.0-1.0)
    pub latency_ms: f64,                 // Response latency
    pub from_cache: bool,                // Whether from cache
}

QueryResult

SQL query execution results.

pub struct QueryResult {
    pub columns: Vec<String>,                  // Column names
    pub rows: Vec<Vec<serde_json::Value>>,    // Row data
    pub row_count: usize,                      // Number of rows
    pub execution_time_ms: u64,                // Execution time
    pub success: bool,                         // Success flag
    pub error: Option<String>,                 // Error message
}

Examples

Example 1: Sales Analytics

let session = engine.create_session("analyst", "sales_db").await?;

// Q1: Top products
let r1 = engine.process_query(session,
    "What are the top 5 products by revenue in 2024?").await?;

// Q2: Regional breakdown
let r2 = engine.process_query(session,
    "Show me regional breakdown for the top product").await?;

// Q3: Trend analysis
let r3 = engine.process_query(session,
    "Compare this to last year").await?;

Example 2: Customer Analysis

let session = engine.create_session("manager", "crm_db").await?;

// Find high-value customers
let r1 = engine.process_query(session,
    "Show me customers with lifetime value over $10,000").await?;

// Filter by activity
let r2 = engine.process_query(session,
    "Only those who purchased in the last 30 days").await?;

// Export for marketing
let r3 = engine.process_query(session,
    "Give me their email addresses and purchase history").await?;

Example 3: Error Handling

let session = engine.create_session("user", "db").await?;

let result = engine.process_query(session, "complex query").await;

match result {
    Ok(response) => {
        if response.confidence < 0.7 {
            println!("Warning: Low confidence result");
            for clarification in response.clarifications {
                println!("Clarification needed: {}", clarification);
            }
        }

        if let Some(sql) = response.sql {
            println!("SQL: {}", sql);
        }
    },
    Err(e) => {
        eprintln!("Error: {}", e);
        // Handle specific errors
    }
}

Best Practices

1. Session Management

  • Create sessions per conversation: Don't reuse sessions across unrelated queries
  • Clean up sessions: Delete sessions when done to free resources
  • Set appropriate timeouts: Adjust based on use case

2. Query Formulation

  • Be specific: "Show me Q4 2024 sales in California" vs. "Show me sales"
  • Use context: Follow-up questions work best when building on previous queries
  • Avoid ambiguity: Specify table names if multiple tables have similar data

3. Performance

  • Enable caching: Significantly reduces latency for repeated queries
  • Use appropriate models: gpt-4o-mini or claude-haiku for simple queries
  • Monitor metrics: Track P95/P99 latency to detect issues

4. Security

  • Enable validation: Always use security validation in production
  • Rate limiting: Protect against abuse
  • Input sanitization: Never bypass security features

Conclusion

The Conversational BI Engine provides a production-ready natural language interface to your databases. With high accuracy, multi-turn context, and comprehensive production features, it enables business users to query data without SQL knowledge.

For more information:

  • Performance Optimization Guide
  • API Documentation
  • GitHub Repository


Version: 7.0.0 License: MIT Maintainer: HeliosDB Team