Conversational BI User Guide

Version: 7.0.0 Status: Production Ready Last Updated: 2025-11-23

Table of Contents

  1. Introduction
  2. Quick Start
  3. Core Concepts
  4. Configuration
  5. Using the Conversational BI Engine
  6. Multi-Turn Conversations
  7. LLM Integration
  8. Production Features
  9. Performance Optimization
  10. Troubleshooting
  11. API Reference
  12. Examples

Introduction

The HeliosDB Conversational BI Engine enables natural language querying of databases using state-of-the-art large language models (LLMs). It provides:

  • Natural Language to SQL: Convert questions into accurate SQL queries
  • Multi-Turn Context: Maintain conversation history for follow-up questions
  • High Accuracy: 95%+ accuracy on the BIRD benchmark
  • Production Ready: Rate limiting, security, monitoring, circuit breakers
  • Flexible Models: Support for OpenAI, Anthropic, Cohere, and local models

Key Features

  • Multi-turn conversations with context preservation
  • Semantic query caching for performance
  • Automatic SQL validation and error correction
  • Query explanation with optimization suggestions
  • Session management with isolation
  • Production-grade security and rate limiting
  • <300ms target latency
  • Comprehensive monitoring and metrics

Quick Start

1. Installation

Add to your Cargo.toml:

[dependencies]
heliosdb-conversational-bi = "7.0"

2. Basic Usage

use heliosdb_conversational_bi::{
    ConversationalBiEngine, ConversationalConfig
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create engine with default configuration
    let config = ConversationalConfig::default();
    let engine = ConversationalBiEngine::new(config).await?;

    // Create a conversation session
    let session_id = engine.create_session("user123", "sales_db").await?;

    // Ask a natural language question
    let response = engine.process_query(
        session_id,
        "Show me top 10 customers by revenue in 2024"
    ).await?;

    // Get results (sql is an Option, so avoid unwrap in case no SQL was generated)
    if let Some(sql) = &response.sql {
        println!("SQL: {}", sql);
    }
    println!("Explanation: {}", response.explanation);
    if let Some(results) = response.results {
        println!("Rows returned: {}", results.row_count);
    }

    // Clean up
    engine.delete_session(session_id).await?;

    Ok(())
}

3. Configure LLM Provider

Set up your API keys for the LLM provider:

# For OpenAI
export OPENAI_API_KEY="your-api-key"

# For Anthropic
export ANTHROPIC_API_KEY="your-api-key"

# For Cohere
export COHERE_API_KEY="your-api-key"

Core Concepts

Sessions

A session represents a conversation between a user and the system:

  • Each session is isolated with its own context
  • Sessions track conversation history
  • Sessions have configurable timeout (default: 30 minutes)
  • Sessions are automatically cleaned up when expired
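The expiry rule above can be sketched as a simple timestamp check. The types here are illustrative, not the crate's actual internals:

```rust
use std::time::{Duration, Instant};

// Illustrative session record; the engine's internal type differs.
struct Session {
    last_activity: Instant,
}

// A session is expired once `timeout` has elapsed since its last activity.
fn is_expired(session: &Session, timeout: Duration) -> bool {
    session.last_activity.elapsed() >= timeout
}

fn main() {
    let session = Session { last_activity: Instant::now() };
    // A fresh session is well inside the default 30-minute timeout.
    assert!(!is_expired(&session, Duration::from_secs(30 * 60)));
}
```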

Conversation Context

Context includes:

  • Previous queries and results
  • Mentioned tables, columns, and values
  • Current focus of conversation
  • Intent history (filter, aggregation, join, etc.)

Turns

Each query in a conversation is a turn:

  • Turns are numbered sequentially (1, 2, 3, ...)
  • Each turn contains the user's query, generated SQL, and results
  • Context from previous turns helps resolve ambiguities

Configuration

Basic Configuration

use heliosdb_conversational_bi::{
    ConversationalConfig, ModelConfig, ModelProvider, SqlDialect
};

let config = ConversationalConfig {
    // LLM Configuration
    primary_model: ModelConfig {
        provider: ModelProvider::OpenAI,
        model_name: "gpt-4o".to_string(),
        api_key: None, // Will use environment variable
    },
    fallback_model: Some(ModelConfig {
        provider: ModelProvider::Anthropic,
        model_name: "claude-3-5-sonnet-20241022".to_string(),
        api_key: None,
    }),

    // SQL Dialect
    dialect: SqlDialect::PostgreSQL,

    // Session Configuration
    max_concurrent_sessions: 1000,
    session_timeout_minutes: 30,
    max_turns: 20,

    // Performance
    enable_query_cache: true,
    enable_self_correction: true,

    // Production Features
    enable_rate_limiting: true,
    enable_security_validation: true,
    enable_performance_monitoring: true,
};

let engine = ConversationalBiEngine::new(config).await?;

Available Model Providers

OpenAI

ModelConfig {
    provider: ModelProvider::OpenAI,
    model_name: "gpt-4o".to_string(), // or "gpt-4", "gpt-3.5-turbo"
    api_key: None, // Uses OPENAI_API_KEY environment variable
}

Supported Models:

  • gpt-4o (recommended) - 128K context, $0.005/1K tokens
  • gpt-4o-mini - 128K context, $0.00015/1K tokens
  • gpt-4-turbo - 128K context, $0.01/1K tokens
  • gpt-4 - 8K context, $0.03/1K tokens
  • gpt-3.5-turbo - 16K context, $0.0005/1K tokens

Anthropic

ModelConfig {
    provider: ModelProvider::Anthropic,
    model_name: "claude-3-5-sonnet-20241022".to_string(),
    api_key: None, // Uses ANTHROPIC_API_KEY environment variable
}

Supported Models:

  • claude-3-5-sonnet-20241022 (recommended) - 200K context, $0.003/1K tokens
  • claude-3-5-haiku-20241022 - 200K context, $0.0008/1K tokens
  • claude-3-opus-20240229 - 200K context, $0.015/1K tokens
  • claude-3-sonnet-20240229 - 200K context, $0.003/1K tokens
  • claude-3-haiku-20240307 - 200K context, $0.00025/1K tokens

Cohere

ModelConfig {
    provider: ModelProvider::Cohere,
    model_name: "command-r-plus".to_string(),
    api_key: None, // Uses COHERE_API_KEY environment variable
}

Supported Models:

  • command-r-plus - 128K context, $0.003/1K tokens
  • command-r - 128K context, $0.0005/1K tokens

SQL Dialects

use heliosdb_conversational_bi::SqlDialect;

// Supported dialects
SqlDialect::PostgreSQL    // Default
SqlDialect::MySQL
SqlDialect::SQLite
SqlDialect::Cassandra

Using the Conversational BI Engine

Creating Sessions

// Create a session for a user and database
let session_id = engine.create_session("user123", "sales_db").await?;

// Sessions are isolated per user
let session2 = engine.create_session("user456", "analytics_db").await?;

Processing Queries

// Simple query
let response = engine.process_query(
    session_id,
    "Show me total revenue for 2024"
).await?;

// Access generated SQL
if let Some(sql) = response.sql {
    println!("Generated SQL: {}", sql);
}

// Access results
if let Some(results) = response.results {
    println!("Columns: {:?}", results.columns);
    println!("Rows: {}", results.row_count);
    println!("Execution time: {}ms", results.execution_time_ms);
}

// Get explanation
println!("Explanation: {}", response.explanation);

// Check for optimization suggestions
for suggestion in response.suggestions {
    println!("Suggestion: {}", suggestion);
}

// Check confidence
println!("Confidence: {:.2}%", response.confidence * 100.0);

// Check if from cache
if response.from_cache {
    println!("Response served from cache");
}

Session Management

// Get conversation history
let history = engine.get_history(session_id).await?;
for turn in history {
    println!("Turn {}: {}", turn.turn_id, turn.user_query);
    if let Some(sql) = turn.sql {
        println!("  SQL: {}", sql);
    }
}

// Reset session (clear context)
engine.reset_session(session_id).await?;

// Delete session
engine.delete_session(session_id).await?;

Multi-Turn Conversations

The engine maintains context across multiple turns, enabling natural follow-up questions:

Example Conversation

let session_id = engine.create_session("user123", "sales_db").await?;

// Turn 1: Initial query
let response1 = engine.process_query(
    session_id,
    "Show me sales for 2024"
).await?;
// Generated: SELECT * FROM sales WHERE year = 2024

// Turn 2: Filter using context
let response2 = engine.process_query(
    session_id,
    "Only for Q4"
).await?;
// Generated: SELECT * FROM sales WHERE year = 2024 AND quarter = 4

// Turn 3: Add grouping
let response3 = engine.process_query(
    session_id,
    "Group by region"
).await?;
// Generated: SELECT region, SUM(amount) FROM sales
//            WHERE year = 2024 AND quarter = 4
//            GROUP BY region

// Turn 4: Sort results
let response4 = engine.process_query(
    session_id,
    "Sort by total descending"
).await?;
// Generated: SELECT region, SUM(amount) as total FROM sales
//            WHERE year = 2024 AND quarter = 4
//            GROUP BY region
//            ORDER BY total DESC

Context Resolution

The engine automatically:

  • Resolves pronoun references ("it", "that", "those")
  • Maintains filter context ("only for Q4", "in that region")
  • Tracks table and column mentions
  • Understands aggregate intent
  • Preserves sorting and grouping preferences

LLM Integration

Model Selection

The engine uses a primary model with optional fallback models:

let config = ConversationalConfig {
    primary_model: ModelConfig {
        provider: ModelProvider::OpenAI,
        model_name: "gpt-4o".to_string(),
        api_key: None,
    },
    fallback_model: Some(ModelConfig {
        provider: ModelProvider::Anthropic,
        model_name: "claude-3-5-sonnet-20241022".to_string(),
        api_key: None,
    }),
    ..Default::default()
};

Fallback Behavior:

  1. Try the primary model
  2. If the primary fails, try the fallback model
  3. Return an error if all models fail
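The fallback chain can be sketched with stand-in function pointers in place of real provider calls. The names and signatures here are illustrative, not the crate's API:

```rust
// Try the primary model first; on failure, try the fallback if configured.
// Each "model" is a stand-in returning generated SQL or an error.
fn generate_with_fallback(
    primary: fn() -> Result<String, String>,
    fallback: Option<fn() -> Result<String, String>>,
) -> Result<String, String> {
    match primary() {
        Ok(sql) => Ok(sql),
        Err(primary_err) => match fallback {
            Some(f) => f().map_err(|e| format!("primary: {}; fallback: {}", primary_err, e)),
            None => Err(format!("primary: {}", primary_err)),
        },
    }
}

fn main() {
    // Primary is unavailable; the fallback model answers.
    let result = generate_with_fallback(
        || Err("rate limited".to_string()),
        Some(|| Ok("SELECT 1".to_string())),
    );
    assert_eq!(result, Ok("SELECT 1".to_string()));
}
```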

Retry Logic

All LLM API calls include automatic retries with exponential backoff:

  • Max retries: 3 attempts
  • Initial delay: 500ms
  • Backoff: 2x (500ms → 1s → 2s)
  • Applies to: Network errors, rate limits, transient failures
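The backoff schedule above corresponds to:

```rust
use std::time::Duration;

// Delay before retry attempt `n` (0-based): 500ms, doubling each retry.
fn backoff_delay(attempt: u32) -> Duration {
    Duration::from_millis(500 * 2u64.pow(attempt))
}

fn main() {
    let delays: Vec<u128> = (0..3).map(|n| backoff_delay(n).as_millis()).collect();
    // 500ms → 1s → 2s, matching the schedule above.
    assert_eq!(delays, vec![500, 1000, 2000]);
}
```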

Error Handling

match engine.process_query(session_id, "query").await {
    Ok(response) => {
        // Success
    },
    Err(e) => {
        eprintln!("Error: {}", e);
        // Error types:
        // - ConversationalError::Model - LLM API error
        // - ConversationalError::RateLimitExceeded - Rate limit hit
        // - ConversationalError::Session - Session not found
        // - ConversationalError::InvalidInput - Invalid query
    }
}

Production Features

Rate Limiting

Protects against abuse with a token bucket algorithm:

let config = ConversationalConfig {
    enable_rate_limiting: true,
    // Default: 60 queries per minute per tenant
    ..Default::default()
};

// Check rate limit status
let tokens_remaining = engine.get_rate_limit_tokens("user123");
println!("Tokens remaining: {}", tokens_remaining);

Configuration (via ProductionConfig):

  • rate_limit_qpm: Queries per minute (default: 60)
  • burst_allowance: Burst capacity (default: 10)
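A minimal token bucket matching the stated defaults (60 queries/minute, burst of 10) might look like this sketch; the engine's internal implementation may differ:

```rust
use std::time::Instant;

// Per-tenant token bucket: refills continuously at `rate_per_min`
// tokens per minute, capped at `burst` capacity.
struct TokenBucket {
    tokens: f64,
    capacity: f64,
    refill_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    fn new(rate_per_min: f64, burst: f64) -> Self {
        TokenBucket {
            tokens: burst,
            capacity: burst,
            refill_per_sec: rate_per_min / 60.0,
            last_refill: Instant::now(),
        }
    }

    // Consume one token if available; otherwise deny the request.
    fn try_acquire(&mut self) -> bool {
        let now = Instant::now();
        let refill = now.duration_since(self.last_refill).as_secs_f64() * self.refill_per_sec;
        self.tokens = (self.tokens + refill).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    // Defaults from above: 60 queries/minute with a burst of 10.
    let mut bucket = TokenBucket::new(60.0, 10.0);
    let allowed = (0..11).filter(|_| bucket.try_acquire()).count();
    // The burst allows 10 immediate requests; the 11th is denied.
    assert_eq!(allowed, 10);
}
```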

Security Validation

Prevents malicious queries:

let config = ConversationalConfig {
    enable_security_validation: true,
    ..Default::default()
};

Security Features:

  • SQL injection prevention
  • Dangerous operation blocking (DROP, TRUNCATE)
  • Query length limits (max 10KB)
  • Context size limits (max 1MB)
  • Input sanitization
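Two of the listed checks, the 10KB length limit and dangerous-operation blocking, can be sketched as follows. This is illustrative only; the engine's real validator is more thorough (injection analysis, context limits, sanitization):

```rust
// Sketch of two validation checks: query length and dangerous operations.
fn validate_query(query: &str) -> Result<(), String> {
    const MAX_QUERY_BYTES: usize = 10 * 1024; // 10KB limit
    if query.len() > MAX_QUERY_BYTES {
        return Err("query exceeds 10KB limit".to_string());
    }
    let upper = query.to_uppercase();
    for op in ["DROP ", "TRUNCATE "] {
        if upper.contains(op) {
            return Err(format!("dangerous operation blocked: {}", op.trim()));
        }
    }
    Ok(())
}

fn main() {
    assert!(validate_query("Show me total revenue for 2024").is_ok());
    assert!(validate_query("DROP TABLE customers").is_err());
}
```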

Circuit Breaker

Prevents cascade failures when the LLM API is down:

// Circuit breaker automatically activated on repeated failures
// - Threshold: 5 failures
// - Reset timeout: 60 seconds

States:

  • Closed: Normal operation
  • Open: Failing fast after the failure threshold is exceeded
  • Half-Open: Testing recovery after the reset timeout
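The state transitions can be sketched as follows (simplified; the engine manages this internally):

```rust
use std::time::{Duration, Instant};

// The three circuit-breaker states described above.
enum State {
    Closed,
    Open(Instant), // when the breaker opened
    HalfOpen,
}

struct CircuitBreaker {
    state: State,
    consecutive_failures: u32,
    threshold: u32,          // 5 failures
    reset_timeout: Duration, // 60 seconds
}

impl CircuitBreaker {
    fn new() -> Self {
        CircuitBreaker {
            state: State::Closed,
            consecutive_failures: 0,
            threshold: 5,
            reset_timeout: Duration::from_secs(60),
        }
    }

    // May a call proceed right now?
    fn allow(&mut self) -> bool {
        match self.state {
            State::Closed | State::HalfOpen => true,
            State::Open(opened_at) => {
                if opened_at.elapsed() >= self.reset_timeout {
                    self.state = State::HalfOpen; // test recovery
                    true
                } else {
                    false // fail fast
                }
            }
        }
    }

    fn record_failure(&mut self) {
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.threshold {
            self.state = State::Open(Instant::now());
        }
    }

    fn record_success(&mut self) {
        self.consecutive_failures = 0;
        self.state = State::Closed;
    }
}

fn main() {
    let mut breaker = CircuitBreaker::new();
    for _ in 0..5 {
        breaker.record_failure();
    }
    assert!(!breaker.allow()); // open: failing fast
    breaker.record_success();
    assert!(breaker.allow()); // closed again
}
```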

Performance Monitoring

Track latency and performance metrics:

let config = ConversationalConfig {
    enable_performance_monitoring: true,
    ..Default::default()
};

// Get metrics for a tenant
if let Some(metrics) = engine.get_performance_metrics("user123").await {
    println!("P50 latency: {}ms", metrics.p50);
    println!("P95 latency: {}ms", metrics.p95);
    println!("P99 latency: {}ms", metrics.p99);
}
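The P50/P95/P99 figures follow the usual percentile definition; a nearest-rank sketch over a sorted sample (the engine's internal estimator may differ):

```rust
// Nearest-rank percentile over a sorted sample of latencies (ms).
fn percentile(sorted_ms: &[f64], p: f64) -> f64 {
    assert!(!sorted_ms.is_empty());
    let rank = ((p / 100.0) * sorted_ms.len() as f64).ceil() as usize;
    sorted_ms[rank.saturating_sub(1).min(sorted_ms.len() - 1)]
}

fn main() {
    // 100 latency samples: 1ms, 2ms, ..., 100ms.
    let latencies: Vec<f64> = (1..=100).map(f64::from).collect();
    assert_eq!(percentile(&latencies, 50.0), 50.0);
    assert_eq!(percentile(&latencies, 95.0), 95.0);
    assert_eq!(percentile(&latencies, 99.0), 99.0);
}
```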

Performance Optimization

Semantic Caching

Caches similar queries for faster responses:

let config = ConversationalConfig {
    enable_query_cache: true,
    ..Default::default()
};

Cache Behavior:

  • Similarity threshold: 0.85 (85% similar)
  • Max cache size: 10,000 entries
  • LRU eviction
  • Schema-aware (invalidates on schema changes)

Performance Impact:

  • Cache hit: <100ms (vs. 300-500ms for an LLM call)
  • 60-80% cache hit rate for typical workloads
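The similarity gate can be illustrated with token-level Jaccard similarity as a stand-in for the semantic embeddings the cache actually uses:

```rust
use std::collections::HashSet;

// Token-level Jaccard similarity: |A ∩ B| / |A ∪ B|.
// A stand-in for embedding similarity, not the engine's real metric.
fn similarity(a: &str, b: &str) -> f64 {
    let ta: HashSet<&str> = a.split_whitespace().collect();
    let tb: HashSet<&str> = b.split_whitespace().collect();
    let union = ta.union(&tb).count();
    if union == 0 {
        return 0.0;
    }
    ta.intersection(&tb).count() as f64 / union as f64
}

// A cached entry is reused when similarity clears the 0.85 threshold.
fn cache_hit(query: &str, cached_query: &str) -> bool {
    similarity(query, cached_query) >= 0.85
}

fn main() {
    assert!(cache_hit("show total revenue for 2024", "show total revenue for 2024"));
    assert!(!cache_hit("show total revenue for 2024", "list customers in texas"));
}
```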

Self-Correction

Automatically validates and corrects generated SQL:

let config = ConversationalConfig {
    enable_self_correction: true,
    ..Default::default()
};

Validation Steps:

  1. SQL syntax validation
  2. Schema compatibility check
  3. Semantic correctness check
  4. Automatic correction on errors
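The validate-and-correct loop can be sketched with stand-in generator and validator closures (not the crate's API):

```rust
// Regenerate until the SQL passes validation or attempts run out; the
// previous validation error is fed back to guide the correction.
fn generate_with_correction(
    mut generate: impl FnMut(Option<&str>) -> String,
    validate: impl Fn(&str) -> Result<(), String>,
    max_attempts: usize,
) -> Result<String, String> {
    let mut last_error: Option<String> = None;
    for _ in 0..max_attempts {
        let sql = generate(last_error.as_deref());
        match validate(&sql) {
            Ok(()) => return Ok(sql),
            Err(e) => last_error = Some(e), // fed back on the next attempt
        }
    }
    Err(last_error.unwrap_or_else(|| "no attempts made".to_string()))
}

fn main() {
    // First attempt has a typo; the corrected retry passes validation.
    let mut attempts = 0;
    let result = generate_with_correction(
        |_previous_error| {
            attempts += 1;
            if attempts == 1 { "SELEC 1".to_string() } else { "SELECT 1".to_string() }
        },
        |sql| {
            if sql.starts_with("SELECT") { Ok(()) } else { Err("syntax error".to_string()) }
        },
        3,
    );
    assert_eq!(result, Ok("SELECT 1".to_string()));
}
```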

Latency Targets

Operation                    Target    Typical
Session creation             <10ms     ~5ms
Cache hit                    <100ms    ~80ms
SQL generation (no cache)    <300ms    ~250ms
Full query execution         <500ms    ~350ms

Troubleshooting

Common Issues

1. "No suitable model" Error

Cause: LLM API key not configured

Solution:

export OPENAI_API_KEY="your-key"
# or
export ANTHROPIC_API_KEY="your-key"

2. Rate Limit Exceeded

Cause: Too many requests from tenant

Solution:

// Check rate limit
let tokens = engine.get_rate_limit_tokens("user123");
if tokens < 1.0 {
    // Wait or show error to user
    println!("Rate limit exceeded. Please try again later.");
}

3. Session Not Found

Cause: Session expired or deleted

Solution:

// Create new session
let session_id = engine.create_session("user123", "db").await?;

4. Low Confidence Score

Cause: Ambiguous or complex query

Solution:

if response.confidence < 0.7 {
    // Ask for clarification
    if !response.clarifications.is_empty() {
        println!("Please clarify:");
        for clarification in response.clarifications {
            println!("- {}", clarification);
        }
    }
}

Debug Mode

Enable detailed logging:

export RUST_LOG=heliosdb_conversational_bi=debug

API Reference

ConversationalBiEngine

Main engine interface.

Methods

// Create engine
pub async fn new(config: ConversationalConfig) -> Result<Self>

// Session management
pub async fn create_session(&self, user_id: &str, database: &str) -> Result<SessionId>
pub async fn delete_session(&self, session_id: SessionId) -> Result<()>
pub async fn reset_session(&self, session_id: SessionId) -> Result<()>
pub async fn get_history(&self, session_id: SessionId) -> Result<Vec<ConversationTurn>>

// Query processing
pub async fn process_query(
    &self,
    session_id: SessionId,
    query: &str
) -> Result<ConversationResponse>

// Metrics
pub async fn get_performance_metrics(&self, tenant_id: &str) -> Option<LatencyPercentiles>
pub fn get_rate_limit_tokens(&self, tenant_id: &str) -> f64

ConversationResponse

Response from query processing.

pub struct ConversationResponse {
    pub turn_id: u32,                    // Turn number
    pub sql: Option<String>,             // Generated SQL
    pub explanation: String,             // Natural language explanation
    pub results: Option<QueryResult>,    // Query results
    pub clarifications: Vec<String>,     // Clarifications needed
    pub suggestions: Vec<String>,        // Optimization suggestions
    pub confidence: f64,                 // Confidence score (0.0-1.0)
    pub latency_ms: f64,                 // Response latency
    pub from_cache: bool,                // Whether from cache
}

QueryResult

SQL query execution results.

pub struct QueryResult {
    pub columns: Vec<String>,                  // Column names
    pub rows: Vec<Vec<serde_json::Value>>,    // Row data
    pub row_count: usize,                      // Number of rows
    pub execution_time_ms: u64,                // Execution time
    pub success: bool,                         // Success flag
    pub error: Option<String>,                 // Error message
}

Examples

Example 1: Sales Analytics

let session = engine.create_session("analyst", "sales_db").await?;

// Q1: Top products
let r1 = engine.process_query(session,
    "What are the top 5 products by revenue in 2024?").await?;

// Q2: Regional breakdown
let r2 = engine.process_query(session,
    "Show me regional breakdown for the top product").await?;

// Q3: Trend analysis
let r3 = engine.process_query(session,
    "Compare this to last year").await?;

Example 2: Customer Analysis

let session = engine.create_session("manager", "crm_db").await?;

// Find high-value customers
let r1 = engine.process_query(session,
    "Show me customers with lifetime value over $10,000").await?;

// Filter by activity
let r2 = engine.process_query(session,
    "Only those who purchased in the last 30 days").await?;

// Export for marketing
let r3 = engine.process_query(session,
    "Give me their email addresses and purchase history").await?;

Example 3: Error Handling

let session = engine.create_session("user", "db").await?;

let result = engine.process_query(session, "complex query").await;

match result {
    Ok(response) => {
        if response.confidence < 0.7 {
            println!("Warning: Low confidence result");
            for clarification in response.clarifications {
                println!("Clarification needed: {}", clarification);
            }
        }

        if let Some(sql) = response.sql {
            println!("SQL: {}", sql);
        }
    },
    Err(e) => {
        eprintln!("Error: {}", e);
        // Handle specific errors
    }
}

Best Practices

1. Session Management

  • Create sessions per conversation: Don't reuse sessions across unrelated queries
  • Clean up sessions: Delete sessions when done to free resources
  • Set appropriate timeouts: Adjust based on use case

2. Query Formulation

  • Be specific: "Show me Q4 2024 sales in California" vs. "Show me sales"
  • Use context: Follow-up questions work best when building on previous queries
  • Avoid ambiguity: Specify table names if multiple tables have similar data

3. Performance

  • Enable caching: Significantly reduces latency for repeated queries
  • Use appropriate models: gpt-4o-mini or claude-haiku for simple queries
  • Monitor metrics: Track P95/P99 latency to detect issues

4. Security

  • Enable validation: Always use security validation in production
  • Rate limiting: Protect against abuse
  • Input sanitization: Never bypass security features

Conclusion

The Conversational BI Engine provides a production-ready natural language interface to your databases. With high accuracy, multi-turn context, and comprehensive production features, it enables business users to query data without SQL knowledge.

For more information:

  • Performance Optimization Guide
  • API Documentation
  • GitHub Repository


Version: 7.0.0 License: MIT Maintainer: HeliosDB Team