
SQL-Native RAG User Guide

HeliosDB: The First Database with SQL-Native RAG


Table of Contents

  1. Introduction
  2. Quick Start
  3. Core Concepts
  4. Creating Embedding Functions
  5. Creating RAG Pipelines
  6. Adding Documents
  7. Querying with RAG
  8. Advanced Features
  9. Best Practices
  10. Troubleshooting
  11. Migration Guide
  12. Performance Tuning

Introduction

What is SQL-Native RAG?

HeliosDB is the first database to provide Retrieval-Augmented Generation (RAG) directly through SQL. No external frameworks, no Python glue code, no complex orchestration—just pure SQL.

Traditional RAG Stack:

# 50+ lines of Python boilerplate
vector_store = Chroma(...)
embeddings = OpenAIEmbeddings(...)
llm = ChatOpenAI(...)
retriever = vector_store.as_retriever()
rag_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
result = rag_chain.run("What is HeliosDB?")

HeliosDB SQL-Native RAG:

-- 3 lines of SQL
CREATE RAG PIPELINE docs WITH (llm_model = 'openai/gpt-4');
INSERT INTO docs (text) VALUES ('HeliosDB documentation...');
SELECT rag_query('What is HeliosDB?', 'docs');

Key Benefits

  • Pure SQL: No code required, works with any SQL client
  • Database-Native: Vector search, embeddings, and LLMs integrated at the database level
  • Production-Ready: Built-in caching, rate limiting, retry logic, and monitoring
  • Multi-Provider: OpenAI, Anthropic, Cohere, Voyage AI, and local models
  • Scalable: Automatic batch processing, parallel query execution
  • Cost-Optimized: Smart caching reduces API costs by 70-90%


Quick Start

5-Minute Tutorial

-- Step 1: Create an embedding function
CREATE EMBEDDING FUNCTION embed_text
USING PROVIDER 'openai'
MODEL 'text-embedding-3-small'
OPTIONS (cache_ttl = '7 days', batch_size = 100);

-- Step 2: Create a RAG pipeline
CREATE RAG PIPELINE knowledge_base
WITH (
    embedding_function = 'embed_text',
    chunking_strategy = 'semantic',
    retrieval_k = 5,
    llm_model = 'openai/gpt-4',
    enable_reranking = true
);

-- Step 3: Add documents
INSERT INTO knowledge_base (text, metadata)
VALUES
    ('HeliosDB is a next-generation autonomous database...',
     '{"source": "overview.md", "section": "intro"}'),
    ('The vector search feature provides HNSW indexing...',
     '{"source": "features.md", "section": "vector"}');

-- Step 4: Query with RAG
SELECT rag_query('What is HeliosDB?', 'knowledge_base');

-- Result:
-- {
--   "answer": "HeliosDB is a next-generation autonomous database...",
--   "sources": [
--     {"text": "HeliosDB is...", "score": 0.92},
--     {"text": "The vector search...", "score": 0.87}
--   ],
--   "metadata": {"chunks_retrieved": 2, "tokens_used": 150}
-- }

That's it! You now have a production-ready RAG system.


Core Concepts

1. Embedding Functions

Embedding functions convert text into vector representations. They are:

  • Reusable: Create once, use everywhere
  • Cached: Automatic caching to reduce API costs
  • Batched: Automatic batch processing for efficiency
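
A minimal reuse sketch, built only from statements shown elsewhere in this guide (the pipeline names support_kb and product_kb are placeholders):

CREATE EMBEDDING FUNCTION embed_small
USING PROVIDER 'openai'
MODEL 'text-embedding-3-small'
OPTIONS (cache_ttl = '7 days', batch_size = 2048);

-- The same registered function backs multiple pipelines...
CREATE RAG PIPELINE support_kb
WITH (embedding_function = 'embed_small', llm_model = 'openai/gpt-4');

CREATE RAG PIPELINE product_kb
WITH (embedding_function = 'embed_small', llm_model = 'openai/gpt-3.5-turbo');

-- ...and can be called directly; repeated inputs are served from the cache
SELECT embed_small('How do I reset my password?');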

2. RAG Pipelines

A RAG pipeline orchestrates the entire RAG workflow:

  • Chunking: Splits documents into semantic chunks
  • Embedding: Generates embeddings for each chunk
  • Storage: Stores chunks in the vector database
  • Retrieval: Finds relevant chunks via hybrid search
  • Reranking (optional): Re-scores chunks for relevance
  • Generation: Synthesizes an answer using the LLM
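
As an illustrative sketch, each stage above maps onto one of the pipeline options introduced in the Quick Start (the pipeline name example_kb is a placeholder):

-- One option per workflow stage
CREATE RAG PIPELINE example_kb
WITH (
    chunking_strategy  = 'semantic',     -- Chunking
    embedding_function = 'embed_text',   -- Embedding (chunk vectors are then stored automatically)
    retrieval_k        = 5,              -- Retrieval: top-k chunks via hybrid search
    enable_reranking   = true,           -- Reranking (optional)
    llm_model          = 'openai/gpt-4'  -- Generation
);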

3. Hybrid Search

HeliosDB uses hybrid search combining:

  • Vector Search: Semantic similarity via HNSW
  • Full-Text Search: Keyword matching via BM25
  • Metadata Filtering: Structured attribute filters
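
To see what the retrieval stage finds before generation, the vector_search helper used later in the Troubleshooting section can be queried directly, and metadata filters can be attached to a RAG query; a hedged sketch reusing the Quick Start data:

-- Inspect hybrid retrieval results directly
SELECT * FROM vector_search('HNSW indexing', 'knowledge_base', k => 5);

-- Combine semantic retrieval with a metadata filter
SELECT rag_query(
    'How is vector search indexed?',
    'knowledge_base',
    filters => '{"section": "vector"}'
);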


Creating Embedding Functions

Basic Syntax

CREATE EMBEDDING FUNCTION <function_name>
USING PROVIDER '<provider>'
MODEL '<model>'
[OPTIONS (<option_key> = '<option_value>', ...)];

OpenAI Embeddings

-- Small model (1536 dimensions, $0.02/1M tokens)
CREATE EMBEDDING FUNCTION embed_small
USING PROVIDER 'openai'
MODEL 'text-embedding-3-small'
OPTIONS (cache_ttl = '7 days', batch_size = 2048);

-- Large model (3072 dimensions, $0.13/1M tokens)
CREATE EMBEDDING FUNCTION embed_large
USING PROVIDER 'openai'
MODEL 'text-embedding-3-large'
OPTIONS (cache_ttl = '30 days', batch_size = 2048);

Cohere Embeddings

-- English embeddings
CREATE EMBEDDING FUNCTION embed_english
USING PROVIDER 'cohere'
MODEL 'embed-english-v3.0'
OPTIONS (cache_ttl = '14 days', batch_size = 96);

-- Multilingual embeddings (100+ languages)
CREATE EMBEDDING FUNCTION embed_multilingual
USING PROVIDER 'cohere'
MODEL 'embed-multilingual-v3.0'
OPTIONS (cache_ttl = '14 days', batch_size = 96);

Voyage AI Embeddings

-- General purpose
CREATE EMBEDDING FUNCTION embed_voyage
USING PROVIDER 'voyage'
MODEL 'voyage-large-2'
OPTIONS (cache_ttl = '7 days', batch_size = 128);

-- Code-optimized
CREATE EMBEDDING FUNCTION embed_code
USING PROVIDER 'voyage'
MODEL 'voyage-code-2'
OPTIONS (cache_ttl = '30 days', batch_size = 128);

Local Embeddings (No API Costs)

-- Local model via ONNX
CREATE EMBEDDING FUNCTION embed_local
USING PROVIDER 'local'
MODEL 'sentence-transformers/all-MiniLM-L6-v2'
OPTIONS (
    device = 'cuda',
    normalize = 'true',
    pooling = 'mean'
);

Using Embedding Functions Directly

-- Single text
SELECT embed_small('Hello, world!');

-- Batch embedding
SELECT embed_small(text) FROM documents;

-- With ad-hoc model (no pre-registration)
SELECT embed('Hello', 'openai/text-embedding-3-small');

Dropping Embedding Functions

DROP EMBEDDING FUNCTION embed_text;

Creating RAG Pipelines

Basic Syntax

CREATE RAG PIPELINE <pipeline_name>
WITH (
    llm_model = '<provider>/<model>',
    [embedding_function = '<function_name>',]
    [chunking_strategy = '<strategy>',]
    [retrieval_k = <number>,]
    [enable_reranking = <true|false>,]
    [context_window = <size>]
);

OpenAI GPT-4 Pipeline

CREATE RAG PIPELINE docs_gpt4
WITH (
    embedding_function = 'embed_small',
    chunking_strategy = 'semantic',
    retrieval_k = 5,
    llm_model = 'openai/gpt-4',
    enable_reranking = true,
    context_window = 8192
);

Anthropic Claude Pipeline

CREATE RAG PIPELINE docs_claude
WITH (
    embedding_function = 'embed_large',
    chunking_strategy = 'semantic',
    retrieval_k = 7,
    llm_model = 'anthropic/claude-3-opus-20240229',
    enable_reranking = true,
    context_window = 200000  -- Claude's 200K context
);

Cost-Optimized Pipeline (GPT-3.5)

CREATE RAG PIPELINE docs_budget
WITH (
    embedding_function = 'embed_small',
    chunking_strategy = 'fixed',
    retrieval_k = 3,
    llm_model = 'openai/gpt-3.5-turbo',
    enable_reranking = false,
    context_window = 4096
);

Multilingual Pipeline

CREATE RAG PIPELINE docs_multilingual
WITH (
    embedding_function = 'embed_multilingual',
    chunking_strategy = 'semantic',
    retrieval_k = 5,
    llm_model = 'openai/gpt-4',
    enable_reranking = true
);

Chunking Strategies

Strategy    Description                      Best For
semantic    AI-powered semantic chunking     General documents
fixed       Fixed-size chunks (512 tokens)   Uniform content
recursive   Recursive splitting by headers   Structured docs
markdown    Markdown-aware chunking          Markdown files
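
If you are unsure which strategy fits your content, the chunk_document helper shown in the Troubleshooting section can preview the chunks a strategy would produce; a minimal sketch:

-- Preview chunk boundaries before ingesting documents
SELECT chunk_document('HeliosDB is a next-generation autonomous database...', 'semantic');
SELECT chunk_document('HeliosDB is a next-generation autonomous database...', 'fixed');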

Dropping RAG Pipelines

DROP RAG PIPELINE docs_gpt4;

Adding Documents

Basic Insert

INSERT INTO <pipeline_name> (text, metadata)
VALUES ('Document text...', '{"key": "value"}');

Single Document

INSERT INTO knowledge_base (text, metadata)
VALUES (
    'HeliosDB is a next-generation autonomous database with AI-native features...',
    '{"source": "overview.md", "date": "2025-11-05", "author": "HeliosDB Team"}'
);

Bulk Insert

INSERT INTO knowledge_base (text, metadata)
VALUES
    ('First document...', '{"source": "doc1.md"}'),
    ('Second document...', '{"source": "doc2.md"}'),
    ('Third document...', '{"source": "doc3.md"}');

From Existing Table

INSERT INTO knowledge_base (text, metadata)
SELECT
    content,
    json_build_object('source', filename, 'date', created_at)
FROM documents;

From Files (with metadata extraction)

-- PDF documents
INSERT INTO knowledge_base (text, metadata)
SELECT
    extract_text(file_path),
    json_build_object(
        'source', file_path,
        'type', 'pdf',
        'pages', extract_page_count(file_path)
    )
FROM uploaded_pdfs;

-- Markdown files
INSERT INTO knowledge_base (text, metadata)
SELECT
    read_file(file_path),
    json_build_object('source', file_path, 'type', 'markdown')
FROM markdown_files;

Document Updates

-- Update document content (re-embeds automatically)
UPDATE knowledge_base
SET text = 'Updated document content...'
WHERE metadata->>'source' = 'overview.md';

-- Update metadata only (no re-embedding)
UPDATE knowledge_base
SET metadata = metadata || '{"reviewed": true}'
WHERE metadata->>'source' = 'overview.md';

Document Deletion

-- Delete by metadata
DELETE FROM knowledge_base
WHERE metadata->>'source' = 'old_doc.md';

-- Delete all documents
TRUNCATE TABLE knowledge_base;

Querying with RAG

Basic Query

SELECT rag_query('<question>', '<pipeline_name>');

Simple Questions

-- Basic query
SELECT rag_query('What is HeliosDB?', 'knowledge_base');

-- Follow-up question
SELECT rag_query('What are its key features?', 'knowledge_base');

-- Complex question
SELECT rag_query(
    'How does HeliosDB compare to PostgreSQL for vector search?',
    'knowledge_base'
);

Query with Options

SELECT rag_query(
    '<question>',
    '<pipeline>',
    retrieval_k => <number>,
    enable_reranking => <true|false>,
    system_prompt => '<custom prompt>',
    filters => '<json filters>',
    temperature => <float>
);

Advanced Queries

-- Retrieve more chunks
SELECT rag_query(
    'Explain vector search in detail',
    'knowledge_base',
    retrieval_k => 10
);

-- Disable reranking (faster, lower cost)
SELECT rag_query(
    'Quick question',
    'knowledge_base',
    enable_reranking => false
);

-- Custom system prompt
SELECT rag_query(
    'What is HeliosDB?',
    'knowledge_base',
    system_prompt => 'You are a database expert. Provide technical details.'
);

-- Metadata filtering
SELECT rag_query(
    'What are the new features?',
    'knowledge_base',
    filters => '{"source": "changelog.md", "date": {"$gte": "2025-01-01"}}'
);

-- Adjust LLM temperature
SELECT rag_query(
    'Tell me about HeliosDB',
    'knowledge_base',
    temperature => 0.7
);

Analyzing Results

-- Extract answer only
SELECT (rag_query('What is HeliosDB?', 'kb'))->>'answer';

-- Extract sources
SELECT jsonb_array_elements(
    (rag_query('What is HeliosDB?', 'kb'))->'sources'
);

-- Extract metadata
SELECT (rag_query('What is HeliosDB?', 'kb'))->'metadata';

-- Count tokens used
SELECT (rag_query('What is HeliosDB?', 'kb'))->'metadata'->>'tokens_used';

Advanced Features

1. Multi-Step RAG Queries

-- Step 1: Gather context
WITH context AS (
    SELECT rag_query('What is vector search?', 'kb') AS result
),
-- Step 2: Deep dive
deep_dive AS (
    SELECT rag_query(
        'Explain HNSW algorithm in the context above',
        'kb',
        system_prompt => 'Use this context: ' || (SELECT result->>'answer' FROM context)
    ) AS result
)
SELECT * FROM deep_dive;

2. Cross-Pipeline Queries

-- Query multiple knowledge bases
SELECT
    'technical' AS source,
    rag_query('What is HeliosDB?', 'technical_docs') AS answer
UNION ALL
SELECT
    'marketing' AS source,
    rag_query('What is HeliosDB?', 'marketing_docs') AS answer;

3. Conditional RAG

-- Use different pipelines based on question type
SELECT CASE
    WHEN question LIKE '%technical%' THEN
        rag_query(question, 'technical_pipeline')
    WHEN question LIKE '%pricing%' THEN
        rag_query(question, 'sales_pipeline')
    ELSE
        rag_query(question, 'general_pipeline')
END AS answer
FROM user_questions;

4. RAG with Function Composition

-- Translate, then RAG
SELECT rag_query(
    translate(user_input, 'auto', 'en'),
    'english_kb'
);

-- Summarize RAG results
SELECT summarize(
    (rag_query('Explain HeliosDB architecture', 'kb'))->>'answer',
    max_length => 100
);

5. Hybrid Queries (RAG + Traditional SQL)

-- RAG + analytics
WITH rag_result AS (
    SELECT rag_query('What are common support issues?', 'support_kb') AS result
),
ticket_stats AS (
    SELECT
        category,
        COUNT(*) AS count,
        AVG(resolution_time) AS avg_time
    FROM support_tickets
    GROUP BY category
)
SELECT
    (SELECT result->>'answer' FROM rag_result) AS ai_summary,
    json_agg(ticket_stats) AS statistics
FROM ticket_stats;

6. Streaming Responses

-- Stream tokens as they're generated (vendor extension)
SELECT rag_query_stream('Tell me about HeliosDB', 'kb');

Best Practices

1. Choosing Embedding Models

For General Text:

  • Best Quality: openai/text-embedding-3-large (3072D, $0.13/1M)
  • Best Value: openai/text-embedding-3-small (1536D, $0.02/1M)
  • Multilingual: cohere/embed-multilingual-v3.0 (1024D)
  • No Cost: local/all-MiniLM-L6-v2 (384D)

For Code:

  • Best: voyage/voyage-code-2 (1536D)
  • Alternative: openai/text-embedding-3-large

For Low Latency:

  • Use local models with GPU acceleration
  • Enable aggressive caching (30+ days)
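
A minimal low-latency sketch combining options shown separately elsewhere in this guide (whether cache_ttl is honored by the local provider is an assumption here):

-- Local GPU embeddings plus a long cache TTL
CREATE EMBEDDING FUNCTION embed_low_latency
USING PROVIDER 'local'
MODEL 'sentence-transformers/all-MiniLM-L6-v2'
OPTIONS (device = 'cuda', cache_ttl = '30 days');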

2. Choosing LLM Models

For Complex Reasoning:

  • anthropic/claude-3-opus-20240229 (200K context)
  • openai/gpt-4-turbo-preview (128K context)

For Speed & Cost:

  • openai/gpt-3.5-turbo (16K context, 10x cheaper)
  • anthropic/claude-3-haiku (200K context, fast)

For Code Generation:

  • openai/gpt-4-turbo-preview
  • anthropic/claude-3-opus
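
For example (a hedged sketch; fast_support and deep_research are placeholder names), the speed/cost and complex-reasoning recommendations translate into pipelines that differ only in llm_model and retrieval depth:

-- Cheap and fast for routine questions
CREATE RAG PIPELINE fast_support
WITH (llm_model = 'anthropic/claude-3-haiku', retrieval_k = 3, enable_reranking = false);

-- Slower and more expensive, but thorough, for complex reasoning
CREATE RAG PIPELINE deep_research
WITH (llm_model = 'anthropic/claude-3-opus-20240229', retrieval_k = 10, enable_reranking = true);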

3. Optimizing Retrieval

-- Start with k=3-5 for most queries
retrieval_k => 5

-- Increase for complex questions
retrieval_k => 10

-- Decrease for simple lookups
retrieval_k => 3

-- Enable reranking for quality
enable_reranking => true

-- Disable reranking for speed/cost
enable_reranking => false

4. Chunking Strategy Selection

Content Type     Strategy    Why
Technical docs   semantic    Preserves meaning
Books/articles   recursive   Respects structure
Code             fixed       Consistent sizes
Markdown         markdown    Section-aware
Mixed            semantic    Most flexible

5. Metadata Best Practices

-- Good metadata structure
{
    "source": "docs/architecture.md",
    "section": "Vector Search",
    "date": "2025-11-05",
    "author": "Engineering Team",
    "tags": ["vector", "search", "performance"],
    "version": "6.0",
    "reviewed": true
}

-- Use metadata for filtering
SELECT rag_query(
    'What are the new features?',
    'docs',
    filters => '{"version": "6.0", "reviewed": true}'
);

6. Cost Optimization

-- Aggressive caching (70-90% cost reduction)
CREATE EMBEDDING FUNCTION embed_cached
USING PROVIDER 'openai'
MODEL 'text-embedding-3-small'
OPTIONS (cache_ttl = '30 days', batch_size = 2048);

-- Use cheaper models for simple queries
CREATE RAG PIPELINE budget_rag
WITH (
    llm_model = 'openai/gpt-3.5-turbo',
    retrieval_k = 3,
    enable_reranking = false
);

-- Local embeddings (zero API cost)
CREATE EMBEDDING FUNCTION embed_free
USING PROVIDER 'local'
MODEL 'all-MiniLM-L6-v2';

7. Quality Optimization

-- High-quality pipeline
CREATE RAG PIPELINE premium_rag
WITH (
    embedding_function = 'embed_large',
    chunking_strategy = 'semantic',
    retrieval_k = 10,
    llm_model = 'anthropic/claude-3-opus',
    enable_reranking = true,
    context_window = 200000
);

Troubleshooting

Empty Results

Problem: RAG returns "I don't have enough information..."

Solutions:

-- 1. Check if documents exist
SELECT COUNT(*) FROM knowledge_base;

-- 2. Test retrieval directly
SELECT * FROM vector_search('your query', 'knowledge_base', k => 10);

-- 3. Increase retrieval_k
SELECT rag_query('query', 'kb', retrieval_k => 10);

-- 4. Check metadata filters
SELECT rag_query('query', 'kb', filters => '{}');  -- Remove filters

-- 5. Verify chunking
SELECT chunk_document('Sample text', 'semantic');

Poor Quality Answers

Problem: Answers are vague or incorrect

Solutions:

-- 1. Enable reranking
SELECT rag_query('query', 'kb', enable_reranking => true);

-- 2. Use larger embedding model
CREATE EMBEDDING FUNCTION embed_better
USING PROVIDER 'openai'
MODEL 'text-embedding-3-large';

-- 3. Use better LLM
CREATE RAG PIPELINE better_rag
WITH (llm_model = 'openai/gpt-4');

-- 4. Add custom system prompt
SELECT rag_query(
    'query',
    'kb',
    system_prompt => 'You are an expert. Provide detailed, accurate answers.'
);

-- 5. Increase context
SELECT rag_query('query', 'kb', retrieval_k => 15);

High Latency

Problem: Queries take too long

Solutions:

-- 1. Reduce retrieval_k
SELECT rag_query('query', 'kb', retrieval_k => 3);

-- 2. Disable reranking
SELECT rag_query('query', 'kb', enable_reranking => false);

-- 3. Use faster LLM
CREATE RAG PIPELINE fast_rag
WITH (llm_model = 'openai/gpt-3.5-turbo');

-- 4. Enable caching
CREATE EMBEDDING FUNCTION embed_fast
USING PROVIDER 'openai'
MODEL 'text-embedding-3-small'
OPTIONS (cache_ttl = '7 days');

-- 5. Use local embeddings
CREATE EMBEDDING FUNCTION embed_local
USING PROVIDER 'local'
MODEL 'all-MiniLM-L6-v2'
OPTIONS (device = 'cuda');

High Costs

Problem: API costs are too high

Solutions:

-- 1. Enable aggressive caching
OPTIONS (cache_ttl = '30 days', batch_size = 2048)

-- 2. Use cheaper models
MODEL 'text-embedding-3-small'  -- vs text-embedding-3-large
llm_model = 'openai/gpt-3.5-turbo'  -- vs gpt-4

-- 3. Reduce retrieval_k
retrieval_k => 3

-- 4. Use local embeddings
USING PROVIDER 'local'

-- 5. Monitor usage
SELECT
    function_name,
    COUNT(*) AS calls,
    SUM(tokens_used) AS total_tokens,
    SUM(cost) AS total_cost
FROM embedding_usage
GROUP BY function_name;

Debugging

-- Check pipeline configuration
SELECT * FROM rag_pipelines WHERE name = 'knowledge_base';

-- List embedding functions
SELECT * FROM embedding_functions;

-- Check document count
SELECT COUNT(*) FROM knowledge_base;

-- View cache stats
SELECT * FROM embedding_cache_stats;

-- Monitor query performance
SELECT
    query,
    execution_time,
    chunks_retrieved,
    tokens_used,
    cost
FROM rag_query_log
ORDER BY execution_time DESC
LIMIT 10;

Migration Guide

From LangChain (Python)

Before (LangChain):

from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# Setup (50+ lines)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = Chroma(persist_directory="./db", embedding_function=embeddings)
llm = ChatOpenAI(model="gpt-4", temperature=0)
retriever = vector_store.as_retriever(search_kwargs={"k": 5})
rag_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)

# Add documents
vector_store.add_texts(["doc1", "doc2", "doc3"])

# Query
result = rag_chain.run("What is HeliosDB?")

After (HeliosDB SQL):

-- Setup (2 lines)
CREATE EMBEDDING FUNCTION embed USING PROVIDER 'openai' MODEL 'text-embedding-3-small';
CREATE RAG PIPELINE docs WITH (embedding_function = 'embed', llm_model = 'openai/gpt-4', retrieval_k = 5);

-- Add documents (1 line)
INSERT INTO docs (text) VALUES ('doc1'), ('doc2'), ('doc3');

-- Query (1 line)
SELECT rag_query('What is HeliosDB?', 'docs');

From pgvector + Custom Code

Before (pgvector + Python):

import openai
import psycopg2

# Manual embedding
response = openai.Embedding.create(input="query", model="text-embedding-3-small")
query_vector = response['data'][0]['embedding']

# Manual vector search
conn = psycopg2.connect(...)
cur = conn.cursor()
cur.execute("""
    SELECT text FROM documents
    ORDER BY embedding <=> %s
    LIMIT 5
""", (query_vector,))
results = cur.fetchall()

# Manual LLM call
context = "\n".join([r[0] for r in results])
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "Use this context: " + context},
        {"role": "user", "content": "What is HeliosDB?"}
    ]
)
answer = response['choices'][0]['message']['content']

After (HeliosDB SQL):

-- All in one query
SELECT rag_query('What is HeliosDB?', 'docs');

From Elasticsearch + LLM

Before:

// Elasticsearch search
const searchResults = await esClient.search({
  index: 'documents',
  body: {
    query: { match: { text: 'HeliosDB' } },
    size: 5
  }
});

// Extract context
const context = searchResults.hits.hits.map(h => h._source.text).join('\n');

// LLM call
const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [
    { role: 'system', content: `Context: ${context}` },
    { role: 'user', content: 'What is HeliosDB?' }
  ]
});

After:

SELECT rag_query('What is HeliosDB?', 'docs');


Performance Tuning

1. Embedding Performance

-- Baseline: ~100ms per text
CREATE EMBEDDING FUNCTION embed_baseline
USING PROVIDER 'openai' MODEL 'text-embedding-3-small';

-- Optimized: ~10ms per text (10x faster)
CREATE EMBEDDING FUNCTION embed_optimized
USING PROVIDER 'openai' MODEL 'text-embedding-3-small'
OPTIONS (
    cache_ttl = '30 days',      -- Hit rate: 70-90%
    batch_size = 2048           -- Max batch size
);

-- Local: ~1ms per text (100x faster, GPU required)
CREATE EMBEDDING FUNCTION embed_local
USING PROVIDER 'local' MODEL 'all-MiniLM-L6-v2'
OPTIONS (device = 'cuda', batch_size = 512);

2. Query Performance

-- Fast query (<500ms)
SELECT rag_query(
    'Simple question?',
    'docs',
    retrieval_k => 3,           -- Fewer chunks
    enable_reranking => false,  -- Skip reranking
    llm_model => 'openai/gpt-3.5-turbo' -- Faster model (override)
);

-- Quality query (1-3s)
SELECT rag_query(
    'Complex question?',
    'docs',
    retrieval_k => 10,          -- More chunks
    enable_reranking => true,   -- Better ranking
    llm_model => 'openai/gpt-4'        -- Better quality
);

-- Streaming query (immediate response)
SELECT rag_query_stream('Question?', 'docs');

3. Indexing Performance

-- Slow: One at a time
INSERT INTO docs (text) VALUES ('doc1');
INSERT INTO docs (text) VALUES ('doc2');
-- etc...

-- Fast: Bulk insert
INSERT INTO docs (text) VALUES ('doc1'), ('doc2'), ('doc3'), ...;

-- Faster: Disable auto-indexing triggers, bulk insert, re-enable
ALTER TABLE docs DISABLE TRIGGER ALL;
INSERT INTO docs (text) SELECT content FROM source_table;
ALTER TABLE docs ENABLE TRIGGER ALL;

4. Cache Configuration

-- Monitoring cache performance
SELECT
    function_name,
    cache_hits,
    cache_misses,
    cache_hit_rate,
    cache_size_mb,
    cache_evictions
FROM embedding_cache_stats;

-- Optimal cache TTL by use case
-- - Static documentation: 30+ days
-- - Frequently updated: 1-7 days
-- - Real-time data: 1-24 hours
-- - One-time queries: 0 (disable)

CREATE EMBEDDING FUNCTION embed_static
USING PROVIDER 'openai' MODEL 'text-embedding-3-small'
OPTIONS (cache_ttl = '30 days');  -- Static docs

CREATE EMBEDDING FUNCTION embed_dynamic
USING PROVIDER 'openai' MODEL 'text-embedding-3-small'
OPTIONS (cache_ttl = '6 hours');  -- Dynamic content

5. Benchmarking

-- Query latency breakdown
EXPLAIN ANALYZE
SELECT rag_query('What is HeliosDB?', 'docs');

-- Results:
-- - Embedding: 15ms (cached)
-- - Vector search: 8ms
-- - Reranking: 45ms
-- - LLM generation: 1200ms
-- - Total: 1268ms

-- Optimize based on bottleneck
-- - Slow embedding → enable caching or use local models
-- - Slow vector search → optimize index, reduce retrieval_k
-- - Slow reranking → disable if not critical
-- - Slow LLM → use faster model or reduce context

Appendix

Supported Providers

Provider   Model                     Dimensions   Cost       Best For
OpenAI     text-embedding-3-small    1536         $0.02/1M   General purpose
OpenAI     text-embedding-3-large    3072         $0.13/1M   High quality
Cohere     embed-english-v3.0        1024         $0.10/1M   English text
Cohere     embed-multilingual-v3.0   1024         $0.10/1M   100+ languages
Voyage     voyage-large-2            1536         $0.12/1M   General purpose
Voyage     voyage-code-2             1536         $0.12/1M   Code
Local      all-MiniLM-L6-v2          384          Free       Low latency
Local      BGE-large                 1024         Free       High quality

LLM Models

Provider    Model             Context   Cost (per 1M tokens)   Best For
OpenAI      gpt-4-turbo       128K      $10 / $30 (in/out)     Quality
OpenAI      gpt-3.5-turbo     16K       $0.50 / $1.50          Speed/cost
Anthropic   claude-3-opus     200K      $15 / $75              Quality + context
Anthropic   claude-3-sonnet   200K      $3 / $15               Balanced
Anthropic   claude-3-haiku    200K      $0.25 / $1.25          Speed/cost

Environment Variables

# API Keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export COHERE_API_KEY="..."
export VOYAGE_API_KEY="..."

# Configuration
export HELIOSDB_EMBEDDING_CACHE_DIR="/var/cache/heliosdb/embeddings"
export HELIOSDB_EMBEDDING_CACHE_SIZE_GB="10"
export HELIOSDB_LLM_TIMEOUT_SECONDS="30"
export HELIOSDB_LLM_MAX_RETRIES="3"

Configuration File

# heliosdb.yml
embeddings:
  cache:
    enabled: true
    directory: /var/cache/heliosdb/embeddings
    max_size_gb: 10
    default_ttl: 604800  # 7 days

  rate_limits:
    openai:
      requests_per_minute: 3000
      tokens_per_minute: 1000000
    cohere:
      requests_per_minute: 10000
    voyage:
      requests_per_minute: 300

llm:
  timeout_seconds: 30
  max_retries: 3
  retry_delay_seconds: 2

  default_options:
    temperature: 0.7
    max_tokens: 2000
    top_p: 0.9

rag:
  default_chunking_strategy: semantic
  default_retrieval_k: 5
  default_reranking: true
  max_context_tokens: 100000

Support & Resources

  • Documentation: https://docs.heliosdb.com/rag
  • API Reference: https://docs.heliosdb.com/api/rag
  • Examples: https://github.com/heliosdb/examples/rag
  • Discord: https://discord.gg/heliosdb
  • GitHub Issues: https://github.com/heliosdb/heliosdb/issues

Last Updated: November 5, 2025
Version: 6.0
Author: HeliosDB Team


HeliosDB: The first database with SQL-native RAG. Build production AI applications with pure SQL—no frameworks required.