SQL-Native RAG User Guide¶
HeliosDB: The First Database with SQL-Native RAG
Table of Contents¶
- Introduction
- Quick Start
- Core Concepts
- Creating Embedding Functions
- Creating RAG Pipelines
- Adding Documents
- Querying with RAG
- Advanced Features
- Best Practices
- Troubleshooting
- Migration Guide
- Performance Tuning
- Appendix
Introduction¶
What is SQL-Native RAG?¶
HeliosDB is the first database to provide Retrieval-Augmented Generation (RAG) directly through SQL. No external frameworks, no Python glue code, no complex orchestration—just pure SQL.
Traditional RAG Stack:
# 50+ lines of Python boilerplate
vector_store = Chroma(...)
embeddings = OpenAIEmbeddings(...)
llm = ChatOpenAI(...)
retriever = vector_store.as_retriever()
rag_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
result = rag_chain.run("What is HeliosDB?")
HeliosDB SQL-Native RAG:
-- 3 lines of SQL
CREATE RAG PIPELINE docs WITH (llm_model = 'openai/gpt-4');
INSERT INTO docs (text) VALUES ('HeliosDB documentation...');
SELECT rag_query('What is HeliosDB?', 'docs');
Key Benefits¶
- Pure SQL: No code required, works with any SQL client
- Database-Native: Vector search, embeddings, and LLMs integrated at the database level
- Production-Ready: Built-in caching, rate limiting, retry logic, and monitoring
- Multi-Provider: OpenAI, Anthropic, Cohere, Voyage AI, and local models
- Scalable: Automatic batch processing, parallel query execution
- Cost-Optimized: Smart caching reduces API costs by 70-90%
Quick Start¶
5-Minute Tutorial¶
-- Step 1: Create an embedding function
CREATE EMBEDDING FUNCTION embed_text
USING PROVIDER 'openai'
MODEL 'text-embedding-3-small'
OPTIONS (cache_ttl = '7 days', batch_size = 100);
-- Step 2: Create a RAG pipeline
CREATE RAG PIPELINE knowledge_base
WITH (
embedding_function = 'embed_text',
chunking_strategy = 'semantic',
retrieval_k = 5,
llm_model = 'openai/gpt-4',
enable_reranking = true
);
-- Step 3: Add documents
INSERT INTO knowledge_base (text, metadata)
VALUES
('HeliosDB is a next-generation autonomous database...',
'{"source": "overview.md", "section": "intro"}'),
('The vector search feature provides HNSW indexing...',
'{"source": "features.md", "section": "vector"}');
-- Step 4: Query with RAG
SELECT rag_query('What is HeliosDB?', 'knowledge_base');
-- Result:
-- {
-- "answer": "HeliosDB is a next-generation autonomous database...",
-- "sources": [
-- {"text": "HeliosDB is...", "score": 0.92},
-- {"text": "The vector search...", "score": 0.87}
-- ],
-- "metadata": {"chunks_retrieved": 2, "tokens_used": 150}
-- }
That's it! You now have a production-ready RAG system.
Core Concepts¶
1. Embedding Functions¶
Embedding functions convert text into vector representations. They are:
- Reusable: Create once, use everywhere
- Cached: Automatic caching to reduce API costs
- Batched: Automatic batch processing for efficiency
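For example, the same registered function (here embed_small, defined under Creating Embedding Functions below) can back ad-hoc calls, bulk jobs, and pipelines alike:
-- One registered function, reused in three contexts
SELECT embed_small('ad-hoc query text');
SELECT embed_small(text) FROM documents; -- bulk, batched automatically
CREATE RAG PIPELINE kb
WITH (embedding_function = 'embed_small', llm_model = 'openai/gpt-4');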
2. RAG Pipelines¶
A RAG pipeline orchestrates the entire RAG workflow:
- Chunking: Splits documents into semantic chunks
- Embedding: Generates embeddings for each chunk
- Storage: Stores chunks in vector database
- Retrieval: Finds relevant chunks via hybrid search
- Reranking (optional): Re-scores chunks for relevance
- Generation: Synthesizes answer using LLM
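Each stage can also be exercised on its own. For example, the chunking step can be previewed with chunk_document (also used in Troubleshooting below):
-- Preview how a document will be split before inserting it
SELECT chunk_document('HeliosDB is a next-generation autonomous database...', 'semantic');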
3. Hybrid Search¶
HeliosDB uses hybrid search combining:
- Vector Search: Semantic similarity via HNSW
- Full-Text Search: Keyword matching via BM25
- Metadata Filtering: Structured attribute filters
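All three signals are combined automatically inside rag_query, but the retrieval layer can also be probed directly. A minimal sketch using the vector_search function shown in Troubleshooting; the filters parameter is an assumption here, mirroring rag_query's option of the same name:
-- Probe retrieval directly: vector similarity plus a metadata filter
SELECT * FROM vector_search(
    'HNSW indexing',
    'knowledge_base',
    k => 5,
    filters => '{"source": "features.md"}' -- assumed parameter, mirrors rag_query
);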
Creating Embedding Functions¶
Basic Syntax¶
CREATE EMBEDDING FUNCTION <function_name>
USING PROVIDER '<provider>'
MODEL '<model>'
[OPTIONS (<option_key> = '<option_value>', ...)];
OpenAI Embeddings¶
-- Small model (1536 dimensions, $0.02/1M tokens)
CREATE EMBEDDING FUNCTION embed_small
USING PROVIDER 'openai'
MODEL 'text-embedding-3-small'
OPTIONS (cache_ttl = '7 days', batch_size = 2048);
-- Large model (3072 dimensions, $0.13/1M tokens)
CREATE EMBEDDING FUNCTION embed_large
USING PROVIDER 'openai'
MODEL 'text-embedding-3-large'
OPTIONS (cache_ttl = '30 days', batch_size = 2048);
Cohere Embeddings¶
-- English embeddings
CREATE EMBEDDING FUNCTION embed_english
USING PROVIDER 'cohere'
MODEL 'embed-english-v3.0'
OPTIONS (cache_ttl = '14 days', batch_size = 96);
-- Multilingual embeddings (100+ languages)
CREATE EMBEDDING FUNCTION embed_multilingual
USING PROVIDER 'cohere'
MODEL 'embed-multilingual-v3.0'
OPTIONS (cache_ttl = '14 days', batch_size = 96);
Voyage AI Embeddings¶
-- General purpose
CREATE EMBEDDING FUNCTION embed_voyage
USING PROVIDER 'voyage'
MODEL 'voyage-large-2'
OPTIONS (cache_ttl = '7 days', batch_size = 128);
-- Code-optimized
CREATE EMBEDDING FUNCTION embed_code
USING PROVIDER 'voyage'
MODEL 'voyage-code-2'
OPTIONS (cache_ttl = '30 days', batch_size = 128);
Local Embeddings (No API Costs)¶
-- Local model via ONNX
CREATE EMBEDDING FUNCTION embed_local
USING PROVIDER 'local'
MODEL 'sentence-transformers/all-MiniLM-L6-v2'
OPTIONS (
device = 'cuda',
normalize = 'true',
pooling = 'mean'
);
Using Embedding Functions Directly¶
-- Single text
SELECT embed_small('Hello, world!');
-- Batch embedding
SELECT embed_small(text) FROM documents;
-- With ad-hoc model (no pre-registration)
SELECT embed('Hello', 'openai/text-embedding-3-small');
Dropping Embedding Functions¶
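A minimal sketch, assuming the DROP statement mirrors the CREATE syntax above:
-- Remove a registered embedding function (assumed syntax)
DROP EMBEDDING FUNCTION embed_small;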
Creating RAG Pipelines¶
Basic Syntax¶
CREATE RAG PIPELINE <pipeline_name>
WITH (
llm_model = '<provider>/<model>',
[embedding_function = '<function_name>',]
[chunking_strategy = '<strategy>',]
[retrieval_k = <number>,]
[enable_reranking = <true|false>,]
[context_window = <size>]
);
OpenAI GPT-4 Pipeline¶
CREATE RAG PIPELINE docs_gpt4
WITH (
embedding_function = 'embed_small',
chunking_strategy = 'semantic',
retrieval_k = 5,
llm_model = 'openai/gpt-4',
enable_reranking = true,
context_window = 8192
);
Anthropic Claude Pipeline¶
CREATE RAG PIPELINE docs_claude
WITH (
embedding_function = 'embed_large',
chunking_strategy = 'semantic',
retrieval_k = 7,
llm_model = 'anthropic/claude-3-opus-20240229',
enable_reranking = true,
context_window = 200000 -- Claude's 200K context
);
Cost-Optimized Pipeline (GPT-3.5)¶
CREATE RAG PIPELINE docs_budget
WITH (
embedding_function = 'embed_small',
chunking_strategy = 'fixed',
retrieval_k = 3,
llm_model = 'openai/gpt-3.5-turbo',
enable_reranking = false,
context_window = 4096
);
Multilingual Pipeline¶
CREATE RAG PIPELINE docs_multilingual
WITH (
embedding_function = 'embed_multilingual',
chunking_strategy = 'semantic',
retrieval_k = 5,
llm_model = 'openai/gpt-4',
enable_reranking = true
);
Chunking Strategies¶
| Strategy | Description | Best For |
|---|---|---|
| semantic | AI-powered semantic chunking | General documents |
| fixed | Fixed-size chunks (512 tokens) | Uniform content |
| recursive | Recursive splitting by headers | Structured docs |
| markdown | Markdown-aware chunking | Markdown files |
Dropping RAG Pipelines¶
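A minimal sketch, assuming the DROP statement mirrors the CREATE syntax; whether the pipeline's stored chunks are removed with it is also an assumption:
-- Remove a RAG pipeline (assumed syntax)
DROP RAG PIPELINE docs_budget;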
Adding Documents¶
Basic Insert¶
Single Document¶
INSERT INTO knowledge_base (text, metadata)
VALUES (
'HeliosDB is a next-generation autonomous database with AI-native features...',
'{"source": "overview.md", "date": "2025-11-05", "author": "HeliosDB Team"}'
);
Bulk Insert¶
INSERT INTO knowledge_base (text, metadata)
VALUES
('First document...', '{"source": "doc1.md"}'),
('Second document...', '{"source": "doc2.md"}'),
('Third document...', '{"source": "doc3.md"}');
From Existing Table¶
INSERT INTO knowledge_base (text, metadata)
SELECT
content,
json_build_object('source', filename, 'date', created_at)
FROM documents;
From Files (with metadata extraction)¶
-- PDF documents
INSERT INTO knowledge_base (text, metadata)
SELECT
extract_text(file_path),
json_build_object(
'source', file_path,
'type', 'pdf',
'pages', extract_page_count(file_path)
)
FROM uploaded_pdfs;
-- Markdown files
INSERT INTO knowledge_base (text, metadata)
SELECT
read_file(file_path),
json_build_object('source', file_path, 'type', 'markdown')
FROM markdown_files;
Document Updates¶
-- Update document content (re-embeds automatically)
UPDATE knowledge_base
SET text = 'Updated document content...'
WHERE metadata->>'source' = 'overview.md';
-- Update metadata only (no re-embedding)
UPDATE knowledge_base
SET metadata = metadata || '{"reviewed": true}'
WHERE metadata->>'source' = 'overview.md';
Document Deletion¶
-- Delete by metadata
DELETE FROM knowledge_base
WHERE metadata->>'source' = 'old_doc.md';
-- Delete all documents
TRUNCATE TABLE knowledge_base;
Querying with RAG¶
Basic Query¶
Simple Questions¶
-- Basic query
SELECT rag_query('What is HeliosDB?', 'knowledge_base');
-- Follow-up question
SELECT rag_query('What are its key features?', 'knowledge_base');
-- Complex question
SELECT rag_query(
'How does HeliosDB compare to PostgreSQL for vector search?',
'knowledge_base'
);
Query with Options¶
SELECT rag_query(
'<question>',
'<pipeline>',
retrieval_k => <number>,
enable_reranking => <true|false>,
system_prompt => '<custom prompt>',
filters => '<json filters>',
temperature => <float>
);
Advanced Queries¶
-- Retrieve more chunks
SELECT rag_query(
'Explain vector search in detail',
'knowledge_base',
retrieval_k => 10
);
-- Disable reranking (faster, lower cost)
SELECT rag_query(
'Quick question',
'knowledge_base',
enable_reranking => false
);
-- Custom system prompt
SELECT rag_query(
'What is HeliosDB?',
'knowledge_base',
system_prompt => 'You are a database expert. Provide technical details.'
);
-- Metadata filtering
SELECT rag_query(
'What are the new features?',
'knowledge_base',
filters => '{"source": "changelog.md", "date": {"$gte": "2025-01-01"}}'
);
-- Adjust LLM temperature
SELECT rag_query(
'Tell me about HeliosDB',
'knowledge_base',
temperature => 0.7
);
Analyzing Results¶
-- Extract answer only
SELECT (rag_query('What is HeliosDB?', 'kb'))->>'answer';
-- Extract sources
SELECT jsonb_array_elements(
(rag_query('What is HeliosDB?', 'kb'))->'sources'
);
-- Extract metadata
SELECT (rag_query('What is HeliosDB?', 'kb'))->'metadata';
-- Count tokens used
SELECT (rag_query('What is HeliosDB?', 'kb'))->'metadata'->>'tokens_used';
Advanced Features¶
1. Multi-Step RAG Queries¶
-- Step 1: Gather context
WITH context AS (
SELECT rag_query('What is vector search?', 'kb') AS result
),
-- Step 2: Deep dive
deep_dive AS (
SELECT rag_query(
'Explain HNSW algorithm in the context above',
'kb',
system_prompt => 'Use this context: ' || (SELECT result->>'answer' FROM context)
) AS result
)
SELECT * FROM deep_dive;
2. Cross-Pipeline Queries¶
-- Query multiple knowledge bases
SELECT
'technical' AS source,
rag_query('What is HeliosDB?', 'technical_docs') AS answer
UNION ALL
SELECT
'marketing' AS source,
rag_query('What is HeliosDB?', 'marketing_docs') AS answer;
3. Conditional RAG¶
-- Use different pipelines based on question type
SELECT CASE
WHEN question LIKE '%technical%' THEN
rag_query(question, 'technical_pipeline')
WHEN question LIKE '%pricing%' THEN
rag_query(question, 'sales_pipeline')
ELSE
rag_query(question, 'general_pipeline')
END AS answer
FROM user_questions;
4. RAG with Function Composition¶
-- Translate, then RAG
SELECT rag_query(
translate(user_input, 'auto', 'en'),
'english_kb'
);
-- Summarize RAG results
SELECT summarize(
(rag_query('Explain HeliosDB architecture', 'kb'))->>'answer',
max_length => 100
);
5. Hybrid Queries (RAG + Traditional SQL)¶
-- RAG + analytics
WITH rag_result AS (
SELECT rag_query('What are common support issues?', 'support_kb') AS result
),
ticket_stats AS (
SELECT
category,
COUNT(*) AS count,
AVG(resolution_time) AS avg_time
FROM support_tickets
GROUP BY category
)
SELECT
(SELECT result->>'answer' FROM rag_result) AS ai_summary,
json_agg(ticket_stats) AS statistics
FROM ticket_stats;
6. Streaming Responses¶
-- Stream tokens as they're generated (vendor extension)
SELECT rag_query_stream('Tell me about HeliosDB', 'kb');
Best Practices¶
1. Choosing Embedding Models¶
For General Text:
- Best Quality: openai/text-embedding-3-large (3072D, $0.13/1M)
- Best Value: openai/text-embedding-3-small (1536D, $0.02/1M)
- Multilingual: cohere/embed-multilingual-v3.0 (1024D)
- No Cost: local/all-MiniLM-L6-v2 (384D)
For Code:
- Best: voyage/voyage-code-2 (1536D)
- Alternative: openai/text-embedding-3-large
For Low Latency:
- Use local models with GPU acceleration
- Enable aggressive caching (30+ days)
2. Choosing LLM Models¶
For Complex Reasoning:
- anthropic/claude-3-opus-20240229 (200K context)
- openai/gpt-4-turbo-preview (128K context)
For Speed & Cost:
- openai/gpt-3.5-turbo (16K context, 10x cheaper)
- anthropic/claude-3-haiku (200K context, fast)
For Code Generation:
- openai/gpt-4-turbo-preview
- anthropic/claude-3-opus
3. Optimizing Retrieval¶
-- Start with k=3-5 for most queries
retrieval_k => 5
-- Increase for complex questions
retrieval_k => 10
-- Decrease for simple lookups
retrieval_k => 3
-- Enable reranking for quality
enable_reranking => true
-- Disable reranking for speed/cost
enable_reranking => false
4. Chunking Strategy Selection¶
| Content Type | Strategy | Why |
|---|---|---|
| Technical docs | semantic | Preserves meaning |
| Books/articles | recursive | Respects structure |
| Code | fixed | Consistent sizes |
| Markdown | markdown | Section-aware |
| Mixed | semantic | Most flexible |
5. Metadata Best Practices¶
-- Good metadata structure
{
"source": "docs/architecture.md",
"section": "Vector Search",
"date": "2025-11-05",
"author": "Engineering Team",
"tags": ["vector", "search", "performance"],
"version": "6.0",
"reviewed": true
}
-- Use metadata for filtering
SELECT rag_query(
'What are the new features?',
'docs',
filters => '{"version": "6.0", "reviewed": true}'
);
6. Cost Optimization¶
-- Aggressive caching (70-90% cost reduction)
CREATE EMBEDDING FUNCTION embed_cached
USING PROVIDER 'openai'
MODEL 'text-embedding-3-small'
OPTIONS (cache_ttl = '30 days', batch_size = 2048);
-- Use cheaper models for simple queries
CREATE RAG PIPELINE budget_rag
WITH (
llm_model = 'openai/gpt-3.5-turbo',
retrieval_k = 3,
enable_reranking = false
);
-- Local embeddings (zero API cost)
CREATE EMBEDDING FUNCTION embed_free
USING PROVIDER 'local'
MODEL 'all-MiniLM-L6-v2';
7. Quality Optimization¶
-- High-quality pipeline
CREATE RAG PIPELINE premium_rag
WITH (
embedding_function = 'embed_large',
chunking_strategy = 'semantic',
retrieval_k = 10,
llm_model = 'anthropic/claude-3-opus',
enable_reranking = true,
context_window = 200000
);
Troubleshooting¶
Empty Results¶
Problem: RAG returns "I don't have enough information..."
Solutions:
-- 1. Check if documents exist
SELECT COUNT(*) FROM knowledge_base;
-- 2. Test retrieval directly
SELECT * FROM vector_search('your query', 'knowledge_base', k => 10);
-- 3. Increase retrieval_k
SELECT rag_query('query', 'kb', retrieval_k => 10);
-- 4. Check metadata filters
SELECT rag_query('query', 'kb', filters => '{}'); -- Remove filters
-- 5. Verify chunking
SELECT chunk_document('Sample text', 'semantic');
Poor Quality Answers¶
Problem: Answers are vague or incorrect
Solutions:
-- 1. Enable reranking
SELECT rag_query('query', 'kb', enable_reranking => true);
-- 2. Use larger embedding model
CREATE EMBEDDING FUNCTION embed_better
USING PROVIDER 'openai'
MODEL 'text-embedding-3-large';
-- 3. Use better LLM
CREATE RAG PIPELINE better_rag
WITH (llm_model = 'openai/gpt-4');
-- 4. Add custom system prompt
SELECT rag_query(
'query',
'kb',
system_prompt => 'You are an expert. Provide detailed, accurate answers.'
);
-- 5. Increase context
SELECT rag_query('query', 'kb', retrieval_k => 15);
High Latency¶
Problem: Queries take too long
Solutions:
-- 1. Reduce retrieval_k
SELECT rag_query('query', 'kb', retrieval_k => 3);
-- 2. Disable reranking
SELECT rag_query('query', 'kb', enable_reranking => false);
-- 3. Use faster LLM
CREATE RAG PIPELINE fast_rag
WITH (llm_model = 'openai/gpt-3.5-turbo');
-- 4. Enable caching
CREATE EMBEDDING FUNCTION embed_fast
USING PROVIDER 'openai'
MODEL 'text-embedding-3-small'
OPTIONS (cache_ttl = '7 days');
-- 5. Use local embeddings
CREATE EMBEDDING FUNCTION embed_local
USING PROVIDER 'local'
MODEL 'all-MiniLM-L6-v2'
OPTIONS (device = 'cuda');
High Costs¶
Problem: API costs are too high
Solutions:
-- 1. Enable aggressive caching
OPTIONS (cache_ttl = '30 days', batch_size = 2048)
-- 2. Use cheaper models
MODEL 'text-embedding-3-small' -- vs text-embedding-3-large
llm_model = 'openai/gpt-3.5-turbo' -- vs gpt-4
-- 3. Reduce retrieval_k
retrieval_k => 3
-- 4. Use local embeddings
USING PROVIDER 'local'
-- 5. Monitor usage
SELECT
function_name,
COUNT(*) AS calls,
SUM(tokens_used) AS total_tokens,
SUM(cost) AS total_cost
FROM embedding_usage
GROUP BY function_name;
Debugging¶
-- Check pipeline configuration
SELECT * FROM rag_pipelines WHERE name = 'knowledge_base';
-- List embedding functions
SELECT * FROM embedding_functions;
-- Check document count
SELECT COUNT(*) FROM knowledge_base;
-- View cache stats
SELECT * FROM embedding_cache_stats;
-- Monitor query performance
SELECT
query,
execution_time,
chunks_retrieved,
tokens_used,
cost
FROM rag_query_log
ORDER BY execution_time DESC
LIMIT 10;
Migration Guide¶
From LangChain (Python)¶
Before (LangChain):
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
# Setup (50+ lines)
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector_store = Chroma(persist_directory="./db", embedding_function=embeddings)
llm = ChatOpenAI(model="gpt-4", temperature=0)
retriever = vector_store.as_retriever(search_kwargs={"k": 5})
rag_chain = RetrievalQA.from_chain_type(
llm=llm,
retriever=retriever,
return_source_documents=True
)
# Add documents
vector_store.add_texts(["doc1", "doc2", "doc3"])
# Query
result = rag_chain.run("What is HeliosDB?")
After (HeliosDB SQL):
-- Setup (2 lines)
CREATE EMBEDDING FUNCTION embed USING PROVIDER 'openai' MODEL 'text-embedding-3-small';
CREATE RAG PIPELINE docs WITH (embedding_function = 'embed', llm_model = 'openai/gpt-4', retrieval_k = 5);
-- Add documents (1 line)
INSERT INTO docs (text) VALUES ('doc1'), ('doc2'), ('doc3');
-- Query (1 line)
SELECT rag_query('What is HeliosDB?', 'docs');
From pgvector + Custom Code¶
Before (pgvector + Python):
import openai
import psycopg2
# Manual embedding
response = openai.Embedding.create(input="query", model="text-embedding-3-small")
query_vector = response['data'][0]['embedding']
# Manual vector search
conn = psycopg2.connect(...)
cur = conn.cursor()
cur.execute("""
SELECT text FROM documents
ORDER BY embedding <=> %s
LIMIT 5
""", (query_vector,))
results = cur.fetchall()
# Manual LLM call
context = "\n".join([r[0] for r in results])
response = openai.ChatCompletion.create(
model="gpt-4",
messages=[
{"role": "system", "content": "Use this context: " + context},
{"role": "user", "content": "What is HeliosDB?"}
]
)
answer = response['choices'][0]['message']['content']
After (HeliosDB SQL):
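-- Same docs pipeline as in the LangChain migration above; the manual
-- embed, search, prompt, and generate steps collapse into one call
SELECT rag_query('What is HeliosDB?', 'docs');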
From Elasticsearch + LLM¶
Before:
// Elasticsearch search
const searchResults = await esClient.search({
index: 'documents',
body: {
query: { match: { text: 'HeliosDB' } },
size: 5
}
});
// Extract context
const context = searchResults.hits.hits.map(h => h._source.text).join('\n');
// LLM call
const response = await openai.chat.completions.create({
model: 'gpt-4',
messages: [
{ role: 'system', content: `Context: ${context}` },
{ role: 'user', content: 'What is HeliosDB?' }
]
});
After:
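-- Assuming the same docs pipeline as above: keyword search, context
-- assembly, and the LLM call become a single statement
SELECT rag_query('What is HeliosDB?', 'docs');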
Performance Tuning¶
1. Embedding Performance¶
-- Baseline: ~100ms per text
CREATE EMBEDDING FUNCTION embed_baseline
USING PROVIDER 'openai' MODEL 'text-embedding-3-small';
-- Optimized: ~10ms per text (10x faster)
CREATE EMBEDDING FUNCTION embed_optimized
USING PROVIDER 'openai' MODEL 'text-embedding-3-small'
OPTIONS (
cache_ttl = '30 days', -- Hit rate: 70-90%
batch_size = 2048 -- Max batch size
);
-- Local: ~1ms per text (100x faster, GPU required)
CREATE EMBEDDING FUNCTION embed_local
USING PROVIDER 'local' MODEL 'all-MiniLM-L6-v2'
OPTIONS (device = 'cuda', batch_size = 512);
2. Query Performance¶
-- Fast query (<500ms)
SELECT rag_query(
'Simple question?',
'docs',
retrieval_k => 3, -- Fewer chunks
enable_reranking => false, -- Skip reranking
    llm_model => 'openai/gpt-3.5-turbo' -- Faster model (per-query override)
);
-- Quality query (1-3s)
SELECT rag_query(
'Complex question?',
'docs',
retrieval_k => 10, -- More chunks
enable_reranking => true, -- Better ranking
    llm_model => 'openai/gpt-4' -- Better quality
);
-- Streaming query (immediate response)
SELECT rag_query_stream('Question?', 'docs');
3. Indexing Performance¶
-- Slow: One at a time
INSERT INTO docs (text) VALUES ('doc1');
INSERT INTO docs (text) VALUES ('doc2');
-- etc...
-- Fast: Bulk insert
INSERT INTO docs (text) VALUES ('doc1'), ('doc2'), ('doc3'), ...;
-- Faster: Disable indexing, bulk insert, re-enable
ALTER TABLE docs DISABLE TRIGGER ALL;
INSERT INTO docs (text) SELECT content FROM source_table;
ALTER TABLE docs ENABLE TRIGGER ALL;
4. Cache Configuration¶
-- Monitoring cache performance
SELECT
function_name,
cache_hits,
cache_misses,
cache_hit_rate,
cache_size_mb,
cache_evictions
FROM embedding_cache_stats;
-- Optimal cache TTL by use case
-- - Static documentation: 30+ days
-- - Frequently updated: 1-7 days
-- - Real-time data: 1-24 hours
-- - One-time queries: 0 (disable)
CREATE EMBEDDING FUNCTION embed_static
USING PROVIDER 'openai' MODEL 'text-embedding-3-small'
OPTIONS (cache_ttl = '30 days'); -- Static docs
CREATE EMBEDDING FUNCTION embed_dynamic
USING PROVIDER 'openai' MODEL 'text-embedding-3-small'
OPTIONS (cache_ttl = '6 hours'); -- Dynamic content
5. Benchmarking¶
-- Query latency breakdown
EXPLAIN ANALYZE
SELECT rag_query('What is HeliosDB?', 'docs');
-- Results:
-- - Embedding: 15ms (cached)
-- - Vector search: 8ms
-- - Reranking: 45ms
-- - LLM generation: 1200ms
-- - Total: 1268ms
-- Optimize based on bottleneck
-- - Slow embedding → enable caching or use local models
-- - Slow vector search → optimize index, reduce retrieval_k
-- - Slow reranking → disable if not critical
-- - Slow LLM → use faster model or reduce context
Appendix¶
Supported Providers¶
| Provider | Models | Dimensions | Cost | Best For |
|---|---|---|---|---|
| OpenAI | text-embedding-3-small | 1536 | $0.02/1M | General purpose |
| OpenAI | text-embedding-3-large | 3072 | $0.13/1M | High quality |
| Cohere | embed-english-v3.0 | 1024 | $0.10/1M | English text |
| Cohere | embed-multilingual-v3.0 | 1024 | $0.10/1M | 100+ languages |
| Voyage | voyage-large-2 | 1536 | $0.12/1M | General purpose |
| Voyage | voyage-code-2 | 1536 | $0.12/1M | Code |
| Local | all-MiniLM-L6-v2 | 384 | Free | Low latency |
| Local | BGE-large | 1024 | Free | High quality |
LLM Models¶
| Provider | Model | Context | Cost (per 1M tokens) | Best For |
|---|---|---|---|---|
| OpenAI | gpt-4-turbo | 128K | $10/$30 (in/out) | Quality |
| OpenAI | gpt-3.5-turbo | 16K | $0.50/$1.50 | Speed/cost |
| Anthropic | claude-3-opus | 200K | $15/$75 | Quality + context |
| Anthropic | claude-3-sonnet | 200K | $3/$15 | Balanced |
| Anthropic | claude-3-haiku | 200K | $0.25/$1.25 | Speed/cost |
Environment Variables¶
# API Keys
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export COHERE_API_KEY="..."
export VOYAGE_API_KEY="..."
# Configuration
export HELIOSDB_EMBEDDING_CACHE_DIR="/var/cache/heliosdb/embeddings"
export HELIOSDB_EMBEDDING_CACHE_SIZE_GB="10"
export HELIOSDB_LLM_TIMEOUT_SECONDS="30"
export HELIOSDB_LLM_MAX_RETRIES="3"
Configuration File¶
# heliosdb.yml
embeddings:
cache:
enabled: true
directory: /var/cache/heliosdb/embeddings
max_size_gb: 10
default_ttl: 604800 # 7 days
rate_limits:
openai:
requests_per_minute: 3000
tokens_per_minute: 1000000
cohere:
requests_per_minute: 10000
voyage:
requests_per_minute: 300
llm:
timeout_seconds: 30
max_retries: 3
retry_delay_seconds: 2
default_options:
temperature: 0.7
max_tokens: 2000
top_p: 0.9
rag:
default_chunking_strategy: semantic
default_retrieval_k: 5
default_reranking: true
max_context_tokens: 100000
Support & Resources¶
- Documentation: https://docs.heliosdb.com/rag
- API Reference: https://docs.heliosdb.com/api/rag
- Examples: https://github.com/heliosdb/examples/rag
- Discord: https://discord.gg/heliosdb
- GitHub Issues: https://github.com/heliosdb/heliosdb/issues
Last Updated: November 5, 2025 · Version: 6.0 · Author: HeliosDB Team
HeliosDB: The first database with SQL-native RAG. Build production AI applications with pure SQL—no frameworks required.