HeliosDB Multimodal Vector Search User Guide¶
Version: 1.0 Last Updated: November 24, 2025 Feature Status: Production Ready
Table of Contents¶
- Overview
- Getting Started
- Core Concepts
- Supported Modalities
- Basic Usage
- Advanced Features
- Performance Optimization
- Integration Examples
- Best Practices
- API Reference
Overview¶
HeliosDB Multimodal Vector Search enables unified search across text, images, audio, and video using state-of-the-art embedding models, running directly inside the database.
Key Features¶
- 10+ Modalities: Text, images, audio, video, code, 3D models, documents, medical scans
- Cross-Modal Search: Find images from text, videos from audio, any-to-any similarity
- 95%+ Recall@10: Industry-leading accuracy for cross-modal retrieval
- <50ms Latency: Fast search on 100K+ vectors with GPU acceleration
- Unified Embedding Space: all modalities in a collection share one embedding space, so any item can be compared with any other (dimension is set by the chosen model)
- Production Ready: HNSW indexing, batch processing, automatic model management
Use Cases¶
- E-commerce: "Show me red dresses" → visual product search
- Content Discovery: Find videos by describing scenes or audio
- Medical Imaging: Search radiology scans by symptom description
- Security: Find surveillance footage by event description
- Creative Tools: Find stock photos/videos by describing mood/theme
- Education: Search lecture videos by topic or spoken content
Supported Models¶
| Model | Modalities | Embedding Dim | Use Case |
|---|---|---|---|
| CLIP | Text + Image | 512 | General visual search |
| AudioCLIP | Text + Image + Audio | 1024 | Audio-visual search |
| VideoCLIP | Text + Image + Video | 512 | Video understanding |
| ImageBind | 6 modalities | 1024 | Universal embedding |
| CLAP | Text + Audio | 512 | Audio search |
Getting Started¶
Prerequisites¶
- HeliosDB v7.0+
- GPU recommended (NVIDIA with CUDA 11.8+ or AMD with ROCm)
- Python 3.8+ for model serving
- Sufficient storage for embeddings (4 bytes per dimension; e.g. 1536 dims × 4 bytes ≈ 6KB per item)
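As a quick capacity check, the per-item figure above can be computed directly. This is a back-of-the-envelope sketch for raw float32 vectors only; index and metadata overhead vary by configuration:

```python
def embedding_storage_bytes(num_items: int, dims: int = 1536, bytes_per_dim: int = 4) -> int:
    """Raw float32 embedding storage, excluding index and metadata overhead."""
    return num_items * dims * bytes_per_dim

# One 1536-dim float32 embedding is ~6 KB, as noted above.
per_item = embedding_storage_bytes(1, 1536)           # 6144 bytes
corpus = embedding_storage_bytes(1_000_000, 1536)     # ~6.1 GB raw
```

For a 512-dim model such as CLIP ViT-B-32, the same corpus needs only a third of that.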
Quick Start (5 minutes)¶
1. Enable Multimodal Vector Search¶
-- Enable extension
CREATE EXTENSION IF NOT EXISTS heliosdb_multimodal_vector;
-- Check GPU availability
SELECT gpu_available() AS has_gpu;
-- List available models
SELECT * FROM multimodal_models;
2. Create Multimodal Collection¶
-- Create collection for product images + descriptions
CREATE MULTIMODAL COLLECTION products
WITH (
embedding_model = 'clip-vit-b-32', -- CLIP model
embedding_dim = 512, -- Embedding dimensions
modalities = ARRAY['text', 'image'], -- Supported modalities
index_type = 'hnsw', -- HNSW index for speed
distance_metric = 'cosine' -- Cosine similarity
);
3. Insert Multimodal Data¶
-- Insert text data
INSERT INTO products (id, text_data, metadata)
VALUES (
1,
'Red summer dress with floral pattern',
'{"category": "dresses", "color": "red", "season": "summer"}'::jsonb
);
-- Insert image data
INSERT INTO products (id, image_data, metadata)
VALUES (
2,
pg_read_binary_file('/path/to/dress.jpg'),
'{"category": "dresses", "format": "jpg"}'::jsonb
);
-- Insert both text and image
INSERT INTO products (id, text_data, image_data, metadata)
VALUES (
3,
'Blue winter coat with fur collar',
pg_read_binary_file('/path/to/coat.jpg'),
'{"category": "coats", "season": "winter"}'::jsonb
);
4. Perform Cross-Modal Search¶
-- Search images by text description
SELECT
id,
metadata,
similarity_score
FROM multimodal_search(
collection => 'products',
query_text => 'red floral dress',
modality => 'image', -- Find images
limit => 10
)
ORDER BY similarity_score DESC;
-- Search text descriptions by image
SELECT
id,
text_data,
similarity_score
FROM multimodal_search(
collection => 'products',
query_image => pg_read_binary_file('/path/to/query.jpg'),
modality => 'text', -- Find text descriptions
limit => 10
)
ORDER BY similarity_score DESC;
Core Concepts¶
Embedding Space¶
All modalities are projected into a unified embedding space where semantic similarity = vector proximity:
Text: "red dress" → [0.1, -0.3, 0.8, ..., 0.2] (512-dim)
Image: <red dress photo> → [0.12, -0.28, 0.82, ..., 0.18] (512-dim)
Audio: <fabric rustle> → [0.09, -0.25, 0.75, ..., 0.21] (512-dim)
Cosine Similarity(Text, Image) = 0.95 (very similar!)
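The similarity number in the sketch above is plain cosine similarity. A minimal reference implementation in pure Python (no HeliosDB internals assumed), using the truncated example vectors:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

text_vec  = [0.1, -0.3, 0.8, 0.2]      # truncated "red dress" text embedding
image_vec = [0.12, -0.28, 0.82, 0.18]  # truncated red-dress photo embedding
# Nearby vectors => cosine similarity close to 1.0
```

With `distance_metric = 'cosine'`, a score near 1.0 means near-identical semantics across modalities.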
Cross-Modal Retrieval¶
Query (Text) → Embedding Model → Query Vector →
Vector Search → Find Similar Items (Any Modality) → Results
Supported Searches¶
| Query Type | Result Type | Example |
|---|---|---|
| Text → Text | Semantic search | "machine learning" → similar documents |
| Text → Image | Visual search | "sunset beach" → sunset photos |
| Image → Text | Reverse image search | Photo → descriptions/captions |
| Image → Image | Similar image search | Product photo → similar products |
| Audio → Video | Sound matching | Music clip → videos with similar audio |
| Video → Text | Video understanding | Video clip → scene descriptions |
| Any → Any | Universal search | Mix and match any modality |
Supported Modalities¶
1. Text¶
-- Insert text
INSERT INTO multimodal_collection (id, text_data)
VALUES (1, 'Product description text...');
-- Search by text
SELECT * FROM multimodal_search(
collection => 'my_collection',
query_text => 'search query',
limit => 10
);
2. Images¶
-- Insert image from file
INSERT INTO multimodal_collection (id, image_data, image_format)
VALUES (
2,
pg_read_binary_file('/path/to/image.jpg'),
'jpg'
);
-- Insert image from URL
INSERT INTO multimodal_collection (id, image_url)
VALUES (3, 'https://example.com/image.png');
-- Insert image from base64
INSERT INTO multimodal_collection (id, image_data)
VALUES (4, decode('iVBORw0KGgoAAAANSUhEUgAA...', 'base64'));
-- Search by image
SELECT * FROM multimodal_search(
collection => 'my_collection',
query_image => pg_read_binary_file('/path/to/query.jpg'),
limit => 10
);
3. Audio¶
-- Insert audio file
INSERT INTO multimodal_collection (id, audio_data, audio_format)
VALUES (
5,
pg_read_binary_file('/path/to/audio.mp3'),
'mp3'
);
-- Search by audio
SELECT * FROM multimodal_search(
collection => 'my_collection',
query_audio => pg_read_binary_file('/path/to/query.mp3'),
limit => 10
);
-- Search audio by text description
SELECT * FROM multimodal_search(
collection => 'my_collection',
query_text => 'upbeat electronic music',
modality => 'audio',
limit => 10
);
4. Video¶
-- Insert video file
INSERT INTO multimodal_collection (id, video_data, video_format)
VALUES (
6,
pg_read_binary_file('/path/to/video.mp4'),
'mp4'
);
-- Search videos by text
SELECT * FROM multimodal_search(
collection => 'my_collection',
query_text => 'cat playing with yarn',
modality => 'video',
limit => 10
);
-- Search by video clip
SELECT * FROM multimodal_search(
collection => 'my_collection',
query_video => pg_read_binary_file('/path/to/clip.mp4'),
limit => 10
);
5. Code¶
-- Insert code snippet
INSERT INTO multimodal_collection (id, text_data, metadata)
VALUES (
7,
'def fibonacci(n): return n if n <= 1 else fibonacci(n-1) + fibonacci(n-2)',
'{"language": "python", "type": "function"}'::jsonb
);
-- Search code by natural language
SELECT * FROM multimodal_search(
collection => 'my_collection',
query_text => 'recursive function to calculate fibonacci',
modality => 'text',
limit => 10
);
6. Documents (PDF, Word, etc.)¶
-- Insert PDF document
INSERT INTO multimodal_collection (id, document_data, document_format)
VALUES (
8,
pg_read_binary_file('/path/to/document.pdf'),
'pdf'
);
-- HeliosDB automatically:
-- 1. Extracts text from PDF
-- 2. Generates text embeddings
-- 3. Indexes for search
-- Search documents
SELECT * FROM multimodal_search(
collection => 'my_collection',
query_text => 'annual financial report',
modality => 'document',
limit => 10
);
Basic Usage¶
Creating Collections¶
-- Simple text + image collection
CREATE MULTIMODAL COLLECTION products
WITH (
embedding_model = 'clip-vit-b-32',
modalities = ARRAY['text', 'image']
);
-- Advanced video + audio collection
CREATE MULTIMODAL COLLECTION videos
WITH (
embedding_model = 'videoclip',
embedding_dim = 512,
modalities = ARRAY['text', 'image', 'audio', 'video'],
index_type = 'hnsw',
hnsw_m = 16, -- HNSW connections
hnsw_ef_construction = 200, -- Build quality
hnsw_ef_search = 100, -- Search quality
distance_metric = 'cosine',
gpu_enabled = true,
batch_size = 32
);
Batch Insert¶
-- Batch insert for efficiency
INSERT INTO products (id, text_data, image_data, metadata)
SELECT
    g.i AS id,
    'Product ' || g.i AS text_data,
    pg_read_binary_file('/products/image_' || g.i || '.jpg') AS image_data,
    jsonb_build_object('index', g.i) AS metadata
FROM generate_series(1, 1000) AS g(i);
-- Monitor batch processing
SELECT * FROM multimodal_batch_status;
Hybrid Search (Text + Vector)¶
-- Combine keyword search with vector search
SELECT
id,
text_data,
metadata,
vector_score,
keyword_score,
(0.7 * vector_score + 0.3 * keyword_score) AS hybrid_score
FROM (
SELECT
*,
multimodal_similarity('products', id, 'red dress') AS vector_score,
ts_rank(to_tsvector(text_data), to_tsquery('red & dress')) AS keyword_score
FROM products
WHERE to_tsvector(text_data) @@ to_tsquery('red & dress')
) hybrid
ORDER BY hybrid_score DESC
LIMIT 10;
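The 0.7/0.3 blend above assumes both scores live on comparable scales; `ts_rank` and cosine similarity generally do not, so in practice you may want to min-max normalize each score list before mixing. A small sketch in pure Python (the normalization step is a suggestion, not part of the query above):

```python
def minmax(scores: list[float]) -> list[float]:
    """Rescale scores to [0, 1]; constant lists map to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(vector_scores, keyword_scores, w_vec=0.7, w_kw=0.3):
    """Blend normalized vector and keyword scores with fixed weights."""
    v = minmax(vector_scores)
    k = minmax(keyword_scores)
    return [w_vec * a + w_kw * b for a, b in zip(v, k)]

scores = hybrid_scores([0.95, 0.80, 0.60], [0.10, 0.02, 0.05])
```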
Filtering¶
-- Vector search with metadata filters
SELECT * FROM multimodal_search(
collection => 'products',
query_text => 'red dress',
filters => '{"category": "dresses", "price_range": "50-100"}'::jsonb,
limit => 10
);
-- Vector search with SQL WHERE clause
SELECT
p.id,
p.text_data,
p.metadata,
m.similarity_score
FROM products p
JOIN multimodal_search(
collection => 'products',
query_text => 'red dress',
limit => 100
) m ON p.id = m.id
WHERE p.metadata->>'season' = 'summer'
AND (p.metadata->>'price')::int < 100
ORDER BY m.similarity_score DESC
LIMIT 10;
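The JOIN pattern above deliberately over-fetches (`limit => 100`) before applying the SQL WHERE clause, because post-filtering can discard most candidates. The core loop of that pattern, sketched in plain Python (illustrative only):

```python
def postfilter_topk(candidates, predicate, k):
    """candidates: (id, score) pairs sorted by score desc; keep top-k passing predicate."""
    out = []
    for item_id, score in candidates:
        if predicate(item_id):
            out.append((item_id, score))
            if len(out) == k:
                break
    return out

# 100 candidates; suppose only even ids pass the metadata filter:
cands = [(i, 1.0 - i / 100) for i in range(100)]
top = postfilter_topk(cands, lambda i: i % 2 == 0, k=10)
```

If the filter is very selective, raise the over-fetch limit so enough candidates survive to fill the final top-k.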
Advanced Features¶
Multi-Query Search¶
-- Search with multiple queries (OR logic)
SELECT * FROM multimodal_multi_search(
collection => 'products',
queries => ARRAY[
'red floral dress',
'summer dress',
'cocktail dress'
],
aggregation => 'max', -- 'max', 'avg', 'min'
limit => 10
);
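The `aggregation` parameter controls how per-query scores are merged when an item matches several queries. A sketch of how `'max'`/`'avg'`/`'min'` aggregation might combine result lists (an illustration, not the server's implementation):

```python
from collections import defaultdict
from statistics import mean

def aggregate(per_query_results: list[dict[int, float]], how: str = "max") -> dict[int, float]:
    """per_query_results: one {item_id: score} dict per query."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for result in per_query_results:
        for item_id, score in result.items():
            buckets[item_id].append(score)
    agg = {"max": max, "avg": mean, "min": min}[how]
    return {item_id: agg(scores) for item_id, scores in buckets.items()}

# Item 1 matches both queries; items 2 and 3 match one each:
results = [{1: 0.9, 2: 0.4}, {1: 0.7, 3: 0.8}]
```

`'max'` rewards a strong match on any query, while `'avg'` favors items that match all queries moderately well.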
Image-to-Image + Text¶
-- Find similar items using both image and text
SELECT * FROM multimodal_hybrid_search(
collection => 'products',
query_image => pg_read_binary_file('/path/to/dress.jpg'),
query_text => 'red color',
image_weight => 0.7,
text_weight => 0.3,
limit => 10
);
Temporal Search (Videos)¶
-- Search specific time ranges in videos
SELECT * FROM multimodal_temporal_search(
collection => 'videos',
query_text => 'goal celebration',
start_time => '00:05:00',
end_time => '00:10:00',
limit => 10
);
Batch Search¶
-- Search with multiple queries in one call
SELECT * FROM multimodal_batch_search(
collection => 'products',
queries => ARRAY[
('red dress', 'text'),
('blue jeans', 'text'),
('summer hat', 'text')
],
limit_per_query => 10
);
Custom Embeddings¶
-- Insert pre-computed embeddings
INSERT INTO multimodal_collection (id, embedding_vector, metadata)
VALUES (
100,
ARRAY[0.1, -0.3, 0.8, ...]::FLOAT[], -- Your custom embedding
'{"source": "external_model"}'::jsonb
);
-- Search with custom embedding
SELECT * FROM multimodal_vector_search(
collection => 'my_collection',
query_vector => ARRAY[0.12, -0.28, 0.82, ...]::FLOAT[],
limit => 10
);
Performance Optimization¶
GPU Acceleration¶
-- Enable GPU for collection
ALTER MULTIMODAL COLLECTION products
SET gpu_enabled = true;
-- Check GPU usage
SELECT
collection_name,
gpu_device,
gpu_memory_mb,
throughput_embeddings_per_sec
FROM multimodal_gpu_stats;
Index Tuning¶
-- Tune HNSW parameters for speed
ALTER MULTIMODAL COLLECTION products
SET hnsw_ef_search = 50; -- Faster, slightly less accurate
-- Tune for accuracy
ALTER MULTIMODAL COLLECTION products
SET hnsw_ef_search = 200; -- Slower, more accurate
-- Rebuild index
REINDEX MULTIMODAL COLLECTION products;
Caching¶
-- Enable query result caching
ALTER MULTIMODAL COLLECTION products
SET cache_enabled = true,
cache_size_mb = 1024,
cache_ttl_seconds = 3600;
-- Check cache hit rate
SELECT
collection_name,
cache_hits,
cache_misses,
cache_hit_rate
FROM multimodal_cache_stats;
Batch Processing¶
-- Configure batch sizes
ALTER MULTIMODAL COLLECTION products
SET batch_size = 64, -- Larger batches = faster throughput
max_batch_wait_ms = 100; -- Wait up to 100ms to fill batch
-- Monitor batch performance
SELECT * FROM multimodal_batch_metrics;
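`batch_size` and `max_batch_wait_ms` trade latency for throughput: the embedder waits briefly to fill a batch before running the model. The chunking half of that behavior is simple to sketch (illustrative; the server-side scheduler is not specified here):

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield fixed-size batches; the final batch may be short."""
    batch: list[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

batches = list(batched(range(150), 64))  # batch sizes: 64, 64, 22
```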
Integration Examples¶
E-Commerce Product Search¶
-- Create product catalog with images + descriptions
CREATE MULTIMODAL COLLECTION product_catalog
WITH (
embedding_model = 'clip-vit-l-14', -- Large CLIP for accuracy
modalities = ARRAY['text', 'image']
);
-- Insert products
INSERT INTO product_catalog (product_id, name, description, image_url, price, category)
SELECT
id,
name,
description,
image_url,
price,
category
FROM products_staging;
-- Visual search: find products by uploading image
SELECT
product_id,
name,
price,
similarity_score
FROM multimodal_search(
collection => 'product_catalog',
query_image_url => 'https://customer-uploads.com/query.jpg',
filters => '{"category": "clothing", "price_max": 100}'::jsonb,
limit => 20
)
ORDER BY similarity_score DESC;
Content Recommendation¶
-- Find similar movies by plot description or poster
SELECT
movie_id,
title,
poster_url,
similarity_score
FROM multimodal_search(
collection => 'movies',
query_text => 'sci-fi thriller with time travel',
modality => 'image', -- Return poster images
limit => 10
)
ORDER BY similarity_score DESC;
Medical Image Search¶
-- Search radiology scans by symptom
SELECT
scan_id,
patient_id,
scan_date,
diagnosis,
similarity_score
FROM multimodal_search(
collection => 'radiology_scans',
query_text => 'lung nodule upper right lobe',
modality => 'image',
filters => '{"modality": "CT", "body_part": "chest"}'::jsonb,
limit => 10
)
ORDER BY similarity_score DESC;
Video Surveillance¶
-- Find surveillance footage by event description
SELECT
camera_id,
timestamp,
video_clip_url,
similarity_score
FROM multimodal_search(
collection => 'surveillance_footage',
query_text => 'person in red jacket entering building',
modality => 'video',
time_range => (NOW() - INTERVAL '24 hours', NOW()),
limit => 10
)
ORDER BY similarity_score DESC;
Best Practices¶
1. Choose the Right Model¶
- CLIP (ViT-B-32): Fast, general-purpose, 512-dim
- CLIP (ViT-L-14): More accurate, 768-dim, slower
- ImageBind: Best for multi-modal (6+ modalities), 1024-dim
- Custom: Fine-tune on your domain data
2. Optimize Embeddings¶
-- Reduce dimensions for speed (PCA)
ALTER MULTIMODAL COLLECTION products
SET embedding_dim = 256,
dimension_reduction = 'pca';
-- Quantize embeddings for storage (8-bit)
ALTER MULTIMODAL COLLECTION products
SET quantization = 8; -- 75% storage reduction
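The 75% figure follows from storing 1 byte per dimension instead of 4. A minimal symmetric int8 quantization sketch in pure Python (HeliosDB's actual quantization scheme is not specified in this guide):

```python
def quantize_int8(vec: list[float]) -> tuple[list[int], float]:
    """Map floats to [-127, 127] with a single per-vector scale factor."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # avoid div-by-zero for all-zero vectors
    q = [round(x / scale) for x in vec]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [x * scale for x in q]

vec = [0.1, -0.3, 0.8, 0.2]
q, scale = quantize_int8(vec)
restored = dequantize(q, scale)
# Storage: 1 byte/dim instead of 4 => 75% reduction, at a small precision cost.
```

The round-trip error per dimension is at most half the scale factor, which is why quantization costs recall only marginally on high-dimensional embeddings.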
3. Use Filters Effectively¶
-- Pre-filter before vector search for speed
SELECT * FROM multimodal_search(
collection => 'products',
query_text => 'red dress',
filters => '{"in_stock": true, "price_max": 200}'::jsonb,
limit => 10
);
4. Monitor Performance¶
-- Check search latency
SELECT
collection_name,
avg_query_time_ms,
p95_query_time_ms,
p99_query_time_ms,
qps
FROM multimodal_performance_stats
WHERE collection_name = 'products';
5. Scale Horizontally¶
-- Shard large collections
ALTER MULTIMODAL COLLECTION large_collection
SET shards = 4,
replicas = 2;
API Reference¶
SQL Functions¶
multimodal_search()¶
multimodal_search(
collection TEXT,
query_text TEXT DEFAULT NULL,
query_image BYTEA DEFAULT NULL,
query_audio BYTEA DEFAULT NULL,
query_video BYTEA DEFAULT NULL,
modality TEXT DEFAULT NULL,
filters JSONB DEFAULT '{}',
limit INT DEFAULT 10
) RETURNS TABLE(id INT, similarity_score FLOAT, metadata JSONB)
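All four `query_*` parameters default to NULL, and a call normally supplies exactly one of them. A small client-side guard along those lines (a suggestion; the server's own validation behavior is not documented here):

```python
def validate_query_args(query_text=None, query_image=None,
                        query_audio=None, query_video=None) -> str:
    """Return which query modality is set; raise if zero or several are."""
    provided = {name: val for name, val in {
        "text": query_text, "image": query_image,
        "audio": query_audio, "video": query_video,
    }.items() if val is not None}
    if len(provided) != 1:
        raise ValueError(f"expected exactly one query argument, got {len(provided)}")
    return next(iter(provided))
```

Note that `multimodal_hybrid_search()` is the exception: it accepts text and image together with explicit weights.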
REST API¶
# Search by text
POST /api/v1/multimodal/search
Content-Type: application/json
{
"collection": "products",
"query_text": "red dress",
"limit": 10
}
# Search by image upload
POST /api/v1/multimodal/search
Content-Type: multipart/form-data
collection=products
image=@/path/to/query.jpg
limit=10
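The text-search request above can be issued from Python's standard library alone. The host below is a placeholder; the endpoint path and JSON fields are taken from the example above:

```python
import json
import urllib.request

def build_search_request(base_url: str, collection: str,
                         query_text: str, limit: int = 10) -> urllib.request.Request:
    """Build the POST request for the text-search endpoint shown above."""
    payload = {"collection": collection, "query_text": query_text, "limit": limit}
    return urllib.request.Request(
        url=f"{base_url}/api/v1/multimodal/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_search_request("https://heliosdb.example.com", "products", "red dress")
# urllib.request.urlopen(req) would send it; not executed here.
```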
Python SDK¶
from heliosdb import MultimodalSearch
# Initialize
mm_search = MultimodalSearch(collection="products")
# Search by text
results = mm_search.search(
query_text="red floral dress",
limit=10
)
# Search by image
results = mm_search.search(
query_image="/path/to/image.jpg",
modality="image",
limit=10
)
# Hybrid search
results = mm_search.search(
query_text="summer dress",
query_image="/path/to/dress.jpg",
text_weight=0.3,
image_weight=0.7,
limit=10
)
Support: For issues or questions, contact multimodal@heliosdb.com
License: Enterprise license required for production use.
Version: HeliosDB v7.0+ with Multimodal Vector Search extension