Skip to content

Vector Search Quick Start Guide

Version: v7.0 Time to Complete: 15 minutes


Prerequisites

  • HeliosDB v7.0+ installed and running
  • Basic SQL knowledge
  • (Optional) Python or Rust for embedding generation

Step 1: Create a Table with Vector Column

Create a table to store documents with their vector embeddings:

-- Create table with vector column
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    title VARCHAR(255),
    content TEXT,
    category VARCHAR(50),
    embedding VECTOR(384)  -- 384-dimensional vector
);

The VECTOR(n) type supports dimensions from 1 to 65,536. Common sizes: - 384: all-MiniLM-L6-v2 (fast, general purpose) - 768: BERT-base, RoBERTa - 1536: OpenAI text-embedding-ada-002


Step 2: Insert Vector Data

Method 1: Direct Vector Literals

-- Insert with vector literal (bracket notation)
INSERT INTO documents (title, content, category, embedding)
VALUES (
    'Introduction to Machine Learning',
    'Machine learning is a subset of artificial intelligence...',
    'AI',
    '[0.021, -0.043, 0.087, ...]'  -- 384 values
);

-- Insert multiple rows
INSERT INTO documents (title, content, category, embedding) VALUES
    ('Neural Networks Basics', 'Deep learning uses neural...', 'AI',
     '[0.032, -0.051, 0.094, ...]'),
    ('Database Indexing', 'Indexes improve query performance...', 'DB',
     '[-0.015, 0.072, -0.038, ...]'),
    ('Vector Search Guide', 'Vector similarity enables semantic...', 'Search',
     '[0.045, -0.028, 0.061, ...]');

Method 2: With Explicit CAST

-- Using explicit type cast
INSERT INTO documents (title, embedding)
VALUES (
    'Deep Learning Overview',
    '[0.1, 0.2, 0.3, 0.4]'::VECTOR(4)
);

Method 3: Auto-Detection

When inserting into a known vector column, the type is auto-detected:

-- Vector format is automatically detected
INSERT INTO documents (id, title, embedding)
VALUES (100, 'Auto-detected Vector', '[0.1, 0.2, 0.3, 0.4]');

Method 4: Scientific Notation

-- Scientific notation for precision
INSERT INTO documents (title, embedding)
VALUES (
    'Precision Example',
    '[1.5e-3, -2.7e-2, 9.8e-1, 4.2e-4]'
);

Step 3: Create a Vector Index

Create an HNSW index for fast similarity search:

-- Create HNSW index for cosine similarity
CREATE INDEX idx_documents_embedding ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 200);

Index Options

Operator Classes: - vector_cosine_ops: Cosine distance (best for text embeddings) - vector_l2_ops: Euclidean distance (best for images) - vector_ip_ops: Inner product (for normalized vectors)

Parameters: - m: Number of connections per node (default: 16, range: 4-64) - ef_construction: Build quality (default: 200, range: 50-500)

-- High accuracy index
CREATE INDEX idx_high_accuracy ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 32, ef_construction = 400);

-- Fast, lower memory index
CREATE INDEX idx_fast ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 8, ef_construction = 100);

Find Most Similar Documents

-- Define query vector (your search embedding)
-- In practice, generate this from an embedding model

-- Search for 10 most similar documents
SELECT
    id,
    title,
    category,
    embedding <=> '[0.025, -0.041, 0.089, ...]'::VECTOR AS distance
FROM documents
ORDER BY embedding <=> '[0.025, -0.041, 0.089, ...]'::VECTOR
LIMIT 10;

Distance Operators

-- Cosine distance (0 = identical, 2 = opposite)
SELECT * FROM documents
ORDER BY embedding <=> query_vector
LIMIT 10;

-- L2/Euclidean distance (0 = identical)
SELECT * FROM documents
ORDER BY embedding <-> query_vector
LIMIT 10;

-- Inner product (negative, sort ascending for similarity)
SELECT * FROM documents
ORDER BY embedding <#> query_vector
LIMIT 10;

Step 5: Filtering with Vectors

Combine vector search with traditional SQL filters:

Filter by Category

-- Search within a specific category
SELECT id, title, embedding <=> query_vector AS distance
FROM documents
WHERE category = 'AI'
ORDER BY embedding <=> query_vector
LIMIT 10;

Filter by Multiple Conditions

-- Complex filtering with vector search
SELECT
    id,
    title,
    category,
    embedding <=> query_vector AS distance
FROM documents
WHERE category IN ('AI', 'ML')
  AND LENGTH(content) > 100
  AND created_at > '2024-01-01'
ORDER BY embedding <=> query_vector
LIMIT 10;

Distance Threshold Filter

-- Only return results within distance threshold
SELECT id, title
FROM documents
WHERE embedding <=> query_vector < 0.5  -- Cosine distance < 0.5
ORDER BY embedding <=> query_vector;

Step 6: Working with Query Vectors

Using Variables

-- Store query vector in variable
WITH query AS (
    SELECT '[0.025, -0.041, 0.089, ...]'::VECTOR AS vec
)
SELECT
    d.id,
    d.title,
    d.embedding <=> q.vec AS distance
FROM documents d, query q
ORDER BY d.embedding <=> q.vec
LIMIT 10;

Using Subqueries

-- Search for documents similar to another document
SELECT
    d2.id,
    d2.title,
    d1.embedding <=> d2.embedding AS similarity
FROM documents d1, documents d2
WHERE d1.id = 1  -- Reference document
  AND d2.id != 1 -- Exclude self
ORDER BY d1.embedding <=> d2.embedding
LIMIT 10;

Step 7: Batch Operations

Batch Insert

-- Efficient batch insert
INSERT INTO documents (title, content, embedding)
SELECT
    'Document ' || i,
    'Content for document ' || i,
    -- In practice, use real embeddings
    (SELECT ARRAY_AGG(random())::VECTOR FROM generate_series(1, 384))
FROM generate_series(1, 1000) AS i;
-- Search multiple queries at once
WITH queries AS (
    SELECT 1 AS query_id, '[0.1, 0.2, ...]'::VECTOR AS vec
    UNION ALL
    SELECT 2, '[0.3, 0.4, ...]'::VECTOR
    UNION ALL
    SELECT 3, '[0.5, 0.6, ...]'::VECTOR
)
SELECT
    q.query_id,
    d.id AS doc_id,
    d.title,
    d.embedding <=> q.vec AS distance
FROM queries q
CROSS JOIN LATERAL (
    SELECT id, title, embedding
    FROM documents
    ORDER BY embedding <=> q.vec
    LIMIT 5
) d
ORDER BY q.query_id, distance;

Step 8: Verify Search Results

Check Index Usage

-- Verify index is being used
EXPLAIN ANALYZE
SELECT id, title
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::VECTOR
LIMIT 10;

-- Look for "Index Scan using idx_documents_embedding"

Get Index Statistics

-- View index details
SELECT
    indexname,
    indexdef,
    pg_size_pretty(pg_relation_size(indexname::regclass)) AS size
FROM pg_indexes
WHERE tablename = 'documents';

Complete Example Session

Here is a complete working example you can run:

-- 1. Create table
CREATE TABLE demo_docs (
    id SERIAL PRIMARY KEY,
    title TEXT,
    embedding VECTOR(3)
);

-- 2. Insert sample data
INSERT INTO demo_docs (title, embedding) VALUES
    ('AI Basics', '[0.9, 0.1, 0.0]'),
    ('Machine Learning', '[0.8, 0.2, 0.0]'),
    ('Deep Learning', '[0.7, 0.3, 0.0]'),
    ('Database Systems', '[0.1, 0.1, 0.9]'),
    ('SQL Queries', '[0.0, 0.2, 0.8]');

-- 3. Create index
CREATE INDEX idx_demo_embedding ON demo_docs
  USING hnsw (embedding vector_cosine_ops);

-- 4. Search for AI-related documents
SELECT
    title,
    embedding <=> '[1.0, 0.0, 0.0]'::VECTOR AS distance
FROM demo_docs
ORDER BY embedding <=> '[1.0, 0.0, 0.0]'::VECTOR
LIMIT 3;

-- Expected results:
-- AI Basics        | 0.0557
-- Machine Learning | 0.1180
-- Deep Learning    | 0.1910

-- 5. Search with filter
SELECT title, embedding <=> '[1.0, 0.0, 0.0]'::VECTOR AS distance
FROM demo_docs
WHERE title LIKE '%Learning%'
ORDER BY embedding <=> '[1.0, 0.0, 0.0]'::VECTOR;

-- 6. Cleanup
DROP TABLE demo_docs;

Common Issues and Solutions

Issue: "Vector dimension mismatch"

-- Error: dimension mismatch
INSERT INTO documents (embedding) VALUES ('[0.1, 0.2]');  -- Wrong!

-- Solution: Match declared dimension
-- If table has VECTOR(384), provide exactly 384 values
INSERT INTO documents (embedding) VALUES ('[0.1, 0.2, ..., 0.384]');

Issue: Slow queries without index

-- Check if index exists
SELECT indexname FROM pg_indexes WHERE tablename = 'documents';

-- Create index if missing
CREATE INDEX IF NOT EXISTS idx_docs_embedding ON documents
  USING hnsw (embedding vector_cosine_ops);

Issue: Low recall (missing relevant results)

-- Increase ef_search parameter for better recall
SET hnsw.ef_search = 200;  -- Default is 40

-- Then run your query
SELECT * FROM documents
ORDER BY embedding <=> query_vector
LIMIT 10;

Next Steps

  1. Learn Index Tuning: See INDEXING_GUIDE.md for advanced configuration
  2. Explore Use Cases: Check EXAMPLES.md for production patterns
  3. Add Hybrid Search: Combine vectors with keyword search
  4. Scale Up: Configure distributed vector search for larger datasets

Quick Reference

Vector Insert Syntax

INSERT INTO table (vector_col) VALUES ('[0.1, 0.2, 0.3]');
INSERT INTO table (vector_col) VALUES ('[0.1, 0.2, 0.3]'::VECTOR(3));

Distance Operators

Operator Metric Best For
<=> Cosine Text embeddings
<-> L2 Image embeddings
<#> Inner Product Normalized vectors

Index Creation

CREATE INDEX name ON table USING hnsw (column vector_cosine_ops)
  WITH (m = 16, ef_construction = 200);
SELECT * FROM table ORDER BY embedding <=> query_vector LIMIT k;

Status: Production Ready Version: v7.0 Last Updated: January 2026