Pinecone Vector Configuration Guide

Comprehensive configuration reference for HeliosDB's Pinecone vector protocol support.

Connection Configuration

Basic Connection

from pinecone import Pinecone

# Connect to HeliosDB (Pinecone-compatible)
pc = Pinecone(
    api_key="your-api-key",
    host="http://localhost:8080"
)

# Access index
index = pc.Index("my-vectors")

Connection Parameters

Parameter	Type	Default	Description
`api_key`	string	Required	API authentication key
`host`	string	Required	HeliosDB vector endpoint
`index`	string	Required	Index name
`namespace`	string	""	Optional namespace

REST API Configuration

Base URL

http://localhost:8080/vectors/v1

Authentication Headers

curl -X POST "http://localhost:8080/vectors/v1/upsert" \
  -H "Api-Key: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{...}'

Request Headers

Header	Required	Description
`Api-Key`	Yes	API authentication key
`Content-Type`	Yes	application/json
`Accept`	No	Response format
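
As an illustration of the headers above, the following minimal sketch builds (without sending) an upsert request using only the Python standard library. The JSON body shape (`vectors`, `namespace`) and the `Accept: application/json` value are assumptions based on the Pinecone-style API, since this guide elides the request body; `build_upsert_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080/vectors/v1"  # base URL from this guide


def build_upsert_request(api_key, vectors, namespace=""):
    """Build (but do not send) a POST request for the upsert endpoint."""
    # Assumption: the body follows the Pinecone-style upsert shape.
    body = {"vectors": vectors, "namespace": namespace}
    return urllib.request.Request(
        url=f"{BASE_URL}/upsert",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Api-Key": api_key,                  # required
            "Content-Type": "application/json",  # required
            "Accept": "application/json",        # optional; value is assumed
        },
        method="POST",
    )


request = build_upsert_request(
    "your-api-key",
    [{"id": "vec1", "values": [0.1, 0.2], "metadata": {"category": "A"}}],
    namespace="production",
)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes the headers and body easy to inspect before any network traffic occurs.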

Index Configuration

Creating an Index

from pinecone import ServerlessSpec

# Create index with configuration (reuses the `pc` client from Basic Connection)
pc.create_index(
    name="my-vectors",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="heliosdb",
        region="default"
    )
)

Index Parameters

Parameter	Type	Default	Description
`name`	string	Required	Index name
`dimension`	int	Required	Vector dimension
`metric`	string	cosine	Similarity metric
`pods`	int	1	Number of pods
`replicas`	int	1	Number of replicas
`pod_type`	string	p1.x1	Pod type

Similarity Metrics

Metric	Description	Use Case
`cosine`	Cosine similarity	Text embeddings
`euclidean`	Euclidean distance	Dense vectors
`dotproduct`	Dot product	Normalized vectors
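
To make the three metrics concrete, here is a minimal pure-Python sketch of the underlying formulas. It only illustrates the math; how HeliosDB converts these quantities into result scores is not specified in this guide. The function names are hypothetical.

```python
import math


def dot_product(a, b):
    """Sum of element-wise products."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance between two points; smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Dot product divided by the product of the vector lengths."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    return dot_product(a, b) / (norm_a * norm_b)


a = [1.0, 0.0]
b = [0.6, 0.8]  # both vectors have unit length

print(cosine_similarity(a, b))
print(dot_product(a, b))
print(euclidean_distance(a, b))
```

For unit-length vectors such as `a` and `b`, cosine similarity and dot product coincide up to floating-point rounding, which is why `dotproduct` is commonly paired with normalized embeddings.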

Vector Operations Configuration

Upsert Configuration

# Upsert with options
index.upsert(
    vectors=[
        {
            "id": "vec1",
            "values": [0.1, 0.2, ...],
            "metadata": {"category": "A"}
        }
    ],
    namespace="production",
    show_progress=True
)

Query Configuration

# Query with options
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=10,
    filter={"category": {"$eq": "A"}},
    include_values=True,
    include_metadata=True,
    namespace="production"
)

Query Parameters

Parameter	Type	Default	Description
`vector`	list	Required	Query vector
`top_k`	int	10	Number of results
`filter`	dict	None	Metadata filter
`include_values`	bool	False	Include vectors
`include_metadata`	bool	False	Include metadata
`namespace`	string	""	Namespace
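
The sketch below shows one way to consume a query response once it has been parsed into plain Python dicts, for example from the REST API. The response shape (`matches` entries with `id`, `score`, and an optional `metadata` field) is an assumption based on the Pinecone-style API, and `sample_response` is hypothetical data, not real HeliosDB output.

```python
# Hypothetical response, shaped like a Pinecone-style query result.
sample_response = {
    "matches": [
        {"id": "vec1", "score": 0.92, "metadata": {"category": "A"}},
        {"id": "vec7", "score": 0.41, "metadata": {"category": "A"}},
        {"id": "vec3", "score": 0.15},  # metadata absent for this match
    ],
    "namespace": "production",
}


def summarize_matches(response, min_score=0.0):
    """Return (id, score, metadata) tuples for matches at or above min_score."""
    rows = []
    for match in response.get("matches", []):
        if match["score"] >= min_score:
            # metadata is only present when include_metadata was requested
            rows.append((match["id"], match["score"], match.get("metadata")))
    return rows


top = summarize_matches(sample_response, min_score=0.4)
for vector_id, score, metadata in top:
    print(vector_id, score, metadata)
```

Using `match.get("metadata")` keeps the helper usable whether or not `include_metadata` was set on the query.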

Filtering Configuration

Filter Operators

Operator	Description	Example
`$eq`	Equal	`{"field": {"$eq": "value"}}`
`$ne`	Not equal	`{"field": {"$ne": "value"}}`
`$gt`	Greater than	`{"field": {"$gt": 10}}`
`$gte`	Greater or equal	`{"field": {"$gte": 10}}`
`$lt`	Less than	`{"field": {"$lt": 10}}`
`$lte`	Less or equal	`{"field": {"$lte": 10}}`
`$in`	In list	`{"field": {"$in": ["a", "b"]}}`
`$nin`	Not in list	`{"field": {"$nin": ["a", "b"]}}`

Compound Filters

# AND filter
filter = {
    "$and": [
        {"category": {"$eq": "A"}},
        {"price": {"$lt": 100}}
    ]
}

# OR filter
filter = {
    "$or": [
        {"category": {"$eq": "A"}},
        {"category": {"$eq": "B"}}
    ]
}
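
To clarify how the operators and compound filters combine, the following sketch evaluates a filter expression against a single metadata dict on the client side. It is an illustration of the operator semantics only, not HeliosDB's server-side implementation; `matches` and `COMPARATORS` are hypothetical names, and the handling of missing fields is an assumption.

```python
import operator

# Filter operators from the table above, mapped to Python equivalents.
COMPARATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(metadata, flt):
    """Return True if a metadata dict satisfies a filter expression."""
    for key, condition in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        else:
            # Assumption: a missing field never matches, whatever the operator.
            if key not in metadata:
                return False
            for op, operand in condition.items():
                if not COMPARATORS[op](metadata[key], operand):
                    return False
    return True


item = {"category": "A", "price": 50}
and_filter = {"$and": [{"category": {"$eq": "A"}}, {"price": {"$lt": 100}}]}
print(matches(item, and_filter))
```

A helper like this can be handy for unit-testing filter expressions before sending them to the server, or for the client-side filtering step mentioned under Performance Tuning.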

Batch Configuration

Batch Upsert

# Batch configuration
index.upsert(
    vectors=large_vector_list,
    batch_size=100,  # Vectors per batch
    show_progress=True
)

Batch Parameters

Parameter	Default	Description
`batch_size`	100	Vectors per batch
`pool_threads`	1	Parallel threads
`show_progress`	False	Progress bar

Namespace Configuration

Working with Namespaces

# Upsert to specific namespace
index.upsert(
    vectors=[...],
    namespace="production"
)

# Query specific namespace
results = index.query(
    vector=[...],
    namespace="production"
)

# Delete from namespace
index.delete(
    ids=["vec1", "vec2"],
    namespace="production"
)

# List namespaces
stats = index.describe_index_stats()
namespaces = stats.namespaces

Hybrid Search Configuration

Dense + Sparse Vectors

# Upsert with sparse vectors
index.upsert(
    vectors=[
        {
            "id": "vec1",
            "values": [0.1, 0.2, ...],  # Dense vector
            "sparse_values": {
                "indices": [1, 5, 100],
                "values": [0.5, 0.3, 0.1]
            },
            "metadata": {"category": "A"}
        }
    ]
)

# Hybrid query
results = index.query(
    vector=[0.1, 0.2, ...],
    sparse_vector={
        "indices": [1, 5],
        "values": [0.5, 0.3]
    },
    top_k=10
)
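
Sparse vectors use two parallel lists, `indices` and `values`, which must stay aligned. The sketch below converts a `{dimension_index: weight}` mapping into that layout. `to_sparse_values` is a hypothetical helper; dropping zero weights and sorting by index are choices made here for compactness and determinism, not requirements stated in this guide.

```python
def to_sparse_values(weights):
    """Convert {dimension_index: weight} into the indices/values layout."""
    # Drop zero weights and sort by index so both lists stay aligned.
    items = sorted((i, w) for i, w in weights.items() if w != 0)
    return {
        "indices": [i for i, _ in items],
        "values": [float(w) for _, w in items],
    }


sparse = to_sparse_values({100: 0.1, 1: 0.5, 5: 0.3, 7: 0.0})
print(sparse)
```

The resulting dict can be used as the `sparse_values` field of an upsert record or as the `sparse_vector` argument of a hybrid query.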

HeliosDB-Specific Settings

Server Configuration

# heliosdb.toml
[pinecone]
enabled = true
port = 8080
bind = "0.0.0.0"
max_connections = 10000

[pinecone.auth]
api_key = "your-secure-api-key"

[pinecone.index]
default_dimension = 1536
default_metric = "cosine"
max_vectors = 10000000
max_metadata_size = 40960

[pinecone.performance]
batch_size = 1000
query_threads = 8
index_threads = 4
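
Given the limits configured above, a client can check records before upserting them. The sketch below is one possible pre-flight check. It assumes that `max_metadata_size` is measured in bytes of UTF-8 encoded JSON, which this guide does not state, and `validate_vector` is a hypothetical helper.

```python
import json

MAX_METADATA_SIZE = 40960  # max_metadata_size from heliosdb.toml above


def validate_vector(vector, dimension, max_metadata_size=MAX_METADATA_SIZE):
    """Return a list of problems found in one upsert record (empty if none)."""
    problems = []
    if len(vector["values"]) != dimension:
        problems.append(
            f"expected {dimension} values, got {len(vector['values'])}"
        )
    # Assumption: the limit counts bytes of the UTF-8 encoded JSON metadata.
    encoded = json.dumps(vector.get("metadata", {})).encode("utf-8")
    if len(encoded) > max_metadata_size:
        problems.append(
            f"metadata is {len(encoded)} bytes, limit is {max_metadata_size}"
        )
    return problems


record = {"id": "vec1", "values": [0.0] * 1536, "metadata": {"category": "A"}}
print(validate_vector(record, dimension=1536))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a batch at once.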

Environment Variables

Variable	Description
`HELIOSDB_PINECONE_PORT`	Vector API port
`HELIOSDB_PINECONE_API_KEY`	API authentication key
`HELIOSDB_PINECONE_MAX_DIMENSION`	Max vector dimension
`HELIOSDB_PINECONE_MAX_VECTORS`	Max vectors per index
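
If the client process can see the same variables as the server, they can also drive the client connection. The following sketch assumes exactly that, and further assumes the server is reachable on `localhost` over plain HTTP, as in the examples in this guide; `connection_settings` is a hypothetical helper.

```python
import os


def connection_settings(env=os.environ):
    """Build client connection settings from HELIOSDB_PINECONE_* variables."""
    api_key = env.get("HELIOSDB_PINECONE_API_KEY")
    if not api_key:
        raise KeyError("HELIOSDB_PINECONE_API_KEY is not set")
    # Fall back to 8080, the port used throughout this guide.
    port = int(env.get("HELIOSDB_PINECONE_PORT", "8080"))
    # Assumption: the server runs on localhost and speaks plain HTTP.
    return {"api_key": api_key, "host": f"http://localhost:{port}"}


settings = connection_settings(
    {"HELIOSDB_PINECONE_API_KEY": "your-api-key", "HELIOSDB_PINECONE_PORT": "9090"}
)
print(settings["host"])
```

The returned dict matches the keyword arguments used in Basic Connection, so it could be passed as `Pinecone(**connection_settings())`.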

Performance Tuning

Query Optimization

# Optimize for speed
results = index.query(
    vector=[...],
    top_k=10,
    include_values=False,  # Don't return vectors
    include_metadata=False  # Don't return metadata
)

# Optimize for recall: fetch more candidates, then filter or re-rank client-side
results = index.query(
    vector=[...],
    top_k=100,  # Larger candidate set
    include_metadata=True  # Metadata is needed for client-side filtering
)

Batch Insert Optimization

# Optimal batch size
BATCH_SIZE = 100  # Recommended for most cases

# Insert in batches
for i in range(0, len(vectors), BATCH_SIZE):
    batch = vectors[i:i+BATCH_SIZE]
    index.upsert(vectors=batch)

Rate Limiting

Default Limits

Operation	Limit	Notes
Upsert	100 vectors/request	Configurable
Query	top_k = 10	Default when `top_k` is omitted
Fetch	100 IDs/request	Maximum
Delete	1000 IDs/request	Maximum
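
To stay within these per-request limits, ID and vector lists can be split before each call. The sketch below uses a small generic chunking helper; `chunked` and `LIMITS` are hypothetical names, and the limit values are copied from the table above.

```python
from itertools import islice

# Per-request limits from the table above.
LIMITS = {"upsert": 100, "fetch": 100, "delete": 1000}


def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


ids = [f"vec{i}" for i in range(2500)]
delete_batches = list(chunked(ids, LIMITS["delete"]))
print([len(batch) for batch in delete_batches])

# Each batch could then be sent separately, for example:
# for batch in delete_batches:
#     index.delete(ids=batch, namespace="production")
```

Because `chunked` works on any iterable, the same helper also covers generators that produce vectors lazily, which avoids holding a full dataset in memory.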

Related: README.md | COMPATIBILITY.md | EXAMPLES.md

Last Updated: December 2025