HeliosDB Geospatial Operations Guide¶
Version: 1.0 Last Updated: 2025-11-30 Status: Complete
Table of Contents¶
- Overview
- Getting Started
- Spatial Data Types
- Spatial Functions
- Spatial Indexing
- Coordinate Reference Systems
- Advanced Queries
- Performance Optimization
- Best Practices
- Troubleshooting
Overview¶
HeliosDB Geospatial provides PostGIS-compatible spatial data capabilities for location-based applications. With 11,357 lines of optimized Rust code, it delivers enterprise-grade performance for:
- Location-based search and discovery
- Route optimization and navigation
- Geographic analysis and reporting
- Proximity-based recommendations
- GIS applications and mapping services
- Real estate and property management
Why Geospatial in HeliosDB?¶
| Feature | Benefit |
|---|---|
| PostGIS Compatible | Migrate existing PostGIS applications seamlessly |
| High Performance | KD-tree indexing with SIMD optimization |
| Distributed Support | Efficient sharding across regions |
| Full SQL Integration | Use spatial queries alongside standard SQL |
| Real-Time Updates | No data warehouse delays |
| Topology Support | Advanced geometric relationships |
Getting Started¶
Step 1: Enable Spatial Support¶
-- Create table with spatial columns
CREATE TABLE locations (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
coordinates GEOMETRY, -- Store latitude/longitude as POINT
service_area GEOMETRY -- Store polygon areas
);
-- Or enable on existing table
ALTER TABLE restaurants ADD COLUMN location GEOMETRY(POINT, 4326);
Step 2: Insert Spatial Data¶
-- Insert a point (latitude, longitude)
INSERT INTO locations (name, coordinates)
VALUES ('Eiffel Tower', ST_GeomFromText('POINT(48.8584 2.2945)', 4326));
-- Insert multiple points
INSERT INTO locations (name, coordinates)
SELECT
name,
ST_GeomFromText(CONCAT('POINT(', lat, ' ', lon, ')'), 4326)
FROM import_data;
-- Insert polygon (area)
INSERT INTO locations (name, service_area)
VALUES (
'Downtown Zone',
ST_GeomFromText('POLYGON((40.7128 -74.0060, 40.7129 -74.0061, 40.7130 -74.0062, 40.7128 -74.0060))', 4326)
);
Step 3: Query Spatial Data¶
-- Find all locations within 5km of a point
SELECT name, coordinates
FROM locations
WHERE ST_Distance(coordinates, ST_GeomFromText('POINT(40.7128 -74.0060)', 4326)) <= 5000;
-- Find locations within a polygon
SELECT name, coordinates
FROM locations
WHERE ST_Contains(service_area, coordinates);
-- Calculate distance between two points
SELECT
name,
ST_Distance(coordinates, ST_GeomFromText('POINT(40.7128 -74.0060)', 4326)) / 1000 as distance_km
FROM locations
ORDER BY distance_km
LIMIT 10;
Spatial Data Types¶
GEOMETRY Type¶
-- Create a table with various geometry types
CREATE TABLE geo_features (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
point_geom GEOMETRY(POINT), -- Single point
line_geom GEOMETRY(LINESTRING), -- Line segment
polygon_geom GEOMETRY(POLYGON), -- Closed area
multi_point GEOMETRY(MULTIPOINT), -- Multiple points
multi_line GEOMETRY(MULTILINESTRING), -- Multiple lines
multi_polygon GEOMETRY(MULTIPOLYGON) -- Multiple areas
);
GEOGRAPHY Type (Spherical)¶
For Earth-based calculations with proper accounting for curvature:
-- GEOGRAPHY automatically handles Earth curvature
CREATE TABLE cities (
id SERIAL PRIMARY KEY,
name VARCHAR(255),
location GEOGRAPHY(POINT, 4326) -- Lat/Long with Earth radius
);
-- Distances are in meters (great circle distance)
SELECT
name,
ST_Distance(location, ST_GeographyFromText('POINT(0 0)')) / 1000 as distance_km
FROM cities;
WKT (Well-Known Text) Format¶
-- POINT: (longitude latitude)
POINT(0 0)
-- LINESTRING: connected points
LINESTRING(0 0, 1 1, 2 0)
-- POLYGON: closed ring, may have holes
POLYGON((0 0, 4 0, 4 4, 0 4, 0 0), (1 1, 2 1, 2 2, 1 2, 1 1))
-- MULTIPOINT: set of points
MULTIPOINT((0 0), (1 1), (1 2))
-- MULTILINESTRING: set of lines
MULTILINESTRING((0 0, 1 1, 1 2), (2 3, 3 2, 5 4))
-- MULTIPOLYGON: set of polygons
MULTIPOLYGON(((0 0, 4 0, 4 4, 0 4, 0 0)), ((5 5, 7 5, 7 7, 5 7, 5 5)))
GeoJSON Format¶
-- Insert from GeoJSON
INSERT INTO locations (name, coordinates)
SELECT
properties->>'name',
ST_GeomFromGeoJSON(geometry)
FROM import_geojson;
-- Export to GeoJSON
SELECT
jsonb_build_object(
'type', 'Feature',
'properties', jsonb_build_object('name', name),
'geometry', ST_AsGeoJSON(coordinates)::jsonb
)
FROM locations;
Spatial Functions¶
Relationship Functions¶
ST_Contains¶
Checks if one geometry completely contains another.
-- Find all cities in a state
SELECT city_name
FROM cities
WHERE ST_Contains(state_boundary, city_center);
ST_Within¶
Checks if geometry is within another.
-- Find all parking lots within downtown area
SELECT location_name
FROM parking_lots
WHERE ST_Within(location_geom, downtown_polygon);
ST_Intersects¶
Checks if geometries overlap or touch.
-- Find all roads that cross a river
SELECT road_name
FROM roads
WHERE ST_Intersects(road_geom, river_geom);
ST_Touches¶
Checks if geometries touch at boundary.
-- Find adjacent properties
SELECT p1.id, p2.id
FROM properties p1
JOIN properties p2 ON ST_Touches(p1.boundary, p2.boundary)
WHERE p1.id < p2.id;
ST_Overlaps¶
Checks if geometries overlap.
-- Find overlapping flood zones
SELECT z1.zone_id, z2.zone_id
FROM flood_zones z1
JOIN flood_zones z2 ON ST_Overlaps(z1.polygon, z2.polygon)
WHERE z1.id < z2.id;
ST_Disjoint¶
Checks if geometries don't intersect.
-- Find buildings not affected by evacuation zone
SELECT building_id
FROM buildings
WHERE ST_Disjoint(building_geom, evacuation_zone);
Distance Functions¶
-- Planar distance (fastest, use for local areas)
SELECT ST_Distance(point1, point2);
-- Spherical distance (accurate for Earth)
SELECT ST_Distance(geography1::GEOGRAPHY, geography2::GEOGRAPHY);
-- Distance within 5km
SELECT * FROM stores
WHERE ST_Distance(location, user_location) < 5000;
Measurement Functions¶
-- Get area of polygon in square meters
SELECT ST_Area(polygon_geom) as area_m2;
-- Get length of line in meters
SELECT ST_Length(line_geom) as length_m;
-- Get bounding box
SELECT ST_Envelope(geometry) as bbox;
-- Get centroid of polygon
SELECT ST_Centroid(polygon_geom) as center;
-- Get perimeter of polygon
SELECT ST_Perimeter(polygon_geom) as perimeter_m;
Geometric Operations¶
-- Buffer: create zone around geometry
SELECT ST_Buffer(point_geom, 1000) as buffer_zone; -- 1km buffer
-- Union: combine multiple geometries
SELECT ST_Union(ARRAY[geom1, geom2, geom3]) as combined;
-- Intersection: find overlap
SELECT ST_Intersection(polygon1, polygon2) as overlap;
-- Difference: subtract geometry
SELECT ST_Difference(polygon1, polygon2) as result;
-- Simplify: reduce points in geometry (faster processing)
SELECT ST_Simplify(complex_geom, 10) as simplified;
Constructors¶
-- Create from text
ST_GeomFromText('POINT(0 0)', 4326)
ST_GeomFromText('LINESTRING(0 0, 1 1, 2 2)', 4326)
-- Create from coordinates
ST_MakePoint(longitude, latitude)
ST_MakePointZ(longitude, latitude, elevation)
-- Create from GeoJSON
ST_GeomFromGeoJSON('{"type":"Point","coordinates":[0,0]}')
-- Create from WKB (binary)
ST_GeomFromWKB(binary_data)
-- Create specific types
ST_Point(lat, lon)
ST_LineString(point_array)
ST_Polygon(exterior_ring, interior_rings)
Spatial Indexing¶
Creating Spatial Indexes¶
-- Create GIST index (most common for PostGIS-like operations)
CREATE INDEX idx_locations_gist ON locations USING GIST(coordinates);
-- Create BRIN index (for very large tables, lower memory)
CREATE INDEX idx_locations_brin ON locations USING BRIN(coordinates);
-- Create KD-tree index (HeliosDB optimized)
CREATE INDEX idx_locations_kdtree ON locations USING KDTREE(coordinates);
Index Types Comparison¶
| Index Type | Use Case | Memory | Speed | Notes |
|---|---|---|---|---|
| GIST | General purpose | Medium | Fast | Best for most queries |
| BRIN | Very large tables | Low | Medium | Great for sorted data |
| KD-tree | Point searches | Low | Very Fast | HeliosDB optimized |
| R-tree | Box/Polygon queries | Medium | Fast | Specialized |
Query Planning with Spatial Indexes¶
-- Check if index is used
EXPLAIN ANALYZE
SELECT name
FROM locations
WHERE ST_Distance(coordinates, ST_GeomFromText('POINT(0 0)')) < 5000;
-- Should show "Index Scan" for efficient queries
-- If showing "Seq Scan", index may need rebuilding
Index Maintenance¶
-- Analyze index statistics
ANALYZE locations;
-- Rebuild index
REINDEX INDEX idx_locations_gist;
-- Check index size
SELECT
schemaname,
tablename,
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_indexes
WHERE tablename = 'locations';
Coordinate Reference Systems (CRS)¶
Understanding SRID¶
SRID (Spatial Reference ID) defines the coordinate system.
-- WGS84 (GPS coordinates) - SRID 4326
ST_GeomFromText('POINT(40.7128 -74.0060)', 4326)
-- Web Mercator (Web mapping) - SRID 3857
ST_Transform(geom, 3857)
-- UTM Zone 18N (Eastern US) - SRID 32618
ST_Transform(geom, 32618)
Common SRIDs¶
| SRID | CRS Name | Use Case |
|---|---|---|
| 4326 | WGS84 | GPS, global coordinates (lat/lon) |
| 3857 | Web Mercator | Web maps, Google Maps, OpenStreetMap |
| 32633 | UTM Zone 33N | Europe |
| 32618 | UTM Zone 18N | Eastern North America |
| 2154 | Lambert 93 | France |
| 28350 | MGA Zone 50 | Australia |
Transforming Between CRS¶
-- Convert WGS84 to Web Mercator for mapping
SELECT
name,
ST_Transform(coordinates, 3857) as web_mercator_coords
FROM locations;
-- Store data in one CRS, transform on query
UPDATE locations
SET coordinates = ST_Transform(coordinates, 4326)
WHERE ST_SRID(coordinates) != 4326;
Advanced Queries¶
Nearest Neighbor Search¶
-- Find 10 nearest restaurants to user location
SELECT
id,
name,
ST_Distance(coordinates, user_location) as distance_m
FROM restaurants
WHERE ST_DWithin(coordinates, user_location, 50000) -- Within 50km
ORDER BY distance_m
LIMIT 10;
Spatial Joins¶
-- Find all customers in each delivery zone
SELECT
z.zone_id,
z.zone_name,
COUNT(c.customer_id) as customer_count
FROM delivery_zones z
LEFT JOIN customers c ON ST_Contains(z.polygon, c.location)
GROUP BY z.zone_id, z.zone_name;
Aggregate Spatial Functions¶
-- Create service area from all store locations (convex hull)
SELECT ST_ConvexHull(ST_Union(location_geom)) as service_area
FROM stores;
-- Calculate total coverage area
SELECT ST_Area(ST_Union(coverage_polygon)) as total_coverage_m2
FROM service_zones;
Complex Filtering¶
-- Find properties in multiple zones with area filtering
SELECT
p.property_id,
p.address,
COUNT(DISTINCT z.zone_id) as zone_count,
ST_Area(p.boundary) as lot_size
FROM properties p
JOIN zones z ON ST_Overlaps(p.boundary, z.polygon)
WHERE ST_Area(p.boundary) > 100 -- At least 100 square meters
GROUP BY p.property_id, p.address
HAVING COUNT(DISTINCT z.zone_id) >= 2;
Route Optimization¶
-- Find shortest path connecting all delivery points
WITH route AS (
SELECT
delivery_id,
location,
ST_Distance(location, LAG(location) OVER (ORDER BY delivery_seq)) as segment_distance
FROM deliveries
ORDER BY delivery_seq
)
SELECT
SUM(segment_distance) as total_route_distance,
COUNT(*) as stop_count
FROM route;
Performance Optimization¶
Index Selection¶
-- For distance queries, use spatial index
CREATE INDEX idx_nearby ON stores USING GIST(location);
-- Query uses index effectively
SELECT * FROM stores
WHERE ST_Distance(location, point) < 5000;
Query Optimization Tips¶
-- ❌ BAD: Functions on both sides prevent index use
SELECT * FROM stores
WHERE ST_Distance(location, center) < distance_func(param);
-- GOOD: Keep distance function on one side
SELECT * FROM stores
WHERE ST_Distance(location, center) < 5000;
-- GOOD: Use ST_DWithin for distance (more efficient)
SELECT * FROM stores
WHERE ST_DWithin(location, center, 5000);
Geometry Simplification¶
-- For zoomed-out maps, simplify geometries
SELECT
name,
ST_Simplify(boundary, 100) as simplified_boundary -- 100m tolerance
FROM regions
WHERE zoom_level < 10;
Partitioning Large Datasets¶
-- Partition by geographic region
CREATE TABLE locations_us_west PARTITION OF locations
FOR VALUES IN ('US_WEST');
CREATE TABLE locations_us_east PARTITION OF locations
FOR VALUES IN ('US_EAST');
-- Queries can now skip irrelevant partitions
Batch Operations¶
-- Batch insert with single index update
BEGIN;
INSERT INTO locations (name, coordinates)
SELECT name, ST_GeomFromText(geom) FROM import_data;
ANALYZE locations;
COMMIT;
Best Practices¶
1. Use Appropriate CRS¶
-- GOOD: Use appropriate CRS for calculations
INSERT INTO locations (coordinates)
SELECT ST_Transform(ST_GeomFromText(wgs84_text, 4326), 32618); -- Local UTM
-- ❌ BAD: Lon/lat calculations are inaccurate
INSERT INTO locations (coordinates)
SELECT ST_GeomFromText(wgs84_text, 4326);
2. Validate Geometries¶
-- Check for invalid geometries
SELECT id, ST_IsValid(geometry) as is_valid
FROM locations
WHERE NOT ST_IsValid(geometry);
-- Make geometries valid
UPDATE locations
SET geometry = ST_MakeValid(geometry)
WHERE NOT ST_IsValid(geometry);
3. Index Critical Columns¶
-- Always index primary spatial columns
CREATE INDEX idx_delivery_location ON deliveries USING GIST(location);
CREATE INDEX idx_store_location ON stores USING GIST(location);
-- Index derived columns if frequently queried
ALTER TABLE stores ADD COLUMN location_buffered GEOMETRY;
CREATE INDEX idx_store_buffer ON stores USING GIST(location_buffered);
4. Monitor Index Performance¶
-- Check index usage
SELECT
schemaname,
tablename,
indexname,
idx_scan as index_scans,
idx_tup_read as tuples_read,
idx_tup_fetch as tuples_returned
FROM pg_stat_user_indexes
WHERE tablename LIKE '%location%'
ORDER BY idx_scan DESC;
5. Use Appropriate Units¶
-- GOOD: Consistent units
SELECT * FROM stores
WHERE ST_Distance(location, center) < 5000; -- 5000 meters
-- ❌ BAD: Mixed units
SELECT * FROM stores
WHERE ST_Distance(location, center) < 5; -- 5 what? Ambiguous
Troubleshooting¶
Issue 1: Queries Running Slow¶
Symptoms: - Spatial queries taking >1 second - EXPLAIN shows Seq Scan instead of Index Scan
Diagnosis:
-- Check if index exists
SELECT * FROM pg_indexes
WHERE tablename = 'locations' AND indexname LIKE '%geom%';
-- Check table statistics
SELECT
n_live_tup as row_count,
n_dead_tup as dead_rows,
last_vacuum
FROM pg_stat_user_tables
WHERE relname = 'locations';
Solution:
-- Create spatial index
CREATE INDEX idx_locations_gist ON locations USING GIST(geometry);
-- Analyze statistics
ANALYZE locations;
-- Vacuum to clean dead rows
VACUUM ANALYZE locations;
Issue 2: Invalid Geometries¶
Symptoms: - ST_Contains returning no results - ST_Distance producing errors
Diagnosis:
-- Check for invalid geometries
SELECT id, ST_IsValid(geometry) as valid, ST_IsValidReason(geometry)
FROM locations
WHERE NOT ST_IsValid(geometry);
Solution:
-- Fix invalid geometries
UPDATE locations
SET geometry = ST_MakeValid(geometry)
WHERE NOT ST_IsValid(geometry);
-- Or identify and quarantine
ALTER TABLE locations ADD COLUMN validation_status VARCHAR;
UPDATE locations
SET validation_status = CASE
WHEN ST_IsValid(geometry) THEN 'VALID'
ELSE ST_IsValidReason(geometry)
END;
Issue 3: High Memory Usage¶
Symptoms: - Database server running out of memory - Slow spatial queries on large datasets
Solution:
-- Use BRIN index for very large tables (lower memory)
DROP INDEX idx_locations_gist;
CREATE INDEX idx_locations_brin ON locations USING BRIN(geometry);
-- Simplify geometries for display
SELECT ST_Simplify(geometry, 50) FROM locations; -- 50m tolerance
-- Partition table by region
CREATE TABLE locations_west PARTITION OF locations
FOR VALUES IN ('western_region');
Issue 4: Coordinate System Confusion¶
Symptoms: - Results appear on wrong side of Earth - Distances don't match expectations
Diagnosis:
-- Check SRID
SELECT ST_SRID(geometry) FROM locations LIMIT 1;
-- Result: 0 (undefined) or wrong number
-- Check bounds
SELECT ST_XMin(geometry), ST_XMax(geometry), ST_YMin(geometry), ST_YMax(geometry)
FROM locations LIMIT 5;
Solution:
-- Set correct SRID
UPDATE locations
SET geometry = ST_SetSRID(geometry, 4326)
WHERE ST_SRID(geometry) = 0;
-- Transform to correct CRS if needed
SELECT ST_Transform(geometry, 4326) FROM locations;
Summary¶
HeliosDB Geospatial Operations enables powerful location-based applications with:
- PostGIS Compatibility - Familiar function names and behavior
- High Performance - Optimized spatial indexing with KD-trees
- Full SQL Integration - Combine spatial with standard queries
- Flexible Coordinate Systems - Support for any SRID
- Distributed Scaling - Efficient sharding across regions
Start with basic spatial indexes on your location columns, then gradually adopt more advanced spatial functions for complex geographic queries.
Related Documentation: - PostgreSQL 17 Protocol Guide - PostGIS Compatibility Matrix - Query Optimization Guide