Skip to content

HeliosDB Geospatial Operations Guide

Version: 1.0 Last Updated: 2025-11-30 Status: Complete


Table of Contents

  1. Overview
  2. Getting Started
  3. Spatial Data Types
  4. Spatial Functions
  5. Spatial Indexing
  6. Coordinate Reference Systems
  7. Advanced Queries
  8. Performance Optimization
  9. Best Practices
  10. Troubleshooting

Overview

HeliosDB Geospatial provides PostGIS-compatible spatial data capabilities for location-based applications. With 11,357 lines of optimized Rust code, it delivers enterprise-grade performance for:

  • Location-based search and discovery
  • Route optimization and navigation
  • Geographic analysis and reporting
  • Proximity-based recommendations
  • GIS applications and mapping services
  • Real estate and property management

Why Geospatial in HeliosDB?

Feature Benefit
PostGIS Compatible Migrate existing PostGIS applications seamlessly
High Performance KD-tree indexing with SIMD optimization
Distributed Support Efficient sharding across regions
Full SQL Integration Use spatial queries alongside standard SQL
Real-Time Updates No data warehouse delays
Topology Support Advanced geometric relationships

Getting Started

Step 1: Enable Spatial Support

-- Create table with spatial columns
CREATE TABLE locations (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255),
  coordinates GEOMETRY,  -- Store latitude/longitude as POINT
  service_area GEOMETRY  -- Store polygon areas
);

-- Or enable on existing table
ALTER TABLE restaurants ADD COLUMN location GEOMETRY(POINT, 4326);

Step 2: Insert Spatial Data

-- Insert a point (latitude, longitude)
INSERT INTO locations (name, coordinates)
VALUES ('Eiffel Tower', ST_GeomFromText('POINT(48.8584 2.2945)', 4326));

-- Insert multiple points
INSERT INTO locations (name, coordinates)
SELECT
  name,
  ST_GeomFromText(CONCAT('POINT(', lat, ' ', lon, ')'), 4326)
FROM import_data;

-- Insert polygon (area)
INSERT INTO locations (name, service_area)
VALUES (
  'Downtown Zone',
  ST_GeomFromText('POLYGON((40.7128 -74.0060, 40.7129 -74.0061, 40.7130 -74.0062, 40.7128 -74.0060))', 4326)
);

Step 3: Query Spatial Data

-- Find all locations within 5km of a point
SELECT name, coordinates
FROM locations
WHERE ST_Distance(coordinates, ST_GeomFromText('POINT(40.7128 -74.0060)', 4326)) <= 5000;

-- Find locations within a polygon
SELECT name, coordinates
FROM locations
WHERE ST_Contains(service_area, coordinates);

-- Calculate distance between two points
SELECT
  name,
  ST_Distance(coordinates, ST_GeomFromText('POINT(40.7128 -74.0060)', 4326)) / 1000 as distance_km
FROM locations
ORDER BY distance_km
LIMIT 10;

Spatial Data Types

GEOMETRY Type

-- Create a table with various geometry types
CREATE TABLE geo_features (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255),
  point_geom GEOMETRY(POINT),          -- Single point
  line_geom GEOMETRY(LINESTRING),      -- Line segment
  polygon_geom GEOMETRY(POLYGON),      -- Closed area
  multi_point GEOMETRY(MULTIPOINT),    -- Multiple points
  multi_line GEOMETRY(MULTILINESTRING), -- Multiple lines
  multi_polygon GEOMETRY(MULTIPOLYGON)  -- Multiple areas
);

GEOGRAPHY Type (Spherical)

For Earth-based calculations with proper accounting for curvature:

-- GEOGRAPHY automatically handles Earth curvature
CREATE TABLE cities (
  id SERIAL PRIMARY KEY,
  name VARCHAR(255),
  location GEOGRAPHY(POINT, 4326)  -- Lat/Long with Earth radius
);

-- Distances are in meters (great circle distance)
SELECT
  name,
  ST_Distance(location, ST_GeographyFromText('POINT(0 0)')) / 1000 as distance_km
FROM cities;

WKT (Well-Known Text) Format

-- POINT: (longitude latitude)
POINT(0 0)

-- LINESTRING: connected points
LINESTRING(0 0, 1 1, 2 0)

-- POLYGON: closed ring, may have holes
POLYGON((0 0, 4 0, 4 4, 0 4, 0 0), (1 1, 2 1, 2 2, 1 2, 1 1))

-- MULTIPOINT: set of points
MULTIPOINT((0 0), (1 1), (1 2))

-- MULTILINESTRING: set of lines
MULTILINESTRING((0 0, 1 1, 1 2), (2 3, 3 2, 5 4))

-- MULTIPOLYGON: set of polygons
MULTIPOLYGON(((0 0, 4 0, 4 4, 0 4, 0 0)), ((5 5, 7 5, 7 7, 5 7, 5 5)))

GeoJSON Format

-- Insert from GeoJSON
INSERT INTO locations (name, coordinates)
SELECT
  properties->>'name',
  ST_GeomFromGeoJSON(geometry)
FROM import_geojson;

-- Export to GeoJSON
SELECT
  jsonb_build_object(
    'type', 'Feature',
    'properties', jsonb_build_object('name', name),
    'geometry', ST_AsGeoJSON(coordinates)::jsonb
  )
FROM locations;

Spatial Functions

Relationship Functions

ST_Contains

Checks if one geometry completely contains another.

-- Find all cities in a state
SELECT city_name
FROM cities
WHERE ST_Contains(state_boundary, city_center);

ST_Within

Checks if geometry is within another.

-- Find all parking lots within downtown area
SELECT location_name
FROM parking_lots
WHERE ST_Within(location_geom, downtown_polygon);

ST_Intersects

Checks if geometries overlap or touch.

-- Find all roads that cross a river
SELECT road_name
FROM roads
WHERE ST_Intersects(road_geom, river_geom);

ST_Touches

Checks if geometries touch at boundary.

-- Find adjacent properties
SELECT p1.id, p2.id
FROM properties p1
JOIN properties p2 ON ST_Touches(p1.boundary, p2.boundary)
WHERE p1.id < p2.id;

ST_Overlaps

Checks if geometries overlap.

-- Find overlapping flood zones
SELECT z1.zone_id, z2.zone_id
FROM flood_zones z1
JOIN flood_zones z2 ON ST_Overlaps(z1.polygon, z2.polygon)
WHERE z1.id < z2.id;

ST_Disjoint

Checks if geometries don't intersect.

-- Find buildings not affected by evacuation zone
SELECT building_id
FROM buildings
WHERE ST_Disjoint(building_geom, evacuation_zone);

Distance Functions

-- Planar distance (fastest, use for local areas)
SELECT ST_Distance(point1, point2);

-- Spherical distance (accurate for Earth)
SELECT ST_Distance(geography1::GEOGRAPHY, geography2::GEOGRAPHY);

-- Distance within 5km
SELECT * FROM stores
WHERE ST_Distance(location, user_location) < 5000;

Measurement Functions

-- Get area of polygon in square meters
SELECT ST_Area(polygon_geom) as area_m2;

-- Get length of line in meters
SELECT ST_Length(line_geom) as length_m;

-- Get bounding box
SELECT ST_Envelope(geometry) as bbox;

-- Get centroid of polygon
SELECT ST_Centroid(polygon_geom) as center;

-- Get perimeter of polygon
SELECT ST_Perimeter(polygon_geom) as perimeter_m;

Geometric Operations

-- Buffer: create zone around geometry
SELECT ST_Buffer(point_geom, 1000) as buffer_zone;  -- 1km buffer

-- Union: combine multiple geometries
SELECT ST_Union(ARRAY[geom1, geom2, geom3]) as combined;

-- Intersection: find overlap
SELECT ST_Intersection(polygon1, polygon2) as overlap;

-- Difference: subtract geometry
SELECT ST_Difference(polygon1, polygon2) as result;

-- Simplify: reduce points in geometry (faster processing)
SELECT ST_Simplify(complex_geom, 10) as simplified;

Constructors

-- Create from text
ST_GeomFromText('POINT(0 0)', 4326)
ST_GeomFromText('LINESTRING(0 0, 1 1, 2 2)', 4326)

-- Create from coordinates
ST_MakePoint(longitude, latitude)
ST_MakePointZ(longitude, latitude, elevation)

-- Create from GeoJSON
ST_GeomFromGeoJSON('{"type":"Point","coordinates":[0,0]}')

-- Create from WKB (binary)
ST_GeomFromWKB(binary_data)

-- Create specific types
ST_Point(lat, lon)
ST_LineString(point_array)
ST_Polygon(exterior_ring, interior_rings)

Spatial Indexing

Creating Spatial Indexes

-- Create GIST index (most common for PostGIS-like operations)
CREATE INDEX idx_locations_gist ON locations USING GIST(coordinates);

-- Create BRIN index (for very large tables, lower memory)
CREATE INDEX idx_locations_brin ON locations USING BRIN(coordinates);

-- Create KD-tree index (HeliosDB optimized)
CREATE INDEX idx_locations_kdtree ON locations USING KDTREE(coordinates);

Index Types Comparison

Index Type Use Case Memory Speed Notes
GIST General purpose Medium Fast Best for most queries
BRIN Very large tables Low Medium Great for sorted data
KD-tree Point searches Low Very Fast HeliosDB optimized
R-tree Box/Polygon queries Medium Fast Specialized

Query Planning with Spatial Indexes

-- Check if index is used
EXPLAIN ANALYZE
SELECT name
FROM locations
WHERE ST_Distance(coordinates, ST_GeomFromText('POINT(0 0)')) < 5000;

-- Should show "Index Scan" for efficient queries
-- If showing "Seq Scan", index may need rebuilding

Index Maintenance

-- Analyze index statistics
ANALYZE locations;

-- Rebuild index
REINDEX INDEX idx_locations_gist;

-- Check index size
SELECT
  schemaname,
  tablename,
  indexname,
  pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_indexes
WHERE tablename = 'locations';

Coordinate Reference Systems (CRS)

Understanding SRID

SRID (Spatial Reference ID) defines the coordinate system.

-- WGS84 (GPS coordinates) - SRID 4326
ST_GeomFromText('POINT(40.7128 -74.0060)', 4326)

-- Web Mercator (Web mapping) - SRID 3857
ST_Transform(geom, 3857)

-- UTM Zone 18N (Eastern US) - SRID 32618
ST_Transform(geom, 32618)

Common SRIDs

SRID CRS Name Use Case
4326 WGS84 GPS, global coordinates (lat/lon)
3857 Web Mercator Web maps, Google Maps, OpenStreetMap
32633 UTM Zone 33N Europe
32618 UTM Zone 18N Eastern North America
2154 Lambert 93 France
28350 MGA Zone 50 Australia

Transforming Between CRS

-- Convert WGS84 to Web Mercator for mapping
SELECT
  name,
  ST_Transform(coordinates, 3857) as web_mercator_coords
FROM locations;

-- Store data in one CRS, transform on query
UPDATE locations
SET coordinates = ST_Transform(coordinates, 4326)
WHERE ST_SRID(coordinates) != 4326;

Advanced Queries

-- Find 10 nearest restaurants to user location
SELECT
  id,
  name,
  ST_Distance(coordinates, user_location) as distance_m
FROM restaurants
WHERE ST_DWithin(coordinates, user_location, 50000)  -- Within 50km
ORDER BY distance_m
LIMIT 10;

Spatial Joins

-- Find all customers in each delivery zone
SELECT
  z.zone_id,
  z.zone_name,
  COUNT(c.customer_id) as customer_count
FROM delivery_zones z
LEFT JOIN customers c ON ST_Contains(z.polygon, c.location)
GROUP BY z.zone_id, z.zone_name;

Aggregate Spatial Functions

-- Create service area from all store locations (convex hull)
SELECT ST_ConvexHull(ST_Union(location_geom)) as service_area
FROM stores;

-- Calculate total coverage area
SELECT ST_Area(ST_Union(coverage_polygon)) as total_coverage_m2
FROM service_zones;

Complex Filtering

-- Find properties in multiple zones with area filtering
SELECT
  p.property_id,
  p.address,
  COUNT(DISTINCT z.zone_id) as zone_count,
  ST_Area(p.boundary) as lot_size
FROM properties p
JOIN zones z ON ST_Overlaps(p.boundary, z.polygon)
WHERE ST_Area(p.boundary) > 100  -- At least 100 square meters
GROUP BY p.property_id, p.address
HAVING COUNT(DISTINCT z.zone_id) >= 2;

Route Optimization

-- Find shortest path connecting all delivery points
WITH route AS (
  SELECT
    delivery_id,
    location,
    ST_Distance(location, LAG(location) OVER (ORDER BY delivery_seq)) as segment_distance
  FROM deliveries
  ORDER BY delivery_seq
)
SELECT
  SUM(segment_distance) as total_route_distance,
  COUNT(*) as stop_count
FROM route;

Performance Optimization

Index Selection

-- For distance queries, use spatial index
CREATE INDEX idx_nearby ON stores USING GIST(location);

-- Query uses index effectively
SELECT * FROM stores
WHERE ST_Distance(location, point) < 5000;

Query Optimization Tips

-- ❌ BAD: Functions on both sides prevent index use
SELECT * FROM stores
WHERE ST_Distance(location, center) < distance_func(param);

--  GOOD: Keep distance function on one side
SELECT * FROM stores
WHERE ST_Distance(location, center) < 5000;

--  GOOD: Use ST_DWithin for distance (more efficient)
SELECT * FROM stores
WHERE ST_DWithin(location, center, 5000);

Geometry Simplification

-- For zoomed-out maps, simplify geometries
SELECT
  name,
  ST_Simplify(boundary, 100) as simplified_boundary  -- 100m tolerance
FROM regions
WHERE zoom_level < 10;

Partitioning Large Datasets

-- Partition by geographic region
CREATE TABLE locations_us_west PARTITION OF locations
  FOR VALUES IN ('US_WEST');

CREATE TABLE locations_us_east PARTITION OF locations
  FOR VALUES IN ('US_EAST');

-- Queries can now skip irrelevant partitions

Batch Operations

-- Batch insert with single index update
BEGIN;
  INSERT INTO locations (name, coordinates)
  SELECT name, ST_GeomFromText(geom) FROM import_data;
  ANALYZE locations;
COMMIT;

Best Practices

1. Use Appropriate CRS

--  GOOD: Use appropriate CRS for calculations
INSERT INTO locations (coordinates)
SELECT ST_Transform(ST_GeomFromText(wgs84_text, 4326), 32618);  -- Local UTM

-- ❌ BAD: Lon/lat calculations are inaccurate
INSERT INTO locations (coordinates)
SELECT ST_GeomFromText(wgs84_text, 4326);

2. Validate Geometries

-- Check for invalid geometries
SELECT id, ST_IsValid(geometry) as is_valid
FROM locations
WHERE NOT ST_IsValid(geometry);

-- Make geometries valid
UPDATE locations
SET geometry = ST_MakeValid(geometry)
WHERE NOT ST_IsValid(geometry);

3. Index Critical Columns

-- Always index primary spatial columns
CREATE INDEX idx_delivery_location ON deliveries USING GIST(location);
CREATE INDEX idx_store_location ON stores USING GIST(location);

-- Index derived columns if frequently queried
ALTER TABLE stores ADD COLUMN location_buffered GEOMETRY;
CREATE INDEX idx_store_buffer ON stores USING GIST(location_buffered);

4. Monitor Index Performance

-- Check index usage
SELECT
  schemaname,
  tablename,
  indexname,
  idx_scan as index_scans,
  idx_tup_read as tuples_read,
  idx_tup_fetch as tuples_returned
FROM pg_stat_user_indexes
WHERE tablename LIKE '%location%'
ORDER BY idx_scan DESC;

5. Use Appropriate Units

--  GOOD: Consistent units
SELECT * FROM stores
WHERE ST_Distance(location, center) < 5000;  -- 5000 meters

-- ❌ BAD: Mixed units
SELECT * FROM stores
WHERE ST_Distance(location, center) < 5;  -- 5 what? Ambiguous

Troubleshooting

Issue 1: Queries Running Slow

Symptoms: - Spatial queries taking >1 second - EXPLAIN shows Seq Scan instead of Index Scan

Diagnosis:

-- Check if index exists
SELECT * FROM pg_indexes
WHERE tablename = 'locations' AND indexname LIKE '%geom%';

-- Check table statistics
SELECT
  n_live_tup as row_count,
  n_dead_tup as dead_rows,
  last_vacuum
FROM pg_stat_user_tables
WHERE relname = 'locations';

Solution:

-- Create spatial index
CREATE INDEX idx_locations_gist ON locations USING GIST(geometry);

-- Analyze statistics
ANALYZE locations;

-- Vacuum to clean dead rows
VACUUM ANALYZE locations;

Issue 2: Invalid Geometries

Symptoms: - ST_Contains returning no results - ST_Distance producing errors

Diagnosis:

-- Check for invalid geometries
SELECT id, ST_IsValid(geometry) as valid, ST_IsValidReason(geometry)
FROM locations
WHERE NOT ST_IsValid(geometry);

Solution:

-- Fix invalid geometries
UPDATE locations
SET geometry = ST_MakeValid(geometry)
WHERE NOT ST_IsValid(geometry);

-- Or identify and quarantine
ALTER TABLE locations ADD COLUMN validation_status VARCHAR;
UPDATE locations
SET validation_status = CASE
  WHEN ST_IsValid(geometry) THEN 'VALID'
  ELSE ST_IsValidReason(geometry)
END;

Issue 3: High Memory Usage

Symptoms: - Database server running out of memory - Slow spatial queries on large datasets

Solution:

-- Use BRIN index for very large tables (lower memory)
DROP INDEX idx_locations_gist;
CREATE INDEX idx_locations_brin ON locations USING BRIN(geometry);

-- Simplify geometries for display
SELECT ST_Simplify(geometry, 50) FROM locations;  -- 50m tolerance

-- Partition table by region
CREATE TABLE locations_west PARTITION OF locations
  FOR VALUES IN ('western_region');

Issue 4: Coordinate System Confusion

Symptoms: - Results appear on wrong side of Earth - Distances don't match expectations

Diagnosis:

-- Check SRID
SELECT ST_SRID(geometry) FROM locations LIMIT 1;
-- Result: 0 (undefined) or wrong number

-- Check bounds
SELECT ST_XMin(geometry), ST_XMax(geometry), ST_YMin(geometry), ST_YMax(geometry)
FROM locations LIMIT 5;

Solution:

-- Set correct SRID
UPDATE locations
SET geometry = ST_SetSRID(geometry, 4326)
WHERE ST_SRID(geometry) = 0;

-- Transform to correct CRS if needed
SELECT ST_Transform(geometry, 4326) FROM locations;


Summary

HeliosDB Geospatial Operations enables powerful location-based applications with:

  • PostGIS Compatibility - Familiar function names and behavior
  • High Performance - Optimized spatial indexing with KD-trees
  • Full SQL Integration - Combine spatial with standard queries
  • Flexible Coordinate Systems - Support for any SRID
  • Distributed Scaling - Efficient sharding across regions

Start with basic spatial indexes on your location columns, then gradually adopt more advanced spatial functions for complex geographic queries.


Related Documentation: - PostgreSQL 17 Protocol Guide - PostGIS Compatibility Matrix - Query Optimization Guide