Essential PostgreSQL Extensions Every Developer Should Know
PostgreSQL's extension system is one of its greatest strengths. Extensions let you add capabilities that would require switching databases entirely in other ecosystems. This guide covers the extensions that solve real developer problems: vector search for AI, time series optimization, scheduled jobs, geospatial queries, and performance analysis.
Why PostgreSQL Extensions Matter
Extensions add functionality to Postgres without forking the codebase. They're:
- First-class citizens — managed via SQL (CREATE EXTENSION)
- Versioned — upgradable with ALTER EXTENSION
- Composable — use multiple extensions together
- Packaged — available via system package managers or source
This means you can add vector search, time series optimizations, or cron jobs without leaving SQL.
pgvector: Vector Search and Embeddings
pgvector adds vector similarity search to Postgres. This is essential for AI applications using embeddings (text, images, audio).
Installation
# Ubuntu/Debian
sudo apt install postgresql-16-pgvector
# macOS (Homebrew)
brew install pgvector
# From source
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make && sudo make install
Enable in your database:
CREATE EXTENSION vector;
Usage
Store embeddings as vectors:
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT,
embedding vector(1536) -- OpenAI ada-002 dimension
);
Insert embeddings:
INSERT INTO documents (content, embedding)
VALUES
('PostgreSQL is a relational database', '[0.1, 0.2, ...]'),
('Vectors enable semantic search', '[0.3, 0.1, ...]');
Find similar documents:
-- Cosine similarity
SELECT content, 1 - (embedding <=> '[0.15, 0.18, ...]') AS similarity
FROM documents
ORDER BY embedding <=> '[0.15, 0.18, ...]'
LIMIT 10;
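The <=> operator returns cosine distance, i.e. 1 minus cosine similarity. As a sanity check, here is the same math sketched in TypeScript (client-side only; pgvector does this for you in the database):

```typescript
// Cosine distance, as computed by pgvector's <=> operator:
// 1 - (a · b) / (|a| * |b|)
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Identical vectors have distance 0; orthogonal vectors have distance 1
console.log(cosineDistance([1, 0], [1, 0])) // 0
console.log(cosineDistance([1, 0], [0, 1])) // 1
```

This is why the query computes `1 - (embedding <=> ...)` to get a similarity score where higher means more similar.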
Indexing for Performance
Create an index for fast similarity search:
-- HNSW index (best for most cases)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
-- IVFFlat index (good for very large datasets)
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
HNSW gives better query speed and recall, but indexes build more slowly and use more memory. IVFFlat builds faster and is smaller, but its lists parameter needs tuning, and the table should already contain data when you build it so the cluster centroids are representative.
Real-World Example
Semantic search over documentation:
import { OpenAI } from 'openai'
import postgres from 'postgres'
const sql = postgres(DATABASE_URL)
const openai = new OpenAI()
async function semanticSearch(query: string) {
  // Get the query embedding
  const response = await openai.embeddings.create({
    model: 'text-embedding-ada-002',
    input: query
  })
  const embedding = response.data[0].embedding

  // pgvector expects '[1,2,3]'-style literals, so serialize the
  // array as JSON before casting; passing the raw JS array would
  // produce a Postgres array literal that the vector type rejects
  const vector = JSON.stringify(embedding)

  // Search the database by cosine distance
  const results = await sql`
    SELECT content, 1 - (embedding <=> ${vector}::vector) AS similarity
    FROM documents
    WHERE 1 - (embedding <=> ${vector}::vector) > 0.8
    ORDER BY embedding <=> ${vector}::vector
    LIMIT 10
  `
  return results
}
const docs = await semanticSearch("How do I install Postgres extensions?")
pgvector eliminates the need for separate vector databases (Pinecone, Weaviate) for many use cases.
TimescaleDB: Time Series Optimization
TimescaleDB optimizes Postgres for time series data (metrics, logs, sensor data) with automatic partitioning and compression.
Installation
# Ubuntu/Debian
sudo add-apt-repository ppa:timescale/timescaledb-ppa
sudo apt install timescaledb-2-postgresql-16
# macOS
brew install timescaledb
# Configure
sudo timescaledb-tune
Enable in your database:
CREATE EXTENSION timescaledb;
Usage
Convert a table to a hypertable:
CREATE TABLE metrics (
time TIMESTAMPTZ NOT NULL,
device_id INT,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION
);
-- Convert to hypertable (auto-partitions by time)
SELECT create_hypertable('metrics', 'time');
Insert and query like normal:
INSERT INTO metrics (time, device_id, temperature, humidity)
VALUES (NOW(), 1, 72.5, 45.2);
-- Queries use time-based indexes automatically
SELECT time_bucket('1 hour', time) AS hour,
AVG(temperature) AS avg_temp
FROM metrics
WHERE time > NOW() - INTERVAL '7 days'
GROUP BY hour
ORDER BY hour DESC;
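time_bucket floors each timestamp to the start of its bucket. A simplified sketch of that flooring in TypeScript, assuming UTC timestamps and epoch-aligned buckets (TimescaleDB also supports custom origins and month/year buckets, which this does not model):

```typescript
// Floor a timestamp to the start of its bucket, mirroring
// time_bucket('1 hour', time) for epoch-aligned buckets in UTC.
function timeBucket(bucketMs: number, date: Date): Date {
  return new Date(Math.floor(date.getTime() / bucketMs) * bucketMs)
}

const hour = 60 * 60 * 1000
const t = new Date('2024-05-01T12:34:56Z')
console.log(timeBucket(hour, t).toISOString()) // 2024-05-01T12:00:00.000Z
```

Mirroring bucket boundaries like this in application code is handy when you render charts whose x-axis must line up with the database's buckets.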
Continuous Aggregates
Pre-compute aggregations for fast queries:
CREATE MATERIALIZED VIEW hourly_metrics
WITH (timescaledb.continuous) AS
SELECT time_bucket('1 hour', time) AS hour,
device_id,
AVG(temperature) AS avg_temp,
MAX(temperature) AS max_temp,
MIN(temperature) AS min_temp
FROM metrics
GROUP BY hour, device_id;
-- Automatically refreshes
SELECT add_continuous_aggregate_policy('hourly_metrics',
start_offset => INTERVAL '3 hours',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour');
Compression
TimescaleDB compresses old data:
ALTER TABLE metrics SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'device_id',
timescaledb.compress_orderby = 'time DESC'
);
-- Compress chunks older than 7 days
SELECT add_compression_policy('metrics', INTERVAL '7 days');
Compression ratios of 10-20x are common.
pg_cron: Scheduled Jobs
pg_cron runs periodic jobs inside Postgres. Use it for cleanup, aggregations, or any other work you can express as a SQL command.
Installation
# Ubuntu/Debian
sudo apt install postgresql-16-cron
# macOS
brew install pg_cron
Add to postgresql.conf:
shared_preload_libraries = 'pg_cron'
cron.database_name = 'your_database'
Restart Postgres and enable:
CREATE EXTENSION pg_cron;
Usage
Schedule a job:
-- Delete old logs every day at 3 AM
SELECT cron.schedule('delete-old-logs', '0 3 * * *', $$
DELETE FROM logs WHERE created_at < NOW() - INTERVAL '30 days'
$$);
-- Refresh a materialized view every hour
SELECT cron.schedule('refresh-stats', '0 * * * *', $$
REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats
$$);
-- Run a function every 5 minutes
SELECT cron.schedule('send-notifications', '*/5 * * * *', $$
SELECT send_pending_notifications()
$$);
List jobs:
SELECT * FROM cron.job;
Unschedule:
SELECT cron.unschedule('delete-old-logs');
pg_cron eliminates the need for external cron jobs or task queues for simple periodic work.
PostGIS: Geospatial Data
PostGIS adds geometry types and spatial functions for location-based queries.
Installation
# Ubuntu/Debian
sudo apt install postgresql-16-postgis-3
# macOS
brew install postgis
Enable:
CREATE EXTENSION postgis;
Usage
Store locations:
CREATE TABLE locations (
id SERIAL PRIMARY KEY,
name TEXT,
location GEOGRAPHY(POINT, 4326) -- WGS84 lat/lon
);
INSERT INTO locations (name, location)
VALUES
('Office', ST_MakePoint(-122.4194, 37.7749)), -- San Francisco
('Home', ST_MakePoint(-118.2437, 34.0522)); -- Los Angeles
Find nearby locations:
-- Find locations within 50 km of a point
SELECT name, ST_Distance(location, ST_MakePoint(-122.0, 37.5)::geography) / 1000 AS distance_km
FROM locations
WHERE ST_DWithin(location, ST_MakePoint(-122.0, 37.5)::geography, 50000)
ORDER BY distance_km;
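On geography values, ST_Distance and ST_DWithin measure along the Earth's surface. For a rough client-side cross-check, here is a haversine sketch; it assumes a spherical Earth, so it will differ slightly from PostGIS's default spheroid-based distances:

```typescript
// Great-circle distance in km between two lon/lat points (haversine,
// spherical Earth, mean radius 6371 km). PostGIS geography uses a
// spheroid by default, so results can differ by up to ~0.5%.
function haversineKm(lon1: number, lat1: number, lon2: number, lat2: number): number {
  const R = 6371
  const toRad = (d: number) => (d * Math.PI) / 180
  const dLat = toRad(lat2 - lat1)
  const dLon = toRad(lon2 - lon1)
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2
  return 2 * R * Math.asin(Math.sqrt(a))
}

// San Francisco to Los Angeles: roughly 560 km
console.log(haversineKm(-122.4194, 37.7749, -118.2437, 34.0522))
```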
Spatial index for performance:
CREATE INDEX ON locations USING GIST (location);
Real-world example (store locator):
-- Find nearest store to user
SELECT name,
ST_Distance(location::geography, ST_MakePoint($1, $2)::geography) / 1000 AS km
FROM stores
WHERE ST_DWithin(location::geography, ST_MakePoint($1, $2)::geography, 10000)
ORDER BY location::geography <-> ST_MakePoint($1, $2)::geography
LIMIT 5;
pg_stat_statements: Query Performance Analysis
pg_stat_statements tracks execution stats for all queries. Essential for finding slow queries.
Installation
Add to postgresql.conf:
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
Restart Postgres and enable:
CREATE EXTENSION pg_stat_statements;
Usage
Find slowest queries:
SELECT query,
calls,
total_exec_time / 1000 AS total_seconds,
mean_exec_time / 1000 AS avg_seconds,
max_exec_time / 1000 AS max_seconds
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
Find erratic queries (the stddev-to-mean ratio is the coefficient of variation, not the variance):
SELECT query,
calls,
mean_exec_time,
stddev_exec_time,
stddev_exec_time / mean_exec_time AS cv
FROM pg_stat_statements
WHERE calls > 100
ORDER BY cv DESC
LIMIT 10;
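The stddev/mean ratio in the query above is the coefficient of variation: a scale-free measure of how erratic a query's timings are. A small TypeScript sketch of the same statistic:

```typescript
// Coefficient of variation (stddev / mean) over a set of execution
// times. A high value means a query is sometimes fast and sometimes
// slow — e.g. plan flips, lock waits, or a cold cache.
function coefficientOfVariation(samples: number[]): number {
  const mean = samples.reduce((s, x) => s + x, 0) / samples.length
  const variance =
    samples.reduce((s, x) => s + (x - mean) ** 2, 0) / samples.length
  return Math.sqrt(variance) / mean
}

console.log(coefficientOfVariation([10, 10, 10, 10])) // 0: perfectly stable
console.log(coefficientOfVariation([1, 1, 1, 100]))   // high: erratic query
```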
Reset stats:
SELECT pg_stat_statements_reset();
Use this to identify candidates for indexing or query optimization.
pgBouncer: Connection Pooling
pgBouncer is a lightweight connection pooler. Not an extension (runs as a separate process), but essential for production.
Installation
# Ubuntu/Debian
sudo apt install pgbouncer
# macOS
brew install pgbouncer
Configure (/etc/pgbouncer/pgbouncer.ini):
[databases]
mydb = host=localhost port=5432 dbname=mydb
[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
User file (/etc/pgbouncer/userlist.txt) — with auth_type = md5, each line pairs a username with "md5" followed by the MD5 hash of the password concatenated with the username:
"myuser" "md5<hash of password + username>"
Start:
sudo systemctl start pgbouncer
Connect to pgBouncer instead of Postgres:
postgres://myuser:password@localhost:6432/mydb
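In application code, switching to pgBouncer is usually just a port change. A small sketch (viaPgBouncer is a hypothetical helper, not a library function):

```typescript
// Point an existing connection string at pgBouncer by swapping the
// port. User, password, host, and database all stay the same.
function viaPgBouncer(databaseUrl: string, pgbouncerPort = 6432): string {
  const url = new URL(databaseUrl)
  url.port = String(pgbouncerPort)
  return url.toString()
}

console.log(viaPgBouncer('postgres://myuser:password@localhost:5432/mydb'))
// postgres://myuser:password@localhost:6432/mydb
```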
pgBouncer reduces connection overhead. Use transaction mode for stateless apps, session mode if you need prepared statements.
Combining Extensions
These extensions compose well:
-- Time series with geospatial data
CREATE TABLE device_readings (
time TIMESTAMPTZ NOT NULL,
device_id INT,
location GEOGRAPHY(POINT),
temperature DOUBLE PRECISION
);
SELECT create_hypertable('device_readings', 'time');
CREATE INDEX ON device_readings USING GIST (location);
-- Query: avg temp in SF over last week
SELECT time_bucket('1 hour', time) AS hour,
AVG(temperature) AS avg_temp
FROM device_readings
WHERE time > NOW() - INTERVAL '7 days'
AND ST_DWithin(location, ST_MakePoint(-122.4194, 37.7749)::geography, 10000)
GROUP BY hour;
Extension Management
List installed extensions:
SELECT * FROM pg_extension;
Upgrade an extension:
ALTER EXTENSION vector UPDATE TO '0.5.0';
Drop an extension:
DROP EXTENSION vector CASCADE;
Conclusion
PostgreSQL extensions let you stay in one database while adding vector search, time series, geospatial, and more. Before reaching for a specialized database, check if a Postgres extension solves your problem.