The Problem
We run an internal platform where users frequently search through tens of thousands of records — think job listings, company profiles, and related entities. The original search was built on naive ILIKE queries:
```sql
SELECT * FROM jobs
WHERE job_title ILIKE '%software engineer%'
OR employer_name ILIKE '%software engineer%'
ORDER BY created_at DESC;
```
This approach had two critical issues:
1. Performance: ILIKE with leading wildcards can't use B-tree indexes. Every query triggered a full sequential scan. At ~50k rows and growing, response times were 2-3 seconds.
2. Relevance: There was no ranking. A job titled "Senior Software Engineer" and one that merely mentioned "software" in an unrelated field were treated equally. Users were frustrated with irrelevant results drowning out exact matches.
We considered Elasticsearch and Typesense, but our dataset size didn't justify the operational overhead of a separate search infrastructure. PostgreSQL already had everything we needed — we just had to use it properly.
The Solution: Three Search Strategies, One Ranked Query
Our hybrid approach combines three PostgreSQL-native search techniques, each solving a different aspect of the problem: exact substring matching with plain SQL, full-text search with tsvector, and fuzzy matching with pg_trgm trigram similarity.
The key insight: **ranking order matters more than any individual technique.** We rank results as: exact substring match > full-text match > fuzzy trigram match. This ensures that when a user searches "Python Developer," the job titled exactly "Python Developer" ranks above "Python-Adjacent Development Tooling."
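The tiering itself can be sketched in a few lines of plain Python. This is an illustration with made-up scores, not the production query — sorting by a tuple of scores in priority order guarantees a lower tier can never outrank a higher one:

```python
# Minimal sketch of the ranking tiers (hypothetical scores for illustration).
# Sorting by the score tuple means substring matches always beat full-text
# matches, which always beat fuzzy trigram matches.

candidates = [
    # (title, substring_score, fts_score, trigram_score)
    ("Python-Adjacent Development Tooling", 0, 0.35, 0.40),
    ("Python Developer", 100, 0.60, 1.00),
    ("Senior Python Developer", 70, 0.55, 0.75),
]

ranked = sorted(
    candidates,
    key=lambda c: (c[1], c[2], c[3]),  # substring > full-text > trigram
    reverse=True,
)

for title, *_ in ranked:
    print(title)
```

Note that "Python-Adjacent Development Tooling" lands last even though its trigram score is respectable — the higher tiers decide first.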
Implementation Deep Dive
Step 1: Add Weighted Search Vectors
We added a search_vector column of type tsvector to our primary tables. The critical decision here was **weighting** — not all fields are equally important for search:
```sql
-- Add the search_vector column
ALTER TABLE jobs ADD COLUMN search_vector tsvector;

-- Populate with weighted vectors
UPDATE jobs SET search_vector =
    setweight(to_tsvector('simple', coalesce(job_title, '')), 'A') ||
    setweight(to_tsvector('simple', coalesce(source, '')), 'B');
```
Weight A (highest priority) goes to job_title — this is what users are almost always searching for. Weight B goes to secondary fields. PostgreSQL's ts_rank function uses these weights when scoring results.
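To build intuition for what the weights buy you: ts_rank's default multipliers for the D, C, B, A classes are {0.1, 0.2, 0.4, 1.0}. The real function also factors in lexeme frequency and document length, so the toy model below is only a sketch of the weighting behavior, not ts_rank's actual formula:

```python
# Toy model of PostgreSQL's per-class weight multipliers (defaults shown).
# ts_rank's real scoring is more involved, but the multipliers explain why
# a hit in an A-weighted field (job_title) outweighs a B-weighted hit (source).

WEIGHT_MULTIPLIERS = {"A": 1.0, "B": 0.4, "C": 0.2, "D": 0.1}

def toy_rank(hits: list[str]) -> float:
    """Sum the multipliers for each weight class the query matched in."""
    return sum(WEIGHT_MULTIPLIERS[w] for w in hits)

print(toy_rank(["A"]))  # matched in job_title
print(toy_rank(["B"]))  # matched only in source
```

Under this model a title match contributes 2.5x more than a source match, which lines up with how users expect results to be ordered.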
We chose the 'simple' dictionary over 'english' deliberately. The simple dictionary doesn't apply stemming, which was important for our use case — we didn't want "engineering" to match "engine." For general prose search, 'english' is usually the better choice.
For the employers table, the vector is straightforward:
```sql
ALTER TABLE employers ADD COLUMN search_vector tsvector;

UPDATE employers SET search_vector =
    to_tsvector('simple', coalesce(name, ''));
```
Step 2: Create the Right Indexes
Indexes are what make this fast. We created GIN indexes on the search vectors, enabled pg_trgm for the fuzzy matching used later, and added a trigram index on the title:
```sql
-- GIN indexes on the search vectors
CREATE INDEX idx_jobs_search_vector ON jobs USING GIN (search_vector);
CREATE INDEX idx_employers_search_vector ON employers USING GIN (search_vector);
-- pg_trgm powers similarity() in the ranking query; index job_title for fast fuzzy lookups
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_jobs_title_trgm ON jobs USING GIN (job_title gin_trgm_ops);
```
But we went further. Using the btree_gin extension, we created a **composite GIN index** that covers both the search vector and a frequently filtered column:
```sql
-- Enable the extension
CREATE EXTENSION IF NOT EXISTS btree_gin;
-- Composite GIN index for search + time-range filtering
CREATE INDEX idx_jobs_search_created ON jobs USING GIN (search_vector, created_at);
```
This composite index is a game-changer. Most of our queries filter by both search terms AND a date range. Without the composite index, PostgreSQL has to do a bitmap AND between two separate index scans. With it, both conditions are resolved in a single index lookup.
We also added targeted B-tree indexes on columns used in ORDER BY and JOIN clauses:
```sql
CREATE INDEX ix_jobs_updated_at ON jobs(updated_at);
CREATE INDEX ix_related_records_parent_id ON related_records(parent_id);
CREATE INDEX ix_related_records_updated_at ON related_records(updated_at);
```
Step 3: Build the Hybrid Ranking Query
Here's the core of the solution — a query that combines all three strategies with a custom ranking formula:
```sql
WITH search_results AS (
    SELECT
        j.id,
        j.job_title,
        j.employer_name,
        j.created_at,
        -- Ranking components (higher = better match)
        -- 1. Exact substring match (highest priority)
        CASE
            WHEN lower(j.job_title) = lower(:query) THEN 100
            WHEN lower(j.employer_name) = lower(:query) THEN 90
            WHEN position(lower(:query) IN lower(j.job_title)) > 0 THEN 70
            WHEN position(lower(:query) IN lower(j.employer_name)) > 0 THEN 60
            ELSE 0
        END AS substring_score,
        -- 2. Full-text search score
        ts_rank(j.search_vector, plainto_tsquery('simple', :query)) AS fts_score,
        -- 3. Trigram similarity (fuzzy)
        similarity(j.job_title, :query) AS trigram_score
    FROM jobs j
    WHERE
        -- Use OR to cast a wide net; ranking will sort relevance
        j.search_vector @@ plainto_tsquery('simple', :query)
        OR similarity(j.job_title, :query) > 0.1
        OR position(lower(:query) IN lower(j.job_title)) > 0
)
SELECT *
FROM search_results
ORDER BY
    substring_score DESC,
    fts_score DESC,
    trigram_score DESC,
    created_at DESC
LIMIT 50;
```
Let's break down why this ordering works:
- **substring_score first:** If the user typed exactly what's in the title, that's the best possible match. No ambiguity.
- **fts_score second:** Full-text search matches individual words regardless of their position in the field. (With the 'english' dictionary it would also normalize variations like "software engineers" to "software engineer" through stemming; our 'simple' dictionary trades that away to avoid over-matching.)
- **trigram_score third:** Catches typos and fuzzy matches. "pyhton" will still find "python" results, but they'll rank below exact and full-text matches.
- **created_at last:** As a tiebreaker, newer records surface first.
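The "pyhton" case is worth a closer look. pg_trgm works by splitting strings into three-character windows and comparing the resulting sets. A back-of-the-envelope Python approximation (the real extension is C and its padding rules are what's modeled here — lowercase, two leading spaces and one trailing space per word):

```python
def trigrams(text: str) -> set:
    """Approximate pg_trgm's extraction: lowercase, pad each word with two
    leading and one trailing space, take every 3-character window."""
    grams = set()
    for word in text.lower().split():
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by total distinct trigrams, as pg_trgm does."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

print(similarity("python", "python"))             # identical strings -> 1.0
print(round(similarity("pyhton", "python"), 2))   # typo still overlaps
```

The typo shares enough trigrams to clear the 0.1 threshold in the WHERE clause, which is why misspelled queries still return results — just ranked at the bottom.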
Step 4: Separate Search and Non-Search Code Paths
A subtle but impactful optimization: **don't run search logic when there's no search query.** We split our API into two distinct query paths:
```python
from sqlalchemy import select

def fetch_jobs(search_query: str | None, filters: dict):
    base_query = select(Job)
    if search_query:
        # Hybrid search path — uses search_vector, trigram, ranking
        base_query = apply_hybrid_search(base_query, search_query)
    else:
        # Simple list path — just filters and pagination
        base_query = base_query.order_by(Job.created_at.desc())
    base_query = apply_filters(base_query, filters)
    return paginate(base_query)
```
This seems obvious, but the original code ran the same heavy query regardless of whether a search term was present. Splitting it improved the non-search list endpoint's performance as well.
Step 5: Reduce API Round-Trips on the Frontend
While optimizing the backend, we noticed the frontend was making multiple sequential API calls to assemble a single page:
1. Fetch the list of records
2. Fetch related statuses for each record (N+1 pattern)
3. Fetch additional metadata
We consolidated these into a single query using **correlated scalar subqueries** and **selective eager loading**:
```python
from sqlalchemy import select

# Before: N+1 queries
records = db.query(Job).all()
for record in records:
    record.status = db.query(Status).filter_by(job_id=record.id).first()

# After: a single query with a correlated scalar subquery
latest_status = (
    select(Status.name)
    .where(Status.job_id == Job.id)
    .order_by(Status.updated_at.desc())
    .limit(1)
    .correlate(Job)
    .scalar_subquery()
)
records = db.query(Job, latest_status.label("current_status")).all()
```
This alone eliminated 2-3 extra API calls per page load on the frontend.
The Results
All of this was achieved **without introducing any external search service** — purely with PostgreSQL features that have been stable and production-ready for years.
Lessons Learned
PostgreSQL is more capable than you think. Before reaching for Elasticsearch or Meilisearch, evaluate whether PostgreSQL's built-in search features cover your use case. For datasets under a few million rows with moderate query complexity, they almost certainly do.
Weighted search vectors are non-negotiable. A flat tsvector across all fields will return results, but they'll be poorly ranked. Spend time thinking about which fields matter most to your users and assign weights accordingly.
Composite GIN indexes with btree_gin are underused. If your search queries also filter by date, status, or category, a composite GIN index can dramatically reduce query execution time. This was our single biggest performance win.
Separate your search and list code paths. Running search-specific logic (trigram calculations, vector ranking) when the user hasn't searched anything is wasted computation. A simple if check can meaningfully improve your default list performance.
Fix the N+1 problem while you're in there. Backend search optimization often reveals inefficiencies in how data reaches the frontend. Consolidating API calls was a "free" optimization that compounded the perceived performance improvement.
When to Use This Approach (and When Not To)
This hybrid strategy works well when:
- Your dataset is under ~5 million rows
- You're already running PostgreSQL
- You need decent relevance ranking but not ML-powered semantic search
- You want to avoid the operational complexity of a separate search cluster
Consider a dedicated search engine (Elasticsearch, Meilisearch, Typesense) when:
- You need faceted search with complex aggregations
- Your dataset exceeds tens of millions of rows
- You need real-time indexing of rapidly changing data
- You require features like synonyms, autocomplete, or geo-search at scale
Wrapping Up
Search is one of those features that seems simple on the surface but has enormous depth. PostgreSQL gives you a surprisingly powerful toolkit — tsvector for linguistic matching, pg_trgm for fuzzy search, and standard SQL for exact matching. Combining them with a thoughtful ranking strategy and proper indexing can deliver results that rival dedicated search engines for many real-world applications.
The key takeaway: **don't optimize for speed alone — optimize for relevance first, then speed.** A fast search that returns irrelevant results is worse than a slightly slower one that nails the ranking.
---
Sanket Dofe is a Software Engineer who enjoys solving search and data problems with PostgreSQL. Opinions are his own.