From ElasticSearch to PostgreSQL Vector Search: Optimizing Nepali Language Search - A Developer's Journey
January 6, 2025
By Kritartha Sapkota
The 2 AM Developer’s Dilemma: Overengineering Search Solutions
Picture this: You’re sitting at your desk at 2 AM, surrounded by coffee cups, furiously googling “best search engine solutions.” Sound familiar? We’ve all been there.
That’s exactly where I found myself – staring at an empty whiteboard, sketching out system designs that could probably handle NASA’s search needs, and diving deep into complex search engines like ElasticSearch and Meilisearch, convinced I needed their heavyweight features. You know that moment when your imagination runs wild, and suddenly you’re planning for scenarios that might happen… someday… maybe.
But guess what? After all that complexity, the answer was sitting right there in our PostgreSQL database all along, quietly waiting with its vector search capabilities.
Sometimes the best solutions are the ones already in your toolbox.
Racing Cars vs. Daily Commutes: The Search Engine Dilemma
Ever caught yourself buying a race car when all you needed was a reliable sedan? That’s what I did with my search infrastructure. Sure, ElasticSearch and Meilisearch are amazing tools – they’re like those high-end sports cars with all the bells and whistles. But there I was, preparing for an F1 race when I really just needed to make it to the grocery store and back.
I had gotten caught up in the “what if” game: What if we hit a million users? What if we need to search through billions of records? What if we need super-complex queries? Meanwhile, my actual dataset was perfectly content sitting in a regular database, kind of like how most of us don’t actually need a Ferrari for our daily commute.
That’s when I discovered something pretty cool – PostgreSQL wasn’t just that reliable database we all know. It was like finding out your dependable family car actually had some serious hidden features under the hood. Turns out, it could handle full-text search and even vector-based queries like a champ. Talk about a pleasant surprise!
When a Flamethrower is Overkill for Toasting Marshmallows
Don’t get me wrong - ElasticSearch and Meilisearch are incredible pieces of technology. They’re like bringing a flamethrower to toast your marshmallows. Super cool? Absolutely. But necessary for my needs? Not exactly.
See, I fell into that classic tech trap. I was building for this imaginary future where I’d have millions of users firing off complex search queries every second. You know the kind of dreaming I’m talking about - we’ve all been there. Meanwhile, back in reality, my actual dataset was growing at a perfectly reasonable pace that any decent database could handle.
That’s when I had my “aha” moment with PostgreSQL. You know how sometimes you discover your old reliable phone has some amazing feature you never knew about? That’s what happened here. Good old Postgres wasn’t just sitting there managing my data - it had some serious search capabilities hidden up its sleeve. Full-text search? Check. Vector-based search? You bet. It was like finding out your trusty Swiss Army knife also had a built-in jetpack.
The PostgreSQL Solution: Simple Yet Powerful
PostgreSQL turned out to be a game-changer. With robust extensions like pgvector and native support for full-text search, I implemented a solution that was not only efficient but also tailored to my needs: searching through a Nepali-language dataset.
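As a rough sketch of the schema this relies on (the table and column names mirror the placeholders in the code below, and the `ne` text search configuration is a custom Nepali configuration you would have to create yourself, since PostgreSQL does not ship one):

```sql
-- Optional: enable pgvector if you also want embedding-based search
CREATE EXTENSION IF NOT EXISTS vector;

-- Precomputed tsvector column kept in sync with the content column.
-- 'ne' is an assumed custom Nepali text search configuration.
ALTER TABLE your_table_name
    ADD COLUMN tsv tsvector
    GENERATED ALWAYS AS (to_tsvector('ne', content)) STORED;

-- GIN index so @@ queries don't have to scan the whole table
CREATE INDEX your_table_tsv_idx ON your_table_name USING gin (tsv);
```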
One critical challenge was ensuring the system worked seamlessly while the search index was being built. During this time, I implemented a fallback search mechanism to maintain functionality. Here’s the Python code that brought it all together:
import psycopg
from typing import List


def connect_to_postgres():
    """Establish a connection to the PostgreSQL database."""
    return psycopg.connect(
        host="your_host",
        dbname="your_database",  # psycopg 3 uses libpq's "dbname", not "database"
        user="your_user",
        password="your_password",
        port="your_port",
    )


def fallback_search(query: str, limit: int = 10) -> List[dict]:
    """Fallback search using a simple ILIKE query."""
    connection = None
    try:
        connection = connect_to_postgres()
        cursor = connection.cursor()
        # Plain substring match: slower, but works even while the
        # full-text index is still being built
        sql_query = """
            SELECT id, title
            FROM your_table_name
            WHERE content ILIKE %s
            LIMIT %s;
        """
        cursor.execute(sql_query, (f"%{query}%", limit))
        results = cursor.fetchall()
        # Process and return results
        return [{"id": row[0], "title": row[1]} for row in results]
    except Exception as e:
        print(f"Error during fallback search: {e}")
        return []
    finally:
        if connection:
            connection.close()


def vector_search(query: str, limit: int = 10) -> List[dict]:
    """Search for documents using PostgreSQL full-text (tsvector) search."""
    connection = None
    try:
        connection = connect_to_postgres()
        cursor = connection.cursor()
        # Rank matches against the precomputed tsvector column using the
        # custom 'ne' (Nepali) text search configuration
        sql_query = """
            SELECT id, title, ts_rank(tsv, query) AS rank
            FROM your_table_name,
                 plainto_tsquery('ne', %s) AS query
            WHERE tsv @@ query
            ORDER BY rank DESC
            LIMIT %s;
        """
        cursor.execute(sql_query, (query, limit))
        results = cursor.fetchall()
        # Process and return results
        return [
            {"id": row[0], "title": row[1], "rank": row[2]} for row in results
        ]
    except Exception as e:
        print(f"Error during vector search: {e}")
        return fallback_search(query, limit)  # Fall back to the simpler search
    finally:
        if connection:
            connection.close()


# Example usage
if __name__ == "__main__":
    query = "सर्वोच्च अदालत"
    results = vector_search(query)
    for doc in results:
        print(doc)
Under the Hood: How It Works
- Text Search Vector (tsvector): The tsv column stores preprocessed text data optimized for search operations.
- Query Conversion: The plainto_tsquery function converts the search query into a format PostgreSQL understands, enabling accurate results in Nepali.
- Fallback Mechanism: When the vector search encounters issues (e.g., during index building), a simple ILIKE-based fallback search ensures the system remains operational.
- Ranking: Results are ranked using ts_rank, ensuring the most relevant entries appear first.
- Unified Architecture: By keeping the search index in the same database, operational overhead is significantly reduced.
Lessons from the Trenches
Looking back, we learned some pretty valuable lessons from this project. The biggest one was probably about keeping things simple at the start. You know how tempting it is to build this super sophisticated system right away, thinking about all these “what if” scenarios? Yeah, we fell into that trap at first. Turns out, it’s much better to start basic and let the system grow naturally as you actually need new features.
We also had a bit of a wake-up call about downtime. Having a backup plan isn’t just nice to have – it’s essential. Whether you’re rebuilding indexes or facing unexpected issues, having a solid fallback option can really save your bacon when things go sideways.
Something else that surprised us was just how capable PostgreSQL turned out to be. It’s funny – we often rush to grab specialized tools, but good old Postgres handled things beautifully. It’s like that Swiss Army knife that’s been sitting in your toolbox all along.
And finally, we really learned the value of focusing on our specific needs. When we stopped trying to use one-size-fits-all solutions and actually optimized for Nepali-language search, everything just clicked. Sometimes the best solution is the one that’s custom-fitted to your exact problem.
Conclusion
Switching to PostgreSQL vector search was a revelation. It simplified the architecture, reduced costs, and delivered the performance I needed. While ElasticSearch and Meilisearch are phenomenal in the right contexts, they weren’t the right fit for this project. Sometimes, the tools you already have can be the most powerful. Building a fallback search not only kept the system resilient but also reinforced the importance of pragmatic development.