Search and Indexing

Let’s say you are building an application that stores users, products, articles, or documents. At first, users browse content using simple lists and filters.
The database handles these queries easily.

As the application grows, users start asking for more. They want to search by name, keywords, partial text, tags, and combinations of fields. You try to handle this with database queries.

Slowly, queries become complex. Response times increase. Indexes multiply.
The database starts struggling.

This is the point where teams realize an important thing:

Databases are not search engines.

In this lesson, we’ll walk through when search is needed, how search systems work, and how real systems design search safely and correctly.

1. When Search Is Needed

Not every system needs a search engine.

If users only filter by exact fields like ID, status, or date, a database index is usually enough.
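As a rough illustration of that boundary, the sketch below uses Python's built-in sqlite3 module with a hypothetical products table (the table, column, and index names are assumptions made for this example). It asks SQLite how it would run an exact filter and a substring match. The exact wording of the plan varies between SQLite versions, but the exact filter can use the index while the substring match has to scan rows.

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")
conn.execute("CREATE INDEX idx_products_status ON products (status)")
conn.executemany(
    "INSERT INTO products (name, status) VALUES (?, ?)",
    [("Red running shoes", "active"), ("Blue rain jacket", "archived")],
)

def query_plan(sql, params=()):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan text

# Exact filter on an indexed column: the plan mentions the index.
exact_plan = query_plan("SELECT * FROM products WHERE status = ?", ("active",))

# Substring match with a leading wildcard: the plan is a scan.
substring_plan = query_plan("SELECT * FROM products WHERE name LIKE ?", ("%shoe%",))

print(exact_plan)
print(substring_plan)
```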

Search engines are needed when users expect:

  • partial text matching,
  • typo tolerance,
  • ranking by relevance,
  • search across many fields,
  • or fast keyword-based queries.
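To make "typo tolerance" from the list above concrete, here is a minimal sketch that matches a misspelled query against a small vocabulary using Levenshtein edit distance. The function names, the vocabulary, and the max_distance threshold are assumptions for illustration; production engines rely on far more optimized techniques.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]

def fuzzy_match(query, vocabulary, max_distance=1):
    """Return vocabulary words within max_distance edits of the query."""
    return [word for word in vocabulary if edit_distance(query, word) <= max_distance]

matches = fuzzy_match("shoez", ["shoes", "shirt", "socks"])
print(matches)
```

A plain database equality filter would return nothing for "shoez", which is the kind of gap that pushes teams toward a search engine.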

Adding search too early increases complexity. Adding it too late hurts user experience. Good system design adds search when user needs demand it, not just because it sounds advanced.

2. What a Search Engine Actually Does

A search engine does not scan all data on every query.

Instead, it builds an index, typically an inverted index: a specialized data structure that maps each word or token to the documents that contain it.

When data is added or updated, it is processed and stored in this index.
When a user searches, the engine looks up the query terms in the index instead of scanning the raw data.

This is why search engines can handle millions of documents efficiently.
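The idea can be sketched in a few lines of Python. This is a toy inverted index, not how any particular engine implements it: the tokenizer is a simple lowercase split, and the sample documents are made up for the example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(documents):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return IDs of documents that contain every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Made-up sample data.
documents = {
    1: "Red running shoes",
    2: "Blue running jacket",
    3: "Red rain jacket",
}
index = build_inverted_index(documents)
print(search(index, "running"))
print(search(index, "red jacket"))
```

Each query touches only the entries for its own tokens, so the cost depends on how many documents match those tokens rather than on the total size of the collection.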

3. Elasticsearch / OpenSearch Basics

Elasticsearch and OpenSearch are popular distributed search engines.

They store data as documents, usually in JSON format. Each document represents a searchable entity, such as a product or article.

Documents are grouped into indexes. Indexes are split into shards to allow horizontal scaling.
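A simplified picture of how a document might be assigned to a shard is shown below. Elasticsearch and OpenSearch use their own routing function internally; the MD5-based hash, the shard count, and the sample product document here are assumptions chosen only to show the principle that the same ID always lands on the same shard.

```python
import hashlib

def shard_for(doc_id, num_shards):
    """Pick a shard by hashing the document ID (stable across processes)."""
    digest = hashlib.md5(str(doc_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# A hypothetical JSON-like document representing one searchable product.
product = {"id": "p-1001", "name": "Red running shoes", "tags": ["shoes", "running"]}

# Three in-memory dicts stand in for three shards.
shards = [dict() for _ in range(3)]
target = shard_for(product["id"], len(shards))
shards[target][product["id"]] = product
print(target)
```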

Search engines are built for:

  • fast reads,
  • relevance scoring,
  • and text-based queries.

They are not designed to be the primary source of truth.
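To give a feel for the relevance scoring mentioned in the list above, here is a minimal TF-IDF ranking sketch. It is a deliberately simple formula, not the scoring used by Elasticsearch or OpenSearch (their default is BM25, a more refined relative of TF-IDF), and the sample documents are invented.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def rank(documents, query):
    """Return (doc_id, score) pairs sorted by descending TF-IDF score."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in documents.items()}
    total_docs = len(tokenized)
    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            containing = sum(1 for other in tokenized.values() if term in other)
            if containing == 0 or counts[term] == 0:
                continue
            tf = counts[term] / len(tokens)               # how often the term appears here
            idf = math.log(1 + total_docs / containing)   # rarer terms weigh more
            score += tf * idf
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

documents = {
    "a": "wireless headphones with wireless charging case",
    "b": "wired headphones",
    "c": "phone charging cable",
}
results = rank(documents, "wireless headphones")
print(results)
```

Document "a" ranks first because it matches both query terms, and "c" is absent because it matches neither.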

4. Why Search Data Is Usually Denormalized

In relational databases, data is usually normalized to avoid duplication. In search systems, the opposite is often true.

Search engines work best when each document contains all the data needed for search and display. This avoids joins, which search engines handle poorly.

As a result, search documents often duplicate data from multiple tables. This denormalization improves performance and simplifies queries.
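As a small sketch of what this looks like in practice, the code below flattens rows from three hypothetical tables (products, brands, categories) into one self-contained search document. The table layout and field names are assumptions for the example.

```python
# Hypothetical normalized rows, as they might come from three tables.
products = {1: {"id": 1, "name": "Red running shoes", "brand_id": 10, "category_id": 100}}
brands = {10: {"id": 10, "name": "Acme"}}
categories = {100: {"id": 100, "name": "Footwear"}}

def build_search_document(product_id):
    """Resolve foreign keys once, at indexing time, into a flat document."""
    product = products[product_id]
    return {
        "id": product["id"],
        "name": product["name"],
        # Brand and category names are copied in, so no join is needed at query time.
        "brand_name": brands[product["brand_id"]]["name"],
        "category_name": categories[product["category_id"]]["name"],
    }

doc = build_search_document(1)
print(doc)
```

The trade-off is that renaming a brand now means reindexing every product document that carries that brand name.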

5. Designing the Indexing Pipeline

Data usually starts in a primary database. From there, it is sent to the search engine through an indexing pipeline.

This pipeline handles:

  • data transformation,
  • field selection,
  • tokenization,
  • and indexing.

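The pipeline may run synchronously, as part of the same request that writes to the database, or asynchronously, in the background after the write completes.

Asynchronous pipelines are more common because they keep indexing work off the request path, which reduces load on the main application.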
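The steps above can be sketched as a tiny asynchronous pipeline. Here a standard-library queue and a worker thread stand in for a real message queue and indexing service, and a plain dict stands in for the search engine. All names (save_article, transform, the record fields) are hypothetical.

```python
import queue
import re
import threading

index_queue = queue.Queue()
search_index = {}  # doc_id -> indexed document (stand-in for a search engine)

def transform(record):
    """Select the searchable fields and tokenize their text."""
    text = f"{record['title']} {record['body']}"
    return {
        "id": record["id"],
        "title": record["title"],
        "tokens": re.findall(r"[a-z0-9]+", text.lower()),
    }

def indexing_worker():
    """Consume records from the queue until a None sentinel arrives."""
    while True:
        record = index_queue.get()
        if record is None:
            break
        doc = transform(record)
        search_index[doc["id"]] = doc

def save_article(record):
    """Application write path: enqueue for indexing and return immediately."""
    # (The write to the primary database would happen here.)
    index_queue.put(record)

worker = threading.Thread(target=indexing_worker)
worker.start()
save_article({
    "id": 1,
    "title": "Intro to Search",
    "body": "Indexes make lookups fast.",
    "internal_notes": "draft",  # not selected, so it never reaches the index
})
index_queue.put(None)  # signal the worker to stop
worker.join()
print(search_index[1]["tokens"])
```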

6. Syncing Database to Search Safely

Keeping search data in sync with the database is critical.

The database remains the source of truth. Search is a derived system.

Most systems use events or change logs to update search indexes. When data changes in the database, an event is produced. The search system consumes the event and updates the index.

This avoids tight coupling and improves reliability.
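One way a consumer might apply such events is sketched below. The event shape (type, id, version, doc) is an assumption made for this example, not a standard format. The version check is one common way to make the consumer safe against duplicated or out-of-order delivery.

```python
search_index = {}  # doc_id -> {"version": int, "doc": dict or None}

def apply_event(event):
    """Apply one change event to the index; return True if it changed anything."""
    doc_id = event["id"]
    current = search_index.get(doc_id)
    if current is not None and current["version"] >= event["version"]:
        return False  # stale or duplicate event, safe to ignore
    if event["type"] == "delete":
        # Keep a tombstone so an older upsert cannot resurrect the document.
        search_index[doc_id] = {"version": event["version"], "doc": None}
    else:
        search_index[doc_id] = {"version": event["version"], "doc": event["doc"]}
    return True

# Hypothetical event stream; the last event is a delayed duplicate of the first.
events = [
    {"type": "upsert", "id": 7, "version": 1, "doc": {"title": "Draft"}},
    {"type": "upsert", "id": 7, "version": 2, "doc": {"title": "Final"}},
    {"type": "upsert", "id": 7, "version": 1, "doc": {"title": "Draft"}},
]
applied = [apply_event(event) for event in events]
print(applied, search_index[7]["doc"])
```

Because stale events are ignored, replaying the same stream after a crash leaves the index in the same state.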

7. Handling Failures and Inconsistency

Search systems are usually eventually consistent.

A user may update data and not see it reflected in search results immediately. This delay is usually acceptable. What matters is that the system can recover if something goes wrong. Reindexing is the safety net.

If search data becomes corrupted or outdated, the system can rebuild the index from the database. This is why the database must always remain the source of truth.
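A minimal sketch of a full reindex follows, using an in-memory SQLite database as the source of truth and a dict as the search index. The articles table and the rebuild_index function are hypothetical. The new index is built completely before it replaces the stale one, so readers never see a half-built index.

```python
import sqlite3

def rebuild_index(conn):
    """Build a fresh index from every row in the source-of-truth database."""
    new_index = {}
    for article_id, title in conn.execute("SELECT id, title FROM articles"):
        new_index[article_id] = {"id": article_id, "title": title}
    return new_index

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO articles (id, title) VALUES (?, ?)",
    [(1, "Intro to Search"), (2, "Scaling Indexes")],
)

# A stale index: article 1 is outdated, article 2 is missing,
# and article 3 no longer exists in the database.
live_index = {
    1: {"id": 1, "title": "Old title"},
    3: {"id": 3, "title": "Deleted article"},
}

# Build the replacement fully, then swap it in with one assignment.
live_index = rebuild_index(conn)
print(live_index)
```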

8. When Not to Use a Search Engine

Search engines add operational complexity. They require monitoring, scaling, and maintenance.

If your system only needs exact lookups or simple filters, search engines are unnecessary. Good engineers avoid adding search unless it clearly improves the product.

Final Thoughts

Search systems solve a specific problem: fast, flexible, text-based queries at scale. They are powerful, but they are not replacements for databases.

In system design interviews, interviewers look for:

  • knowing when search is needed,
  • understanding indexing and denormalization,
  • and designing safe data synchronization.

If you can explain why search exists and how it stays in sync with the database, you are thinking at a production level.

Frequently Asked Questions

When should a search engine be added?
A search engine should be added when users need keyword search, partial matches, relevance ranking, or searching across many fields at scale.

Why not use the database for search?
Databases are optimized for exact lookups and structured queries, not for text search, ranking, and typo tolerance at large scale.

What is a search index?
An index is a data structure that allows fast lookup of words and tokens without scanning all documents.

Why is search data denormalized?
Denormalization allows each search document to contain all required fields, avoiding joins and improving search performance.

How is the search index kept in sync with the database?
Most systems use events or change logs to update the search index asynchronously, keeping the database as the source of truth.
