System Design Guides

Design Search Autocomplete

A system design interview guide for typeahead search suggestions with tries, prefix indexes, ranking, freshness, personalization, and abuse filtering.

searchtypeaheadrankingprefix indexcaching

Interview Prompt

Design an autocomplete service that returns useful search suggestions as a user types.

Keeps query serving extremely fast by precomputing prefix suggestions.

Separates offline ranking/index building from online request serving.

Handles freshness, personalization, localization, and safe suggestions.

Avoids scanning raw queries for every keystroke.

Step 1

Clarify functional and non-functional requirements first.

Functional Requirements

  • Return ranked suggestions for a typed prefix.
  • Support multiple languages, locations, and device types if needed.
  • Update suggestions based on trending and recent queries.
  • Filter unsafe, spammy, or policy-violating suggestions.
  • Optionally personalize suggestions using user history.

Non-Functional Requirements

  • p95 latency should be under 50 ms because requests happen on every keystroke.
  • The service should handle very high read QPS.
  • Suggestions should be fresh enough for trending topics but safe.
  • Index updates should not disrupt serving.
  • The design should degrade gracefully for rare prefixes.

Scale Assumptions

  • 100 million daily active search users.
  • Each search session sends 5 autocomplete requests.
  • 1 billion distinct historical queries.
  • Top suggestions per prefix are small, usually 5 to 10.

Autocomplete QPS

~58k/sec average

100M users times 50 requests per day, with large peak multipliers.

Response size

<5 KB

Suggestions are tiny, so CPU and lookup latency matter more than bandwidth.

Prefix count

Large but compressible

Popular prefixes can be materialized; rare prefixes can fall back to search index.

Freshness

Minutes to hours

Trending suggestions may update faster than evergreen suggestions.

Step 2

Identify the key entities before picking storage.

EntityFields and RelationshipsInterview Notes
Suggestiontext, locale, score, category, safety_stateA candidate suggestion after aggregation and filtering.
PrefixIndexprefix, locale, suggestions[]Online serving reads this compact structure.
QueryEventquery, user_id_hash, locale, timestamp, click_signalRaw or sampled event stream for ranking.
PersonalHistoryuser_id, recent_queries, embeddings or categoriesOptional layer blended at serving time.

Step 3

Define the APIs around the user flows.

InterfaceRequest / ResponseContract Notes
GET /v1/autocomplete?q=iph&locale=en-US&limit=10Returns ranked suggestion strings and optional metadataMust be cacheable by normalized prefix and locale when not personalized.
POST /internal/query-events{ query, userId?, locale, resultClicks, timestamp }Feeds offline aggregation and ranking pipelines.
PUT /internal/suggestion-blocklist{ phrase, locale, reason }Policy updates should reach serving quickly.

Step 4

Trace the critical data flow step by step.

01

Collect query events

Search clients and results pages emit query and engagement events into a stream.

02

Aggregate candidates

Offline jobs normalize queries, count popularity, detect trends, and remove unsafe terms.

03

Build prefix index

For each prefix and locale, precompute the top suggestions and publish immutable index shards.

04

Serve requests

Autocomplete service normalizes the prefix, reads top suggestions from memory or fast KV, and blends personalization if enabled.

05

Refresh safely

New index versions are loaded side by side and atomically switched after validation.

Step 5

Convert the flow into a high-level design.

Final Design

Search Autocomplete final architecture

Loading Diagram

Serving Layer

Start with clients, routing, APIs, and the main synchronous path users depend on for this problem.

State Layer

Anchor the design around the key entities: Suggestion, PrefixIndex, QueryEvent, PersonalHistory.

Async Layer

Move slow, high-volume, or failure-prone work behind queues, workers, streams, caches, or background reconciliation.

Step 6

Deep dives interviewers are likely to probe.

Index structures

  • A trie is easy to explain and works well in memory for prefix lookup.
  • A key-value prefix map is simpler operationally when top suggestions are precomputed.
  • Finite state transducers can compress large dictionaries for advanced implementations.

Ranking

  • Basic ranking uses frequency, recency, click-through, and locale.
  • Trending detection should avoid promoting spam or one-off attacks.
  • Personalization can rerank a small candidate list instead of querying a user-specific index for everything.

Safety

  • Do policy filtering before suggestions are published to serving.
  • Keep an emergency blocklist path for urgent removals.
  • Log suggestion impressions and clicks for audits.

Step 7

Tradeoffs to explain out loud.

Precomputed prefix index vs live search

Use When

Precompute for low latency and predictable cost.

Watch Out

Freshness depends on the index build cadence.

In-memory trie vs distributed KV

Use When

In-memory tries are fastest for hot shards.

Watch Out

Large multilingual indexes may be easier to manage in sharded KV stores.

Global suggestions vs personalized suggestions

Use When

Global suggestions are cacheable and safer for anonymous users.

Watch Out

Personalization improves relevance but increases privacy and serving complexity.

Avoid

Common mistakes that weaken the answer.

  • Querying the full search backend on every keystroke.
  • Ignoring normalization such as case, accents, spacing, and locale.
  • Letting unsafe or spammy queries become suggestions automatically.
  • Using offset pagination or long result lists for typeahead.
  • Forgetting that autocomplete QPS can exceed actual search QPS.

Step 8

Follow-up questions with strong answers.

How do you make trending suggestions fresh?

Blend a small real-time trending overlay with the stable offline prefix index, then apply safety filters before serving.

How do you personalize without hurting latency?

Fetch global suggestions first, then rerank or inject a few user-history candidates from a local cache or low-latency profile store.

How do you support typos?

For short prefixes, avoid aggressive correction. For longer prefixes, use edit-distance candidates or a search index fallback, capped tightly for latency.

Step 9

What a strong answer should signal.

Serving latency

Uses precomputed prefix suggestions and memory or fast KV lookups.

Ranking

Covers popularity, recency, locale, clicks, trends, and personalization.

Safety

Includes blocklists, filtering, emergency removal, and abuse resistance.

Scalability

Recognizes autocomplete QPS is very high and keeps responses small.

Practice this problem under interview conditions.

Read the guide, then run the prompt live with LeetSys so you can practice requirements, key entities, API design, data flow, whiteboarding, tradeoff narration, and follow-up handling.

Practice Now

Related Guides