Design Search Autocomplete: System Design Interview Guide

Q: How do you make trending suggestions fresh?

Blend a small real-time trending overlay with the stable offline prefix index, then apply safety filters before serving.

Q: How do you personalize without hurting latency?

Fetch global suggestions first, then rerank or inject a few user-history candidates from a local cache or low-latency profile store.

Q: How do you support typos?

For short prefixes, avoid aggressive correction. For longer prefixes, use edit-distance candidates or a search index fallback, capped tightly for latency.

Interview Prompt

Design an autocomplete service that returns useful search suggestions as a user types.

Keeps query serving extremely fast by precomputing prefix suggestions.

Separates offline ranking/index building from online request serving.

Handles freshness, personalization, localization, and safe suggestions.

Avoids scanning raw queries for every keystroke.

Step 1

Clarify functional and non-functional requirements first.

Functional Requirements

Return ranked suggestions for a typed prefix.
Support multiple languages, locations, and device types if needed.
Update suggestions based on trending and recent queries.
Filter unsafe, spammy, or policy-violating suggestions.
Optionally personalize suggestions using user history.

Non-Functional Requirements

p95 latency should be under 50 ms because requests happen on every keystroke.
The service should handle very high read QPS.
Suggestions should be fresh enough for trending topics but safe.
Index updates should not disrupt serving.
The design should degrade gracefully for rare prefixes.

Scale Assumptions

100 million daily active search users.
Each search session sends 5 autocomplete requests.
1 billion distinct historical queries.
Top suggestions per prefix are small, usually 5 to 10.

Autocomplete QPS

~58k/sec average

100M users times 50 requests per day, with large peak multipliers.

Response size

<5 KB

Suggestions are tiny, so CPU and lookup latency matter more than bandwidth.

Prefix count

Large but compressible

Popular prefixes can be materialized; rare prefixes can fall back to search index.

Freshness

Minutes to hours

Trending suggestions may update faster than evergreen suggestions.

Step 2

Identify the key entities before picking storage.

Entity	Fields and Relationships	Interview Notes
Suggestion	text, locale, score, category, safety_state	A candidate suggestion after aggregation and filtering.
PrefixIndex	prefix, locale, suggestions[]	Online serving reads this compact structure.
QueryEvent	query, user_id_hash, locale, timestamp, click_signal	Raw or sampled event stream for ranking.
PersonalHistory	user_id, recent_queries, embeddings or categories	Optional layer blended at serving time.

Step 3

Define the APIs around the user flows.

Interface	Request / Response	Contract Notes
GET /v1/autocomplete?q=iph&locale=en-US&limit=10	Returns ranked suggestion strings and optional metadata	Must be cacheable by normalized prefix and locale when not personalized.
POST /internal/query-events	{ query, userId?, locale, resultClicks, timestamp }	Feeds offline aggregation and ranking pipelines.
PUT /internal/suggestion-blocklist	{ phrase, locale, reason }	Policy updates should reach serving quickly.

Step 4

Trace the critical data flow step by step.

Collect query events

Search clients and results pages emit query and engagement events into a stream.

Aggregate candidates

Offline jobs normalize queries, count popularity, detect trends, and remove unsafe terms.

Build prefix index

For each prefix and locale, precompute the top suggestions and publish immutable index shards.

Serve requests

Autocomplete service normalizes the prefix, reads top suggestions from memory or fast KV, and blends personalization if enabled.

Refresh safely

New index versions are loaded side by side and atomically switched after validation.

Step 5

Convert the flow into a high-level design.

Final Design

Search Autocomplete final architecture

Loading Diagram

Serving Layer

Start with clients, routing, APIs, and the main synchronous path users depend on for this problem.

State Layer

Anchor the design around the key entities: Suggestion, PrefixIndex, QueryEvent, PersonalHistory.

Async Layer

Move slow, high-volume, or failure-prone work behind queues, workers, streams, caches, or background reconciliation.

Step 6

Deep dives interviewers are likely to probe.

Index structures

A trie is easy to explain and works well in memory for prefix lookup.
A key-value prefix map is simpler operationally when top suggestions are precomputed.
Finite state transducers can compress large dictionaries for advanced implementations.

Ranking

Basic ranking uses frequency, recency, click-through, and locale.
Trending detection should avoid promoting spam or one-off attacks.
Personalization can rerank a small candidate list instead of querying a user-specific index for everything.

Safety

Do policy filtering before suggestions are published to serving.
Keep an emergency blocklist path for urgent removals.
Log suggestion impressions and clicks for audits.

Step 7

Tradeoffs to explain out loud.

Precomputed prefix index vs live search

Use When

Precompute for low latency and predictable cost.

Watch Out

Freshness depends on the index build cadence.

In-memory trie vs distributed KV

Use When

In-memory tries are fastest for hot shards.

Watch Out

Large multilingual indexes may be easier to manage in sharded KV stores.

Global suggestions vs personalized suggestions

Use When

Global suggestions are cacheable and safer for anonymous users.

Watch Out

Personalization improves relevance but increases privacy and serving complexity.

Avoid

Common mistakes that weaken the answer.

Querying the full search backend on every keystroke.
Ignoring normalization such as case, accents, spacing, and locale.
Letting unsafe or spammy queries become suggestions automatically.
Using offset pagination or long result lists for typeahead.
Forgetting that autocomplete QPS can exceed actual search QPS.

Step 8

Follow-up questions with strong answers.

How do you make trending suggestions fresh?

Blend a small real-time trending overlay with the stable offline prefix index, then apply safety filters before serving.

How do you personalize without hurting latency?

Fetch global suggestions first, then rerank or inject a few user-history candidates from a local cache or low-latency profile store.

How do you support typos?

For short prefixes, avoid aggressive correction. For longer prefixes, use edit-distance candidates or a search index fallback, capped tightly for latency.

Step 9

What a strong answer should signal.

Serving latency

Uses precomputed prefix suggestions and memory or fast KV lookups.

Ranking

Covers popularity, recency, locale, clicks, trends, and personalization.

Safety

Includes blocklists, filtering, emergency removal, and abuse resistance.

Scalability

Recognizes autocomplete QPS is very high and keeps responses small.

Practice this problem under interview conditions.

Read the guide, then run the prompt live with LeetSys so you can practice requirements, key entities, API design, data flow, whiteboarding, tradeoff narration, and follow-up handling.

Practice Now

Related Guides

Mid-Level

Design Search Autocomplete

Clarify functional and non-functional requirements first.

Functional Requirements

Non-Functional Requirements

Scale Assumptions

Identify the key entities before picking storage.

Define the APIs around the user flows.

Trace the critical data flow step by step.

Collect query events

Aggregate candidates

Build prefix index

Serve requests

Refresh safely

Convert the flow into a high-level design.

Search Autocomplete final architecture

Serving Layer

State Layer

Async Layer

Deep dives interviewers are likely to probe.

Index structures

Ranking

Safety

Tradeoffs to explain out loud.

Precomputed prefix index vs live search

In-memory trie vs distributed KV

Global suggestions vs personalized suggestions

Common mistakes that weaken the answer.

Follow-up questions with strong answers.

How do you make trending suggestions fresh?

How do you personalize without hurting latency?

How do you support typos?

What a strong answer should signal.

Practice this problem under interview conditions.

Related Guides

URL Shortener

News Feed

Rate Limiter