Interview Prompt
Design an autocomplete service that returns useful search suggestions as a user types.
Keeps query serving extremely fast by precomputing prefix suggestions.
Separates offline ranking/index building from online request serving.
Handles freshness, personalization, localization, and safe suggestions.
Avoids scanning raw queries for every keystroke.
Step 1
Clarify functional and non-functional requirements first.
Functional Requirements
- Return ranked suggestions for a typed prefix.
- Support multiple languages, locations, and device types if needed.
- Update suggestions based on trending and recent queries.
- Filter unsafe, spammy, or policy-violating suggestions.
- Optionally personalize suggestions using user history.
Non-Functional Requirements
- p95 latency should be under 50 ms because requests happen on every keystroke.
- The service should handle very high read QPS.
- Suggestions should be fresh enough for trending topics but safe.
- Index updates should not disrupt serving.
- The design should degrade gracefully for rare prefixes.
Scale Assumptions
- 100 million daily active search users.
- Each search session sends 5 autocomplete requests.
- 1 billion distinct historical queries.
- Top suggestions per prefix are small, usually 5 to 10.
Autocomplete QPS
~58k/sec average
100M users times 50 requests per day, with large peak multipliers.
Response size
<5 KB
Suggestions are tiny, so CPU and lookup latency matter more than bandwidth.
Prefix count
Large but compressible
Popular prefixes can be materialized; rare prefixes can fall back to search index.
Freshness
Minutes to hours
Trending suggestions may update faster than evergreen suggestions.
Step 2
Identify the key entities before picking storage.
| Entity | Fields and Relationships | Interview Notes |
|---|---|---|
| Suggestion | text, locale, score, category, safety_state | A candidate suggestion after aggregation and filtering. |
| PrefixIndex | prefix, locale, suggestions[] | Online serving reads this compact structure. |
| QueryEvent | query, user_id_hash, locale, timestamp, click_signal | Raw or sampled event stream for ranking. |
| PersonalHistory | user_id, recent_queries, embeddings or categories | Optional layer blended at serving time. |
Step 3
Define the APIs around the user flows.
| Interface | Request / Response | Contract Notes |
|---|---|---|
| GET /v1/autocomplete?q=iph&locale=en-US&limit=10 | Returns ranked suggestion strings and optional metadata | Must be cacheable by normalized prefix and locale when not personalized. |
| POST /internal/query-events | { query, userId?, locale, resultClicks, timestamp } | Feeds offline aggregation and ranking pipelines. |
| PUT /internal/suggestion-blocklist | { phrase, locale, reason } | Policy updates should reach serving quickly. |
Step 4
Trace the critical data flow step by step.
Collect query events
Search clients and results pages emit query and engagement events into a stream.
Aggregate candidates
Offline jobs normalize queries, count popularity, detect trends, and remove unsafe terms.
Build prefix index
For each prefix and locale, precompute the top suggestions and publish immutable index shards.
Serve requests
Autocomplete service normalizes the prefix, reads top suggestions from memory or fast KV, and blends personalization if enabled.
Refresh safely
New index versions are loaded side by side and atomically switched after validation.
Step 5
Convert the flow into a high-level design.
Final Design
Search Autocomplete final architecture
Serving Layer
Start with clients, routing, APIs, and the main synchronous path users depend on for this problem.
State Layer
Anchor the design around the key entities: Suggestion, PrefixIndex, QueryEvent, PersonalHistory.
Async Layer
Move slow, high-volume, or failure-prone work behind queues, workers, streams, caches, or background reconciliation.
Step 6
Deep dives interviewers are likely to probe.
Index structures
- A trie is easy to explain and works well in memory for prefix lookup.
- A key-value prefix map is simpler operationally when top suggestions are precomputed.
- Finite state transducers can compress large dictionaries for advanced implementations.
Ranking
- Basic ranking uses frequency, recency, click-through, and locale.
- Trending detection should avoid promoting spam or one-off attacks.
- Personalization can rerank a small candidate list instead of querying a user-specific index for everything.
Safety
- Do policy filtering before suggestions are published to serving.
- Keep an emergency blocklist path for urgent removals.
- Log suggestion impressions and clicks for audits.
Step 7
Tradeoffs to explain out loud.
Precomputed prefix index vs live search
Use When
Precompute for low latency and predictable cost.
Watch Out
Freshness depends on the index build cadence.
In-memory trie vs distributed KV
Use When
In-memory tries are fastest for hot shards.
Watch Out
Large multilingual indexes may be easier to manage in sharded KV stores.
Global suggestions vs personalized suggestions
Use When
Global suggestions are cacheable and safer for anonymous users.
Watch Out
Personalization improves relevance but increases privacy and serving complexity.
Avoid
Common mistakes that weaken the answer.
- Querying the full search backend on every keystroke.
- Ignoring normalization such as case, accents, spacing, and locale.
- Letting unsafe or spammy queries become suggestions automatically.
- Using offset pagination or long result lists for typeahead.
- Forgetting that autocomplete QPS can exceed actual search QPS.
Step 8
Follow-up questions with strong answers.
How do you make trending suggestions fresh?
Blend a small real-time trending overlay with the stable offline prefix index, then apply safety filters before serving.
How do you personalize without hurting latency?
Fetch global suggestions first, then rerank or inject a few user-history candidates from a local cache or low-latency profile store.
How do you support typos?
For short prefixes, avoid aggressive correction. For longer prefixes, use edit-distance candidates or a search index fallback, capped tightly for latency.
Step 9
What a strong answer should signal.
Serving latency
Uses precomputed prefix suggestions and memory or fast KV lookups.
Ranking
Covers popularity, recency, locale, clicks, trends, and personalization.
Safety
Includes blocklists, filtering, emergency removal, and abuse resistance.
Scalability
Recognizes autocomplete QPS is very high and keeps responses small.
Practice this problem under interview conditions.
Read the guide, then run the prompt live with LeetSys so you can practice requirements, key entities, API design, data flow, whiteboarding, tradeoff narration, and follow-up handling.
Related Guides
Mid-Level
URL Shortener
A practical system design interview guide for building a URL shortener with redirects, custom aliases, analytics, rate limits, and high availability.
Senior
News Feed
A complete system design guide for building a personalized social news feed with fanout, ranking, privacy, and timeline freshness tradeoffs.
Mid-Level
Rate Limiter
A focused system design guide for distributed rate limiting with token buckets, sliding windows, Redis, local caches, multi-region behavior, and abuse controls.