Interview Prompt
Design a real-time messaging system where users can send direct and group messages across multiple devices.
A strong answer:
- Separates connection management from durable message storage.
- Explains per-conversation ordering without claiming global ordering.
- Handles offline delivery, multi-device sync, receipts, and retries.
- Discusses presence separately from message correctness.
Step 1
Clarify functional and non-functional requirements first.
Functional Requirements
- Users can send one-to-one and group messages.
- Connected recipients receive messages in near real time.
- Offline users receive messages when they reconnect.
- The system supports delivery receipts, read receipts, typing indicators, and presence.
- Users can access recent conversation history across devices.
Non-Functional Requirements
- Online message delivery should usually complete within hundreds of milliseconds.
- Messages must not be lost after the server acknowledges them.
- Ordering should be consistent within a conversation.
- The system should support millions of concurrent connections.
- Presence can be eventually consistent and lossy; messages cannot.
Scale Assumptions
- 20 million daily active users.
- 2 million concurrent WebSocket connections at peak.
- Each active user sends 40 messages per day.
- Average message payload is 1 KB before attachments.
| Metric | Estimate | Notes |
|---|---|---|
| Message writes | ~9,300/sec average | 20M users × 40 messages per day = 800M messages per day, spread over 86,400 seconds. Regional peaks run higher. |
| Connections | 2M concurrent | Requires a horizontally scaled gateway fleet and connection-aware routing. |
| Raw message storage | ~800 GB/day | 800M messages × 1 KB, before replication, indexes, attachments, and metadata. |
| Fanout | Group-size dependent | Large groups need different fanout and notification behavior than direct chats. |
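The estimates follow directly from the scale assumptions. A short worked calculation, assuming decimal units (1 KB = 1,000 bytes) and a flat average over the day:

```python
# Back-of-envelope estimates from the stated scale assumptions.
DAILY_ACTIVE_USERS = 20_000_000
MESSAGES_PER_USER_PER_DAY = 40
AVG_PAYLOAD_BYTES = 1_000  # 1 KB, decimal units (assumption)
SECONDS_PER_DAY = 86_400

messages_per_day = DAILY_ACTIVE_USERS * MESSAGES_PER_USER_PER_DAY
avg_writes_per_sec = messages_per_day / SECONDS_PER_DAY
raw_storage_gb_per_day = messages_per_day * AVG_PAYLOAD_BYTES / 1e9

print(f"{messages_per_day:,} messages/day")
print(f"{avg_writes_per_sec:,.0f} writes/sec on average")
print(f"{raw_storage_gb_per_day:,.0f} GB/day of raw message bodies")
```

The average hides peaks, so capacity planning would multiply the write rate by whatever peak factor the interviewer agrees to.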
Step 2
Identify the key entities before picking storage.
| Entity | Fields and Relationships | Interview Notes |
|---|---|---|
| Conversation | conversation_id, type, created_at, last_message_id | Stores direct or group metadata. |
| ConversationMember | conversation_id, user_id, role, joined_at, last_read_sequence | Needed for authorization and unread counts. |
| Message | conversation_id, sequence, message_id, sender_id, body, created_at | Partition by conversation and order by sequence for efficient history reads. |
| DeviceConnection | user_id, device_id, gateway_id, connected_at, last_seen | Ephemeral store for routing live events. |
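The entities above can be sketched as plain data classes. This is an illustrative in-memory model, not a schema: the field types and the `history` helper are assumptions, and a real store would partition `Message` rows by `conversation_id` and cluster them by `sequence`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Conversation:
    conversation_id: str
    type: str  # "direct" or "group"
    created_at: datetime
    last_message_id: Optional[str] = None


@dataclass
class ConversationMember:
    conversation_id: str
    user_id: str
    role: str
    joined_at: datetime
    last_read_sequence: int = 0  # drives unread counts


@dataclass(frozen=True)
class Message:
    conversation_id: str  # partition key
    sequence: int         # ordering key within the partition
    message_id: str
    sender_id: str
    body: str
    created_at: datetime


@dataclass
class DeviceConnection:
    user_id: str
    device_id: str
    gateway_id: str
    connected_at: datetime
    last_seen: datetime


def history(messages: list[Message], conversation_id: str) -> list[Message]:
    """Return one conversation's messages ordered by sequence."""
    in_conversation = [m for m in messages if m.conversation_id == conversation_id]
    return sorted(in_conversation, key=lambda m: m.sequence)
```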
Step 3
Define the APIs around the user flows.
| Interface | Request / Response | Contract Notes |
|---|---|---|
| WebSocket send_message | { clientMessageId, conversationId, body, attachments? } | Client-generated ID enables idempotent retries. |
| WebSocket message_event | { messageId, conversationId, sequence, senderId, body, sentAt } | Sequence is scoped to the conversation. |
| GET /v1/conversations/{id}/messages?after=... | Returns ordered message history | Used for reconnect, pagination, and device backfill. |
| POST /v1/receipts | { messageId, receiptType, deviceId } | Receipts are useful but should not block message delivery. |
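As an illustration of the `send_message` contract, a gateway might validate each frame before forwarding it. The sketch below assumes JSON text frames and treats `clientMessageId`, `conversationId`, and `body` as required non-empty strings; the `SendMessage` class and `parse_send_message` function are hypothetical names.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SendMessage:
    client_message_id: str
    conversation_id: str
    body: str
    attachments: list = field(default_factory=list)


def parse_send_message(raw: str) -> SendMessage:
    """Parse a send_message frame, rejecting frames missing required fields."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("frame must be a JSON object")
    for name in ("clientMessageId", "conversationId", "body"):
        value = payload.get(name)
        if not isinstance(value, str) or not value:
            raise ValueError(f"missing or empty field: {name}")
    return SendMessage(
        client_message_id=payload["clientMessageId"],
        conversation_id=payload["conversationId"],
        body=payload["body"],
        attachments=list(payload.get("attachments") or []),  # optional field
    )
```

Rejecting a frame without a `clientMessageId` at the edge keeps the idempotency guarantee from depending on well-behaved clients.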
Step 4
Trace the critical data flow step by step.
Connection gateway
Clients maintain WebSocket connections to gateway servers. Gateways authenticate users and publish heartbeats.
Message ingest
The gateway forwards each send to the message service, which validates membership, assigns the next conversation sequence, and persists the message before acknowledging it.
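A minimal in-memory sketch of the ingest path, assuming a single writer per conversation. The `MessageService` class and its dictionaries stand in for a real service and durable store; they are hypothetical and only show the order of operations: authorize, assign sequence, persist, then return.

```python
import uuid
from collections import defaultdict


class MessageService:
    def __init__(self, members):
        self._members = members                       # conversation_id -> set of user_ids
        self._next_sequence = defaultdict(lambda: 1)  # per-conversation counter
        self._log = defaultdict(list)                 # stand-in for durable storage

    def ingest(self, sender_id, conversation_id, body):
        # 1. Validate membership before doing any work.
        if sender_id not in self._members.get(conversation_id, set()):
            raise PermissionError("sender is not a member of this conversation")
        # 2. Assign the next sequence, scoped to this conversation only.
        sequence = self._next_sequence[conversation_id]
        self._next_sequence[conversation_id] = sequence + 1
        message = {
            "messageId": str(uuid.uuid4()),
            "conversationId": conversation_id,
            "sequence": sequence,
            "senderId": sender_id,
            "body": body,
        }
        # 3. Persist before acknowledging or fanning out.
        self._log[conversation_id].append(message)
        return message
```

In a distributed deployment the counter would need to live somewhere that serializes writes per conversation, such as the partition that owns the conversation.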
Delivery fanout
A delivery service finds online devices for conversation members and pushes message events through their gateway connections.
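The lookup step can be sketched as grouping online devices by the gateway that holds each connection, so the delivery service sends one batch per gateway. The `plan_fanout` function and the tuple layout are assumptions for illustration.

```python
from collections import defaultdict


def plan_fanout(member_ids, connections):
    """Map gateway_id -> device_ids for every online device of a member.

    connections is an iterable of (user_id, device_id, gateway_id) tuples,
    mirroring the DeviceConnection entity.
    """
    members = set(member_ids)
    by_gateway = defaultdict(list)
    for user_id, device_id, gateway_id in connections:
        if user_id in members:
            by_gateway[gateway_id].append(device_id)
    return dict(by_gateway)


connections = [
    ("alice", "alice-phone", "gw1"),
    ("alice", "alice-laptop", "gw2"),
    ("bob", "bob-phone", "gw1"),
    ("carol", "carol-web", "gw3"),
]
print(plan_fanout(["alice", "bob", "dave"], connections))
```

Members with no entry in the connection store, like `dave` here, are simply absent from the plan and are served later by the offline sync path.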
Offline sync
Users who were offline fetch missed messages on reconnect using their last seen sequence. While they are offline, they can receive push notifications with a minimal payload.
Receipts and presence
Receipts are stored and broadcast asynchronously. Presence is maintained in an ephemeral store with TTLs.
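Presence with TTLs can be sketched as a map of expiry times refreshed by heartbeats. The `PresenceStore` class is a hypothetical in-memory stand-in for an ephemeral key-value store, and the 30-second TTL is an arbitrary assumption. Time is passed in explicitly so the behavior is easy to reason about.

```python
class PresenceStore:
    def __init__(self, ttl_seconds=30.0):
        self._ttl = ttl_seconds
        self._expires_at = {}  # user_id -> time at which presence lapses

    def heartbeat(self, user_id, now):
        """Refresh the user's presence; called on each gateway heartbeat."""
        self._expires_at[user_id] = now + self._ttl

    def is_online(self, user_id, now):
        """A user is online only while their latest heartbeat is within the TTL."""
        return self._expires_at.get(user_id, float("-inf")) > now


store = PresenceStore(ttl_seconds=30.0)
store.heartbeat("alice", now=100.0)
print(store.is_online("alice", now=120.0))  # True: within the TTL
print(store.is_online("alice", now=130.0))  # False: the entry has lapsed
```

A missed heartbeat only makes presence stale for one TTL window, which is acceptable because message delivery never depends on this store being right.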
Step 5
Convert the flow into a high-level design.
Final Design
Real-Time Chat final architecture
Serving Layer
Start with clients, routing, APIs, and the main synchronous path users depend on for this problem.
State Layer
Anchor the design around the key entities: Conversation, ConversationMember, Message, DeviceConnection.
Async Layer
Move slow, high-volume, or failure-prone work behind queues, workers, streams, caches, or background reconciliation.
Step 6
Deep dives interviewers are likely to probe.
Ordering
- Use per-conversation sequence numbers instead of global ordering.
- Assign sequence after authorization and before fanout.
- Clients should de-duplicate using message ID and repair gaps by fetching history.
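On the client side, the rules above reduce to a small state machine per conversation. The sketch below is an assumption about how a client could apply incoming `message_event` payloads; `ConversationView` and the return labels are hypothetical names.

```python
class ConversationView:
    """Client-side state for one conversation."""

    def __init__(self):
        self.seen_ids = set()
        self.last_sequence = 0
        self.messages = []

    def apply(self, event):
        """Apply a message_event; return 'applied', 'duplicate', or 'gap'."""
        if event["messageId"] in self.seen_ids or event["sequence"] <= self.last_sequence:
            return "duplicate"  # already shown, e.g. a redelivery after retry
        if event["sequence"] > self.last_sequence + 1:
            # Something was missed: the caller should fetch history
            # after self.last_sequence and then re-apply in order.
            return "gap"
        self.seen_ids.add(event["messageId"])
        self.last_sequence = event["sequence"]
        self.messages.append(event)
        return "applied"
```

When `apply` reports a gap, the client would call the history endpoint with its `last_sequence` and feed the returned messages through `apply` in order.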
Multi-device delivery
- A user may have phone, desktop, and web clients connected at the same time.
- Delivery state should track device-level delivery and user-level read state separately.
- Sync APIs need to reconcile messages sent while one device was offline.
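One way to keep the two kinds of state apart is to key delivery by device and read position by user. The `DeliveryState` class below is a hypothetical in-memory sketch; it assumes both values are monotonic high-water marks expressed as conversation sequences.

```python
from collections import defaultdict


class DeliveryState:
    def __init__(self):
        # (user_id, device_id, conversation_id) -> highest delivered sequence
        self.delivered = defaultdict(int)
        # (user_id, conversation_id) -> last_read_sequence, shared by all devices
        self.last_read = defaultdict(int)

    def mark_delivered(self, user_id, device_id, conversation_id, sequence):
        key = (user_id, device_id, conversation_id)
        self.delivered[key] = max(self.delivered[key], sequence)

    def mark_read(self, user_id, conversation_id, sequence):
        key = (user_id, conversation_id)
        self.last_read[key] = max(self.last_read[key], sequence)

    def unread_count(self, user_id, conversation_id, latest_sequence):
        return max(0, latest_sequence - self.last_read[(user_id, conversation_id)])

    def backfill_from(self, user_id, device_id, conversation_id):
        """Sequence after which this device still needs messages."""
        return self.delivered[(user_id, device_id, conversation_id)]
```

Reading on the phone clears the unread badge everywhere, while the desktop still knows which messages it has to backfill.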
Large groups
- Small groups can fan out to every online member immediately.
- Large groups may need lazy pull, batched notifications, or server-side rate controls.
- Mention and thread notifications should be filtered to reduce noise.
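A rough sketch of switching fanout behavior by group size. The threshold and batch size are arbitrary assumptions, and `plan_group_delivery` is a hypothetical helper: small groups get one immediate push, while large groups are split into batches that a worker could send under a rate limit.

```python
def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def plan_group_delivery(member_ids, large_group_threshold=500, batch_size=100):
    """Choose immediate fanout for small groups and batched fanout for large ones."""
    member_ids = list(member_ids)
    if len(member_ids) <= large_group_threshold:
        return {"mode": "immediate", "batches": [member_ids]}
    return {"mode": "batched", "batches": list(chunked(member_ids, batch_size))}


print(plan_group_delivery(["a", "b"], large_group_threshold=3, batch_size=2))
```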
Step 7
Tradeoffs to explain out loud.
WebSockets vs long polling
Use When
Use WebSockets for bidirectional low-latency chat at scale.
Watch Out
WebSockets require connection management, load balancing, and backpressure handling.
Store by conversation vs store by user inbox
Use When
Store by conversation for ordered history and group consistency.
Watch Out
User inbox views may need secondary indexes or materialized summaries.
Strong receipts vs best-effort receipts
Use When
Best-effort receipts are usually acceptable and cheaper.
Watch Out
Enterprise compliance products may require auditable delivery state.
Avoid
Common mistakes that weaken the answer.
- Trying to guarantee total global message ordering.
- Keeping messages only in gateway memory.
- Making presence a source of truth for delivery correctness.
- Ignoring client retries and duplicate sends.
- Forgetting multi-device sync and unread state.
Step 8
Follow-up questions with strong answers.
How do you prevent duplicate messages when clients retry?
Require a clientMessageId scoped to sender and conversation, store an idempotency mapping, and return the existing message if the retry already succeeded.
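A minimal sketch of that idempotency mapping, assuming an in-memory dictionary in place of a durable table. The `IdempotentSender` class is hypothetical; the key is the tuple of sender, conversation, and `clientMessageId` described above.

```python
import uuid
from collections import defaultdict


class IdempotentSender:
    def __init__(self):
        # (sender_id, conversation_id, client_message_id) -> stored message
        self._by_client_key = {}
        self._next_sequence = defaultdict(lambda: 1)

    def send(self, sender_id, conversation_id, client_message_id, body):
        key = (sender_id, conversation_id, client_message_id)
        existing = self._by_client_key.get(key)
        if existing is not None:
            return existing  # retry of a send that already succeeded
        sequence = self._next_sequence[conversation_id]
        self._next_sequence[conversation_id] = sequence + 1
        message = {
            "messageId": str(uuid.uuid4()),
            "conversationId": conversation_id,
            "sequence": sequence,
            "senderId": sender_id,
            "body": body,
        }
        self._by_client_key[key] = message
        return message
```

A retry returns the original `messageId` and `sequence`, so the client can treat the second acknowledgement exactly like the first. In a real store, the mapping write and the message write would need to commit atomically.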
How do you handle reconnect after a mobile network drop?
The client reconnects with the last received sequence for each conversation. The server returns the missing messages and then resumes live delivery.
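The server side of that reconnect can be sketched as a range read per conversation. The functions below are hypothetical and assume each history list is already sorted by sequence; the page limit is an arbitrary assumption.

```python
import bisect


def missing_messages(history, last_received_sequence, limit=100):
    """Return up to `limit` messages with sequence > last_received_sequence."""
    sequences = [m["sequence"] for m in history]
    start = bisect.bisect_right(sequences, last_received_sequence)
    return history[start:start + limit]


def resume(histories, cursors):
    """Build the backfill for a reconnecting client.

    histories: conversation_id -> sorted list of messages
    cursors:   conversation_id -> last sequence the client received
    """
    return {
        conversation_id: missing_messages(histories.get(conversation_id, []), last)
        for conversation_id, last in cursors.items()
    }
```

If a page comes back full, the client would request again from the highest sequence it received before switching to live events.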
How would end-to-end encryption change the design?
The server still routes ciphertext and metadata but cannot inspect the message body. Key management, device identity, and encrypted backups become major design areas.
Step 9
What a strong answer should signal.
Realtime architecture
Uses gateways for connections and durable services for message persistence.
Correctness
Provides per-conversation ordering, idempotent sends, and reconnect repair.
Scale
Handles millions of concurrent connections and group fanout.
User experience
Covers receipts, typing, presence, push notifications, and multi-device sync.