Design a Real-Time Chat System

A practical chat system design guide covering WebSockets, message ordering, delivery receipts, offline sync, group chats, and end-to-end reliability.

Tags: websockets, messaging, ordering, offline sync, presence

Interview Prompt

Design a real-time messaging system where users can send direct and group messages across multiple devices.

Separates connection management from durable message storage.

Explains per-conversation ordering without claiming global ordering.

Handles offline delivery, multi-device sync, receipts, and retries.

Discusses presence separately from message correctness.

Step 1

Clarify functional and non-functional requirements first.

Functional Requirements

  • Users can send one-to-one and group messages.
  • Connected recipients receive messages in near real time.
  • Offline users receive messages when they reconnect.
  • The system supports delivery receipts, read receipts, typing indicators, and presence.
  • Users can access recent conversation history across devices.

Non-Functional Requirements

  • Online message delivery should usually complete within hundreds of milliseconds.
  • Messages must not be lost after the server acknowledges them.
  • Ordering should be consistent within a conversation.
  • The system should support millions of concurrent connections.
  • Presence can be eventually consistent and lossy; messages cannot.

Scale Assumptions

  • 20 million daily active users.
  • 2 million concurrent WebSocket connections at peak.
  • Each active user sends 40 messages per day.
  • Average message payload is 1 KB before attachments.

Message writes

~9,300/sec average

20M users × 40 messages per day is 800M messages per day, or roughly 9,300 per second averaged over 86,400 seconds. Regional peaks run higher.

Connections

2M concurrent

Requires horizontally scaled gateway fleet and connection-aware routing.

Raw message storage

~800 GB/day

800M messages per day at 1 KB each, before replication, indexes, attachments, and metadata.

Fanout

Group-size dependent

Large groups need different fanout and notification behavior than direct chats.
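The estimates above follow directly from the scale assumptions. A small sketch of the arithmetic, using decimal units (1 KB = 1,000 bytes); the peak multiplier is a hypothetical value added only for illustration and does not come from the guide:

```python
# Back-of-envelope figures derived from the scale assumptions.
DAILY_ACTIVE_USERS = 20_000_000
MESSAGES_PER_USER_PER_DAY = 40
AVG_PAYLOAD_BYTES = 1_000  # 1 KB, decimal units for easy arithmetic
SECONDS_PER_DAY = 86_400

messages_per_day = DAILY_ACTIVE_USERS * MESSAGES_PER_USER_PER_DAY
avg_writes_per_sec = messages_per_day / SECONDS_PER_DAY
raw_storage_gb_per_day = messages_per_day * AVG_PAYLOAD_BYTES / 1e9

# Hypothetical peak-to-average ratio; an assumption, not a number from the guide.
ASSUMED_PEAK_FACTOR = 3
peak_writes_per_sec = avg_writes_per_sec * ASSUMED_PEAK_FACTOR

print(f"{messages_per_day:,} messages/day")
print(f"{avg_writes_per_sec:,.0f} writes/sec average")
print(f"{raw_storage_gb_per_day:,.0f} GB/day raw")
```

In an interview, stating the peak factor as an explicit assumption matters more than the exact value chosen.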

Step 2

Identify the key entities before picking storage.

Entity | Fields and Relationships | Interview Notes
Conversation | conversation_id, type, created_at, last_message_id | Stores direct or group metadata.
ConversationMember | conversation_id, user_id, role, joined_at, last_read_sequence | Needed for authorization and unread counts.
Message | conversation_id, sequence, message_id, sender_id, body, created_at | Partition by conversation and order by sequence for efficient history reads.
DeviceConnection | user_id, device_id, gateway_id, connected_at, last_seen | Ephemeral store for routing live events.
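The entities above can be sketched as plain record types. This is a minimal illustration, not a schema: the field types, the default values, and the unread_count helper are assumptions, and the helper further assumes that sequences are dense and start at 1.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Conversation:
    conversation_id: str
    type: str  # "direct" or "group"
    created_at: datetime
    last_message_id: Optional[str] = None


@dataclass
class ConversationMember:
    conversation_id: str
    user_id: str
    role: str
    joined_at: datetime
    last_read_sequence: int = 0


@dataclass
class Message:
    conversation_id: str  # partition key
    sequence: int         # ordering key within the partition
    message_id: str
    sender_id: str
    body: str
    created_at: datetime


@dataclass
class DeviceConnection:
    user_id: str
    device_id: str
    gateway_id: str
    connected_at: datetime
    last_seen: datetime


def unread_count(member: ConversationMember, latest_sequence: int) -> int:
    """Unread messages for a member, assuming dense sequences starting at 1."""
    return max(0, latest_sequence - member.last_read_sequence)
```

Keeping last_read_sequence on the membership row is what lets unread counts be computed without scanning messages.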

Step 3

Define the APIs around the user flows.

Interface | Request / Response | Contract Notes
WebSocket send_message | { clientMessageId, conversationId, body, attachments? } | Client-generated ID enables idempotent retries.
WebSocket message_event | { messageId, conversationId, sequence, senderId, body, sentAt } | Sequence is scoped to the conversation.
GET /v1/conversations/{id}/messages?after=... | Returns ordered message history | Used for reconnect, pagination, and device backfill.
POST /v1/receipts | { messageId, receiptType, deviceId } | Receipts are useful but should not block message delivery.
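As a rough illustration of the send_message contract, the sketch below validates an incoming frame before it reaches the message service. The function name parse_send_message, the snake_case output keys, the non-empty requirement on all three string fields, and the body length limit are assumptions made for this example.

```python
import json

MAX_BODY_CHARS = 4_000  # hypothetical limit, chosen only for illustration


def parse_send_message(raw: str) -> dict:
    """Validate a send_message frame and return a normalized dict."""
    try:
        frame = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("frame is not valid JSON") from exc
    if not isinstance(frame, dict):
        raise ValueError("frame must be a JSON object")

    # Required fields from the contract; all must be non-empty strings here.
    for field in ("clientMessageId", "conversationId", "body"):
        value = frame.get(field)
        if not isinstance(value, str) or not value:
            raise ValueError(f"missing or empty field: {field}")
    if len(frame["body"]) > MAX_BODY_CHARS:
        raise ValueError("body too long")

    # attachments is optional in the contract, so default to an empty list.
    attachments = frame.get("attachments", [])
    if not isinstance(attachments, list):
        raise ValueError("attachments must be a list")

    return {
        "client_message_id": frame["clientMessageId"],
        "conversation_id": frame["conversationId"],
        "body": frame["body"],
        "attachments": attachments,
    }
```

Rejecting a frame without clientMessageId at the edge keeps the idempotency guarantee from depending on well-behaved clients.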

Step 4

Trace the critical data flow step by step.

01

Connection gateway

Clients maintain WebSocket connections to gateway servers. Gateways authenticate users and publish heartbeats.

02

Message ingest

The gateway forwards each send to the message service, which validates membership, assigns the next conversation sequence number, and persists the message.
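The ingest step can be sketched with an in-memory stand-in for the message service. The class and method names are hypothetical, and a real service would allocate the sequence and write the message atomically in durable storage rather than in process memory.

```python
import uuid


class NotAMember(Exception):
    """Raised when the sender does not belong to the conversation."""


class MessageService:
    def __init__(self, members):
        self._members = members   # conversation_id -> set of user_ids
        self._last_sequence = {}  # conversation_id -> last assigned sequence
        self._log = {}            # conversation_id -> persisted messages

    def ingest(self, conversation_id, sender_id, body):
        # 1. Validate membership first so rejected sends never consume a sequence.
        if sender_id not in self._members.get(conversation_id, set()):
            raise NotAMember(sender_id)
        # 2. Assign the next sequence, scoped to this conversation only.
        sequence = self._last_sequence.get(conversation_id, 0) + 1
        self._last_sequence[conversation_id] = sequence
        # 3. Persist before any fanout happens.
        message = {
            "message_id": str(uuid.uuid4()),
            "conversation_id": conversation_id,
            "sequence": sequence,
            "sender_id": sender_id,
            "body": body,
        }
        self._log.setdefault(conversation_id, []).append(message)
        return message

    def history(self, conversation_id):
        return list(self._log.get(conversation_id, []))
```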

03

Delivery fanout

A delivery service finds online devices for conversation members and pushes message events through their gateway connections.
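A minimal sketch of the fanout lookup, assuming an in-memory connection registry keyed by user and device. ConnectionRegistry, plan_fanout, and the sender_device_id field are hypothetical names; skipping the originating device while still delivering to the sender's other devices is a design choice of this example.

```python
from collections import defaultdict


class ConnectionRegistry:
    """Ephemeral map of live devices: user_id -> {device_id: gateway_id}."""

    def __init__(self):
        self._by_user = defaultdict(dict)

    def connect(self, user_id, device_id, gateway_id):
        self._by_user[user_id][device_id] = gateway_id

    def disconnect(self, user_id, device_id):
        self._by_user[user_id].pop(device_id, None)

    def devices(self, user_id):
        return dict(self._by_user.get(user_id, {}))


def plan_fanout(message, member_ids, registry):
    """Group live deliveries by gateway and list members with no live device."""
    per_gateway = defaultdict(list)
    offline = []
    for user_id in member_ids:
        devices = registry.devices(user_id)
        if not devices:
            if user_id != message["sender_id"]:
                offline.append(user_id)  # candidates for push notification
            continue
        for device_id, gateway_id in devices.items():
            is_origin = (
                user_id == message["sender_id"]
                and device_id == message.get("sender_device_id")
            )
            if not is_origin:
                per_gateway[gateway_id].append((user_id, device_id))
    return dict(per_gateway), offline
```

Grouping by gateway lets the delivery service send one batch per gateway instead of one call per device.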

04

Offline sync

Users who were offline fetch missed messages on reconnect using their last seen sequence for each conversation. Until then, a push notification with a minimal payload can alert them.
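One possible reading of this step: the push notification carries only pointers, and the client pulls content through the history API. The sketch below assumes the per-conversation log is already sorted by sequence; fetch_after, build_push_payload, and the page size are hypothetical.

```python
from bisect import bisect_right


def fetch_after(log, after_sequence, limit=100):
    """Return up to `limit` messages with sequence > after_sequence.

    `log` is one conversation's messages, sorted by ascending sequence.
    """
    sequences = [m["sequence"] for m in log]
    start = bisect_right(sequences, after_sequence)
    return log[start:start + limit]


def build_push_payload(message):
    """Build a notification that says where to sync, without the message body."""
    return {
        "conversation_id": message["conversation_id"],
        "sequence": message["sequence"],
    }
```

Leaving the body out of the push payload keeps the history API as the single path for message content.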

05

Receipts and presence

Receipts are stored and broadcast asynchronously. Presence is maintained in an ephemeral store with TTLs.
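Presence with TTLs can be sketched as a map of expiry times that heartbeats refresh. This is an in-memory illustration under stated assumptions: the PresenceStore name, the 30-second TTL, and the injectable clock are not from the guide, and a production system would more likely use an external ephemeral store.

```python
import time


class PresenceStore:
    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = {}  # (user_id, device_id) -> expiry timestamp

    def heartbeat(self, user_id, device_id):
        """Refresh the TTL for one device; called when a gateway heartbeat arrives."""
        self._expires_at[(user_id, device_id)] = self._clock() + self._ttl

    def is_online(self, user_id):
        """A user is online if any of their devices has an unexpired entry."""
        now = self._clock()
        return any(
            uid == user_id and expiry > now
            for (uid, _device), expiry in self._expires_at.items()
        )
```

If a heartbeat is lost, the user briefly appears offline and then recovers, which is acceptable because presence is allowed to be lossy and never gates message delivery.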

Step 5

Convert the flow into a high-level design.

Final Design

Real-Time Chat final architecture

Serving Layer

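Clients hold WebSocket connections to a horizontally scaled gateway fleet. The synchronous path runs from the gateway to the message service, which validates membership, assigns the conversation sequence, persists the message, and acknowledges the send.

State Layer

Anchor the design around the key entities. Conversation, ConversationMember, and Message live in durable storage partitioned by conversation. DeviceConnection lives in an ephemeral store used only for routing live events.

Async Layer

Move slow, high-volume, or failure-prone work behind queues and workers: delivery fanout, push notifications, receipt broadcasts, and presence updates should not block message ingest.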

Step 6

Deep dives interviewers are likely to probe.

Ordering

  • Use per-conversation sequence numbers instead of global ordering.
  • Assign sequence after authorization and before fanout.
  • Clients should de-duplicate using message ID and repair gaps by fetching history.
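On the client side, de-duplication and gap repair might look like the sketch below. It assumes sequences within a conversation are dense and start at 1, and that fetch_history is a callable wrapping the history API; ConversationView and its method names are hypothetical.

```python
class ConversationView:
    """Client-side state for one conversation."""

    def __init__(self, fetch_history):
        # fetch_history(after_sequence) -> messages with a higher sequence, in order
        self._fetch_history = fetch_history
        self._seen_ids = set()
        self.messages = []
        self.last_sequence = 0

    def on_message(self, message):
        if message["message_id"] in self._seen_ids:
            return  # duplicate delivery, for example after a retry or reconnect
        if message["sequence"] > self.last_sequence + 1:
            # Gap detected: repair from history before applying the live event.
            for missed in self._fetch_history(self.last_sequence):
                self._apply(missed)
        self._apply(message)

    def _apply(self, message):
        if message["message_id"] in self._seen_ids:
            return
        self._seen_ids.add(message["message_id"])
        self.messages.append(message)
        # Keep the view ordered even if an older message arrives late.
        self.messages.sort(key=lambda m: m["sequence"])
        self.last_sequence = max(self.last_sequence, message["sequence"])
```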

Multi-device delivery

  • A user may have phone, desktop, and web clients connected at the same time.
  • Delivery state should track device-level delivery and user-level read state separately.
  • Sync APIs need to reconcile messages sent while one device was offline.
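A sketch of keeping device-level delivery separate from user-level read state, using high-water marks per key. The DeliveryState class and its methods are hypothetical, and the pending calculation assumes dense sequences starting at 1.

```python
class DeliveryState:
    def __init__(self):
        self._delivered = {}  # (conversation_id, user_id, device_id) -> sequence
        self._read = {}       # (conversation_id, user_id) -> sequence

    def mark_delivered(self, conversation_id, user_id, device_id, sequence):
        key = (conversation_id, user_id, device_id)
        # High-water marks only move forward, so stale receipts are harmless.
        self._delivered[key] = max(self._delivered.get(key, 0), sequence)

    def mark_read(self, conversation_id, user_id, sequence):
        key = (conversation_id, user_id)
        self._read[key] = max(self._read.get(key, 0), sequence)

    def delivered_up_to(self, conversation_id, user_id, device_id):
        return self._delivered.get((conversation_id, user_id, device_id), 0)

    def read_up_to(self, conversation_id, user_id):
        return self._read.get((conversation_id, user_id), 0)

    def pending_for_device(self, conversation_id, user_id, device_id, latest_sequence):
        """Sequences this device still needs, regardless of the user's read state."""
        start = self.delivered_up_to(conversation_id, user_id, device_id) + 1
        return list(range(start, latest_sequence + 1))
```

Reading a message on the phone clears the unread badge everywhere, but the desktop still has to sync the message content when it comes back online.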

Large groups

  • Small groups can fan out to every online member immediately.
  • Large groups may need lazy pull, batched notifications, or server-side rate controls.
  • Mention and thread notifications should be filtered to reduce noise.
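One way to read the bullets above is as a threshold on group size plus a notification filter. The threshold value, the function names, and the mentions and thread_id fields are all assumptions for illustration; the guide does not specify them.

```python
LARGE_GROUP_THRESHOLD = 500  # hypothetical cutoff, not from the guide


def choose_fanout(member_count, threshold=LARGE_GROUP_THRESHOLD):
    """Push to every online member in small groups; let large groups pull."""
    return "push" if member_count <= threshold else "pull"


def should_notify(message, user_id, member_count, followed_threads,
                  threshold=LARGE_GROUP_THRESHOLD):
    """Decide whether a message should trigger a notification for one user."""
    if member_count <= threshold:
        return True  # small groups notify on every message
    if user_id in message.get("mentions", ()):
        return True  # direct mentions always get through
    # Otherwise notify only for threads the user follows.
    return message.get("thread_id") in followed_threads
```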

Step 7

Tradeoffs to explain out loud.

WebSockets vs long polling

Use When

Use WebSockets for bidirectional low-latency chat at scale.

Watch Out

WebSockets require connection management, load balancing, and backpressure handling.

Store by conversation vs store by user inbox

Use When

Store by conversation for ordered history and group consistency.

Watch Out

User inbox views may need secondary indexes or materialized summaries.

Strong receipts vs best-effort receipts

Use When

Best-effort receipts are usually acceptable and cheaper.

Watch Out

Enterprise compliance products may require auditable delivery state.

Avoid

Common mistakes that weaken the answer.

  • Trying to guarantee total global message ordering.
  • Keeping messages only in gateway memory.
  • Making presence a source of truth for delivery correctness.
  • Ignoring client retries and duplicate sends.
  • Forgetting multi-device sync and unread state.

Step 8

Follow-up questions with strong answers.

How do you prevent duplicate messages when clients retry?

Require a clientMessageId scoped to sender and conversation, store an idempotency mapping, and return the existing message if the retry already succeeded.
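A minimal in-memory sketch of that idempotency mapping. The class and method names are hypothetical, and a real implementation would need the mapping insert and the message write to be atomic, for example through a uniqueness constraint in the durable store.

```python
import uuid


class IdempotentIngest:
    def __init__(self):
        # (sender_id, conversation_id, client_message_id) -> stored message
        self._by_client_key = {}
        self._last_sequence = {}  # conversation_id -> last assigned sequence

    def send(self, sender_id, conversation_id, client_message_id, body):
        key = (sender_id, conversation_id, client_message_id)
        existing = self._by_client_key.get(key)
        if existing is not None:
            # The earlier attempt already succeeded: return it, assign nothing new.
            return existing
        sequence = self._last_sequence.get(conversation_id, 0) + 1
        self._last_sequence[conversation_id] = sequence
        message = {
            "message_id": str(uuid.uuid4()),
            "conversation_id": conversation_id,
            "sequence": sequence,
            "sender_id": sender_id,
            "body": body,
        }
        self._by_client_key[key] = message
        return message
```

A retry after a lost acknowledgement returns the original messageId and sequence, so the client can reconcile its optimistic local copy instead of showing the message twice.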

How do you handle reconnect after a mobile network drop?

The client reconnects and sends its last received sequence for each conversation. The server returns the missing messages and then resumes live delivery.
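The handoff from backfill to live delivery can be sketched as follows. The function name and data shapes are assumptions: history maps each conversation to its messages sorted by sequence, cursors holds the client's last received sequence per conversation, and buffered_live holds events that arrived while the backfill was running.

```python
def resume_session(history, cursors, buffered_live):
    """Backfill missed messages, then release live events the backfill did not cover."""
    outbound = []
    high_water = dict(cursors)  # conversation_id -> highest sequence sent so far

    # Phase 1: backfill everything after the client's cursor.
    for conversation_id in sorted(history):
        after = high_water.get(conversation_id, 0)
        for message in history[conversation_id]:
            if message["sequence"] > after:
                outbound.append(message)
                high_water[conversation_id] = message["sequence"]

    # Phase 2: resume live delivery, dropping events already covered above.
    for message in buffered_live:
        conversation_id = message["conversation_id"]
        if message["sequence"] > high_water.get(conversation_id, 0):
            outbound.append(message)
            high_water[conversation_id] = message["sequence"]
    return outbound
```

Conversations missing from cursors are treated as never synced, so a new device receives the full history that this sketch is given.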

How would end-to-end encryption change the design?

The server still routes ciphertext and metadata but cannot inspect the message body. Key management, device identity, and encrypted backups become major design areas.

Step 9

What a strong answer should signal.

Realtime architecture

Uses gateways for connections and durable services for message persistence.

Correctness

Provides per-conversation ordering, idempotent sends, and reconnect repair.

Scale

Handles millions of concurrent connections and group fanout.

User experience

Covers receipts, typing, presence, push notifications, and multi-device sync.
