Design a Chat System
Chat looks simple from the outside: you type a message, and the other person sees it instantly. But once you try to build it for real, with unstable networks, offline phones, message ordering, retries, notifications, and group chats, it becomes one of the best system design questions out there.
Imagine you’re building chat for DevsCall. Learners can message mentors, mentors can message batches of students, and you also want group chats for cohorts. The experience must feel instant, but it must also be reliable: messages should not disappear, should not duplicate, and should show up in the right order as much as possible. This lesson walks through a practical design that interviewers love: simple core, clear tradeoffs, and production-friendly failure handling.
Start with requirements that decide everything
Before choosing WebSockets or databases, you need to clarify what “chat” means here. Is it 1-to-1 only, or groups too? Do we need typing indicators, read receipts, presence (online/offline), file sharing? How fast does it need to feel? Do we need message search later?
A classic interview assumption is: support 1-to-1 and small group chat, text messages first, near real-time delivery, offline support, and basic “delivered” status. Read receipts and typing indicators are nice-to-have. The key non-functional goals are low latency for delivery, high availability, and correctness around duplication and ordering.
Real-time transport: WebSockets vs polling
Polling is the simplest mental model. The client asks the server every few seconds: “Any new messages?” It’s easy to implement and works fine for low traffic, but it wastes resources and feels less real-time. Polling frequently enough to feel instant increases load dramatically; polling slowly makes users feel lag.
WebSockets keep a persistent connection open between client and server. The server can push messages immediately. This is what most real chat systems use because it improves latency and reduces repeated request overhead. The tradeoff is operational complexity: you now manage long-lived connections, load balancing sticky sessions (or shared connection state), and reconnect behavior.
A good interview answer is: start with WebSockets for real-time delivery, but keep a fallback like polling for environments where WebSockets fail.
Core flow: send, store, deliver, acknowledge
A chat message is not “sent” when the user taps send. It’s sent when the system can prove it is safely stored and has a delivery plan.
A common reliable flow works like this. The client generates a message ID (UUID) and sends the message to the server. The server stores it durably first. After storage succeeds, the server pushes it to recipients if they are online. The client receives it and sends back an acknowledgment. The server updates delivery status.
This sequence is important because it prevents losing messages during crashes. It also creates a clean way to handle retries: if the client reconnects and resends the same message ID, the server can detect it and avoid duplicates. That single idea—idempotency using message IDs—is a big interview win.
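The store-first, idempotent flow above can be sketched in a few lines. This is a minimal illustration with an in-memory dict standing in for durable storage; all class and method names here are assumptions for the sketch, not a real API.

```python
import uuid

class ChatServer:
    """Stores messages durably first, then attempts delivery."""

    def __init__(self):
        self.store = {}            # message_id -> message (stand-in for a durable DB)
        self.delivery_status = {}  # message_id -> "stored" | "delivered"

    def receive(self, message_id, thread_id, sender_id, content):
        # Idempotency: a retried send with the same ID is a no-op.
        if message_id in self.store:
            return self.delivery_status[message_id]
        self.store[message_id] = {
            "thread_id": thread_id,
            "sender_id": sender_id,
            "content": content,
        }
        self.delivery_status[message_id] = "stored"
        # Push to online recipients would happen here, only after storage succeeds.
        return "stored"

    def ack(self, message_id):
        # Recipient's client confirmed receipt; update delivery status.
        if message_id in self.store:
            self.delivery_status[message_id] = "delivered"

# Client side: generate the ID once, reuse it on every retry.
msg_id = str(uuid.uuid4())
server = ChatServer()
server.receive(msg_id, "thread-1", "alice", "hello")
server.receive(msg_id, "thread-1", "alice", "hello")  # retry after a timeout
assert len(server.store) == 1  # no duplicate despite the retry
server.ack(msg_id)
```

The key property is that the retry path and the first-send path are the same code: the server never needs to know whether a request is a retry.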
Message delivery semantics: at-least-once with idempotency
Real-time delivery is messy. Connections drop. Apps go to background. Requests time out. Because of this, many chat systems operate with at-least-once delivery: a message may be delivered more than once, but the system makes duplicates harmless.
The way you make duplicates harmless is idempotency. The server treats message IDs as unique, and the client also deduplicates based on message ID. If the same message arrives twice due to retries or reconnects, the UI does not show it twice.
Trying to guarantee exactly-once delivery end-to-end is difficult in distributed systems. It’s usually not worth the complexity. At-least-once + idempotency is practical and reliable.
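Client-side deduplication is the other half of the bargain. A minimal sketch, assuming the client tracks message IDs it has already rendered (in production this set would be bounded, e.g. to a recent window per thread):

```python
class ChatClient:
    """Renders each message ID at most once, making duplicate delivery harmless."""

    def __init__(self):
        self.seen_ids = set()
        self.rendered = []

    def on_message(self, message):
        # At-least-once delivery means the same message may arrive twice;
        # deduplicating on message_id keeps the UI correct.
        if message["message_id"] in self.seen_ids:
            return
        self.seen_ids.add(message["message_id"])
        self.rendered.append(message)

client = ChatClient()
msg = {"message_id": "m1", "content": "hi"}
client.on_message(msg)
client.on_message(msg)  # redelivered after a reconnect
assert len(client.rendered) == 1
```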
Ordering: what users expect vs what systems can guarantee
Users want messages in order. But ordering across devices and networks is tricky.
In a 1-to-1 chat, you can often provide a strong ordering experience by assigning a server sequence number per conversation (or per thread) when the message is stored. Clients render messages based on that sequence. If two users send messages at nearly the same time, the server order becomes the tie-breaker.
In group chat, ordering can still be “good enough” if you assign sequence numbers at the conversation level. The system becomes the source of ordering, not the client timestamp. Client timestamps are unreliable because phones have wrong clocks and network delays.
A clean interview explanation is: use server-generated ordering keys per thread, accept that “perfect global ordering” is not always possible, but make the UI stable and predictable.
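A sketch of per-thread server ordering, under the assumption that a counter is maintained per conversation (in a real system this counter lives in the datastore, e.g. as a per-thread auto-increment, not in process memory):

```python
from collections import defaultdict

class SequenceAssigner:
    """Assigns a monotonically increasing sequence number per thread at store time."""

    def __init__(self):
        self.next_seq = defaultdict(int)

    def assign(self, thread_id):
        self.next_seq[thread_id] += 1
        return self.next_seq[thread_id]

seq = SequenceAssigner()
# Two near-simultaneous sends: the server order is the tie-breaker.
a = seq.assign("t1")
b = seq.assign("t1")
assert (a, b) == (1, 2)

# Clients render sorted by sequence, not by unreliable client timestamps.
messages = [{"seq": b, "text": "second"}, {"seq": a, "text": "first"}]
ordered = [m["text"] for m in sorted(messages, key=lambda m: m["seq"])]
assert ordered == ["first", "second"]
```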
Offline support: make reconnecting a first-class feature
Offline support is not a bonus feature; it’s normal behavior. Mobile users lose connectivity all the time.
To support offline, the client must queue outgoing messages locally. When the network returns, it resends messages with the same message IDs. The server deduplicates and continues delivery.
For incoming messages, the client needs a catch-up mechanism. When reconnecting, it sends “give me messages after sequence N” or “after timestamp T (server timestamp).” The server returns missed messages, and then real-time streaming resumes.
This catch-up design is also how you handle multi-device login. A user may have the chat open on laptop and phone. Each device uses its own last-seen cursor and syncs independently.
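The catch-up query and per-device cursors can be sketched as follows (names are illustrative):

```python
class Thread:
    """A conversation whose messages are ordered by server sequence number."""

    def __init__(self):
        self.messages = []

    def append(self, text):
        self.messages.append({"seq": len(self.messages) + 1, "text": text})

    def since(self, last_seen_seq):
        # Catch-up query: "give me messages after sequence N".
        return [m for m in self.messages if m["seq"] > last_seen_seq]

thread = Thread()
for text in ["a", "b", "c", "d"]:
    thread.append(text)

# Each device keeps its own last-seen cursor and syncs independently.
phone_cursor, laptop_cursor = 4, 2
assert thread.since(phone_cursor) == []  # phone is already up to date
missed = [m["text"] for m in thread.since(laptop_cursor)]
assert missed == ["c", "d"]              # laptop catches up on reconnect
```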
Storage model: messages, threads, and read state
A practical storage model starts with the concept of a conversation (thread). A thread can represent a 1-to-1 chat or a group.
A typical schema includes a threads table, a thread_members table, and a messages table. Each message includes thread_id, message_id, sender_id, content, created_at, and a server sequence number. You also store message type if you plan for attachments later.
For read/delivery state, you usually do not update every message row for every reader (too expensive in groups). Instead, store per-user per-thread cursors, such as last_read_sequence and last_delivered_sequence. This scales better and still supports “unread counts” efficiently.
This design also plays nicely with pagination: fetch the latest N messages by thread_id ordered by sequence descending, and then page backward.
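The schema and cursor idea can be made concrete with a small sketch. This uses an in-memory SQLite database purely for illustration; column names follow the text, but the exact types and indexes are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE threads (thread_id TEXT PRIMARY KEY);
CREATE TABLE thread_members (
    thread_id TEXT, user_id TEXT,
    last_read_seq INTEGER DEFAULT 0,       -- per-user cursor, not per-message flags
    last_delivered_seq INTEGER DEFAULT 0,
    PRIMARY KEY (thread_id, user_id));
CREATE TABLE messages (
    thread_id TEXT, seq INTEGER, message_id TEXT UNIQUE,
    sender_id TEXT, content TEXT, created_at TEXT,
    PRIMARY KEY (thread_id, seq));
""")

conn.execute("INSERT INTO threads VALUES ('t1')")
conn.execute("INSERT INTO thread_members (thread_id, user_id, last_read_seq) "
             "VALUES ('t1', 'bob', 1)")
for i, text in enumerate(["hi", "how are you", "ping"], start=1):
    conn.execute("INSERT INTO messages (thread_id, seq, message_id, sender_id, content) "
                 "VALUES ('t1', ?, ?, 'alice', ?)", (i, f"m{i}", text))

# Unread count = messages past the user's cursor; no per-message read rows needed.
unread = conn.execute("""
    SELECT COUNT(*) FROM messages m
    JOIN thread_members tm ON tm.thread_id = m.thread_id AND tm.user_id = 'bob'
    WHERE m.thread_id = 't1' AND m.seq > tm.last_read_seq
""").fetchone()[0]
assert unread == 2

# Pagination: latest N messages by sequence descending, then page backward.
page = conn.execute("SELECT content FROM messages WHERE thread_id = 't1' "
                    "ORDER BY seq DESC LIMIT 2").fetchall()
assert [r[0] for r in page] == ["ping", "how are you"]
```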
Fanout: pushing messages to recipients at scale
For 1-to-1 chats, fanout is trivial: one recipient. For groups, fanout can become expensive: a message may need to reach hundreds or thousands of users.
There are two broad patterns:
Fanout on write means when a message is sent, the system immediately creates delivery entries or pushes the message toward each recipient. This gives fast delivery and simple reads but can be write-heavy.
Fanout on read means store one message in the thread and let each recipient fetch it when they come online. This reduces write amplification but shifts work to reads and makes real-time push harder.
Many real systems use a hybrid: store the message once, and push to online recipients immediately; offline recipients fetch on reconnect. For very large groups, you often avoid pushing to everyone and instead rely more on pull-based syncing.
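A toy sketch of the hybrid approach, assuming membership and online status are tracked in memory (a real system would use a presence service and a connection registry):

```python
class HybridFanout:
    """Store once; push to online members, let offline members pull on reconnect."""

    def __init__(self, members):
        self.members = set(members)
        self.online = set()
        self.inboxes = {m: [] for m in members}  # push queues for connected devices
        self.log = []                            # single stored copy of each message

    def send(self, sender, text):
        seq = len(self.log) + 1
        self.log.append({"seq": seq, "sender": sender, "text": text})
        # Fanout on write, but only to currently online members.
        for member in self.online:
            if member != sender:
                self.inboxes[member].append(seq)
        return seq

    def reconnect(self, member, last_seen_seq):
        # Offline members catch up by pulling from the single stored log.
        self.online.add(member)
        return [m for m in self.log if m["seq"] > last_seen_seq]

chat = HybridFanout(["alice", "bob", "carol"])
chat.online.update({"alice", "bob"})
chat.send("alice", "standup in 5")
assert chat.inboxes["bob"] == [1]            # online: pushed immediately
assert chat.inboxes["carol"] == []           # offline: nothing pushed
assert len(chat.reconnect("carol", 0)) == 1  # pulled on reconnect
```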
Notifications: don’t tie them to real-time delivery
Push notifications are not the same thing as chat delivery. Notifications are for bringing users back when they are offline or inactive.
A production-friendly approach is to publish an event when a message is stored. A separate notification service consumes events and decides whether to send push notifications based on user settings, quiet hours, and whether the user is currently online.
This separation matters because notifications can fail or be slow, and you never want that to block message storage or delivery.
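The decoupling can be sketched with an event queue between storage and the notification decision. All the policy inputs here (online set, muted set) are illustrative placeholders for real presence and settings services:

```python
import queue

# Event published after durable storage; notification decisions happen elsewhere.
events = queue.Queue()

def store_message(message):
    # ...durable write would happen here, before the event is published...
    events.put({"type": "message_stored", "message": message})

def notification_worker(online_users, muted_users):
    """Consumes stored-message events and decides whether to push.
    A failure here never blocks message storage or real-time delivery."""
    pushed = []
    while not events.empty():
        event = events.get()
        user = event["message"]["recipient"]
        if user in online_users or user in muted_users:
            continue  # online users see it in-app; muted users opted out
        pushed.append(user)
    return pushed

store_message({"recipient": "bob", "content": "hi"})
store_message({"recipient": "carol", "content": "hi"})
pushed = notification_worker(online_users={"bob"}, muted_users=set())
assert pushed == ["carol"]  # only the offline, non-muted user gets a push
```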
Reliability and failure modes you should call out
If the WebSocket server restarts, clients reconnect and use catch-up to fetch missed messages. If the message store is down, sending fails, but the client can queue and retry. If notification service is down, chat still works; users may just not get push alerts. If your pub/sub or queue is lagging, real-time delivery may be delayed, but storage remains correct.
The key interview signal is this: chat must remain correct even when “real-time” is degraded. Users forgive delays more than they forgive message loss.
Wrap-up
A simple production architecture looks like this: clients connect via WebSockets to a real-time gateway, messages are written durably to a primary store, an event stream is emitted for delivery and notifications, online users receive immediate push over WebSockets, offline users sync on reconnect using cursors, and a separate notification pipeline handles push alerts.
The system feels real-time, but it is built on durable storage, idempotency, and predictable recovery.
Frequently Asked Questions
Why do interviewers like chat system design questions?
Chat systems test real-time communication, reliability, ordering, offline handling, scalability, and failure recovery in one cohesive design.
Why use WebSockets instead of polling?
WebSockets allow servers to push messages instantly to clients, reducing latency and network overhead compared to frequent polling.
How do chat systems avoid losing or duplicating messages?
Messages are stored durably first, then delivered. Systems use at-least-once delivery with idempotent message IDs to avoid loss and duplication.
How is message ordering handled?
Servers assign sequence numbers per conversation or thread, making the server the source of truth for ordering rather than client timestamps.
Which failure modes should you call out?
WebSocket disconnects, message retries, storage outages, notification failures, and how the system recovers without losing messages.