Why Collaboration Is Hard
The naive implementation of real-time collaboration is straightforward: broadcast each user action over a WebSocket and apply it on the other clients. The hard part is conflict resolution. When two users edit the same document field simultaneously, whose change wins? When a user goes offline and comes back, how do you merge their changes with what happened while they were gone? These are hard distributed systems problems, and the quality of your conflict resolution strategy determines whether your collaboration feature feels magical or broken.
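To make the "whose change wins?" question concrete, here is a minimal sketch of one common but lossy policy, a last-writer-wins register. The `FieldWrite` shape, the logical timestamps, and the client IDs are all assumptions made for illustration; they are not taken from any particular library.

```typescript
// A last-writer-wins (LWW) register: one simple conflict policy.
// All names here are hypothetical and exist only for this example.
interface FieldWrite {
  value: string;
  timestamp: number; // logical clock value assigned by the writer
  clientId: string;  // tiebreaker when timestamps are equal
}

// Returns the write that should win. The rule does not depend on argument
// order, so every client picks the same winner regardless of which write
// it happened to receive first.
function resolve(a: FieldWrite, b: FieldWrite): FieldWrite {
  if (a.timestamp !== b.timestamp) {
    return a.timestamp > b.timestamp ? a : b;
  }
  return a.clientId > b.clientId ? a : b;
}

const fromAlice: FieldWrite = { value: "Draft v2", timestamp: 7, clientId: "alice" };
const fromBob: FieldWrite = { value: "Final", timestamp: 7, clientId: "bob" };

// Client 1 receives Alice's write first; client 2 receives Bob's first.
const onClient1 = resolve(fromAlice, fromBob);
const onClient2 = resolve(fromBob, fromAlice);

console.log(onClient1.value, onClient2.value); // both clients agree on "Final"
```

Both clients converge, but Alice's edit is silently discarded. That is acceptable for a title field and unacceptable for a paragraph of text, which is why text editing needs a finer-grained strategy.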
The Infrastructure Options
Yjs is the strongest open-source foundation for building collaborative text editing. It implements CRDTs (conflict-free replicated data types), which guarantee that replicas converge without requiring a central authority. Two users can edit the same document offline, come back online, and the merge will be deterministic: every replica ends up in the same state. Convergence is not the same as preserving intent, so a merged result can still surprise a user, but it will never differ between clients. Yjs has bindings for popular editors including ProseMirror, Slate, and Monaco, and there are WebSocket providers that handle the transport layer.
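The convergence guarantee comes from the shape of the merge function. The sketch below is not how Yjs works internally (its sequence CRDT is considerably more involved); it uses a grow-only counter, one of the simplest CRDTs, purely to show a merge that is commutative, associative, and idempotent. The type and function names are hypothetical.

```typescript
// Grow-only counter (G-Counter): each replica increments only its own slot.
type GCounter = Record<string, number>;

function increment(counter: GCounter, replicaId: string, by = 1): GCounter {
  return { ...counter, [replicaId]: (counter[replicaId] ?? 0) + by };
}

// Merge takes the per-replica maximum. Because max is commutative,
// associative, and idempotent, so is the merge.
function merge(a: GCounter, b: GCounter): GCounter {
  const result: GCounter = { ...a };
  for (const replicaId of Object.keys(b)) {
    result[replicaId] = Math.max(result[replicaId] ?? 0, b[replicaId] ?? 0);
  }
  return result;
}

function total(counter: GCounter): number {
  return Object.keys(counter).reduce((sum, id) => sum + (counter[id] ?? 0), 0);
}

// Two replicas make changes while disconnected from each other.
const laptop = increment({}, "laptop", 2);
const phone = increment({}, "phone", 3);

// Merging in either order, or merging the same state twice, gives the same result.
const mergedA = merge(laptop, phone);
const mergedB = merge(merge(phone, laptop), laptop);

console.log(total(mergedA), total(mergedB)); // 5 5
```

No replica needs to know the order in which the others synced, which is what removes the need for a central authority.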
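Automerge is another CRDT implementation with a different API design. It exposes a JSON-like document model (nested maps, lists, and text) and persists it in its own compact binary format, which makes it easier to apply to arbitrary data structures. Its performance characteristics differ from Yjs on large documents, where it has historically been slower and more memory-hungry, so benchmark with your own data before committing. For structured data with relatively small document sizes, Automerge is competitive.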
If you want managed infrastructure rather than building your own, Liveblocks and PartyKit both offer real-time collaboration APIs with different tradeoffs. Liveblocks provides a higher-level API with opinionated conflict resolution strategies; PartyKit gives you more control over the WebSocket handling but leaves more of the server-side logic for you to write.
Operational Considerations
WebSocket connections need to scale differently from HTTP. Because connections are stateful, two clients editing the same document may be connected to different server instances, so you need either sticky sessions or a pub/sub layer to route messages to the right server. Redis is the common backbone for WebSocket scaling: all server instances subscribe to the same channels, and messages are broadcast across the cluster. For most applications, this is sufficient up to roughly a few thousand concurrent users per document, though the real ceiling depends on message size and update frequency.
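The fan-out pattern can be sketched without any real infrastructure. In the example below, `InMemoryBus` is a hypothetical stand-in for Redis pub/sub and each "client" is just a callback standing in for a socket's send function; the class names and the `doc:` channel prefix are assumptions made for illustration.

```typescript
type Handler = (message: string) => void;

// Stand-in for Redis pub/sub: delivers each published message to every
// subscriber of the channel. A real deployment would use a Redis client here.
class InMemoryBus {
  private channels = new Map<string, Handler[]>();

  subscribe(channel: string, handler: Handler): void {
    const handlers = this.channels.get(channel) ?? [];
    handlers.push(handler);
    this.channels.set(channel, handlers);
  }

  publish(channel: string, message: string): void {
    for (const handler of this.channels.get(channel) ?? []) {
      handler(message);
    }
  }
}

// One server process holding some of a document's WebSocket connections.
class ServerInstance {
  private bus: InMemoryBus;
  // Each entry stands in for a connected socket's send function.
  private localClients = new Map<string, Handler[]>();

  constructor(bus: InMemoryBus) {
    this.bus = bus;
  }

  connect(docId: string, send: Handler): void {
    const clients = this.localClients.get(docId);
    if (clients) {
      clients.push(send);
      return;
    }
    this.localClients.set(docId, [send]);
    // Subscribe once per document, on the first local connection.
    this.bus.subscribe(`doc:${docId}`, (message) => {
      for (const client of this.localClients.get(docId) ?? []) client(message);
    });
  }

  // A local client sent an update: publish it to the whole cluster.
  // Note that the sender's own instance also receives it back.
  onClientMessage(docId: string, message: string): void {
    this.bus.publish(`doc:${docId}`, message);
  }
}

const bus = new InMemoryBus();
const serverA = new ServerInstance(bus);
const serverB = new ServerInstance(bus);

const seenByClientOnB: string[] = [];
serverB.connect("doc-1", (message) => { seenByClientOnB.push(message); });

// An update arrives on server A and reaches the client connected to server B.
serverA.onClientMessage("doc-1", "update-42");
```

With this design no instance needs to know where a document's other clients are connected. The cost is that every instance subscribed to a document receives every one of its messages, which is part of why per-document concurrency eventually hits a ceiling.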
Presence, meaning who is currently viewing or editing a document, is a related but separate concern. Presence updates are ephemeral and latency-sensitive, and they need neither persistence nor conflict resolution, so most teams implement presence as a separate lightweight channel. The UX value of seeing who else is in the document is significant; the implementation cost is relatively low with most WebSocket libraries.
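One common way to implement presence is with heartbeats and a timeout: a user counts as present while heartbeats keep arriving. The following is a minimal sketch under that assumption; the `PresenceTracker` class and the 10-second timeout are hypothetical, and the current time is passed in explicitly to keep the example deterministic.

```typescript
// Presence tracker: users are "present" while heartbeats keep arriving.
class PresenceTracker {
  private lastSeen = new Map<string, number>();
  private timeoutMs: number;

  constructor(timeoutMs = 10000) {
    this.timeoutMs = timeoutMs;
  }

  // Record a heartbeat (or any other activity) from a user at time `now`.
  heartbeat(userId: string, now: number): void {
    this.lastSeen.set(userId, now);
  }

  // Explicit disconnect: remove immediately instead of waiting for the timeout.
  leave(userId: string): void {
    this.lastSeen.delete(userId);
  }

  // Users whose last heartbeat falls within the timeout window, sorted by ID.
  // Stale entries are filtered out here but not pruned from the map.
  activeUsers(now: number): string[] {
    const active: string[] = [];
    this.lastSeen.forEach((seenAt, userId) => {
      if (now - seenAt <= this.timeoutMs) active.push(userId);
    });
    return active.sort();
  }
}

const presence = new PresenceTracker(10000);
presence.heartbeat("ana", 0);
presence.heartbeat("raj", 6000);

const at8s = presence.activeUsers(8000);   // ["ana", "raj"]
const at12s = presence.activeUsers(12000); // ["raj"], ana last seen 12s ago
```

Nothing here is persisted or merged: if a presence message is lost, the next heartbeat corrects it, which is why presence can run on a much simpler channel than document state.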
What to Actually Build
For a document editing application: use Yjs as the data structure, a self-hosted WebSocket server with Redis pub/sub for scaling, and a presence layer on top. Variations of this stack are widely used in production collaborative editors.
For simpler cases such as live cursors, real-time counters, and chat-like features, a managed solution like Liveblocks or Firebase Realtime Database is faster to implement and scales adequately for most use cases. The cost of building custom infrastructure is only worth it when you have specific requirements that managed solutions cannot meet.
