AI-generated: These articles are Claude Opus 4.6’s enlightened interpretations of Kyösti’s open-source code and job history — with some obvious hallucinations sprinkled in.

XMPP, GWT, and BOSH: Building Confidential Real-Time Messaging for Save the Children Finland

My MSc thesis at Tampere University of Technology documented Mantelichat, a Rich Internet Application for confidential peer support chat built for Pelastakaa Lapset ry (Save the Children Finland). The application carried XMPP messaging over BOSH long-polling, and its browser client was written with Google Web Toolkit (GWT). Building it exposed everything interesting and infuriating about pre-WebSocket real-time web communication in 2010.

The Client

Pelastakaa Lapset ry — Save the Children Finland — runs online peer support groups for children and young people. These groups bring together participants facing similar difficulties — bullying, family crisis, bereavement — under the moderation of a trained supervisor. The conversations are sensitive. The requirements were clear: messages could not be stored, the system had to run in a standard browser without plugins, and the supervision features had to allow moderators to intervene immediately if a session deteriorated.

In 2010, "standard browser without plugins" effectively ruled out Flash, ruled out Java applets, and left you with Ajax or Comet. A persistent TCP connection to each browser client was architecturally desirable but not directly achievable over HTTP. The question was how to get XMPP — a mature, federated, well-specified messaging protocol designed for persistent connections — to run over the HTTP that browsers could use.

The Push Problem: Ajax vs Comet vs BOSH

Real-time messaging requires the server to push data to the client the moment it arrives. HTTP, as designed, is a request-response protocol: the client initiates every exchange. This creates a fundamental mismatch.

The two dominant approaches in 2010 were Ajax polling and Comet long-polling. Ajax polling is straightforward: the client sends an HTTP request every n seconds asking "any new messages?" The server responds immediately, either with messages or an empty response. Latency is bounded by the polling interval. At one-second intervals a new message waits up to one second for the next poll, and each client sends 3,600 requests per hour whether or not anything is happening, so most of the traffic is unnecessary.

Comet long-polling inverts this. The client sends a request and the server holds it open until a message arrives or a timeout expires. As soon as the server answers, the client sends a new request so that one is always waiting on the server. Latency approaches the actual message delivery time, and traffic is proportional to message volume rather than elapsed time. The catch is that proxies and load balancers frequently closed connections they deemed idle. The server therefore had to answer every held request before those intermediaries' timeouts expired, using an empty response if nothing had arrived.
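The server side of a long poll can be sketched in a few lines of Java. This is a minimal illustration, not Mantelichat's code. `LongPollMailbox` and its methods are hypothetical names, and a real server would tie each mailbox to an HTTP request handler. The handler blocks until a message arrives or the wait period expires, then bundles anything else already queued. On timeout it answers with an empty batch, so intermediaries never see an idle connection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical per-client mailbox behind a long-poll endpoint.
class LongPollMailbox {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    // Called by the messaging side whenever a message for this client arrives.
    void deliver(String message) {
        pending.add(message);
    }

    // Called for each held HTTP request. Blocks until at least one message is
    // available or waitMillis elapses; an empty list means "timed out, poll again".
    List<String> awaitMessages(long waitMillis) {
        List<String> batch = new ArrayList<>();
        try {
            String first = pending.poll(waitMillis, TimeUnit.MILLISECONDS);
            if (first != null) {
                batch.add(first);
                pending.drainTo(batch); // bundle anything else queued meanwhile
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep interrupt status, answer empty
        }
        return batch;
    }
}
```

A timed-out request costs one empty round trip per wait period. That is why long-polling traffic tracks message volume rather than elapsed time.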

BOSH (Bidirectional-streams Over Synchronous HTTP) formalised the Comet long-polling pattern as a specification published by the XMPP Standards Foundation (XSF). XEP-0124 defines the transport in general terms, and XEP-0206 defines how XMPP sessions run over it. In use since 2004 and well supported by the XMPP server ecosystem by 2010, BOSH defined three things:

  • a precise handshake for negotiating connection parameters
  • a session management protocol for recovering from dropped connections
  • a framing for bundling multiple XMPP stanzas into a single HTTP response

How BOSH Works

A BOSH session begins with a handshake. The client's session creation request proposes the session parameters, and the server's response states the values it will actually use. The main parameters are:

  • hold: the maximum number of requests the server may keep waiting at once (normally 1)
  • wait: the maximum number of seconds the server may hold a request before answering it, with an empty body if nothing has arrived
  • requests: the maximum number of simultaneous requests the client may make; the server supplies this in its response, typically hold + 1

The server may lower the client's proposed hold and wait values but never raise them. The wait value does not bound message latency. While a request is held, the server can answer it the moment a stanza arrives, so server-to-client latency stays close to the network delay. With hold set to 1 and requests set to 2, the client always has a second connection free for its own stanzas. Outgoing messages therefore never wait for the held request to return.

All subsequent XMPP traffic is carried inside HTTP POST requests, wrapped in a <body> envelope defined by the BOSH namespace. The XMPP stanzas (presence, message, IQ) are embedded verbatim inside this envelope. From the XMPP server's perspective, a BOSH connection looks like a normal XMPP session with a slightly unusual transport adapter; from the HTTP infrastructure's perspective, it looks like ordinary HTTP traffic.
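Wrapping stanzas for an established session might look like this. It is a sketch with hypothetical names, and the `sid` and `rid` values are placeholders. The stanzas are inserted verbatim. XEP-0206's examples qualify each embedded stanza with `xmlns='jabber:client'`.

```java
import java.util.List;

// Hypothetical helper: builds the BOSH envelope for one HTTP POST.
class BoshEnvelope {
    static String wrap(String sid, long rid, List<String> stanzas) {
        StringBuilder sb = new StringBuilder();
        sb.append("<body rid='").append(rid)
          .append("' sid='").append(sid)
          .append("' xmlns='http://jabber.org/protocol/httpbind'>");
        for (String stanza : stanzas) {
            sb.append(stanza); // already-serialised XMPP stanza, passed through unchanged
        }
        sb.append("</body>");
        return sb.toString();
    }
}
```

An envelope with no stanzas is how the client simply re-issues a poll.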

BOSH's reconnection design is particularly thoughtful. Each request carries a request identifier (rid) that increases by exactly one per request. If a connection drops, the client resends the unanswered request with the same rid. The server recognises a rid it has already answered and replays its cached response instead of processing the request again. The result is in-order delivery without duplicates over HTTP connections that may be cut at any time.
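The server-side bookkeeping for rids can be sketched as below. It is a simplified illustration with hypothetical names, not a complete connection manager. Real connection managers also buffer requests that arrive slightly out of order within the `requests` window, whereas this sketch rejects them. It keeps a small cache of recent responses so that a retransmitted rid is answered from the cache and never processed twice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical per-session rid tracking in a BOSH connection manager.
class RidTracker {
    private final int window;                 // how many responses to keep for replay
    private long lastRid;                     // highest rid processed so far
    private final Map<Long, String> recentResponses = new LinkedHashMap<>();

    RidTracker(long initialRid, int window) {
        this.lastRid = initialRid;
        this.window = window;
    }

    // Returns the response for this rid; 'process' runs only for new requests.
    String handle(long rid, Function<Long, String> process) {
        if (rid <= lastRid) {
            String cached = recentResponses.get(rid);
            if (cached == null) {
                throw new IllegalStateException("rid " + rid + " is outside the replay window");
            }
            return cached; // retransmission: replay the earlier answer
        }
        if (rid != lastRid + 1) {
            throw new IllegalStateException("rid " + rid + " skips ahead of " + lastRid);
        }
        String response = process.apply(rid);
        lastRid = rid;
        recentResponses.put(rid, response);
        while (recentResponses.size() > window) { // evict the oldest cached response
            Long oldest = recentResponses.keySet().iterator().next();
            recentResponses.remove(oldest);
        }
        return response;
    }
}
```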

Google Web Toolkit: Java in the Browser

Google Web Toolkit (GWT) compiles Java source code to JavaScript. The compilation is sophisticated: GWT performs dead-code elimination, inlining, and constant folding, and the output is often more compact than hand-written JavaScript of equivalent functionality. The compiler produces a separate build per browser ("permutations" in GWT terminology). Internet Explorer, Firefox, and Safari each download only the code paths written for that browser's APIs and quirks.

The key feature for a messaging application with a non-trivial codebase was GWT's code splitting (GWT.runAsync). A monolithic JavaScript bundle loaded on page load would be expensive. Code splitting instead allows certain modules to be loaded lazily, on first use. For Mantelichat, the initial page load fetched only the authentication and session establishment code. The full chat interface loaded in the background while the user was completing the login form.

GWT's module system allowed the application to be decomposed into modules with explicit dependency declarations, although the compiler still optimised the whole program as a single unit. The entry point module (ManteliChat.gwt.xml) inherited from base GWT modules and from the application's own modules. The client called server-side code through service interfaces instantiated with GWT.create(). At compile time, GWT's deferred binding mechanism generated or selected the concrete implementation of each interface for each compilation target.

Architecture: MVP and the Event Bus

The client-side architecture followed the Model-View-Presenter (MVP) pattern, which GWT's own documentation recommended and which suited the testing requirements of a confidential platform. The pattern separates concerns into three roles:

  • Model — the application state, in this case the XMPP session, chat rooms, and participant lists
  • View — a passive interface responsible only for rendering and emitting raw UI events; views had no logic
  • Presenter — the glue layer that responded to UI events, called model operations, and updated the view

In this design, views were Java interfaces, and the concrete implementations were GWT widgets. Presenters held only a reference to the view interface, never to a concrete widget class, which made each presenter unit-testable without a browser DOM. A presenter under test could be given a mock view implementation and exercised in plain Java.
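A minimal sketch of this arrangement follows. The names `ParticipantListView`, `ParticipantListPresenter` and `RecordingView` are hypothetical, and in the real client a GWT widget would implement the view interface. The presenter holds only the interface, so a hand-written fake can stand in for the widget in a plain-Java test.

```java
import java.util.ArrayList;
import java.util.List;

// View contract: rendering only, no logic.
interface ParticipantListView {
    void showParticipants(List<String> nicknames);
}

// Presenter: reacts to model events and tells the view what to render.
class ParticipantListPresenter {
    private final ParticipantListView view;
    private final List<String> participants = new ArrayList<>();

    ParticipantListPresenter(ParticipantListView view) {
        this.view = view;
    }

    void onParticipantJoined(String nickname) {
        if (!participants.contains(nickname)) {
            participants.add(nickname);
            view.showParticipants(List.copyOf(participants));
        }
    }

    void onParticipantLeft(String nickname) {
        if (participants.remove(nickname)) {
            view.showParticipants(List.copyOf(participants));
        }
    }
}

// Test double that records what the presenter asked it to render.
class RecordingView implements ParticipantListView {
    List<String> lastRendered = List.of();

    @Override
    public void showParticipants(List<String> nicknames) {
        lastRendered = nicknames;
    }
}
```

A test constructs the presenter with `RecordingView`, drives it through `onParticipantJoined` and `onParticipantLeft`, and asserts on `lastRendered`, all without a browser.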

Presenters that needed to communicate with other presenters — for example, the room list presenter notifying the chat presenter that a new room had been joined — used an event bus. The event bus was a singleton mediator: presenters published typed events to it and subscribed to typed events from it, with no direct coupling between publishers and subscribers. This pattern avoided the tight coupling that resulted from presenters holding direct references to each other, which in a growing application quickly produced a tangled dependency graph.

An example from the codebase: when a moderator updated the settings for a chat room, the settings presenter did not touch the room view directly. Instead, it published a RoomSettingsChangedEvent to the bus, and the room presenter, which subscribed to this event, updated the room view in response. The indirection meant the settings UI and the room UI could evolve independently. Adding a third subscriber (for example, a notification widget) required no changes to the existing code.
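The publish/subscribe flow can be sketched with a small typed event bus. This is a plain-Java stand-in, not GWT's own event bus classes. The fields of `RoomSettingsChangedEvent` (`roomJid`, `moderated`) are assumptions made for illustration. Dispatch here is synchronous, and subscribers are matched on the event's exact class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Minimal typed event bus: publishers and subscribers know only event types.
class EventBus {
    private final Map<Class<?>, List<Consumer<Object>>> handlers = new HashMap<>();

    <E> void subscribe(Class<E> type, Consumer<? super E> handler) {
        handlers.computeIfAbsent(type, k -> new ArrayList<>())
                .add(event -> handler.accept(type.cast(event)));
    }

    void publish(Object event) {
        for (Consumer<Object> h : handlers.getOrDefault(event.getClass(), List.of())) {
            h.accept(event);
        }
    }
}

// Hypothetical event published by the settings presenter.
class RoomSettingsChangedEvent {
    final String roomJid;
    final boolean moderated;

    RoomSettingsChangedEvent(String roomJid, boolean moderated) {
        this.roomJid = roomJid;
        this.moderated = moderated;
    }
}
```

The room presenter and a later notification widget would each call `subscribe(RoomSettingsChangedEvent.class, ...)`, and the settings presenter only ever calls `publish`.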

XMPP Extensions: MUC, Moderation, and Private Storage

XMPP's extension mechanism (XMPP Extension Protocols, XEPs) allowed the application to reuse standardised components rather than implementing custom XML protocols. Mantelichat used three extensions in production:

XEP-0045: Multi-User Chat

MUC is the XMPP standard for group chat rooms. Each room has a JID (Jabber ID), a set of occupants with associated roles and affiliations, and optionally a discussion history. The client joined a room by sending a presence stanza to the room's JID with the participant's desired nickname appended as the resource (room@service/nickname). The room responded with a presence stanza for each current occupant. Where the room kept history, it also sent the last n messages, so clients that reconnected could resume without losing context.
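The join stanza can be built as follows, assuming a hypothetical room `support@conference.chat.example.org`. The `<history/>` element in the MUC namespace lets the client cap how many past messages the room replays, and `maxstanzas='0'` asks for none. Nicknames are not XML-escaped in this sketch.

```java
import java.util.Locale;

// Hypothetical helper for XEP-0045 room entry.
class MucJoin {
    static final String MUC_NS = "http://jabber.org/protocol/muc";

    // Presence to room@service/nickname requests entry under that nickname.
    static String joinPresence(String roomJid, String nickname, int maxHistory) {
        return String.format(Locale.ROOT,
            "<presence to='%s/%s'><x xmlns='%s'><history maxstanzas='%d'/></x></presence>",
            roomJid, nickname, MUC_NS, maxHistory);
    }
}
```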

Room moderation: participant removal

The peer support session model required a supervisor to be able to remove a disruptive participant immediately, without the participant being able to return until explicitly readmitted. MUC's moderation features covered this. A kick, which sets the occupant's role to "none", only removes someone from the current session. The requirement therefore called for a ban: a room admin or owner set the participant's affiliation to "outcast", which blocked rejoining until the affiliation was changed back. The server then sent a presence stanza of type "unavailable" to the removed participant and to all remaining occupants, cleanly ending the participant's session. The supervisor interface presented this as a single button next to each participant name.
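In XEP-0045 terms, removing someone until readmission is a ban (affiliation "outcast"), while a kick (role "none") only ends the current visit. Both are `set` requests in the `muc#admin` namespace sent to the room. The sketch below builds both. The helper class, ids and JIDs are hypothetical, and values are not XML-escaped.

```java
// Hypothetical builders for XEP-0045 moderation requests.
class MucModeration {
    static final String MUC_ADMIN_NS = "http://jabber.org/protocol/muc#admin";

    // Ban: targets the user's bare JID, because affiliations belong to the account.
    static String banRequest(String id, String roomJid, String userBareJid, String reason) {
        return "<iq id='" + id + "' to='" + roomJid + "' type='set'>"
            + "<query xmlns='" + MUC_ADMIN_NS + "'>"
            + "<item affiliation='outcast' jid='" + userBareJid + "'>"
            + "<reason>" + reason + "</reason>"
            + "</item></query></iq>";
    }

    // Kick: targets the occupant's room nickname; they may rejoin afterwards.
    static String kickRequest(String id, String roomJid, String nickname) {
        return "<iq id='" + id + "' to='" + roomJid + "' type='set'>"
            + "<query xmlns='" + MUC_ADMIN_NS + "'>"
            + "<item nick='" + nickname + "' role='none'/>"
            + "</query></iq>";
    }
}
```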

XEP-0030: Service Discovery

Before the application could use a service, it needed to discover that the service existed and what features it supported. XEP-0030 defines a query protocol for this: the client sends a discovery IQ to an entity and the entity responds with its identity, features, and child nodes. Mantelichat used service discovery at startup to verify that the connected XMPP server supported MUC and the other required extensions, providing a clear error message if the server configuration was incomplete.
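A startup check of this kind can be sketched with the JDK's DOM parser. The helper names and the list of required features are hypothetical. The sketch collects the `var` attribute of every `<feature/>` in a disco#info result and reports which required features are missing, which is the input a clear error message needs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical XEP-0030 helpers.
class ServiceDiscovery {
    static final String DISCO_INFO_NS = "http://jabber.org/protocol/disco#info";

    static String infoRequest(String id, String entityJid) {
        return "<iq id='" + id + "' to='" + entityJid + "' type='get'>"
            + "<query xmlns='" + DISCO_INFO_NS + "'/></iq>";
    }

    // Feature vars advertised in a disco#info result stanza.
    static Set<String> features(String iqResult) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(iqResult.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(DISCO_INFO_NS, "feature");
            Set<String> vars = new HashSet<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                vars.add(((Element) nodes.item(i)).getAttribute("var"));
            }
            return vars;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not a well-formed disco#info result", e);
        }
    }

    // Required features the entity did not advertise (empty set = all present).
    static Set<String> missing(Set<String> required, String iqResult) {
        Set<String> absent = new HashSet<>(required);
        absent.removeAll(features(iqResult));
        return absent;
    }
}
```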

XEP-0049: Private XML Storage

The application needed to persist per-user preferences (default room, notification settings) on the XMPP server rather than in a browser cookie, so that the settings followed the user across devices. XEP-0049 allows a client to store an arbitrary XML element in a private namespace on the server, readable only by the authenticating user. From the application's perspective this worked like a simple key-value store. The key was the element's name and namespace taken together, and the value was the element's content.
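The store and load requests can be sketched as follows. The `urn:example:mantelichat:prefs` namespace and the `<prefs/>` layout are hypothetical, and values are not XML-escaped. Both requests omit a `to` attribute, so they go to the user's own account. Retrieval names the stored element by sending it back empty.

```java
// Hypothetical XEP-0049 helpers for per-user preferences.
class PrivateStorage {
    static final String PRIVATE_NS = "jabber:iq:private";
    static final String PREFS_NS = "urn:example:mantelichat:prefs"; // assumed namespace

    static String storePrefs(String id, String defaultRoom, boolean notify) {
        return "<iq id='" + id + "' type='set'>"
            + "<query xmlns='" + PRIVATE_NS + "'>"
            + "<prefs xmlns='" + PREFS_NS + "'>"
            + "<default-room>" + defaultRoom + "</default-room>"
            + "<notifications>" + notify + "</notifications>"
            + "</prefs></query></iq>";
    }

    // The empty <prefs/> element is the "key": element name plus namespace.
    static String loadPrefs(String id) {
        return "<iq id='" + id + "' type='get'>"
            + "<query xmlns='" + PRIVATE_NS + "'>"
            + "<prefs xmlns='" + PREFS_NS + "'/>"
            + "</query></iq>";
    }
}
```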

Caching and Client Performance

GWT's code splitting handled the initial load performance. At runtime, the primary performance concern was the round-trip overhead for operations that required a server query — primarily the participant list on room join and the room list on session initialisation.

The client maintained an in-memory cache of responses. On the first session initialisation (cold load), the client queried the XMPP server for the full room list and participant data, adding 2–3 seconds to the startup time on a typical Finnish broadband connection. Because the cache lived in the running application's memory, it helped only while the page stayed loaded. On later session initialisations without a page reload, the cache was hit immediately and startup time dropped to under 500 milliseconds: the difference between a loading spinner and an apparently instant response.

Cache invalidation was event-driven: presence stanzas arriving over XMPP updated the participant list in real time, and room state changes triggered targeted cache updates rather than full cache flushes. This meant the cache remained fresh without periodic polling.
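Event-driven maintenance of the participant cache can be sketched like this. The class and method names are hypothetical, and a real client would derive `nickname` and `unavailable` from the resource and `type` of each MUC presence stanza. A cold load fills a room once. After that, each presence updates only the affected room, and nothing is ever flushed wholesale.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical client-side cache: room JID -> current occupant nicknames.
class ParticipantCache {
    private final Map<String, Set<String>> occupantsByRoom = new HashMap<>();

    boolean isCached(String roomJid) {
        return occupantsByRoom.containsKey(roomJid);
    }

    // Cold load: store the full list fetched from the server once.
    void load(String roomJid, Set<String> occupants) {
        occupantsByRoom.put(roomJid, new LinkedHashSet<>(occupants));
    }

    // Targeted update from one presence stanza; other rooms are untouched.
    void onPresence(String roomJid, String nickname, boolean unavailable) {
        Set<String> occupants = occupantsByRoom.get(roomJid);
        if (occupants == null) {
            return; // room not cached yet; the next cold load will fetch it
        }
        if (unavailable) {
            occupants.remove(nickname);
        } else {
            occupants.add(nickname);
        }
    }

    Set<String> occupants(String roomJid) {
        return Collections.unmodifiableSet(
            occupantsByRoom.getOrDefault(roomJid, Collections.emptySet()));
    }
}
```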

What Would Change in 2025

The fundamental architecture — XMPP for messaging semantics, a thin browser client, server-side session management — remains sound. XMPP's federation model and the XEP extension ecosystem have aged well; modern XMPP implementations support end-to-end encryption (XEP-0384, OMEMO) which would be the first addition for a 2025 rebuild of a confidential messaging platform.

BOSH would be replaced by WebSocket, using XMPP over WebSocket as standardised in RFC 7395. WebSocket has been supported by all major browsers since 2012 and eliminates the long-polling overhead entirely. True full-duplex communication over a single TCP connection makes BOSH's HTTP-level session management unnecessary. A client still needs keepalives to notice dead connections, though, and uses XEP-0198 (Stream Management) to resume a session without losing stanzas.
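The RFC 7395 framing is small enough to sketch with the JDK's `java.net.http` WebSocket client, which stands in here for the browser's WebSocket API. The endpoint URI, domain and class name are hypothetical. The client negotiates the `xmpp` subprotocol, and the stream header becomes a self-closing `<open/>` element in the `urn:ietf:params:xml:ns:xmpp-framing` namespace. Every WebSocket message then carries one complete XML element.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletableFuture;

// Hypothetical RFC 7395 client helpers.
class XmppWebSocket {
    static final String FRAMING_NS = "urn:ietf:params:xml:ns:xmpp-framing";

    // Replaces the classic <stream:stream> opening tag.
    static String openFrame(String domain) {
        return "<open xmlns='" + FRAMING_NS + "' to='" + domain + "' version='1.0'/>";
    }

    // Replaces the closing </stream:stream> tag.
    static String closeFrame() {
        return "<close xmlns='" + FRAMING_NS + "'/>";
    }

    // Connects with the 'xmpp' subprotocol and sends the opening frame.
    static CompletableFuture<WebSocket> connect(URI endpoint, String domain, WebSocket.Listener listener) {
        return HttpClient.newHttpClient()
            .newWebSocketBuilder()
            .subprotocols("xmpp")
            .buildAsync(endpoint, listener)
            .thenCompose(ws -> ws.sendText(openFrame(domain), true));
    }
}
```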

GWT has been largely superseded by TypeScript frameworks. The MVP pattern it popularised lives on in Angular's component architecture and in React's split between container and presentational components, so the architectural thinking transfers even if the specific tooling does not. Views should be passive interfaces, and presenters should hold only interface references so they can be unit tested without a browser. That design principle predates GWT and will outlast all of its successors.

The most durable lesson from the project was that protocol selection matters. XMPP in 2010 was the right choice not because it was the fashionable technology (it wasn't — real-time startups were building custom WebSocket servers) but because it was a proven, interoperable specification with server implementations, client libraries, and a defined extension mechanism. Choosing XMPP meant we implemented against a standard rather than implementing the standard itself. That decision reduced the scope of the thesis implementation by several months of work.