Full-Duplex Channel over HTTP

Many people are trying to (mis)use HTTP to create a full-duplex connection between the client and server. However, there are a number of problems in doing so. I will try to explain here what they are, and under what circumstances you can possibly get it to work. At the end there is also a method that doesn't actually use HTTP at all, but is related.

The first thing to note is that HTTP uses a request-response paradigm, not a full-duplex streaming paradigm. Let me repeat that: HTTP is a request-response protocol! This means that the client sends a request, and when the complete request has been sent then the server sends the response. This is the case even if so-called keep-alive is used, i.e. multiple requests are sent over the same TCP connection. Because this behaviour is fundamental to the protocol most implementations make certain (valid) assumptions which make it difficult to create a full-duplex connection.

If you want to create a full-duplex connection through HTTP then you must first ask yourself why. Why not just use TCP? After all, that's exactly what TCP gives you. The reasons I usually hear are that 1) you don't want to write a standalone server, but instead want to use the web server that's already running; or 2) the application needs to work through a firewall, and the only way to get through it is by using HTTP. Unfortunately, the work involved in getting it working will probably negate any advantages 1) might seem to offer, so only 2) is a reasonable argument.

OK, down to the details. We'll first discuss the problems when no proxy is involved, and then discuss the added problems a proxy generates.

Direct Client-Server Connection

The problems here are the following.

streaming request data (client->server):
HTTP/1.0 requires all requests with a body to have a Content-Length header. In theory you could just send a very big Content-Length, but this will often conflict with the second point below. HTTP/1.1 allows you to use the chunked transfer encoding instead, which enables you to send an "unknown" amount of data. However, to date not many servers will accept such a request (I only know of one, and it's not in widespread use), and it still doesn't solve the next point. If they do accept such requests then you'll need to use some server-specific API (NSAPI, Apache module, etc.) to receive and process the request; CGI scripts can't handle such requests because they require a Content-Length, and that is not available when using the chunked transfer encoding.
responses aren't sent until the complete request is received:
Some servers will first require the complete request to be received and processed before any part of the response is sent. This means that all you get is a half-duplex stream.
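
To make the chunked transfer encoding mentioned above concrete, here is a minimal Python sketch of how a client could frame a request body of unknown length. The helper names, the host and the path are assumptions made up for illustration; whether a given server (or proxy) actually accepts such a request is exactly the problem described above.

```python
# Sketch only: frames a request body using the HTTP/1.1 chunked transfer
# encoding. Helper names, host and path are illustrative assumptions.

LAST_CHUNK = b"0\r\n\r\n"  # zero-size chunk with no trailers: marks end of body


def chunked_request_head(host: str, path: str) -> bytes:
    """Request line and headers for a POST whose body length is not known."""
    return (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Transfer-Encoding: chunked\r\n"
        "\r\n"
    ).encode("ascii")


def encode_chunk(data: bytes) -> bytes:
    """Frame one non-empty block: size in hex, CRLF, the data, CRLF."""
    if not data:
        # A zero-length chunk would end the body; send LAST_CHUNK explicitly.
        raise ValueError("empty chunk would terminate the body")
    return b"%x\r\n" % len(data) + data + b"\r\n"
```

A client would write `chunked_request_head(...)` once, then `encode_chunk(block)` for every block as it becomes available, and finally `LAST_CHUNK` when the stream ends.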

In summary, you may be able to get your server to provide you with a full-duplex connection, but chances are it would be easier to write your own (non-HTTP) server. Furthermore, using your own server is probably more efficient as HTTP servers are not designed for long requests and responses (you usually tie up a process or thread per connection).

Additional Problems when going through a Proxy

When proxies are involved the above-mentioned problems are compounded: now both the proxy (or proxies) and the server must fulfill the necessary requirements. I've especially seen the second point above as the major problem when going through a proxy: the proxy will wait until it has the complete request before forwarding it to the server. Additionally, while you may have control over the server (to the point of being able to write your own), you (generally) don't have any control over the proxy. This means you must assume the worst case for the proxy.

Furthermore, if you are writing an applet and don't sign it then you must use the browser's HTTP client (via java.net.URLConnection - see Applet Network Security Policy for why). Unfortunately, these clients all first buffer the complete request data (i.e. everything written to the stream from URLConnection.getOutputStream()) before even sending the request, thereby preventing you from creating a true client->server stream.

Solutions

The easiest solution is to change your application to use a request-response paradigm. If you are trying to tunnel through an HTTP proxy and for some reason really can't change your paradigm, then here are a couple of ideas.

Use 2 connections:
Each connection gives you a simplex stream. Do a simple GET on the first connection and have the server send back a response either without a Content-Length header or with a very large Content-Length header (or if you're doing HTTP/1.1 you can use the chunked transfer encoding). Also, send the "Pragma: no-cache" and "Cache-Control: no-cache" headers to keep any proxy from caching the response. This response will then give you the server->client simplex stream. Then do a POST with a very large Content-Length header on the second connection to get the client->server stream (the response is never received). Note that this won't work with a number of proxies because they will never forward the POST (because they're waiting for the complete data).
Use n+1 connections:
If you are using URLConnection or going through certain proxies then you won't get a real client->server stream (because they buffer it all before sending the request). Therefore you must split your client->server stream into blocks and send each block in a separate request. The server->client stream is done as above. Note however that this solution may mean you'll be sending a large number of requests, and if your server or proxy doesn't handle keep-alives then that also means a large number of connections being created.
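
The requests used by these two approaches can be sketched as follows. This is only an illustration under stated assumptions: the function names, host and path are made up, and the `X-Block-Seq` header is a hypothetical addition (not part of HTTP) that a custom server could use to put blocks back in order if they arrive over different connections.

```python
# Sketch only: builds the raw requests for the server->client GET and for
# the block-wise client->server POSTs. All names here are illustrative.
from typing import Iterator


def downstream_request(host: str, path: str) -> bytes:
    """GET whose long-running response carries the server->client stream."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Pragma: no-cache\r\n"          # ask proxies not to cache the response
        "Cache-Control: no-cache\r\n"
        "\r\n"
    ).encode("ascii")


def upstream_requests(host: str, path: str, data: bytes,
                      block_size: int) -> Iterator[bytes]:
    """Split the client->server data into blocks, one complete POST each."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for seq, start in enumerate(range(0, len(data), block_size)):
        block = data[start:start + block_size]
        head = (
            f"POST {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"Content-Length: {len(block)}\r\n"  # exact length, so proxies can forward it
            f"X-Block-Seq: {seq}\r\n"            # hypothetical ordering header
            "\r\n"
        ).encode("ascii")
        yield head + block
```

Because every POST is a complete request with an exact Content-Length, a buffering client or proxy can forward it without waiting for more data; the cost is one request per block, as noted above.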

In either case you will still have to either write your own (simple) HTTP server or write a server-side handler using an API native to the server.

Alternate Solution: TCP tunnel

There's an alternate solution that will give you a full-duplex TCP connection and will work through (virtually) all HTTP proxies. It usually requires that you write your own server, though. The trick is to note that when using HTTPS (HTTP over SSL/TLS) a proxy cannot look into the data stream (since it's encrypted), and therefore can't do any HTTP processing, i.e. get in the way. Instead, when you send the proxy the CONNECT request it turns itself into a simple tunnel, which is exactly what we want.

So, instead of using port 80 use port 443. In the case of transparent proxies (where ISPs reroute all port 80 traffic through a proxy of theirs) you'll automatically avoid going through the proxy; in the case of explicit proxies, for each connection you open you need to first send the CONNECT request and process the response, after which you have a full-duplex connection to your server. Note that in most (all?) cases you needn't actually run SSL/TLS, as the proxies don't inspect the data stream to see what exactly you're sending through the tunnel.
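
The CONNECT handshake with an explicit proxy might look roughly like the following Python sketch. The function names are invented for illustration, and the optional credentials assume the proxy uses the Basic authentication scheme, which is only one of several possibilities; treat this as a starting point rather than a complete client.

```python
# Sketch only: opens a tunnel through an explicit HTTP proxy via CONNECT.
# Function names are illustrative; Basic proxy auth is an assumption.
import base64
import socket
from typing import Optional, Tuple


def build_connect_request(host: str, port: int,
                          credentials: Optional[Tuple[str, str]] = None) -> bytes:
    """Build the CONNECT request asking the proxy to tunnel to host:port."""
    target = f"{host}:{port}"
    lines = [f"CONNECT {target} HTTP/1.1", f"Host: {target}"]
    if credentials is not None:
        user, password = credentials
        token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
        lines.append("Proxy-Authorization: Basic " + token.decode("ascii"))
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def connect_succeeded(response_head: bytes) -> bool:
    """True if the proxy's status line carries a 2xx status code."""
    parts = response_head.split(b"\r\n", 1)[0].split(None, 2)
    return (len(parts) >= 2 and parts[0].startswith(b"HTTP/")
            and len(parts[1]) == 3 and parts[1][:1] == b"2")


def open_tunnel(proxy_host: str, proxy_port: int, host: str, port: int = 443,
                credentials: Optional[Tuple[str, str]] = None) -> socket.socket:
    """Return a socket that is a full-duplex tunnel to host:port."""
    sock = socket.create_connection((proxy_host, proxy_port))
    try:
        sock.sendall(build_connect_request(host, port, credentials))
        head = b""
        while b"\r\n\r\n" not in head:
            byte = sock.recv(1)  # one byte at a time: never consume tunnel data
            if not byte:
                raise ConnectionError("proxy closed the connection")
            head += byte
        if not connect_succeeded(head):
            status = head.split(b"\r\n", 1)[0].decode("latin-1")
            raise ConnectionError("proxy refused CONNECT: " + status)
        return sock
    except BaseException:
        sock.close()
        raise
```

After `open_tunnel(...)` returns, everything written to the socket goes to your server and everything read comes from it, so the application protocol on top can be whatever you like.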

One drawback remains: this doesn't work well in applets because you can't use URLConnection for this, but instead you need to use a Socket directly. This means it can't be used in unsigned applets at all, and in signed applets you have to somehow figure out what, if any, proxy is being used and what, if any, username/password is needed to authenticate with the proxy.

[HTTPClient]
Ronald Tschalär / 16. September 2006 / ronald@innovation.ch.