Frameworks often hide or abstract away parts of HTTP. I think this is a bit of a shame: it obscures what's possible with HTTP, and so can lead to worse engineering decisions.
This short guide aims to rectify that. It details a few of the most common and useful parts of HTTP, and is aimed at developers with some experience of making or receiving HTTP requests. [In this post, the term HTTP refers to the bytes of the HTTP protocol, which are the same whether they are sent over plain TCP or through a TLS tunnel. The term HTTPS is used only when necessary to distinguish HTTP over TLS.]
Initialising the connection
Say we ask our HTTP client to make a GET request to the URL https://example.com/the/path. Firstly, there is no such thing as a URL in HTTP: it's just a shorthand that the client parses, using the different components at various points in the process.
- The client resolves example.com to an IP address, say to 22.214.171.124. Note that in many cases this involves sending the string example.com unencrypted across the network.
- It initiates a TCP connection to that IP address on port 443: if no port is specified in an HTTPS URL, port 443 is assumed. For HTTP URLs, port 80 is assumed.
- It initiates a TLS connection over the top of this TCP connection. This again uses the domain example.com, both in SNI [Server Name Indication] and in verification of the certificate the server subsequently supplies. In many cases, the domain name example.com is transmitted unencrypted across the internet.
- It then starts the HTTP request/response process. This uses both the domain example.com and the path /the/path. This process is detailed below.
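How a client might pull a URL apart into the pieces used at each of these steps can be sketched with Python's standard urllib.parse module. This is only an illustration: the helper name connection_details and the shape of its return value are made up for this post, and real clients handle many more cases.

```python
from urllib.parse import urlsplit

# Ports assumed when the URL does not specify one
DEFAULT_PORTS = {"http": 80, "https": 443}


def connection_details(url):
    """Split a URL into the parts used at different stages of a request."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    # The query string travels with the path in the request-line
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return {
        "use_tls": parts.scheme == "https",  # whether to start TLS after TCP
        "host": parts.hostname,              # DNS lookup, SNI, host header
        "port": port,                        # TCP connection
        "target": target,                    # request-line
    }


details = connection_details("https://example.com/the/path")
```

Here details holds the host example.com, port 443 and target /the/path, with use_tls set to True.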
The HTTP request
The client sends the bytes of the HTTP request message over the TLS connection. The message is made up of a request-line containing the method, the path and the protocol version, followed by a number of header key:value pair lines, a blank line, and then the body. In this case, the body is zero bytes long, which is typical for GET requests.
GET /the/path HTTP/1.1\r\n
host: example.com\r\n
\r\n
A "line" ends with the two characters \r\n. The visual line breaks in the examples shown here are for ease of comprehension, and are not characters that are transmitted.
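To make the exact bytes concrete, here is a minimal sketch that serialises a request message by hand. The function name build_request is hypothetical, and a real client would also validate its inputs.

```python
def build_request(method, target, headers, body=b""):
    """Serialise an HTTP/1.1 request into the bytes sent to the server."""
    lines = [f"{method} {target} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers]
    # Every line ends in \r\n, and a blank line separates headers from body
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


request_bytes = build_request("GET", "/the/path", [("host", "example.com")])
```

The resulting request_bytes contains no line breaks other than the \r\n pairs themselves.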
The HTTP response
The server would then respond with a status-line, some headers, a blank line, and the body of the response.
HTTP/1.1 200 OK\r\n
content-length: 21\r\n
\r\n
The bytes of the body
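A rough sketch of how a client could take such a response apart is below. It assumes the whole response is already in memory, that the status-line includes a reason phrase, and that no header is repeated, none of which a real client can rely on. The name parse_response is made up for this example.

```python
def parse_response(raw):
    """Split raw HTTP/1.1 response bytes into status, headers and body."""
    # The first blank line marks the end of the headers
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    _version, status_code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise to lower case
        headers[name.strip().lower()] = value.strip()
    return int(status_code), reason, headers, body


status, reason, headers, body = parse_response(
    b"HTTP/1.1 200 OK\r\ncontent-length: 21\r\n\r\nThe bytes of the body"
)
```

Note that the body is exactly as long as the content-length header says it is.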
The important parts of HTTP are the headers: bits of metadata sent before the body of the message [usually].
Usually HTTP clients add a host header automatically from the supplied URL. It has two common uses.
- CDNs or reverse proxies use the host header to determine how to route requests onwards.
- Application server code uses the host header in a best-effort attempt to determine the domain the HTTP client used to make the request. Depending on the configuration of intermediate proxies, this can mean the application server may not be able to correctly determine what the original domain was.
HTTP is sent over TCP [or TCP+TLS], which surfaces as a stream of bytes in client code. A "stream of bytes" means that the receiver could receive just a single byte at a time, perhaps even with seconds of delay between each. The receiver has no way to know whether it has received all of the bytes, or the connection is just a bit slow. For this reason, requests and responses [that can have a body] can supply a header that tells the other end how many bytes are in the body of the message. This is the content-length header.
Often HTTP clients add this automatically if they know, when they start sending the HTTP message, how many bytes will be sent in the body.
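The receiving side of this can be sketched as a loop that keeps reading until it has the number of bytes promised by content-length. The OneByteAtATime class is a hypothetical stand-in for a slow socket, used only so the example runs without a network.

```python
import io


def read_exactly(stream, length):
    """Read exactly `length` bytes from a stream that may return fewer per call."""
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            # The stream ended before the promised number of bytes arrived
            raise ConnectionError(f"expected {remaining} more bytes")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


class OneByteAtATime:
    """Hypothetical slow connection: hands back at most one byte per read."""

    def __init__(self, data):
        self._data = io.BytesIO(data)

    def read(self, size):
        return self._data.read(min(size, 1))


stream = OneByteAtATime(b"The bytes of the bodyNEXT")
body = read_exactly(stream, 21)
```

Because the reader stops after exactly 21 bytes, whatever follows on the same connection is left untouched for the next message.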
An HTTP message sender may want to send a body without knowing at the start how many bytes make up that body. One option is to wait until it knows how many bytes there are, and set the content-length header appropriately. However, this may involve buffering all the bytes in memory, which may not be possible or desirable.
An alternative is to use transfer-encoding: chunked. With this header, the body of the message is sent in chunks, each prefixed by the number of bytes in that chunk [as it happens, in hexadecimal]. This means the body transfer can be started without knowing how many bytes in total will be sent. Common chunk sizes are between 8 KB and 64 KB. Often HTTP clients "do the chunking" themselves, adding the chunk header before each chunk as needed.
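As a minimal sketch of what "doing the chunking" looks like on the wire: each chunk is its size in hexadecimal, \r\n, the chunk's bytes, and \r\n again, and a zero-length chunk marks the end of the body. Those framing details come from the HTTP/1.1 specification rather than from anything above, and the function name encode_chunked is made up for this example.

```python
def encode_chunked(chunks):
    """Yield the bytes of a transfer-encoding: chunked body."""
    for chunk in chunks:
        if not chunk:
            # A zero-length chunk would end the body early, so skip it
            continue
        # Chunk size in hexadecimal, then the chunk itself, each ending in \r\n
        yield f"{len(chunk):x}".encode("ascii") + b"\r\n" + chunk + b"\r\n"
    # A zero-length chunk marks the end of the body
    yield b"0\r\n\r\n"


encoded = b"".join(encode_chunked([b"The bytes ", b"of the body"]))
```

Since encode_chunked is a generator, each chunk can be sent as soon as it is available, without ever knowing the total length.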
However, it is usually better to avoid transfer-encoding: chunked and instead set a content-length header. The receiver can use this in various ways, such as to allocate the resources it needs at the start of downloading the body, or to estimate the time remaining. If the receiver needs to know how many bytes are in the body, using transfer-encoding: chunked may force it to buffer the entire body in memory before it can process it further.
Wonderfully, you can often still stream bodies with a correctly set content-length, but you may need to go to a bit of effort to find the right value. For example, to stream a file you may need to query the file system explicitly to find the length of the file before starting to fetch its bytes.
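The file case might look something like the sketch below, which asks the file system for the size first and only then reads the file in pieces. The name file_response and the 64 KiB default are assumptions for illustration, and the sketch ignores the possibility of the file changing size between the two steps.

```python
import os


def file_response(path, chunk_size=65536):
    """Return response headers and a generator that streams the file's bytes."""
    # Ask the file system for the size up front, without reading the file
    length = os.path.getsize(path)
    headers = [("content-length", str(length))]

    def body():
        with open(path, "rb") as f:
            # Only one chunk is held in memory at any time
            while chunk := f.read(chunk_size):
                yield chunk

    return headers, body()
```

The headers can be sent immediately, and the body generator consumed chunk by chunk as the connection accepts more bytes.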
HTTP/1.1 by default keeps connections open after an HTTP request/response, so they can be used for subsequent requests/responses and avoid the overhead of new TCP [or TCP+TLS] connections. These are referred to as persistent connections, and are often a good thing, but have downsides.
Usually servers only keep connections alive for a certain period of time, and then close them. This means there is a race condition: a server could have closed the connection from its point of view, while the client, not yet aware of this, attempts to re-use the connection and sends its bytes [which the server won't process]. Only some time later would the client become aware of an error condition. The client may not know whether it's safe to retry the request. For example, a client may have no way of determining whether a POST errored before or after it was processed by the server. If designing an API, you may wish to implement some sort of unique idempotency-key for such requests. With this, the client can safely retry requests that have failed from its point of view, while the server knows not to reprocess any duplicates, and can still return the response corresponding to the original request.
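One way the server side of such a scheme could work is sketched below. Everything here is hypothetical: the PaymentServer class, the idempotency-key header name, and the in-memory dictionary, which a real system would replace with shared, durable storage.

```python
import uuid


class PaymentServer:
    """Hypothetical server that remembers its responses by idempotency key."""

    def __init__(self):
        self._responses = {}
        self.payments_made = 0

    def handle_post(self, headers, body):
        key = headers["idempotency-key"]
        if key in self._responses:
            # A retry of a request already processed: replay the response
            return self._responses[key]
        self.payments_made += 1
        response = (201, f"payment {self.payments_made} created")
        self._responses[key] = response
        return response


server = PaymentServer()
# The client generates the key once per logical request, and re-uses it on retry
headers = {"idempotency-key": str(uuid.uuid4())}
first = server.handle_post(headers, b"amount=10")
retry = server.handle_post(headers, b"amount=10")  # e.g. after a connection error
```

The retry gets the same response as the first attempt, and only one payment is made.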
Another downside is that if you don't end up re-using the connection, resources would continue to be used needlessly on both the client and the server.
If you want a smaller chance of issues like this, you may explicitly set a connection: close header. If you can deal with such issues, you may wish to design the system to take better advantage of persistent connections. For example, instead of having multiple S3 buckets each on a different domain, you could choose to have one, to take better advantage of per-domain HTTP persistent connections and speed up S3 requests/responses.
The x-forwarded-proto header is a de facto standard rather than part of HTTP itself: it is often added to requests by HTTP-aware intermediate CDNs or reverse proxies. If the proxy has received an HTTPS connection, it adds x-forwarded-proto: https, and otherwise adds x-forwarded-proto: http.
Without this header, the application server behind a reverse proxy would have no mechanism to know if the client made its request via HTTP or HTTPS. This may be important if you would like to respond to HTTP requests with redirects to HTTPS URLs.
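A sketch of the redirect case, assuming the application only ever receives requests through a proxy that sets x-forwarded-proto, and that headers arrive as a dictionary with lower-case names. The function name https_redirect and the choice of a 301 status are illustrative.

```python
def https_redirect(headers, target):
    """Return a (status, headers) redirect for plain HTTP requests, else None."""
    # Assumes a trusted proxy always sets x-forwarded-proto
    if headers.get("x-forwarded-proto") == "https":
        return None
    location = f"https://{headers['host']}{target}"
    return 301, [("location", location)]


redirect = https_redirect(
    {"host": "example.com", "x-forwarded-proto": "http"}, "/the/path"
)
```

A request that already arrived over HTTPS gets None back, and is handled as normal.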
Often an application server would like to know the IP address of the client. However, if the client connects to a reverse proxy, and then the reverse proxy connects to the application server, the application server only has details of that final TCP connection. From its point of view, its TCP client is the reverse proxy. This is often not helpful.
The solution to this is that each intermediate proxy adds (to) the x-forwarded-for header in the request, setting the IP address that its incoming TCP connection is from. If there is already an x-forwarded-for header on its incoming HTTP request, it appends the IP address to this in a comma-separated list before forwarding the HTTP request onwards.
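What each proxy does can be sketched in a few lines. The function name add_forwarded_for is hypothetical, headers are assumed to be a dictionary with lower-case names, and the IP addresses are from ranges reserved for documentation.

```python
def add_forwarded_for(headers, peer_ip):
    """Return a copy of the request headers with peer_ip added to x-forwarded-for."""
    forwarded = dict(headers)
    existing = forwarded.get("x-forwarded-for")
    if existing:
        # Append to whatever was already there, trusted or not
        forwarded["x-forwarded-for"] = f"{existing}, {peer_ip}"
    else:
        forwarded["x-forwarded-for"] = peer_ip
    return forwarded


# First proxy sees the client's IP, second proxy sees the first proxy's IP
after_first = add_forwarded_for({"host": "example.com"}, "203.0.113.7")
after_second = add_forwarded_for(after_first, "198.51.100.4")
```

After two proxies, the header value is 203.0.113.7, 198.51.100.4.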
This means that the application server can receive an x-forwarded-for header with a long list of IP addresses in it, for example x-forwarded-for: 188.8.131.52, 184.108.40.206, 220.127.116.11. Because each proxy adds to the value of any existing x-forwarded-for header supplied by a potentially untrustworthy client, care must be taken before trusting any particular value in this list.
For example, you may have an application accessible behind a CDN, which adds an x-forwarded-for header, so in the application server you may be tempted to trust the first IP in x-forwarded-for. However, the CDN would append to any existing values in x-forwarded-for. This means that an evil client can send a request with an existing x-forwarded-for header, set to some IP, and trick the application into thinking the client is at that IP. Knowing this, you may choose to use the last IP address in the list, thinking that this can be trusted. However, this may also not be a good choice: often applications are accessible both via the CDN and directly, even if just via an IP address. An evil client could connect directly with a spoofed x-forwarded-for header, and again trick the application.
Solutions to this trust issue involve only using the last N values of x-forwarded-for, where you have a mechanism to ensure that those N hops a) definitely involved certain infrastructure and b) you trust that infrastructure to manipulate any existing x-forwarded-for in a certain way.
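A sketch of the "last N values" approach is below. It assumes the application can only be reached through exactly trusted_hops proxies that you control or trust, and that each of them appends to x-forwarded-for as described above. The function name client_ip is made up, and the IP addresses are from documentation ranges.

```python
def client_ip(x_forwarded_for, trusted_hops):
    """Return the IP appended by the outermost trusted proxy, or None."""
    ips = [ip.strip() for ip in x_forwarded_for.split(",") if ip.strip()]
    if trusted_hops < 1 or len(ips) < trusted_hops:
        # Fewer entries than trusted proxies: the request did not come via them
        return None
    # Anything before the last trusted_hops entries was supplied by the client
    return ips[-trusted_hops]


# The client sent a spoofed 10.0.0.1; the single trusted CDN appended the real IP
ip = client_ip("10.0.0.1, 203.0.113.7", trusted_hops=1)
```

In this example ip is 203.0.113.7: the spoofed value at the start of the list is ignored.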
Summary: Reconstructing URLs
Reconstructing the URL that a client used involves multiple parts of the HTTP request: the path in the request-line, the host header, as well as the x-forwarded-proto header. For all this to work, intermediate proxies must be appropriately configured.
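Put together, a best-effort reconstruction might look like the sketch below. The function name reconstruct_url is hypothetical, headers are assumed to be a dictionary with lower-case names, and falling back to http when x-forwarded-proto is missing is an assumption rather than a rule.

```python
def reconstruct_url(request_line, headers):
    """Best-effort rebuild of the URL the client requested."""
    # The request-line looks like: GET /the/path HTTP/1.1
    _method, target, _version = request_line.split(" ")
    # Falling back to http when the header is missing is an assumption
    scheme = headers.get("x-forwarded-proto", "http")
    return f"{scheme}://{headers['host']}{target}"


url = reconstruct_url(
    "GET /the/path HTTP/1.1",
    {"host": "example.com", "x-forwarded-proto": "https"},
)
```

This is only as reliable as the proxies in front of the application: if one of them rewrites host or omits x-forwarded-proto, the result will not match what the client used.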
Summary: Streaming
HTTP is often enough for streaming: you may not need anything fancier. If you can determine the full length of the body, set the content-length header; otherwise, use transfer-encoding: chunked.
Summary: HTTP is leaky
"All non-trivial abstractions, to some degree, are leaky." (Joel Spolsky)
HTTP is a leaky abstraction, exposing information on the lower-level TCP [or TCP+TLS] connection via the x-forwarded-* headers; giving the ability to control that connection via the connection header; and requiring one of the content-length or transfer-encoding headers to make up for the fact that TCP doesn't have any concept of message length.
If you want to take full advantage of HTTP, you should be aware of these; compensate for them; and even be able to leverage them when needed to avoid unnecessary time, memory, code, or infrastructure use.