HTTP Desynchronisation attacks are a type of attack within the wider class of exploits known as HTTP request smuggling attacks, which aim to cause confusion as to where the boundaries of individual HTTP requests begin and end. These attacks permit an attacker to merge two HTTP requests, insert new HTTP requests, or cause devices to drop/delete existing HTTP requests made by legitimate users.
In this article, we’ll start with a brief refresher on how HTTP requests are constructed, and look at the underlying transport mechanisms that support them, in order to understand how attacks can subvert them and cause unexpected and unintended behaviour to the benefit of attackers. Lastly, we’ll look at how the vulnerabilities that enable these attacks can best be detected and mitigated.
Every time you make a request to a webserver over HTTP, the data and metadata that comprise the request are “packaged” and passed to a lower level protocol within the network stack. At a high level you have an abstracted software layer known as the application layer – typically the HTTP requests at this layer are handled by your browser. They are encapsulated at lower and lower levels of abstraction, ending up as binary streams (“1010101010001”) that are ultimately encoded as electronic or light-based signals and transmitted via a physical medium such as cable or radio wave.
This may seem a bit abstract, but the reason why this is important will become apparent shortly. Critically in relation to HTTP Desynchronisation attacks, HTTP requests are transmitted over TCP (“Transmission Control Protocol”) connections.
When you type in a URL such as http://www.example.com/index.html to your browser, the request is broken into discrete parts by the browser so that the request can be submitted – it is broken into constituent elements of:
1. The HTTP Method or verb that describes what type of operation should be performed (e.g GET/POST);
2. the path (e.g “/index.html”) that describes the web document or resource to be retrieved or acted upon;
3. the HTTP protocol and version that should be used (e.g “HTTP v 1.1”);
4. Zero or more headers, such as the “Host” header that describes which host the request should be passed to on a webserver serving multiple hosts; and
5. a blank line to indicate the end of the request.
GET /index.html HTTP/1.1
Host: www.example.com
In the first version of HTTP (v1.0), individual requests were viewed as isolated, standalone entities. Whilst the “Connection: close” header could optionally be sent to perform a graceful connection closure, there was still a one-to-one mapping between HTTP requests and TCP connections – that is, a TCP connection could only support a single HTTP request. Specifically, the “Connection: close” header refers to the underlying TCP channel or connection, not just the HTTP request.
Opening a TCP connection has overheads for a server and client, which leads to delay on making requests. When clients made single requests for basic static resources in the early days of the web, the restriction made sense. However in increasingly complex web documents that load resources such as images, client-side scripts, CSS files and other resources, the restriction was problematic. An update to HTTP/1.1 therefore introduced support for sending multiple HTTP requests over a single underlying TCP socket. The protocol is extremely simple – HTTP requests are simply placed back to back on the TCP connection, and the server that receives the requests parses headers to figure out where each HTTP request on the channel ends and the next one starts.
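To make the parsing step concrete, here is a minimal, illustrative sketch (not a production parser) of how a server might split two back-to-back requests off a single byte stream using the Content-Length header; the paths and hostnames are hypothetical:

```python
def split_pipelined(buffer: bytes):
    """Split back-to-back HTTP/1.1 requests out of one connection's byte stream."""
    requests = []
    while buffer:
        # Headers end at the first blank line (CRLF CRLF).
        head_end = buffer.index(b"\r\n\r\n") + 4
        head = buffer[:head_end]
        # Body length comes from the Content-Length header (0 if absent).
        length = 0
        for line in head.split(b"\r\n"):
            if line.lower().startswith(b"content-length:"):
                length = int(line.split(b":", 1)[1])
        requests.append(buffer[:head_end + length])
        buffer = buffer[head_end + length:]
    return requests

# Two pipelined requests placed back to back on one TCP connection:
stream = (b"POST /a HTTP/1.1\r\nHost: example.com\r\nContent-Length: 5\r\n\r\nhello"
          b"GET /b HTTP/1.1\r\nHost: example.com\r\n\r\n")
reqs = split_pipelined(stream)
# reqs[0] is the POST (ending in "hello"); reqs[1] is the GET.
```

Note that the parser has nothing but the headers to go on when deciding where one request ends – a theme that becomes important later in this article.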
The process is known as HTTP persistent connections, HTTP keep-alive, or HTTP connection reuse, and essentially allows a single TCP connection to multiplex or interleave multiple HTTP requests. If the client supports keep-alive, it adds an additional header to the request:

Connection: keep-alive
For a single user connecting to a single server, the operation is straightforward: the user’s requests are sent back to back over one dedicated connection, and the responses are returned in the same order.
However, in modern web service infrastructures within organisations, it’s not just remote users who act as “clients”. So-called “front-end” devices such as Reverse Proxies, Load Balancers, Web Application Firewalls (WAFs) and Content Delivery Networks (CDNs) may terminate the user’s request, and then open a separate request to a “back-end” resource (the origin server). In this scenario, the front-end device is acting both as a server (to the requesting user) but also as a client (to the back-end or origin server).
Critically, this means that the interleaving of requests can also happen for connections made from the front-end to back-end devices within an organisation. And unlike the simple scenario of a multiplexed connection made by a requesting user, these devices will be multiplexing requests from different remote users onto a single TCP connection.
Alarm bells may be ringing already here for you – the scenario described means that the TCP connection established is no longer unique to a specific user; the channel is now a shared or multi-tenanted medium. Keep-alive additionally makes it difficult for our front-end device to communicate (and for our back-end device to determine) where one request ends and the next begins during pipelined HTTP operation.
If we recall from a little earlier in the article, the Connection: close header was used in HTTP 1.0 to indicate that a TCP connection should be closed, but it cannot be used any more: we do not want the (multiplexed) connection to close when our HTTP request completes, since the connection is servicing other users’ requests. Closing the TCP connection would now be like taking a shared highway in use by hundreds of cars and making it suddenly vanish from underneath them just because our own journey along it was completed.
To determine where HTTP requests end on a pipelined connection, the HTTP 1.1 specifications allow the boundaries of requests to be indicated in one of two ways. The first method is to use the request body length given by a Content-Length (CL) header. This header provides the length of the request body that follows in octets (8-bit bytes), e.g:

POST /submit HTTP/1.1
Host: www.example.com
Content-Length: 11

hello world
However, there is a limitation with the Content-Length header: whilst it works well for resources whose size we already know in advance (e.g a transmitted image), it cannot be used in dynamic situations such as content streaming, where the total resource size is not known in advance.
To solve this problem, HTTP 1.1 introduced a second mechanism known as “chunked transfer encoding”. This allows a potentially indefinite-length resource such as a live stream to be broken into discrete chunks, on the basis that we may not know the total stream size, but we do know the size of the data from the stream waiting to be transmitted right now. In a chunked message, the body consists of zero or more chunks, terminated by a zero-length “last chunk” that marks the end of the message. Each chunk consists of the chunk size in hexadecimal, followed by a CRLF newline (“\r\n”), followed by the chunk contents and a further CRLF. For example:

4
Wiki
5
pedia
0
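As an illustrative sketch (simplified: no trailers or chunk extensions), decoding a chunked body might look like this:

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode a chunked-transfer-encoded body into its raw content."""
    body = b""
    while True:
        line_end = data.index(b"\r\n")
        size = int(data[:line_end], 16)        # chunk size is hexadecimal
        if size == 0:                          # zero-length "last chunk" ends the message
            return body
        start = line_end + 2                   # skip the size line's CRLF
        body += data[start:start + size]
        data = data[start + size + 2:]         # skip chunk data and its trailing CRLF

encoded = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
decoded = decode_chunked(encoded)
# decoded == b"Wikipedia"
```

The key point for what follows is that the end of a chunked message is signalled in-band by the zero-length chunk, entirely independently of any Content-Length header.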
The correct implementation of RFC protocols is important in order to ensure that services can interact with one another in a common language and provide expected responses to requests. However there are some complicated edge cases in the HTTP RFC standards relating to pipelining that may be misunderstood or badly coded by developers – in browser software, front-end devices (e.g load balancers) or back-end services (e.g webservers). By itself, this is often relatively harmless or swiftly detected and fixed.
However, in the instance where a front-end and back-end server differentially interpret the use of Content-Length and/or Transfer-Encoding, it is possible for request boundaries to be misinterpreted, and for requests from different users to be “jumbled up” – either mixed with one another, split, or ignored.
If a back-end server interprets the request boundaries differently to the front-end load balancer or other system – that is, disagrees about where each HTTP request message ends within the TCP connection – an attacker is able to leverage this by sending an ambiguous message which gets interpreted as two distinct HTTP requests by the back-end. For the vulnerability to be present, the front-end and back-end servers need to use different methods to determine where the HTTP requests they receive end. Potentially, this enables an attacker to bypass security measures, interfere with other user sessions, and gain unauthorised access to sensitive data.
There are a few different variants of HTTP Desynchronisation attacks. They all rely on differential or incorrect interpretation of ambiguous HTTP request headers, but each is slightly different. Some are hypothetical and not yet known to be exploitable on specific devices, but attackers may try several variants. We’ll take a look at two of the more common examples that major websites have been known to be vulnerable to in the past:
The HTTP specification does permit the use of multiple HTTP request headers with the same key (name) but different values. Attackers may submit requests containing two Content-Length headers with different values, with the intent that a front-end device (e.g a WAF) makes its request boundary decision based on one header whilst the origin server picks the other. If this occurs, then the front-end and back-end “see” different request payloads. Specifically, if the WAF reads the first, larger Content-Length value, it treats the trailing data as part of the request body and sees a single complete request; if the origin server reads the second, smaller value, the trailing data is left on the connection and treated as the start of the next request. For example:
POST / HTTP/1.1
Host: vulnerable-website.com
Content-Length: 6
Content-Length: 5

12345G

That is, the “G” character is differentially interpreted. The WAF (honouring Content-Length: 6) reads it as the final byte of the request body, whereas the origin server (honouring Content-Length: 5) leaves it on the connection, where it is prepended to the next request – turning the next user’s “POST / HTTP/1.1” into a request with the invalid method “GPOST”.
An alternative variant uses both a Content-Length and Transfer-Encoding header, which is ambiguous since these are two competing schemes and the split between requests only occurs in one of the two:
POST / HTTP/1.1
Host: vulnerable-website.com
Content-Length: 6
Transfer-Encoding: chunked

0

G

In this example, if the “Content-Length” header is interpreted as authoritative, the six-byte request body (“0”, CRLF, CRLF, “G”) is consumed in full and the message is seen as a single complete POST request. However, if the chunked Transfer-Encoding is interpreted as authoritative and the Content-Length ignored, the body ends at the zero-length chunk, and the trailing “G” is left on the connection to be prepended to the following request – again producing the invalid “GPOST” method.
The ability to influence HTTP request boundaries gives the attacker the unprecedented ability to directly influence the content of other users’ requests, by – for example – prepending content and headers at the start of the next legitimate user’s request. The attack tricks the back-end server into splicing the content of an attacker’s malicious request into the content of the victim’s request. Importantly, the attack is possible even when the victim is connected to the server using transport encryption such as SSL/TLS, since the channel between user and front-end remains secure, and the request is only modified when “unpacked” and parsed by servers within the organisation serving the web application.
A significant defence against Desynchronisation attacks is to upgrade the connectivity between the front-end and back-end services to HTTP/2 (v2 of the protocol). This uses a framed structure that removes ambiguity regarding where HTTP messages end. Importantly, this change is transparent to users since it only occurs on screened connections within the organisation’s network, between their front-end and back-end servers.
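To see why HTTP/2 framing removes the ambiguity, consider its fixed 9-octet frame header, whose first three octets state the exact payload length – there is nothing for two parsers to disagree about. A minimal sketch of reading that header (the example frame bytes are hypothetical):

```python
import struct

def parse_frame_header(header: bytes):
    """Parse the fixed 9-octet HTTP/2 frame header: 24-bit length,
    8-bit type, 8-bit flags, 31-bit stream identifier."""
    length_hi, length_lo, ftype, flags, stream_id = struct.unpack("!BHBBI", header)
    length = (length_hi << 16) | length_lo
    return length, ftype, flags, stream_id & 0x7FFFFFFF

# A hypothetical DATA frame header: 5-byte payload, type 0x0, stream 1.
hdr = bytes([0x00, 0x00, 0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01])
length, ftype, flags, stream = parse_frame_header(hdr)
# length == 5, ftype == 0, stream == 1
```

Because every frame declares its own length up front in binary, a receiver always knows exactly how many bytes belong to the current message before the next one starts.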
Where this cannot be done due to legacy or third-party systems that do not support HTTP/2, an alternative is to configure front-end systems such as WAFs, load balancers, CDNs and reverse proxies to drop ambiguous requests rather than routing them on to the back end – for example, requests containing duplicate Content-Length headers, or both a Content-Length and a Transfer-Encoding header.
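Hypothetically, such an ambiguity check at the front-end might look something like the sketch below (the function name and exact policy are illustrative assumptions, not any particular vendor’s implementation):

```python
def is_ambiguous(headers):
    """Return True if a request's framing headers are ambiguous and it
    should be dropped rather than forwarded to the back-end."""
    names = [name.lower().strip() for name, _ in headers]
    cl_count = names.count("content-length")
    has_te = "transfer-encoding" in names
    if cl_count > 1:              # duplicate Content-Length headers
        return True
    if cl_count >= 1 and has_te:  # two competing framing schemes
        return True
    return False

reject_cl_te = is_ambiguous([("Content-Length", "6"), ("Transfer-Encoding", "chunked")])
reject_cl_cl = is_ambiguous([("Content-Length", "6"), ("Content-Length", "5")])
allow_normal = is_ambiguous([("Host", "example.com"), ("Content-Length", "11")])
# reject_cl_te and reject_cl_cl are True; allow_normal is False.
```

Rejecting outright, rather than attempting to normalise the request, avoids the risk of the front-end’s normalisation itself disagreeing with the back-end’s parsing.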
Lastly, an option that can be considered as a final alternative if the above approaches are not feasible for an organisation for one reason or another is to disable pipelined (keep-alive) connections between front-end and back-end services, so that each request is sent over a separate TCP connection; with no shared connection to poison, smuggled malicious requests would no longer be effective. However this would likely have a significant performance impact and should be carefully considered before implementing.
AppCheck helps you provide assurance across your entire organisation’s security footprint. AppCheck performs comprehensive checks for a massive range of web application vulnerabilities from first principles to detect vulnerabilities in in-house application code. AppCheck also draws on checks for known infrastructure vulnerabilities in vendor devices and code from a large database of known and published CVEs. The AppCheck Vulnerability Analysis Engine provides detailed rationale behind each finding including a custom narrative to explain the detection methodology, verbose technical detail and proof of concept evidence through safe exploitation.
As always, if you require any more information on this topic or want to see what unexpected vulnerabilities AppCheck can pick up in your website and applications then please get in contact with us: email@example.com