HTTP/2
notes date: 2017-02-16
source date: 2016-10-13
-
Issues with HTTP/1.x
- Reducing perceived latency with HTTP/1.x required tricks
- e.g.
- spriting of images
- concatenation of CSS/JS/images
- inlining of CSS/JS/images
- domain sharding
- these tricks all reduce the number of sequential requests per domain, which decreases perceived latency.
- but they also increase complexity for server operators
- Each has side-effects (increased computational overhead at clients; decreased cache effectiveness; increased traffic use on the network)
- e.g.
- Does not compress request and response headers, causing unnecessary network traffic
- Does not allow resource prioritization
-
Binary Framing Layer
- During transit, a binary encoding mechanism is injected between the socket interface and the higher HTTP API exposed to applications
- What in HTTP/1.x is a newline-separated header+body pair (for requests and responses alike) is in HTTP/2 a headers frame and a data frame.
-
Streams, Messages, and Frames
- Terminology
- Stream: a bidirectional flow of bytes within an established connection, which may carry one or more messages
- Message: A complete sequence of frames that map to a logical request or response message
- Frame: the smallest unit of communication in HTTP/2; each frame contains a frame header, which at a minimum identifies the stream to which the frame belongs.
- All communication is performed over a single TCP connection that can carry any number of bidirectional streams.
- Because frames declare the stream to which they belong, they may be interleaved in transmission and correctly reassembled when received.
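Since every frame names its stream, a receiver only needs the fixed 9-octet frame header to demultiplex interleaved frames. A minimal parsing sketch (field layout per RFC 7540, section 4.1; the example bytes are fabricated for illustration):

```python
def parse_frame_header(data: bytes):
    """Parse the fixed 9-octet HTTP/2 frame header."""
    if len(data) < 9:
        raise ValueError("need at least 9 bytes for a frame header")
    # 24-bit payload length, 8-bit type, 8-bit flags, then 1 reserved bit + 31-bit stream id
    length = int.from_bytes(data[0:3], "big")
    frame_type, flags = data[3], data[4]
    stream_id = int.from_bytes(data[5:9], "big") & 0x7FFFFFFF  # clear the reserved bit
    return length, frame_type, flags, stream_id

# A HEADERS frame (type 0x1) with a 16-byte payload, END_HEADERS flag (0x4), on stream 1
header = (16).to_bytes(3, "big") + bytes([0x1, 0x4]) + (1).to_bytes(4, "big")
print(parse_frame_header(header))  # (16, 1, 4, 1)
```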
-
Request and Response Multiplexing
- With HTTP/1.x
- clients wishing to make multiple requests to improve performance must use different TCP connections to do so.
- This results in head-of-line blocking in each TCP connection
- This also makes TCP inefficient (e.g., congestion control does not know that two connections are to the same host)
-
Stream prioritization
- With messages being splittable and frames interleavable, order of interleaving becomes a critical performance consideration.
- HTTP/2 allows each stream to declare a weight and a dependency
- Each stream may be assigned an integer weight between 1 and 256
- Each stream may be given an explicit dependency on another stream.
- From this, a client can express a ‘prioritization tree’ indicating how it would prefer to receive responses.
- This lets the server allocate CPU, memory, bandwidth, etc to ensure optimal delivery
- The prioritization tree
- If stream C declares dependency on stream D, then D should be processed and delivered before C.
- Assuming no outstanding dependencies, servers should assign priority to each stream based on the stream’s weight divided by the sum of weights of itself and its siblings.
- Prioritization is a transport preference expressed by the client; servers need not abide by it. For example, the server should not be blocked from making progress on a lower-priority resource if a higher-priority resource is stalled.
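The weight rule above can be shown with a toy calculation. This is only a sketch of the proportional split among siblings with no outstanding dependencies; the stream names and weights are made up for illustration:

```python
def bandwidth_shares(weights):
    """Split available capacity among sibling streams in proportion to weight."""
    total = sum(weights.values())
    return {stream: w / total for stream, w in weights.items()}

# Two sibling streams: A with weight 12, B with weight 4.
# A should receive 12/16 of the resources, B should receive 4/16.
shares = bandwidth_shares({"A": 12, "B": 4})
print(shares)  # {'A': 0.75, 'B': 0.25}
```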
-
One Connection per Origin
- Multiplexing permits HTTP/2 to get by on only one persistent connection per origin.
- For HTTP/1.x, 74% of active connections carry a single transaction, and persistent connections are mostly a waste
- For HTTP/2, 25% of active connections carry a single transaction
- Most HTTP transfers are short and bursty, whereas TCP is optimized for long-lived, bulk data transfers.
- Reusing the same connection permits TCP’s congestion control to shine.
- Reusing existing connections also lowers connection creation, maintenance, and teardown overhead on client, intermediaries, and origin alike.
- Reusing connections also reduces the number of costly TLS handshakes for HTTPS traffic.
-
Flow Control
- Flow control: a mechanism by which receivers can tell senders to back off
- TCP offers its own flow control, but because the HTTP/2 streams are multiplexed within a single TCP connection, TCP flow control is both not granular enough, and does not provide the necessary application-level APIs to regulate the delivery of individual streams.
- HTTP/2 provides building blocks so that server and client can implement their own flow control at stream- and connection-level:
- Flow control is directional. Each receiver may choose any window size for each stream and for the connection as a whole.
- Flow control is credit-based. Each receiver advertises its initial connection and stream flow control windows (in bytes), which shrink whenever the sender emits a DATA frame and grow when the receiver sends a WINDOW_UPDATE frame.
- When an HTTP/2 connection is established, the client and server exchange SETTINGS frames, which set the flow control window sizes in both directions (default is 2^16 - 1 = 65,535 bytes; maximum is 2^31 - 1)
- Flow control is hop-by-hop, not end-to-end, so intermediaries can set their own utilization policies
- When running h2 (HTTP/2 over TLS), intermediaries on the network cannot see the frames and so cannot apply their own flow control; however, flow control policies can be set or propagated at any point after encryption is stripped. With h2c (HTTP/2 over unencrypted TCP), intermediate hosts can inject their own flow control directives. http://stackoverflow.com/questions/40747040/how-is-http-2-hop-by-hop-flow-control-accomplished#40750973
- Application-layer flow control means the browser can fetch only a part of a particular resource, set the window to 0, and wait to open it up again.
- This gives options for rendering low-res versions of images with high priority and delaying download of high-res versions until after the main paint is complete
- This also means clients can manage download speed for video: small windows trickle data in when buffers are nearly full; large windows open the firehose.
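The credit-based accounting described above can be sketched in a few lines. This is an illustrative model, not a real HTTP/2 implementation; the class and method names are invented, but the default and maximum window sizes match the protocol:

```python
class FlowControlWindow:
    """Credit-based window for one direction of one stream (or of the connection).

    DEFAULT matches HTTP/2's initial window of 2**16 - 1 = 65,535 bytes.
    """
    DEFAULT = 2**16 - 1

    def __init__(self, size: int = DEFAULT):
        self.size = size

    def consume(self, n: int) -> None:
        """The sender emitted a DATA frame of n bytes; shrink the credit."""
        if n > self.size:
            raise RuntimeError("flow-control violation: window exceeded")
        self.size -= n

    def window_update(self, increment: int) -> None:
        """The receiver sent WINDOW_UPDATE; restore credit (capped at 2**31 - 1)."""
        self.size = min(self.size + increment, 2**31 - 1)

window = FlowControlWindow()
window.consume(65535)        # window exhausted; sender must now wait
window.window_update(1024)   # receiver grants more credit
print(window.size)           # 1024
```

Setting the window to 0 and later sending a WINDOW_UPDATE is exactly the "fetch part of a resource, pause, resume" pattern described above.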
-
Server Push
- HTTP/2 server push lets servers offer promises to clients
- “Here, you’ll need this too”: This can be used to provide multiple assets directly in response to a single request
- “The world has changed”: This can be used to push updates to clients
- This exists to overcome some problems with HTTP/1.x as she is used
- Clients needing an asset from the server only know so (and only fire off the request) once they’ve been able to process the part of the response body that references the asset.
- This synchronous behavior is a bottleneck, and typically gets worked around with inlining, which itself hinders caching.
- Initiating relevant server pushes early in the processing of a request can make asset retrieval asynchronous (but low-priority) with respect to the main request.
-
PUSH_PROMISE 101
- Server push streams are initiated with PUSH_PROMISE frames.
- PUSH_PROMISE frames for dependencies of another “main request’s” response should arrive at the client before the DATA frames of the “main request’s” response, so that clients don’t create duplicate requests for these resources.
- Clients receiving a PUSH_PROMISE frame can decline the stream (via an RST_STREAM frame), useful if the resource is already in cache.
- Each pushed resource is a standard stream. One restriction on pushed resources (which does not apply to ordinary responses) is that they must obey the same-origin policy.
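The accept-or-decline decision a client makes on each PUSH_PROMISE can be sketched as below. This is a hypothetical model: `cache` and the `send_rst_stream` callback are stand-ins, not a real client API; only the RST_STREAM/CANCEL behavior comes from the protocol:

```python
CANCEL = 0x8  # RST_STREAM error code CANCEL (RFC 7540, section 7)

def handle_push_promise(promised_stream_id, promised_path, cache, send_rst_stream):
    """Accept a promised push, or decline it if the resource is already cached."""
    if promised_path in cache:
        # Already have it: reset the promised stream to save bandwidth.
        send_rst_stream(promised_stream_id, CANCEL)
        return False
    cache[promised_path] = None  # reserve; the body arrives on the promised stream
    return True

cache = {"/style.css": "cached bytes"}
rejected = []
accepted = handle_push_promise(2, "/style.css", cache,
                               lambda sid, code: rejected.append(sid))
print(accepted, rejected)  # False [2]
```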
-
Header Compression
- HTTP/1.x headers consume 500-800 bytes per request, sometimes kilobytes more if cookies are used.
- HTTP/2 compresses header metadata using the HPACK compression format, which
- allows the transmitted header fields to be encoded with a static Huffman code
- requires both client and server to maintain an indexed table of previously seen header fields, which the protocol keeps in sync.
- Headers defined in the specification live in a static table; custom/non-specification headers are tracked in a dynamic table maintained per connection.
- As a convenience/annoyance, HTTP/2 leaves basically the same headers, except
- all header field names are lowercase
- the request line is split into separate pseudo-header fields with weird names (':method', ':scheme', ':authority', ':path')
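The static/dynamic indexing idea can be illustrated with a toy encoder. This is not the real HPACK wire format (which also Huffman-codes literals and orders the dynamic table differently); it only shows how a repeated header shrinks to a single index:

```python
# Three real entries from HPACK's static table, for illustration.
STATIC_TABLE = [(":method", "GET"), (":scheme", "https"), (":path", "/")]

class HeaderTable:
    def __init__(self):
        self.dynamic = []  # per-connection; both peers update it in lockstep

    def encode(self, name, value):
        entry = (name.lower(), value)  # header field names are lowercase in HTTP/2
        table = STATIC_TABLE + self.dynamic
        if entry in table:
            return ("indexed", table.index(entry) + 1)  # HPACK indices are 1-based
        self.dynamic.append(entry)  # literal with incremental indexing
        return ("literal", entry)

t = HeaderTable()
print(t.encode("x-custom", "abc"))  # first time: sent as a literal
print(t.encode("x-custom", "abc"))  # next time: collapses to an index
print(t.encode(":method", "GET"))   # spec-defined header: static-table index 1
```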
-
Further Reading
- HTTP/2 – The full article by Ilya Grigorik
- Setting up HTTP/2 – How to set up HTTP/2 in different backends by Surma
- HTTP/2 is here, let’s optimize! – Presentation by Ilya Grigorik from Velocity 2015
- Rules of Thumb for HTTP/2 Push – An analysis by Tom Bergan, Simon Pelchat and Michael Buettner on when and how to use push.