My Ramblings

If you are reading this you must be pretty bored…

High Performance Web Sites: Essential Knowledge for Front-End Engineers

Steve Souders is the leader in web performance, and this book was the first of its kind to identify the bottlenecks that are common and easily avoided by those developing web-based applications. There are a ton of good tips and tricks that are useful for front-end developers as well as the operations folks responsible for back-end systems. If you are in the field of web operations, this is a must-read, because it is critical that you understand the challenges in getting optimal performance out of your application.

Here are the notes I jotted down so that I can always come back for a refresher when needed.

  • Looking at the HTTP traffic in this way, we see that at least 80% of the end user response time is spent on the components in the page.
  • Parallel requests don’t happen during requests for scripts. That’s because in most situations, browsers block additional HTTP requests while they download scripts.
  • If we were able to cut backend response times in half, the end user response time would decrease only 5–10% overall. If, instead, we cut frontend response times in half, we would reduce overall response times by 40–45%.
  • Only 10–20% of the end user response time is spent downloading the HTML document. The other 80–90% is spent downloading all the components in the page.
  • If the browser has a copy of the component in its cache, but isn’t sure whether it’s still valid, a conditional GET request is made. If the cached copy is still valid, the browser uses the copy from its cache, resulting in a smaller response and a faster user experience.
  • If the component has not been modified since the specified date, the server returns a “304 Not Modified” status code and skips sending the body of the response, resulting in a smaller and faster response.
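To make that conditional GET flow concrete, here is a minimal Node.js sketch of my own (not from the book) that validates against a file's last-modified time; the file name and port are placeholders:

    // Conditional GET sketch: reply 304 when the browser's copy is still valid.
    const http = require('http');
    const fs = require('fs');

    http.createServer((req, res) => {
      const stat = fs.statSync('logo.png');          // hypothetical component
      const lastModified = stat.mtime.toUTCString();

      // If the cached copy is still valid, send headers only, no body.
      if (req.headers['if-modified-since'] === lastModified) {
        res.writeHead(304);
        return res.end();
      }

      res.writeHead(200, { 'Content-Type': 'image/png', 'Last-Modified': lastModified });
      fs.createReadStream('logo.png').pipe(res);
    }).listen(8080);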
  • The Expires header eliminates the need to check with the server by making it clear whether the browser can use its cached copy of a component.
  • As long as the component hasn’t expired, the browser uses the cached version and avoids making any HTTP requests.
  • Persistent Connections (also known as Keep-Alive in HTTP/1.0) were introduced to solve the inefficiency of opening and closing multiple socket connections to the same server. They let browsers make multiple requests over a single connection.
  • The browser or server can close the connection by sending a Connection: close header. Technically, the Connection: keep-alive header is not required in HTTP/1.1, but most browsers and servers still include it.
  • A simple way to improve response time is to reduce the number of components, and, in turn, reduce the number of HTTP requests.
  • To use CSS sprites, multiple images are combined into a single image.
  • The HTML element is positioned over the desired part of the background image using the CSS background-position property.
  • If you use a lot of images in your pages for backgrounds, buttons, navbars, links, etc., CSS sprites are an elegant solution that results in clean markup, fewer images to deal with, and faster response times.
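To illustrate the sprite trick, here is a small sketch of my own (the sprite file name and pixel offsets are made up); the same rules would normally live in a stylesheet rather than in JavaScript:

    // One combined image; each icon is revealed by shifting background-position.
    function showIcon(el, xOffset) {
      el.style.width = '16px';
      el.style.height = '16px';
      el.style.background = 'url(sprites.png) no-repeat'; // hypothetical sprite
      el.style.backgroundPosition = (-xOffset) + 'px 0';
    }

    showIcon(document.querySelector('#home-icon'), 0);    // first 16x16 region
    showIcon(document.querySelector('#search-icon'), 16); // second 16x16 region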
  • It’s possible to include images in your web page without any additional HTTP requests by using the data: URL scheme.
  • You might not want to inline your company logo, because it would make every page grow by the encoded size of the logo. A clever way around this is to use CSS and inline the image as a background. Placing this CSS rule in an external stylesheet means that the data is cached inside the stylesheet.
  • Putting the inline image in an external stylesheet adds an extra HTTP request, but has the additional benefit of being cached with the stylesheet.
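Here is a rough build-step sketch of that idea in Node.js (file names are placeholders): it base64-encodes a small image and appends the data: URI rule to an external stylesheet, so the bytes travel and get cached with the CSS:

    // Inline a small image into an external stylesheet as a data: URI.
    const fs = require('fs');

    const b64 = fs.readFileSync('logo.png').toString('base64');
    const rule = '.logo { background: url(data:image/png;base64,' + b64 + ') no-repeat; }\n';

    // Pages using class="logo" now get the image with no extra HTTP request.
    fs.appendFileSync('site.css', rule);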
  • Multiple scripts should be combined into a single script, and multiple stylesheets should be combined into a single stylesheet. In the ideal situation, there would be no more than one script and one stylesheet in each page.
  • It’s easy to imagine a build process that includes combining scripts and stylesheets—simply concatenate the appropriate files into a single file. Combining files is easy.  This step could also be an opportunity to minify the files (see Chapter 10). The difficult part can be the growth in the number of combinations. If you have a lot of pages with different module requirements, the number of combinations can be large.
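A bare-bones version of such a build step, sketched in Node.js (the module list is hypothetical; minification would be a second pass):

    // Concatenate per-module scripts into a single combined file.
    const fs = require('fs');

    const modules = ['menu.js', 'forms.js', 'analytics.js'];
    const combined = modules
      .map((f) => fs.readFileSync(f, 'utf8'))
      .join(';\n'); // separator guards against missing trailing semicolons

    fs.writeFileSync('combined.js', combined);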
  • Make Fewer HTTP Requests
  • If the application web servers are closer to the user, the response time of one HTTP request is improved. On the other hand, if the component web servers are closer to the user, the response times of many HTTP requests are improved.
  • When optimizing for performance, the server selected for delivering content to a specific user is based on a measure of network proximity. For example, the CDN may choose the server with the fewest network hops or the server with the quickest response time.
  • One drawback to relying on a CDN is that your response times can be affected by traffic from other web sites, possibly even those of your competitors. A CDN service provider typically shares its web servers across all its clients.
  • Use a Content Delivery Network
  • The Cache-Control header was introduced in HTTP/1.1 to overcome limitations with the Expires header. Because the Expires header uses a specific date, it has stricter clock synchronization requirements between server and client. Also, the expiration dates have to be constantly checked, and when that future date finally arrives, a new date must be provided in the server’s configuration.
  • Using Cache-Control with max-age overcomes the limitations of Expires, but you still might want an Expires header for browsers that don’t support HTTP/1.1 (even though this is probably less than 1% of your traffic). You could specify both response headers, Expires and Cache-Control max-age. If both are present, the HTTP specification dictates that the max-age directive will override the Expires header.
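A quick Node.js sketch of sending both headers (the ten-year lifetime is just an example of "far future"):

    // Far-future caching: Cache-Control for HTTP/1.1, Expires as a fallback.
    const http = require('http');

    const TEN_YEARS = 10 * 365 * 24 * 60 * 60; // seconds

    http.createServer((req, res) => {
      res.writeHead(200, {
        'Content-Type': 'application/javascript',
        'Cache-Control': 'max-age=' + TEN_YEARS, // wins if both are present
        'Expires': new Date(Date.now() + TEN_YEARS * 1000).toUTCString(),
      });
      res.end('/* a static, versioned component */');
    }).listen(8080);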
  • If a component does not have a far future Expires header, it’s still stored in the browser’s cache. On subsequent requests the browser checks the cache and finds that the component is expired (in HTTP terms it is “stale”). For efficiency, the browser sends a conditional GET request to the origin server. See Chapter B for an example. If the component hasn’t changed, the origin server avoids sending back the entire component and instead sends back a few headers telling the browser to use the component in its cache.
  • You can cut your response times in half by using the Expires header to avoid these unnecessary HTTP requests.
  • If an HTTP request results in a smaller response, the transfer time decreases because fewer packets must travel from the server to the client. This effect is even greater for slower bandwidth speeds.
  • This is the easiest technique for reducing page weight and it also has the biggest impact. There are other ways you can reduce the HTML document’s page weight (strip comments and shorten URLs, for example), but they are typically less effective and require more work.
  • Many web sites gzip their HTML documents. It’s also worthwhile to gzip your scripts and stylesheets, but many web sites miss this opportunity (in fact, it’s worthwhile to compress any text response including XML and JSON, but the focus here is on scripts and stylesheets since they’re the most prevalent).
  • There is a cost to gzipping: it takes additional CPU cycles on the server to carry out the compression and on the client to decompress the gzipped file.
  • Generally, it’s worth gzipping any file greater than 1 or 2K.
  • Gzipping generally reduces the response size by about 70%.
  • If the CPU load caused by streaming compression is an issue, consider caching the compressed responses, either on disk or in memory. Compressing your responses and updating the cache manually adds to your maintenance work and can become a burden.
  • The web server tells the proxy to vary the cached responses based on one or more request headers. Because the decision to compress is based on the Accept-Encoding request header, it makes sense to include Accept-Encoding in the server’s Vary response header. This causes the proxy to cache multiple versions of the response, one for each value of the Accept-Encoding request header.
  • A safe approach is to serve compressed content only for browsers that are proven to support it, such as Internet Explorer 6.0 and later and Mozilla 5.0 and later.
  • Because this virtually defeats proxy caching, another approach is to disable proxy caching explicitly using a Vary: * or Cache-Control: private header. Because the Vary: * header prevents the browser from using cached components, the Cache-Control: private header is preferred and is used by both Google and Yahoo!. Keep in mind that this disables proxy caching for all browsers and therefore increases your bandwidth costs because proxies won’t cache your content.
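Pulling those gzip notes together, a hedged Node.js sketch of my own (the body and the 1K threshold are illustrative):

    // Gzip only when the client advertises support and the body is big enough,
    // and set Vary so proxies keep separate cache entries per encoding.
    const http = require('http');
    const zlib = require('zlib');

    http.createServer((req, res) => {
      const body = Buffer.from('/* imagine a large script or stylesheet here */');
      const acceptsGzip = /\bgzip\b/.test(req.headers['accept-encoding'] || '');

      if (acceptsGzip && body.length > 1024) { // skip files under ~1K
        res.writeHead(200, {
          'Content-Type': 'application/javascript',
          'Content-Encoding': 'gzip',
          'Vary': 'Accept-Encoding',
        });
        return res.end(zlib.gzipSync(body)); // costs CPU; cache this if it hurts
      }

      res.writeHead(200, { 'Content-Type': 'application/javascript' });
      res.end(body);
    }).listen(8080);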
  • ETags (explained in Chapter 13) don’t reflect whether the content is compressed, so proxies might serve the wrong content to a browser.
  • The problem with putting stylesheets near the bottom of the document is that it prohibits progressive rendering in many browsers. Browsers block rendering to avoid having to redraw elements of the page if their styles change.
  • Notice how putting stylesheets near the end of the document can delay page loading. This problem is harder to track down because it only happens in Internet Explorer and depends on how the page is loaded.
  • There are also performance benefits to using LINK instead of @import. The @import rule causes the blank white screen phenomenon.
  • With scripts, progressive rendering is blocked for all content below the script. Moving scripts lower in the page means more content is rendered progressively.
  • The HTTP/1.1 specification suggests that browsers download two components in parallel per hostname (http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.1.4). Many web pages download all their components from a single hostname. Viewing these HTTP requests reveals a stairstep pattern.
  • Instead of relying on users to modify their browser settings, frontend engineers could simply use CNAMEs (DNS aliases) to split their components across multiple hostnames. Maximizing parallel downloads doesn’t come without a cost. Depending on your bandwidth and CPU speed, too many parallel downloads can degrade performance.
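One simple way to do the split, sketched in JavaScript (the two hostnames are made up): hash each path to a host so a given component always comes from the same hostname and the browser cache isn't fragmented:

    // Shard component URLs across two asset hostnames (CNAMEs).
    function assetUrl(path) {
      let hash = 0;
      for (const ch of path) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
      const host = hash % 2 ? 'static2.example.com' : 'static1.example.com';
      return 'http://' + host + '/' + path;
    }

    assetUrl('img/logo.png'); // same host for this path on every page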
  • Parallel downloading is actually disabled while a script is downloading—the browser won’t start any other downloads, even on different hostnames. One reason for this behavior is that the script may use document.write to alter the page content, so the browser waits to make sure the page is laid out appropriately. Another reason that the browser blocks parallel downloads when scripts are being loaded is to guarantee that the scripts are executed in the proper order. If multiple scripts were downloaded in parallel, there’s no guarantee the responses would arrive in the order specified.
  • At this point, the effects that scripts can have on web pages are clear: content below the script is blocked from rendering, and components below the script are blocked from being downloaded. If scripts are put at the top of the page, as they usually are, everything in the page is below the script, and the entire page is blocked from rendering and downloading until the script is loaded.
  • The best place to put scripts is at the bottom of the page. The page contents aren’t blocked from rendering, and the viewable components in the page are downloaded as early as possible.
  • Despite these results, using external files in the real world generally produces faster pages. This is due to a benefit of external files that is not captured by these examples: the opportunity for the JavaScript and CSS files to be cached by the browser.
  • If a typical user has many page views, the browser is more likely to have external components (with a far future Expires header) in its cache. The benefit of serving JavaScript and CSS using external files grows along with the number of page views per user per month or page views per user per session.
  • If the nature of your site results in higher primed cache rates for your users, the benefit of using external files is greater. If a primed cache is less likely, inlining becomes a better choice.
  • Categorize your pages into a handful of page types and then create a single script and stylesheet for each one.
  • The other extreme is to create a single file that is the union of all the JavaScript, and create another single file for all of the CSS. This has the benefit of subjecting the user to only one HTTP request, but it increases the amount of data downloaded on a user’s first page view. In this case, users will be downloading more JavaScript and CSS than is necessary for the page currently being viewed. Also, this single file must be updated whenever any of the individual scripts or stylesheets changes, invalidating the version currently cached by all users. This alternative makes the most sense on sites with a high number of sessions per user per month, where the typical session includes visits to multiple different pages.
  • The only exception I’ve seen where inlining is preferable is with home pages.
  • For home pages that are the first of many page views, we want to inline the JavaScript and CSS for the home page, but leverage external files for all secondary page views. This is accomplished by dynamically downloading the external components in the home page after it has completely loaded (via the onload event). This places the external files in the browser’s cache in anticipation of the user continuing on to other pages.
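A sketch of that onload trick in plain browser JavaScript (file paths are placeholders; note that a dynamically added script will execute when it arrives):

    // After the home page loads, warm the cache with the secondary pages' files.
    window.addEventListener('load', () => {
      const script = document.createElement('script');
      script.src = '/js/secondary.js';
      document.body.appendChild(script);

      const link = document.createElement('link');
      link.rel = 'stylesheet';
      link.href = '/css/secondary.css';
      document.head.appendChild(link);
    });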
  • DNS has a cost. It typically takes 20–120 milliseconds for the browser to look up the IP address for a given hostname.
  • After a user requests a hostname, the DNS information remains in the operating system’s DNS cache (the “DNS Client service” on Microsoft Windows), and further requests for that hostname don’t require more DNS lookups, at least not for a while.
  • Most browsers have their own caches, separate from the operating system’s cache. As long as the browser keeps a DNS record in its own cache, it doesn’t bother the operating system with a request for the record. Only after the browser’s cache discards the record does it ask the operating system for the address—and then the operating system either satisfies the request out of its cache or sends a request to a remote server, which is where potential slowdowns occur.
  • What’s implied (but not explained very well) is that DNS server TTL values less than 30 minutes have little effect on how frequently the browser does DNS lookups. Once the browser caches a DNS record, it is used for 30 minutes. If there is an error, the DNS lookup is refreshed sooner than that; under normal conditions, a short (under 30 minutes) TTL value won’t increase the number of DNS lookups made in Internet Explorer.
  • Keep-Alive avoids repeated DNS lookups by reusing the existing connection.
  • This is important information for network operations centers when trying to divert traffic by making DNS changes. If the IP addresses that the traffic is being diverted from are left running, it will take at least 30 minutes for Internet Explorer users with the old DNS record to get the DNS update. Users actively hitting the site (at least once every two minutes) will keep going to the old IP address and never get the DNS update until a failure occurs.
  • Reducing the number of unique hostnames has the potential to reduce the amount of parallel downloading that takes place in the page. Avoiding DNS lookups cuts response times, but reducing parallel downloads may increase response times.
  • My guideline is to split these components across at least two but no more than four hostnames. This results in a good compromise between reducing DNS lookups and allowing a high degree of parallel downloads.
  • The advantage of using Keep-Alive, described in Chapter B, is that it reuses an existing connection, thereby improving response times by avoiding TCP/IP overhead.
  • Minification is the practice of removing unnecessary characters from code to reduce its size, thereby improving load times. When code is minified, all comments are removed, as well as unneeded whitespace characters (space, newline, and tab).
  • Obfuscation is an alternative optimization that can be applied to source code. Like minification, it removes comments and whitespace, but it also munges the code. As part of munging, function and variable names are converted into smaller strings making the code more compact, as well as harder to read. This is typically done to make it more difficult to reverse-engineer the code, but munging can help performance because it reduces the code size beyond what is achieved by minification.
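A toy before-and-after of my own to show the difference (the sizes are what matter, not this particular function):

    // Original: comments and whitespace intact.
    function computeTotal(priceList) {
      var total = 0; // running sum
      for (var i = 0; i < priceList.length; i++) {
        total += priceList[i];
      }
      return total;
    }

    // Minified: comments and whitespace removed.
    function computeTotal(priceList){var total=0;for(var i=0;i<priceList.length;i++){total+=priceList[i]}return total}

    // Obfuscated/munged: local names shortened as well.
    function computeTotal(a){var b=0;for(var c=0;c<a.length;c++){b+=a[c]}return b}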
  • Minifying scripts reduces response times without carrying the risks that come with obfuscation.
  • Minifying the files in addition to gzipping them reduces the payload by an average of 4K (20%) over gzip alone.
  • Despite their names, neither a 301 nor a 302 response is cached in practice unless additional headers, such as Expires or Cache-Control, indicate that it should be.
  • Sending a redirect when a trailing slash is missing is the default behavior for many web servers, including Apache.
  • The key is to find a way to have these simpler URLs without the redirects. Rather than forcing users to undergo an additional HTTP request, it would be better to avoid the redirect by using Alias, mod_rewrite, or DirectorySlash, or by linking directly to the correct URL in the code.
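For the trailing-slash case specifically, here is a Node.js sketch that handles both forms without a redirect (the routes are hypothetical; Apache's Alias and DirectorySlash directives achieve similar ends at the server level):

    // Serve /about and /about/ identically instead of 301-redirecting.
    const http = require('http');

    const pages = { '/': 'Home', '/about': 'About us' };

    http.createServer((req, res) => {
      // Normalize internally rather than bouncing the client.
      const path = req.url.length > 1 ? req.url.replace(/\/+$/, '') : req.url;
      const body = pages[path];
      if (!body) {
        res.writeHead(404);
        return res.end('Not Found');
      }
      res.writeHead(200, { 'Content-Type': 'text/html' });
      res.end(body);
    }).listen(8080);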
  • In Internet Explorer, if an external script is included twice and is not cacheable, the browser generates two HTTP requests during page loading.
  • In addition to generating unnecessary HTTP requests in Internet Explorer, time is wasted evaluating the script multiple times.
  • The ETag header thwarts caching when a web site is hosted on more than one server.
  • Entity tags (ETags) are a mechanism that web servers and browsers use to validate cached components.
  • There are two ways in which the server determines whether the cached component matches the one on the origin server: by comparing the last-modified date, or by comparing the entity tag.
  • ETags were added to provide a more flexible mechanism for validating entities than the last-modified date. If, for example, an entity changes based on the User-Agent or Accept-Language headers, the state of the entity can be reflected in the ETag. Later, if the browser has to validate a component, it uses the If-None-Match header to pass the ETag back to the origin server.
  • The problem with ETags is that they are typically constructed using attributes that make them unique to a specific server hosting a site. ETags won’t match when a browser gets the original component from one server and later makes a conditional GET request that goes to a different server—a situation that is all too common on web sites that use a cluster of servers to handle requests.
  • This ETag issue also degrades the effectiveness of proxy caches. The ETag cached by users behind the proxy frequently won’t match the ETag cached by the proxy, resulting in unnecessary requests back to the origin server.
  • The If-None-Match header takes precedence over If-Modified-Since.
  • If you have components that have to be validated based on something other than the last-modified date, ETags are a powerful way of doing that. If you don’t have the need to customize ETags, it is best to simply remove them. Both Apache and IIS have identified ETags as a performance issue, and suggest changing the contents of the ETag.
  • Following these suggestions leaves an ETag that contains just the size and timestamp (Apache) or just the timestamp (IIS). However, because this is basically duplicate information, it’s better to just remove the ETag altogether—the Last-Modified header provides sufficiently equivalent information, and removing the ETag reduces the size of the HTTP headers in both the response and subsequent requests.
  • If both of these headers are in the request, the origin server “MUST NOT return a response status of 304 (Not Modified) unless doing so is consistent with all of the conditional header fields in the request.”
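A small validator sketch of my own reflecting that rule (header values are compared as opaque strings here, which is how ETags and a server's own Last-Modified output behave):

    // Return true only when every conditional header present agrees that the
    // cached copy is still valid; If-None-Match is the ETag check.
    function notModified(reqHeaders, entity) {
      const inm = reqHeaders['if-none-match'];
      const ims = reqHeaders['if-modified-since'];
      if (inm !== undefined && inm !== entity.etag) return false;
      if (ims !== undefined && ims !== entity.lastModified) return false;
      return inm !== undefined || ims !== undefined; // no validators: full 200
    }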