Cache-Control demystified

Original article

http://palizine.plynt.com/issues/2008Jul/cache-control-attributes/

Many years ago, HTTP 1.1 introduced specialized Cache Control directives to control the behavior of browser caches and proxy caches. These were a refinement over the HTTP 1.0 headers that programmers were using to control the behavior of caches. Though these directives are several years old, we still see them being used incorrectly. In this article, we explain the meaning and relevance of the most important cache control directives.

Pragma: No-cache

This is an HTTP 1.0 directive that was retained in HTTP 1.1 for backward compatibility. When specified in HTTP requests, this directive instructs proxies in the path not to cache the request. This is useful when you are submitting sensitive details like usernames and passwords in the request.

Notice that “pragma: no-cache” is linked to requests, and not responses. The RFC did not specify the behavior of this directive for responses. Hence, this directive does NOT instruct the browser not to cache a page. We often see this tag being misused when a page is served to the browser. Developers mistakenly set this directive expecting that the page will not be cached on the browser.

For all practical purposes, you can ignore this directive today. HTTP 1.1 introduced better directives, as we’ll see shortly.

Expires header

This is yet another HTTP 1.0 header that was retained for backward compatibility. It tells the browser when a page is set to expire. Once the page expires, the browser should not display the cached copy to the user without going back to the server. In some situations, such as navigating back to an expired page, the browser instead shows a message like “Warning: Page has expired”.

In earlier days, developers played a nifty trick with this header to ensure that a page expires immediately and is not served from the cache: they would set the expiration date to a day in the distant past. Thus, the browser treats the page as expired and never displays it from cache.

Though the page is not served from cache, browsers still used to store the page in the cache. You could navigate to the cache folder of the browser and open the file directly from there. Thus, this was not really a secure directive.
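As a rough illustration of the "date in the distant past" trick, the sketch below builds an Expires header set to the Unix epoch. The function name is hypothetical, and the choice of the epoch as the past date is an assumption made for this example; any past date would do.

```python
from email.utils import formatdate


def already_expired_headers() -> dict[str, str]:
    """Return a response header that marks the page as expired long ago."""
    # formatdate(..., usegmt=True) produces the date format used in HTTP
    # headers. Timestamp 0 is the Unix epoch (assumed "distant past" here).
    return {"Expires": formatdate(0, usegmt=True)}


print(already_expired_headers()["Expires"])
```

As noted above, this only discourages the browser from serving the page from cache; it does not stop the page from being written to the cache folder.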

Cache-Control: public

HTTP 1.1 introduced an array of cache control directives. These give greater flexibility and control to the developer. The “cache-control: public” directive is the most basic directive and tells the browser and proxies in the path that the page may be cached. This is good for non-sensitive pages, as caching improves performance.

Cache-Control: private

The next, more restrictive directive is “cache-control: private”. It instructs proxies in the path not to cache the page, because proxies are shared resources used by multiple users. Browsers may still cache the page.
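The difference between “public” and “private” can be sketched as a decision a cache makes before keeping a response. This is a simplified model of the description above, not a full implementation of the RFC; the function name and the shared_cache flag are hypothetical.

```python
def may_store(cache_control: str, shared_cache: bool) -> bool:
    """Decide whether a cache may keep a response.

    shared_cache is True for a proxy and False for a browser cache.
    Only "public" and "private" are modelled in this sketch.
    """
    directives = {part.strip().lower() for part in cache_control.split(",")}
    if "private" in directives and shared_cache:
        return False  # proxies must not keep the page; browsers may
    return True


print(may_store("private", shared_cache=True))   # proxy
print(may_store("private", shared_cache=False))  # browser
```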

Cache-Control: no-cache

Though this directive sounds like it is instructing the browser not to cache the page, there’s a subtle difference. The “no-cache” directive, according to the RFC, tells the browser that it should revalidate with the server before serving the page from the cache. Revalidation is a neat technique that lets the application conserve bandwidth. If the page the browser has cached has not changed, the server just signals that to the browser and the page is displayed from the cache.

Hence, the browser (in theory, at least), stores the page in its cache, but displays it only after revalidating with the server. In practice, IE and Firefox have started treating the no-cache directive as if it instructs the browser not to even cache the page. We started observing this behavior about a year ago. We suspect that this change was prompted by the widespread (and incorrect) use of this directive to prevent caching.
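Revalidation can be sketched from the server's side as follows. The example assumes the server uses an ETag validator derived from a hash of the page content, which is only one possible choice; the function names are hypothetical and the header handling is deliberately minimal.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a validator from the page content (ETag values are quoted)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(request_headers: dict[str, str], body: bytes):
    """Return (status, headers, payload) for a page served with no-cache."""
    etag = make_etag(body)
    headers = {"Cache-Control": "no-cache", "ETag": etag}
    if request_headers.get("If-None-Match") == etag:
        # The browser's cached copy is still current: send no body,
        # which is where the bandwidth saving comes from.
        return 304, headers, b""
    return 200, headers, body


status, headers, _ = respond({}, b"<html>hello</html>")
print(status, headers["ETag"])
```

On a later request the browser would send the stored ETag back in If-None-Match, and the server would answer 304 as long as the page has not changed.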

Cache-Control: no-store

This is the most secure of the cache-control directives. It tells the browser not only not to cache the page, but also not to even store the page in its cache folder. Whenever you’re serving a sensitive page, this is the cache control directive to use.

Notice that of late, “cache-control: no-cache” has also started behaving like the “no-store” directive. To be on the safer side, we recommend that you use both “no-cache” and “no-store” when serving sensitive pages.
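A minimal sketch of that recommendation, written as a WSGI application using only the Python standard library conventions. The application name and the page body are placeholders invented for this example.

```python
def sensitive_app(environ, start_response):
    """Serve a sensitive page with both no-cache and no-store set."""
    body = b"account details"  # placeholder content
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
            # Both directives, as recommended for sensitive pages.
            ("Cache-Control", "no-cache, no-store"),
        ],
    )
    return [body]
```

To try it locally, the application could be passed to wsgiref.simple_server.make_server from the standard library.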

Cache-Control: max-age

This is the HTTP 1.1 equivalent of the earlier Expires header available in HTTP 1.0. It implicitly tells the browser it may cache the page, but must re-validate with the server if the max-age is exceeded. Setting max-age to zero ensures that a page is never served from cache, but is always re-validated against the server.
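The freshness check a cache performs for max-age can be sketched as below. The function names are hypothetical, and treating a missing max-age as "always revalidate" is an assumption made to keep the example small; real caches apply further rules.

```python
import re


def parse_max_age(cache_control: str):
    """Return max-age in seconds, or None if the directive is absent."""
    match = re.search(
        r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE
    )
    return int(match.group(1)) if match else None


def needs_revalidation(cache_control: str, age_seconds: int) -> bool:
    """True if a cached copy of this age must be checked with the server."""
    max_age = parse_max_age(cache_control)
    if max_age is None:
        return True  # assumption for this sketch only
    # The copy is fresh only while its age is below max-age.
    return age_seconds >= max_age


print(needs_revalidation("public, max-age=60", 10))
print(needs_revalidation("max-age=0", 0))
```

With max-age set to zero the age can never be below the limit, so every use of the cached copy triggers revalidation.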

Cache-Control: must-revalidate

This directive insists that, once the cached copy has expired, the browser must revalidate the page against the server before serving it from cache. Note that it implicitly lets the browser cache the page. The “no-store” directive is a safer option if you want to prevent a sensitive page from being stored on the browser.

Cache-Control: proxy-revalidate

This is similar to the must-revalidate directive, except that this is targeted at proxy servers. It insists that proxy servers must revalidate when serving this request, whereas the user’s browser need not revalidate. This is useful when an authenticated page is being cached in the browser. You don’t want the proxy to cache and serve the page, whereas it’s fine for your browser to cache and serve the page.

http://blog.marek.sapota.org/article/2012/08/17/web-browsers-and-cache-revalidation.html

Web browsers and cache revalidation

Published on 08/17/2012 14:15

By Marek Sapota

Browser caching is essential if you want your web page to load quickly. Google Developers has a good article showing a basic caching set-up. If that article satisfies your needs, you do not have to read further. I was not satisfied, though, neither by that article nor by the many others that tell you the same thing: add some headers for CSS, JS and images and congratulations, you have successfully leveraged browser caching! Well, that is not exactly true. The Google Developers article is much more in-depth than this, but it still does not tackle an important issue: how do I use caching and make sure my web page or web application looks consistent for all users at the same time? By consistent I mean: what if I update my web page, change some ids and classes in the HTML and publish a new CSS style sheet? Some users may have cached the old CSS, and their browsers will try to use it with the new HTML, probably with disastrous results. You may think that if you update your CSS, your web server will send an updated “Last-Modified” or “ETag” header, so browsers will detect the change and pull the updated file. Depending on the other headers that are set this might be true, but generally (source: Google Developers):

Expires and Cache-Control: max-age. These specify the “freshness lifetime” of a resource, that is, the time period during which the browser can use the cached resource without checking to see if a new version is available from the web server. They are “strong caching headers” that apply unconditionally; that is, once they’re set and the resource is downloaded, the browser will not issue any GET requests for the resource until the expiry date or maximum age is reached.

So how can you use caching and make sure everyone is using the latest version of your files at the same time? There are a couple of possibilities:

Change the URL whenever the file changes, for example by including a hash of the data in the file name. This way browsers are forced to load the resource again for each new version. This works well for URLs that are not meant to be seen by the user directly, such as CSS or JS files, but is not suitable for HTML files that may be bookmarked, etc. Side note: some web frameworks do this for you by default; Ruby on Rails is one of them.
Use the “Cache-Control: no-cache” header. Despite the name, it does not stop browser caching; instead it forces browsers to revalidate the cache on each request. See rfc2616 for the explanation of this header. Be aware that although the specification allows browsers to cache resources sent with this header, some browsers choose not to. At the time of writing, IE will not cache resources with this header at all.
Set the “Cache-Control: must-revalidate” header and set the “Expires” header (or alternatively the “Cache-Control: max-age” header) to a time in the past or to an invalid value, which has the same result (see rfc2616); “0” and “-1” seem to be popular choices. This marks the resource as already expired and forces revalidation. You can also drop “Cache-Control: must-revalidate” to tell browsers that they should revalidate rather than that they must. A quick empirical test reveals that Google Chrome works as described above only when “Cache-Control: max-age” is set: the “Expires” header alone does not enforce revalidation, and when it is used with “Cache-Control: must-revalidate” Chrome only revalidates HTML pages and gets CSS from the cache.
Use both “Expires” in the past and “Cache-Control: no-cache” for good measure. This is what Nginx does when you use “expires epoch” in “nginx.conf”. Be advised that even with both of these present, some browsers will not revalidate certain resources. For example, Google Chrome will not revalidate fonts downloaded with the “@font-face” CSS directive.
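The first option above, putting a hash of the data into the file name, could be sketched like this. The function name, the use of SHA-256, and the eight-character digest length are all assumptions for illustration; frameworks that do this for you have their own schemes.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, data: bytes) -> str:
    """Insert a short content hash before the file extension."""
    path = PurePosixPath(filename)
    digest = hashlib.sha256(data).hexdigest()[:8]
    # "css/style.css" becomes "css/style.<digest>.css"
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


old_url = hashed_name("css/style.css", b"body { color: black; }")
new_url = hashed_name("css/style.css", b"body { color: navy; }")
print(old_url)
print(new_url)
```

Because the URL changes whenever the content does, the HTML that references the new name can never be paired with a stale cached style sheet.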

I personally use “Cache-Control: public, must-revalidate, max-age=0” because it seems to work as intended in most browsers. This configuration will make sure HTML, CSS and JS are kept in sync but it is suboptimal for resources that will not change or ones that will not break the web page even if they change — fonts downloaded with “@font-face” are a good example. For such files you should set a positive “Cache-Control: max-age” header so browsers do not waste time on revalidating and can render pages faster.
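One way to express that split is a small helper that picks the header value per resource. This is only a sketch: the function name, the list of font extensions, and the one-day max-age are assumptions for illustration and are not taken from the configuration described above.

```python
from pathlib import PurePosixPath

# Hypothetical set of extensions treated as rarely changing fonts.
FONT_SUFFIXES = {".woff", ".woff2", ".ttf", ".otf"}


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value based on the kind of resource."""
    if PurePosixPath(path).suffix.lower() in FONT_SUFFIXES:
        # Positive max-age: no revalidation while fresh (86400 s = 1 day,
        # an illustrative value only).
        return "public, max-age=86400"
    # HTML, CSS and JS are revalidated on every use to stay in sync.
    return "public, must-revalidate, max-age=0"


print(cache_control_for("/fonts/body.woff2"))
print(cache_control_for("/css/style.css"))
```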