What are ETags: A Help or a Hindrance?

ETags! Or entity tags, if you prefer to make them sound less like a system for keeping tabs on alien beings. Even if you’re a techy SEO, there’s a reasonable chance you haven’t heard of these or, if you have, aren’t sure exactly what they do and whether they do a good job of it. I’ve been working an awful lot on site speed for a number of clients recently, and ETags are an oft-forgotten element that falls roughly under the category of web caching.

They’re oft-forgotten because PageSpeed Insights doesn’t mention them despite having a “Leverage browser caching” section, and this tool is generally Point A for anybody looking to brush up a given page’s speed.

So what is an ETag and why on earth should you care about it?

What’s an ETag?

Well, it’s a mechanism that HTTP provides for web cache validation: specifically, it allows a client to make a request that is conditional on whether the content it’s requesting has changed on the server since the client last fetched it. If it hasn’t changed, the server can say so without resending the resource, and the client uses the copy it already has in its cache, saving bandwidth.

The method by which an ETag does this is as follows: each entity tag acts as a unique identifier for the current version of its resource. On Apache, this identifier has traditionally been made up of 3 values:

  • Inode – this is the file’s inode number. The inode is the filesystem record that stores data about the file (or directory) such as its ownership, access permissions, type and the location of its data on disk; the inode number identifies that record.
  • MTime – this identifies the date and time when the file was last modified.
  • Size – how big the file is in bytes.

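So put that all together and you end up with a unique identifier made up of 3 parts. Exactly what it looks like will depend on your server configuration; on Apache, the FileETag directive controls which parts are used, and since Apache 2.4 the default has been MTime Size, leaving out the inode.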
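To make the three parts concrete, here’s a minimal Python sketch that builds an ETag from a file’s inode, size and modification time using os.stat. It assumes a Unix-like system, the function name apache_style_etag is hypothetical, and the hyphen-separated hex format and field order are only a loose approximation of Apache’s style rather than its exact output.

```python
import os
import tempfile


def apache_style_etag(path: str, include_inode: bool = True) -> str:
    """Build an ETag from a file's inode, size and mtime.

    Assumption: hex fields joined by hyphens, loosely modelled on Apache's
    format; real servers may order or encode the fields differently.
    """
    st = os.stat(path)
    mtime_us = st.st_mtime_ns // 1000  # last-modified time in microseconds
    fields = []
    if include_inode:
        fields.append(st.st_ino)  # differs between servers holding the "same" file
    fields.append(st.st_size)     # file size in bytes
    fields.append(mtime_us)
    return '"' + "-".join(format(f, "x") for f in fields) + '"'


if __name__ == "__main__":
    # Write a throwaway file and print the ETags it would get.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"hello")
    print(apache_style_etag(f.name))                       # inode-size-mtime
    print(apache_style_etag(f.name, include_inode=False))  # size-mtime only
    os.unlink(f.name)
```

Change a single byte of the file, or copy it to another disk, and at least one field (and therefore the whole tag) changes.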

What does it do?

So when the resource attached to that ETag is requested again, the browser sends the ETag it cached along with the request, in an If-None-Match header. The server compares it with the resource’s current ETag: if they match, the file hasn’t been modified, so the server replies with a bodyless 304 Not Modified and the browser uses its cached copy instead. If the ETag *has* changed, the server sends back the full resource with a 200 OK and the new ETag, and the browser replaces its cached version.
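The server side of that exchange can be sketched in a few lines of Python. This is a simplified illustration, not any particular server’s implementation: the Response class and handle_get function are hypothetical, and the comma splitting assumes ETags that don’t themselves contain commas.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Response:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def _opaque(tag: str) -> str:
    """Strip a weak-validator prefix; If-None-Match uses weak comparison."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def handle_get(current_etag: str, body: bytes, if_none_match: Optional[str]) -> Response:
    """Answer a GET the way an ETag-aware server would (simplified sketch)."""
    if if_none_match is not None:
        # Assumes no commas inside the tags themselves.
        sent = [_opaque(t) for t in if_none_match.split(",")]
        if "*" in sent or _opaque(current_etag) in sent:
            # Unchanged: headers only, the browser reuses its cached copy
            return Response(304, {"ETag": current_etag})
    # First visit or changed file: send the full resource plus its current ETag
    return Response(200, {"ETag": current_etag}, body)


if __name__ == "__main__":
    page = b"<html>...</html>"
    first = handle_get('"5-5f3a"', page, None)
    repeat = handle_get('"5-5f3a"', page, first.headers["ETag"])
    changed = handle_get('"6-6b01"', page, first.headers["ETag"])
    print(first.status, repeat.status, changed.status)  # 200 304 200
```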

Sounds good, that.

Yeah, doesn’t it. The only problem is that if your ETags are misconfigured, the check to see whether the resource has changed will *always* fail, leading to unchanged resources being downloaded from the server again rather than drawn from cache.

This is a common issue on sites that are served from multiple servers, such as those using a CDN or other (perhaps cookie-less) domains. The inode (the first part of that identifier) will differ from server to server, so the same file gets a different ETag on each one. When a request lands on a different server from the one that issued the cached ETag, the tags don’t match and the server returns a full 200 OK response rather than the small, fast 304 Not Modified we were hoping for. Because of this, even if your components have a far-future expiration header, the conditional GET request the browser makes whenever the user refreshes will still re-download them.
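Here’s a small Python simulation of that failure, using made-up inode, size and timestamp values and a hypothetical make_etag helper. The same file on two servers gets two different ETags, so revalidation returns 200; leaving the inode out gets the 304 back, assuming the size and modification time really are identical on both servers.

```python
def make_etag(inode: int, size: int, mtime: int, use_inode: bool = True) -> str:
    """Hypothetical inode-size-mtime ETag, fields in hex (illustration only)."""
    fields = ([inode] if use_inode else []) + [size, mtime]
    return '"' + "-".join(format(f, "x") for f in fields) + '"'


def revalidate(cached_etag: str, server_etag: str) -> int:
    """Status a server would return to If-None-Match: <cached_etag>."""
    return 304 if cached_etag == server_etag else 200


# Made-up numbers: one file with identical size and mtime on two servers,
# but each server's filesystem gave it a different inode.
SIZE, MTIME = 5120, 1445412480
INODE_A, INODE_B = 1048577, 2097153

if __name__ == "__main__":
    cached = make_etag(INODE_A, SIZE, MTIME)  # issued by server A
    from_b = make_etag(INODE_B, SIZE, MTIME)  # server B's view of the same file
    print(revalidate(cached, from_b))         # 200: full re-download

    cached = make_etag(INODE_A, SIZE, MTIME, use_inode=False)
    from_b = make_etag(INODE_B, SIZE, MTIME, use_inode=False)
    print(revalidate(cached, from_b))         # 304: cache reused
```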

I’m hosting via multiple servers; how do I avoid this problem?

Luckily ETags are quite easily removed.

  • Apache: add the following line to your main Apache configuration file or the .htaccess file:
    • FileETag None
  • Nginx: add the following to the main Nginx configuration file:
    • etag off;
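  • IIS: add the following inside the <configuration> element of your web.config file (outbound rules require the IIS URL Rewrite module):
    • <system.webServer>
        <rewrite>
          <outboundRules>
            <rule name="Remove ETag">
              <match serverVariable="RESPONSE_ETag" pattern=".+" />
              <action type="Rewrite" value="" />
            </rule>
          </outboundRules>
        </rewrite>
      </system.webServer>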
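Once the change is live, it’s worth checking that the header has actually gone. Below is a minimal sketch using only Python’s standard library (urllib.request) that sends a HEAD request and reports the ETag and Last-Modified values; the function names, script name and URL in the usage comment are hypothetical. None means the header is absent, while an empty string would mean it’s present but blank.

```python
import sys
import urllib.request
from typing import Dict, Mapping, Optional


def validator_headers(headers: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Return the ETag and Last-Modified values (None if absent).

    Header names are case-insensitive, so compare them in lower case.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered.get(name.lower()) for name in ("ETag", "Last-Modified")}


def check_url(url: str) -> Dict[str, Optional[str]]:
    """Send a HEAD request and report the cache-validation headers."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return validator_headers(dict(resp.headers.items()))


if __name__ == "__main__":
    # Usage (hypothetical): python check_etag.py https://www.example.com/css/main.css
    for url in sys.argv[1:]:
        print(url, check_url(url))
```

Run it against a static file served from each of your hosts (origin and CDN) to confirm they all behave the same way.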

Once they’re removed, you can rely on the Last-Modified header instead: the browser sends its value back in an If-Modified-Since request header, so it serves the same purpose as an ETag, and using it alone keeps the headers of your responses smaller.

Syntax:

Last-Modified: <day-name>, <day> <month> <year> <hour>:<minute>:<second> GMT

Example:

Last-Modified: Wed, 21 Oct 2015 07:28:00 GMT
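The same validation flow using Last-Modified can be sketched in Python with the standard library’s email.utils helpers, which produce and parse this date format. The function names are hypothetical; the idea is that a server may answer 304 when the file’s Last-Modified time is no later than the If-Modified-Since date the browser sends back.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def last_modified_header(modified: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP date (always GMT)."""
    utc = modified.astimezone(timezone.utc).replace(microsecond=0)
    return format_datetime(utc, usegmt=True)


def not_modified_since(last_modified: str, if_modified_since: str) -> bool:
    """True if the server may answer 304 to this If-Modified-Since request."""
    return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(if_modified_since)


if __name__ == "__main__":
    header = last_modified_header(datetime(2015, 10, 21, 7, 28, tzinfo=timezone.utc))
    print(header)  # Wed, 21 Oct 2015 07:28:00 GMT
    print(not_modified_since(header, "Wed, 21 Oct 2015 07:28:00 GMT"))  # True -> 304
```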

about the author: "Blueclaw's Senior Technical SEO likes canonical tags, URL parameters and long walks on the beach (alright, site migrations). Can typically be found tinkering with the innards of the nearest eCommerce site."