HTTP Caching and Memcached – made for each other

First, the problem. You have a feed or a web page that changes infrequently, and you know when it becomes invalid. The classic examples are a blog feed or a FriendFeed feed. These feeds are great cache candidates: cache them, then invalidate the cache when a new post is added. The important goals are to minimize database usage, to cache as close to the client as possible, and to need very little logic to determine whether the cache is stale.

HTTP Caching

HTTP caching was designed for exactly this. It allows the page to be cached in any number of proxy servers anywhere in the world. All the app server is left to do is indicate whether or not the cached page is stale. This takes a significant load off the app server and the database, as the page doesn’t need to be rebuilt.

To use HTTP caching, the app server sends down a Last-Modified header with the original retrieval of the page. Subsequent requests (that come via an HTTP cache) send an If-Modified-Since header containing the value of the Last-Modified header from the first retrieval. If nothing has changed, the server can return a 304 status code and the page is served from the cache. If something has changed, the full page is returned with a new Last-Modified header and a 200. This is explained in more detail elsewhere on the web.
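To make the exchange concrete, here is a small client-side sketch of that handshake. It assumes Python’s requests library and a hypothetical feed URL; neither is prescribed by the pattern itself.

import requests

url = "https://example.com/feed/42"   # hypothetical feed URL

# First retrieval: the server answers 200 and includes a Last-Modified header.
first = requests.get(url)
last_modified = first.headers.get("Last-Modified")
print(first.status_code, last_modified)

# Later retrieval: echo the timestamp back as If-Modified-Since.
# If nothing has changed, the answer is a bodyless 304 and the cached copy is used.
second = requests.get(url, headers={"If-Modified-Since": last_modified})
print(second.status_code)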

It’s possible to implement this approach very efficiently using memcached. In the case of a cache hit, only memcached is used by the application and no load is placed on the database. To achieve this, memcached simply stores the page’s last-modified time keyed by the page. The page key often corresponds to something very natural in the application, e.g. the blog’s unique id or the friend’s unique id.

The logic is as follows.

If the page request is a conditional GET, i.e. there is an If-Modified-Since header
     If memcached contains a timestamp for this page key
          If the timestamp matches the one in If-Modified-Since
               return 304
          Else
               Build the page and return 200, use the timestamp from memcached for the Last-Modified header
          End-If
     Else
          Calculate the current time and put it into memcached using the page's key
          Build the page and return 200, use the timestamp just calculated for the Last-Modified header
     End-If
Else
     If memcached contains a timestamp for this page key
          Build the page and return 200, use the timestamp for the Last-Modified header
     Else
          Calculate the current time and put it into memcached using the page's key
          Build the page and return 200, use the timestamp just calculated for the Last-Modified header
     End-If
End-If
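For illustration, here is a minimal server-side sketch of that logic. It assumes Flask for the handler and pymemcache as the memcached client, and the route and build_page() helper are hypothetical stand-ins; it collapses the four branches above into one equivalent flow.

import time
from email.utils import formatdate

from flask import Flask, request, make_response
from pymemcache.client.base import Client

app = Flask(__name__)
mc = Client(("localhost", 11211))

def build_page(page_key):
    # Hypothetical stand-in for rendering the feed/page from the database.
    return "<html><body>page %s</body></html>" % page_key

@app.route("/page/<page_key>")
def page(page_key):
    if_modified_since = request.headers.get("If-Modified-Since")
    cached = mc.get(page_key)   # HTTP-date string stored earlier, or None

    if cached is None:
        # No timestamp cached for this page: record "now" under the page's key.
        cached = formatdate(time.time(), usegmt=True)
        mc.set(page_key, cached)
    elif isinstance(cached, bytes):
        cached = cached.decode("ascii")

    # Conditional GET whose timestamp matches the cached one: the cached copy
    # is still good, so return 304 without building the page or touching the DB.
    if if_modified_since is not None and if_modified_since == cached:
        return make_response("", 304)

    # Otherwise build the full page and echo the cached timestamp as Last-Modified.
    resp = make_response(build_page(page_key), 200)
    resp.headers["Last-Modified"] = cached
    return resp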

In addition, when the application decides that a page’s cache is invalid, e.g. a new blog post was added to a blog, it simply deletes the corresponding key from memcached.
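Continuing the hypothetical sketch above, invalidation is nothing more than a delete against the page’s key:

def invalidate_page(page_key):
    # Called, e.g., right after a new post is saved; the next request will
    # calculate and cache a fresh Last-Modified timestamp.
    mc.delete(page_key)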

The nice thing about the pattern is that it doesn’t require keeping a bunch of timestamps in the database up to date when things change, and it can serve up a lot of pages without needing to reference the database at all.

2 thoughts on “HTTP Caching and Memcached – made for each other”

  1. You mentioned it in the last sentence, but the disadvantage is of course that the page that changes some data needs to know the keys to delete. As long as you have only one entry point, this is fine, but it can become a bit more complicated when your data can be changed via different routes. Then suddenly everyone needs to know about the cache keys, leading to increased coupling. But for everyone else, this scheme will work just fine.
