First the problem. You have a feed or a web page that changes infrequently and you know when it becomes invalid. The classic pattern here is a blog feed or a friendfeed feed. These feeds are great cache candidates i.e. cache it and then invalidate the cache when a new post is added. The important factors here are to minimize database usage, to cache as close to the client as is possible and to have very little logic required to determine if the cache is stale.
Http Caching was designed for this. It allows for the page to be cached in any number of proxy servers anywhere in the world. All the app server is now left to do is indicate whether or not the cached page is stale. This takes a significant load off the app server and the database as the page doesn’t need to get rebuilt.
To use http caching the app server sends down a last-modified header with the original retrieval of the page. Subsequent requests (that come via an http cache) send an if-modified-since header that contains the value of the last-modified header from the first page retrieval. If nothing has changed then the server can issue a 304 return code and the page is served from the cache. If something has changed then the full page is returned with a new last-modified header and a 200. This is explained in more detail elsewhere on the web.
It’s possible to implement this approach very efficiently using memcached. In the case of a cache hit, only memcached is used by the application and no load is placed on the database. To achieve this memcached simply stores the pages last-modified time keyed by the page. The page key often corresponds to something very natural in the application e.g. the blogs unique id, or the friend’s unique id.
The logic is as follows.
If the page request is a conditional get i.e. there is an if-modified-since header If memcached contains a timestamp for this page key If the timestamp matches the one in if-modified-since return 304 Else Build the page and return 200, use the timestamp from memcached for the last-modified header End-If Else Calculate the current time and put it into memcached using the pages key Build the page and return 200, use the timestamp just calculated for the last-modified header End-If Else If Memcached contains a timestamp for this page key Build the page and return 200, use the timestamp for the last-modifed header Else Calculate the current time and put it into memcached using the pages key Build the page and return 200, use the timestamp just calculated for the last-modified header End-If End-If
In addition, when the application decides that the pages cache is invalid e.g. new blog post was added to a blog, then it simply deletes the corresponding key from memcached.
The nice thing about the pattern is that it doesn’t mandate keeping a bunch of timestamps in the database up to date when things change and it can serve up a lot of pages without needing to reference the database at all.