Http Caching and Memcached - made for each other

First, the problem. You have a feed or a web page that changes infrequently, and you know when it becomes invalid. The classic pattern here is a blog feed or a friendfeed feed. These feeds are great cache candidates, i.e. cache the feed and then invalidate the cache when a new post is added. The important factors here are to minimize database usage, to cache as close to the client as possible and to need very little logic to determine whether the cache is stale.

Http Caching was designed for this. It allows for the page to be cached in any number of proxy servers anywhere in the world. All the app server is now left to do is indicate whether or not the cached page is stale. This takes a significant load off the app server and the database as the page doesn’t need to get rebuilt.

To use http caching the app server sends down a last-modified header with the original retrieval of the page. Subsequent requests (that come via an http cache) send an if-modified-since header that contains the value of the last-modified header from the first page retrieval. If nothing has changed then the server can issue a 304 return code and the page is served from the cache. If something has changed then the full page is returned with a new last-modified header and a 200. This is explained in more detail elsewhere on the web.
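As a rough illustration of that exchange, here is a minimal Python sketch of the server-side decision. The function names `http_date` and `respond` and their parameters are hypothetical, chosen only for this example; a real framework would supply its own request and response objects.

```python
from email.utils import formatdate, parsedate_to_datetime


def http_date(epoch_seconds):
    """Format a Unix timestamp as an HTTP date for a Last-Modified header."""
    return formatdate(epoch_seconds, usegmt=True)


def respond(last_modified_epoch, if_modified_since=None):
    """Return (status, headers) for a page last changed at last_modified_epoch.

    if_modified_since is the raw request header value, or None when the
    request is not a conditional GET.
    """
    headers = {"Last-Modified": http_date(last_modified_epoch)}
    if if_modified_since is not None:
        try:
            client_time = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            client_time = None  # unparseable header: treat as unconditional
        # HTTP dates have one-second resolution, so compare whole seconds.
        if client_time is not None and int(last_modified_epoch) <= client_time:
            return 304, headers  # the cached copy is still good
    return 200, headers  # caller builds and sends the full page
```

A first request would call `respond(changed_at)` and get a 200 with a Last-Modified header; a later request that echoes that header value back as if-modified-since would get a 304 until `changed_at` moves forward.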

It’s possible to implement this approach very efficiently using memcached. In the case of a cache hit, only memcached is used by the application and no load is placed on the database. To achieve this, memcached simply stores the page’s last-modified time keyed by the page. The page key often corresponds to something very natural in the application, e.g. the blog’s unique id or the friend’s unique id.

The logic is as follows.

If the page request is a conditional get i.e. there is an if-modified-since header
     If memcached contains a timestamp for this page key
          If the timestamp matches the one in if-modified-since
               return 304
          Else
               Build the page and return 200, use the timestamp from memcached for the last-modified header
          End-If
     Else
          Calculate the current time and put it into memcached using the page's key
          Build the page and return 200, use the timestamp just calculated for the last-modified header
     End-If
Else
     If memcached contains a timestamp for this page key
          Build the page and return 200, use the timestamp for the last-modified header
     Else
          Calculate the current time and put it into memcached using the page's key
          Build the page and return 200, use the timestamp just calculated for the last-modified header
     End-If
End-If

In addition, when the application decides that the page’s cache is invalid, e.g. a new blog post was added to a blog, it simply deletes the corresponding key from memcached.
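The logic above, including the invalidation step, can be sketched in Python as follows. This is an illustration under assumptions rather than a definitive implementation: `FakeMemcached` is an in-memory stand-in with `get`, `set` and `delete` methods (a real memcached client would take its place), and `handle_request`, `invalidate`, `build_page` and `now` are hypothetical names.

```python
import time
from email.utils import formatdate


class FakeMemcached:
    """In-memory stand-in for a memcached client, for illustration only."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


def handle_request(cache, page_key, build_page, if_modified_since=None,
                   now=time.time):
    """Return (status, headers, body) for one page request."""
    stamp = cache.get(page_key)
    if stamp is None:
        # No timestamp stored: the page is new or was invalidated.
        stamp = formatdate(now(), usegmt=True)
        cache.set(page_key, stamp)
    elif if_modified_since == stamp:
        # Conditional GET that matches: no page build, no database access.
        return 304, {"Last-Modified": stamp}, b""
    # Every other case builds the page and sends the stored timestamp.
    return 200, {"Last-Modified": stamp}, build_page()


def invalidate(cache, page_key):
    """Call when the page changes, e.g. a new post is added to the blog."""
    cache.delete(page_key)
```

Both branches of the pseudocode that find no timestamp collapse into the first `if`, since they behave the same whether or not the request is conditional. One caveat of this sketch: HTTP dates have one-second resolution, so a page that is invalidated and requested again within the same second as its previous timestamp would get an identical timestamp.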

The nice thing about the pattern is that it doesn’t mandate keeping a bunch of timestamps in the database up to date when things change and it can serve up a lot of pages without needing to reference the database at all.

Friendfeed, Twitter, Alert Thingy and Delicious

As part of my current project one of the things I am delving into a little deeper is social networking. To that end I now have a twitter account, a delicious account and a friendfeed account that aggregates the other two plus my google reader shares and this blog. I am keeping track of all of this with alert thingy.

We’ll see how much use I make of them.

Update: Changed some accounts; apparently I should be using my real name as much as possible.

Certified Http

REST APIs are being developed for more and more business functions. For certain business functions, once-and-once-only delivery of a message over a REST call is required. Enter Certified Http, an effort led by Second Life with participation by IBM. It is simple to understand, has a reference implementation in python, and they have this to say about their main competition:

httpr and ws-reliable.

These tend to be thoroughly engineered protocol specifications which regrettably repeat the mistakes of the nearly defunct XMLRPC and the soon to join it SOAP — namely, treating services as a function call. This is a reasonable approach, and is probably the most obvious to the engineers working on the problem. The most obvious path, which is followed in both of the examples, is to package a traditional message queue body into an HTTP body sent via POST. Treating web services as function calls severely limits the expressive nature of HTTP and should be avoided.

Installing php Shindig

Shindig is the open source implementation of both the opensocial spec and the gadgets spec.

Using cPanel I first created a subdomain of robubu.com called shindig.robubu.com. This seems to be necessary as a lot of the code seems to assume it is running in the root web directory. It also provides a security layer as the widget gets run in the context of shindig.robubu.com and therefore can’t access cookies, dom etc. delivered from robubu.com.

Then on my local machine I exported the svn head of shindig and uploaded it to robubu.

mkdir ~/src/shindig
cd ~/src/shindig
svn export http://svn.apache.org/repos/asf/incubator/shindig/trunk/
cd trunk
scp -r . admin@robubu.com:public_html/shindig

and that was it. It’s important to note that the subdomain’s root web directory is mapped to public_html/shindig/php.

Then to use it with an embedded gadget, I included the following code in the html head.


<link rel="stylesheet" href="http://shindig.robubu.com/gadgets/files/container/gadgets.css">
<script type="text/javascript" src="http://shindig.robubu.com/gadgets/js/rpc.js?c=1&amp;debug=1"></script>
<script type="text/javascript" src="http://shindig.robubu.com/gadgets/files/container/cookies.js"></script>
<script type="text/javascript" src="http://shindig.robubu.com/gadgets/files/container/util.js"></script>
<script type="text/javascript" src="http://shindig.robubu.com/gadgets/files/container/gadgets.js"></script>
<script type="text/javascript" src="http://shindig.robubu.com/gadgets/files/container/cookiebaseduserprefstore.js"></script>
<script type="text/javascript">
var specUrl0 = 'http://www.labpixies.com/campaigns/todo/todo.xml';


function renderGadgets() {
  var gadget0 = gadgets.container.createGadget({specUrl: specUrl0});
  gadget0.setServerBase("http://shindig.robubu.com/gadgets/");
  gadgets.container.addGadget(gadget0);
  gadgets.container.layoutManager.setGadgetChromeIds(
      ['gadget-chrome-x']);
  gadgets.container.renderGadget(gadget0);
};
</script>

I then added an onLoad="renderGadgets()" to the html body tag and placed this DIV tag <div id="gadget-chrome-x" class="gadgets-gadget-chrome"></div> where I wanted the gadget to appear.

The “todo” gadget, rendered through the local shindig gadget container, shows up below if you are reading this on my blog. During testing very few of the widgets available through google managed to work, but the sample ones are working. I have no idea why this is; suggestions welcome.


Opensocial and OAuth specs

The REST api for opensocial makes its appearance in opensocial 0.8. The specification references some other specifications that are also worth a look.

OAuth Consumer Request - This is proposed as the means for server to server authentication between the Consumer site and the Service provider. It has the potential to replace basic auth over SSL which is the only real standards based approach for securely authenticating using a shared secret, given that digest was underspecified.

XRDS-Simple - This also looks promising and it is tackling the whole xri / yadis discovery mess that openid 2.0 seems burdened with.

Alan Greenspan’s Age Of Innocence

Alan Greenspan: Japan in recent years has had to struggle back from the stock-market and real estate crash of 1990. Japanese banks became heavily invested in loans backed by real estate as collateral, as real estate prices soared. When the turn came and prices cascaded downward, the collateral became inadequate. But instead of calling the loans, as most Western banks would do, the bankers refrained. It took years and many government bailouts before real estate prices stabilized and the banking system returned to normal lending, with realistic estimates of bad loans and, hence, capital.
I concluded from this that Japan behaved differently from other capitalistic countries.

Really? Sounds fairly familiar.

$94.5 Billion

Scary Stuff from the lender of last resort

Blocked by Google

Several folks have told me recently that Google is identifying my site as evil. I assure you it isn’t; however, I know why google has flagged it. A couple of weeks ago my site was hacked and a hidden iframe was introduced. I cleaned it up, changed my password, upgraded wordpress and even changed the theme. All should now be well.

I have contacted google and we’ll see how long it takes to no longer be flagged as ‘Evil’.

Scaling Databases and Google’s App Engine

Assaf Arkin: if anything I wrote sounds vaguely familiar because you somehow managed to dumb your RDBMS into storing structured data in BLOBs, added versions and timestamps on all records, grappled with minimizing transactions and locks, denormalized data like there’s no tomorrow, or relied too much on a message queue, then time to rethink. Are you using a hammer to polish your china? (Tip: not a good idea, invest in soft cloth)

I know I am late (particularly as an IBMer) to the whole couchDB / bigtable thing, but I have to admit that until I read Assaf’s article I didn’t really understand what all the fuss was about.

I also didn’t understand the fuss about google’s app engine and shared the view that it looked like “The World’s Worst Webhost”. My current webhost lets me run ruby, perl, php and python with either mysql or postgres, so why would I want to limit myself to python and some proprietary, lock-in database? The answer is, of course, scale.

And so here is the dilemma. If you are developing a new web application from scratch, what do you do if you think / hope that one day you will be wildly successful? There is the flickr / facebook / myspace path, i.e. use a single relational database till it breaks, then use master / slave and then shard (gory details for myspace). And now there is the google app engine path, i.e. build the application from the start in a way that is guaranteed to scale out with no herculean efforts.

This is not an easy choice. The problem with the google path is that their datastore has no joins, limited transaction support and some random query limitations. Now, I have never built an application with those limitations, but I would guess that it takes longer to build and requires a whole new way of thinking.

The google app engine path seems difficult to justify for a new app. It’s built on the premise that the app will be wildly successful and most web apps aren’t. Justifying the additional upfront expense to investors / management won’t be easy and given that many big names have already successfully traversed the other path, it may be better to start by choosing one of the world’s better webhosters and only invest in scaling the db when the money starts rolling in.

Thin Ice

Paul Volcker: So I think we are skating on increasingly thin ice. On the present trajectory, the deficits and imbalances will increase. At some point, the sense of confidence in capital markets that today so benignly supports the flow of funds to the United States and the growing world economy could fade. Then some event, or combination of events, could come along to disturb markets, with damaging volatility in both exchange markets and interest rates.
[snip]…
What I am talking about really boils down to the oldest lesson of economic policy: a strong sense of monetary and fiscal discipline.

Volcker wrote that in 2005. Central bankers and government ignored him then, as they do today.