Scaling Databases and Google’s App Engine

Assaf Arkin: if anything I wrote sounds vaguely familiar because you somehow managed to dumb your RDBMS into storing structured data in BLOBs, added versions and timestamps on all records, grappled with minimizing transactions and locks, denormalized data like there’s no tomorrow, or relied too much on a message queue, then time to rethink. Are you using a hammer to polish your china? (Tip: not a good idea, invest in soft cloth)

I know I am late (particularly as an IBMer) to the whole CouchDB / Bigtable thing, but I have to admit that until I read Assaf’s article I didn’t really understand what all the fuss was about.

I also didn’t understand the fuss about Google’s App Engine and shared the view that it looked like “The World’s Worst Webhost”. My current web host lets me run Ruby, Perl, PHP and Python with either MySQL or Postgres, so why would I want to limit myself to Python and a proprietary, lock-in database? The answer, of course, is scale.

And so here is the dilemma. If you are developing a new web application from scratch, what do you do if you think (or hope) that one day it will be wildly successful? There is the Flickr / Facebook / MySpace path, i.e. use a single relational database until it breaks, then move to master / slave replication, and then shard (gory details for MySpace). And now there is the Google App Engine path, i.e. build the application from the start in a way that is guaranteed to scale out with no herculean efforts.
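To give a rough feel for the last step on the first path: sharding splits rows across several independent databases and routes each read or write by some key. The sketch below assumes hash-based routing on a user ID, with in-memory dicts standing in for the separate database servers. `ShardRouter` and its methods are hypothetical names for illustration; this is not how any of the sites named above actually did it.

```python
import hashlib
from typing import Any, Dict, List


class ShardRouter:
    """Hypothetical router that maps each user ID to one of N shards.

    Each shard is an in-memory dict standing in for a separate
    database server; a real system would hold connections instead.
    """

    def __init__(self, num_shards: int) -> None:
        self.shards: List[Dict[str, Any]] = [{} for _ in range(num_shards)]

    def shard_for(self, user_id: str) -> int:
        # Use a stable hash (not Python's per-process randomized hash())
        # so the same user always lands on the same shard.
        digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, user_id: str, record: Any) -> None:
        # Writes go only to the shard that owns this user.
        self.shards[self.shard_for(user_id)][user_id] = record

    def get(self, user_id: str) -> Any:
        # Reads consult only that same shard; other shards are never touched.
        return self.shards[self.shard_for(user_id)].get(user_id)


router = ShardRouter(num_shards=4)
router.put("alice", {"name": "Alice"})
print(router.get("alice"))  # {'name': 'Alice'}
```

One reason the resharding stage gets gory: with simple modulo routing like this, changing `num_shards` sends most existing keys to a different shard, so data has to be migrated. Schemes such as consistent hashing exist to reduce that churn.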

This is not an easy choice. The problem with the Google path is that its datastore has no joins, limited transaction support and some seemingly arbitrary query limitations. I have never built an application under those limitations, but I would guess that it takes longer to build and requires a whole new way of thinking.
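To make “no joins” concrete: where SQL would answer “posts with their authors” in a single JOIN, a join-less datastore pushes that work into application code, typically as a batch fetch by key followed by an in-memory merge. The sketch below uses plain Python dicts as a stand-in for a key-value store. It does not use the real App Engine datastore API, and the entity names, fields and `batch_get` helper are hypothetical.

```python
from typing import Dict, List

# Hypothetical "tables" in a key-value store: entity key -> properties.
authors: Dict[str, dict] = {
    "a1": {"name": "Alice"},
    "a2": {"name": "Bob"},
}
posts: Dict[str, dict] = {
    "p1": {"title": "First post", "author_key": "a1"},
    "p2": {"title": "Second post", "author_key": "a2"},
    "p3": {"title": "Third post", "author_key": "a1"},
}


def batch_get(store: Dict[str, dict], keys: List[str]) -> Dict[str, dict]:
    """Fetch several entities by key in one call (stand-in for a batch get)."""
    return {k: store[k] for k in keys if k in store}


def posts_with_authors() -> List[dict]:
    """Emulate SELECT ... FROM posts JOIN authors in application code."""
    # Step 1: collect the distinct foreign keys referenced by the posts.
    author_keys = sorted({p["author_key"] for p in posts.values()})
    # Step 2: fetch all referenced authors in a single batch.
    found = batch_get(authors, author_keys)
    # Step 3: merge in memory, producing one row per post (ordered by post key).
    return [
        {"title": p["title"], "author": found[p["author_key"]]["name"]}
        for _, p in sorted(posts.items())
    ]


for row in posts_with_authors():
    print(row["title"], "by", row["author"])
```

The usual alternative is denormalization: copy the author’s name onto every post so that reads need no second fetch, at the cost of extra storage and of updating every copy when the name changes.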

The Google App Engine path seems difficult to justify for a new app. It’s built on the premise that the app will be wildly successful and most web apps aren’t. Justifying the additional upfront expense to investors / management won’t be easy, and since many big names have already traversed the other path successfully, it may be better to start with one of the world’s better web hosts and invest in scaling the database only when the money starts rolling in.

8 thoughts on “Scaling Databases and Google’s App Engine”

  1. “It’s built on the premise that the app will be wildly successful and most web apps aren’t.”

    It’s not clear to me how you came to this conclusion, i.e. that the premise is that apps built on GAE will be wildly successful. Rather, I suspect there’s plenty of low-hanging fruit to be picked here in terms of the ‘long tail’: most apps will NOT be wildly successful, and GAE will be a good fit for them.

  2. @Geoff and @Assaf: many thanks for pointing this out; I will write a post discussing it.

    @Patrick: I came to this conclusion because it appears to be harder to write a GAE application than a PHP / MySQL one, due to all the limitations on what you can do with the data layer. If you don’t need huge scalability, I don’t see why you would want to make life more complicated for yourself. Get a web host and code a PHP / MySQL application :)

  3. Rob, perhaps what you’re trying to say is:

    Sure, I like the turnkey provisioning, the built-in authentication, the provision of email, URL mapping and load balancing. But I don’t like the fact that the only data API I get is to this “substandard” BigTable thing. I want my ACID, maaaan. And why shouldn’t I get it? Why shouldn’t I get normal relational SQL?

    The Googlers might tell me that I don’t want it. But they’re living in a world where apps might scale up so far and so fast that giving real, fully serializable access to shared data is prohibitively expensive. So they’ve purposely given us an API that is -less- than that. The problem is that most, -MOST- apps will never get to the level where a panty-waist MySQL on a weenie machine isn’t enough. If I believe that my app is in that category, -why- would I want to write it against this much more complicated concurrency-control regime? Really, why would I want to do it?

  4. After some reflection, I’ve changed my mind. To paraphrase a very smart guy I met once who works for a hosting company (of sorts):

    Competent sysadmin is a -very- scarce resource.

    -That’s- the real value proposition here, not the data APIs. And at some level, perhaps I buy it. I’m just not sure yet.

  5. I have to agree with @6: for a lot of people it’s not that they’re worried about scalability, they just don’t want to pay big bucks to host and maintain their own infrastructure.

    Rob, do you think Amazon.com’s data storage service offers more to the data-literate developer than Google’s does? If so, is it worth experimenting with combinations?
