How to get a little more concurrency / performance out of a database

The database should always be the bottleneck in any large scale web application. As such, it is becoming increasingly important to minimize concurrency and to eek out every last bit of performance from the database.

There are a set of techniques for improving concurrency in sql code. The one I have personally seen most widely applicable is best seen with an inventory example. Suppose that you are buying 10 copies of Jim’s book from Amazon. The logic starts out with a predicate i.e. does the store have 10 copies of Jim’s book in stock. If it does then reduce the quantity by 10. Pseudo code would be as follows.


Select quantity_on_hand from Inventory where ISBN = 'IUWHSUY' FOR UPDATE;
If quantity_on_hand >= 10
Update Inventory set quantity_on_hand = quantity_on_hand - 10 where ISBN = 'IUWHSUY';

While this is pretty efficient it can be improved upon by combining the predicate i.e. ‘if quantity_on_hand >= 10’ with the transform i.e. ‘set quantity_on_hand = quantity_on_hand – 10’ into a single SQL statement.

The resulting single sql statement is as follows


Update Inventory set quantity_on_hand = quantity_on_hand - 10 where quantity_on_hand >= 10

The application then checks the return of the update call to see how many rows were effected. If it was 1 then all is well (there was enough stock and the inventory level has been reduced), if it is zero then there aren’t enough items in stock.

This pattern comes up over and over again and the single sql statement is ALWAYS preferable to sending off two separate sql requests i.e. one that checks whether some condition is satisfied and then another to perform the update.

Update:Removed reference to ‘field calls’ which was pointed out to me actually refers to pushing this support further into the db manager, the above pattern still holds though, just not called a field call

Http Caching and Memcached – made for each other

First the problem. You have a feed or a web page that changes infrequently and you know when it becomes invalid. The classic pattern here is a blog feed or a friendfeed feed. These feeds are great cache candidates i.e. cache it and then invalidate the cache when a new post is added. The important factors here are to minimize database usage, to cache as close to the client as is possible and to have very little logic required to determine if the cache is stale.

Http Caching
was designed for this. It allows for the page to be cached in any number of proxy servers anywhere in the world. All the app server is now left to do is indicate whether or not the cached page is stale. This takes a significant load off the app server and the database as the page doesn’t need to get rebuilt.

To use http caching the app server sends down a last-modified header with the original retrieval of the page. Subsequent requests (that come via an http cache) send an if-modified-since header that contains the value of the last-modified header from the first page retrieval. If nothing has changed then the server can issue a 304 return code and the page is served from the cache. If something has changed then the full page is returned with a new last-modified header and a 200. This is explained in more detail elsewhere on the web.

It’s possible to implement this approach very efficiently using memcached. In the case of a cache hit, only memcached is used by the application and no load is placed on the database. To achieve this memcached simply stores the pages last-modified time keyed by the page. The page key often corresponds to something very natural in the application e.g. the blogs unique id, or the friend’s unique id.

The logic is as follows.

If the page request is a conditional get i.e. there is an if-modified-since header
     If memcached contains a timestamp for this page key
          If the timestamp matches the one in if-modified-since
               return 304
          Else
               Build the page and return 200, use the timestamp from memcached for the last-modified header
          End-If
     Else
          Calculate the current time and put it into memcached using the pages key
          Build the page and return 200, use the timestamp just calculated for the last-modified header
     End-If         
Else
     If Memcached contains a timestamp for this page key
          Build the page and return 200, use the timestamp for the last-modifed header
     Else
          Calculate the current time and put it into memcached using the pages key
          Build the page and return 200, use the timestamp just calculated for the last-modified header
     End-If
End-If

In addition, when the application decides that the pages cache is invalid e.g. new blog post was added to a blog, then it simply deletes the corresponding key from memcached.

The nice thing about the pattern is that it doesn’t mandate keeping a bunch of timestamps in the database up to date when things change and it can serve up a lot of pages without needing to reference the database at all.

Http Caching (not as easy as it first appears)

REST is all the rage and so as we start to read and actually use the specification that powers the internet it appears that practice does not follow theory.

I wanted to cache a web page generated by one of our products.  I wanted to cache it in the browser and in a reverse proxy and I wanted it localized. The localization utilizes the Accept_Language to determine the locale to use for the page (and so yes, our products use content negotiation).  In theory I SHOULD be able to set these headers on the page to have the page cached for a day.

Cache_Control: max-age=86400, public
Vary: Accept-Language

The "vary" header (section 14.44) is needed as the caches should not return the english representation when the french one has been requested. In theory I was done. However as we tested this we found that few browsers know what to do with the "vary" header.  The spec is only 7 years old after all.  The worst offender at least for what I was trying to achieve was IE (gory details).  It simply refused to cache the page and so the request always made its way back to the server.  It is also not clear how many proxy servers properly support "Vary".

So what to do?, hacks are available (apache’s is called force-no-vary).  The reverse proxy or the web server can strip out the vary header when IE is making the request, but all this adds up to is a lot of product documentation and a slew of defects.

It was interesting to discover (thanks Elias) that the main author of the http 1.1 specification has a recommendation that circumvents the problem and avoids content negotiation.  Essentially the French and English versions have separate urls e.g. /language/en/page1.html and language/fr/page1.html and if anyone requests /page1.html then the "Accept-Language" header is used to calculate the page to redirect to e.g /language/en/page1.html for the english version. This is all well and good, but what about the fact that we have a several J2EE web applications already built.

It turns out that it is possible to write a filter that can do all of this automagically, assuming that the "localized" pages (i.e. those with different representations depending on the language) are separate from the non-localized stuff e.g. images, css etc.  The filter described below and available here redirects any request for e.g. /service/* to /service/language/ACCEPT_LANGUAGE/* where ACCEPT_LANGUAGE is the user’s preferred language as specified in the Accept_Language header e.g. /service/language/en/*. Also when a request comes in for /service/language/en/* the filter strips out the additional language portion of the url, sets the locale (based on this portion of the url) and forwards it on (internally) to the old url. So nothing, really, needs to be changed in the original application, url’s don’t need to be remapped, code doesn’t need changing, etc. Here is the bulk of the filter.

public void init(FilterConfig config) throws ServletException {
    pathToReplace = config.getInitParameter(PATH_TO_REPLACE_PARAM_NAME);
    pathWithLanguage = pathToReplace + "language/"; 
    regex = Pattern.compile(pathWithLanguage+"(.*?)/(.*)");
}
public void doFilter(ServletRequest request, ServletResponse response,FilterChain chain)
    throws IOException, ServletException {
    //map to httpServletRequest and Response
    HttpServletRequest httpReq = (HttpServletRequest)request;
    HttpServletResponse httpRes = (HttpServletResponse)response;

    String pathWithContextRemoved = 
        httpReq.getRequestURI().substring(httpReq.getContextPath().length());

    //if we already have the language path the set the locale and forward
    if(pathWithContextRemoved.startsWith(pathWithLanguage)){            
        Matcher matcher = regex.matcher(pathWithContextRemoved);
        matcher.find();
        String language = matcher.group(1);
        String path = pathToReplace+matcher.group(2);
        Config.set(request, Config.FMT_LOCALE, language);
        httpReq.getRequestDispatcher(path).forward(request,
            new ResponseWrapper(httpRes, httpReq.getContextPath() +
                pathToReplace, httpReq.getContextPath() + pathWithLanguage+language+"/"));

    //replace the path if necessary
    }else if(pathWithContextRemoved.startsWith(pathToReplace)) {
        httpRes.sendRedirect(httpReq.getContextPath()+pathWithLanguage+
            httpReq.getLocale().toString()+
            pathWithContextRemoved.substring(pathToReplace.length()-1));            

    //else just do what we used to do
    }else{
        chain.doFilter(request,response);
    }
}

The final thing that is needed is to rewrite of any of those old (non-language based urls) to the new ones in the pages that get returned.  The approach assumes that the best practice of ensuring that any url to be returned to the user is first passed to response.encodeURL(String url) and it turns out that all the tag libraries (e.g. struts, jstl) do this and so for our applications the approach works nicely.

public class ResponseWrapper extends HttpServletResponseWrapper implements HttpServletResponse {
    
    private String replacementPath;
    private String pathToReplace;
    
    public ResponseWrapper(HttpServletResponse response, String pathToReplace, String replacementPath){
        super(response);
        this.pathToReplace = pathToReplace;
        this.replacementPath = replacementPath;
    }

    public String encodeRedirectUrl(String arg0) {
        arg0 = replacePathIfNeeded(arg0);
        return super.encodeRedirectUrl(arg0);
    }

    public String encodeRedirectURL(String arg0) {
        arg0 = replacePathIfNeeded(arg0);
        return super.encodeRedirectURL(arg0);
    }

    public String encodeUrl(String arg0) {
        arg0 = replacePathIfNeeded(arg0);
        return super.encodeUrl(arg0);
    }

    public String encodeURL(String arg0) {
        arg0 = replacePathIfNeeded(arg0);
        return super.encodeURL(arg0);
    }
    
    private String replacePathIfNeeded(String arg0){
        return arg0.startsWith(pathToReplace)?replacementPath + arg0.substring(pathToReplace.length()):arg0;
    }    
}

For completeness here is an example web.xml that uses the filter to have /service/* changed to /service/language/ACCEPT_LANGUAGE e.g. /service/language/en.

  <filter>
    <filter-name>Conneg</filter-name>
    <filter-class>ConnegFilter</filter-class>
    <init-param>
        <param-name>path-to-replace</param-name>
        <param-value>/service/</param-value>
    </init-param>
  </filter>

  <filter-mapping>
    <filter-name>Conneg</filter-name>
    <url-pattern>/service/*</url-pattern>
  </filter-mapping>

I know that the code probably needs some fixes to make it tolerant to bad configurations etc., however this gets the basic idea across. The complete code is available here. If your application already uses filters then you need to be careful about the order that you specify them in and understand what the <dispatcher> element does in the <filter-mapping> element.

Please let me know if you have suggestions for improvements or find this useful. 

Note: While I outlined the problems and a solution for a traditional web application with a browser client, the same approach could and possibly should be used with any REST api that can return localized content.