Http Caching (not as easy as it first appears)

REST is all the rage, and as we start to read and actually use the specification that powers the internet, it turns out that practice does not follow theory.

I wanted to cache a web page generated by one of our products. I wanted to cache it in the browser and in a reverse proxy, and I wanted it localized. The localization uses the Accept-Language request header to determine the locale for the page (and so yes, our products use content negotiation). In theory I SHOULD be able to set these headers on the response to have the page cached for a day.

Cache-Control: max-age=86400, public
Vary: Accept-Language

The "Vary" header (section 14.44) is needed because caches should not return the English representation when the French one has been requested. In theory I was done. However, as we tested this we found that few browsers know what to do with the "Vary" header. The spec is only 7 years old after all. The worst offender, at least for what I was trying to achieve, was IE (gory details). It simply refused to cache the page, and so the request always made its way back to the server. It is also not clear how many proxy servers properly support "Vary".
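To make the role of "Vary" concrete, here is a toy model of what a cache has to do when it honours the header. It is only a sketch: the class name VaryAwareCache and its methods are invented for illustration and do not correspond to any real proxy or browser implementation. The point is that the cache key must include the request's value for every header named in Vary, not just the URL.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a cache that honours Vary; hypothetical, not a real proxy implementation.
final class VaryAwareCache {
    // Cached body per cache key.
    private final Map<String, String> store = new HashMap<>();
    // Header names listed in the Vary header of the stored response, per URL.
    private final Map<String, List<String>> varyByUrl = new HashMap<>();

    // Key = URL plus the request's value for each varied header.
    // Assumption: header names are passed in the same case everywhere.
    private static String key(String url, List<String> vary, Map<String, String> requestHeaders) {
        StringBuilder sb = new StringBuilder(url);
        for (String name : vary) {
            sb.append('|').append(name).append('=')
              .append(requestHeaders.getOrDefault(name, ""));
        }
        return sb.toString();
    }

    void put(String url, Map<String, String> requestHeaders, List<String> vary, String body) {
        varyByUrl.put(url, vary);
        store.put(key(url, vary, requestHeaders), body);
    }

    // Returns the cached body, or null on a miss.
    String get(String url, Map<String, String> requestHeaders) {
        List<String> vary = varyByUrl.get(url);
        if (vary == null) {
            return null;
        }
        return store.get(key(url, vary, requestHeaders));
    }
}
```

With this model, a page stored for a request carrying Accept-Language: en is a hit for the next English request and a miss for a French one, which is exactly the behaviour the two headers above are asking for.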

So what to do? Hacks are available (Apache’s is called force-no-vary). The reverse proxy or the web server can strip out the Vary header when IE is making the request, but all this adds up to is a lot of product documentation and a slew of defects.

It was interesting to discover (thanks Elias) that the main author of the HTTP 1.1 specification has a recommendation that circumvents the problem and avoids content negotiation. Essentially the French and English versions have separate URLs, e.g. /language/en/page1.html and /language/fr/page1.html, and if anyone requests /page1.html then the "Accept-Language" header is used to calculate the page to redirect to, e.g. /language/en/page1.html for the English version. This is all well and good, but what about the fact that we have several J2EE web applications already built?
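The redirect step can be sketched with nothing but the JDK. This is an illustration under assumptions, not the code from our products: the class name LanguageRedirect, the SUPPORTED list, and the English default are all made up for the example. It uses Locale.LanguageRange.parse and Locale.lookup (available since Java 8) to pick the first language in the header that the application actually ships.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch only: picks a language from Accept-Language and builds the redirect URL.
final class LanguageRedirect {
    // Hypothetical list of languages the application ships.
    static final List<Locale> SUPPORTED = Arrays.asList(Locale.ENGLISH, Locale.FRENCH);
    // Hypothetical fallback when nothing in the header is supported.
    static final Locale DEFAULT = Locale.ENGLISH;

    static Locale choose(String acceptLanguage) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return DEFAULT;
        }
        try {
            // parse() orders the ranges by their q weight, highest first.
            List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
            // lookup() falls back from e.g. fr-CA to fr; it returns null when nothing matches.
            Locale match = Locale.lookup(ranges, SUPPORTED);
            return match != null ? match : DEFAULT;
        } catch (IllegalArgumentException malformedHeader) {
            return DEFAULT;
        }
    }

    // Maps /page1.html to /language/<lang>/page1.html.
    static String redirectTarget(String path, String acceptLanguage) {
        return "/language/" + choose(acceptLanguage).toLanguageTag() + path;
    }
}
```

For example, redirectTarget("/page1.html", "fr-CA,fr;q=0.8,en;q=0.5") yields /language/fr/page1.html, while a header that lists only unsupported languages falls back to the English URL.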

It turns out that it is possible to write a filter that can do all of this automagically, assuming that the "localized" pages (i.e. those with different representations depending on the language) are kept separate from the non-localized stuff, e.g. images, CSS etc. The filter described below and available here redirects any request for e.g. /service/* to /service/language/ACCEPT_LANGUAGE/*, where ACCEPT_LANGUAGE is the user’s preferred language as specified in the Accept-Language header, e.g. /service/language/en/*. When a request then comes in for /service/language/en/*, the filter strips out the additional language portion of the URL, sets the locale (based on this portion of the URL) and forwards the request (internally) to the old URL. So nothing, really, needs to be changed in the original application: URLs don’t need to be remapped, code doesn’t need changing, etc. Here is the bulk of the filter.

public void init(FilterConfig config) throws ServletException {
    pathToReplace = config.getInitParameter(PATH_TO_REPLACE_PARAM_NAME);
    pathWithLanguage = pathToReplace + "language/";
    //quote the configured path so that characters such as '.' are matched literally
    regex = Pattern.compile(Pattern.quote(pathWithLanguage)+"(.*?)/(.*)");
}
public void doFilter(ServletRequest request, ServletResponse response,FilterChain chain)
    throws IOException, ServletException {
    //map to httpServletRequest and Response
    HttpServletRequest httpReq = (HttpServletRequest)request;
    HttpServletResponse httpRes = (HttpServletResponse)response;

    String pathWithContextRemoved = 
        httpReq.getRequestURI().substring(httpReq.getContextPath().length());

    //if we already have the language path then set the locale and forward
    if(pathWithContextRemoved.startsWith(pathWithLanguage)){
        Matcher matcher = regex.matcher(pathWithContextRemoved);
        //a path such as /service/language/en (no trailing slash) does not match;
        //without this check matcher.group() would throw an IllegalStateException
        if(!matcher.find()){
            httpRes.sendError(HttpServletResponse.SC_NOT_FOUND);
            return;
        }
        String language = matcher.group(1);
        String path = pathToReplace+matcher.group(2);
        Config.set(request, Config.FMT_LOCALE, language);
        httpReq.getRequestDispatcher(path).forward(request,
            new ResponseWrapper(httpRes, httpReq.getContextPath() +
                pathToReplace, httpReq.getContextPath() + pathWithLanguage+language+"/"));

    //replace the path if necessary
    }else if(pathWithContextRemoved.startsWith(pathToReplace)) {
        httpRes.sendRedirect(httpReq.getContextPath()+pathWithLanguage+
            httpReq.getLocale().toString()+
            pathWithContextRemoved.substring(pathToReplace.length()-1));            

    //else just do what we used to do
    }else{
        chain.doFilter(request,response);
    }
}
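Because the filter mixes path arithmetic with servlet calls, it can help to pull the path handling out into something that runs without a container. The following is a sketch along those lines, assuming the same /service/ layout; LanguagePathMapper and its method names are hypothetical and not part of the downloadable code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Container-free sketch of the two path translations the filter performs.
final class LanguagePathMapper {
    private final String pathToReplace;     // e.g. "/service/"
    private final String pathWithLanguage;  // e.g. "/service/language/"
    private final Pattern languagePath;

    LanguagePathMapper(String pathToReplace) {
        this.pathToReplace = pathToReplace;
        this.pathWithLanguage = pathToReplace + "language/";
        // Pattern.quote keeps the configured path literal inside the regex.
        this.languagePath = Pattern.compile(Pattern.quote(pathWithLanguage) + "([^/]+)/(.*)");
    }

    // Returns {language, internalPath} for a localized URL, or null if the path is not one.
    String[] splitLocalized(String path) {
        Matcher m = languagePath.matcher(path);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), pathToReplace + m.group(2) };
    }

    // Returns the localized URL to redirect to, or null if the path is
    // outside pathToReplace or is already localized.
    String redirectTarget(String path, String language) {
        if (!path.startsWith(pathToReplace) || path.startsWith(pathWithLanguage)) {
            return null;
        }
        return pathWithLanguage + language + "/" + path.substring(pathToReplace.length());
    }
}
```

With new LanguagePathMapper("/service/"), a request for /service/orders/1 from a French user maps to /service/language/fr/orders/1, and that URL in turn splits back into the language fr and the internal path /service/orders/1. A malformed URL such as /service/language/en yields null instead of an exception.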

The final thing that is needed is to rewrite any of those old (non-language based) URLs to the new ones in the pages that get returned. The approach assumes that the application follows the best practice of passing any URL that is returned to the user through response.encodeURL(String url) first. It turns out that all the tag libraries (e.g. Struts, JSTL) do this, and so for our applications the approach works nicely.

public class ResponseWrapper extends HttpServletResponseWrapper implements HttpServletResponse {
    
    private String replacementPath;
    private String pathToReplace;
    
    public ResponseWrapper(HttpServletResponse response, String pathToReplace, String replacementPath){
        super(response);
        this.pathToReplace = pathToReplace;
        this.replacementPath = replacementPath;
    }

    public String encodeRedirectUrl(String arg0) {
        arg0 = replacePathIfNeeded(arg0);
        return super.encodeRedirectUrl(arg0);
    }

    public String encodeRedirectURL(String arg0) {
        arg0 = replacePathIfNeeded(arg0);
        return super.encodeRedirectURL(arg0);
    }

    public String encodeUrl(String arg0) {
        arg0 = replacePathIfNeeded(arg0);
        return super.encodeUrl(arg0);
    }

    public String encodeURL(String arg0) {
        arg0 = replacePathIfNeeded(arg0);
        return super.encodeURL(arg0);
    }
    
    private String replacePathIfNeeded(String arg0){
        return arg0.startsWith(pathToReplace)?replacementPath + arg0.substring(pathToReplace.length()):arg0;
    }    
}
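One case worth checking in the wrapper is a link that is already localized: such a URL still starts with pathToReplace, so replacePathIfNeeded would prefix it a second time. The sketch below shows the same rewrite with a guard for that case. LinkRewriter is a hypothetical stand-alone class written for illustration, and whether the guard is needed depends on whether the application ever emits localized links itself.

```java
// Stand-alone sketch of the link rewrite, with a guard against double prefixing.
final class LinkRewriter {
    private final String pathToReplace;    // e.g. "/app/service/"
    private final String languagePrefix;   // e.g. "/app/service/language/"
    private final String replacementPath;  // e.g. "/app/service/language/fr/"

    LinkRewriter(String pathToReplace, String language) {
        this.pathToReplace = pathToReplace;
        this.languagePrefix = pathToReplace + "language/";
        this.replacementPath = languagePrefix + language + "/";
    }

    String rewrite(String url) {
        // Leave already-localized links alone so they are not prefixed twice.
        if (url.startsWith(languagePrefix)) {
            return url;
        }
        return url.startsWith(pathToReplace)
                ? replacementPath + url.substring(pathToReplace.length())
                : url;
    }
}
```

Here new LinkRewriter("/app/service/", "fr").rewrite("/app/service/orders") gives /app/service/language/fr/orders, while links to images or to an already-localized page come back unchanged.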

For completeness here is an example web.xml that uses the filter to have /service/* changed to /service/language/ACCEPT_LANGUAGE e.g. /service/language/en.

  <filter>
    <filter-name>Conneg</filter-name>
    <filter-class>ConnegFilter</filter-class>
    <init-param>
        <param-name>path-to-replace</param-name>
        <param-value>/service/</param-value>
    </init-param>
  </filter>

  <filter-mapping>
    <filter-name>Conneg</filter-name>
    <url-pattern>/service/*</url-pattern>
  </filter-mapping>

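I know that the code probably needs some fixes to make it tolerant of bad configurations etc., but this gets the basic idea across. The complete code is available here. If your application already uses filters then you need to be careful about the order in which you specify them, and you need to understand what the <dispatcher> element does in the <filter-mapping> element. In particular, this filter relies on being applied only to the default REQUEST dispatcher: if the mapping also covered FORWARD, the internally forwarded request would match /service/* again and be redirected a second time.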

Please let me know if you have suggestions for improvements or find this useful. 

Note: While I outlined the problems and a solution for a traditional web application with a browser client, the same approach could, and possibly should, be used with any REST API that can return localized content.

3 thoughts on “Http Caching (not as easy as it first appears)”

  1. Rob,

    Thanks for that – looks like a great solution to the problem. One question I have, though…

    I presume the filter that enables this needs to be aware of the languages supported by the application, right? I was under the impression that “Accept-Language” is a list of locales, in order of preference. Then the filter will need to parse this list and take the first language accepted by the application.

    Does that make sense?

    Charles.

  2. Charles,
    so the filter doesn’t need to know the available languages. In the sample code above it leaves the resolution up to the platform i.e. “httpReq.getLocale()”. There may be other servers that base the calculation on stored values etc. in which case the code needs to be changed.
