HttpOnly please – more

So my previous post described some of the challenges involved in maintaining security in a site, such as a blogging site, that allows unrestricted / unfiltered user authored content and suggested "HttpOnly" cookies could mitigate some of the risk . "HttpOnly" cookies are, however, not a complete solution. 

The remaining problem is described in one of the comments in the mozilla "HttpOnly" bug posting. Here’s a concrete example. I log into my blog at http://blogs.com/robyates. I then visit the blog http://blogs.com/attacker/. Let’s assume that I am using I.E. and that blogs.com uses "HttpOnly" cookies.  The javascript on the attacker’s blog can’t get access to my "HttpOnly" cookie’s, so it can’t steal my session, but it can open a hidden iframe and then use this iframe to make posts, add spam etc. etc.and given that I have an authenticated session it can do all this under my identity, pretty bad.  It can do this as the attackers blog and my blog are in the same domain i.e. http://blogs.com.

Fortunately, this problem is well understood by the large public blogging organizations such as livejournal. Their approach gives each user their own domain and this domain is separate from the management domain. So, for example, my blog could now be http://robyates.blogs.com, the attackers is http://attacker.blogs.com and my blog is managed at http://manage.blogs.com/robyates.  Now due to cross frame scripting security which also applies to XMLHttpRequests, the javascript on the attackers site is rendered useless. Any javascript running on the attacker.blogs.com domain can’t get access to the data on the robyates.blogs.com or manage.blogs.com domain, so my postings can’t be deleted and spam can’t be added.

The key point here, when designing an application that permits user supplied html, is to segment the application into discrete security regions and assign each region a unique domain. This way any erroneous javascript is constrained to some subset of the complete application.

So in combination with carefully constructed domain partitioning of the application "HttpOnly" cookies show real potential  With any luck we’ll see it show up in firefox real soon, as the bug looks to be heading in the right direction.

Finally, having recently learnt all about this so we can recommend topologies for our new blogging application, it’s got me thinking about how secure any JSON based api is, scary stuff!

HttpOnly please

I am currently working on a multi-user blogging application for corporate deployment. One of the more interesting challenges is how much flexibility we should allow blog posters with their content.  Do we allow them to post javascript and if we do what do we do about XSS vulnerabilities.

Here’s the problem, a user can make a blog post containing any javascript (a property we want to preserve so that we can populate the blog with fancy charting and other tricks only available through javascript).  This post can use XSS attacks against any user viewing the post.  At first glance this doesn’t seem like much of a problem, the attacker only gets to sniff their own blog, however they also get access to the user’s cookies and in a corporate environment, which may be utilizing single sign on, that opens up a big hole in the form of session hijacking.

So what to do?  Google turned up what seems like a really nice solution in IE and it’s a solution that appears to be gaining momentum. Essentially, it allows a cookie to declare that it is not available to javascript in the browser and so session hijacking becomes practically impossible. It does this by simply adding HttpOnly to the end of the Set-Cookie header e.g.

Set-Cookie: USER=123; expires=Wednesday, 09-Nov-99 23:12:40 GMT; HttpOnly

done, right ?………… WRONG.  It turns out that there are a few things that still need to fall in the place. The firefox community has been debating exactly how to implement it since 2002.  Then there’s the need to be able to set it from Java uh oh, and we still have to figure out what support we get from the cookies set by Websphere, Netegrity, Web Seal et al.

Anyway, it shows a fair bit of promise and yes I know it doesn’t shut down all the vulnerabilities, but it is a step in the right direction and something we’ll certainly be looking into in more detail.

CalAtom – draft

So here is the first rough draft of the CalAtom spec.

We’ve made a few changes since the earlier posts on CalAtom, probably the biggest is the fact that CalAtom clients and servers MUST support xcal representations of calendar data.

I expect this to change a fair bit before formally posting the spec (I, at least need to spell check it), but this provides a general idea as to how it is all going to work. Feedback and anyone interested in implementing, helping out with the spec etc. are all very welcome. Please contact me or just respond to this blog post.

I know already that it is out of date as it is dependent on two specs that weren’t publically available when it was written. One now resides here and another one I believe is being published later today, I will update this post once that happens.

Http Caching (not as easy as it first appears)

REST is all the rage and so as we start to read and actually use the specification that powers the internet it appears that practice does not follow theory.

I wanted to cache a web page generated by one of our products.  I wanted to cache it in the browser and in a reverse proxy and I wanted it localized. The localization utilizes the Accept_Language to determine the locale to use for the page (and so yes, our products use content negotiation).  In theory I SHOULD be able to set these headers on the page to have the page cached for a day.

Cache_Control: max-age=86400, public
Vary: Accept-Language

The "vary" header (section 14.44) is needed as the caches should not return the english representation when the french one has been requested. In theory I was done. However as we tested this we found that few browsers know what to do with the "vary" header.  The spec is only 7 years old after all.  The worst offender at least for what I was trying to achieve was IE (gory details).  It simply refused to cache the page and so the request always made its way back to the server.  It is also not clear how many proxy servers properly support "Vary".

So what to do?, hacks are available (apache’s is called force-no-vary).  The reverse proxy or the web server can strip out the vary header when IE is making the request, but all this adds up to is a lot of product documentation and a slew of defects.

It was interesting to discover (thanks Elias) that the main author of the http 1.1 specification has a recommendation that circumvents the problem and avoids content negotiation.  Essentially the French and English versions have separate urls e.g. /language/en/page1.html and language/fr/page1.html and if anyone requests /page1.html then the "Accept-Language" header is used to calculate the page to redirect to e.g /language/en/page1.html for the english version. This is all well and good, but what about the fact that we have a several J2EE web applications already built.

It turns out that it is possible to write a filter that can do all of this automagically, assuming that the "localized" pages (i.e. those with different representations depending on the language) are separate from the non-localized stuff e.g. images, css etc.  The filter described below and available here redirects any request for e.g. /service/* to /service/language/ACCEPT_LANGUAGE/* where ACCEPT_LANGUAGE is the user’s preferred language as specified in the Accept_Language header e.g. /service/language/en/*. Also when a request comes in for /service/language/en/* the filter strips out the additional language portion of the url, sets the locale (based on this portion of the url) and forwards it on (internally) to the old url. So nothing, really, needs to be changed in the original application, url’s don’t need to be remapped, code doesn’t need changing, etc. Here is the bulk of the filter.

public void init(FilterConfig config) throws ServletException {
    pathToReplace = config.getInitParameter(PATH_TO_REPLACE_PARAM_NAME);
    pathWithLanguage = pathToReplace + "language/"; 
    regex = Pattern.compile(pathWithLanguage+"(.*?)/(.*)");
}
public void doFilter(ServletRequest request, ServletResponse response,FilterChain chain)
    throws IOException, ServletException {
    //map to httpServletRequest and Response
    HttpServletRequest httpReq = (HttpServletRequest)request;
    HttpServletResponse httpRes = (HttpServletResponse)response;

    String pathWithContextRemoved = 
        httpReq.getRequestURI().substring(httpReq.getContextPath().length());

    //if we already have the language path the set the locale and forward
    if(pathWithContextRemoved.startsWith(pathWithLanguage)){            
        Matcher matcher = regex.matcher(pathWithContextRemoved);
        matcher.find();
        String language = matcher.group(1);
        String path = pathToReplace+matcher.group(2);
        Config.set(request, Config.FMT_LOCALE, language);
        httpReq.getRequestDispatcher(path).forward(request,
            new ResponseWrapper(httpRes, httpReq.getContextPath() +
                pathToReplace, httpReq.getContextPath() + pathWithLanguage+language+"/"));

    //replace the path if necessary
    }else if(pathWithContextRemoved.startsWith(pathToReplace)) {
        httpRes.sendRedirect(httpReq.getContextPath()+pathWithLanguage+
            httpReq.getLocale().toString()+
            pathWithContextRemoved.substring(pathToReplace.length()-1));            

    //else just do what we used to do
    }else{
        chain.doFilter(request,response);
    }
}

The final thing that is needed is to rewrite of any of those old (non-language based urls) to the new ones in the pages that get returned.  The approach assumes that the best practice of ensuring that any url to be returned to the user is first passed to response.encodeURL(String url) and it turns out that all the tag libraries (e.g. struts, jstl) do this and so for our applications the approach works nicely.

public class ResponseWrapper extends HttpServletResponseWrapper implements HttpServletResponse {
    
    private String replacementPath;
    private String pathToReplace;
    
    public ResponseWrapper(HttpServletResponse response, String pathToReplace, String replacementPath){
        super(response);
        this.pathToReplace = pathToReplace;
        this.replacementPath = replacementPath;
    }

    public String encodeRedirectUrl(String arg0) {
        arg0 = replacePathIfNeeded(arg0);
        return super.encodeRedirectUrl(arg0);
    }

    public String encodeRedirectURL(String arg0) {
        arg0 = replacePathIfNeeded(arg0);
        return super.encodeRedirectURL(arg0);
    }

    public String encodeUrl(String arg0) {
        arg0 = replacePathIfNeeded(arg0);
        return super.encodeUrl(arg0);
    }

    public String encodeURL(String arg0) {
        arg0 = replacePathIfNeeded(arg0);
        return super.encodeURL(arg0);
    }
    
    private String replacePathIfNeeded(String arg0){
        return arg0.startsWith(pathToReplace)?replacementPath + arg0.substring(pathToReplace.length()):arg0;
    }    
}

For completeness here is an example web.xml that uses the filter to have /service/* changed to /service/language/ACCEPT_LANGUAGE e.g. /service/language/en.

  <filter>
    <filter-name>Conneg</filter-name>
    <filter-class>ConnegFilter</filter-class>
    <init-param>
        <param-name>path-to-replace</param-name>
        <param-value>/service/</param-value>
    </init-param>
  </filter>

  <filter-mapping>
    <filter-name>Conneg</filter-name>
    <url-pattern>/service/*</url-pattern>
  </filter-mapping>

I know that the code probably needs some fixes to make it tolerant to bad configurations etc., however this gets the basic idea across. The complete code is available here. If your application already uses filters then you need to be careful about the order that you specify them in and understand what the <dispatcher> element does in the <filter-mapping> element.

Please let me know if you have suggestions for improvements or find this useful. 

Note: While I outlined the problems and a solution for a traditional web application with a browser client, the same approach could and possibly should be used with any REST api that can return localized content.

Freebusy Demo

As a follow up to the Freebusy Lookup posting I wanted to point out a demo that Dan Gurney has put together on top of Domino.  This approach to free busy is similar to the one outlined in the Freebusy Lookup post.  Here are the details for how to get yourself setup with a Calendar and the try out the freebusy lookups. Feedback is very very welcome. 

Please note that the demo server will only be available for the next few weeks,

From Dan…..

Good news!

IBM will be hosting a "Freetime" servlet running on a Domino server. You’ll be able to create your own mail files and create/update/remove calendar entries.

You can create your own demo account at any time.

To create a demo account:
1. Go to http://www.notes.net
2. Click the menu: Lotus – Live Demos
3. Click the link: Lotus Domino Web Access
4. Go thru the legal stuff, give your account a user name and password, and voila … you will have a Domino mail file. Domino Web Access is (hopefully) a straight forward web application that allows you full control over your mail database and its contents.

I have a demo account that you can schedule meetings with. I’ll set the account to auto-respond to all requests:
Spider_Man@showcase2.notes.net

Once you have your account, you can then get free-time information via a <free-busy-request/> POST (there is currently no way for Domino to enable the "REPORT" verb) or a simple GET request.

Here is the URL for any account on this test server for POST or GET:
https://showcase2.notes.net/servlet/Freetime/<email>

where <email> is your RFC-821 email address. For instance, my demo account can be reached like this:

https://showcase2.notes.net/servlet/Freetime/Spider_Man@showcase2.notes.net

You can pass dates as optional parameters for a GET request:
https://showcase2.notes.net/servlet/Freetime/Spider_Man@showcase2.notes.net?start-min=2006-07-12&start-max=2006-07-14

Start-min and start-max are ISO-8601 dates. You can use partial or full ISO-8601 dates (I am showing my age here … there is probably another name for the CalDAV date format now).

Set the "ACCEPT" header to "text/icalendar" if you wish to receive iCalendar instead of xCalendar.

Abdera – Turbo

So James pretty much single-handedly wrote abdera (now apache’s atom parser).  The one contribution I can probably claim is a performance turbo boost. The turbo comes from the fact that it is often the case that only part of the atom feed is actually used by a program. If you need all of the feed to be placed in the tree structure produced by adbera then this turbo won’t help you.

One of the things we frequently do is just pull out the titles, the links and maybe the categories for the entries so we can display them in a list. We don’t need or even want the rest of the feed structure to be parsed or resident in memory. Abdera’s FOMParserOptions allows a program to declare the elements that they want parsed ahead of the parse by setting the ParseFilter (a List of QName’s).  This provides an opportunity to optimize a given parse.

To test the improvements I wrote a little program (see below) that extracts just the entries’ titles from a feed (the feed from robubu.com as of today). I then did performance comparisons against ROME, and abdera without the ParseFilter.  Here’s the results (allocated bytes and machine instructions)

ROME:            2.64 MB, 223 mil.
Abdera:            286 KB, 30 mil.
Abdera+turbo: 66 KB, 25 mil.

So the turbo gives a slight saving for cpu but as would be expected a significant saving in terms of allocated bytes, using 40 times less memory than the same parse in ROME.  Now, I do admit that the test is biased and not that realistic, but it does give you an idea of the kinds of savings that can achieved if there are large portions of a feed that you don’t want to parse.

The code to do this is fairly straighforward.

InputStream is = this.getClass().getResourceAsStream("robubu.atom"); 

//create the list to hold the qname's to parse
List elementsToParse = new ArrayList(); 
elementsToParse.add(new 
  QName("http://www.w3.org/2005/Atom","feed"));
elementsToParse.add(new 
  QName("http://www.w3.org/2005/Atom","entry")); 
elementsToParse.add(new 
  QName("http://www.w3.org/2005/Atom","title")); 

/*create a ParserOptions object and set the 
parse filter using the list just created */ 
FOMParserOptions options = new FOMParserOptions();
options.setParseFilter(elementsToParse); 

//parse using the options 
Document doc = Parser.INSTANCE.parse(is,"",options); 

/*parse as normal, but only elements 
in the list appear in the tree*/
Feed feed = (Feed)doc.getRoot(); 
List entries = feed.getEntries(); 
Iterator i = entries.iterator(); 
while(i.hasNext()){ 
  Entry entry = (Entry)i.next(); 
  System.out.println(entry.getTitle()); 
}
is.close();

I think that this is pretty self explanatory but please let me know if it isn’t.  It is also worth noting that all the parents of any element required by the program must also appear in the filter list. i.e. in the example above "entry" and "feed" must also be in the filter along with "title" so that the title of the entries in the feed can be retrieved from the dom. Also note that if no filter is specified then all elements are parsed and available in the resulting tree. Please let us know if this is an optimization that is useful to you.

Sorting and Filtering in Atom – CardAtom

So CalAtom can get kind of complicated when it comes to querying, so before tackling the querying in that properly (I think what we have proposed to date needs a lot more work) I wanted to first see if the querying / sorting required for CardAtom could be accomplished. CardAtom would be an attempt to remote an api for managing contacts in an address book. It would store, update and manipulate collections of vCards via the Atom Publishing Protocol. The CRUD operations would be identical to those described in CalAtom only with vCard payloads (see the early slides in this CalAtom presentation for details on how this can work). While this works, APP is presently limited to always returning the entire collection and doing so in last modified order.  So how should sorting and filtering be accomplished in APP. We’ll take these one at a time.

Sorting

The atom publishing protocol mandates the collection order to be by last modified date. This is not that useful to a CardAtom client that wants to display the collection by familyname or givenname. The client could download the entire collection and then do local sorting, but as the number of contacts increases this becomes less and less viable.

Servers can produce collections in any order and make these alternative sort orders available via a url, the only tricky bit is communicating their existence and location to the client. This can, however, be accomplished by placing the sort order’s url in a link element within the original feed. The links "rel" attribute is used to indicate the particular sort order available at the url. So the feed for the vCard collection now looks like this.


<feed xmlns="http://www.w3.org/2005/Atom">
<title type="text">rob's contacts</title>
<updated>2005-07-31T12:29:29Z</updated>
<id>tag:example.org,2003:3</id>
<link rel="self" type="application/atom+xml"
href="http://example.org/contacts" />
<link
rel="http://purl.org/CardAtom/sort/byFamilyName/asc"
type="application/atom+xml"
title="by Family Name"
href="http://example.org/contacts/byFamilyName"/>

<entry>
<title>Rob Yates</title>
<summary type="html">
&lt;p>&lt;strong>Tel:&lt;/strong>+1-234-567-8901&lt;/p>
</summary>
<link rel="edit-media" type="text/directory" href="http://example.org/contact/1"/>
<id>tag:example.org,2003:3.2397</id>
<updated>2005-07-31T12:29:29Z</updated>
<content src="http://example.org/contact/1" type="text/directory" />
</entry>
<entry> .....
</feed>

Note that this uses the fact that link/@rel (as defined by atom) can actually take any arbitrary url to define its meaning. A CardAtom specification could therefore define a set of link relationships that define the mandatory and optional sorts that a CardAtom collection supports. A client reading the feed can search for a particular sort order using the value of link/@rel and if it wants to render the collection in that order it can simply retrieve the corresponding urls contents, nice.

I can also imagine "standard" sort orders being defined by specific "rel" values, e.g. "by Title" or "by Author Name".

Filtering

Filtering is much trickier. How does the server communicate to the client the searches/filters that it supports. The server could allow for very flexible and complex queries to be written in which case something like XQuery or SPARQL should be used. While extremely flexible, the problem with those is that the server MUST allow any attribute to be searched and this dramatically increases both the cost of implementation and the subsequent optimizations. For CardAtom it seems that we really only need to support full text searches as well as filtering by FamilyName and GivenName. We just need a way to describe these options to the client, and so it was that James reminded me of A9’s opensearch. Opensearch contains a description document that describes a search supported by the site. For CardAtom we want to offer a full text search, a familyName search and a givenName search. First off here is one that describes the full text search.


<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
<ShortName>Search</ShortName>
<Description>Search the contact store</Description>
<Url type="application/atom+xml"
template="http://example.org/contacts?q={searchTerms}/>
</OpenSearchDescription>

note that {searchTerms} in the url above defines where a substitution should be made.  Opensearch also defines the meaning of "searchTerms".  Then here’s the familyName search.


<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/"
xmlns:xcard="http://www.ietf.org/internet-drafts/draft-dawson-vCard-xml-dtd-
04.txt">
<ShortName>by Family Name</ShortName>
<Description>Search the contact store for a given family name</Description>
<Url type="application/atom+xml"
template="http://example.org/contacts?familyName={xcard:family}/>
</OpenSearchDescription>

Note that this one uses the xCard namespace to indicate that the substitution variable should be of type <family> as described in the xCard specification.

And so, if we make these two description files available at an appropriate location we can now link to those in the feed as well (although I only show one for brevity), e.g.


<feed xmlns="http://www.w3.org/2005/Atom">
<title type="text">rob's contacts</title>
<updated>2005-07-31T12:29:29Z</updated>
<id>tag:example.org,2003:3</id>
<link rel="self" type="application/atom+xml"
href="http://example.org/contacts" />
<link
rel="http://purl.org/CardAtom/sort/byFamilyName/asc"
type="application/atom+xml"
title="by Family Name"
href="http://example.org/contacts/byFamilyName"/>

<link
rel="http://purl.org/CardAtom/search/byFamilyName"
type="application/opensearchdescription+xml"
href="http://example.org/contacts/search/byFamilyName"/>

<entry>
<title>Rob Yates</title>
<summary type="html">
&lt;p>&lt;strong>Tel:&lt;/strong>+1-234-567-8901&lt;/p>
</summary>
<link rel="edit-media" type="text/directory" href="http://example.org/contact/1"/>
<id>tag:example.org,2003:3.2397</id>
<updated>2005-07-31T12:29:29Z</updated>
<content src="http://example.org/contact/1" type="text/directory" />
</entry>
<entry> .....
</feed>

So this seems to all be working, although we haven’t coded it yet :).  The feed/collection describes its alternate sort orders and its possible filters and how to invoke them.

One thing that I think still needs further thought though is whether these sort or search/filter links are discoverable outside of the collection document.  It seems wrong that a client must first load a collection with the default sort order only to locate the sort order that it actually wants to use  Should there be an introspection document per collection that can somehow be retrieved from the collection url?  Maybe a GET against the collection url with an accept header of "application/atomserv+xml", not sure…

 I welcome suggestions for improvements or alternative approaches, these are features that as we use Atom for more things it seems like we need. Once the core spec is complete I hope these are considerations for the working group.

Freebusy Lookups

I attended part of the CalConnect roundtable in Cambridge this week, had some great discussions and also got to see the group’s reaction to CalAtom. It was met with very mixed reactions and we definitely left a little bruised but I think there is some real potential and interest there.

Anyway this post wasn’t intended to be about CalAtom. Instead it is on freebusy lookups, i.e. how do I find out when you are available to meet. It was pointed out to me that one of the first interoperability aspects that any calendaring standard faces is to figure out when you are free so that I can schedule a meeting with you. If that can be accomplished then passing around iCal files in email messages can get us a first stab at interoperability. I can see when you are free, send you a meeting invitation, you can accept it and I get told you accepted. Not too shabby, although admittedly a long long way from having this work with any calendaring client server combo. It’s surprising that an existing standard for this doesn’t exist yet, but having researched some implementations I presently cannot find one. Please post responses if you know of one that I have missed.

I then looked in detail at both Google’s approach and CalDav. I first looked at what google had done. They provide a “projection” for free-busy. You can get your free-busy back as a feed, here’s a snippet.

<snip/>.....I removed the feed header and shortened the urls in one or two places as well
<entry>
 <id>http://www.google.com/4ca650bc087d/free-busy/4g0ncm350mc5pv8487fd8megmo</id>
 <published>2006-04-17T15:00:00.000Z</published>
 <updated>2006-05-02T18:23:59.000Z</updated>
 <category term="http://schemas.google.com/g/2005#event"/>
 <link href="http://www.google.com/calendar/0eWF0ZXNAZ21haWwuY29t"/>
 <link rel="self" href="http://www.google.com/calendar/fee8487fd8megmo"/>
 <gd:when startTime="2006-04-17T08:00:00.000-07:00"
   endTime="2006-04-17T10:00:00.000-07:00"/>
</entry>
<entry>
 <id>http://www.google.com/calendar/feeds/</id>
 <published>2006-05-25T22:00:00.000Z</published>
 <updated>2006-04-28T20:05:27.000Z</updated>
 <category term="http://schemas.google.com/g/2005#event"/>
 <link href="http://www.google.com/calen1haWwuY29t"/>
 <link rel="self"  href="http://www.google50eotvfa8qgronj9oje08"/>
 <gd:when startTime="2006-05-25T15:00:00.000-07:00" 
  endTime="2006-05-25T16:00:00.000-07:00"/>
</entry>

Now I am a BIG BIG fan of ATOM, but there is a time and a place. On the plus side this was really easy to do and with any old xml parser I can make sense of it. I should also have been able to use standard atom parsers with it, but…. Well, first off this is not valid atom. Any atom parser will barf at this as it doesn’t provide a title for the entries. It is also extremely verbose containing information that is not needed by clients that are only interested in your free time. Contrast it with a similar ical representation.

BEGIN:VFREEBUSY
ORGANIZER:jsmith@host.com
FREEBUSY:19980314T233000Z/19980315T003000Z
FREEBUSY:19980316T153000Z/19980316T163000Z
END:VFREEBUSY

hmmm, not a great start, so next up was CalDav. The relevant section is 7.9 and if we ignore some complexity due to depth and a few other things and instead head straight onto the example, then this seems much more reasonable, very straightforward. On deeper examination though, the first thing that stands out is the use of the REPORT method. Ignoring very valid firewall concerns and ploughing straight into using it I first went in search of a java library to use. The standard library of choice for the java programmer when using http is the apache commons httpclient. Uh oh!, so no luck there. So off to google which eventually leads me to the Slide WebDav client and a solution, yeh!. It’s slightly concerning though to see that it has not been updated since 2004 and more worrisome that it still depends on a version of httpclient that was end of life’d on 27th Feb 2006. But let’s assume that this works, onto the next bit if trickery, how to parse the response. Writing a custom parser seems wrong, someone must have done this, so off to google again and I find iCal4J, yippeeee! now all I have to do is learn how to use it and ensure that I can usher this licence past my employer’s legal department and I am flying. What’s that? you want to also do this in php and ruby? and you don’t know of any libraries available for those languages?

This is too hard, too hard. There are plenty of examples of complicated http based api’s and ALL of them, ALL of them are easier to consume than this, and why is this? It’s because they use standard http verbs that all the language http utilities support and that they represent their data in a format that is accessible by parsers that all the languages support.

Contrast this with the following approach (that I will submit for discussion to the CalDav mailing list).

GET /bernard/work/freebusy?start=20060104T140000Z&end= 20060105T220000Z
Host: cal.example.com

response

HTTP/1.1 200 OK
Date: Fri, 11 Nov 2006 09:32:12 GMT
Content-Type: application/calendar+xml
Content-Length: xxxx

<vcalendar>
 <version>2.0</version>
 <prodid>-//hacksw/handcal//NONSGML 1.0//EN</prodid>
 <vfreebusy>
  <dtstart> 20060104T140000Z </dtstart>
  <dtend> 20060105T220000Z </dtend>
  <freebusy fbtype="BUSY-TENTATIVE">20060104T150000Z/PT1H</freebusy>
  <freebusy>20060104T190000Z/PT1H</freebusy>
 </vfreebusy>
</vcalendar>
  

As a consumer I can now use HttpClient and the bundled xml parser in java, that I have used a thousand times before, done.

This has certainly been an interesting exercise for me. One of the claims I had leveled at CalDav, in an attempt to justify CalAtom, is that CalDav was just too hard. I stand by that claim. For a product group to incorporate just the freetime lookup it seems they have to convince management that it is ok to depend on a end of life’d jar file, that it really isn’t too hard to write or learn a custom parser for some proprietary data format and that as other programming languages gain popularity that there will, of course, be the base utilities in those languages to also use these features. I think that this is presently a very hard sell.

CalDav may indeed lead to a protocol that allows calendaring systems to interoperate with each other, but I wonder whether that’s enough. To be successful doesn’t it also need to interoperate with emerging web applications? and, more importantly, be consumable in a similar way? In sampling many of the current approaches there are clear patterns emerging, namely the use of http 1.1 methods and xml/json data packets. These patterns are strongly supported in all the major languages and will continue to be for the foreseeable future (this is where it’s presently at).

With the majority of http applications heading down a different path, are the custom parsers, end of life’d jars and special firewall compensations, that CalDav requires, really going to be tolerated? I’m not sure, but all this seems to beg the question “what’s so special about calendaring?” 30Boxes, google, evdb, yahoo nothing?

CalAtom

Updated May 17th: I posted some of the issues that I had to the ATOM working group.  Their advice, as ever, was very useful.  I have updated the post to contain their recommendations and so no longer require content negotiation.

Updated, May 12th: In recent discussions it has been pointed out to me that I don’t need to choose a specific format for the exchange format (I had previously limited CalAtom to xCal). I have made significant changes to the post to reflect these thoughts.

CalDav is close to final call. I have read the spec and it seems that the bar to implement is fairly high. The spec is 90 pages long, excluding appendices, and it builds upon many other specifications. Google’s Calendar Data api, on the other hand, seems like a much simpler approach but doesn’t reuse any existing calendaring specifications and in no way comes close to the feature set that is in CalDav. CalAtom would be an attempt to rework CalDav to have the underlying protocol be APP instead of WebDav. I think this has the potential to simplify clients and servers and can build on some recent advances in data storage.

The first decision that needs to be made is the data format that will be exchanged by clients and servers. The most natural is iCalendar and this was the one chosen by CalDav. The iCalendar format has also been mapped to xml (xCal, xCalendar) and to rdf (rdfCal). ATOM will actually let us support and mix all of them together and this is the path that I now want to explore. It is worth noting that 30 pages of the CalDav specification are devoted to querying. It would be nice if, instead, we could use a standard query language. There have already been some examples of this. If CalAtom could use a standard search api such as sql, xquery or even sparql then that may save a lot of work, especially if our data store natively supported the chosen query language. Given that ATOM is in xml and that oracle, db2 and sqlserver now support xml storage it would seem that one avenue of investigation should be XQuery or XPath.

So, how would CalAtom work? First up, there would need to be an ATOM collection that can accept calendar posts. The ATOM introspection document would declare a collection that “accepts” data of the appropriate media type e.g. “application/calendar+xml”., “text/calendar” or “application/rdf+xml” (I can’t find a mime type for rdfCal). So with James’s new magic, proposed for the next draft of the atom publishing protocol, the introspection document looks like this.

<?xml version="1.0" encoding='utf-8'?>
<service xmlns="http://purl.org/atom/app#">
 <workspace title="Main Site" >
  <collection title="My Calendar" href="http://example.org/calendar" >
   <accept>application/calendar+xml, text/calendar, application/rdf+xml</accept>
  </collection>
 </workspace>
</service>

A client can now determine that the “My Calendar” collection will accept xCal, iCalendar and rdfCal (hand waving for rdfCal, as the accept element isn’t limiting enough). So far, so good. Now to create a Calendar event. (For anyone paying real close attention I am going to assume that we follow the same rules for the xCal posts as outlined in section 4.1 of the current CalDav draft, we probably also need some additional extensions to the introspection document that govern the type of xCal entry i.e. vType, vEvent, vJournal or vFreebusy, we’ll ignore this for now). So to create a simple calendar event this is the post (if we are posting as xCal)

POST /calendar HTTP/1.1
Host: example.org
User- Agent: Thingio/1.0
Content- Type: application/calendar+xml
Content- Length: nnn

<?xml version="1.0" encoding="UTF-8"?>
<vcalendar version="2.0" 
  prodid="-//hacksw/handcal//NONSGML 1.0//EN">
 <vevent>
  <uid>19970901T130000Z-123401@host.com</uid>
  <dtstamp>19970901T130000Z</dtstamp>
  <dtstart>19970903T163000Z</dtstart>
  <dtend>19970903T190000Z</dtend>
  <summary>Annual Employee Review</summary>
  <class>PRIVATE</class>
  <categories>
   <item>Business</item>
   <item>Human Resources</item>
  </categories>
 </vevent>
</vcalendar>

with response

HTTP/1.1 201 Created
Date: Fri, 7 Oct 2005 17:17:11 GMT
Content- Length: nnn
Content- Type: application/atom+xml; charset="utf-8"
Content- Location: http://example.org/calendar/1.atom
Location: http://example.org/calendar/1.atom

<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
 <title>Annual Employee Review</title>
 <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
 <updated>2003-12-13T18:30:02Z</updated>
 <author><name>John Doe</name></author>
 <summary type="text" />
 <link rel="alternate" type="text/calendar" 
   href="http://example.org/calendar/1.ics" />
 <link rel="alternate" type="application/rdf+xml" 
   href="http://example.org/calendar/1.rdf" />
 <content type="application/calendar+xml" 
   src=http://example.org/calendar/1.xml/>
 <link rel="edit" type="application/atom+xml" 
   href="http://example.org/calendar/1.atom" />
 <link rel="edit-media" type="application/calendar+xml" 
   href=”http://example.org/calendar/1.xml” />
 <link rel="edit-media" type="text/calendar" 
   href="http://example.org/calendar/1.ics" />
 <link rel="edit-media" type="application/rdf+xml" 
   href="http://example.org/calendar/1.rdf" />
</entry>

 

So this looks like it will work. Note that the response actually lists 3 “edit-media” urls. The server has accepted a post with an xCal body and made available editable representations of it in its original xcal and also in two additional formats. The server would also accept iCalendar or rdfCal posts. This seems really nice, clients can choose the representation that they want to work with and edit that one. Conventional APP can now be used to get paged access to the collection and to manipulate the collection. This gets us CRUD operations on calendaring resources and as a bonus we also have a feed to the calendar that gets updated when entries change or get added, however it is still a long long way from CalDav.

There are several things still needed, but they all look very doable. I am going to punt on Access Control, although there are some starts on this. Access control will eventually be needed in APP and so we will wait for that discussion to get started. It also appears that CalDav’s Calendar Collection properties (section 5.2) shouldn’t be too difficult to incorporate (they are simply extensions in the Introspection document). Then the biggest remaining item and admittedly the hardest part is querying the calendar, repeating events force things to get complicated pretty quickly and so I am going to leave that for another day, although it seems that the use of XPath or Xquery is going to be the way to go here. For the record Google has very limited query capability at the moment when compared to CalDav.

CalAtom seems to hold a lot of promise, it has (in Atom) a simpler underlying model than CalDav. ATOM and APP has already enticed google into giving it a go for Calendaring and so it seems worthy of further investigation. The challenge for APP is whether it is currently specified enough to start tackling problems outside of its blogging homeland. Only time will tell.