Abdera – Turbo

So James pretty much single-handedly wrote Abdera (now Apache’s Atom parser).  The one contribution I can probably claim is a performance turbo boost. The turbo comes from the fact that a program often uses only part of an Atom feed. If you need the whole feed placed in the tree structure that Abdera produces, this turbo won’t help you.

One of the things we frequently do is just pull out the titles, the links and maybe the categories for the entries so we can display them in a list. We don’t need or even want the rest of the feed structure to be parsed or resident in memory. Abdera’s FOMParserOptions lets a program declare, ahead of the parse, the elements it wants parsed by setting the ParseFilter (a List of QNames).  This provides an opportunity to optimize a given parse.
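To make the idea concrete outside of Abdera, here is a minimal sketch of a filtered parse written against the JDK’s StAX API. It is not Abdera’s implementation, just an illustration of the principle: any element whose QName is not in the filter is skipped along with its whole subtree, so nothing is built for it. The class name FilteredTitles and both method names are made up for this example, and it assumes plain-text titles.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration class; not part of Abdera.
class FilteredTitles {
    static final String ATOM = "http://www.w3.org/2005/Atom";

    /** Collects the text of each entry title, skipping every subtree not named in the filter. */
    static List<String> entryTitles(String xml, Set<QName> filter) {
        QName entry = new QName(ATOM, "entry");
        QName title = new QName(ATOM, "title");
        List<String> titles = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            boolean inEntry = false;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    QName name = r.getName();
                    if (!filter.contains(name)) {
                        skipSubtree(r);                    // filtered out: consume and discard
                    } else if (name.equals(entry)) {
                        inEntry = true;
                    } else if (name.equals(title) && inEntry) {
                        titles.add(r.getElementText());    // assumes text-only titles
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && r.getName().equals(entry)) {
                    inEntry = false;
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return titles;
    }

    /** Advances the reader past the end tag matching the current start tag. */
    static void skipSubtree(XMLStreamReader r) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) depth++;
            else if (event == XMLStreamConstants.END_ELEMENT) depth--;
        }
    }

    public static void main(String[] args) {
        String xml = "<feed xmlns='" + ATOM + "'><title>Feed</title>"
                + "<entry><title>One</title><content>ignored</content></entry></feed>";
        Set<QName> filter = new HashSet<>(Arrays.asList(
                new QName(ATOM, "feed"), new QName(ATOM, "entry"), new QName(ATOM, "title")));
        System.out.println(entryTitles(xml, filter));
    }
}
```

Because "content" is not in the filter, the sketch never looks inside it. A tree-building parser can use the same trick to avoid allocating nodes for the parts of the feed nobody asked for.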

To test the improvements I wrote a little program (see below) that extracts just the entries’ titles from a feed (the feed from robubu.com as of today). I then ran performance comparisons against ROME and against Abdera without the ParseFilter.  Here are the results (allocated bytes and machine instructions):

Parser          Allocated bytes   Machine instructions
ROME            2.64 MB           223 mil.
Abdera          286 KB            30 mil.
Abdera+turbo    66 KB             25 mil.

So the turbo gives a slight saving for CPU (roughly 17% fewer instructions than plain Abdera) but, as would be expected, a significant saving in terms of allocated bytes: about 4 times less memory than the unfiltered Abdera parse and 40 times less than the same parse in ROME.  Now, I do admit that the test is biased and not that realistic, but it does give you an idea of the kinds of savings that can be achieved if there are large portions of a feed that you don’t want to parse.

The code to do this is fairly straightforward.

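InputStream is = this.getClass().getResourceAsStream("robubu.atom");

//create the list to hold the QNames to parse; parents must be listed too
String atomNs = "http://www.w3.org/2005/Atom";
List elementsToParse = new ArrayList();
elementsToParse.add(new QName(atomNs, "feed"));
elementsToParse.add(new QName(atomNs, "entry"));
elementsToParse.add(new QName(atomNs, "title"));

/*create a ParserOptions object and set the
parse filter using the list just created
(setter name assumed from the description above;
check it against your Abdera version)*/
FOMParserOptions options = new FOMParserOptions();
options.setParseFilter(elementsToParse);

//parse using the options
Document doc = Parser.INSTANCE.parse(is,"",options);

/*walk the tree as normal, but only elements
in the list appear in the tree*/
Feed feed = (Feed)doc.getRoot();
List entries = feed.getEntries();
Iterator i = entries.iterator();
while (i.hasNext()) {
  Entry entry = (Entry)i.next();
  System.out.println(entry.getTitle());
}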

I think that this is pretty self-explanatory, but please let me know if it isn’t.  It is also worth noting that all the parents of any element required by the program must also appear in the filter list. In the example above, "entry" and "feed" must be in the filter along with "title" so that the titles of the entries in the feed can be retrieved from the tree. Also note that if no filter is specified then all elements are parsed and available in the resulting tree. Please let us know if this is an optimization that is useful to you.
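Since forgetting a parent is an easy mistake to make, one option is to build the filter from element paths so that the ancestors come along automatically. The helper below is only a sketch: the class ParseFilterPaths and its filterFor method are hypothetical names of mine, not part of Abdera, and it assumes every element on the path lives in the Atom namespace.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;

// Hypothetical helper; not part of Abdera.
class ParseFilterPaths {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    /** Returns one QName per element named on any path, ancestors included, without duplicates. */
    static List<QName> filterFor(String... paths) {
        List<QName> filter = new ArrayList<>();
        for (String path : paths) {
            for (String localName : path.split("/")) {
                QName name = new QName(ATOM_NS, localName);  // assumes Atom-namespace elements only
                if (!filter.contains(name)) {
                    filter.add(name);
                }
            }
        }
        return filter;
    }

    public static void main(String[] args) {
        // Asking for entry titles and links also pulls in "feed" and "entry".
        System.out.println(filterFor("feed/entry/title", "feed/entry/link"));
    }
}
```

Calling filterFor("feed/entry/title", "feed/entry/link") yields the four names feed, entry, title and link, which could then be handed to the parser options as the filter list.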

3 thoughts on “Abdera – Turbo”

  1. Hi Rob.
    How does the abdera approach compare to simply applying an XSLT (based on the XSLTC engine) to extract the elements of interest?

    Best regards

  2. Carsten,
    I have not run any performance comparisons with XSLT processors. There are some very, very efficient parsers that may compare favourably, but I simply haven’t tried them out.
