Abdera – Turbo

So James pretty much single-handedly wrote Abdera (now Apache’s Atom parser). The one contribution I can probably claim is a performance turbo boost. The turbo comes from the fact that a program often uses only part of an Atom feed. If you need the whole feed placed in the tree structure that Abdera produces, then this turbo won’t help you.

One of the things we frequently do is just pull out the titles, the links and maybe the categories for the entries so we can display them in a list. We don’t need or even want the rest of the feed structure to be parsed or resident in memory. Abdera’s FOMParserOptions lets a program declare, ahead of the parse, which elements it wants parsed by setting the ParseFilter (a List of QNames). This gives the parser an opportunity to optimize a given parse.
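To make the idea concrete, here is a minimal standalone sketch of filtering during a parse. It is not Abdera's implementation; it swaps in the JDK's StAX API (javax.xml.stream) and simply skips the subtree of any element whose QName is not in the filter, so nothing is allocated for it. The class name, method names and the sample feed are all made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; Abdera's ParseFilter is not implemented this way as far as this sketch knows.
class FilteredTitles {
    static final String ATOM = "http://www.w3.org/2005/Atom";
    static final QName FEED = new QName(ATOM, "feed");
    static final QName ENTRY = new QName(ATOM, "entry");
    static final QName TITLE = new QName(ATOM, "title");

    // Made-up sample feed, not the robubu.com feed used for the measurements.
    static final String SAMPLE =
        "<feed xmlns=\"http://www.w3.org/2005/Atom\">"
      + "<title>Example feed</title>"
      + "<entry><title>First</title><author><name>A</name></author></entry>"
      + "<entry><title>Second</title><summary>skipped</summary></entry>"
      + "</feed>";

    /** Collects the text of each entry's title, skipping every element not in the filter. */
    static List<String> entryTitles(String xml, Set<QName> filter) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> titles = new ArrayList<>();
        boolean inEntry = false;
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                QName name = r.getName();
                if (!filter.contains(name)) {
                    skipSubtree(r);               // not wanted: read past it, keep nothing
                } else if (name.equals(ENTRY)) {
                    inEntry = true;
                } else if (name.equals(TITLE) && inEntry) {
                    // Assumes plain-text titles; getElementText() rejects child elements.
                    titles.add(r.getElementText());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT && r.getName().equals(ENTRY)) {
                inEntry = false;
            }
        }
        r.close();
        return titles;
    }

    /** Advances the reader to the end tag matching the start tag it is currently on. */
    static void skipSubtree(XMLStreamReader r) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) depth++;
            else if (event == XMLStreamConstants.END_ELEMENT) depth--;
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        Set<QName> filter = new HashSet<>(Arrays.asList(FEED, ENTRY, TITLE));
        System.out.println(entryTitles(SAMPLE, filter)); // prints [First, Second]
    }
}
```

The author and summary elements are read past without ever being stored, which is the kind of saving the filter is after.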

To test the improvements I wrote a little program (see below) that extracts just the entries’ titles from a feed (the feed from robubu.com as of today). I then compared performance against ROME and against Abdera without the ParseFilter. Here are the results (allocated bytes and machine instructions):

ROME:          2.64 MB,  223 million
Abdera:        286 KB,   30 million
Abdera+turbo:  66 KB,    25 million

So the turbo gives a slight saving in CPU but, as would be expected, a significant saving in allocated bytes: less than a quarter of the memory of the unfiltered Abdera parse, and 40 times less than the same parse in ROME. Now, I do admit that the test is biased and not that realistic, but it does give you an idea of the kinds of savings that can be achieved if there are large portions of a feed that you don’t want to parse.
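For anyone who wants to check the arithmetic, here is a small sketch that works out the ratios from the figures quoted above. It assumes 1 MB = 1000 KB, which is a guess about the profiler's units; with 1024 the ROME memory ratio comes out nearer 41. The class and method names are made up.

```java
// Hypothetical helper; the inputs are the figures quoted in this post.
class SavingsRatios {
    /** How many times larger the baseline figure is than the optimized one. */
    static double ratio(double baseline, double optimized) {
        return baseline / optimized;
    }

    public static void main(String[] args) {
        double romeKb = 2.64 * 1000; // assumption: 1 MB = 1000 KB
        double abderaKb = 286;
        double turboKb = 66;

        System.out.println("Memory, ROME vs. turbo:   " + ratio(romeKb, turboKb));
        System.out.println("Memory, Abdera vs. turbo: " + ratio(abderaKb, turboKb));
        System.out.println("CPU, ROME vs. turbo:      " + ratio(223, 25));
        System.out.println("CPU, Abdera vs. turbo:    " + ratio(30, 25));
    }
}
```

That works out to roughly 40x and 4.3x for memory, and roughly 8.9x and 1.2x for instructions.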

The code to do this is fairly straightforward.

InputStream is = this.getClass().getResourceAsStream("robubu.atom"); 

//create the list to hold the qname's to parse
List elementsToParse = new ArrayList();
elementsToParse.add(new
  QName("http://www.w3.org/2005/Atom","feed"));
elementsToParse.add(new
  QName("http://www.w3.org/2005/Atom","entry"));
elementsToParse.add(new
  QName("http://www.w3.org/2005/Atom","title")); 

/*create a ParserOptions object and set the
parse filter using the list just created */
FOMParserOptions options = new FOMParserOptions();
options.setParseFilter(elementsToParse); 

//parse using the options
Document doc = Parser.INSTANCE.parse(is,"",options); 

/*use the tree as normal, but only elements
in the list appear in it*/
Feed feed = (Feed)doc.getRoot();
List entries = feed.getEntries();
Iterator i = entries.iterator();
while(i.hasNext()){
  Entry entry = (Entry)i.next();
  System.out.println(entry.getTitle());
}
is.close();

I think that this is pretty self-explanatory, but please let me know if it isn’t. It is also worth noting that all the parents of any element required by the program must also appear in the filter list, i.e. in the example above "entry" and "feed" must be in the filter along with "title" so that the titles of the entries in the feed can be retrieved from the tree. Also note that if no filter is specified then all elements are parsed and available in the resulting tree. Please let us know if this is an optimization that is useful to you.
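Since the parents are easy to forget, one option is a small helper that always adds "feed" and "entry" before the entry children you ask for. This is only a sketch: AtomFilters and entryChildren are made-up names, not part of Abdera, and it uses nothing beyond the JDK's QName class. The "link" and "category" names are the standard Atom elements for the links and categories mentioned earlier.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.namespace.QName;

// Hypothetical helper, not part of Abdera.
class AtomFilters {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    /** Builds a filter list holding feed, entry and the requested children of entry. */
    static List<QName> entryChildren(String... localNames) {
        List<QName> filter = new ArrayList<>();
        // Parents first, so the requested children are reachable in the tree.
        filter.add(new QName(ATOM_NS, "feed"));
        filter.add(new QName(ATOM_NS, "entry"));
        for (String localName : localNames) {
            filter.add(new QName(ATOM_NS, localName));
        }
        return filter;
    }

    public static void main(String[] args) {
        // Titles, links and categories for a list display.
        List<QName> filter = entryChildren("title", "link", "category");
        for (QName name : filter) {
            System.out.println(name);
        }
    }
}
```

The returned list should be usable in place of elementsToParse in the code above, assuming setParseFilter accepts a List of QNames as described in this post.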

{ 2 } Comments

  1. Carsten | August 1, 2006 at 8:00 am | Permalink

    Hi Rob.
    How does the abdera approach compare to simply applying an XSLT (based on the XSLTC engine) to extract the elements of interest?

    Best regards
    Carsten

  2. Rob Yates | August 1, 2006 at 1:23 pm | Permalink

    Carsten,
    I have not run any performance comparisons with XSLT processors. There are some very efficient parsers that may compare favourably, but I simply haven’t tried them out.
    Rob

