Sorting and Filtering in Atom – CardAtom

CalAtom can get complicated when it comes to querying, so before tackling querying there properly (I think what we have proposed to date needs a lot more work) I wanted to first see whether the querying and sorting required for CardAtom could be accomplished. CardAtom would be an attempt to expose a remote API for managing contacts in an address book. It would store, update and manipulate collections of vCards via the Atom Publishing Protocol (APP). The CRUD operations would be identical to those described in CalAtom, only with vCard payloads (see the early slides in this CalAtom presentation for details on how this can work). While this works, APP is presently limited to always returning the entire collection, and doing so in last-modified order. So how should sorting and filtering be accomplished in APP? We’ll take these one at a time.

Sorting

The Atom Publishing Protocol mandates that the collection be ordered by last modified date. This is not that useful to a CardAtom client that wants to display the collection by family name or given name. The client could download the entire collection and then sort it locally, but as the number of contacts increases this becomes less and less viable.

Servers can produce collections in any order and make these alternative sort orders available via a URL. The only tricky bit is communicating their existence and location to the client. This can, however, be accomplished by placing the sort order’s URL in a link element within the original feed. The link’s "rel" attribute is used to indicate the particular sort order available at the URL. So the feed for the vCard collection now looks like this.


<feed xmlns="http://www.w3.org/2005/Atom">
<title type="text">rob's contacts</title>
<updated>2005-07-31T12:29:29Z</updated>
<id>tag:example.org,2003:3</id>
<link rel="self" type="application/atom+xml"
href="http://example.org/contacts" />
<link
rel="http://purl.org/CardAtom/sort/byFamilyName/asc"
type="application/atom+xml"
title="by Family Name"
href="http://example.org/contacts/byFamilyName"/>

<entry>
<title>Rob Yates</title>
<summary type="html">
&lt;p>&lt;strong>Tel:&lt;/strong>+1-234-567-8901&lt;/p>
</summary>
<link rel="edit-media" type="text/directory" href="http://example.org/contact/1"/>
<id>tag:example.org,2003:3.2397</id>
<updated>2005-07-31T12:29:29Z</updated>
<content src="http://example.org/contact/1" type="text/directory" />
</entry>
<entry> .....
</feed>

Note that this uses the fact that link/@rel (as defined by Atom) can take any arbitrary URL to define its meaning. A CardAtom specification could therefore define a set of link relationships that describe the mandatory and optional sorts that a CardAtom collection supports. A client reading the feed can search for a particular sort order using the value of link/@rel, and if it wants to render the collection in that order it can simply retrieve the contents of the corresponding URL. Nice.
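To make the client side concrete, here is a minimal sketch in Python of looking up a sort order by its "rel" value. Nothing here has been coded against a real server: the feed is a trimmed copy of the example above embedded as a string, and the function name find_link is just a name picked for illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# The rel value proposed above for "sort by family name, ascending".
SORT_BY_FAMILY_NAME = "http://purl.org/CardAtom/sort/byFamilyName/asc"

# Trimmed copy of the example feed (entries omitted).
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="text">rob's contacts</title>
  <link rel="self" type="application/atom+xml"
        href="http://example.org/contacts"/>
  <link rel="http://purl.org/CardAtom/sort/byFamilyName/asc"
        type="application/atom+xml"
        title="by Family Name"
        href="http://example.org/contacts/byFamilyName"/>
</feed>"""


def find_link(feed_xml, rel):
    """Return the href of the first feed-level link with the given rel, or None."""
    root = ET.fromstring(feed_xml)
    # Only direct children of <feed> are checked, so entry-level links are ignored.
    for link in root.findall(f"{{{ATOM_NS}}}link"):
        if link.get("rel") == rel:
            return link.get("href")
    return None


print(find_link(FEED, SORT_BY_FAMILY_NAME))
```

A client that gets None back would presumably fall back to fetching the default collection and sorting locally.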

I can also imagine "standard" sort orders being defined by specific "rel" values, e.g. "by Title" or "by Author Name".

Filtering

Filtering is much trickier. How does the server communicate to the client the searches/filters that it supports? The server could allow very flexible and complex queries, in which case something like XQuery or SPARQL should be used. While those are extremely flexible, the problem is that the server MUST allow any attribute to be searched, and this dramatically increases both the cost of implementation and the cost of subsequent optimizations. For CardAtom it seems that we really only need to support full text searches as well as filtering by FamilyName and GivenName. We just need a way to describe these options to the client, and so it was that James reminded me of A9’s OpenSearch. OpenSearch defines a description document that describes a search supported by the site. For CardAtom we want to offer a full text search, a familyName search and a givenName search. First off, here is one that describes the full text search.


<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
<ShortName>Search</ShortName>
<Description>Search the contact store</Description>
<Url type="application/atom+xml"
template="http://example.org/contacts?q={searchTerms}"/>
</OpenSearchDescription>

Note that {searchTerms} in the URL above defines where a substitution should be made. OpenSearch also defines the meaning of "searchTerms". Then here’s the familyName search.


<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/"
xmlns:xcard="http://www.ietf.org/internet-drafts/draft-dawson-vCard-xml-dtd-04.txt">
<ShortName>by Family Name</ShortName>
<Description>Search the contact store for a given family name</Description>
<Url type="application/atom+xml"
template="http://example.org/contacts?familyName={xcard:family}"/>
</OpenSearchDescription>

Note that this one uses the xCard namespace to indicate that the substitution variable should be of type <family> as described in the xCard specification.
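Here is a rough Python sketch of how a client might consume such a description document: read the template from the Url element, work out which namespace the xcard prefix refers to, and substitute URL-encoded values into the template. The helper names read_template and expand are made up for this example, and the sketch assumes every placeholder is required (it does not handle OpenSearch's optional "{name?}" form).

```python
import io
import re
import xml.etree.ElementTree as ET
from urllib.parse import quote

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")

# The familyName description document from above, embedded for illustration.
DESCRIPTION = b"""<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/"
    xmlns:xcard="http://www.ietf.org/internet-drafts/draft-dawson-vCard-xml-dtd-04.txt">
  <ShortName>by Family Name</ShortName>
  <Description>Search the contact store for a given family name</Description>
  <Url type="application/atom+xml"
       template="http://example.org/contacts?familyName={xcard:family}"/>
</OpenSearchDescription>"""


def read_template(description_xml):
    """Return (template, {prefix: namespace_uri}) for the first Url element."""
    namespaces = {}
    template = None
    events = ET.iterparse(io.BytesIO(description_xml), events=("start-ns", "start"))
    for event, payload in events:
        if event == "start-ns":
            prefix, uri = payload  # namespace declarations arrive as (prefix, uri)
            namespaces[prefix] = uri
        elif template is None and payload.tag == f"{{{OPENSEARCH_NS}}}Url":
            template = payload.get("template")
    return template, namespaces


def expand(template, values):
    """Replace each {name} placeholder with the URL-encoded value for that name."""
    return PLACEHOLDER.sub(lambda m: quote(values[m.group(1)], safe=""), template)


template, namespaces = read_template(DESCRIPTION)
print(namespaces["xcard"])
print(expand(template, {"xcard:family": "Yates"}))
print(expand("http://example.org/contacts?q={searchTerms}", {"searchTerms": "rob yates"}))
```

Keying the values by the prefixed name is a shortcut; a more careful client would match on the resolved namespace URI, since another server could bind the same namespace to a different prefix.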

And so, if we make these two description files available at an appropriate location we can now link to those in the feed as well (although I only show one for brevity), e.g.


<feed xmlns="http://www.w3.org/2005/Atom">
<title type="text">rob's contacts</title>
<updated>2005-07-31T12:29:29Z</updated>
<id>tag:example.org,2003:3</id>
<link rel="self" type="application/atom+xml"
href="http://example.org/contacts" />
<link
rel="http://purl.org/CardAtom/sort/byFamilyName/asc"
type="application/atom+xml"
title="by Family Name"
href="http://example.org/contacts/byFamilyName"/>

<link
rel="http://purl.org/CardAtom/search/byFamilyName"
type="application/opensearchdescription+xml"
href="http://example.org/contacts/search/byFamilyName"/>

<entry>
<title>Rob Yates</title>
<summary type="html">
&lt;p>&lt;strong>Tel:&lt;/strong>+1-234-567-8901&lt;/p>
</summary>
<link rel="edit-media" type="text/directory" href="http://example.org/contact/1"/>
<id>tag:example.org,2003:3.2397</id>
<updated>2005-07-31T12:29:29Z</updated>
<content src="http://example.org/contact/1" type="text/directory" />
</entry>
<entry> .....
</feed>

So this seems to all be working, although we haven’t coded it yet :).  The feed/collection describes its alternate sort orders and its possible filters and how to invoke them.

One thing that I think still needs further thought, though, is whether these sort or search/filter links are discoverable outside of the collection document. It seems wrong that a client must first load a collection with the default sort order only to locate the sort order that it actually wants to use. Should there be an introspection document per collection that can somehow be retrieved from the collection URL? Maybe a GET against the collection URL with an Accept header of "application/atomserv+xml", not sure…
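Purely to illustrate that last idea, this is what such a request could look like from a Python client. It is speculative: no server is known to answer this way, the function name introspection_request is invented for the example, and the request is only built here, not sent.

```python
import urllib.request


def introspection_request(collection_url):
    """Build (but do not send) a GET asking for the collection's introspection document."""
    return urllib.request.Request(
        collection_url,
        # Hypothetical content negotiation: same URL, different media type.
        headers={"Accept": "application/atomserv+xml"},
        method="GET",
    )


request = introspection_request("http://example.org/contacts")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

The appeal is that the client needs no extra URL; the drawback is that it depends on every server honouring the Accept header on the collection URL.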

I welcome suggestions for improvements or alternative approaches. These are features that it seems we will need as we use Atom for more things. Once the core spec is complete, I hope the working group will consider them.

8 thoughts on “Sorting and Filtering in Atom – CardAtom”

  1. We can probably talk about this off-line (literally) but a few comments:

    Sorting:
    If there’s no standard notation I’m not sure how clients are supposed to automatically parse a URL like /sort/byFamilyName/asc into “sort by family name in ascending order.” Of lesser concern, I think that you’re going to wind up with dozens and dozens of links here.

    Also, kind of a semantics question – You said this: “The atom publishing protocol mandates the collection order to be by last modified date.”

    Wouldn’t the “mandatory sort by last mod time” rule apply to the collection returned by your “sort by” links?

    Filtering:
    This seems like an approach that could be generalized to allow client-side construction of URLs (which seems to me to be a big drawback in using ATOM for these sorts of applications). Could we do something similar to let you specify an array of values (for batch delete) or something?

    >Should there be an introspection document per collection that can somehow be retrieved from the collection url?

    I don’t know if it’s actually implemented anywhere, but there’s an HTTP method called “HEAD”. The HTTP 1.1 spec says that HEAD should return exactly the same information as GET, but without the document body (just the headers). I don’t know how this applies to ATOM, but it seems like it would make sense to have an ATOM feed respond to a HEAD request by returning everything except the entries.

  2. DeWitt,
    the microformats stuff looks interesting, although I wonder if you would still choose the namespaces defined in the microformats specification if there was already a standard xml namespace for xCard and xCal.

    Henry,

    the idea is that clients would parse all the Links in the feed looking for one that has a “rel” attribute set to a specific value. When they find a value that matches “http://purl.org/CardAtom/sort/byFamilyName/asc”, for example, then they know that the url in the corresponding href points to an alternate view of the current collection in the corresponding sort order. I agree that there will probably be lots of them and that the three that I have mentioned so far are not enough.

    The mandated sort order per the atom publishing protocol only applies to the order that is located at the collection url. Nothing in the standard prevents other sort orders on different urls being made available by the server.

    I think that it is a big advantage to have the server define templates that the client then uses to construct the URL to hit. Without this, the only way I can think of to achieve the same end result is to have predefined URL suffixes, which places undesirable restrictions on the URLs that a given implementation uses.

  3. Why not put the OpenSearch description into the collection declaration in the Atom introspection document? E.g.:

    Collection search
    Returns entries for this.

  4. Arthur, so you are correct that some of this could and probably should be done in the introspection document, especially the opensearch description. The problem we are working through at the moment is how to indicate the possible sort orders to the client ahead of time, so that they don’t have to first get the results in last modified order and then re-ask for them in the order they actually want.

    The best we can think of at the moment is to have an opensearch field that is the sort order and its possible values have to be known by the client or embedded in some other xml element in the introspection document. If anyone has any suggestions, we are all ears.

  5. Hi. I’ve been trying to write a generic OpenSearch RSS client in Java using Rome and the plugin modules for OpenSearch. I found some examples on the Rome wiki at java.net, but they’re very terse and don’t really explain what’s going on. Everything compiles nicely, but I can’t seem to be able to execute the actual search. Sorry if I’m in the wrong forum, but if you guys know anything about the Rome Java API’s for OpenSearch, could you please point me to some good documentation? Any help is much appreciated. Thanks,

    Basil
