AppSwitching Diary

Core elements of RSS

Several examples and primers for RSS 2.0 syndication have appeared over the past few days, which makes the job of selecting a basic implementation much easier than it looked at the beginning of the week. The first signs of 2.0 adoption have appeared too, providing an early indication of how sites are choosing to implement its features (I've highlighted some links at the end of this posting).

Mark Pilgrim has created an RSS 2.0 template for Movable Type, which provides an excellent sample of how to implement a core set of features using core RSS supplemented by namespaces. His commentary is well worth reading, as it provides a very clear explanation of some of the key issues that you need to understand when implementing 2.0 in all its glory.

Another very useful resource is Timothy Appnel's exposition of Extremely Simple Syndication (XSS). He defines the core tags that are shared by virtually all versions of RSS, and sets out some recommendations for their use in a way that's designed to maintain compatibility with all previous versions. One of the encouraging things about comparing Timothy's sample of "extensible XSS" with Mark's sample of his RSS 2.0 implementation is that they're virtually identical, despite using a number of namespace references to include various elements. This demonstrates that there's already a great deal of agreement out there on best practice.

But before venturing into that territory, I'd like to start off with the more basic implementation which Timothy has set out under the name of XSS-strict. This is what I had in mind when I was talking about "2.0 Bare" in my posting a week ago. As his example shows, this is an ultra-lightweight, bare-bones implementation of RSS that you can syndicate just about anywhere without it breaking somebody's reader. Mind you, as Mark points out in his 2.0 template article, "If all you want is title-link-description, stick with your existing RSS 0.91 template and stop reading now." It's only if you want to prepare yourself to move beyond these "bare-necessities" core tags that you need to be concerned with 2.0.

You do however need to be aware of what content is generally considered acceptable — or not acceptable — in each of these elements:

<rss> must cite the version number (e.g. version="2.0", though adding /XSS-strict is the one thing I would recommend you don't copy from Timothy's spec). ~~Strictly speaking, you should also declare a namespace for the RSS version, such as xmlns="http://backend.userland.com/rss2"~~. UPDATE: It turns out there are grave problems with declaring a namespace for the core elements of RSS, which I'll discuss in a new posting.
<channel> includes three elements, none of which should include any HTML coding (apparently, some people feel they want to be able to syndicate content complete with all their characteristic formatting. If that's what they want to do, they should be using JavaScript, not RSS ... but if you really insist, the extensible features of RSS 2.0 allow you to send formatted text in optional fields for the benefit of those recipients who want it):
- <title> is the name of the information source, for example the name of your weblog. It should correspond to the title of the relevant HTML page. Early versions of RSS required it to be less than 100 characters.
- <link> is the URL for your originating page. It must be unique, which means that using the URL for your website is a bad idea if you're going to be publishing several feeds. Early versions required less than 500 characters.
- <description> is optional, but many aggregators make use of it, so it's worth including it. This is your opportunity to promote your resource by explaining what it does. It's not a bad idea to use the same description as in the header of your HTML page. But whatever you do, aim to do it in less than 500 characters. Some aggregators will truncate or reject longer entries, others will use a format that looks awful with a long description.
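To make the channel rules above concrete, here is a minimal sketch in Python using the standard library's ElementTree. The function name build_channel, the example values and the decision to raise an error on over-long fields are all my own assumptions for illustration; the length limits are simply the ones quoted above for early versions of RSS.

```python
import xml.etree.ElementTree as ET

# Length limits quoted in the text for early versions of RSS.
TITLE_LIMIT = 100
LINK_LIMIT = 500
DESCRIPTION_LIMIT = 500


def build_channel(title, link, description):
    """Return an <rss> element whose channel holds the three core elements.

    Hypothetical helper: rejecting over-long values is an illustrative
    choice, not something the RSS 2.0 spec requires.
    """
    for name, value, limit in (
        ("title", title, TITLE_LIMIT),
        ("link", link, LINK_LIMIT),
        ("description", description, DESCRIPTION_LIMIT),
    ):
        if len(value) >= limit:
            raise ValueError(f"{name} should be under {limit} characters")

    rss = ET.Element("rss", {"version": "2.0"})  # lowercase tag plus version
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    return rss


# Hypothetical example values.
feed = build_channel(
    "Example Weblog",
    "http://example.com/weblog/",
    "Notes on syndication formats.",
)
print(ET.tostring(feed, encoding="unicode"))
```

Whether you reject, truncate or merely warn about long values is up to you; the point is to make the decision yourself rather than leave it to each aggregator.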

<item> also contains three elements, and again, HTML coding is frowned upon in each of these elements. All of these elements are considered optional in the RSS 2.0 spec, even though not including them makes your feed incompatible with some readers and aggregators. You have to decide whether you want to take the risk that your feed will be rejected (or worse, cause a glitch in the reader):
- <title> is the most controversial of the optional items, as it was required in early versions of RSS, and then made optional later. For people who are syndicating what I call "scrapbook" type weblogs, where many of the postings are just links to other articles or one-line observations, it's neither convenient nor appropriate to add a title, so omitting this field in such cases seems legitimate to me. But some readers and aggregators don't like it, so if you want to be sure of being universally acceptable, you must add a title, even if you simply repeat the first few words of the body of each item. Like the channel title, this element was originally limited to 100 characters and it seems best to respect that rule so as to avoid overloading readers with unwieldy titles.
- <link> also causes problems, because most people quite naturally assume that it links back to the relevant item on the syndicator's web site. However, a legitimate use in the "scrapbook" style of blog is to link to the external item that the blogger is referring to. Personally, I think this should be done using a separate, optional field rather than using this core element.
- <description> was originally intended to be a short summary of the item, and was limited to 500 characters. But many webloggers have taken to syndicating their entire posting into this field, and a lot of users of RSS readers prefer it that way. However, the practice is not very friendly to aggregators, who as a consequence are quite likely to truncate lengthy descriptions. If you want to send the entire entry (with or without HTML markup), do it in a separate, optional field. It's easy enough to automatically truncate your entry to its first couple of lines to create a description on the fly, and then you can choose how to abbreviate your content rather than leaving it up to the aggregator.
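As a rough illustration of creating a title and description on the fly, the sketch below strips markup from an entry and cuts it at a word boundary. The helper names (plain_text, shorten, item_fields) are hypothetical, the tag-stripping regex is deliberately crude rather than a real HTML parser, and the 100 and 500 character limits are the ones mentioned above.

```python
import re


def plain_text(entry):
    """Strip markup and collapse whitespace (crude regex, not an HTML parser)."""
    without_tags = re.sub(r"<[^>]+>", " ", entry)
    return " ".join(without_tags.split())


def shorten(text, limit):
    """Cut text to fewer than `limit` characters, at a word boundary if possible."""
    if len(text) < limit:
        return text
    room = limit - 5  # leaves space for the " ..." marker and stays under the limit
    cut = text[:room + 1]
    head = cut.rsplit(" ", 1)[0] if " " in cut else text[:room]
    return head.rstrip() + " ..."


def item_fields(entry):
    """Derive a title and description from the body of a posting."""
    text = plain_text(entry)
    return {
        "title": shorten(text, 100),        # opening words stand in for a title
        "description": shorten(text, 500),  # short summary, no markup
    }


# Hypothetical posting body.
fields = item_fields("<p>" + "word " * 200 + "</p>")
print(len(fields["title"]), len(fields["description"]))
```

Doing the abbreviation yourself like this means every reader sees the same summary, instead of whatever each aggregator's own truncation happens to produce.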

One other point to bear in mind is that all of this has to be well-formed XML. This should be taken care of by whatever you're using to generate the RSS, but if you have any doubts, it's worth checking to make sure. XML is case-sensitive, so getting capitalization right (or rather the lack of it) in your markup tags is vital. There are also five special characters that you must convert, either into numeric character references or into their escape codes. They are (with their escape codes in brackets): & (&amp;), ' (&apos;), " (&quot;), < (&lt;) and > (&gt;). The W3C site has a full explanation of these and related rules.
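If you do want to check for yourself, here is a small sketch of both steps using Python's standard library: escaping the special characters, and testing whether a fragment parses as XML. The helper names escape_text and is_well_formed are my own, and the sample text is made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() converts &, < and > by default; the two quote characters
# are supplied as extra entities.
QUOTES = {"'": "&apos;", '"': "&quot;"}


def escape_text(text):
    """Replace the special characters with their escape codes."""
    return escape(text, QUOTES)


def is_well_formed(document):
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


raw = 'Q&A: is 3 < 5 > 2? "Yes," says O\'Reilly'
escaped = escape_text(raw)
print(escaped)

# Parsing the escaped text gives back the original characters.
fragment = f"<description>{escaped}</description>"
assert ET.fromstring(fragment).text == raw

# Mismatched capitalization in a tag pair is enough to break the document.
print(is_well_formed("<title>News</title>"))
print(is_well_formed("<title>News</Title>"))
```

If you generate the feed with an XML library rather than by pasting strings together, the escaping is normally handled for you, which is one more reason to let the tool do the work.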

That just about covers it for now, but of course that's just a starting point for using RSS. Having got the core established, the next step is to consider how to include all those optional elements. That will be the subject of future postings, and probably the subject of much speculation and discussion in various other weblogs and forums over the next few weeks. As I mentioned earlier, some very useful resources have already sprung up elsewhere this week. Here is a selection of the most notable:

Kevin Hemenway has written a marvelously clear primer on Extending RSS 2.0 With Namespaces. Don't extend without it.
The RSS-DEV discussion list is currently having a productive exchange of ideas about RSS 2.0, including evolving an agreed list of mappings between optional elements in the 2.0 spec and their usual namespace equivalents.
Dave Winer has started compiling a catalog of RSS resources.
Syndic8 has begun to list RSS 2.0 feeds.
Bruce Loebrich provides a practical illustration of the usefulness of the new "guid" element, which helped him fix a problem he had this week with his syndicated Google News feeds.

posted by Phil 10:26 AM (GMT)

What to do about RDF

Much of the controversy around RSS 2.0 derives from its relationship to another specification, RDF. Deciding what to do about RDF in RSS 2.0 is crucial before we can go any further. There are three possible options:

Build it in from the start;
Add it in later;
Forget about it altogether.

The position you take will depend on what you want to achieve with your syndication. If you're aiming to reach as widely as possible, then you need to fall in with whatever the mainstream is going to adopt. If you're aiming at a smaller, known community, then there may be special reasons for going against the crowd. Whatever you decide, the most important thing is to make sure your implementation is convertible, so that if you find you made the wrong choice initially, you can switch without having to start again completely from scratch.

Build it in from the start — This was the position that a highly motivated group of technologists took two years ago, when they published a specification for syndication based on RDF, which they named RSS 1.0. The RDF acronym stands for Resource Description Framework, a W3C specification for describing Web resources in a form that can be processed automatically. The authors of RSS 1.0 believed that rebasing the syndication format on RDF would be an important step towards automating the discovery and processing of knowledge on the Web, and move us all much closer to the W3C's vision of a Semantic Web. Which is all well and good, but adding RDF to RSS 1.0 broke compatibility with all previous versions of RSS, and thus began the schism between the 1.0 and 0.9x branches of RSS.

There has been a lot of discussion recently about how to build RDF into RSS 2.0 in a less intrusive way than was done in RSS 1.0. In particular, one proposal by Shelley Powers has been picked up approvingly by several other people, but as Timothy Appnel notes, "this format does break backward compatibility with all previous specifications." So, whatever its technical merits, this proposal has little to offer a webmaster who wants to syndicate to as wide a universe as possible, certainly in the short term.

As to the longer-term merits of RDF, it seems to me that trying to build it into the core of a syndication format is inappropriate anyway. Syndication is ephemeral by nature, whereas RDF is concerned with mapping stores of knowledge. Shelley herself touches on this in her recent article, RDF: As Simple as A, B, C: "RSS captures a rich set of information about a specific web page or weblog posting: the author and creation date, as well as category, and possibly even links to other resources. What a pity to put this into a form that will only be thrown away." Shelley says she stores that information in RDF format in the header of her archive pages, which I'd say is a far more useful place to put it than in an RSS feed.

As you can see, I'm coming down against building RDF into your RSS 2.0 implementation right from the start. But when creating the syndication feed, there is something to be said for storing the content of the feed either directly in RDF format, or in a format that could easily be converted into RDF at some point in the future. I'll be discussing ways of achieving that in later postings.

Add it in later — Later may actually come sooner than you think. The inclusion of support for XML namespaces in Userland's RSS 2.0 specification makes it very easy to call on RDF definitions, and that will be something I'll be considering for my implementation of 2.0 Lite. You can even argue for making use of it in a "bare essentials" implementation. So the decision against building in RDF right from the start is only a matter of degree. We're not throwing the baby out with the bathwater.
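To give a feel for what adding elements through a namespace looks like in practice, here is a minimal sketch that attaches two namespaced elements to an item. I've assumed the Dublin Core element set purely for illustration, since it is a vocabulary commonly used alongside RDF; the choice of namespace, the element names and the sample values are my assumptions, and the output is a fragment rather than a complete feed.

```python
import xml.etree.ElementTree as ET

# Dublin Core element set, an assumed choice of vocabulary for this sketch.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)  # serialize with the conventional dc: prefix

rss = ET.Element("rss", {"version": "2.0"})
channel = ET.SubElement(rss, "channel")
item = ET.SubElement(channel, "item")

# Core element, left in no namespace.
ET.SubElement(item, "title").text = "Example posting"

# Optional extras, qualified by the namespace (illustrative values).
ET.SubElement(item, f"{{{DC}}}creator").text = "Phil"
ET.SubElement(item, f"{{{DC}}}date").text = "2002-09-23"

print(ET.tostring(rss, encoding="unicode"))
```

The core elements stay exactly as they were, so a reader that knows nothing about the extra namespace can simply ignore the prefixed elements.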

Forget about it altogether — OK, this is the full baby-and-bathwater stance, and in my view, it's not an option. While RDF has little relevance to real-time feeds of what's new on a website, it's clearly designed with a view to organizing some of the other types of content that I mentioned I'd like to syndicate, such as directory information. So it evidently has an important role to play, one that I'll be exploring in a lot of detail when we come to consider 2.0 Rich.

UPDATE: At the very same time that I was writing the above comments, Shelley Powers was posting an article making much the same points about the relevance of RDF to RSS. Which is spooky. But reassuring to know we're both tuned into the same meme here.

posted by Phil 5:07 AM (GMT)

Friday, September 27, 2002

Core elements of RSS

Monday, September 23, 2002

What to do about RDF

Loosely Coupled weblog