
Friday, January 24, 2003

Spelling out acronyms

As an experiment, I'm publishing a separate RSS 2.0 file for every definition in the Loosely Coupled glossary. I'm not sure where this will lead, but I'm sure it will lead somewhere. One potential application is to provide a ready reference for spelling out acronyms. Because there's currently no easily accessible authority that journalists, analysts, marketing people and others can use to check what terms like HTTP or BPML stand for, incorrect versions tend to proliferate. Indeed, as one early visitor to the glossary discovered this week, we ourselves blundered with our initial rendition of HTTP.

Having tightened up the checking procedures to make sure a similar error won't occur in the future, I'm ready to make a commitment that you'll always be able to trust that the spelt-out version of an acronym will be authoritative when you look it up in the Loosely Coupled glossary. I'm hoping I'll be able to add a field to the RSS feed that includes the spelt-out version, but for the moment I'm going to hold off as I want to think some more about the structure of those RSS files. In the meantime, the spelt-out version is in parentheses at the beginning of the definition (see below for a tip on how to extract this using PHP).

There is a problem in being punctiliously accurate, of course. It means that common misrenderings — such as giving the M in BPML as 'markup' when it should be 'modeling', or our initial substitution of 'transport' instead of the correct word 'transfer' for the second T in HTTP — will produce a 'not found' response when looked up in the glossary.

This problem reminds me of the discussion that ensued after Mark Pilgrim's O'Reilly article this week, where he recommended parsing RSS feeds in a way that was forgiving of badly formed XML. Purists maintain that aggregators shouldn't encourage bad behavior by feed publishers, whereas Mark took the position (quite rightly in my view) that, in order to provide the service their users expect, aggregators have no choice but to be forgiving. Likewise, I'm going to need to add some mechanism to the glossary for handling common misspellings of popular acronyms. That again would be a useful addition to the RSS file, but it needs to be done in a way that won't lead to confusion between the correct and incorrect renderings, so I'll have to give some more thought to how it might work.
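
As a rough illustration of the idea only (not the glossary's actual implementation, which is still undecided), a lookup table could map each known misrendering to the correct one while flagging that the queried form is wrong. The function name `lookup_rendering`, the array layout and the `queried_form_is_correct` key are all hypothetical, and the sketch assumes PHP 7.1 or later:

```php
<?php
// Hypothetical data: each key is a common misrendering (lowercased), and each
// value holds the acronym it belongs to and the correct spelt-out version.
$known_misrenderings = [
    "hypertext transport protocol"     => ["HTTP", "Hypertext Transfer Protocol"],
    "business process markup language" => ["BPML", "Business Process Modeling Language"],
];

// Hypothetical helper: returns details of the correct rendering when the
// query matches a known misrendering, or null when it does not.
function lookup_rendering(string $query, array $misrenderings): ?array
{
    $key = strtolower(trim($query));
    if (!isset($misrenderings[$key])) {
        return null; // not a known misrendering
    }
    [$acronym, $correct] = $misrenderings[$key];
    return [
        "acronym" => $acronym,
        "correct" => $correct,
        // Keep the distinction explicit so the two forms are never confused.
        "queried_form_is_correct" => false,
    ];
}

$result = lookup_rendering("Hypertext Transport Protocol", $known_misrenderings);
print $result["correct"]; // Hypertext Transfer Protocol
```

Keeping the incorrect forms in a separate table, rather than alongside the definitions themselves, is one way to avoid presenting a misrendering as if it were authoritative.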

As I mentioned above, the correct rendering has a consistent format and position at the beginning of the definition. This means it can be extracted by looking for the opening and closing parentheses. However, if you're using PHP, take care over how you test the result of the function that finds the opening parenthesis. As the first character, the parenthesis is at position 0 in the description, but of course if there's no parenthesis (ie the glossary term is a word rather than an acronym) then the function will return false, which PHP treats as equal to 0 when the two are compared with the normal two equals signs. So to make sure a 'not found' doesn't produce the same result as 'found at position 0', you have to use three equals signs (ie 'identical to', which compares type as well as value) rather than two (ie 'equals') to evaluate the expression, as shown in the following example:

if (strpos($description, "(") === 0) {
  // the description has an opening parenthesis at position 0
  $end_paren = strpos($description, ")");
  // find the position of the closing parenthesis
  if ($end_paren !== false) {
    // only continue if a closing parenthesis was actually found
    $spelt = substr($description, 1, $end_paren - 1);
    // the spelt-out version is the text in between
    print $spelt; // print it out
  }
}
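
To see the difference between the two comparisons in isolation, here is a minimal sketch that wraps the same logic in a function. The name `extract_spelt_out` and the sample descriptions are made up for illustration, and the type declarations assume PHP 7.1 or later:

```php
<?php
// Hypothetical helper: returns the spelt-out version from the start of a
// description, or null when the description does not begin with one.
function extract_spelt_out(string $description): ?string
{
    // strpos() returns the integer 0 for a match at the first character
    // and the boolean false for no match; only === / !== tells them apart.
    if (strpos($description, "(") !== 0) {
        return null;
    }
    $end_paren = strpos($description, ")");
    if ($end_paren === false) {
        return null; // no closing parenthesis: treat as not found
    }
    return substr($description, 1, $end_paren - 1);
}

// Made-up sample descriptions, not taken from the glossary itself.
print extract_spelt_out("(Hypertext Transfer Protocol) Sample definition text.");
// prints: Hypertext Transfer Protocol

// The pitfall: with two equals signs, 'not found' looks like 'position 0'.
var_dump(strpos("A plain word definition", "(") == 0);  // bool(true)
var_dump(strpos("A plain word definition", "(") === 0); // bool(false)
```

Returning null for the 'not found' case keeps it distinct from an empty string, which would be the result for a description beginning with an empty pair of parentheses.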

posted by Phil 2:31 PM (GMT) | comments | link

Archives update

The cause of this problem is now closer to being identified, although it is not yet fully resolved. I had another repeat last week, this time when publishing a new weblog entry, and Blogger's Steve Jenson was able to find the corresponding records in the publishing logs. He reported back: "Your first publish took 2,560 seconds, or 43 minutes, to finally time out due to a network failure (which no one else experienced so I can only assume it happened on your hosting provider's side), and your other two publishes, which start before this faulty one has finished, each took around 3 seconds and were successful."

Separately, I've noticed when making DOS-FTP uploads direct to the server that it occasionally freezes for no apparent reason. On one of these occasions this week I ended the session and opened a new session. Lo and behold, the file I had been attempting to upload a new version of was sitting there as a 0k directory entry. So I think Steve is right; there's a glitch on my hosting provider's side that intermittently puts FTP transfers into limbo for extended periods, sometimes during a write operation, resulting in an empty file.

I'll report it to the provider's helpdesk and see if they can make any progress on sorting it out, but in the meantime this new evidence about what's happening is helping me to reduce the number of times the problem occurs. Now if I see a "transferring files" message in Blogger's editing console I know that it means the process has temporarily gone into limbo, and instead of panicking and immediately attempting to republish again, I just monitor the situation until I'm sure the process has completed. Meanwhile, if I want to update the archive template, I do it on my alternative server and copy the files across rather than risking it on the affected server.

This has cut down on the number of problems, but it is more of a pain than it should be, so I'm hoping to reach a permanent resolution soon. Steve says there will be a new version of Blogger within days, and that it will be much easier to debug publishing problems with the new software. In the meantime, I'm well impressed by Steve's commitment to getting to the bottom of this, and by the effort he has put in.

posted by Phil 4:32 AM (GMT) | comments | link

Building a website using plug-in online services: the Loosely Coupled experience

Copyright © 2002-2006, Procullux Media Ltd. All Rights Reserved.