

Friday, December 20, 2002

Hiding your email address from harvesters

A huge problem for anyone operating a website is the harvesting of email addresses by spammers. I get at least sixty spam emails a day, mainly because my email address has been published openly for the past few years under every article I've written on ASPnews.com. Harvesters are automated programs that scan web pages looking for email addresses, which are then sold by the million to would-be spammers. If you publish an email address on your website, you can more or less guarantee it will get harvested, and the more popular your site is, the more frequently your addresses will get harvested.

Fortunately, there are ways of masking emails that will keep most harvesters from finding them. Although in principle harvesters could find their way around the masking, in practice most harvester authors don't bother to go the extra mile to add that capability to their programs. As long as plenty of webmasters still publish emails in plain text, it's not worth the extra effort to pick up a few extra masked emails.
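To see why even simple masking helps, here is a rough sketch of the kind of pattern matching a basic harvester might do. The regular expression, the function name `harvest`, and both sample pages are illustrative assumptions, not taken from any real harvesting program.

```javascript
// Hypothetical pattern a simple harvester might use to spot addresses
// in raw page source. Real harvesters may well be smarter than this.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Returns every substring of the page source that looks like an address.
function harvest(pageSource) {
  return pageSource.match(EMAIL_PATTERN) || [];
}

// A page that publishes the address in plain text (placeholder address).
const plainPage =
  '<p>Contact: <a href="mailto:someone@example.com">someone@example.com</a></p>';

// A page where script assembles the same address from separate pieces.
const maskedPage =
  "<script>document.write('someone'); document.write('@'); " +
  "document.write('example.com');</script>";

console.log(harvest(plainPage));  // finds the address in both the link and the text
console.log(harvest(maskedPage)); // finds nothing, because no complete address appears
```

A harvester that actually executed the script would still recover the address, which is the "extra mile" most of them apparently don't go.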

I was thinking about this a few days ago because I would like to start publishing contact emails on this site, but have held back because of the harvester problem. Via Google, I came across a useful article that succinctly sets out the various methods available for masking emails. My preferred method is to use JavaScript to write the address into the page, so that it is broken up in the page source and most harvesters can't read it. A variation on this is to put the JavaScript in a separate file, which is then called from the page using an embedded line of script like this:

<script type="text/javascript" 
src="http://www.looselycoupled.com/img/pwemail.js">
</script>

I like this idea, because it means I can change the email address at any time by altering just the single JavaScript file, whose contents would look (for example) like this:

document.write('pw');
document.write('@');
document.write('philwainewright.com');

The only catch with using JavaScript is that the address isn't visible to visitors who have JavaScript turned off in their browsers, but this site doesn't really cater for such visitors anyway, as various elements are delivered using JavaScript. There's also a slight overhead in page-loading time when you use a separate file, since the browser has to fetch it, so I wouldn't recommend this method for pages that you want to load fast, such as the home page and other popular entry points.
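If you'd rather have a clickable link than a bare address, the same idea can be sketched as a small helper. The function name `buildMailtoLink` and the placeholder address are assumptions for illustration; this is one possible variation, not the script actually used on this site.

```javascript
// Hypothetical helper: builds a mailto link from separate pieces so the
// complete address never appears as a single string in the page source.
function buildMailtoLink(user, domain) {
  const address = user + '@' + domain;
  return '<a href="mailto:' + address + '">' + address + '</a>';
}

// In a browser, write the link into the page at the point where the
// script runs. The guard lets the sketch run outside a browser too.
if (typeof document !== 'undefined') {
  document.write(buildMailtoLink('someone', 'example.com'));
}
```

As with the separate-file version, the address only needs changing in one place: the arguments passed to the helper.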

All of these masking techniques work fine for HTML pages, but one gaping hole in my defensive armory that I will have to leave wide open is the site's RSS feeds. These necessarily include contact information, and XML parsers tend to have difficulties with encoded characters, so it's not really realistic to do anything other than leave those email addresses as easily harvested plain text. A second line of defense is therefore necessary: spam filtering at the mail server, where emails are received. I'm investigating some alternatives for adding this capability, and will report back on my findings in a later posting.
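For reference, the character-encoding style of masking might look something like the sketch below. It assumes that "encoded characters" means numeric character references, which is a guess, since the technique isn't named above; the function name `encodeAsCharacterReferences` is likewise hypothetical.

```javascript
// Hypothetical helper: replaces each character of the input with its
// numeric character reference, e.g. "@" becomes "&#64;".
function encodeAsCharacterReferences(text) {
  return Array.from(text, (ch) => '&#' + ch.codePointAt(0) + ';').join('');
}

console.log(encodeAsCharacterReferences('a@b.c'));
// prints "&#97;&#64;&#98;&#46;&#99;"
```

A browser renders the encoded string as the original text in an HTML page, which is why this works there; whether a given feed reader copes with it is a separate question.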

posted by Phil 1:03 PM (GMT) | comments | link




 
 


Copyright © 2002-2006, Procullux Media Ltd. All Rights Reserved.