Throughout this article, rather than using my real email addresses and revealing exactly what my new ones are, I'll be using equivalent substitutes. The example domain and subdomains are based on the RFC 2606-friendly "example.com", and hopefully at least a few of you out there will find this information useful in your own personal battle against spam.
The Problem Outlined
About a year ago, I began getting deluged with spam: not just the usual one-offs from small-time spammers who'd harvested my email from Usenet, or from the various popular websites I visit and post on, but a real, honest-to-goodness flood. Hundreds a day. Foolishly, I believed in the superiority of geek technology and began implementing a powerful set of filters: a combination of spamoracle and bogofilter, where I'd run incoming mail through both, let them mark up the email, and then sort the results into their own bin. I could later peruse that bin to make sure I hadn't thrown away any real email in my rabid efforts to clean my inbox.
It seemed to work well for a while, but occasionally I'd get an important email from my boss, or a friend or relative, and my filters--not recognising the new non-technical words--would choke and throw away something that would otherwise have been vitally important to my daily life. I thought this was simply the price of being on the Internet, and my vanity domain allowed me to use the simple "email@example.com" as my primary email: it was cool.
The turning point when I realised filters were useless came when they tagged an email from an old friend and he, not realising the volatile nature of email, became insulted when he thought I was simply making a conscious choice to ignore him. I had missed his email in the tidal wave of endless spam I was being subjected to, and accidentally tossed it in the trash after too-hastily combing my spambin.
The Problem Defined
My folly was not my filters. My folly lay in the fact that I was approaching it from the wrong angle. What required finesse and discipline was being approached by hackers like Eric Raymond in terms of heavy artillery filtration mechanisms and acceptable collateral damage in order to account for simple carelessness.
It seems to me their outlook directs them to express themselves in terms of what they feel is a software solution. I've come to the tentative conclusion that this approach is a waste of time--a waste of time training the filters, a waste of time feeding them, and a waste of time keeping the software up-to-date.
Although filters often seem remarkably accurate, there will always be mistakes because of one simple fact: the filters are not humans and can't readily adapt to new, unforeseen emails. They can't comprehend the subtleties of tone, familiarity, or intent. Besides, if you're manually checking your spambin to verify the accuracy of your filters, why use filters at all?
It turns out that simple catch-all domains, where firstname.lastname@example.org and email@example.com are both delivered to me without any initial set-up, are now detected as such by spammers and used to provide another level of misdirection when spamming others. If a remote machine checks the forged sender randomName@example.com and finds it listed as valid, that's one less defense for that poor victim server. This is not a friendly way to host a mail server on the Internet.
Instead, all exposure and vectors via which spammers actually obtain my email must be controlled and carefully monitored:
- Each and every email address I give to a website or forum must be unique and completely traceable in a human-friendly way. Example: firstname.lastname@example.org
- Every Usenet post must contain enough information for a human to unmangle it and write to it. That, or all Usenet posts are simply from a fake email entirely. Example: usenet099No@spammersexample.com
- Mailing lists which accept email from non-subscribers should be posted to using a different email than the one receiving list mail. Example: email@example.com is never revealed, and firstname.lastname@example.org is used for sending to the list but doesn't itself accept incoming emails.
- Mailing lists which don't accept mail from non-subscribers usually have a way of subscribing but then indicating that you don't want to receive any list-related email. Once you've subscribed, select those options, shut the alias down to external email, and use it to post to the mailing list. Don't forget to put a friendly bounce-message at the end of it so other list members who hit Reply-All don't get the wrong impression.
- Any email addresses posted to actual webpages are either in the form of an obscured image, or a custom email such as email@example.com which contains the harvesting IP address of the bot in question. Unfortunately this only works where a program snippet (for example PHP) can be embedded to generate the emails on a per-visitor basis, but it does provide another good datapoint in tracking and reporting the harvesters themselves.
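The per-visitor address idea from the last point can be sketched in a few lines. This is a minimal illustration in Python rather than PHP, and the "web-" prefix, the function names, and the dash-separated encoding are all assumptions made for the example, not the scheme I actually use.

```python
import ipaddress


def harvester_trap_address(visitor_ip: str, domain: str = "example.com") -> str:
    """Return an address whose local part encodes the visitor's IPv4 address.

    The "web-" prefix and the dash-separated layout are hypothetical choices.
    """
    ip = ipaddress.IPv4Address(visitor_ip)  # raises ValueError on bad input
    local = "web-" + "-".join(str(octet) for octet in ip.packed)
    return f"{local}@{domain}"


def ip_from_trap_address(address: str) -> str:
    """Recover the dotted-quad IP from an address built by the function above."""
    local = address.split("@", 1)[0]
    octets = local[len("web-"):].split("-")
    return ".".join(octets)


print(harvester_trap_address("192.0.2.17"))
```

When spam later arrives at such an address, ip_from_trap_address gives the IP that fetched the page, which is the datapoint worth reporting.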
I've long been a fan of the concept of Information Warfare. One of the most approachable treatments of it is in Neal Stephenson's Cryptonomicon which I managed to finally slog through after I realised it could be applied to my everyday life.
The gist of my point is that every tiny scrap of information directly about you, or which can be used to infer conclusions about you, can be manipulated in a way that puts you at an advantage over your opponent. The main topic of the book is actually wartime cryptography, but Neal portrays ways in which even the slightest pattern (for example, a secretary looking at the lottery balls she's pulling out of a box for use in one-time pads, thus biasing her choices) can in turn be exploited and subverted by a determined opponent. This must therefore be accounted for.
Spam-Free Technical Details
Actually implementing this was a real bitch. Simply put, though, here are the steps I had to take to make my inbox completely spam-free, followed by a list of the software I'm using.
- Built a list of every email address that ever wrote to firstname.lastname@example.org, categorised into friends, relatives, mailing lists that accept non-subscribed emails, mailing lists that don't, commercial subscribed websites, and commercial unsubscribed websites.
- Built a special, private email alias for close, savvy friends and family and notified them the old email was disappearing.
- Built a special, private email alias for acquaintances and notified them also.
- Changed my email to specific-use aliases for all websites, forums, and mailing lists.
- After monitoring the old email for three months for incoming non-spam (and rectifying any lingering uncaught problems) it was finally shut down, and an entry like the following was placed in my virtusertable: "email@example.com error:nouser Please interpret the image at www.example.com to write to me. Too much spam has forced this email closed. Sorry!"
Unfortunately this technique requires that you keep a complete archive of every email you ever received, which I do, but most people won't. Those of you that don't will find this process more painful than I did.
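For the first step, building the list of everyone who ever wrote to the old address, something like the following could do the counting. It is only a sketch: collect_senders is a hypothetical helper, it looks at the From header only, and sorting the addresses into friends, relatives, lists, and websites is still a manual job.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def collect_senders(messages):
    """Count how often each address appears in the From header of the messages."""
    counts = Counter()
    for msg in messages:
        for _name, addr in getaddresses(msg.get_all("From", [])):
            if addr:
                counts[addr.lower()] += 1
    return counts


# Tiny in-memory sample standing in for a real archive.
sample = [
    message_from_string("From: Alice <Alice@Example.org>\nSubject: hi\n\nbody\n"),
    message_from_string("From: alice@example.org\n\nbody\n"),
    message_from_string("From: list@lists.example.net\n\nbody\n"),
]
for address, count in collect_senders(sample).most_common():
    print(count, address)
```

The same function should accept a standard-library mailbox.mbox object opened on an archive file, since iterating over one yields message objects.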
- sendmail 8.12.x: The latest sendmail has some nice virtusertable functionality. I have about 280 actual aliases and I use subdomains rabidly.
- KMail: KMail from the KDE project has some nice filtering capabilities that allow it to integrate seamlessly with these many email addresses and folders. Plus, it's never crashed on me in such a way that I lost any email.
- bogofilter: It's still a nice filter, and it's tunable to the n'th degree. It's still a filter, though, so its use is lessened considerably now that I receive no spam.
- spamoracle: A nice, OCaml-based implementation of a filter too.
If sendmail virtusertable could be programmed to do a regex on incoming emails I could create better one-off aliases, but until then I'll just use throwaway subdomains and some supporting scripts to deal with new addresses.
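To show the kind of regex-driven one-off alias I have in mind, here is a rough sketch of the check done outside sendmail. The address layout (a site tag, a dash, and a six-digit YYYYMM expiry) is an assumption invented for this example, as are the names ONE_OFF and accept_one_off.

```python
import re

# Hypothetical layout: <site>-<YYYYMM>@example.com, valid through that month.
ONE_OFF = re.compile(r"^(?P<site>[a-z0-9]+)-(?P<expires>\d{6})@example\.com$")


def accept_one_off(address: str, today_yyyymm: str) -> bool:
    """Accept the address only if it matches the layout and has not expired."""
    match = ONE_OFF.match(address.lower())
    if match is None:
        return False
    # Six-digit YYYYMM strings compare correctly as plain strings.
    return match.group("expires") >= today_yyyymm


print(accept_one_off("forum-200312@example.com", "200306"))
```

With a check like this, a new alias needs no table entry at all: handing out forum-200312@example.com is enough, and it stops working by itself once the month has passed.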
Other Spam Elimination Systems
There are actually other systems out there that can help with your spam problem. Tagged Message Delivery Agent (TMDA), for example, allows you to place the onus of authentication on the sender by using such techniques as challenge-response and white- and blacklists. In challenge-response, people who are unknown to you must visit a webpage and prove they're human before their email gets through.
The problem with these kinds of systems is that for most people, especially non-savvy Internet users such as grandmothers and those who simply can't afford (or are unable) to spend the time to jump through your hoops, you are effectively unreachable. Just imagine if everyone who phoned you had to answer a timed, skill-testing question!
DNS-based blacklists can also be very effective, but the base cost is the latency of a DNS lookup against multiple external servers. Also, you're placing your spam solution in the hands of someone else who might someday disappear, much like how ORBZ shut down in the face of mounting lawsuit pressure.
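To make that lookup cost concrete: a DNS-based blacklist is queried by reversing the octets of the connecting IP, appending the list's zone, and asking DNS for an address record. The sketch below assumes IPv4 only, and the zone "dnsbl.example.org" is a placeholder rather than a real list.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the name to look up: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the blacklist zone answers for this IP.

    Any resolver failure, including a timeout, is treated as "not listed".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False


print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

Each list consulted means one more call to is_listed per incoming connection, which is where the latency adds up.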
The average cost of a refused-email message on my own server is actually very minimal, because the only servers that continue to try to email me at defunct aliases are ones that usually drop the connection instantly the moment a "nouser" error message pops up. That's only a few hundred bytes or so.
Those who are sending legitimate email are told in the bounce message how to contact me by interpreting an image, along with a website they can visit.
Automation is easily accomplished with some supporting scripts to manage the email aliases and subsequently rebuild a sendmail virtusertable, and also to present a nice user-friendly interface that someone who's in a rush can then bookmark and return to often.
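A supporting script of this kind can be quite small. The following sketch regenerates the text of a virtusertable from two dictionaries; the function name, the sample addresses, and the local user "me" are hypothetical, and the generated file would still have to be compiled into sendmail's map format afterwards (typically with makemap).

```python
def render_virtusertable(active: dict, closed: dict) -> str:
    """Render virtusertable source text.

    active maps an address to the local user or address that receives it;
    closed maps a shut-down address to the bounce text shown to senders.
    """
    lines = []
    for address in sorted(active):
        lines.append(f"{address}\t{active[address]}")
    for address in sorted(closed):
        lines.append(f"{address}\terror:nouser {closed[address]}")
    return "\n".join(lines) + "\n"


table = render_virtusertable(
    {"forum@web.example.com": "me"},
    {"old@example.com": "Too much spam has forced this email closed. Sorry!"},
)
print(table, end="")
```

Keeping the aliases in a data structure rather than editing the table by hand is also what makes a web front end practical: the interface only has to add or move an entry and re-run the renderer.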
Attacks against this method will only show up after a large segment of the people spammers wish to reach begin using extensive aliases in a similar pattern. Possible attacks include dictionary attacks where the spammer guesses what websites I'm active on, and the possibility that multiple opponents (read: sites) will collaborate, collate, and interpret the fact that I've given them email addresses with a discernible pattern.
At that point, we can begin moving up to signed-hash one-offs and beyond. Hopefully at that point the spammers will realise that people going to such lengths aren't worth the effort to spam after all.
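As one possible reading of "signed-hash one-offs", an alias can carry a short keyed hash of its own tag, so that a guessed or pattern-derived address fails verification. This is a sketch under assumptions: the use of HMAC-SHA256, the eight-hex-digit truncation, and the tag-dash-signature layout are illustrative choices, not a settled design.

```python
import hashlib
import hmac


def _signature(tag: str, secret: bytes, digits: int) -> str:
    """Truncated hex HMAC-SHA256 of the tag (truncation length is an assumption)."""
    return hmac.new(secret, tag.encode("utf-8"), hashlib.sha256).hexdigest()[:digits]


def signed_alias(tag: str, secret: bytes, domain: str = "example.com", digits: int = 8) -> str:
    """Build <tag>-<signature>@<domain> for a given purpose tag."""
    tag = tag.lower()
    return f"{tag}-{_signature(tag, secret, digits)}@{domain}"


def verify_alias(address: str, secret: bytes, digits: int = 8) -> bool:
    """Check that the signature in the local part matches its tag."""
    local = address.split("@", 1)[0].lower()
    tag, sep, sig = local.rpartition("-")
    if not sep or not tag:
        return False
    expected = _signature(tag, secret, digits)
    return hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8"))


alias = signed_alias("forum", b"not-my-real-secret")
print(alias, verify_alias(alias, b"not-my-real-secret"))
```

A site that knows one of my aliases can no longer derive another, because producing a valid signature for a new tag requires the secret.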