The beautiful theory of the early 1980s
First, let's discuss how spam thrives. The de facto email protocol, SMTP, has been around since the early 1980s. It was designed when Internet users were very trusting of others, and the very idea of spam probably hadn't occurred to anyone.
Here is the lifetime of a typical email message before the rise of anti-spam solutions, and how the designers of SMTP envisioned it:
- Using a mail client such as Mozilla Thunderbird, Kristy writes me a message and clicks 'Send'.
- Her mail client connects to her ISP's (say, Comcast's) mail server, tells it from whom the message is and where it's going, and gives it the message itself.
- Comcast's server accepts the message and leaves a mark on it, stating that this message was received from Kristy's computer by Comcast's systems at X o'clock.
- Comcast's server looks at where this message needs to go, and makes a connection to that host. For example, if the destination is firstname.lastname@example.org, it connects to mail.qnan.org because that's where I configured mail destined for qnan.org to go.
- My server accepts the message from Comcast and leaves another mark on it, stating that this message was received from Comcast by qnan.org at Y o'clock. My server then delivers the message to my mailbox.
- I use a mail client to access my mailbox and read the message. I can also see the complete path of this message, from Kristy all the way to me, which theoretically allows me to complain to Comcast if Kristy is spamming me.
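The "marks" in the steps above are Received headers, and each server adds its own on top of the ones already there. Here is a minimal Python sketch of reading the path back out of them. The host names, addresses, and the simplified "from X by Y" header layout are made up for illustration; real Received headers carry more detail and vary between servers.

```python
from email import message_from_string

# Hypothetical message as it sits in my mailbox; the newest Received header is on top.
RAW = (
    "Received: from mail.comcast.example by mail.qnan.example; Tue, 01 Jan 2008 10:00:05 -0600\n"
    "Received: from kristy-pc.example by mail.comcast.example; Tue, 01 Jan 2008 10:00:01 -0600\n"
    "From: kristy@comcast.example\n"
    "To: me@qnan.example\n"
    "Subject: hello\n"
    "\n"
    "Hi!\n"
)

def delivery_path(raw_message):
    """Return the (sent_from, received_by) hops in the order they happened."""
    msg = message_from_string(raw_message)
    hops = msg.get_all("Received", [])
    path = []
    for hop in reversed(hops):          # oldest hop is at the bottom
        route = hop.split(";")[0]       # drop the timestamp part
        words = route.split()           # assumes the simplified "from X by Y" layout
        path.append((words[1], words[3]))
    return path

for sent_from, received_by in delivery_path(RAW):
    print(sent_from, "->", received_by)
```

Keep in mind that only the headers added by servers you trust are reliable; anything below them could have been written by the sender.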
Like a cheap hooker, this setup is easy-peasy and bursting with opportunities to exploit it. It has also been a virile vector for viruses and undesired hints that you don't measure up and require penile enhancements. Let's see why.
Exploitations; a.k.a. The End of Innocence
Today, the biggest problem is "zombie" computers that have been hijacked by trojans, viruses, or other badness to do various nefarious tasks without the owners' knowledge. A very popular nefarious task is--surprise!--spamming. On a regular DSL connection, a regular PC can attempt to deliver up to 10,000 messages per minute.[source] The zombie machine goes down its list of addresses, tries to connect to the mail server associated with the next address, and if it connects, it delivers the message. If it doesn't connect, it simply moves on to the next address on the list.
Another big problem has been open relays. These are well-intentioned mail servers that do not check whether a user is authorized to use the server--they just blindly accept messages from anyone, which allows Pavel--another Comcast user--to dump his email message in step 2 on AOL's mail server instead of on Comcast's. This might not be a big deal for normal users, but spammers, for whom 10,000 messages per minute is the norm, can seriously starve the resources of mail servers that were not configured for such aggressive behavior (a Denial of Service).
Worse, let's say that AOL sees this behavior and blocks Pavel after he has sent 100,000 messages to random Internet users. No problem; he simply and instantly switches to spamming through Southwestern Bell's mailservers.
China has been historically bad when it comes to open relays: it is a safe haven for spammers due to an enormous number of misconfigured servers that are set up by amateurs and never fixed. Many system administrators in the U.S. have even claimed support for entirely blocking mail from China until the Chinese fix their problems.
One reason that open relays are worse than zombie computers is that they follow mail standards: they insist on trying to deliver mail, including spam, for up to 5 days per message.
And of course, Pavel wouldn't want to be caught spamming from email@example.com or whatever his real email address is--that's too traceable. Instead, he'll cleverly use "From: Bob Guccione <firstname.lastname@example.org>" since that will likely draw more buyers for penile enhancements. And guess what? Until recently, mail servers were happy to oblige with this obviously fake header.
Or like email viruses of the last few years, he could pick random people that know each other, and email Person X supposedly from Person Y to instantly gain attention and trust. The mail protocol is happy to comply.
So, what has the Internet intelligentsia come up with to combat this decade-long problem? I will list only the solutions that I've implemented on my server, but this list sufficiently covers the breadth of all of them.
First, the number of open relays (not zombies, but well-intentioned yet misconfigured mail servers) is shrinking, thanks to greater attention to security from both software programmers and administrators. Now, if Chris is traveling, staying in a hotel room, and wants to send a message through his work's mail server, it is no longer sufficient simply to throw the message at any random mail server--all well-behaved mail servers now require that users who are not explicitly within their network (as an ISP's own customers are) authenticate before their messages are accepted. The most common method of authentication is SASL, which requires a username and password pair for every authorized user. The less common method, but the one I use, is TLS with client certificates: each authorized user is issued an identity certificate that allows Chris to prove his identity to the mail server.
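The relay decision described above boils down to three questions. Here is a rough Python sketch of it; the network range, the domain name, and the function name are all hypothetical, and a real mail server expresses this in its own configuration rather than in code like this.

```python
import ipaddress

# Hypothetical values for illustration only.
MY_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]   # machines inside our own network
LOCAL_DOMAINS = {"qnan.example"}                        # domains we accept mail for

def may_accept(client_ip, authenticated, recipient):
    """Decide whether to accept a message from this client for this recipient."""
    domain = recipient.rsplit("@", 1)[-1].lower()
    if domain in LOCAL_DOMAINS:
        return True                     # final delivery to our own user, not relaying
    if authenticated:
        return True                     # the client proved who it is (SASL or TLS certificate)
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in MY_NETWORKS)   # otherwise relay only for our own network

# A stranger asking us to pass mail along to a third party is turned away.
print(may_accept("203.0.113.9", False, "someone@aol.example"))
```

An open relay is simply a server where the last two checks are missing and the answer is always yes.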
Now, onto the actual message. The following solutions are ordered by the order in which they are triggered as a new message arrives.
The first line of defense, and the oldest yet still very effective solution, is the use of Realtime Blackhole Lists, which are actively maintained lists of IP addresses that are known to send spam. When a message arrives, the receiving mail server queries the list: "Do you have any dirt on the sending IP address?" If the list was made aware of recent malicious activity, the mail server will reject the message. If Maegan's machine acquires a virus and becomes a zombie for a spammer, it's entirely possible that just one hour later many mail servers will begin rejecting messages from her computer as well as from all other computers that share her IP address. This block may be in effect for as long as 6 months! For more information about Blackhole Lists, visit Spamhaus which "tracks the Internet's Spammers, Spam Gangs and Spam Services, provides dependable realtime anti-spam protection for Internet networks".
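Under the hood, the question is asked through DNS: the octets of the sending IP address are reversed, the list's zone is appended, and the resulting name is looked up. If the name resolves, the address is listed. A minimal Python sketch, IPv4 only, where the zone "dnsbl.example" is a placeholder and not a real list:

```python
import socket

def dnsbl_query_name(ip, zone="dnsbl.example"):
    """Build the DNS name to look up: reversed octets followed by the list's zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone

def is_listed(ip, zone="dnsbl.example"):
    """Return True if the list has an entry for this IP (needs a real zone and network access)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True                     # any answer means the list has dirt on this address
    except socket.gaierror:
        return False                    # no such name, or the lookup failed: treat as not listed

print(dnsbl_query_name("192.0.2.99"))
```

Each list publishes its own zone name and usage terms, so check those before pointing a busy server at one.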
Next, the mail server may check what's called the Sender Policy Framework (SPF), which allows mail server administrators to specify exactly which IP addresses are allowed to send mail from a certain domain name. In the example above, Pavel pretended that his spam came from penthouse.com. If the administrator of that domain name specified that only the IP address 126.96.36.199 can claim to be from that domain name, Pavel's message would be rejected. This can be very effective, but it is not yet configured for many domain names. Eventually (and theoretically), Pavel will ONLY be able to specify a Comcast email account as his "From" address, as all other domain names will refuse to honor his IP address.
A system of a similar ilk, developed by Yahoo within the last few years, is called DomainKeys. It relies on the tried-and-true concepts of public/private key cryptography, and has strong similarities to PGP. The idea is that every outgoing message is signed, using a private key, by the domain from which the message claims to come. The receiving system then retrieves that domain's public key and checks whether the signature verifies. If it does not, or if the signature is missing entirely, the receiver cannot be sure that the message was sent from the domain it claims, since only the authorized mail server of that domain has the private key needed to sign the message.
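The policy itself is a short text record published in the domain's DNS. Here is a deliberately tiny Python sketch of evaluating one; it only understands the "ip4:" and "-all" terms, the addresses are documentation placeholders, and the function name is my own. Real SPF has more mechanisms (such as "a", "mx", and "include") that require further DNS lookups.

```python
import ipaddress

def spf_check(record, client_ip):
    """Evaluate a simplified SPF record against the connecting IP address."""
    terms = record.split()
    if not terms or terms[0] != "v=spf1":
        return "none"                   # no usable policy published
    addr = ipaddress.ip_address(client_ip)
    for term in terms[1:]:
        if term.startswith("ip4:") and addr in ipaddress.ip_network(term[4:], strict=False):
            return "pass"               # this address is allowed to send for the domain
        if term == "-all":
            return "fail"               # everyone not matched earlier is refused
    return "neutral"                    # the policy says nothing about this address

record = "v=spf1 ip4:192.0.2.25 -all"   # hypothetical policy: one allowed sender
print(spf_check(record, "192.0.2.25"), spf_check(record, "203.0.113.7"))
```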
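To show the sign-then-verify idea, here is a toy Python sketch using textbook RSA on a message hash. This is NOT how DomainKeys is actually implemented and it is not secure: the key is far too small, there is no padding, and the names are my own. The real system publishes the public key in the domain's DNS and signs selected headers together with the body.

```python
import hashlib

# Toy key built from two well-known Mersenne primes; far too small for real use.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q                               # public modulus
E = 65537                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))       # private exponent (needs Python 3.8+)

def digest(message: bytes) -> int:
    """Hash the message and reduce it below the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    """Done by the sending domain's mail server, which alone holds D."""
    return pow(digest(message), D, N)

def verify(message: bytes, signature: int) -> bool:
    """Done by the receiver, using only the public values N and E."""
    return pow(signature, E, N) == digest(message)

sig = sign(b"From: kristy@comcast.example\n\nHi!")
print(verify(b"From: kristy@comcast.example\n\nHi!", sig))
print(verify(b"From: pavel@comcast.example\n\nHi!", sig))
```

The second check fails because the signature was made over a different message, which is exactly what happens when someone forges mail without the domain's private key.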
Sender Policy Framework and DomainKeys are systems that focus less on the question of whether a message is spam or not and more on whether the sender can be held accountable for the message, no matter what it is. Imagine--if you had to leave a piece of paper with your home address each time you vandalized a building with graffiti, it might greatly cut down on your desire to vandalize.
If the message passes all the checks above, we can be sure that our incoming message is not being sent by a known zombie machine, and that it is addressed truthfully. Now it's time to see whether the sender is a zombie machine that is not yet known, or a well-behaved mail server.
One of the most recent concepts which is very rapidly becoming popular across the Internet is called greylisting. This solution takes advantage of the fact that the SMTP standard was designed with the ability to defer a message in case the receiving mail server is down, overloaded, or temporarily misconfigured. This causes the sending mail server to keep it on its system just a little while longer while waiting for the receiving server to heal itself. Again, a well-configured and well-intentioned server may keep a message for up to 5 days while waiting for the next hop to become available. Greylisting takes advantage of this by playing hard-to-get for unverified senders, mimicking a temporary disability, and watching whether the sending server will retry or give up. If the server gives up, it's safe to assume that it was a zombie machine. Most well-behaving servers try again within 15 minutes.
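The bookkeeping behind greylisting is small. Here is a minimal Python sketch, assuming the server remembers each (sending IP, sender, recipient) combination and a 5-minute minimum wait; the class name and the delay are my own choices for illustration, and real implementations also expire old entries and whitelist known-good senders.

```python
class Greylist:
    """Defer unknown senders once, accept them when they retry after a delay."""

    def __init__(self, min_delay=300):
        self.min_delay = min_delay      # seconds a new sender must wait before retrying
        self.first_seen = {}            # (ip, sender, recipient) -> time of first attempt

    def check(self, client_ip, sender, recipient, now):
        key = (client_ip, sender, recipient)
        first = self.first_seen.setdefault(key, now)
        if now - first < self.min_delay:
            return "451 try again later"    # temporary failure: a real server will retry
        return "250 ok"                     # it came back after the delay, so let it in

g = Greylist()
print(g.check("192.0.2.25", "kristy@comcast.example", "me@qnan.example", now=0))
print(g.check("192.0.2.25", "kristy@comcast.example", "me@qnan.example", now=900))
```

A zombie that never retries never gets past the first answer.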
Now we know that the incoming message is not sent by a zombie, and it is addressed truthfully. We can now accept it for delivery. But, is it well-transmitted garbage?
The last, most computationally expensive, and arguably the most effective hoop through which a prospective 'ham' must jump before being left alone in the user's inbox is the merciless content inspection by SpamAssassin. It is a set of tests that check the message for all sorts of tell-tale signs of being spam. For example, do you mention "rolex"? Do you use tricks such as "v1 a g r4"? Does your email address end in numbers? Do you use poorly-formed HTML? Do you use large fonts or poor contrast? Each of these increases the probability of your message being spam. Once a certain threshold is reached, your message is kicked down the drain.
To decrease the chance that a spammy-looking message gets deleted by SpamAssassin, a sender can use a system called HashCash, which works like postage stamps: it adds a "cost" to sending a message, except this cost is measured in energy and time. It requires that the sender spend at least several seconds calculating something whose correctness takes only milliseconds to verify. This has the capability to reduce a spammer's output from 10,000+ messages per minute to maybe only 30 that have a chance of passing SpamAssassin.
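The scoring idea can be sketched in a few lines of Python. The patterns, point values, and threshold below are invented for illustration and are not SpamAssassin's actual rules, which are far more numerous and carefully tuned.

```python
import re

# (pattern, points, description): all hypothetical.
RULES = [
    (re.compile(r"rolex", re.I), 2.5, "mentions rolex"),
    (re.compile(r"v\s*[i1]\s*a\s*g\s*r\s*[a4]", re.I), 3.0, "obfuscated drug name"),
    (re.compile(r"^From:.*\d+@", re.I | re.M), 1.0, "sender address ends in digits"),
    (re.compile(r"<font[^>]*size=[\"']?[5-7]", re.I), 1.5, "large font"),
]
THRESHOLD = 5.0

def score(message):
    """Return the total points and the list of rules that fired."""
    hits = [(name, points) for pattern, points, name in RULES if pattern.search(message)]
    return sum(points for _, points in hits), hits

def is_spam(message):
    return score(message)[0] >= THRESHOLD

spam = "From: deals4829@example.com\nSubject: Rolex and v1 a g r4\n\nBuy now"
ham = "From: kristy@example.com\nSubject: lunch?\n\nSee you at noon."
print(is_spam(spam), is_spam(ham))
```

No single rule condemns a message; it takes several signs together to cross the threshold.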
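The "expensive to make, cheap to check" trick is a hash puzzle: the sender keeps trying counters until the SHA-1 hash of the stamp starts with enough zero bits. Here is a simplified Python sketch; the stamp layout "address:counter" and the function names are my own, and real HashCash stamps also carry a version, a date, and random data.

```python
import hashlib
from itertools import count

def leading_zero_bits(data: bytes) -> int:
    """Count how many zero bits the hash starts with."""
    value = int.from_bytes(data, "big")
    return len(data) * 8 - value.bit_length()

def mint(resource: str, bits: int = 16) -> str:
    """Sender's side: brute-force a counter until the hash has enough leading zero bits."""
    for counter in count():
        stamp = f"{resource}:{counter}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp

def valid(stamp: str, resource: str, bits: int = 16) -> bool:
    """Receiver's side: a single hash is enough to check the work."""
    if not stamp.startswith(resource + ":"):
        return False                    # stamp was minted for someone else
    return leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits

stamp = mint("me@qnan.example", bits=12)    # small difficulty so the demo runs quickly
print(stamp, valid(stamp, "me@qnan.example", bits=12))
```

Every extra bit doubles the average work for the sender while the receiver's check stays at one hash.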
Now realize: some or all of these advanced systems are working across the Internet for every message you send and receive.
What can I do to aid the battle?
If you control a mail server, the best thing you can do for yourself and your users is to implement some or all of these systems. The technology is there. It is strikingly effective. It has very few downsides. The ball is in your court.
If you own a domain name, then regardless of whether you run mail or not, configure a Sender Policy Framework! This will prevent spammers from getting away with spoofing their mails with your domain name.
If you are a user... are you still getting spam? Is it more than just a few messages per day? Contact your mail provider and demand (or ask politely, if your mail is free) that they implement some or all of these systems.
Spammers are slowly adapting to new weapons, but I am certain that the hive of bright minds against spam will continue outpacing them.
(Note: this is a slightly-modified version of my recent LiveJournal post.)