February 20, 2003

Get Thee Behind Me

Recently I've been getting more and more spam. Unwanted, unsolicited email has begun to clog my mailbox with advertisements for lasting longer, growing bigger, getting out of debt, or getting into debt (via mortgages). It has gotten bad enough that I've even thought about closing the account and opening a new one. Before it comes to that, though, I decided to see what kind of anti-spam software is out there.

One of the better-known ones is SpamAssassin. Be aware that there are only so many automated ways to identify spam. SpamAssassin uses:

  • header analysis: spammers use a number of tricks to mask their identities, fool you into thinking they've sent a valid mail, or fool you into thinking you must have subscribed at some stage. SpamAssassin tries to spot these.

  • text analysis: again, spam mails often have a characteristic style (to put it politely), and some characteristic disclaimers and CYA text. SpamAssassin can spot these, too.

  • blacklists: SpamAssassin supports many useful existing blacklists, such as mail-abuse.org, ordb.org or others.

  • Razor: Vipul's Razor is a collaborative spam-tracking database, which works by taking a signature of spam messages. Since spam typically operates by sending an identical message to hundreds of people, Razor short-circuits this by allowing the first person to receive a spam to add it to the database -- at which point everyone else will automatically block it.
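Of these, Razor's idea is the easiest to picture in code. Here is a minimal Python sketch of the concept, not Razor's actual protocol: a shared set of message fingerprints that the first recipient reports to and everyone else checks against. The names `signature` and `SharedSpamDB`, the lowercase/whitespace normalization, and the use of SHA-256 are my own assumptions for illustration; the real Razor uses its own signature schemes and a network of servers.

```python
import hashlib
import re


def signature(body: str) -> str:
    """Fingerprint a message body (assumed scheme: normalize, then SHA-256)."""
    # Lowercase and collapse whitespace so re-wrapped copies of the same
    # spam produce the same fingerprint.
    normalized = re.sub(r"\s+", " ", body.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SharedSpamDB:
    """Hypothetical stand-in for a collaborative signature server."""

    def __init__(self) -> None:
        self._known: set[str] = set()

    def report(self, body: str) -> None:
        # The first person to receive the spam submits its fingerprint.
        self._known.add(signature(body))

    def is_known_spam(self, body: str) -> bool:
        # Everyone after that only needs a lookup.
        return signature(body) in self._known


db = SharedSpamDB()
db.report("Get out of debt NOW!\n\nClick here.")
print(db.is_known_spam("get out of debt now!  click here."))  # True
print(db.is_known_spam("Lunch on Friday?"))                   # False
```

An exact hash like this is defeated by changing a single word, which is presumably why real collaborative systems use fuzzier signatures.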

Another is CRM114, named after the radio device in Stanley Kubrick's Dr. Strangelove or: How I Learned to Stop Worrying and Love the Bomb, where the CRM 114 plays a pivotal role in the plot. CRM114 uses "sparse binary polynomial matching with a Bayesian Chain Rule evaluation" to decide what is and is not spam.
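As I understand it, "sparse binary polynomial" features slide a five-word window over the text and, for each word, generate every combination of the four earlier words (each either kept or skipped) together with that word: 16 features per word. The "Bayesian chain rule" then updates the probability of spam one feature at a time. Here's a rough Python sketch of that idea. It is not CRM114's code: the names `sbph_features` and `SBPHClassifier`, the `<s>` and `?` markers, the +1 smoothing, and ignoring never-seen features are all my assumptions.

```python
import math
import re
from collections import Counter
from itertools import product


def sbph_features(text: str, window: int = 5) -> list[str]:
    """For each word, every keep/skip combination of the previous window-1 words."""
    words = re.findall(r"\w+", text.lower())
    padded = ["<s>"] * (window - 1) + words  # '<s>' = before the start of the text
    features = []
    for i, newest in enumerate(words):
        history = padded[i:i + window - 1]
        for mask in product((False, True), repeat=window - 1):
            kept = [w if keep else "?" for w, keep in zip(history, mask)]
            features.append(" ".join(kept + [newest]))  # newest word always kept
    return features


class SBPHClassifier:
    def __init__(self) -> None:
        self.counts = {"spam": Counter(), "good": Counter()}
        self.totals = {"spam": 0, "good": 0}

    def train(self, label: str, text: str) -> None:
        feats = sbph_features(text)
        self.counts[label].update(feats)
        self.totals[label] += len(feats)

    def spam_probability(self, text: str) -> float:
        # Start from even odds; each feature multiplies the odds of spam by
        # P(f | spam) / P(f | good). Summing logs avoids underflow.
        log_odds = 0.0
        for f in sbph_features(text):
            s, g = self.counts["spam"][f], self.counts["good"][f]
            if s == 0 and g == 0:
                continue  # never seen in training: no evidence either way
            p_s = (s + 1) / (self.totals["spam"] + 2)
            p_g = (g + 1) / (self.totals["good"] + 2)
            log_odds += math.log(p_s / p_g)
        log_odds = max(-50.0, min(50.0, log_odds))  # keep math.exp in range
        return 1.0 / (1.0 + math.exp(-log_odds))


clf = SBPHClassifier()
clf.train("spam", "lose weight fast with this amazing offer")
clf.train("good", "meeting notes for the project review on friday")
print(clf.spam_probability("amazing offer to lose weight"))   # close to 1
print(clf.spam_probability("notes from the project review"))  # close to 0
```

Updating P(spam) feature by feature with Bayes' rule is the same as multiplying the odds by each feature's likelihood ratio, which is why the sketch can simply add log ratios.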

Another is ifile, which differs from the others in the following ways:

  1. ifile does not require the user to generate a set of rules in order to successfully filter mail
  2. ifile uses the entire content of messages for filtering purposes
  3. ifile learns as the user moves incorrectly filtered messages to new mailboxes

ifile is not dependent upon any specific mail system and should be adaptable to any system which allows an outside program to perform mail filtering.
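As far as I know, ifile is a naive Bayes classifier with one class per mail folder. Below is a minimal Python sketch of that idea under my own assumptions: every word counts as a feature, a delivered message is learned into the folder it lands in, and moving a misfiled message un-learns it from the wrong folder and learns it into the right one. `FolderFilter`, `deliver`, and `move` are hypothetical names, not ifile's interface.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text: str) -> list[str]:
    # The whole message is used; there are no hand-written rules.
    return re.findall(r"\w+", text.lower())


class FolderFilter:
    """Hypothetical ifile-style filter: naive Bayes, one class per folder."""

    def __init__(self) -> None:
        self.words: dict[str, Counter] = defaultdict(Counter)
        self.docs: Counter = Counter()

    def learn(self, folder: str, text: str) -> None:
        self.words[folder].update(tokens(text))
        self.docs[folder] += 1

    def unlearn(self, folder: str, text: str) -> None:
        self.words[folder].subtract(tokens(text))
        self.docs[folder] -= 1

    def classify(self, text: str) -> str:
        vocab = {w for c in self.words.values() for w, n in c.items() if n > 0}
        total_docs = sum(n for n in self.docs.values() if n > 0)
        best, best_score = "", -math.inf
        for folder, counts in self.words.items():
            if self.docs[folder] <= 0:
                continue
            size = sum(n for n in counts.values() if n > 0)
            score = math.log(self.docs[folder] / total_docs)  # folder prior
            for w in tokens(text):
                # +1 smoothing so an unseen word doesn't rule a folder out.
                score += math.log((max(counts[w], 0) + 1) / (size + len(vocab)))
            if score > best_score:
                best, best_score = folder, score
        return best

    def deliver(self, text: str) -> str:
        # Assumption: filtered mail is learned into the folder it lands in.
        folder = self.classify(text)
        self.learn(folder, text)
        return folder

    def move(self, text: str, wrong: str, right: str) -> None:
        # The user refiles a misfiled message, and the filter learns from it.
        self.unlearn(wrong, text)
        self.learn(right, text)


f = FolderFilter()
f.learn("work", "quarterly budget report")
f.learn("family", "birthday party invitation")
msg = "office party planning"
print(f.deliver(msg))                          # family ("party" was only seen there)
f.move(msg, "family", "work")                  # the user says it's work mail
print(f.classify("party planning committee"))  # work, after the correction
```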

I have not tried any of these yet, but I plan to. More as I learn more.


Comments

The problem with all of them is that you still haul in the spam, through a pay-by-the-second (approximately) phone line. It's not funny when you wait five or six minutes for mail to download just to get the four 268-byte messages you've been waiting for.
No fun at all.
For me, most anti-spam filters are more like post-mortem tools.

Posted by: sjon at February 20, 2003 11:15 PM

Perhaps RBT's solution is the best answer: a public execution of spammers. {g}

Posted by: Dan at February 21, 2003 08:21 AM