12/13/2004 Archived Entry: "SpamAssassin"

More techie musings today, this time about my new experiences with the SpamAssassin program. No political content. Skip it if you're bored by computer trivia.

One of the features I was seeking from a new web host was spam filtering. Mozilla Mail on my PC has a spam filter, but it's much more efficient to filter at the email server, so as not to waste bandwidth downloading spam emails.

Our new host has SpamAssassin, one of the more popular open-source spam filters. It uses a set of rules to score "spam points" for each message; if the message gets too many points, it's marked as spam. I'm told an "aggressive" filter setting would be a threshold of 5 points; since I'm reluctant to toss possibly valid mail, I currently have the threshold set at 8.
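
For the curious, the points-and-threshold idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not SpamAssassin's actual code: the rule names, point values, and tests below are made up, and it assumes a message counts as spam once its total reaches the threshold.

```python
# Hypothetical sketch of threshold-based spam scoring (not SpamAssassin's code).
from typing import Callable, List, Tuple

# A rule is a name, a point value, and a test applied to the message text.
Rule = Tuple[str, float, Callable[[str], bool]]

# Made-up rules and point values, for illustration only.
RULES: List[Rule] = [
    ("MENTIONS_DIET", 0.4, lambda msg: "lose weight" in msg.lower()),
    ("BIZ_LINK", 0.5, lambda msg: ".biz" in msg.lower()),
    ("SHOUTING", 1.0, lambda msg: msg.isupper()),
]


def score_message(msg: str, rules: List[Rule] = RULES) -> Tuple[float, List[str]]:
    """Return the total points and the names of the rules that matched."""
    hits = [(name, pts) for name, pts, test in rules if test(msg)]
    total = round(sum(pts for _, pts in hits), 1)
    return total, [name for name, _ in hits]


def is_spam(msg: str, threshold: float = 8.0) -> bool:
    """Assumption: a message is spam once its score reaches the threshold."""
    total, _ = score_message(msg)
    return total >= threshold
```

Raising the threshold (8 instead of 5) simply means more rules have to fire before a message is marked, which is why a higher number is the more cautious setting.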

While I'm experimenting, I just have *SPAM* added to the subject line of flagged messages, and I let them through to Mozilla so I can check them. Once I have the filter tuned to my liking -- no false positives -- I'll have these emails tossed directly into the trash.
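
The tag-and-pass-through step looks roughly like this in Python, using the standard `email` module. It is a sketch of the idea only: `tag_subject` is a hypothetical helper, not something SpamAssassin provides, and the score is assumed to have been computed elsewhere.

```python
# Hypothetical sketch: prefix the Subject of a high-scoring message with *SPAM*.
from email import message_from_string


def tag_subject(raw: str, score: float, threshold: float = 8.0,
                tag: str = "*SPAM*") -> str:
    """Return the raw message, with its Subject tagged if score >= threshold."""
    msg = message_from_string(raw)
    subject = msg.get("Subject", "")
    if score >= threshold and not subject.startswith(tag):
        new_subject = f"{tag} {subject}".strip()
        if "Subject" in msg:
            # replace_header keeps the header in its original position.
            msg.replace_header("Subject", new_subject)
        else:
            msg["Subject"] = new_subject
    return msg.as_string()


raw_example = "From: someone@example.com\nSubject: Cheap pills\n\nBuy now\n"
tagged_example = tag_subject(raw_example, score=34.1)
```

The body of the message is left alone, so a mail client downstream can still inspect it or run its own filter on it.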

The nice thing about SpamAssassin is that it can filter for criteria that Mozilla's simple Bayesian filter can't, like bogus IP addresses. Here's the analysis of one of the first very spammy messages (34 points!) that I received. HELO is the SMTP command a sending computer uses to identify itself at the start of a mail transfer. XBL, SORBS, DSBL, and such are all "blacklist" databases of computers known to send or relay spam.

Content analysis details: (34.1 points, 8.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 2.3 TO_MALFORMED           To: has a malformed address
 2.3 INVALID_TZ_EST         Invalid date in header (wrong EST timezone)
 4.2 X_MESSAGE_INFO         Bulk email fingerprint (X-Message-Info) found
 0.1 FORGED_RCVD_HELO       Received: contains a forged HELO
 0.6 RCVD_HELO_IP_MISMATCH  Received: HELO and IP do not match, but should
 1.5 RCVD_NUMERIC_HELO      Received: contains an IP address used for HELO
 0.4 DIET_1                 BODY: Lose Weight Spam
 0.5 BIZ_TLD                URI: Contains an URL in the BIZ top-level domain
 1.4 DOMAIN_RATIO           BODY: Message body mentions many internet domains
 0.0 HTML_MESSAGE           BODY: HTML included in message
 1.5 MPART_ALT_DIFF         BODY: HTML and text parts are different
 0.6 HTML_OBFUSCATE_20_30   BODY: Message is 20% to 30% HTML obfuscation
 0.6 HTML_BACKHAIR_8        BODY: HTML tags used to obfuscate words
 2.5 RCVD_IN_XBL            RBL: Received via a relay in Spamhaus XBL
                            [ listed in sbl-xbl.spamhaus.org]
 0.1 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
                            [ listed in dnsbl.sorbs.net]
 2.8 RCVD_IN_DSBL           RBL: Received via a relay in list.dsbl.org
 1.8 RCVD_IN_BL_SPAMCOP_NET RBL: Received via a relay in bl.spamcop.net
                            [Blocked - see http://www.spamcop.net/bl.shtml?]
 1.7 RCVD_IN_NJABL_DUL      RBL: NJABL: dialup sender did non-local SMTP
                            [ listed in combined.njabl.org]
 0.6 URIBL_SBL              Contains an URL listed in the SBL blocklist
                            [URIs: obsession.com still.com rxsupply.biz]
 0.5 URIBL_WS_SURBL         Contains an URL listed in the WS SURBL blocklist
                            [URIs: rxsupply.biz insurance.com]
 2.0 URIBL_OB_SURBL         Contains an URL listed in the OB SURBL blocklist
                            [URIs: rxsupply.biz]
 3.9 URIBL_SC_SURBL         Contains an URL listed in the SC SURBL blocklist
                            [URIs: rxsupply.biz]
 0.0 DRUGS_ERECTILE         Refers to an erectile drug
 2.3 LONGWORDS              Long string of long words
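
A report like the one above is easy to pick apart with a short script. The following Python sketch is just one way to do it; the `Hit` type, the regular expression, and the function names are assumptions made for illustration, not part of SpamAssassin. It reads each "points, rule name, description" row, skips the header, the dashed line, and the bracketed continuation lines, and adds up the points.

```python
# Hypothetical sketch: parse a SpamAssassin-style report and total the points.
import re
from typing import List, NamedTuple


class Hit(NamedTuple):
    points: float
    rule: str
    description: str


# A data row starts with a decimal number followed by an upper-case rule name.
ROW = re.compile(r"^\s*(-?\d+\.\d+)\s+([A-Z][A-Z0-9_]*)\s+(.*)$")


def parse_report(text: str) -> List[Hit]:
    """Return one Hit per rule row; other lines are ignored."""
    hits = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            hits.append(Hit(float(m.group(1)), m.group(2), m.group(3).strip()))
    return hits


def total_points(hits: List[Hit]) -> float:
    """Sum the displayed points, rounded to one decimal place."""
    return round(sum(h.points for h in hits), 1)


# A three-rule excerpt from the report above.
SAMPLE = """\
 pts rule name              description
---- ---------------------- --------------------------------------------------
 2.3 TO_MALFORMED           To: has a malformed address
 2.5 RCVD_IN_XBL            RBL: Received via a relay in Spamhaus XBL
                            [ listed in sbl-xbl.spamhaus.org]
 0.0 DRUGS_ERECTILE         Refers to an erectile drug
"""

hits = parse_report(SAMPLE)
```

Adding up the points as displayed in the full report gives 34.2, a tenth more than the 34.1 in the summary line; presumably each rule's score is rounded for display, though that is a guess.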

I have no idea why "refers to an erectile drug" scores 0 points. I'll be curious to see how the ifeminists.net weekly newsletter scores, since I frequently see it bounced by various spam filters.

The bad thing about SpamAssassin is that it needs to be installed on the server, not on your home PC. (There are ways to use SpamAssassin at home, but they're not trivial to install, and besides, then you have to download all the junk.) It's worth looking for this program, or something like it, when you shop around for Internet services.

I expect that using two successive spam filters with very different strategies -- SpamAssassin, followed by the Bayesian filter in Mozilla -- will really help me keep spam under control. Actually, Mozilla was doing quite well by itself, but as I said, I'd like to block more of this at the server.


