ii.com • Reverse Spam Filtering: "Winning Without Fighting" by Nancy McGough

MetaNote 1 This is a work in progress.

MetaNote 2 The copyright notice above refers to my writing. I would be delighted if people adopt this strategy or terminology (“greenlist” etc.), and help refine it.

MetaNote 3 This page was mentioned in the 22 August 2002 issue of The Naked PC Newsletter, which has over 100,000 subscribers!

Keywords (for bots) spam fighting, junk email, UCE, UBE, filter, filtering, procmail, negative filter, reverse filter, inverse filter, The Art of War

On This Page

Winning-Without-Fighting Strategy
Fighting and Not Winning
Email Deflexion Flowchart
Keys to Implementation
My Implementation
Dissecting the Flowchart
See Also

Winning-Without-Fighting Strategy

Therefore, one hundred victories in one
hundred battles is not the most skillful.
Subduing the other’s military without battle
is the most skillful.

The Art of War by Sun Tzu, Chapter 3

After years of thinking about, writing about, and filtering messages, I've decided that the best strategy for me is to not filter spam, but instead to filter non-spam and let the dregs, which are often spam, fall through my filters and land in catchall mailboxes. I then periodically open each catchall box, sort it by spam score, and visually scan it looking for non-spam. Since the non-spam messages, if there are any, bubble to the top of the sorted-by-spam-score box, I need to look carefully at only the top of these catchall mailboxes. When I find a non-spam message in a catchall box, I “bounce forward” it to one of my magnet-updater email addresses so that my non-spam filters will catch this type of message in the future.

Fighting and Not Winning

For an example of a strategy that seems antithetical to the wisdom of The Art of War, see Paul Graham's article Filters that Fight Back. For a discussion about this article and some other strategies, see the 10-August-2003 thread Paul Graham: Filters that Fight Back at slashdot.org. An especially insightful comment is Re: And now by the mysterious Zeinfeld, a former experimental physicist.

Email Deflexion Flowchart

Below is a flowchart that illustrates my strategy. Note that this flowchart leaves out a lot of the SMTP level filtering that is essential nowadays. For example, nowadays it is common for an SMTP server to:

reject messages to unknown recipients;
check if the combination of the IP address, sender, and recipient is familiar and, if it's not, embargo or temporarily reject the message. This is called greylisting;
use other SMTP level anti-spam techniques, some of which are described in Stopping e-mail abuse at Wikipedia.
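The greylisting check in the list above can be sketched in a few lines of Python. This is only an illustration, not how any particular SMTP server does it: the class name `Greylist`, the 300-second retry delay, and the in-memory dictionary are all assumptions, and a real server would keep this state in persistent storage and apply its own policy.

```python
import time


class Greylist:
    """Toy greylist keyed on the (IP address, sender, recipient) triplet."""

    def __init__(self, min_delay=300.0, clock=time.time):
        self.min_delay = min_delay  # hypothetical: seconds a new triplet must wait
        self.clock = clock          # injectable clock so the sketch can be tested
        self.first_seen = {}        # triplet -> time of the first delivery attempt

    def check(self, ip, sender, recipient):
        """Return "accept" for a familiar triplet, otherwise "tempfail"."""
        triplet = (ip, sender.lower(), recipient.lower())
        now = self.clock()
        # Remember when this triplet was first seen; later calls reuse that time.
        first = self.first_seen.setdefault(triplet, now)
        if now - first >= self.min_delay:
            return "accept"
        return "tempfail"  # a well-behaved sending server will retry later


if __name__ == "__main__":
    greylist = Greylist()
    print(greylist.check("192.0.2.1", "alice@example.org", "me@example.com"))
```

The idea is that legitimate mail servers retry after a temporary rejection, while much spam-sending software does not.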

My focus below is on aspects of the mail flow that are needed for my reverse spam filtering strategy. For example, recipient splitting (exploding) is essential so that each user can use his personal bluelist and personal greenlist to pre-filter his mail before expensive or error-prone direct spam filtering -- such as content-based filtering, Bayesian filtering, etc. -- is done.

---
¹ The word “box” can be interpreted to mean a traditional mailbox, a bucket, or a label. Labels, which are also called keywords, can be used to construct virtual mailboxes (aka virtual folders or smart folders).

² When a message is moved to the green box, its sender will be automatically greenlisted (for example, by a script that is regularly run against the green box).

Tip: Do not put your own email addresses in your greenlist because spammers often forge the From header with your address as a way to sneak into your green box! Another reason to keep your addresses out of your greenlist is that you can see how spammy your messages are (because they will get analyzed and tagged with a spam score).
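Here is a minimal Python sketch of the kind of script footnote ² describes, with the tip above built in. It is not my actual magnet-updater script; the function name `collect_greenlist` and the example addresses are made up for illustration.

```python
from email import message_from_string
from email.utils import getaddresses


def collect_greenlist(messages, my_addresses):
    """Return the sorted From addresses found in `messages`,
    skipping any address that belongs to the mailbox owner."""
    mine = {addr.lower() for addr in my_addresses}
    senders = set()
    for msg in messages:
        for _name, addr in getaddresses(msg.get_all("From", [])):
            addr = addr.lower()
            if addr and addr not in mine:
                senders.add(addr)
    return sorted(senders)


if __name__ == "__main__":
    # Hypothetical contents of a green box.
    green_box = [
        message_from_string("From: Alice <alice@example.org>\nSubject: hi\n\nHello\n"),
        message_from_string("From: me@example.com\nSubject: forged?\n\nBuy now\n"),
    ]
    print(collect_greenlist(green_box, ["me@example.com"]))
```

To run something like this against an mbox-format green box, `mailbox.mbox(path)` from the Python standard library can be passed as `messages`.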

³ LOAF is a GPL'd distributed-social-network filter that is a private way to greenlist your correspondents and limelist the correspondents of your correspondents (aka your “2nd degree correspondents”).

Keys to Implementation

The keys to implementing this strategy are

good message filtering software (at all layers: MTA, MDA, and in the mail client)
good tool for analyzing and assigning a spam score/probability to the fall-through (yellow and red) messages
good mail client that makes it easy to
- process multiple incoming mailboxes
- sort (order) a mailbox by spam score (for example the yellow or red box)
- “bounce forward” (aka resend or redirect) a message to magnet-updater addresses
easy way to train Bayes filters; one possibility is Jem Berkes' webfilt - Flexible web interface to server-side spam filters
good program for checking multiple incoming mailboxes for new messages (I'm collecting a list here)
automatic way to update bluelist and greenlist filter recipes (magnets)

Tip: Do not put your own email addresses in your greenlist because spammers often forge the From header with your address as a way to sneak into your green box!
Message transfer agent (MTA) that munges your incoming messages and puts the original envelope addresses in headers such as X-Original-From, X-Original-To, or X-Rcpt-To. This header is used to determine which messages should be deflected to the magnet-updater program. For more about the original envelope recipient address and problems with Bcc'd messages, see this section of my Procmail Quick Start. If you know of a mail-hosting provider that has set up their MTA to do this, please let me know and I'll link to them.
good mailbox-naming scheme that makes it easy to organize, archive, find, sort (order), and search mailboxes. I discuss my naming style in my Procmail Quick Start.
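To illustrate why the original envelope recipient matters, here is a rough Python sketch of the deflection test. The header names come from the list above; the updater addresses and function names are hypothetical, and your MTA may record the envelope recipient under a different header.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical secret magnet-updater addresses; substitute your own.
MAGNET_UPDATER_ADDRESSES = {"green-updater@example.com", "blue-updater@example.com"}

# Headers in which an MTA might record the original envelope recipient.
ENVELOPE_RECIPIENT_HEADERS = ("X-Original-To", "X-Rcpt-To")


def envelope_recipient(msg):
    """Return the envelope recipient recorded by the MTA, or None."""
    for header in ENVELOPE_RECIPIENT_HEADERS:
        value = msg.get(header)
        if value:
            return parseaddr(value)[1].lower() or None
    return None


def is_for_magnet_updater(msg):
    """True when the envelope recipient (not the To header) is an updater address."""
    return envelope_recipient(msg) in MAGNET_UPDATER_ADDRESSES


if __name__ == "__main__":
    bcc_copy = message_from_string(
        "To: friend@example.org\n"
        "X-Original-To: green-updater@example.com\n"
        "Subject: hello\n\nbody\n"
    )
    print(is_for_magnet_updater(bcc_copy))
```

Checking the envelope recipient rather than the To header is what lets a Bcc'd or “bounce forwarded” message reach the updater even though the updater address appears nowhere in the visible headers.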

My Implementation

I implement this strategy using . . .

Procmail to snag viruses and sort solicited-bulk (blue) & trusted-non-bulk-sender (green) incoming messages as they arrive at the mail server.
The latest SpamAssassin (which at the moment is 3.0.2) to tag most of my “catchall” (yellow and red) messages with a spam score. Note that you will get significantly better results if you use the most up-to-date SpamAssassin.

One of the keys to my deflexion strategy is to have the spam score (_HITS_) inserted into the beginning of the Subject header. This makes it easy to order messages by score because I can simply do a sort-by-subject in my mail client. I recommend that you surround the Subject tag with braces (squiggly brackets) because 1) these tagged Subjects will then sort below almost all other Subjects when the Subjects are sorted using an ASCII sort (because { is ASCII character 123 out of 127); and 2) a popular alternative, square brackets, will not work in an IMAP sort because the IMAP sort and thread specification says to ignore text that is in square brackets during a sort. If you want to muck around with the format of the _HITS_ token, see Keith C. Ivey's message Re: adding SPAM hits score to headers in the SA mailing list.
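The sorting claim is easy to check for yourself. The Python sketch below is only a demonstration: the `tag_subject` helper and its zero-padded score format are assumptions for illustration, not SpamAssassin's own `_HITS_` formatting.

```python
def tag_subject(subject, score):
    """Prefix a Subject with a zero-padded spam score wrapped in braces."""
    return "{%05.1f} %s" % (score, subject)


subjects = [
    tag_subject("Cheap pills", 17.3),
    "Re: lunch on Friday?",
    tag_subject("Your invoice", 2.4),
    "[procmail] recipe question",
    "meeting notes",
]

# Python compares strings by code point, which matches ASCII order here.
ordered = sorted(subjects)

if __name__ == "__main__":
    for line in ordered:
        print(line)
```

Because { sorts after the ASCII letters, digits, and square brackets, the tagged Subjects land at the end of the list, and the zero padding makes them sort by score. Note that this simple format does not order negative scores correctly.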

Below are the especially useful settings in my SA 2.64 user_prefs file (comments are preceded by #). The first two settings are essential to my deflexion strategy.


The above SpamAssassin `user_prefs` variables work in SA 2.64. Some of these variables do not exist in earlier or later versions of SA, so make sure that you read the documentation for your version of SA. For example, in SA 3.x the `rewrite_subject` and `subject_tag` configuration options were deprecated and have since been removed; instead, use `rewrite_header Subject [your desired setting]`.

To see if your user_prefs file is syntactically valid, type

spamassassin --lint

and make sure that this invokes the spamassassin that is processing your messages! To check which spamassassin the above command invokes, type

which spamassassin

To check what version this is, type

spamassassin --version

To see what tests SA performs, for example to check if network tests are happening, type

spamassassin --lint --debug

Procmail to filter SA-tagged messages into yellow, red, and infrared (to-be-devnulled) mailboxes. The Procmail recipes that I use for this are in my Procmail Quick Start in the Using SpamAssassin section.
Mozilla, Mulberry, and Pine to check, process, and optionally snarf multiple mailboxes that reside on the mail server (using IMAP, the Internet Message Access Protocol)
A magnet-updater script that I run on my mail server to automatically update my blue and green lists.
Procmail recipes that detect almost any mailing-list message and deliver it to a mailbox with a name that contains the local part (left-hand side) of the mailing-list address.
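The mailing-list item above can be sketched in Python. This is not a translation of my Procmail recipes: it assumes the list announces its posting address in a List-Post header (RFC 2369), which many but not all lists do, and the `blue-` prefix and the function name are hypothetical.

```python
import re
from email import message_from_string

# Matches the first mailto: URL in a List-Post header (RFC 2369).
_MAILTO = re.compile(r"<mailto:([^@>\s]+)@[^>\s]+>", re.IGNORECASE)


def list_mailbox_name(msg, prefix="blue-"):
    """Return a mailbox name built from the list address's local part,
    or None when the message carries no usable List-Post header."""
    match = _MAILTO.search(msg.get("List-Post", ""))
    if not match:
        return None
    local_part = match.group(1).lower()
    # Keep only characters that are safe in a mailbox name.
    return prefix + re.sub(r"[^a-z0-9._-]", "_", local_part)


if __name__ == "__main__":
    post = message_from_string(
        "List-Post: <mailto:procmail@lists.example.org>\nSubject: recipe\n\nbody\n"
    )
    print(list_mailbox_name(post))
```

Messages without a recognizable list header return None here and would simply continue to the next filter.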

But, you can implement this strategy using many different tools. For example, if your mail client can do filtering and can be set up to do spam-scoring (possibly via a plug-in), you can use it to do the scoring, filtering, and processing of your mailboxes. Another option is to find a mail hosting service that gives their users the option to use server-based filtering and spam-detection tools. My IMAP Service Providers page includes many such providers.

Dissecting the Flowchart

(To Be Written)

Explanation of my strategy (why I do things this way and in this order); the imperfect metaphor I'm using (light:prism :: incoming-messages:message-filtering software); details about how I implement and adapt it using Procmail, SpamAssassin, Mozilla, Pine, Mulberry; my magnet-updater script

A message sent to one or more of my addresses (& possibly other addresses)
   \/
Receiving Message Transfer Agent on My Mail-Hosting Provider's Mail Server
   \/
No MX or A record for hostname of envelope sender? =====> yes: SMTP-level reject
   \/ no
Sent through an open relay or open proxy server? or Hostname of envelope sender is in a respected blocklist (for example RFC-ignorant)? or Caught by other SMTP level anti-spam techniques? =====> yes: SMTP-level reject
   \/ no
“Explode/Split” message so there is a separate copy for each local recipient address
   \/
Inject two headers: 1) original envelope sender and 2) original envelope recipient of this instance of the exploded message
   \/
Local Message Delivery Agent (leaving the SMTP world; no longer kosher to bounce back)
   \/
Global Server-Based Filters (e.g. snag viruses; should be user configurable)
   \/
Does this seem like a virus? =====> yes: violet (virus quarantine) box¹
   \/ no
Personal Filters (server-based or client-based or a combination of these)
   \/
Sent to one of my secret magnet-updater addresses (possibly via Bcc or “bounce forward”)? =====> yes: magnet-updater program (updates blue & green magnets) =====> magenta (magnet) box
   \/ no
Mailing list or other solicited bulk email? =====> yes: appropriate blue (bulk) box
   \/ no
From a trusted non-bulk sender (but not From one of my addresses²)? =====> yes: green box
   \/ no
From a sender who is trusted by one of my trusted senders³ (but not From one of my addresses)? =====> yes: lime green box [auto move undeleted messages to green box]²
   \/ no
Analyze & tag with a spam score or probability
   \/
Is spam score low? =====> yes: yellow box [auto move undeleted messages to green box]²
   \/ no
red (rubbish) box [use mail client to list messages ordered by score; move non-spam to green box² & delete the rest]
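As a rough companion to the flowchart, the post-SMTP decisions can be written as a single Python function. Everything here is an illustrative assumption rather than my actual setup: the `Message` and `Lists` classes, their field names, and the `low_score` threshold of 5.0 are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    """Minimal stand-in for a delivered message; every field is illustrative."""
    sender: str
    envelope_recipient: str
    is_bulk_list: bool = False
    looks_like_virus: bool = False
    spam_score: float = 0.0


@dataclass
class Lists:
    my_addresses: set = field(default_factory=set)
    updater_addresses: set = field(default_factory=set)
    greenlist: set = field(default_factory=set)
    limelist: set = field(default_factory=set)


def choose_box(msg, lists, low_score=5.0):
    """Walk the post-SMTP part of the flowchart and name the target box."""
    if msg.looks_like_virus:
        return "violet"
    if msg.envelope_recipient in lists.updater_addresses:
        return "magenta"
    if msg.is_bulk_list:
        return "blue"
    # Mail claiming to be From one of my own addresses is never trusted.
    from_me = msg.sender in lists.my_addresses
    if not from_me and msg.sender in lists.greenlist:
        return "green"
    if not from_me and msg.sender in lists.limelist:
        return "lime green"
    # Only the dregs reach the comparatively expensive scoring step.
    return "yellow" if msg.spam_score < low_score else "red"


if __name__ == "__main__":
    lists = Lists(my_addresses={"me@example.com"}, greenlist={"alice@example.org"})
    print(choose_box(Message("alice@example.org", "me@example.com"), lists))
```

The order of the tests is the point: the cheap, reliable checks against personal lists come first, so spam scoring is applied only to messages that nothing else has claimed.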

See Also

Definitions and Terminology

Spam in the News

Strategies for Reverse Spam Filtering

Tools for Reverse Spam Filtering

Miscellaneous

MetaLinks


Reverse Spam Filtering: Winning Without Fighting <http://www.ii.com/internet/messaging/spam/> Copyright © Infinite Ink & Nancy McGough. 1st published 16-Feb-2002, updated 22-Mar-2004, tweaked 07-Mar-2005
