

REVERSE SPAM FILTERING
Winning Without Fighting

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Copyright © Nancy McGough & Infinite Ink
Last modified 07-Mar-2005

 

 

 
 

MetaNote 1   This is a work in progress.

MetaNote 2   The copyright notice above refers to my writing. I would be delighted if people adopt this strategy or terminology (“greenlist” etc.), and help refine it.

MetaNote 3   This page was mentioned in the 22 August 2002 issue of The Naked PC Newsletter, which has over 100,000 subscribers!

Keywords (for bots)    spam fighting, junk email, UCE, UBE, filter, filtering, procmail, negative filter, reverse filter, inverse filter, The Art of War

 

 


 

Winning-Without-Fighting Strategy

Therefore, one hundred victories in one
       hundred battles is not the most skillful.
Subduing the other's military without battle
       is the most skillful.
The Art of War by Sun Tzu, Chapter 3

 

After years of thinking about, writing about, and filtering messages, I've decided that the best strategy for me is not to filter spam but instead to filter non-spam and let the dregs, which are often spam, fall through my filters and land in catchall mailboxes. I then periodically open each catchall box, sort it by spam score, and visually scan it for non-spam. Because any non-spam messages bubble to the top of the sorted-by-spam-score box, I need to look carefully at only the top of each catchall mailbox. When I find a non-spam message in a catchall box, I “bounce forward” it to one of my magnet-updater email addresses so that my non-spam filters will catch this type of message in the future.
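The review step described above can be sketched in a few lines of Python. This is only an illustration, not the implementation used on this page: the `Message` class, its field names, and the sample messages are hypothetical, and the spam scores are assumed to have been assigned already by some scoring tool.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for a message sitting in a catchall box."""
    subject: str
    sender: str
    spam_score: float  # assumed to come from an earlier scoring step


def review_order(catchall):
    """Return the messages lowest score first, so likely non-spam is at the top."""
    return sorted(catchall, key=lambda m: m.spam_score)


# Made-up sample data for illustration only.
catchall = [
    Message("CHEAP PILLS", "x@spam.example", 14.2),
    Message("Lunch on Friday?", "friend@example.org", 1.3),
    Message("You have won", "y@spam.example", 9.8),
]

for m in review_order(catchall):
    print(f"{m.spam_score:5.1f}  {m.sender}  {m.subject}")
```

Only the first few lines of the sorted listing need a careful look; anything rescued from there would then be bounce-forwarded to a magnet-updater address.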

 

Fighting and Not Winning

For an example of a strategy that seems antithetical to the wisdom of The Art of War, see Paul Graham's article Filters that Fight Back. For a discussion about this article and some other strategies, see the 10-August-2003 thread Paul Graham: Filters that Fight Back at slashdot.org. An especially insightful comment is Re: And now by the mysterious Zeinfeld, a former experimental physicist.

 

Email Deflexion Flowchart

Below is a flowchart that illustrates my strategy. Note that this flowchart leaves out much of the SMTP-level filtering that is essential nowadays.

My focus below is on aspects of the mail flow that are needed for my reverse spam filtering strategy. For example, recipient splitting (exploding) is essential so that each user can use his personal bluelist and personal greenlist to pre-filter his mail before expensive or error-prone direct spam filtering -- such as content-based filtering, Bayesian filtering, etc. -- is done.
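As a rough sketch of recipient splitting, the Python below makes one copy of a message per local recipient and records the envelope sender and that copy's envelope recipient in two headers. The header names `X-Envelope-From` and `X-Envelope-To`, the function name `explode`, and the sample addresses are assumptions made for this illustration; a real mail transfer agent would do this itself and may use different header names.

```python
import copy
from email.message import EmailMessage


def explode(msg, envelope_sender, local_recipients):
    """Return one (recipient, message copy) pair per local recipient.

    Each copy gets two extra headers (hypothetical names) so that later
    per-user filters can see the original envelope information.
    """
    copies = []
    for rcpt in local_recipients:
        one = copy.deepcopy(msg)
        one["X-Envelope-From"] = envelope_sender
        one["X-Envelope-To"] = rcpt
        copies.append((rcpt, one))
    return copies


# Made-up sample message for illustration only.
original = EmailMessage()
original["From"] = "Some List <announce@lists.example.net>"
original["To"] = "undisclosed-recipients:;"
original["Subject"] = "Monthly newsletter"
original.set_content("Hello, subscribers.")

exploded = explode(
    original,
    "bounces@lists.example.net",
    ["alice@example.com", "bob@example.com"],
)
for rcpt, one in exploded:
    print(rcpt, "->", one["X-Envelope-To"])
```

Because every copy carries its own envelope recipient, each user's bluelist and greenlist can be applied to that copy alone before any more expensive analysis is run.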

 

A message sent to one or more of my addresses (& possibly other addresses)
  ||
  \/
Receiving Message Transfer Agent on My Mail-Hosting Provider's Mail Server
  ||
  \/
No MX or A record for hostname of envelope sender?
  =====> yes: SMTP-level reject
  || no
  \/
Sent through an open relay or open proxy server?
or
Hostname of envelope sender is in a respected blocklist (for example RFC-ignorant)?
or
Caught by other SMTP level anti-spam techniques?
  =====> yes: SMTP-level reject
  || no
  \/
“Explode/Split” message so there is a separate copy for each local recipient address
  ||
  \/
Inject two headers: 1) original envelope sender and 2) original envelope recipient of this instance of the exploded message
  ||
  \/
Local Message Delivery Agent (leaving the SMTP world; no longer kosher to bounce back)
  ||
  \/
Global Server-Based Filters (e.g. snag viruses; should be user configurable)
  ||
  \/
Does this seem like a virus?
  =====> yes: violet (virus quarantine) box¹
  || no
  \/
Personal Filters (server-based or client-based or a combination of these)
  ||
  \/
Sent to one of my secret magnet-updater addresses (possibly via Bcc or “bounce forward”)?
  =====> yes: magnet-updater program (updates blue & green magnets)
                ||
                \/
              magenta (magnet) box
  || no
  \/
Mailing list or other solicited bulk email?
  =====> yes: appropriate blue (bulk) box
  || no
  \/
From a trusted non-bulk sender (but not From one of my addresses²)?
  =====> yes: green box
  || no
  \/
From a sender who is trusted by one of my trusted senders³ (but not From one of my addresses)?
  =====> yes: lime green box [auto move undeleted messages to green box]²
  || no
  \/
Analyze & tag with a spam score or probability
  ||
  \/
Is spam score low?
  =====> yes: yellow box [auto move undeleted messages to green box]²
  || no
  \/
red (rubbish) box [use mail client to list messages ordered by score; move non-spam to green box² & delete the rest]

 

---
¹ The word “box” can be interpreted to mean a traditional mailbox, a bucket, or a label. Labels, which are also called keywords, can be used to construct virtual mailboxes (aka virtual folders or smart folders).

² When a message is moved to the green box, its sender will be automatically greenlisted (for example, by a script that is regularly run against the green box).

Tip:  Do not put your own email addresses in your greenlist because spammers often forge the From header with your address as a way to sneak into your green box! Another reason to keep your addresses out of your greenlist is that you can see how spammy your messages are (because they will get analyzed and tagged with a spam score).
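A minimal sketch of such a greenlist-updating script, written in Python, might look like the following. It is not the script used on this page: the function name `update_greenlist`, the in-memory set used as the greenlist, and the sample addresses are all hypothetical. It follows the tip above by never adding one of your own addresses.

```python
from email.utils import parseaddr


def update_greenlist(greenlist, green_box_from_headers, my_addresses):
    """Add the sender of every message in the green box to the greenlist.

    Addresses are lowercased, and the user's own addresses are skipped so
    that a forged From header cannot ride in on the greenlist.
    """
    mine = {a.lower() for a in my_addresses}
    for from_header in green_box_from_headers:
        _, addr = parseaddr(from_header)
        addr = addr.lower()
        if addr and addr not in mine:
            greenlist.add(addr)
    return greenlist


# Made-up From headers, as they might be read from messages in a green box.
from_headers = [
    "Alice <Alice@Example.org>",
    "me@example.com",
    "Bob <bob@example.net>",
]
greenlist = update_greenlist(set(), from_headers, ["me@example.com"])
print(sorted(greenlist))
```

A real script would read the From headers out of the mailbox and store the greenlist in whatever form the filtering tool expects.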

³ LOAF is a GPL'd distributed-social-network filter that is a private way to greenlist your correspondents and limelist the correspondents of your correspondents (aka your “2nd degree correspondents”).
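The personal-filter portion of the flowchart can be read as an ordered series of tests in which the first match wins. The Python below is a sketch of that ordering only, under several assumptions: the message is a plain dictionary with hypothetical keys (`envelope_to`, `from`, `list_id`), the lists are simple sets, `score_fn` stands in for whatever scoring tool is used, and the `low_score` threshold of 5.0 is an arbitrary placeholder rather than a value taken from this page.

```python
def route(msg, magnet_addresses, bluelist, greenlist, limelist,
          my_addresses, score_fn, low_score=5.0):
    """Return the name of the box a message lands in (first match wins)."""
    recipient = msg["envelope_to"].lower()
    sender = msg["from"].lower()

    if recipient in magnet_addresses:
        return "magenta"      # handled by the magnet-updater program
    if msg.get("list_id") in bluelist or sender in bluelist:
        return "blue"         # solicited bulk mail
    from_me = sender in my_addresses
    if sender in greenlist and not from_me:
        return "green"        # trusted non-bulk sender
    if sender in limelist and not from_me:
        return "lime green"   # trusted by one of my trusted senders
    # Only messages that reach this point are scored at all.
    if score_fn(msg) < low_score:
        return "yellow"
    return "red"


# Made-up configuration for illustration only.
config = dict(
    magnet_addresses={"magnet-green@example.com"},
    bluelist={"announce.lists.example.net"},
    greenlist={"friend@example.org", "me@example.com"},
    limelist={"friend-of-friend@example.org"},
    my_addresses={"me@example.com"},
    score_fn=lambda m: m.get("score", 0.0),
)

forged = {"envelope_to": "me@example.com", "from": "me@example.com", "score": 12.0}
print(route(forged, **config))
```

The sample call shows why the “not From one of my addresses” test matters: even if your own address ends up in the greenlist, a message forged to appear to come from you is still scored and routed by that score.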

 

 

 

Keys to Implementation

The keys to implementing this strategy are

 

My Implementation

I implement this strategy using . . .


# Site-wide settings go in ...etc/mail/spamassassin/local.cf (location depends on installation)
# User-specific settings override site-wide settings and go in ~/.spamassassin/user_prefs
   
# To insert the spam score at the beginning of the Subject, include the following two settings.
# This is an essential part of my deflexion strategy.
rewrite_subject  1
subject_tag      {* _HITS_ *}
 			
# Because automatic spam/non-spam separation is not as simple as black/white or yes/no,
# I want the "spamminess" to be put in the X-Spam-Level header and Subject header 
# (see above 2 settings) of every message that gets a score (hits) of 1.00 or more.
# [Note that this is a much lower threshold than what most people use because I use
#  the scores to stratify my messages (rather than to try to say 'YES this is spam.')]
required_hits    1.00
   
# I need the spam-level "stars" (default character is *) for my Procmail recipes
spam_level_stars 1
  
# I use R (for Red) as the spam-level character because it is easier than * to match in Procmail and
# because spammers are more likely to forge SA headers using the default spam-level character, which is *
add_header all Level _STARS(R)_
# NOTE: add_header was introduced in 2.60. If you use SA 2.55 or earlier, use the following instead:
## spam_level_char  R

 
# I (unfortunately) read only English so I use the following settings
ok_languages     en
ok_locales       en
  

# Since I use Pine 4.61 as my mail client and Dallman Ross's Virus Snaggers (vsnag), 
# I do not need to worry about malicious attachments. And I need each 
# message to be close to its original state for the message processing that
# I do using Procmail, IMAP, and "bounce forward"
report_safe      0 
# NOTE: report_safe is available in SA version 2.50 and later (and you need 2.51 or
#  later to be able to set its value to 2). If you are using 2.4x or earlier, use
#  defang_mime instead (but you really should upgrade!)


   
# NOTE: use_bayes is available in SA version 2.50 and later  
# I do not use SA's Bayesian analysis because with my deflexion system, almost every 
# message that SA sees is spam and thus SA cannot automatically learn much about my 
# non-spam messages (and I don't want to spend time using sa-learn to train SA; I'd
# rather spend that time training my procmail-invoked greenlists and bluelists!)
use_bayes        0

# NOTE: trusted_networks is available in SA version 2.60 and later
# Some of my email addresses are hosted on other systems and automatically
# redirected or forwarded to this system. To save SA from checking those 
# systems' IP addresses, I tell SA to trust them.
## trusted_networks ip.add.re.ss[/mask]

# I do not skip RBL checks because they improve the accuracy of SA's scores. 
# NOTE 1: SA 2.60-rc4 and later are much better at timing out if a Real-time
# Block List (RBL) is having problems or is dead. NOTE 2: SA does not use RBLs
#  to "block" mail, it uses them to help determine a message's score.
skip_rbl_checks  0

# I use the following so that I (& my users) will be able to see that a message
# was processed by SA running at spam.deflexion.com. I prepend 'www.' because
# some mail clients, for example pine, will then turn it into a link.  
add_header all Checker-Version SpamAssassin _VERSION_ (_SUBVERSION_) on www.spam.deflexion.com
  
  
# If you are thinking about changing some of the default scores that SA gives, I recommend
# that you first read Matt Kettler's SA-rules-howto.txt and note that in order for 
# user rules to work with spamd 2.55 & later, local.cf must contain allow_user_rules
# (if you run spamassassin rather than spamd, custom user rules will work by default)
## put custom user rules here
  
# For info about the following rule, see Re: habeas problems in the SA discussion group
# If you use Bayes, I suggest you read the thread bayes should ignore habeas headers?
score HABEAS_SWE -0.1


However, you can implement this strategy with many different tools. For example, if your mail client can do filtering and can be set up to do spam scoring (possibly via a plug-in), you can use it to do the scoring, filtering, and processing of your mailboxes. Another option is to find a mail hosting service that gives its users the option to use server-based filtering and spam-detection tools. My IMAP Service Providers page includes many such providers.
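As one illustration of using a different tool, the Python sketch below reads the two pieces of information that the SpamAssassin settings above are meant to add to each scored message: an X-Spam-Level header made of R characters and a score tag at the start of the Subject. The function names are hypothetical, and the exact text SpamAssassin writes (for example, how the score is padded) depends on its version, so the pattern here is an assumption and is deliberately loose.

```python
import re
from email import message_from_string

# Matches a tag such as "{* 7.3 *}" (assumed shape of the subject_tag output).
SUBJECT_TAG = re.compile(r"\{\*\s*(-?\d+(?:\.\d+)?)\s*\*\}")


def spam_level(msg):
    """Count the R characters in X-Spam-Level; 0 if absent or malformed."""
    level = (msg.get("X-Spam-Level") or "").strip()
    return len(level) if level and set(level) == {"R"} else 0


def subject_score(msg):
    """Return the score found in the Subject tag, or None if there is no tag."""
    match = SUBJECT_TAG.search(msg.get("Subject", ""))
    return float(match.group(1)) if match else None


# Made-up message for illustration only.
raw = (
    "From: someone@example.org\n"
    "Subject: {* 7.3 *} Buy now\n"
    "X-Spam-Level: RRRRRRR\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw)
print(spam_level(msg), subject_score(msg))
```

A header filled with the default * character, or with anything other than R, counts as level 0 here, which matches the reason given above for choosing R as the spam-level character.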

 

 

Dissecting the Flowchart

(To Be Written)

Planned topics: an explanation of my strategy (why I do things this way and in this order); the imperfect metaphor I'm using (light:prism :: incoming-messages:message-filtering software); details about how I implement and adapt it using Procmail, SpamAssassin, Mozilla, Pine, Mulberry; my magnet-updater script

 

 

 

 

See Also

Definitions and Terminology

 

Spam in the News

 

Strategies for Reverse Spam Filtering

 

Tools for Reverse Spam Filtering

 

Miscellaneous

 

MetaLinks


Reverse Spam Filtering: Winning Without Fighting
<http://www.ii.com/internet/messaging/spam/>
Copyright © Infinite Ink & Nancy McGough

1st published 16-Feb-2002
updated 22-Mar-2004
tweaked 07-Mar-2005