From jm@jmason.org Fri Sep 20 13:03:34 2002
Return-Path:
Delivered-To: yyyy@spamassassin.taint.org
Received: by spamassassin.taint.org (Postfix, from userid 500)
	id 6CA7916F03; Fri, 20 Sep 2002 13:03:34 +0100 (IST)
Received: from spamassassin.taint.org (localhost [127.0.0.1])
	by jmason.org (Postfix) with ESMTP id 69CACF7B1;
	Fri, 20 Sep 2002 13:03:34 +0100 (IST)
To: "Michael Moncur"
Cc: "Justin Mason" , "Daniel Quinlan" ,
	SpamAssassin-devel@lists.sourceforge.net
Subject: Re: [SAdev] phew!
In-Reply-To: Message from "Michael Moncur" of "Thu, 19 Sep 2002 21:42:52 MDT."
From: yyyy@spamassassin.taint.org (Justin Mason)
X-GPG-Key-Fingerprint: 0A48 2D8B 0B52 A87D 0E8A 6ADD 4137 1B50 6E58 EF0A
X-Habeas-Swe-1: winter into spring
X-Habeas-Swe-2: brightly anticipated
X-Habeas-Swe-3: like Habeas SWE (tm)
X-Habeas-Swe-4: Copyright 2002 Habeas (tm)
X-Habeas-Swe-5: Sender Warranted Email (SWE) (tm). The sender of this
X-Habeas-Swe-6: email in exchange for a license for this Habeas
X-Habeas-Swe-7: warrant mark warrants that this is a Habeas Compliant
X-Habeas-Swe-8: Message (HCM) and not spam. Please report use of this
X-Habeas-Swe-9: mark in spam to .
Date: Fri, 20 Sep 2002 13:03:29 +0100
Sender: yyyy@spamassassin.taint.org
Message-Id: <20020920120334.6CA7916F03@spamassassin.taint.org>

"Michael Moncur" said:

> My corpus is about 50% spamtrap spam at any given time. Let me know if I
> should leave that out next time, I do keep it separate. My spamtraps are
> pretty clean of viruses and bounce messages most of the time.

IMO spamtrap data that's well-cleaned and monitored is fine. To my mind
there's 3 types of spamtraps:

  1. old user addresses, recycled into spamtraps when the user closes the
     account

  2. old user addresses, recycled into spamtraps several months after the
     user closes the account, scanned for newsletters, unsubscribed from
     them etc.

  3. real spamtrap addresses to trap website crawlers.

The latter 2 are the most effective, but #1 is a real PITA; it takes lots
of maintenance to avoid ham getting in there. Some of my spamtrap data
had a few type-1 addresses contributed by ISPs, and I hadn't spent enough
time sifting for legit mail that was slipping through. So I felt better
leaving them out for this run, apart from what I'd hand-cleaned.

--j.