92 lines
3.4 KiB
Plaintext
92 lines
3.4 KiB
Plaintext
Replied: Mon, 16 Sep 2002 00:33:33 +0100
|
|
Replied: yyyy@example.com (Justin Mason)
|
|
Replied: Daniel Quinlan <quinlan@pathname.com>
|
|
Replied: Matt Sergeant <msergeant@startechgroup.co.uk>
|
|
Replied: Spamassassin-devel <spamassassin-devel@example.sourceforge.net>
|
|
From quinlan@pathname.com Mon Sep 16 00:08:16 2002
|
|
Return-Path: <quinlan@pathname.com>
|
|
Delivered-To: yyyy@localhost.example.com
|
|
Received: from localhost (jalapeno [127.0.0.1])
|
|
by jmason.org (Postfix) with ESMTP id 4B65C16F03
|
|
for <jm@localhost>; Mon, 16 Sep 2002 00:08:15 +0100 (IST)
|
|
Received: from jalapeno [127.0.0.1]
|
|
by localhost with IMAP (fetchmail-5.9.0)
|
|
for jm@localhost (single-drop); Mon, 16 Sep 2002 00:08:15 +0100 (IST)
|
|
Received: from proton.pathname.com
|
|
(adsl-216-103-211-240.dsl.snfc21.pacbell.net [216.103.211.240]) by
|
|
dogma.slashnull.org (8.11.6/8.11.6) with ESMTP id g8ELf9C18215 for
|
|
<jm@jmason.org>; Sat, 14 Sep 2002 22:41:10 +0100
|
|
Received: from quinlan by proton.pathname.com with local (Exim 3.35 #1
|
|
(Debian)) id 17qKfW-0001tr-00; Sat, 14 Sep 2002 14:41:34 -0700
|
|
To: yyyy@example.com (Justin Mason)
|
|
Cc: Matt Sergeant <msergeant@startechgroup.co.uk>,
|
|
Spamassassin-devel <spamassassin-devel@lists.sourceforge.net>
|
|
Subject: Re: [SAdev] testing with less rules
|
|
References: <20020913211335.564DD16F03@example.com>
|
|
From: Daniel Quinlan <quinlan@pathname.com>
|
|
Date: 14 Sep 2002 14:41:34 -0700
|
|
In-Reply-To: yyyy@example.com's message of "Fri, 13 Sep 2002 22:13:30 +0100"
|
|
Message-Id: <yf2k7los0z5.fsf@proton.pathname.com>
|
|
Lines: 51
|
|
X-Mailer: Gnus v5.7/Emacs 20.7
|
|
X-Spam-Status: No, hits=-8.7 required=7.0
|
|
tests=AWL,EMAIL_ATTRIBUTION,IN_REP_TO,LINES_OF_YELLING,
|
|
LINES_OF_YELLING_2,QUOTED_EMAIL_TEXT,REFERENCES,
|
|
SPAM_PHRASE_00_01
|
|
version=2.50-cvs
|
|
X-Spam-Level:
|
|
|
|
jm@jmason.org (Justin Mason) writes:
|
|
|
|
>>> DATE_IN_PAST_48_96
|
|
>>> SPAM_PHRASE_00_01
|
|
>>> SPAM_PHRASE_01_02
|
|
>>> SPAM_PHRASE_02_03
|
|
>>> SPAM_PHRASE_03_05
|
|
>>> SPAM_PHRASE_05_08
|
|
|
|
> I was thinking of just removing those particular rules, but keeping the
|
|
> other entries in the range, since they're proving too "noisy" to be
|
|
> effective. But I'd be willing to keep those ones in, all the same. What
|
|
> do you think? Matt/Craig, thoughts?
|
|
|
|
I think I could handle commenting out the lowest SPAM_PHRASE_XX_YY
|
|
scores. If the GA could handle this sort of thing so they'd
|
|
automatically be zeroed, I'd feel better since the ranges could change
|
|
next time the phrase list is regenerated or the algorithm tweaked.
|
|
|
|
I think we need to understand why DATE_IN_PAST_48_96 is so low before
|
|
we remove it. The two rules on either side perform quite well.
|
|
|
|
>> And here are the rules that seem like they should be better or should
|
|
>> be recoverable:
|
|
|
|
>>> FROM_MISSING
|
|
>>> GAPPY_TEXT
|
|
>>> INVALID_MSGID
|
|
>>> MIME_NULL_BLOCK
|
|
>>> SUBJ_MISSING
|
|
|
|
> well, I don't like SUBJ_MISSING, I reckon there's a world of mails from
|
|
> cron jobs (e.g.) which hit it.
|
|
|
|
Okay, drop SUBJ_MISSING.
|
|
|
|
> But, yes, the others for sure should be recoverable, and I'm sure there's
|
|
> more.
|
|
|
|
Probably a few, those seemed like the best prospects to me.
|
|
|
|
> BTW do you agree with the proposed methodology (ie. remove the rules and
|
|
> bugzilla each one?)
|
|
|
|
I only want a bugzilla ticket for each one if people are okay with
|
|
quick WONTFIX closes on the ones deemed unworthy of recovery.
|
|
|
|
If you could put the stats for each rule in the ticket somehow (should
|
|
be automatable with email at the very least), it would help.
|
|
|
|
Dan
|
|
|
|
|