StanfordMLOctave/machine-learning-ex6/ex6/easy_ham/1625.08690a3d25fcaacc8929c3...

115 lines
5.3 KiB
Plaintext

From spamassassin-devel-admin@lists.sourceforge.net Fri Oct 4 11:08:14 2002
Return-Path: <spamassassin-devel-admin@example.sourceforge.net>
Delivered-To: yyyy@localhost.example.com
Received: from localhost (jalapeno [127.0.0.1])
by jmason.org (Postfix) with ESMTP id D5F3316F73
for <jm@localhost>; Fri, 4 Oct 2002 11:05:55 +0100 (IST)
Received: from jalapeno [127.0.0.1]
by localhost with IMAP (fetchmail-5.9.0)
for jm@localhost (single-drop); Fri, 04 Oct 2002 11:05:55 +0100 (IST)
Received: from usw-sf-list2.sourceforge.net (usw-sf-fw2.sourceforge.net
[216.136.171.252]) by dogma.slashnull.org (8.11.6/8.11.6) with ESMTP id
g944tnK03832 for <jm@jmason.org>; Fri, 4 Oct 2002 05:55:49 +0100
Received: from usw-sf-list1-b.sourceforge.net ([10.3.1.13]
helo=usw-sf-list1.sourceforge.net) by usw-sf-list2.sourceforge.net with
esmtp (Exim 3.31-VA-mm2 #1 (Debian)) id 17xKUS-0003W7-00; Thu,
03 Oct 2002 21:55:04 -0700
Received: from smtp10.atl.mindspring.net ([207.69.200.246]) by
usw-sf-list1.sourceforge.net with esmtp (Exim 3.31-VA-mm2 #1 (Debian)) id
17xKTV-0000Pd-00 for <spamassassin-devel@lists.sourceforge.net>;
Thu, 03 Oct 2002 21:54:05 -0700
Received: from user-2injgi2.dsl.mindspring.com ([165.121.194.66]
helo=belphegore.hughes-family.org) by smtp10.atl.mindspring.net with esmtp
(Exim 3.33 #1) id 17xKTO-0006ze-00 for
spamassassin-devel@lists.sourceforge.net; Fri, 04 Oct 2002 00:53:59 -0400
Received: by belphegore.hughes-family.org (Postfix, from userid 48) id
92EA622876; Thu, 3 Oct 2002 21:51:06 -0700 (PDT)
From: bugzilla-daemon@hughes-family.org
To: spamassassin-devel@example.sourceforge.net
X-Bugzilla-Reason: AssignedTo
Message-Id: <20021004045106.92EA622876@belphegore.hughes-family.org>
Subject: [SAdev] [Bug 1054] New: Split up FROM_ENDS_IN_NUMS
Sender: spamassassin-devel-admin@example.sourceforge.net
Errors-To: spamassassin-devel-admin@example.sourceforge.net
X-Beenthere: spamassassin-devel@example.sourceforge.net
X-Mailman-Version: 2.0.9-sf.net
Precedence: bulk
List-Help: <mailto:spamassassin-devel-request@example.sourceforge.net?subject=help>
List-Post: <mailto:spamassassin-devel@example.sourceforge.net>
List-Subscribe: <https://example.sourceforge.net/lists/listinfo/spamassassin-devel>,
<mailto:spamassassin-devel-request@lists.sourceforge.net?subject=subscribe>
List-Id: SpamAssassin Developers <spamassassin-devel.example.sourceforge.net>
List-Unsubscribe: <https://example.sourceforge.net/lists/listinfo/spamassassin-devel>,
<mailto:spamassassin-devel-request@lists.sourceforge.net?subject=unsubscribe>
List-Archive: <http://sourceforge.net/mailarchives/forum.php?forum=spamassassin-devel>
X-Original-Date: Thu, 3 Oct 2002 21:51:06 -0700 (PDT)
Date: Thu, 3 Oct 2002 21:51:06 -0700 (PDT)
X-Spam-Status: No, hits=-40.5 required=5.0
tests=AWL,BUGZILLA_BUG,FORGED_RCVD_TRAIL,KNOWN_MAILING_LIST,
NO_REAL_NAME,T_NONSENSE_FROM_00_10
version=2.50-cvs
X-Spam-Level:
http://www.hughes-family.org/bugzilla/show_bug.cgi?id=1054
Summary: Split up FROM_ENDS_IN_NUMS
Product: Spamassassin
Version: unspecified
Platform: Other
OS/Version: other
Status: NEW
Severity: enhancement
Priority: P2
Component: Rules
AssignedTo: spamassassin-devel@example.sourceforge.net
ReportedBy: matt@nightrealms.com
The current FROM_ENDS_IN_NUMS triggers on any from where the user name ends
in two or more digits. I think this should be split up into different
numbers of trailing digits, so that rules with different S/O ratios can
get different scores. So I've made test rules that look from From
names that end with two, three, four, or five digitis, and one for six or
more digitis; I also put in a test rule that looks for Froms that end in a
single digit, just the sake of completeness.
Here is what I got:
OVERALL% SPAM% NONSPAM% S/O RANK SCORE NAME
15113 4797 10316 0.32 0.00 0.00 (all messages)
100.000 31.741 68.259 0.32 0.00 0.00 (all messages as %)
1.244 3.877 0.019 1.00 0.64 0.01 T_FROM_ENDS_IN_NUMS6
0.576 1.459 0.165 0.90 0.43 0.01 T_FROM_ENDS_IN_NUMS5
4.982 10.694 2.326 0.82 0.38 0.01 T_FROM_ENDS_IN_NUMS4
3.434 6.337 2.084 0.75 0.35 0.01 T_FROM_ENDS_IN_NUMS3
8.979 12.383 7.396 0.63 0.30 0.01 T_FROM_ENDS_IN_NUMS2
8.847 6.817 9.791 0.41 0.22 0.01 T_FROM_ENDS_IN_NUMS1
I should note that I get rather bad S/O's for FROM_ENDS_IN_NUMS, probably
because so much of my corpus is made up of Yahoo! Groups traffic, which
seems to have a lot of users that like sticking numbers at the ends of their
names. Here is the normal stats for FROM_ENDS_IN_NUMS:
7.024 28.480 2.834 0.91 0.50 0.88 FROM_ENDS_IN_NUMS
and my stats:
19.228 34.793 11.991 0.74 0.35 0.88 FROM_ENDS_IN_NUMS
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
-------------------------------------------------------
This sf.net email is sponsored by:ThinkGeek
Welcome to geek heaven.
http://thinkgeek.com/sf
_______________________________________________
Spamassassin-devel mailing list
Spamassassin-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/spamassassin-devel