Return-Path: guido@python.org
Delivery-Date: Sat Sep 7 21:19:16 2002
From: guido@python.org (Guido van Rossum)
Date: Sat, 07 Sep 2002 16:19:16 -0400
Subject: [Spambayes] test sets?
In-Reply-To: Your message of "Sat, 07 Sep 2002 16:11:36 EDT."
    <LNBBLJKPBEHFEDALKOLCOENCBCAB.tim.one@comcast.net>
References: <LNBBLJKPBEHFEDALKOLCOENCBCAB.tim.one@comcast.net>
Message-ID: <200209072019.g87KJGe15250@pcp02138704pcs.reston01.va.comcast.net>

> [Guido]
> > Perhaps more useful would be if Tim could check in the pickle(s?)
> > generated by one of his training runs, so that others can see how
> > Tim's training data performs against their own corpora.

[Tim]
> I did that yesterday, but seems like nobody bit.

I downloaded and played with it a bit, but had no time to do anything
systematic. It correctly recognized a spam that slipped through SA.
But it also identified as spam everything in my inbox that had any
MIME structure or HTML parts, and several messages in my saved 'zope
geeks' list that happened to be using MIME and/or HTML.

So I guess I'll have to retrain it (yes, you told me so :-).

> Just in case <wink>, I
> uploaded a new version just now. Since MINCOUNT went away, UNKNOWN_SPAMPROB
> is much less likely, and there's almost nothing that can be pruned away (so
> the file is about 5x larger now).
>
> http://sf.net/project/showfiles.php?group_id=61702

I'll try this when I have time.

--Guido van Rossum (home page: http://www.python.org/~guido/)