GeronBook/Ch3/datasets/spam/easy_ham/01706.582f22e10f4f792eb0efe...

Return-Path: tim.one@comcast.net
Delivery-Date: Sat Sep 7 21:11:36 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sat, 07 Sep 2002 16:11:36 -0400
Subject: [Spambayes] test sets?
In-Reply-To: <200209061424.g86EOcd14363@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <LNBBLJKPBEHFEDALKOLCOENCBCAB.tim.one@comcast.net>

[Guido]
> Perhaps more useful would be if Tim could check in the pickle(s?)
> generated by one of his training runs, so that others can see how
> Tim's training data performs against their own corpora.

I did that yesterday, but seems like nobody bit. Just in case <wink>, I
uploaded a new version just now. Since MINCOUNT went away, UNKNOWN_SPAMPROB
is much less likely, and there's almost nothing that can be pruned away (so
the file is about 5x larger now).

http://sf.net/project/showfiles.php?group_id=61702

> This could also be the starting point for a self-contained distribution
> (you've got to start with *something*, and training with python-list data
> seems just as good as anything else).

The only way to know anything here is for someone to try it.