Return-Path: tim.one@comcast.net
Delivery-Date: Sat Sep 7 01:32:26 2002
From: tim.one@comcast.net (Tim Peters)
Date: Fri, 06 Sep 2002 20:32:26 -0400
Subject: [Spambayes] understanding high false negative rate
In-Reply-To: <15737.16782.542869.368986@slothrop.zope.com>
Message-ID:

[Jeremy Hylton]
> The total collections are 1100 messages. I trained with 1100/5
> messages.

I'm now reading this as saying that you trained on about 220 spam and about 220 ham. That's less than 10% of the sizes of the training sets I've been using. Please try an experiment: train on 550 of each, and test once against the other 550 of each. Do that a few times, making a random split each time (it won't be long until you discover why directories of individual files are a lot easier to work with; e.g., random.shuffle() makes this kind of thing trivial for me).
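The random-split experiment described above could be sketched roughly as follows. This is only an illustration, not the actual Spambayes test driver: the helper name random_split and the ham/ and spam/ file names are hypothetical, and it assumes 1100 messages in each collection, stored one message per file.

```python
import random

def random_split(items, n_train, rng=random):
    """Shuffle a copy of items and return (train, test) lists."""
    shuffled = list(items)
    rng.shuffle(shuffled)          # in-place shuffle of the copy
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical stand-ins for per-message file names; with real
# directories these lists could come from os.listdir() instead.
ham = ["ham/%04d.txt" % i for i in range(1100)]
spam = ["spam/%04d.txt" % i for i in range(1100)]

# Repeat the experiment a few times with a fresh random split each time.
for trial in range(3):
    ham_train, ham_test = random_split(ham, 550)
    spam_train, spam_test = random_split(spam, 550)
    # Training and scoring would go here: train on ham_train + spam_train,
    # then test once against ham_test + spam_test.
    print(trial, len(ham_train), len(ham_test),
          len(spam_train), len(spam_test))
```

Because each message is its own file, a split is just a shuffled list of names cut in two; nothing has to be parsed out of a larger mailbox file first.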