Return-Path: anthony@interlink.com.au
Delivery-Date: Thu Sep 12 05:26:41 2002
From: anthony@interlink.com.au (Anthony Baxter)
Date: Thu, 12 Sep 2002 14:26:41 +1000
Subject: [Spambayes] Current histograms
Message-ID: <200209120426.g8C4QfI23085@localhost.localdomain>
> They weren't partitioned in any particular scheme - I think I'll write a
> reshuffler and move them all around, just in case (fwiw, I'm using MH
> style folders with numbered files - means you can just use MH tools to
> manipulate the sets.)
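The reshuffler mentioned in the quoted text is not shown in this message. As a rough sketch of what such a tool might look like, the Python below pools every numbered message under a class directory such as Data/Ham, shuffles them, and deals them back out round-robin into Set1..SetN. The function name `reshuffle`, its parameters, and the renumbering scheme are hypothetical, chosen for illustration only.

```python
import os
import random
import shutil
import tempfile


def reshuffle(class_dir, n_sets, seed=None):
    """Hypothetical sketch: redistribute the numbered message files found in
    class_dir/Set* at random across class_dir/Set1..Set<n_sets>.

    Returns the number of messages placed in each set.
    """
    rng = random.Random(seed)

    # Collect every MH-style message (a file whose name is a number).
    paths = []
    for name in sorted(os.listdir(class_dir)):
        folder = os.path.join(class_dir, name)
        if name.startswith("Set") and os.path.isdir(folder):
            for fname in sorted(os.listdir(folder)):
                full = os.path.join(folder, fname)
                if fname.isdigit() and os.path.isfile(full):
                    paths.append(full)

    rng.shuffle(paths)

    # Stage the files first so new numbers cannot collide with old ones.
    staging = tempfile.mkdtemp(dir=class_dir)
    staged = []
    for i, src in enumerate(paths):
        dst = os.path.join(staging, str(i))
        shutil.move(src, dst)
        staged.append(dst)

    # Deal the shuffled messages out round-robin, renumbering from 1.
    counts = [0] * n_sets
    for i, src in enumerate(staged):
        k = i % n_sets
        target = os.path.join(class_dir, "Set%d" % (k + 1))
        os.makedirs(target, exist_ok=True)
        counts[k] += 1
        shutil.move(src, os.path.join(target, str(counts[k])))

    os.rmdir(staging)
    return counts
```

Calling `reshuffle("Data/Ham", 5)` and `reshuffle("Data/Spam", 5)` would rebuild both classes. Round-robin dealing keeps the set sizes within one message of each other, which is in line with the near-equal counts reported in the runs.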
Freak show. Obviously there _was_ some sort of pattern to the data:
Training on Data/Ham/Set1 & Data/Spam/Set1 ... 1798 hams & 1546 spams
0.779 0.582
0.834 0.840
0.945 0.452
0.667 1.164
Training on Data/Ham/Set2 & Data/Spam/Set2 ... 1798 hams & 1547 spams
1.112 0.776
0.834 0.969
0.779 0.646
0.667 1.100
Training on Data/Ham/Set3 & Data/Spam/Set3 ... 1798 hams & 1548 spams
1.168 0.582
1.001 0.646
0.834 0.582
0.667 0.453
Training on Data/Ham/Set4 & Data/Spam/Set4 ... 1798 hams & 1547 spams
0.779 0.712
0.779 0.582
0.556 0.840
0.779 0.970
Training on Data/Ham/Set5 & Data/Spam/Set5 ... 1798 hams & 1546 spams
0.612 0.517
0.779 0.517
0.723 0.711
0.667 0.582
total false pos 144 1.60177975528
total false neg 101 1.30592190328
(before the shuffle, I was seeing:
total false pos 273 3.03501945525
total false neg 367 4.74282760403
)
For the sake of comparison, here's what I see with the data partitioned into 2 sets:
Training on Data/Ham/Set1 & Data/Spam/Set1 ... 4492 hams & 3872 spams
0.490 0.776
Training on Data/Ham/Set2 & Data/Spam/Set2 ... 4493 hams & 3868 spams
0.401 0.491
total false pos 40 0.445186421814
total false neg 49 0.633074935401
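The percentages on the post-shuffle "total" lines are consistent with dividing each error count by the number of hams (for false positives) or spams (for false negatives) summed over all sets. The short Python check below works on that assumption; the helper name `error_rate` is made up for illustration, and the set sizes are copied from the "Training on" lines. The pre-shuffle set sizes are not given in the message, so those two figures are not recomputed.

```python
def error_rate(errors, set_sizes):
    """Return the error count as a percentage of all messages in the sets."""
    return 100.0 * errors / sum(set_sizes)


# Five-way split: per-set sizes from the "Training on" lines.
ham5 = [1798] * 5
spam5 = [1546, 1547, 1548, 1547, 1546]
fp5 = error_rate(144, ham5)    # compare with "total false pos 144 ..."
fn5 = error_rate(101, spam5)   # compare with "total false neg 101 ..."

# Two-way split.
ham2 = [4492, 4493]
spam2 = [3872, 3868]
fp2 = error_rate(40, ham2)     # compare with "total false pos 40 ..."
fn2 = error_rate(49, spam2)    # compare with "total false neg 49 ..."

print(fp5, fn5)
print(fp2, fn2)
```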
more later...
Anthony