Return-Path: skip@pobox.com
Delivery-Date: Mon Sep 9 17:31:12 2002
From: skip@pobox.com (Skip Montanaro)
Date: Mon, 9 Sep 2002 11:31:12 -0500
Subject: [Spambayes] deleting "duplicate" spam before training? good idea or bad?
Message-ID: <15740.52432.861148.597750@12-248-11-90.client.attbi.com>

Because I get mail through several different email addresses, I frequently
get duplicates (or triplicates or more-plicates) of various spam messages.
In saving spam for later analysis I haven't always been careful to avoid
saving such duplicates.

I wrote a script some time ago to try to minimize the duplicates I see by
calculating a loose checksum, but I still have some duplicates. Should I
delete the duplicates before training or not? Would people be interested in
the script? I'd be happy to extricate it from my local modules and check it
into CVS.

Skip
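For illustration only, here is a minimal sketch of what a "loose checksum" for
spotting near-duplicate spam might look like. It is not the script mentioned
in the message above, which is not shown here. The function names
loose_checksum and drop_duplicates, and the normalization rules (ignore all
headers, lowercase the body, keep only runs of ASCII letters), are assumptions
chosen to make the idea concrete.

```python
import email
import hashlib
import re


def loose_checksum(raw_message: str) -> str:
    """Hash a message so that copies differing only in headers, case,
    punctuation, digits, or whitespace get the same value.

    Hypothetical normalization; a real script may well use other rules.
    """
    msg = email.message_from_string(raw_message)

    # Collect the payload of every non-multipart part; headers are ignored,
    # so copies delivered to different addresses compare equal.
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue
        payload = part.get_payload()
        if isinstance(payload, str):
            parts.append(payload)

    body = "\n".join(parts).lower()
    # Keep only runs of ASCII letters, joined by single spaces. This drops
    # digits (often random tracking numbers), punctuation, and layout.
    body = re.sub(r"[^a-z]+", " ", body).strip()
    return hashlib.sha1(body.encode("utf-8")).hexdigest()


def drop_duplicates(messages):
    """Return the messages whose loose checksum has not been seen before,
    preserving the original order."""
    seen = set()
    unique = []
    for raw in messages:
        key = loose_checksum(raw)
        if key not in seen:
            seen.add(key)
            unique.append(raw)
    return unique
```

A checksum this loose will still miss copies whose wording differs, and it can
merge distinct messages that share the same letters, so it only reduces
duplicates rather than eliminating them. Whether the corpus should be
deduplicated before training at all is the open question the message asks.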