Return-Path: skip@pobox.com
Delivery-Date: Mon Sep 9 20:35:04 2002
From: skip@pobox.com (Skip Montanaro)
Date: Mon, 9 Sep 2002 14:35:04 -0500
Subject: [Spambayes] deleting "duplicate" spam before training? good idea or bad?
In-Reply-To: <20020909192542.GB2002@cthulhu.gerg.ca>
References: <15740.52432.861148.597750@12-248-11-90.client.attbi.com>
	<LNBBLJKPBEHFEDALKOLCIECKBDAB.tim.one@comcast.net>
	<20020909192542.GB2002@cthulhu.gerg.ca>
Message-ID: <15740.63464.611324.2220@12-248-11-90.client.attbi.com>

Greg> OTOH, look into DCC (Distributed Checksum Clearinghouse,
Greg> http://www.rhyolite.com/anti-spam/dcc/), which uses fuzzy
Greg> checksums. It's quite likely that DCC's checksumming scheme is
Greg> better than something any of us would throw together for personal
Greg> use (no offense, Skip!).

None taken. I wrote my little script before I was aware DCC existed. Even
now, it seems like overkill for my use.

Skip
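The thread turns on detecting "duplicate" spam by fuzzy checksum rather than by exact hash, so that near-identical copies collapse to one key before training. The following is a minimal Python sketch of that general idea only. It is NOT DCC's checksumming scheme and NOT Skip's script (neither is shown here); the function names `fuzzy_checksum` and `drop_duplicates` and the normalization rules (lowercasing, dropping digits and punctuation, collapsing whitespace) are hypothetical assumptions chosen for illustration.

```python
import hashlib
import re


def fuzzy_checksum(body: str) -> str:
    """Hypothetical fuzzy checksum (not DCC's algorithm): normalize away
    details that often vary between copies of the same spam, then hash
    whatever text remains."""
    text = body.lower()
    text = re.sub(r"\d+", "", text)        # assumption: numbers (order IDs, counters) vary per copy
    text = re.sub(r"[^a-z]+", " ", text)   # assumption: punctuation/markup is noise
    text = " ".join(text.split())          # collapse runs of whitespace
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def drop_duplicates(bodies):
    """Keep only the first message seen for each fuzzy checksum, preserving order."""
    seen = set()
    kept = []
    for body in bodies:
        key = fuzzy_checksum(body)
        if key not in seen:
            seen.add(key)
            kept.append(body)
    return kept


spam = [
    "Buy NOW!!! Order #12345",
    "buy now -- order #98765",
    "Completely different offer",
]
print(drop_duplicates(spam))  # the second message is treated as a duplicate of the first
```

The normalization rules decide what counts as a "duplicate": stricter rules keep more near-copies in the training set, looser ones risk merging genuinely different messages.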