Return-Path: tim.one@comcast.net
Delivery-Date: Sun Sep 8 21:56:59 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 08 Sep 2002 16:56:59 -0400
Subject: [Spambayes] All Cap or Cap Word Subjects
In-Reply-To: <3D7B7F11.22376.29256B69@localhost>
Message-ID: <LNBBLJKPBEHFEDALKOLCCEPPBCAB.tim.one@comcast.net>

[Brad Clements]
> Just curious if subject line capitalization can be used as an indicator.
>
> Either the percentage of characters that are caps..
>
> Or, percentage starting with a capital letter (if number of words > xx)

Supply a mod to tokenizer.py and I'll test it (eventually <wink>). Note
that the tokenizer already *preserves* case in subject-line words, because
experiment showed that this was better than folding case away in this
specific context (but experiment also showed -- against my
expectations -- that preserving case everywhere didn't make a significant
difference to either error rate -- the subject line is a special case for
this).
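
For concreteness, here is a rough sketch of the two features Brad describes,
written as a standalone generator rather than as an actual patch to
tokenizer.py. Everything named here is an assumption made for illustration:
the function name subject_cap_tokens, the token strings, the 10% bucketing,
and the min_words threshold (standing in for Brad's "xx") are hypothetical,
and whether either feature helps either error rate is untested.

```python
def subject_cap_tokens(subject, min_words=3):
    """Yield hypothetical capitalization tokens for one subject line.

    Percentages are floored into 10% buckets so the classifier sees a
    small, fixed set of tokens instead of one token per exact value.
    """
    # Feature 1: share of alphabetic characters that are uppercase.
    letters = [c for c in subject if c.isalpha()]
    if letters:
        upper = sum(1 for c in letters if c.isupper())
        bucket = (100 * upper // len(letters)) // 10 * 10
        yield f"subject:caps-pct:{bucket}"

    # Feature 2: share of words starting with a capital letter, emitted
    # only when the subject has more than min_words words.
    words = [w for w in subject.split() if w[0].isalpha()]
    if len(words) > min_words:
        capped = sum(1 for w in words if w[0].isupper())
        bucket = (100 * capped // len(words)) // 10 * 10
        yield f"subject:capword-pct:{bucket}"
```

Something like list(subject_cap_tokens("MAKE MONEY FAST NOW")) would give one
token per feature; a short subject such as "Re: meeting notes" yields only the
character-level token, since it has too few words to pass the threshold.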