Return-Path: tim.one@comcast.net
Delivery-Date: Sun Sep 8 21:56:59 2002
From: tim.one@comcast.net (Tim Peters)
Date: Sun, 08 Sep 2002 16:56:59 -0400
Subject: [Spambayes] All Cap or Cap Word Subjects
In-Reply-To: <3D7B7F11.22376.29256B69@localhost>
Message-ID:

[Brad Clements]
> Just curious if subject line capitalization can be used as an indicator.
>
> Either the percentage of characters that are caps..
>
> Or, percentage starting with a capital letter (if number of words > xx)

Supply a mod to tokenizer.py and I'll test it (eventually). Note that the
tokenizer already *preserves* case in subject-line words, because experiment
showed that this was better than folding case away in this specific context
(but experiment also showed -- against my expectations -- that preserving
case everywhere didn't make a significant difference to either error rate --
the subject line is a special case for this).
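
For illustration only, here is a minimal sketch of the two measurements Brad describes: the fraction of subject-line letters that are capitals, and the fraction of words starting with a capital (computed only when the subject has more than some number of words). Everything named here is hypothetical and not taken from tokenizer.py: the function name subject_cap_tokens, the token strings, the bucket count, and the min_words default (the "xx" in the question was never specified).

```python
def subject_cap_tokens(subject, min_words=3, buckets=4):
    """Yield synthetic tokens describing capitalization in a subject line.

    Hypothetical helper for illustration; not part of the real tokenizer.py.
    Fractions are bucketed so that similar subjects produce the same token.
    """
    # Measurement 1: fraction of alphabetic characters that are uppercase.
    letters = [c for c in subject if c.isalpha()]
    if letters:
        frac_caps = sum(c.isupper() for c in letters) / len(letters)
        bucket = min(int(frac_caps * buckets), buckets - 1)
        yield "subject:caps-chars:%d" % bucket

    # Measurement 2: fraction of words that start with a capital letter,
    # only when there are more than min_words words (assumed threshold).
    words = [w for w in subject.split() if w[0].isalpha()]
    if len(words) > min_words:
        frac_cap_words = sum(w[0].isupper() for w in words) / len(words)
        bucket = min(int(frac_cap_words * buckets), buckets - 1)
        yield "subject:cap-words:%d" % bucket
```

Calling list(subject_cap_tokens(subject)) gives the extra tokens for one subject line. Bucketing the fractions is a design assumption: it keeps the output to a small set of discrete tokens that a token-based classifier could count. Whether such tokens help at all would still have to be settled by testing.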