StanfordMLOctave/machine-learning-ex6/ex6/easy_ham/1582.ea28e13144adb7ec8718ca...

From quinlan@pathname.com  Tue Sep 17 23:30:38 2002
Return-Path: <quinlan@pathname.com>
Delivered-To: yyyy@localhost.example.com
Received: from localhost (jalapeno [127.0.0.1])
	by jmason.org (Postfix) with ESMTP id EF15F16F03
	for <jm@localhost>; Tue, 17 Sep 2002 23:30:37 +0100 (IST)
Received: from jalapeno [127.0.0.1]
	by localhost with IMAP (fetchmail-5.9.0)
	for jm@localhost (single-drop); Tue, 17 Sep 2002 23:30:38 +0100 (IST)
Received: from proton.pathname.com
    (adsl-216-103-211-240.dsl.snfc21.pacbell.net [216.103.211.240]) by
    dogma.slashnull.org (8.11.6/8.11.6) with ESMTP id g8HKVLC25673 for
    <jm@jmason.org>; Tue, 17 Sep 2002 21:31:21 +0100
Received: from quinlan by proton.pathname.com with local (Exim 3.35 #1
    (Debian)) id 17rP0f-0004fo-00; Tue, 17 Sep 2002 13:31:49 -0700
To: yyyy@example.com (Justin Mason)
Cc: spamassassin-devel@example.sourceforge.net
Subject: Re: [SAdev] Re: [SAtalk] SpamAssassin and unconfirmed.dsbl.org
References: <20020917142054.5C4E916F16@example.com>
From: Daniel Quinlan <quinlan@pathname.com>
Date: 17 Sep 2002 13:31:49 -0700
In-Reply-To: yyyy@example.com's message of "Tue, 17 Sep 2002 15:20:49 +0100"
Message-Id: <yf2znuge4sq.fsf@proton.pathname.com>
Lines: 25
X-Mailer: Gnus v5.7/Emacs 20.7
X-Spam-Status: No, hits=-8.9 required=7.0
	tests=AWL,EMAIL_ATTRIBUTION,IN_REP_TO,QUOTED_EMAIL_TEXT,
	      REFERENCES
	version=2.50-cvs
X-Spam-Level:

jm@jmason.org (Justin Mason) writes:

> Folks who've been hacking on the DNSBLs: would it be worthwhile commenting
> this in HEAD, seeing as it only gets .77 anyway?
>
> Sounds like the (a) broken server and (b) low hitrate combine to make it
> not-so-useful IMO.

No, in my opinion, it's purely a bug in SA (or the libraries we use,
which is the same thing) that we don't handle outages of network
services better.
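
As a rough illustration of what handling an outage better could look
like (a Python sketch, not SA code): the lookup gets a hard time limit,
and a dead or slow DNSBL simply contributes nothing to the score.  The
names dnsbl_query_name and dnsbl_score, the injectable resolver
argument, and the 2-second default are all assumptions made for this
example.

```python
import concurrent.futures
import socket


def dnsbl_query_name(ipv4, zone):
    # DNSBL convention: reverse the IPv4 octets and append the list's zone.
    return ".".join(reversed(ipv4.split("."))) + "." + zone


def dnsbl_score(ipv4, zone, rule_score,
                resolver=socket.gethostbyname, timeout=2.0):
    """Return rule_score if the address is listed, otherwise 0.0.

    A timeout or any resolver error (including "name not found") counts
    as "no hit", so an outage can neither stall the scan nor add points.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(resolver, dnsbl_query_name(ipv4, zone))
    try:
        future.result(timeout=timeout)  # any answer means "listed"
        return rule_score
    except (concurrent.futures.TimeoutError, OSError):
        return 0.0
    finally:
        # Do not wait for a hung lookup; let the worker finish on its own.
        pool.shutdown(wait=False)
```

Passing the resolver in as an argument is only there so the timeout path
can be exercised without a real network.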

The rule is useful and it does help reduce spam, so we should keep it.  I
have a feeling the DNSBL rules will cluster a bit more heavily around
the 1.0 to 2.0 range once we start using the new GA on them.

Also, 0.77 was a slightly conservative number.  Since I didn't have
real-time data, I typically used the lowest or median score across
several periods (most recent month, two months, six months), depending
on the trend of the period data.  Better performance for recent
messages meant favoring recent scores; worse performance for recent
messages meant favoring the lowest scores.  I never picked the highest
number unless the rule was very accurate and the highest number was
for the most recent data.
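
The selection heuristic above could be sketched roughly as follows
(Python).  This is one possible reading of it, not the actual
procedure: pick_score, the very_accurate flag, comparing only the
one-month and six-month scores to judge the trend, and the assumption
that a higher per-period score means the rule performed better in that
period are all illustrative.

```python
from statistics import median


def pick_score(one_month, two_months, six_months, very_accurate=False):
    """Pick a conservative score from three per-period scores."""
    scores = [one_month, two_months, six_months]
    highest, lowest = max(scores), min(scores)

    if one_month > six_months:
        # Rule is doing better on recent mail: favor the recent score...
        choice = one_month
        if choice == highest and not very_accurate:
            # ...but take the highest number only for a very accurate
            # rule; otherwise fall back to the median.
            choice = median(scores)
    elif one_month < six_months:
        # Rule is doing worse on recent mail: favor the lowest score.
        choice = lowest
    else:
        # No clear trend: use the median.
        choice = median(scores)
    return choice
```

With made-up period scores of 1.2, 0.9 and 0.6, this returns 0.9, or 1.2
if the rule is flagged as very accurate.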

Dan