Why are BAYES_00 to BAYES_40 scores negative?

by Robert Case

I'm going to ask a really silly question...

First, my particulars:
Fedora Core 8 x86_64
Qmail 1.03 (Running a Modified QmailRocks configuration, which is everything except vpopmail)
Qscan
ClamAV
SpamAssassin 3.2.4

I periodically audit messages that get through SpamAssassin to see why they didn't reach the score threshold (mine is set at 3.5).  I compare the messages with the scoring details that get logged in "maillog".

I noticed that many of the messages that got through were hitting the BAYES_00 through BAYES_40 rules.  I looked at the rules page, and the scores for those rules are negative (ranging from -2.599 (eek!) to -0.185).  When you get to BAYES_50 and higher, the scores turn positive.  Also, in many instances, the negative BAYES_* scores made the difference between reaching the threshold and not.

My question is WHY are those rules negative?

I went ahead and assigned a positive score for those rules (ranging from 0.001 to 0.040), but I figured I had better ask here why those scores are negative.  I'm figuring there's a good reason, and I don't want to shoot myself in the foot.

Thanks,

Robert...

Re: Why are BAYES_00 to BAYES_40 scores negative?

by Sahil Tandon

Robert Case <consulting@...> wrote:

> I periodically audit messages that get through SpamAssassin to see why they
> didn't reach the score threshold (mine is set at 3.5).  I compare the
> messages with the scoring details that get logged in "maillog".
>
> I noticed that many of the messages that got through were hitting the
> BAYES_00 through BAYES_40 rules.  I looked at the rules page, and the scores
> for those rules are negative (ranging from -2.599 (eek!) to -0.185).  When
> you get to BAYES_50 and higher, the scores turn positive.  Also, in many
> instances, the negative BAYES_* scores made the difference between reaching
> the threshold and not.
>
> My question is WHY are those rules negative?

Because Bayesian rules are supposed not only to stop spam, but also to help
ham get through your filter.  Your Bayes database thinks those spammy mails
have hammy attributes.  You can try feeding those emails to sa-learn as spam,
so that SA eventually starts assigning a positive score to similar emails.
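
For anyone who wants to script the retraining step, here is a minimal Python
sketch. It only wraps the `sa-learn --spam` and `sa-learn --ham` invocations;
the helper names and the example path are hypothetical and not part of
SpamAssassin. It assumes sa-learn is on the PATH and is run as the user that
owns the Bayes database your scanner actually reads.

```python
import subprocess


def sa_learn_command(kind, path):
    """Build the sa-learn argument list for a message file or directory.

    kind must be "spam" or "ham"; path is whatever file or directory
    holds the messages you sorted by hand.
    """
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, path]


def train(kind, path):
    """Run sa-learn on path; raises CalledProcessError if it fails."""
    return subprocess.run(sa_learn_command(kind, path), check=True)


if __name__ == "__main__":
    # Hypothetical location of the spam that slipped through.
    print(" ".join(sa_learn_command("spam", "/home/robert/missed-spam")))
```

Training only on missed spam can skew the database, so it is probably worth
feeding it known-good ham with `train("ham", ...)` as well.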

[...]

--
Sahil Tandon <sahil@...>

Re: Why are BAYES_00 to BAYES_40 scores negative?

by Theo Van Dinter-2

On Thu, Jul 03, 2008 at 05:00:13PM -0700, Robert Case wrote:
> I noticed that many of the messages that got through were hitting the
> BAYES_00 through BAYES_40 rules.  I looked at the rules page, and the scores
> for those rules are negative (ranging from -2.599 (eek!) to -0.185).  When
> you get to BAYES_50 and higher, the scores turn positive.  Also, in many
> instances, the negative BAYES_* scores made the difference between reaching
> the threshold and not.

Yep.

> My question is WHY are those rules negative?

Because they are ham detection rules.

> I went ahead and assigned a positive score for those rules (ranging from
> 0.001 to 0.040), but I figured I had better ask here why those scores are
> negative.  I'm figuring there's a good reason, and I don't want to shoot
> myself in the foot.

Bayes provides a probability of a message being spam.  Therefore: 50% is "not
sure either way", 0% is "not spam", 100% is "definitely spam".
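
As a rough illustration of how that probability turns into a rule name, here
is a small Python sketch. The bucket boundaries are what I recall from the
stock 23_bayes.cf, including rules not mentioned in this thread (BAYES_05,
BAYES_20 and so on), so treat them as an assumption and check your own rules
directory. The function name is made up for the example.

```python
import bisect

# Upper bound of each bucket and the rule assumed to cover it.
# These values are recalled from the default rule set, not taken from
# this thread; verify them against your installed 23_bayes.cf.
_UPPER_BOUNDS = [0.01, 0.05, 0.20, 0.40, 0.60, 0.80, 0.95, 0.99, 1.00]
_RULES = ["BAYES_00", "BAYES_05", "BAYES_20", "BAYES_40", "BAYES_50",
          "BAYES_60", "BAYES_80", "BAYES_95", "BAYES_99"]


def bayes_rule(probability):
    """Return the BAYES_* rule whose bucket contains the spam probability.

    Each bucket is treated as (lower, upper], except that 0.0 falls
    into BAYES_00.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    index = bisect.bisect_left(_UPPER_BOUNDS, probability)
    return _RULES[index]


if __name__ == "__main__":
    for p in (0.0, 0.3, 0.5, 0.999):
        print(p, bayes_rule(p))
```

Under those assumed boundaries, everything at or below 0.40 lands in a rule
with a negative score and BAYES_50 is the "not sure either way" bucket.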

--
Randomly Selected Tagline:
"lp1 on fire" - Linux kernel error message


Re: Why are BAYES_00 to BAYES_40 scores negative?

by Matt Kettler-3

Robert Case wrote:

> I'm going to ask a really silly question...
>
> First, my particulars:
> Fedora Core 8 x86_64
> Qmail 1.03 (Running a Modified QmailRocks configuration, which is everything
> except vpopmail)
> Qscan
> ClamAV
> SpamAssassin 3.2.4
>
> I periodically audit messages that get through SpamAssassin to see why they
> didn't reach the score threshold (mine is set at 3.5).  I compare the
> messages with the scoring details that get logged in "maillog".
>
> I noticed that many of the messages that got through were hitting the
> BAYES_00 through BAYES_40 rules.
>  
As several have pointed out already, BAYES_00 is a very strong
indication that the message matches your non-spam training. Anything under
50 indicates the message is more likely not spam, and the lower the
number, the more likely it is to be non-spam. (In general, the two digits
are the percent chance the message is spam: 00 means a 0% chance it's
spam, and therefore a 100% chance it's not; 40 means a 40% chance of spam,
and therefore a 60% chance it's not.)

If you've got a significant amount of spam matching the low-numbered Bayes
rules, you should re-examine your Bayes training.
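
To make the effect on the threshold concrete, here is a small Python sketch
of how rule scores add up. Only the 3.5 threshold, the -2.599 score
(presumably BAYES_00) and the 0.001 override come from this thread; the other
rule names and scores are invented for illustration, and the "greater than or
equal" comparison is my assumption about how the threshold is applied.

```python
REQUIRED_SCORE = 3.5  # threshold quoted in the thread


def total_score(hits, scores):
    """Sum the scores of every rule that fired on a message."""
    return sum(scores[rule] for rule in hits)


def is_spam(hits, scores, required=REQUIRED_SCORE):
    """Flag the message when the summed score reaches the threshold."""
    return total_score(hits, scores) >= required


default_scores = {
    "BAYES_00": -2.599,          # from the thread
    "HYPOTHETICAL_RULE_A": 2.2,  # invented for this example
    "HYPOTHETICAL_RULE_B": 1.9,  # invented for this example
}
# The same table with the positive override described in the thread.
overridden_scores = dict(default_scores, BAYES_00=0.001)

hits = ["BAYES_00", "HYPOTHETICAL_RULE_A", "HYPOTHETICAL_RULE_B"]

if __name__ == "__main__":
    print(round(total_score(hits, default_scores), 3),
          is_spam(hits, default_scores))
    print(round(total_score(hits, overridden_scores), 3),
          is_spam(hits, overridden_scores))
```

With the override, the message is flagged, but so is any legitimate message
that happens to trip the same two rules, because the Bayes verdict no longer
pulls the total down. That is why retraining is the safer fix.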