[Users] spam filtering for tcltk at perl.org

RW rwmaillists at googlemail.com
Tue Apr 7 22:24:42 CEST 2020


On Tue, 7 Apr 2020 11:25:10 +0100
Dave Howorth wrote:

> On Sun, 5 Apr 2020 16:10:48 +0100
> RW <rwmaillists at googlemail.com> wrote:
> 

> > Try saving one of them before training and piping it through 
> > 
> >   bogofilter -vvv
> > 
> > This should tell you what tokens are contributing to the
> > classification. If you repeat this after training you can compare
> > the results and see if things are moving in the right direction.
> 
> Well, I've now had a new message from the list so I've done this, but
> I don't really understand what I'm looking at or how it can help me.
> I've attached the bogofilter -vvv output from before and after I
> classified it as ham, and the source of the message itself, if
> anybody can understand the data.

The output contains a list of tokens sorted from least to most spammy.
Token lines that end in a '+' contribute to the result; the middle
tokens that end in a '-' are ignored. What we are looking for are the
tokens at the bottom that are counteracting a ham result.
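
If reading the two dumps by eye is awkward, something like the following
rough Python sketch could pull out the interesting lines. It assumes each
token line looks like a quoted token followed by a count, three numeric
columns (which I take to be pgood, pbad and the token score) and a final
'+' or '-'; that layout is my assumption, so check it against your own
output first. The function names are made up for this example.

```python
import re

# Assumed line shape (check against your own bogofilter -vvv output):
#   "token"   n   pgood   pbad   fw   +|-
TOKEN_LINE = re.compile(
    r'^"(?P<token>.*)"\s+(?P<n>\d+)\s+(?P<pgood>\S+)\s+(?P<pbad>\S+)'
    r'\s+(?P<fw>\S+)\s+(?P<used>[+-])\s*$'
)


def parse_tokens(text):
    """Return (token, score, used) tuples for every recognisable token line."""
    rows = []
    for line in text.splitlines():
        m = TOKEN_LINE.match(line)
        if not m:
            continue  # header, summary or unrecognised line
        try:
            score = float(m["fw"])
        except ValueError:
            continue  # last numeric column was not a number after all
        rows.append((m["token"], score, m["used"] == "+"))
    return rows


def spammy_contributors(text, threshold=0.5):
    """Tokens marked '+' whose score is above the neutral threshold."""
    return [(tok, score) for tok, score, used in parse_tokens(text)
            if used and score > threshold]


def compare(before, after):
    """Map token -> (score before, score after) for tokens whose score moved."""
    old = {tok: score for tok, score, _ in parse_tokens(before)}
    new = {tok: score for tok, score, _ in parse_tokens(after)}
    return {tok: (old[tok], new[tok])
            for tok in old if tok in new and old[tok] != new[tok]}
```

Read the saved before and after files into strings, then pass each to
spammy_contributors() to list the tokens working against a ham result,
and both to compare() to see whether training moved their scores down.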

These tokens are coming from the X-PMX-* headers, which I think are
from the Sophos PureMessage spam filter. It's not possible to tell from
the ordering at what stage they were added, but the name and bogofilter
statistics of the X-PMX-Perl header suggest they are added by the list.

You appear to have learned around 550 spams with these headers, and you
will probably have to train about 50 more list emails before bogofilter
settles down.
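
To get a feel for how quickly further ham training pulls such a token
back, here is a small sketch of Robinson's token score, which as far as
I know is the calculation bogofilter bases its per-token value on. The
robs and robx values and the database totals below are assumptions for
illustration only; substitute the numbers from your own configuration
and wordlist.

```python
def token_score(spam_hits, ham_hits, spam_msgs, ham_msgs,
                robs=0.0178, robx=0.52):
    """Robinson-style f(w) for one token.

    spam_hits / ham_hits: trained spams / hams containing the token.
    spam_msgs / ham_msgs: total trained spams / hams in the database.
    robs, robx: strength and value of the prior for unseen tokens
                (assumed values; check your own settings).
    """
    if spam_msgs <= 0 or ham_msgs <= 0:
        raise ValueError("message totals must be positive")
    n = spam_hits + ham_hits
    if n == 0:
        return robx  # never-seen token gets the prior
    pbad = spam_hits / spam_msgs
    pgood = ham_hits / ham_msgs
    p = pbad / (pbad + pgood)
    return (robs * robx + n * p) / (robs + n)


# Hypothetical database totals, for illustration only.
SPAM_MSGS = 20000
HAM_MSGS = 2000

# Score of a token seen in 550 spams as more list mails are trained as ham.
for extra_hams in (0, 10, 25, 50, 100):
    print(extra_hams, round(token_score(550, extra_hams, SPAM_MSGS, HAM_MSGS), 3))
```

How many hams it takes depends heavily on the ratio of ham to spam
messages already in the database, so treat the printed numbers as a
rough guide rather than a prediction.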