[Users] bogofilter gone mad

richard richard.bown at blueyonder.co.uk
Fri Feb 24 11:27:30 CET 2012


On Thu, 23 Feb 2012 18:00:29 -0500
David Relson <relson at osagesoftware.com> wrote:

> On Thu, 23 Feb 2012 22:18:05 +0000
> richard wrote:
> 
> > On Thu, 23 Feb 2012 13:50:30 +0100
> > Gour <gour at atmarama.net> wrote:
> > 
> > > On Thu, 23 Feb 2012 12:46:18 +0000
> > > richard <richard.bown at blueyonder.co.uk> wrote:
> > > 
> > > > the word list was a bit large (4 MB), so I deleted it
> > > 
> > > [gour at atmarama gour] ls -lh .bogofilter
> > > total 375M
> > > -rw-r--r-- 1 gour users 375M Vel 21 17:14 wordlist.db
> > > 
> > > [gour at atmarama gour] file .bogofilter/wordlist.db 
> > > .bogofilter/wordlist.db: SQLite 3.x database
> > > 
> > > Sincerely,
> > > Gour
> > > 
> > 
> > Must have a lot of words.
> > But I have big problems, as it's still classifying 80% of incoming
> > mail as spam and putting it into trash. I have two inboxes, inbox
> > and trash.
> > 
> > This is after deleting the wordlist.db. Are any other files used by
> > bogofilter?
> 
> Bogofilter assigns scores to words using information in the wordlist.
> The wordlist has counts of the number of spam messages and the
> number of ham messages used to build it.  For each token (word) in the
> wordlist, ham and spam counts are stored indicating how many ham and
> spam messages (respectively) the word has been used in.  From this
> information bogofilter can decide whether a word is "hammy" or
> "spammy". For example, a word used in 10% of all spam messages and in
> 90% of all ham messages is "hammy".   A word in 10% of spam and 1% of
> ham is "spammy".  The balance of hammy vs spammy results for all words
> in a message results in a score for the message.
> 
> It may help to train bogofilter with more ham messages (than you've
> already done) in order to increase bogofilter's vocabulary.  The more
> information in the wordlist, the better bogofilter can do.
> 
> Bogofilter also has a configuration file that sets the thresholds
> for considering a message to be ham, spam, or unsure.  Changing
> those values will affect whether a message with a given score is
> considered ham, or spam, or unsure.
> 
> HTH,
> 
> David
>
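
As a rough illustration of the scoring David describes above, here is a
minimal Python sketch. It is a simplification and not bogofilter's
actual algorithm: the per-word formula, the plain average used for the
message score, the function names, and the two cutoff values are all
assumptions made for the example.

```python
# Simplified illustration only; bogofilter's real scoring is more elaborate.

# Hypothetical thresholds for this sketch (not bogofilter's defaults).
HAM_CUTOFF = 0.40
SPAM_CUTOFF = 0.90


def word_spamminess(spam_count, ham_count, spam_msgs, ham_msgs):
    """Return a value in [0, 1]: near 0 is hammy, near 1 is spammy.

    spam_count / ham_count: messages of each kind containing the word.
    spam_msgs / ham_msgs: total messages of each kind in the wordlist.
    """
    spam_freq = spam_count / spam_msgs if spam_msgs else 0.0
    ham_freq = ham_count / ham_msgs if ham_msgs else 0.0
    if spam_freq + ham_freq == 0:
        return 0.5  # word never seen: treat as neutral
    return spam_freq / (spam_freq + ham_freq)


def message_score(word_scores):
    """Combine per-word values with a plain average (an assumption)."""
    if not word_scores:
        return 0.5
    return sum(word_scores) / len(word_scores)


def classify(score, ham_cutoff=HAM_CUTOFF, spam_cutoff=SPAM_CUTOFF):
    """Map a message score to ham, spam, or unsure using two cutoffs."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


if __name__ == "__main__":
    # David's two examples, with 1000 spam and 1000 ham messages trained.
    hammy = word_spamminess(100, 900, 1000, 1000)   # 10% spam, 90% ham
    spammy = word_spamminess(100, 10, 1000, 1000)   # 10% spam, 1% ham
    print(f"hammy word:  {hammy:.2f}")
    print(f"spammy word: {spammy:.2f}")
    print(classify(message_score([hammy, spammy])))
```

In this sketch, moving the two cutoffs changes which label a given
score receives without retraining anything, which is the effect of the
thresholds in the configuration file.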

Thanks David & Leroy,
I have a much better understanding of how it works, except for one
thing: how much weight does the subject line carry, or is it entirely
the contents of the message body that's used?
Thanks
-- 
Best wishes / 73
Richard Bown

e-mail: richard at g8jvm.com   or   richard.bown at blueyonder.co.uk

nil carborundum a illegitemis
##################################################################################
Ham Call G8JVM . OS Fedora FC16 x86_64 on a Dell Inspiron N5030 laptop
Maidenhead QRA: IO82SP38, LAT. 52 39.720' N LONG. 2 28.171 W ( degs
mins ) QRV HF + VHF Microwave 23 cms:140W,13 cms:100W,6 cms:10W & 3
cms:5W
################################################################################## 




