[Users] bogofilter gone mad
richard
richard.bown at blueyonder.co.uk
Fri Feb 24 11:27:30 CET 2012
On Thu, 23 Feb 2012 18:00:29 -0500
David Relson <relson at osagesoftware.com> wrote:
> On Thu, 23 Feb 2012 22:18:05 +0000
> richard wrote:
>
> > On Thu, 23 Feb 2012 13:50:30 +0100
> > Gour <gour at atmarama.net> wrote:
> >
> > > On Thu, 23 Feb 2012 12:46:18 +0000
> > > richard <richard.bown at blueyonder.co.uk> wrote:
> > >
> > > > the word list was a bit large (4 MB), so I deleted it
> > >
> > > [gour at atmarama gour] ls -lh .bogofilter
> > > total 375M
> > > -rw-r--r-- 1 gour users 375M Vel 21 17:14 wordlist.db
> > >
> > > [gour at atmarama gour] file .bogofilter/wordlist.db
> > > .bogofilter/wordlist.db: SQLite 3.x database
> > >
> > > Sincerely,
> > > Gour
> > >
> >
> > Must have a lot of words.
> > But I have a big problem: it is still marking 80% of incoming mail as
> > spam and putting it into trash. I have two inboxes, inbox and trash.
> >
> > This is after deleting the wordlist.db. Are any other files used by
> > bogofilter?
>
> Bogofilter assigns scores to words using information in the wordlist.
> The wordlist has counts of the number of spam messages and the
> number of ham messages used to build it. For each token (word) in the
> wordlist, ham and spam counts are stored indicating how many ham and
> spam messages (respectively) the word has been used in. From this
> information bogofilter can decide whether a word is "hammy" or
> "spammy". For example, a word used in 10% of all spam messages and in
> 90% of all ham messages is "hammy". A word in 10% of spam and 1% of
> ham is "spammy". The balance of hammy vs spammy results for all words
> in a message results in a score for the message.
>
> It may help to train bogofilter with more ham messages (than you've
> already done) in order to increase bogofilter's vocabulary. The more
> information in the wordlist, the better bogofilter can do.
>
> Bogofilter also has a configuration file that sets the thresholds
> for considering a message to be ham, spam, or unsure. Changing
> those values will affect whether a message with a given score is
> considered ham, or spam, or unsure.
>
> HTH,
>
> David
>
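A rough sketch in Python of the scoring described in the quoted text above may make it more concrete. It assumes the wordlist is a plain dict mapping each token to a (spam count, ham count) pair, averages the per-token values as a stand-in for bogofilter's real combining step (which is more involved), and uses made-up cutoff values. The function names, the `ham_cutoff` and `spam_cutoff` parameters and their defaults are all illustrative, not taken from bogofilter itself.

```python
def token_spamminess(spam_count, ham_count, spam_msgs, ham_msgs):
    """Return a value in [0, 1]: near 0 is "hammy", near 1 is "spammy"."""
    # Fraction of spam (and of ham) messages that contained this token.
    spam_freq = spam_count / spam_msgs if spam_msgs else 0.0
    ham_freq = ham_count / ham_msgs if ham_msgs else 0.0
    total = spam_freq + ham_freq
    if total == 0:
        return 0.5  # token never seen: no evidence either way
    return spam_freq / total


def message_score(tokens, wordlist, spam_msgs, ham_msgs):
    """Average the per-token values (a simplification, see the note above)."""
    values = [
        token_spamminess(*wordlist.get(tok, (0, 0)), spam_msgs, ham_msgs)
        for tok in tokens
    ]
    return sum(values) / len(values) if values else 0.5


def classify(score, ham_cutoff=0.4, spam_cutoff=0.6):
    """Map a score to ham, spam or unsure using hypothetical cutoffs."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


if __name__ == "__main__":
    # Hypothetical wordlist built from 100 spam and 100 ham messages.
    wordlist = {"meeting": (10, 90), "pills": (10, 1)}
    for word in wordlist:
        value = token_spamminess(*wordlist[word], 100, 100)
        print(word, round(value, 3), classify(value))
```

With the figures from the example above, a word seen in 10% of spam and 90% of ham comes out close to 0 (hammy), and a word seen in 10% of spam and 1% of ham comes out close to 1 (spammy). An empty or freshly deleted wordlist gives every token the neutral value, so the result then depends entirely on how the cutoffs are set.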
Thanks David & Leroy,
I have a much better understanding of how it works, except for one thing:
how much information comes from the subject line, or is it entirely
the contents of the message body that's used?
Thanks
--
Best wishes / 73
Richard Bown
e-mail: richard at g8jvm.com or richard.bown at blueyonder.co.uk
nil carborundum a illegitemis
##################################################################################
Ham Call G8JVM . OS Fedora FC16 x86_64 on a Dell Inspiron N5030 laptop
Maidenhead QRA: IO82SP38, LAT. 52 39.720' N LONG. 2 28.171 W ( degs
mins ) QRV HF + VHF Microwave 23 cms:140W,13 cms:100W,6 cms:10W & 3
cms:5W
##################################################################################