Bayesian filtering of this group's riff raff.

Tman

September 4th 08, 09:53 PM

Kind of on topic, let me tell ya.
Anyone happen across a good Usenet filter that uses Bayesian filtering
or the similar, to filter out unwanted articles based upon a user's (or
a community's) preference.

For those that don't know, these are commonly used in email spam
filters. I think it would work better than a crude old kill file.
It'll identify keywords, or patterns and attributes of messages that you
like (want to see), and those that you don't. It'll learn from your
feedback and then get smart enough to classify the messages. It works
_really well_ for email, in spite of spammers trying to thwart it.
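The learn-from-feedback idea above can be sketched as a two-class naive Bayes filter. This is a minimal illustration only, assuming a crude word tokenizer and add-one (Laplace) smoothing. The names `NaiveBayesFilter`, `train` and `classify` are hypothetical, and this is not SpamAssassin's Bayes code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split an article into lowercase word tokens (deliberately crude)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Two-class multinomial naive Bayes: 'want' vs. 'unwanted' articles (hypothetical)."""

    LABELS = ("want", "unwanted")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        """Record one piece of user feedback; label must be 'want' or 'unwanted'."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens, label):
        vocab_size = len(set().union(*self.word_counts.values()))
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        # Log prior P(label), add-one smoothed so an untrained class isn't log(0).
        score = math.log((self.doc_counts[label] + 1) / (total_docs + len(self.LABELS)))
        for tok in tokens:
            # Add-one smoothed log P(token | label); the extra +1 leaves room for unseen words.
            count = self.word_counts[label][tok]
            score += math.log((count + 1) / (total_words + vocab_size + 1))
        return score

    def classify(self, text):
        tokens = tokenize(text)
        # Pick the label with the higher log score (ties go to 'want').
        return max(self.LABELS, key=lambda label: self._log_score(tokens, label))


if __name__ == "__main__":
    f = NaiveBayesFilter()
    f.train("density altitude and crosswind landing technique", "want")
    f.train("checkride tips for the instrument rating", "want")
    f.train("you are an idiot and a troll", "unwanted")
    f.train("troll troll idiot flame", "unwanted")
    print(f.classify("crosswind technique on the checkride"))  # want
    print(f.classify("what an idiot troll"))                   # unwanted
```

Each `train` call is one piece of feedback; the more articles you label, the better the word counts reflect what you actually want to read.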

This could be even more powerful if a (secure) community of users
with common interests within RAP voted to train the classifier,
giving it more volume with less effort per user. Would need to think
about vandalism and make sure it is secure, however.

Anything like this out there for NNTP / news? I'd appreciate any
leads. Guess I can just Google this, and I will now, but I'm also
interested in what feedback folks here might have.

RAP motivates me to write a crude one and see how it works, perhaps by
adopting pieces of SpamAssassin (a spam filter commonly used for email).

T

John Clear

September 4th 08, 10:04 PM

In article >, Tman <x@x> wrote:
>Kind of on topic, let me tell ya.
>Anyone happen across a good Usenet filter that uses Bayesian filtering
>or the similar, to filter out unwanted articles based upon a user's (or
>a community's) preference.

You don't need anything so complex. I just killfile MX, Bertie,
and ALL followups to them, and that kills 99% of the crap.

200 new articles go down to 20 after the killfile takes care of the
circle jerkers.
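John's rule (kill the named posters plus every followup to them) amounts to remembering which Message-IDs have been killed and dropping any article whose References header points at one. A minimal sketch, assuming articles arrive parent-first and authors are matched by exact From string; the `Article` type and `apply_killfile` function are hypothetical, not any newsreader's killfile format.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    message_id: str
    author: str
    references: list = field(default_factory=list)  # Message-IDs from the References header


def apply_killfile(articles, killed_authors):
    """Drop articles by killed authors and every followup below them in a thread.

    Assumes parents are seen before their followups, so a killed article's
    Message-ID is already known by the time its replies arrive.
    """
    killed_ids = set()
    kept = []
    for art in articles:
        if art.author in killed_authors or any(ref in killed_ids for ref in art.references):
            killed_ids.add(art.message_id)  # so replies to this article die too
        else:
            kept.append(art)
    return kept
```

Because each killed followup's own Message-ID joins the killed set, replies to those replies disappear as well, even if their References header was trimmed.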

John
--
John Clear - http://www.clear-prop.org/

Mike[_22_]

September 4th 08, 10:11 PM

"Tman" <x@x> wrote in message
> Kind of on topic, let me tell ya.
> Anyone happen across a good Usenet filter that uses Bayesian filtering or
> the similar, to filter out unwanted articles based upon a user's (or a
> community's) preference.

The best you're going to get is a client-side filter like nfilter. It takes
a while to learn how to set it up and use it, but with it you can even filter
out the nymshifters by using parts of their headers they can't change.
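The idea is that a From: line is easy to change, but some headers are added by the posting server. A minimal sketch using Python's standard `email` parser, assuming (for illustration only) that the server adds an `NNTP-Posting-Host` header the poster can't edit. Which headers are actually trustworthy depends on the server, and the block-list value below is made up.

```python
from email.parser import Parser

# Hypothetical block list of posting hosts (lowercase); the value is made up.
BLOCKED_POSTING_HOSTS = {"host-1234.example.net"}


def from_blocked_host(raw_article, header="NNTP-Posting-Host", blocked=BLOCKED_POSTING_HOSTS):
    """True if a server-added header matches the block list, whatever From: claims."""
    headers = Parser().parsestr(raw_article, headersonly=True)
    value = headers.get(header) or ""
    return value.strip().lower() in blocked
```

A nymshifter who changes only the From: line still matches, because the check never looks at From: at all.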

alexy

September 4th 08, 11:01 PM

Tman <x@x> wrote:

>Kind of on topic, let me tell ya.
>Anyone happen across a good Usenet filter that uses Bayesian filtering
>or the similar, to filter out unwanted articles based upon a user's (or
>a community's) preference.

POPFile, which I use to classify emails, has an NNTP proxy
component. I haven't used it, but it might be worth a try. For email,
what I like about POPFile is that it allows multiple classifications
(all using Bayesian filtering), not just spam/nonspam. I use it to
classify mail as
spam/personal/business/client/shopping/bills/newsletters/othernonspam/unclassified.
It's currently running at about 97% accuracy, and most of the 3% misfiled are
"false negatives", i.e. left "unclassified", not ones that have been
put in the wrong category.
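One plausible way a multi-category Bayesian filter ends up with "unclassified" rather than a wrong category is to require the winner's posterior probability to clear a threshold. A sketch of that decision step only, assuming per-category naive Bayes log scores have already been computed; the `pick_bucket` name and the 0.9 threshold are hypothetical, and this is not necessarily how POPFile decides.

```python
import math


def pick_bucket(log_scores, threshold=0.9):
    """Return the best category, or 'unclassified' if the winner isn't confident enough.

    log_scores maps category -> log P(category) + sum of log P(word | category).
    """
    top = max(log_scores.values())
    # log-sum-exp: normalise to posteriors without underflowing on very negative scores
    norm = top + math.log(sum(math.exp(s - top) for s in log_scores.values()))
    best = max(log_scores, key=log_scores.get)
    confidence = math.exp(log_scores[best] - norm)
    return best if confidence >= threshold else "unclassified"
```

With scores of -100 for spam and -110 for personal the spam posterior is about 0.99995, so the article is filed as spam; with -100 against -100.5 it is only about 0.62, so the article stays unclassified instead of being misfiled.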

Find POPFile on SourceForge.

Report back if you find it or another program to work well for this.

There is another approach. Newsproxy (aka nfilter) is an old program,
no longer supported by its author, that provides rules-based filtering
_before_ the articles reach your client, with much more flexibility than
any news client I have seen.
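A rules-based proxy of this kind sits between the news server and the client and decides per article. A minimal sketch of first-match-wins rules over article headers; the `(header, regex, action)` rule format and the example rules are invented for illustration and are not Newsproxy/nfilter's configuration syntax.

```python
import re

# Hypothetical rule format: (header, regex, action). The first matching rule wins.
RULES = [
    ("From", r"(?i)troll@example\.com", "drop"),   # a killed poster
    ("Subject", r"(?i)\bcheckride\b", "keep"),     # keep these even if crossposted...
    ("Newsgroups", r",.*,", "drop"),               # ...otherwise drop crossposts to 3+ groups
]


def filter_article(headers, rules=RULES, default="keep"):
    """Apply rules to a dict of article headers before the newsreader sees it."""
    for header, pattern, action in rules:
        if re.search(pattern, headers.get(header, "")):
            return action
    return default
```

Ordering matters with first-match-wins: the "keep" rule for checkride threads is listed before the crosspost rule so it takes precedence.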
--
Alex -- Replace "nospam" with "mail" to reply by email. Checked infrequently.

Bertie the Bunyip[_25_]

September 5th 08, 01:50 AM

Tman <x@x> wrote in :

> Kind of on topic, let me tell ya.
> Anyone happen across a good Usenet filter that uses Bayesian filtering
> or the similar, to filter out unwanted articles based upon a user's (or
> a community's) preference.
> RAP motivates me to write a crude one and see how it works, perhaps by
> adopting pieces of spamassassin (a classifier commonly used for email).
>

oooh!

A challenge!

Bertie

Tman

September 5th 08, 02:04 AM

Bertie the Bunyip wrote:
> Tman <x@x> wrote in :

>
> oooh!
>
> A challenge!
>
> Bertie
>

For me and for you.

I'm also intrigued with the possibility of finding out who you are and
revealing it. That would be kind of fun.

Can you tell me who I am? Anyone?

T

Bertie the Bunyip[_24_]

September 5th 08, 02:27 AM

Tman <x@x> wrote in :

> Bertie the Bunyip wrote:
>> Tman <x@x> wrote in :
>
>>
>> oooh!
>>
>> A challenge!
>>
>> Bertie
>>
>
> For me and for you.
>
> I'm also intrigued with the possibility of finding out who you are and
> revealing it. That would be kind of fun.

oh I love a good game of "find the bunyip"

Many have tried. No luck yet anyhow.
>
> Can you tell me who I am? Anyone?

You need to look into your center to know that, grasshopper.

Bertie

Frank Olson

September 5th 08, 07:03 AM

Tman wrote:
> Kind of on topic, let me tell ya.
> Anyone happen across a good Usenet filter that uses Bayesian filtering
> or the similar, to filter out unwanted articles based upon a user's (or
> a community's) preference.

I dunno... I'm still looking for a "Maxian" filter... <<ducking>>

Bob Fry

September 6th 08, 07:42 PM

It's not Bayesian, but it's pretty good: Gnus. The problem is you have to
install the Emacs text editor or a variant (XEmacs, etc.), then learn
some of its rather obscure keystrokes. It's still very much keyboard-based
rather than GUI/mouse-driven.

Gnus uses score files rather than kill files. From the Gnus manual:

All articles have a default score (`gnus-summary-default-score'),
which is 0 by default. This score may be raised or lowered either
interactively or by score files. Articles that have a score lower than
`gnus-summary-mark-below' are marked as read.

Gnus will read any "score files" that apply to the current group
before generating the summary buffer.

There are several commands in the summary buffer that insert score
entries based on the current article. You can, for instance, ask Gnus
to lower or increase the score of all articles with a certain subject.

There are two sorts of scoring entries: Permanent and temporary.
Temporary score entries are self-expiring entries. Any entries that are
temporary and have not been used for, say, a week, will be removed
silently to help keep the sizes of the score files down.
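The scoring scheme in this excerpt can be mimicked in a few lines: start from a default score, add the delta of every matching entry, mark the article read if the total falls below a threshold, and silently expire temporary entries that haven't matched in a week. This is a Python sketch of the idea only. Gnus itself is Emacs Lisp, and `ScoreEntry`, the substring matching and the -100 threshold are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ScoreEntry:
    header: str        # e.g. "From" or "Subject"
    match: str         # case-insensitive substring to look for in that header
    delta: int         # added to the article's score when it matches
    last_used: date    # when the entry was created or last matched
    temporary: bool = False


DEFAULT_SCORE = 0    # plays the role of gnus-summary-default-score
MARK_BELOW = -100    # hypothetical value standing in for gnus-summary-mark-below


def score_article(headers, entries, today):
    """Default score plus the delta of every matching entry; refreshes last_used."""
    score = DEFAULT_SCORE
    for entry in entries:
        if entry.match.lower() in headers.get(entry.header, "").lower():
            score += entry.delta
            entry.last_used = today
    return score


def is_marked_read(headers, entries, today):
    return score_article(headers, entries, today) < MARK_BELOW


def expire(entries, today, max_idle=timedelta(days=7)):
    """Silently drop temporary entries that haven't matched for more than max_idle."""
    return [e for e in entries if not e.temporary or today - e.last_used <= max_idle]
```

Permanent entries survive `expire` regardless of age; a temporary entry stays alive only as long as it keeps matching articles.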

[...]

7.6 Adaptive Scoring
====================

If all this scoring is getting you down, Gnus has a way of making it all
happen automatically--as if by magic. Or rather, as if by artificial
stupidity, to be precise.

When you read an article, or mark an article as read, or kill an
article, you leave marks behind. On exit from the group, Gnus can sniff
these marks and add score elements depending on what marks it finds.
You turn on this ability by setting `gnus-use-adaptive-scoring' to `t'
or `(line)'.
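Adaptive scoring, as described, turns the marks you leave behind into score adjustments when you exit the group. A sketch of that idea, assuming articles are reduced to (author, subject, mark) tuples; the mark names and weights in `MARK_WEIGHTS` are hypothetical, not Gnus's default adaptive-scoring settings.

```python
from collections import defaultdict

# Hypothetical weights: (change to the author's score, change to the subject's score).
MARK_WEIGHTS = {
    "read": (1, 1),       # you opened the article
    "ticked": (5, 3),     # you flagged it as interesting
    "killed": (-5, -3),   # you killed it unread
}


def adapt_scores(marked_articles, scores=None):
    """On group exit, turn the marks left on articles into score adjustments.

    marked_articles: iterable of (author, subject, mark) tuples.
    scores: optional existing mapping of (header, value) -> score to build on.
    Returns a new defaultdict; the input mapping is not modified.
    """
    result = defaultdict(int, scores or {})
    for author, subject, mark in marked_articles:
        if mark not in MARK_WEIGHTS:
            continue  # e.g. still unread: leaves nothing behind
        from_delta, subject_delta = MARK_WEIGHTS[mark]
        result[("From", author)] += from_delta
        result[("Subject", subject)] += subject_delta
    return result
```

Run once per group exit, the scores drift toward what you actually read and away from what you keep killing, without anyone writing score entries by hand.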

--
I did not know how to say goodbye. It was harder still, when I refused
to say it.
~ Native American saying