Re: seekrules over French spam (was Re: [Rule Set proposal] French Rules

2008-06-27 Thread mouss
John GALLET wrote: Re, Anyway, these are the patterns I tried to code in FR_SPAMISLEGAL and FR_HOWTOUNSUBSCRIBE, plus one I considered too generic (if you can't read this mail in html, click here). It might be worth collecting more ham that includes any such common text -- or even _generating

Re: seekrules over French spam (was Re: [Rule Set proposal] French Rules

2008-06-24 Thread Justin Mason
John Wilcock writes: > Justin Mason wrote: > > John GALLET writes: > >> Well, thanks for writing it. I think its main weak point for French and > >> other accented languages is handling the different encodings for the same > >> char with an accent, some kind of "synonyms" list. The same letter,

Re: seekrules over French spam (was Re: [Rule Set proposal] French Rules

2008-06-24 Thread John Wilcock
Justin Mason wrote: John GALLET writes: Well, thanks for writing it. I think its main weak point for French and other accented languages is handling the different encodings for the same char with an accent, some kind of "synonyms" list. The same letter, say "a with an accent", can be misspell
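
For readers following the encoding point above, here is a minimal Python sketch of why one accented letter needs a "synonyms" list. It is not taken from the thread or from seek-phrases-in-corpus; the helper names `encoding_variants` and `fold_accents` are hypothetical, and the choice of Latin-1 and UTF-8 as the encodings to compare is an assumption.

```python
import unicodedata

def encoding_variants(ch):
    """Return byte sequences that one accented character may arrive as.

    Hypothetical helper: compares Latin-1, UTF-8 (precomposed, NFC) and
    UTF-8 (base letter plus combining accent, NFD).
    """
    nfc = unicodedata.normalize("NFC", ch)
    nfd = unicodedata.normalize("NFD", ch)
    return {
        "latin-1": nfc.encode("latin-1"),
        "utf-8 (NFC)": nfc.encode("utf-8"),
        "utf-8 (NFD)": nfd.encode("utf-8"),
    }

def fold_accents(text):
    """Drop combining marks so accented and unaccented spellings compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# "a with a grave accent" (U+00E0) in three byte-level forms
variants = encoding_variants("\u00e0")
for name, raw in variants.items():
    print(name, raw)
```

A pattern written against only one of these byte forms would miss the other two, which is presumably the weak point being described; folding accents after decoding is one possible way to compare text, at the cost of losing the distinction between accented and unaccented words.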

Re: seekrules over French spam (was Re: [Rule Set proposal] French Rules

2008-06-24 Thread Justin Mason
John GALLET writes: > Re, > > >> Anyway, these are the patterns I tried to code in FR_SPAMISLEGAL and > >> FR_HOWTOUNSUBSCRIBE, plus one I considered too generic (if you can't > >> read this mail in html, click here). > > > > It might be worth collecting more ham that includes any such common > >

Re: seekrules over French spam (was Re: [Rule Set proposal] French Rules

2008-06-24 Thread John GALLET
Re, Anyway, these are the patterns I tried to code in FR_SPAMISLEGAL and FR_HOWTOUNSUBSCRIBE, plus one I considered too generic (if you can't read this mail in html, click here). It might be worth collecting more ham that includes any such common text -- or even _generating_ mails along those

Re: seekrules over French spam (was Re: [Rule Set proposal] French Rules

2008-06-24 Thread Justin Mason
John GALLET writes: > Hi, > > > You run "seek-phrases-in-corpus" over the 2 corpora, and it'll spit out > > the patterns; you can then write rules based on these. > > I did so, the results are interesting, though I do not really know where > to go from there. If I take the first 50 "best" patte

Re: Philosophy for opt-in (was Re: [Rule Set proposal] French Rules

2008-06-24 Thread John Wilcock
John GALLET wrote: I think I have a simple newbie problem of philosophy/strategy: my approach, for what it's worth, was that I flag anything that contains some unsubscribe links and French law reminders because anyway all the ones I receive are spam, and I add the opt-in mailing/newsletter I

Philosophy for opt-in (was Re: [Rule Set proposal] French Rules

2008-06-24 Thread John GALLET
Hi, If these are hit rates with a very minimal daily corpus, don't know if the present ruleset is ready for production unless you have 0 tolerance for any bulk, period. I'm afraid I must agree. I don't have a confirmed and sorted corpus per se, but after a single night's live testing with ver

Re: hit frequencies (was Re: [Rule Set proposal] French Rules

2008-06-24 Thread Michael Monnerie
On Tuesday, 24 June 2008 John Wilcock wrote: > with just a bit of fine tuning I guess John Gallet needs a bigger corpus; maybe you could share some ham/spam with him. He does the work to create the rules, and with a better corpus the rules will become better. I know this, I maintain the GERMAN

Re: hit frequencies (was Re: [Rule Set proposal] French Rules

2008-06-24 Thread John Wilcock
Yet Another Ninja wrote: If these are hit rates with a very minimal daily corpus, don't know if the present ruleset is ready for production unless you have 0 tolerance for any bulk, period. I'm afraid I must agree. I don't have a confirmed and sorted corpus per se, but after a single night'

Re: hit frequencies (was Re: [Rule Set proposal] French Rules

2008-06-23 Thread John GALLET
Re, I excluded the last two rules from my masscheck to avoid FPs as these ESPs/X-Mailers are definitely grey, "import rcpt list and blast" sort of ESPs not black for global use. If you can point me to some more information on how to do that, on-list or off-list, I am interested. I am new to

seekrules over French spam (was Re: [Rule Set proposal] French Rules

2008-06-23 Thread John GALLET
Hi, You run "seek-phrases-in-corpus" over the 2 corpora, and it'll spit out the patterns; you can then write rules based on these. I did so, the results are interesting, though I do not really know where to go from there. If I take the first 50 "best" patterns and strip off the obvious stand

Re: hit frequencies (was Re: [Rule Set proposal] French Rules

2008-06-23 Thread Yet Another Ninja
On 6/23/2008 4:36 PM, John GALLET wrote: Hi, First of all, thanks to Justin for patiently helping me to install mass-check and pointing me in the right direction. I will try to run the algorithms tonight to see what they come up with. In the meantime, you can find a hit-frequencies report at

Re: hit frequencies (was Re: [Rule Set proposal] French Rules

2008-06-23 Thread John GALLET
Thanks for taking this burden upon yourself. One other thing you should be prepared to do, if you're willing to devote long-term responsibility to these rules, is to provide sa-update-compatible feeds of your dynamic rules. This is another thing that Justin can probably help you with. I am hap

Re: hit frequencies (was Re: [Rule Set proposal] French Rules

2008-06-23 Thread John GALLET
Re, Looking at the rules, I'm worried about false positives on genuine opt-in advertising. I have a number of users who choose to receive all kinds of advertising blurb, This is one of the reasons why I did not hunt for "click here" and "if you can't see this email in html". Now correct me i

Re: hit frequencies (was Re: [Rule Set proposal] French Rules

2008-06-23 Thread John Wilcock
John GALLET wrote: Any feedback on the results (not enough in corpus, bad rules, good rules, etc.) appreciated. Looking at the rules, I'm worried about false positives on genuine opt-in advertising. I have a number of users who choose to receive all kinds of advertising blurb, so I'll run

Re: hit frequencies (was Re: [Rule Set proposal] French Rules

2008-06-23 Thread John Hardin
On Mon, 23 Jun 2008, John GALLET wrote: First of all, thanks to Justin for patiently helping me to install mass-check and pointing me in the right direction. Applause for Justin! This is the sort of thing we need to see for many more specialized spam categories... I will try to run the alg

hit frequencies (was Re: [Rule Set proposal] French Rules

2008-06-23 Thread John GALLET
Hi, First of all, thanks to Justin for patiently helping me to install mass-check and pointing me in the right direction. I will try to run the algorithms tonight to see what they come up with. In the meantime, you can find a hit-frequencies report at: http://www.saphirtech.fr/spam/freqs_2008

Re: [Rule Set proposal] French Rules

2008-06-19 Thread Justin Mason
Giampaolo Tomassoni writes: > > -Original Message- > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > > Sent: Thursday, June 19, 2008 5:49 PM > > To: Giampaolo Tomassoni > > Cc: [EMAIL PROTECTED]; users@spamassassin.apache.org > > Subject:

RE: [Rule Set proposal] French Rules

2008-06-19 Thread Giampaolo Tomassoni
> -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > Sent: Thursday, June 19, 2008 5:49 PM > To: Giampaolo Tomassoni > Cc: [EMAIL PROTECTED]; users@spamassassin.apache.org > Subject: Re: [Rule Set proposal] French Rules > > ...omissis... >

Re: [Rule Set proposal] French Rules

2008-06-19 Thread Justin Mason
Giampaolo Tomassoni writes: > > -Original Message- > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > > Sent: Thursday, June 19, 2008 5:28 PM > > To: Giampaolo Tomassoni > > Cc: [EMAIL PROTECTED]; users@spamassassin.apache.org > > Subject:

RE: [Rule Set proposal] French Rules

2008-06-19 Thread Giampaolo Tomassoni
> -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > Sent: Thursday, June 19, 2008 5:28 PM > To: Giampaolo Tomassoni > Cc: [EMAIL PROTECTED]; users@spamassassin.apache.org > Subject: Re: [Rule Set proposal] French Rules > > >

Re: [Rule Set proposal] French Rules

2008-06-19 Thread Justin Mason
Giampaolo Tomassoni writes: > > -Original Message- > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > > Sent: Wednesday, June 18, 2008 12:10 PM > > To: John GALLET > > Cc: users@spamassassin.apache.org > > Subject: Re: [Rule Set propos

RE: [Rule Set proposal] French Rules

2008-06-19 Thread Giampaolo Tomassoni
> -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] > Sent: Wednesday, June 18, 2008 12:10 PM > To: John GALLET > Cc: users@spamassassin.apache.org > Subject: Re: [Rule Set proposal] French Rules > > ...omissis... > > by the way, if you

Re: [Rule Set proposal] French Rules

2008-06-19 Thread John GALLET
I am still missing samples for two rules: even though I did have hits according to /var/spool/maillog, I did not save them. I added a sample for the FR_NOTSPAM rule, and I removed the FR_YOURELUCKY rule as I see other forms of the text getting through, so it is not effective. On the other hand, nearly al

Re: [Rule Set proposal] French Rules

2008-06-18 Thread Justin Mason
John GALLET writes: > Hi, > > This is my first post on this list and first ruleset, so please point me > to the right place/documents if I am doing anything wrong. > > According to a search of this list on markmail.org, there have been few > subjects about spam in French and (no disrespect mea

Re: [Rule Set proposal] French Rules

2008-06-17 Thread John GALLET
Hi, I was able to access the URL you mentioned, but not all of the files below it. I received: "Forbidden You don't have permission to access /spam/FR_PAYLESSTAXES.txt on this server." Sorry guys, only the ruleset file (the one I tried, of course) was readable, all the non empty spam samples

Re: [Rule Set proposal] French Rules

2008-06-17 Thread Big Wave Dave
On Tue, Jun 17, 2008 at 12:11 PM, John GALLET <[EMAIL PROTECTED]> wrote: > Hi, > > This is my first post on this list and first ruleset, so please point me to > the right place/documents if I am doing anything wrong. > > According to a search of this list on markmail.org, there have been few > subj