John GALLET wrote:
Re,
Anyway, these are the patterns I tried to code in FR_SPAMISLEGAL and
FR_HOWTOUNSUBSCRIBE, plus one I considered too generic (if you can't
read this mail in html, click here).
It might be worth collecting more ham that includes any such common
text -- or even _generating
John Wilcock writes:
> Justin Mason wrote:
> > John GALLET writes:
> >> Well, thanks for writing it. I think its main weak point for French and
> >> other accented languages is handling the different encodings of the same
> >> accented character, with some kind of "synonyms" list. The same letter,
Justin Mason wrote:
John GALLET writes:
Well, thanks for writing it. I think its main weak point for French and
other accented languages is handling the different encodings of the same
accented character, with some kind of "synonyms" list. The same letter, say "a
with an accent", can be misspell
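To make the encoding problem concrete, here is a minimal sketch of such a "synonyms" list in Python. It is not taken from the thread or from SpamAssassin: the helper names `accent_variants` and `accent_pattern` are hypothetical, and the sketch assumes the accented characters all exist in ISO-8859-1. It expands each accented letter into the byte spellings commonly seen in raw mail (UTF-8, Latin-1, HTML entities, quoted-printable) and builds one regular expression from them.

```python
import re
from html.entities import codepoint2name
from typing import List


def accent_variants(ch: str) -> List[bytes]:
    """Return common byte-level spellings of one accented character.

    Assumption: ch exists in ISO-8859-1 (true for e.g. the French letters
    e-acute, e-grave, a-grave, c-cedilla, but not for the oe ligature).
    """
    code = ord(ch)
    utf8 = ch.encode("utf-8")
    variants = [
        utf8,                                                    # raw UTF-8 bytes
        ch.encode("latin-1"),                                    # raw ISO-8859-1 byte
        "&#{};".format(code).encode("ascii"),                    # numeric HTML entity
        "".join("={:02X}".format(b) for b in utf8).encode("ascii"),  # QP of UTF-8
        "={:02X}".format(code).encode("ascii"),                  # QP of Latin-1
    ]
    name = codepoint2name.get(code)
    if name:
        variants.append("&{};".format(name).encode("ascii"))     # named HTML entity
    return variants


def accent_pattern(word: str) -> "re.Pattern":
    """Build a bytes regex matching word under any of the spellings above."""
    parts = []
    for ch in word:
        if ord(ch) < 128:
            parts.append(re.escape(ch.encode("ascii")))
        else:
            alternatives = b"|".join(re.escape(v) for v in accent_variants(ch))
            parts.append(b"(?:" + alternatives + b")")
    # IGNORECASE only affects ASCII letters in a bytes pattern.
    return re.compile(b"".join(parts), re.IGNORECASE)


pattern = accent_pattern("désabonner")
print(bool(pattern.search("Pour vous désabonner".encode("latin-1"))))
```

The same alternation could be pasted by hand into a rule's regular expression; generating it just keeps the variants consistent across many words.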
John GALLET writes:
> Re,
>
> >> Anyway, these are the patterns I tried to code in FR_SPAMISLEGAL and
> >> FR_HOWTOUNSUBSCRIBE, plus one I considered too generic (if you can't
> >> read this mail in html, click here).
> >
> > It might be worth collecting more ham that includes any such common
> >
Re,
Anyway, these are the patterns I tried to code in FR_SPAMISLEGAL and
FR_HOWTOUNSUBSCRIBE, plus one I considered too generic (if you can't
read this mail in html, click here).
It might be worth collecting more ham that includes any such common
text -- or even _generating_ mails along those
John GALLET writes:
> Hi,
>
> > You run "seek-phrases-in-corpus" over the 2 corpora, and it'll spit out
> > the patterns; you can then write rules based on these.
>
> I did so, the results are interesting, though I do not really know where
> to go from there. If I take the first 50 "best" patte
John GALLET wrote:
I think I have a simple newbie problem of philosophy/strategy: my
approach, for what it's worth, was that I flag anything that contains
some unsubscribe links and French law reminders because anyway all the
ones I receive are spam, and I add the opt-in mailing/newsletter I
Hi,
If these are hit rates with a very minimal daily corpus, don't know if the
present ruleset is ready for production unless you have 0 tolerance for any
bulk, period
I'm afraid I must agree. I don't have a confirmed and sorted corpus per se,
but after a single night's live testing with ver
On Tuesday, 24 June 2008 John Wilcock wrote:
> with just a bit of fine tuning
I guess John Gallet needs a bigger corpus, maybe you could share some
ham/spam with him. He does the work to create the rules, and with
better corpus the rules will become better. I know this, I maintain the
GERMAN
Yet Another Ninja wrote:
If these are hit rates with a very minimal daily corpus, don't know if
the present ruleset is ready for production unless you have 0 tolerance
for any bulk, period
I'm afraid I must agree. I don't have a confirmed and sorted corpus per
se, but after a single night'
Re,
I excluded the last two rules from my masscheck to avoid FPs as these
ESPs/X-Mailers are definitely grey, the "import rcpt list and blast" sort of ESPs,
not black for global use.
If you can point me to some more information on how to do that, on-list or
off-list, I am interested. I am new to
Hi,
You run "seek-phrases-in-corpus" over the 2 corpora, and it'll spit out
the patterns; you can then write rules based on these.
I did so, the results are interesting, though I do not really know where
to go from there. If I take the first 50 "best" patterns and strip off the
obvious stand
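As a rough illustration of going from two corpora to candidate rule patterns, the sketch below counts word n-grams that recur in spam and never appear in ham. It is a simplified stand-in written for this note, not the algorithm that "seek-phrases-in-corpus" actually uses; the function names, the n-gram length, and the `min_spam_hits` threshold are all assumptions.

```python
import re
from collections import Counter
from typing import Iterable, List, Set


def ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of lowercase word n-grams in one message body."""
    words = re.findall(r"\w+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def candidate_phrases(spam: Iterable[str], ham: Iterable[str],
                      n: int = 3, min_spam_hits: int = 2) -> List[str]:
    """List phrases seen in at least min_spam_hits spam messages and in no ham.

    Each message counts at most once per phrase, so the counts are
    message hits rather than raw occurrences.
    """
    spam_hits = Counter(p for msg in spam for p in ngrams(msg, n))
    ham_hits = Counter(p for msg in ham for p in ngrams(msg, n))
    scored = [(hits, phrase) for phrase, hits in spam_hits.items()
              if hits >= min_spam_hits and ham_hits[phrase] == 0]
    # Most frequent first; ties broken alphabetically for stable output.
    return [phrase for hits, phrase in sorted(scored, key=lambda t: (-t[0], t[1]))]


spam_corpus = ["Cliquez ici pour vous désabonner", "cliquez ici pour gagner"]
ham_corpus = ["Réunion demain pour vous"]
print(candidate_phrases(spam_corpus, ham_corpus))
```

Phrases that survive this filter are only candidates: with a small ham corpus, zero ham hits says little about false positives, which is the concern raised elsewhere in the thread.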
On 6/23/2008 4:36 PM, John GALLET wrote:
Hi,
First of all, thanks to Justin for patiently helping me to install
mass-check and pointing me in the right direction. I will try to run the
algorithms tonight to see what they come up with.
In the meantime, you can find a hit-frequencies report at
Thanks for taking this burden upon yourself. One other thing you should be
prepared to do, if you're willing to devote long-term responsibility to these
rules, is to provide sa-update-compatible feeds of your dynamic rules. This
is another thing that Justin can probably help you with.
I am hap
Re,
Looking at the rules, I'm worried about false positives on genuine opt-in
advertising. I have a number of users who choose to receive all kinds of
advertising blurb,
This is one of the reasons why I did not hunt for "click here" and "if you
can't see this email in html". Now correct me i
John GALLET wrote:
Any feedback on the results (not enough in corpus, bad rules, good
rules, etc.) appreciated.
Looking at the rules, I'm worried about false positives on genuine
opt-in advertising. I have a number of users who choose to receive all
kinds of advertising blurb, so I'll run
On Mon, 23 Jun 2008, John GALLET wrote:
First of all, thanks to Justin for patiently helping me to install
mass-check and pointing me in the right direction.
Applause for Justin! This is the sort of thing we need to see for many
more specialized spam categories...
I will try to run the alg
Hi,
First of all, thanks to Justin for patiently helping me to install
mass-check and pointing me in the right direction. I will try to run the
algorithms tonight to see what they come up with.
In the meantime, you can find a hit-frequencies report at:
http://www.saphirtech.fr/spam/freqs_2008
Giampaolo Tomassoni writes:
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, June 19, 2008 5:49 PM
> > To: Giampaolo Tomassoni
> > Cc: [EMAIL PROTECTED]; users@spamassassin.apache.org
> > Subject:
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Thursday, June 19, 2008 5:49 PM
> To: Giampaolo Tomassoni
> Cc: [EMAIL PROTECTED]; users@spamassassin.apache.org
> Subject: Re: [Rule Set proposal] French Rules
>
> ...omitted...
>
Giampaolo Tomassoni writes:
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, June 19, 2008 5:28 PM
> > To: Giampaolo Tomassoni
> > Cc: [EMAIL PROTECTED]; users@spamassassin.apache.org
> > Subject:
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Thursday, June 19, 2008 5:28 PM
> To: Giampaolo Tomassoni
> Cc: [EMAIL PROTECTED]; users@spamassassin.apache.org
> Subject: Re: [Rule Set proposal] French Rules
>
>
>
Giampaolo Tomassoni writes:
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> > Sent: Wednesday, June 18, 2008 12:10 PM
> > To: John GALLET
> > Cc: users@spamassassin.apache.org
> > Subject: Re: [Rule Set propos
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, June 18, 2008 12:10 PM
> To: John GALLET
> Cc: users@spamassassin.apache.org
> Subject: Re: [Rule Set proposal] French Rules
>
> ...omitted...
>
> by the way, if you&
I am still missing samples for two rules: even though I did have hits according to
/var/spool/maillog, I did not save them.
I added a sample for the FR_NOTSPAM rule, and I removed the
FR_YOURELUCKY rule as I see other forms of the text getting through, so
it is not effective. On the other hand, nearly al
John GALLET writes:
> Hi,
>
> This is my first post on this list and first ruleset, so please point me
> to the right place/documents if I am doing anything wrong.
>
> According to a search of this list on markmail.org, there have been few
> subjects about spam in French and (no disrespect mea
Hi,
I was able to access the URL you mentioned, but not all of the files
below it. I received:
"Forbidden
You don't have permission to access /spam/FR_PAYLESSTAXES.txt on this server."
Sorry guys, only the ruleset file (the one I tried, of course) was
readable, all the non empty spam samples
On Tue, Jun 17, 2008 at 12:11 PM, John GALLET
<[EMAIL PROTECTED]> wrote:
> Hi,
>
> This is my first post on this list and first ruleset, so please point me to
> the right place/documents if I am doing anything wrong.
>
> According to a search of this list on markmail.org, there have been few
> subj