On 2014-12-05 08:28, Axb wrote:
On 12/05/2014 05:20 PM, Kris Deugau wrote:
Axb wrote:
On 12/05/2014 01:15 AM, Ian Zimmerman wrote:
On Thu, 04 Dec 2014 22:41:13 +0100,
Axb <[email protected]> wrote:

Axb> To be able to create usable rules several times a day, I need feeds
Axb> that spit out *at least* 150k/day. As I don't have the data....

150k of what?  Bytes?  Emails?  Tokens?

Sorry, thought this was obvious...

SOUGHT-type rule generation extracts text strings from spam, so that means
150k+ spam messages per day.
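
For readers unfamiliar with the approach, here is a rough Python sketch of the general idea: keep strings that recur across spam messages and never show up in ham. It is not the actual SOUGHT tooling. The function names, the word n-gram approach, and the `n` and `min_spam_hits` parameters are all assumptions made for illustration.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return the set of lowercase word n-grams in one message body."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def candidate_strings(spam, ham, n=3, min_spam_hits=2):
    """Hypothetical helper: n-grams seen in at least `min_spam_hits`
    spam messages and in no ham message at all."""
    ham_seen = set()
    for msg in ham:
        ham_seen |= word_ngrams(msg, n)

    hits = Counter()
    for msg in spam:
        # Count each n-gram once per message, skipping anything found in ham.
        hits.update(word_ngrams(msg, n) - ham_seen)

    return sorted(s for s, count in hits.items() if count >= min_spam_hits)


spam = [
    "Buy cheap meds online now",
    "buy cheap meds online today",
    "see you at lunch tomorrow",
]
ham = ["see you at lunch tomorrow ok"]
print(candidate_strings(spam, ham))
```

With only a handful of messages almost nothing clears the `min_spam_hits` threshold, which is one way to see why corpus size matters so much for this kind of rule generation.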

It seems to work reasonably well for me with ~2-3K each of ham and spam,
and it provides a handful of subrules even with only ~225 messages of a
given spam subtype.  (I generate a number of sets of rules with different
subtypes of spam.)

It's probably not nearly as *effective* as it could be with larger
working sets.

Agreed.

... I use about 5-15k from the last 8 hours (the amount varies dramatically) per
rule-generation run *for local* use, but that's hardly representative of global coverage.

Add LKML to your large batch of training email and I bet you get "interesting" results, at best.

And one must always remember that one person's spam is another person's ham.

{o.o}
