On 2014-12-05 08:28, Axb wrote:
On 12/05/2014 05:20 PM, Kris Deugau wrote:
Axb wrote:
On 12/05/2014 01:15 AM, Ian Zimmerman wrote:
On Thu, 04 Dec 2014 22:41:13 +0100,
Axb <[email protected]> wrote:
Axb> To be able to create usable rules, several times/day I need feeds
Axb> to spit *at least* +150k/day. As I don't have the data....
150k of what? Bytes? Emails? Tokens?
Sorry, thought this was obvious...
SOUGHT-type rule generation extracts text strings from spams, so it means
150k+ spams/day.
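To make the volume question concrete, here is a rough sketch of the general idea (strings that recur across spams and never show up in ham). It is not the actual SOUGHT tooling; the function name, the fixed substring length, and the `min_hits` threshold are all made up for illustration.

```python
from collections import Counter


def candidate_strings(spams, hams, length=20, min_hits=2):
    """Return substrings of a fixed length that occur in at least
    `min_hits` spam messages and in no ham message.

    Hypothetical helper for illustration only; real rule generation
    is considerably more involved.
    """
    counts = Counter()
    for msg in spams:
        # Count each substring once per message so that a single long
        # spam cannot push a string over the threshold on its own.
        seen = {msg[i:i + length] for i in range(len(msg) - length + 1)}
        counts.update(seen)

    # Keep strings that recur across spams and never appear in ham.
    return sorted(
        s for s, n in counts.items()
        if n >= min_hits and not any(s in ham for ham in hams)
    )
```

With a small corpus few strings clear the `min_hits` bar, and a thin ham set filters out fewer false positives, which is why the size of the feed matters.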
It seems to work reasonably well for me with ~2-3K each of ham and spam,
and it provides a handful of subrules even with only ~225 messages of a
given spam subtype. (I generate a number of sets of rules with different
subtypes of spam.)
It's probably not nearly as *effective* as it could be with larger
working sets.
Agreed.
... I use about 5-15k from the last 8 hours (the amount varies dramatically) per
rule-generation run *for local* use, but that's hardly representative of global coverage.
Add LKML to your large batch of training email and I bet you get "interesting"
results, at best.
And one must always remember that one person's spam is another person's ham.
{o.o}