Re: Best practice for learning submissions

2018-07-24 Thread John Hardin
On Tue, 24 Jul 2018, Nick Bright wrote: On 7/24/2018 9:58 AM, John Hardin wrote: However, unless you *really* trust the people who are providing training data, you don't train on the submissions without first reviewing them. Therefore, forwarding as an RFC-822 attachment isn't a deal-killer.
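
For what it's worth, pulling the attached message back out of a forwarded submission can be scripted, so the review step does not have to mean right-clicking each attachment. Below is a minimal sketch using Python's standard email package. The function name extract_attached_messages is hypothetical, and the sketch assumes the user's MUA really did attach the spam as a message/rfc822 part. Note that re-serialising a parsed message is not guaranteed to be byte-identical to what SA originally saw, so treat the output as "close enough to review", not as a verified copy.

```python
import email


def extract_attached_messages(raw_bytes):
    """Return each message/rfc822 attachment in a forwarded submission as bytes.

    Uses the default (compat32) policy so headers are left as close to the
    parsed form as possible when the inner message is written back out.
    """
    outer = email.message_from_bytes(raw_bytes)
    found = []
    for part in outer.walk():
        if part.get_content_type() != "message/rfc822":
            continue
        payload = part.get_payload()
        # For message/rfc822 the payload is a list holding one Message object.
        if isinstance(payload, list) and payload:
            found.append(payload[0].as_bytes())
    return found
```

Each returned item can be written to its own file in a review directory and only handed to sa-learn after someone has looked at it.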

Re: Best practice for learning submissions

2018-07-24 Thread Kris Deugau
Nick Bright wrote: On 7/23/2018 11:49 PM, Bill Cole wrote: The goal is to get a copy of the message that is identical to what SA saw when it arrived. For IMAP users, this is easiest to get with a 'missed spam' mailbox into which users can move messages for learning. If you must rely on

Re: Best practice for learning submissions

2018-07-24 Thread Bill Cole
On 24 Jul 2018, at 13:39, Nick Bright wrote: On 7/23/2018 11:49 PM, Bill Cole wrote: The goal is to get a copy of the message that is identical to what SA saw when it arrived. For IMAP users, this is easiest to get with a 'missed spam' mailbox into which users can move messages for learning.
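
Once a 'missed spam' folder has been reviewed, feeding it to sa-learn can also be scripted. A minimal sketch, assuming sa-learn is on the PATH and that the path passed in is a (hypothetical) directory of reviewed messages, one per file. The helper names are made up for illustration; --spam and --ham are the usual sa-learn switches.

```python
import subprocess


def sa_learn_command(path, kind="spam"):
    """Build the sa-learn argument list for a reviewed file or directory."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, path]


def learn(path, kind="spam"):
    """Run sa-learn on the given path; raises CalledProcessError on failure."""
    return subprocess.run(sa_learn_command(path, kind), check=True)
```

Keeping the command builder separate from the call makes it easy to log or dry-run what would be learned before touching the site-wide Bayes database.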

Re: Best practice for learning submissions

2018-07-24 Thread Nick Bright
On 7/24/2018 9:58 AM, John Hardin wrote: However, unless you *really* trust the people who are providing training data, you don't train on the submissions without first reviewing them. Therefore, forwarding as an RFC-822 attachment isn't a deal-killer. You can review the submission and if

Re: Best practice for learning submissions

2018-07-24 Thread Nick Bright
On 7/24/2018 1:47 AM, Pedro David Marco wrote: >On Tuesday, July 24, 2018, 1:38:59 AM GMT+2, Nick Bright wrote: >So I ask: what is the best practice for learning submissions when using site-wide bayes? Nick, do all your users use the same MUA? There are some user level "

Re: Best practice for learning submissions

2018-07-24 Thread Nick Bright
On 7/23/2018 11:49 PM, Bill Cole wrote: The goal is to get a copy of the message that is identical to what SA saw when it arrived. For IMAP users, this is easiest to get with a 'missed spam' mailbox into which users can move messages for learning. If you must rely on forwarded submissions,

Re: Best practice for learning submissions

2018-07-24 Thread Alex
Hi,
>> The problem I'm trying to solve is "how to implement a training system on
>> my server".
>
> I'd suggest a manual review step before feeding the messages to Bayes.
>
> You **WILL** get users reporting all kinds of "unwanted today because
> Reasons" but otherwise legitimate email as spam.

Re: Best practice for learning submissions

2018-07-24 Thread RW
On Mon, 23 Jul 2018 18:38:48 -0500 Nick Bright wrote: > When requesting submissions from users for use with sa-learn, if they > are going to forward the message somewhere; is it best for that to be > forwarded as an attachment, or forwarded inline? Don't use forward inline, the choice is

Re: Best practice for learning submissions

2018-07-24 Thread John Hardin
On Mon, 23 Jul 2018, Nick Bright wrote: On 7/23/2018 8:10 PM, Reindl Harald wrote: what exactly is the problem right-click on the attachments, save them to files and drag them to the imap training folder? What on earth is even right about it? I'm not going to do that for hundreds or even

Re: Best practice for learning submissions

2018-07-24 Thread John Hardin
On Mon, 23 Jul 2018, Nick Bright wrote: On 7/23/2018 7:30 PM, Reindl Harald wrote: provide imap-shared folders and code a script which fetches the raw-messages and fires sa-learn to the eml-files *never* train inline-forwardings And when that isn't an option, for example with POP3 clients?

Re: Best practice for learning submissions

2018-07-24 Thread RW
On Tue, 24 Jul 2018 10:34:42 -0400 Kris Deugau wrote:
> Kris Deugau wrote:
> > Nick Bright wrote:
> >
> >> The problem I'm trying to solve is "how to implement a training
> >> system on my server".
> >
> > I'd suggest a manual review step before feeding the messages to
> > Bayes.
> >

Re: Best practice for learning submissions

2018-07-24 Thread Kris Deugau
Kris Deugau wrote: Nick Bright wrote: The problem I'm trying to solve is "how to implement a training system on my server". I'd suggest a manual review step before feeding the messages to Bayes. You **WILL** get users reporting all kinds of "unwanted today because Reasons" but otherwise

Re: Best practice for learning submissions

2018-07-24 Thread Kris Deugau
Nick Bright wrote: The problem I'm trying to solve is "how to implement a training system on my server". I'd suggest a manual review step before feeding the messages to Bayes. You **WILL** get users reporting all kinds of "unwanted today because Reasons" but otherwise legitimate email as

Re: Best practice for learning submissions

2018-07-24 Thread Alex Woick
Nick Bright schrieb am 24.07.2018 um 01:38: So I ask: what is the best practice for learning submissions when using site-wide bayes? From what I learnt about best practice: - before implementing spam-learning based on user-submissions, figure out how educated your users

Re: Best practice for learning submissions

2018-07-24 Thread Pedro David Marco
On Tuesday, July 24, 2018, 6:50:13 AM GMT+2, Bill Cole wrote: > Learning ham is harder Totally agree Bill, unless you use Microsoft techniques: send everything to the spam folder, and if the user moves it to the inbox then... it is ham! -PedroD

Re: Best practice for learning submissions

2018-07-24 Thread Pedro David Marco
>On Tuesday, July 24, 2018, 1:38:59 AM GMT+2, Nick Bright wrote: >So I ask: what is the best practice for learning submissions when using >site-wide bayes? Nick, do all your users use the same MUA? There are some user level "plug-ins" that may be configured to

Re: Best practice for learning submissions

2018-07-23 Thread Bill Cole
So I ask: what is the best practice for learning submissions when using site-wide bayes? The goal is to get a copy of the message that is identical to what SA saw when it arrived. For IMAP users, this is easiest to get with a 'missed spam' mailbox into which users can move messages for learnin

Re: Best practice for learning submissions

2018-07-23 Thread David B Funk
On Mon, 23 Jul 2018, Nick Bright wrote: On 7/23/2018 7:55 PM, Reindl Harald wrote: and even if - what's the point to store the surrounding messages in the corpus which you should keep forever if you need rebuild from scratch later? what is the problem you try to solve and why can't you just

Re: Best practice for learning submissions

2018-07-23 Thread Nick Bright
On 7/23/2018 8:10 PM, Reindl Harald wrote: what exactly is the problem right-click on the attachments, save them to files and drag them to the imap training folder? What on earth is even right about it? I'm not going to do that for hundreds or even thousands of submissions sent in by users,

Re: Best practice for learning submissions

2018-07-23 Thread Nick Bright
On 7/23/2018 7:55 PM, Reindl Harald wrote: and even if - what's the point to store the surrounding messages in the corpus which you should keep forever if you need rebuild from scratch later? what is the problem you try to solve and why can't you just store the attachment instead the whole mail

Re: Best practice for learning submissions

2018-07-23 Thread Nick Bright
On 7/23/2018 7:49 PM, Reindl Harald wrote: surely, just right-click on the attachment and save it to a raw-message (eml-file) So that's a "no" (sa-learn doesn't know how to 'right click' an attachment). -- --- - Nick Bright

Re: Best practice for learning submissions

2018-07-23 Thread Nick Bright
On 7/23/2018 7:30 PM, Reindl Harald wrote: provide imap-shared folders and code a script which fetches the raw-messages and fires sa-learn to the eml-files *never* train inline-forwardings And when that isn't an option, for example with POP3 clients? Is it possible to train on attached

Best practice for learning submissions

2018-07-23 Thread Nick Bright
learning from a mailbox of my own spam (with full headers - the actual mails) is quite different from users *forwarding* spam for training. So I ask: what is the best practice for learning submissions when using site-wide bayes? -- --- - N