This is what I need, Bowie.
The query should be:
select from_address, from_domain, to_address, subject from maillog where 
subject REGEXP '\b(?:(?:FedEx|Shipment|702193383246|Notification)\b.*?){3}';

Unfortunately, MySQL gives an error:
ERROR 1139 (42000): Got error 'repetition-operator operand invalid' from regexp
MySQL regular expressions don't support Perl syntax such as (?:...) groups, lazy .*? quantifiers or \b
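Below is a possible workaround, sketched in Python under stated assumptions. MySQL's REGEXP returns 1 or 0, so adding one test per word counts how many distinct words a subject contains. Each individual test only needs syntax that the older MySQL regex library (Henry Spencer's) accepts. The word-boundary markers [[:<:]] and [[:>:]] belong to that older library; MySQL 8.0.4 and later switched to ICU, which uses \b instead. The table and column names come from the query above. The %s placeholders assume a driver such as PyMySQL or MySQLdb. The helper names are hypothetical.

```python
# Sketch: one REGEXP test per word, summed, so the query needs no
# Perl-only syntax such as (?:...), .*? or \b.

# Characters that are special in POSIX extended regular expressions.
ERE_SPECIAL = set(".[]()*+?{}|^$\\")


def ere_escape(word):
    """Backslash-escape POSIX ERE metacharacters in a literal word."""
    return "".join("\\" + c if c in ERE_SPECIAL else c for c in word)


def build_min_words_query(words, min_hits=3):
    """Return (sql, params) selecting maillog rows whose subject contains
    at least `min_hits` of `words`, each word counted at most once.

    Each `(subject REGEXP %s)` yields 1 or 0 in MySQL, so their sum is the
    number of distinct words present in the subject.
    """
    # [[:<:]] / [[:>:]] are word boundaries in MySQL's pre-8.0.4 regex library.
    patterns = ["[[:<:]]" + ere_escape(w) + "[[:>:]]" for w in words]
    hit_count = " + ".join("(subject REGEXP %s)" for _ in patterns)
    sql = ("SELECT from_address, from_domain, to_address, subject "
           "FROM maillog WHERE " + hit_count + " >= %s")
    # Patterns and the threshold are passed as parameters, never pasted in.
    return sql, patterns + [min_hits]
```

With a PyMySQL cursor, for example, this would be run as cursor.execute(sql, params). Because each word is counted once, the pattern also avoids the {3} regex's weakness of matching the same word three times.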


Nicola Piazzi
CED - Sistemi
COMET s.p.a.
Via Michelino, 105 - 40127 Bologna - Italia
Tel.  +39 051.6079.293
Cell. +39 328.21.73.470
Web: www.gruppocomet.it

From: Bowie Bailey [mailto:bowie_bai...@buc.com]
Sent: Wednesday, 28 September 2016 17:46
To: users@spamassassin.apache.org
Subject: Re: R: R: R: regular expression needed

I don't know of a way to do that with a simple regex.  But since you are 
writing a plugin, you could do it by parsing the output of a regex search.

1) Create a regex which will match on any combination of 3 of the words.  This 
will let you pull all of the possible matches from previous emails.
Something like this:  /\b(?:(?:word1|word2|word3|word4)\b.*?){3}/

2) For each of the lines found by the previous regex, run another regex that 
captures all matched words.
/\b(word1|word2|word3|word4)\b/g    (note the global modifier to catch all 
matches)

3) Take a look at the results for each line and see if the regex matched at 
least 3 unique words.
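Here is a minimal Python sketch of these three steps. It assumes the log lines are already in memory as strings. Python's re.findall stands in for Perl's /g matching. A real SpamAssassin plugin would be written in Perl, and the function name is hypothetical.

```python
import re


def has_n_distinct_words(line, words, n=3):
    """Prefilter, capture every hit, then count the unique hits."""
    alternation = "|".join(re.escape(w) for w in words)

    # Step 1: cheap prefilter, n hits from the word list in any order.
    # One word repeated n times also passes, hence step 3.
    prefilter = re.compile(r"\b(?:(?:%s)\b.*?){%d}" % (alternation, n))
    if not prefilter.search(line):
        return False

    # Step 2: capture every whole-word hit (findall plays the role of /g).
    hits = re.findall(r"\b(%s)\b" % alternation, line)

    # Step 3: require at least n *different* words.
    return len(set(hits)) >= n
```

For example, with the four FedEx words, "FedEx Shipment 722566383641 Notification" passes (three distinct words). "FedEx FedEx FedEx" gets through step 1 but fails step 3.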

I'm quite sure that this is not the most efficient method, but it should work.

Bowie
On 9/28/2016 11:20 AM, Nicola Piazzi wrote:
Obviously I intended to write a plugin that searches the DB.
But I need the regex syntax to match at least 3 of 4 given words.
Nicola Piazzi

From: Bowie Bailey [mailto:bowie_bai...@buc.com]
Sent: Wednesday, 28 September 2016 17:17
To: Nicola Piazzi <nicola.pia...@gruppocomet.it>; Spamassassin List <users@spamassassin.apache.org>
Subject: Re: R: R: regular expression needed

Please keep list emails on the list.

I don't think you could do a simple regex match for what you want.  As I said 
previously, this would require a plugin both to build the custom regex(es) (or 
DB query) and to search for the previous emails.  You would want to keep the 
prior email information in a database of some sort since doing a search of a 
large text file for every incoming email would probably be too slow.
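The sketch below illustrates the database suggestion. It uses Python's built-in sqlite3 so that it runs without a server; the thread's actual maillog table lives in MySQL. The received_at column and all function names are assumptions added for the example. The 10-day lookback mirrors the window Nicola describes.

```python
import sqlite3
from datetime import datetime, timedelta


def open_maillog(path=":memory:"):
    """Create the maillog table; received_at is an assumed extra column."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS maillog ("
        " received_at TEXT NOT NULL,"  # ISO-8601 text, same format/timezone
        " from_address TEXT, from_domain TEXT,"
        " to_address TEXT, subject TEXT)")
    # The index lets the lookback query skip old rows instead of scanning
    # everything, which is the advantage over grepping a flat log file.
    db.execute("CREATE INDEX IF NOT EXISTS maillog_received_at"
               " ON maillog (received_at)")
    return db


def log_message(db, received_at, from_address, to_address, subject):
    """Store one message, deriving the sender domain from the address."""
    from_domain = from_address.rsplit("@", 1)[-1].lower()
    db.execute("INSERT INTO maillog VALUES (?, ?, ?, ?, ?)",
               (received_at.isoformat(), from_address, from_domain,
                to_address, subject))


def recent_messages(db, now, days=10):
    """Rows received in the last `days` days, oldest first."""
    cutoff = (now - timedelta(days=days)).isoformat()
    return db.execute(
        "SELECT from_address, from_domain, to_address, subject"
        " FROM maillog WHERE received_at >= ? ORDER BY received_at",
        (cutoff,)).fetchall()
```

A plugin would call log_message for every incoming mail and recent_messages (or a narrower query) when checking a new one.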

Bowie
On 9/28/2016 10:05 AM, Nicola Piazzi wrote:
Flow:

I receive an email with the subject "Federal Express Important invoice number 20".
The plugin runs a regex search over the last 10 days of mail in the maillog database, and the search 
matches one or more rows.
So we know that similar mails were received in the past.
Receiving similar text is normal, but it is not so normal to receive the same 
subject from different addresses directed to different internal users.



Nicola Piazzi

From: Bowie Bailey [mailto:bowie_bai...@buc.com]
Sent: Wednesday, 28 September 2016 16:01
To: users@spamassassin.apache.org
Subject: Re: R: regular expression needed

I'm still not clear on exactly what you are trying to do, but in order to test 
anything against previous messages, you will need a custom SA plugin and some 
sort of database to store the information about previous emails.  That is 
beyond my area of expertise.

If you just need a regex to match something, I'd be happy to help, but I would 
need a more explicit description of what you are trying to match.

Bowie
On 9/28/2016 9:29 AM, Nicola Piazzi wrote:
Bowie, yours is a manual approach: it works, but it is not automated.
Automating it means a plugin that checks for similar words in older messages (for example, 
3 of 4 words match),
then checks whether the sender domain is different and the recipient is different.




From: Bowie Bailey [mailto:bowie_bai...@buc.com]
Sent: Wednesday, 28 September 2016 15:26
To: users@spamassassin.apache.org
Subject: Re: regular expression needed

On 9/28/2016 9:02 AM, Nicola Piazzi wrote:




We usually receive spam with subjects like these examples, in chronological order:





Subject                                      From                  To
FedEx Shipment 702193383647 Notification     j...@company1.com     s...@mycompany.it
FedEx Shipment 722566383641 Notification     a...@other.com        a...@mycompany.it
FedEx Shipment 734563383644 Notification     i...@company1.com     lo...@mycompany.it
A package for you jim                        b...@cocacola.com     j...@mycompany.it
A package for you sue                        j...@buster.com       s...@mycompany.it


These come from viruses that infect different PCs around the world and send the same 
spam.
I want to write a plugin that tests each email and gives a penalty to these messages.
Detection routine:

A mail arrives.
The subject is: FedEx Shipment 702193383647 Notification
I search the maillog table with a regex that matches FedEx Shipment 702193383647 
Notification and ALSO matches FedEx Shipment 722566383641 Notification AND FedEx 
Shipment 734563383644 Notification.
If it matches, I verify that the FROM DOMAIN IS DIFFERENT,
and then I verify that the TO ADDRESS IS DIFFERENT.
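The detection routine above could be sketched like this. The sketch swaps the regex for a simpler technique: it counts the words the two subjects share. It assumes history rows are dicts with the from_domain, to_address and subject fields from the maillog query. The function names and the min_shared threshold are hypothetical.

```python
def shared_words(subject_a, subject_b):
    """Number of distinct whitespace-separated words both subjects contain."""
    return len(set(subject_a.split()) & set(subject_b.split()))


def is_suspicious(new_msg, history, min_shared=3):
    """Flag new_msg if some earlier message has a similar subject but came
    from a different sender domain AND went to a different recipient.

    new_msg and each history row are dicts with the maillog fields
    from_domain, to_address and subject.
    """
    for old in history:
        if shared_words(new_msg["subject"], old["subject"]) < min_shared:
            continue  # subject not similar enough
        if (old["from_domain"] != new_msg["from_domain"]
                and old["to_address"] != new_msg["to_address"]):
            return True
    return False
```

"FedEx Shipment 702193383647 Notification" and "FedEx Shipment 722566383641 Notification" share three words (FedEx, Shipment, Notification). So with min_shared=3 they count as similar even though the tracking numbers differ.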

Now I need a regex syntax that takes all the words extracted from the phrase 
"FedEx Shipment 734563383644 Notification" and matches if at least 3 of the 4 words are found.
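The sketch below generates such a pattern from a subject line. It splits the phrase into words, escapes them, and repeats the alternation 3 times, giving the same shape as Bowie's suggested regex. This is Perl/Python regex syntax, which MySQL's REGEXP rejected in this thread. The function name is hypothetical.

```python
import re


def at_least_n_of(phrase, n=3):
    """Build a Perl/Python-style regex that hits when n occurrences of the
    phrase's words appear in a line, in any order.

    Caveat: a single word repeated n times also matches, so a check for
    distinct words is still needed afterwards.
    """
    words = phrase.split()
    alternation = "|".join(re.escape(w) for w in words)
    return r"\b(?:(?:%s)\b.*?){%d}" % (alternation, n)
```

For this phrase the result is \b(?:(?:FedEx|Shipment|734563383644|Notification)\b.*?){3}, which also matches the other FedEx subjects in the table because they share FedEx, Shipment and Notification.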

Can anyone help?

I don't follow exactly what you are trying to do in the description above, but 
for that problem, I would start with something like this:

header  __FEDEX_ADDR From:addr /\@fedex\.com/
header __FEDEX_SUBJ Subject /FedEx Shipment/
meta FEDEX_SPAM  __FEDEX_SUBJ && ! __FEDEX_ADDR
score FEDEX_SPAM 2.0

(Off the top of my head and completely untested.  Adjust score as required.)

This will hit any email with "FedEx Shipment" in the subject that doesn't come 
from fedex.com.  Note that it will also hit on any legitimate FedEx emails that 
have been forwarded.  You could minimize this by constraining the subject match 
to be at the beginning of the line (/^FedEx Shipment/).  This may or may not 
have an effect on spam detection.  You could also do a test for non-FedEx urls 
in the body rather than looking at the sender.
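Here is a rough Python emulation that shows how the meta rule combines its two sub-rules. It mirrors only the boolean logic, not SpamAssassin's own header parsing (From:addr extraction, decoding). The "Fwd: " subject prefix used in the example is an assumption about how a forwarded message might look. The names are hypothetical.

```python
import re

# Python stand-ins for the two header sub-rules (these patterns behave the
# same in Perl and Python).
FEDEX_ADDR = re.compile(r"\@fedex\.com")           # __FEDEX_ADDR, From:addr
FEDEX_SUBJ = re.compile(r"FedEx Shipment")         # __FEDEX_SUBJ, Subject
FEDEX_SUBJ_START = re.compile(r"^FedEx Shipment")  # anchored variant


def fedex_spam(from_addr, subject, anchored=False):
    """Emulate: meta FEDEX_SPAM  __FEDEX_SUBJ && ! __FEDEX_ADDR"""
    subj_rule = FEDEX_SUBJ_START if anchored else FEDEX_SUBJ
    return bool(subj_rule.search(subject)) and not FEDEX_ADDR.search(from_addr)
```

A forwarded copy such as "Fwd: FedEx Shipment ..." from a colleague hits with the unanchored rule but not with anchored=True. That is the trade-off described above.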

You could use a simple subject line test for the "A package for you" emails, 
unless you know of a valid delivery service that uses that phrase.

--
Bowie


