On 12/02/2015 12:06 PM, Raymond Bakker wrote:
> Hello,
>
> ==Summary==
> We are experiencing different DLP behavior for complex RegEx between two
> installations.
>
> ==System==
> Version: ciphermail-virtual-appliance-2.10.0-3.
> 1. Ubuntu pre-made virtual appliance (on my laptop)
> 2. Red Hat & CentOS gateway package (on a test server)
>
> ==Configuration==
> DLP: several triggers with "Must Encrypt"
> Settings: Encrypt Mode "No Encryption"
> Settings: DLP Patterns added
>
> ==Example==
> We want to search a message for [any text][four numbers][any text]
> So we try this RegEx: *.\d{4}.*
>
> This works perfectly on the Ubuntu VA, but it encrypts EVERY message on
> CentOS.
> Everything is back to normal when we disable the complex RegEx on CentOS.
>
> We also tried to search for a little more simple like: [0-9][0-9][0-9][0-9]
> Ubuntu version is fine, CentOS version encrypts every message.
>
> ==DLP Trigger Comparison ==
> Ubuntu version:
> - Single words work as expected
> - Mail header works as expected
> - Complex *.\d{4}.* works as expected
>
> CentOS version:
> - Single words work as expected
> - Mail header works as expected
> - Complex *.\d{4}.* works DIFFERENT
>
> Does anyone have experience with this situation?
>
> Is our installation perhaps incorrect?

It's quite likely that a message contains 4 digits. Could it be that the
mail sent via the CentOS gateway was sent with a different mail application
than the mail sent via the virtual appliance?

The DLP text extractor also extracts header values, so a Date header, for
example, is extracted as well. Since almost all mails contain a Date header,
almost any mail will contain 4 digits.

If you have the "raw" MIME content, you can see what text the DLP engine
sees during scanning by uploading the MIME message to the "extract text"
tool (Admin -> other -> extract text). The "extract text" tool returns the
normalized text.

> So we try this RegEx: *.\d{4}.*

If you want to trigger on 4 digits, you should use \d{4}, i.e., skip the .*
part. The .* is not needed and only makes scanning slower. The regular
expression is not required to match the complete text, i.e., .* is in effect
implicitly added to any regular expression.

Kind regards,

CipherMail support

> Cheers,
>
> Raymond Bakker | Integration Consultant
>
> T +31 (0)10 288 1600
> M +31 (0)6 2222 5515
> E raymond.bak...@vanadgroup.com
>
> VANAD Enovation
> Rivium Westlaan 1
> 2909 LD Capelle aan den IJssel
> The Netherlands
>
> This e-mail is personal. For our disclaimer, please visit
> www.vanadgroup.com/disclaimer

--
CipherMail email encryption

Email encryption with support for S/MIME, OpenPGP, PDF encryption and
secure webmail pull.

https://www.ciphermail.com

Twitter: http://twitter.com/CipherMail
_______________________________________________
Users mailing list
Users@lists.djigzo.com
https://lists.djigzo.com/lists/listinfo/users
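
To illustrate the two points made in the reply (header values are part of the scanned text, and a pattern does not have to match the complete text), here is a minimal sketch in Python. It is not CipherMail code: the sample text and the triggers helper are made up for illustration, and Python's re.search is only assumed to behave like the unanchored matching described above. Note also that Python rejects the pattern exactly as quoted, with its leading *, so the sketch uses .*\d{4}.* as the presumably intended form.

```python
import re

# Hypothetical stand-in for the text the DLP extractor might return:
# a couple of header values followed by a body that contains no digits.
extracted_text = (
    "Date: Wed, 02 Dec 2015 12:06:00 +0100\n"
    "Subject: hello\n"
    "\n"
    "No numbers in this body."
)

# The pattern recommended in the reply, and the padded variant.
four_digits = re.compile(r"\d{4}")
padded = re.compile(r".*\d{4}.*")


def triggers(pattern, text):
    """Return True if the pattern is found anywhere in text (unanchored search)."""
    return pattern.search(text) is not None


# Both patterns fire on the full extracted text because of the Date header.
header_hit = triggers(four_digits, extracted_text)
padded_hit = triggers(padded, extracted_text)

# The body alone does not contain four consecutive digits.
body_only_hit = triggers(four_digits, "No numbers in this body.")

# The first run of four digits comes from the year in the Date header.
first_match = four_digits.search(extracted_text).group(0)

print(header_hit, padded_hit, body_only_hit, first_match)
```

In this sketch both patterns give the same yes/no answer, which is why the surrounding .* adds nothing except extra work for the matcher. A trigger on four digits would need a more specific pattern, or text that excludes headers, to avoid firing on nearly every message.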