On 12/03/2015 12:34 PM, CipherMail support wrote:
> On 12/03/2015 12:24 PM, Raymond Bakker wrote:
>>>> Hello,
>>>>
>>>> ==Summary==
>>>> We are experiencing different DLP behavior for complex RegEx between two
>>>> installations.
>>>>
>>>>
>>>> ==System==
>>>> Version: ciphermail-virtual-appliance-2.10.0-3.
>>>> 1. Ubuntu pre-made virtual appliance (on my laptop)
>>>> 2. Red Hat & CentOS gateway package (on a test server)
>>>>
>>>>
>>>> ==Configuration==
>>>> DLP: several triggers with "Must Encrypt"
>>>> Settings: Encrypt Mode "No Encryption"
>>>> Settings: DLP Patterns added
>>>>
>>>>
>>>> ==Example==
>>>> We want to search a message for [any text][four numbers][any text]
>>>> So we tried this RegEx: *.\d{4}.*
>>>>
>>>> This works perfectly on the Ubuntu VA, but it encrypts EVERY message on
>>>> CentOS.
>>>> Everything is back to normal when we disable the complex RegEx on CentOS.
>>>>
>>>> We also tried something a little simpler, like: [0-9][0-9][0-9][0-9]
>>>> The Ubuntu version is fine; the CentOS version encrypts every message.
>>>>
>>>>
>>>> ==DLP Trigger Comparison==
>>>> Ubuntu version:
>>>> - Single words work as expected
>>>> - Mail header works as expected
>>>> - Complex *.\d{4}.* works as expected
>>>>
>>>> CentOS version:
>>>> - Single words work as expected
>>>> - Mail header works as expected
>>>> - Complex *.\d{4}.* works DIFFERENTLY
>>>>
>>>>
>>>> Does anyone have experience with this situation?
>>>>
>>>> Is our installation perhaps incorrect?
>>>
>>> It's quite likely that a message contains 4 digits. Could it be that the
>>> mail sent via the CentOS gateway is sent with some other mail app than
>>> the mail sent via the virtual appliance?
>>>
>>>>> We will look at this tomorrow, but I'm quite sure it is a default
>>>>> installation as described in the CipherMail guide.
>>>
>>> The DLP text extractor also extracts header values. So, for example, a
>>> date header will also be extracted.
>>> Since almost all mails contain a date header, almost any mail will
>>> contain 4 digits.
>>>
>
>> That's true. The original is 8 digits (simulating a Dutch personal ID),
>> but I get the point. What I don't understand (yet) is that my testing
>> method and messages are the same on Ubuntu and CentOS, yet it works
>> on the Ubuntu version.
>
> Are you sure that the messages sent via the Ubuntu version are exactly
> the same as the messages sent via the CentOS version? If, for example, the
> message sent via the Ubuntu system is sent by Zimbra but the message
> sent via the CentOS version is sent via Exchange, then it's kind of
> comparing apples and oranges. It might be that one mail client (server?)
> adds certain headers with 8 digits while the other mail client (server?)
> does not.
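The header point can be made concrete with a minimal Python sketch. It assumes the DLP check behaves like a search anywhere in the extracted text (header values plus body), and the Date header below is made up for illustration; CipherMail's actual extractor and matcher are not shown here. It also shows that the pattern as typed in the report, `*.\d{4}.*`, starts with a `*` that has nothing to repeat, so Python's `re` (and Java's `java.util.regex`) reject it; the sketch assumes the intended form was `.*\d{4}.*`.

```python
import re

# Hypothetical extracted text: header values plus body, as a DLP extractor
# that also reads headers might see it. Not taken from a real message.
extracted_text = (
    "Date: Thu, 03 Dec 2015 12:24:00 +0100\n"
    "Subject: Test\n"
    "Hello, this body contains no ID number at all.\n"
)
# Third line only: the body without any header values.
body_only = extracted_text.splitlines()[2]


def is_valid_pattern(pattern: str) -> bool:
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def triggers(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


# As typed in the report: the leading '*' is a dangling quantifier.
print(is_valid_pattern(r"*.\d{4}.*"))
# Assumed intended form, plus the simpler character-class variant.
print(triggers(r".*\d{4}.*", extracted_text))
print(triggers(r"[0-9][0-9][0-9][0-9]", extracted_text))
# Without the headers, the same check finds nothing.
print(triggers(r"\d{4}", body_only))
```

Both four-digit patterns fire on the header text alone ("2015" and "0100" in the Date value), while the body by itself does not trigger.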
To make false positives less likely, it might help to require that a match
consists of exactly 8 digits. With your original reg exp, any digit sequence
of 8 or more digits would trigger. The following reg exp only triggers on
digit sequences of exactly 8 digits:

\b\d{8}\b

Note: \b matches a word boundary.

Kind regards,

CipherMail support

--
CipherMail email encryption

Email encryption with support for S/MIME, OpenPGP, PDF encryption and
secure webmail pull.

https://www.ciphermail.com

Twitter: http://twitter.com/CipherMail
_______________________________________________
Users mailing list
Users@lists.djigzo.com
https://lists.djigzo.com/lists/listinfo/users
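The effect of the word boundaries in the suggested `\b\d{8}\b` can be checked with a minimal Python sketch. The sample strings are made up for illustration, and it assumes CipherMail's matcher treats `\d` and `\b` the way Python's `re` does for simple ASCII input like this.

```python
import re

# Unanchored pattern: finds 8 digits inside any longer run of digits.
unbounded = re.compile(r"\d{8}")
# Suggested pattern: a word boundary on both sides, so the run of digits
# must be exactly 8 long.
bounded = re.compile(r"\b\d{8}\b")

# Hypothetical sample strings, for illustration only.
samples = [
    "Personal ID: 12345678",             # exactly 8 digits
    "Message-ID: <1449142240123456@x>",  # 16-digit run inside a header
    "Order 123456789",                   # 9 digits
    "Short 1234",                        # too few digits
]

for text in samples:
    print(f"{text!r}: unbounded={unbounded.search(text) is not None}, "
          f"bounded={bounded.search(text) is not None}")
```

Only the first sample triggers the bounded pattern, while the unanchored one also fires on the 16-digit and 9-digit runs. Because `\b` sits between a word character and a non-word character, an 8-digit number written directly against letters or an underscore (for example `NL12345678`) will not match the bounded pattern either.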