David G. Andersen wrote:
body GENETICS_DATA /([ACGT]{3,}[CGT][ACGT]?\s*){3,}/
describe GENETICS_DATA A, C, T, G, who do we appreciate?
score GENETICS_DATA -5
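For anyone who wants to poke at the rule above before dropping it into a config, here is a minimal sketch in Python rather than SpamAssassin itself. It assumes Python's `re` module treats this particular pattern the same way Perl does, which should hold for these simple character classes. The helper name `hits` and the sample lines are made up for illustration.

```python
import re

# Pattern copied verbatim from the GENETICS_DATA body rule above.
GENETICS_DATA = re.compile(r"([ACGT]{3,}[CGT][ACGT]?\s*){3,}")


def hits(line: str) -> bool:
    """Return True if the pattern matches anywhere in the line."""
    return GENETICS_DATA.search(line) is not None


# Hypothetical sample lines, not taken from any real corpus.
samples = [
    "ACGTTGCA GGCTAACG TTGACCGT",          # sequence data in blocks
    "Please send the invoice by Friday.",  # ordinary prose
    "Zm9vGGGGCCCCTTTTYmFy",                # ACGT run buried in a longer token
]

for line in samples:
    print(hits(line), line)
# True ACGTTGCA GGCTAACG TTGACCGT
# False Please send the invoice by Friday.
# True Zm9vGGGGCCCCTTTTYmFy
```

The third line shows why this version can fire on encoded attachments: nothing in the pattern stops it from matching in the middle of a longer token.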
Ahh, heck. Here's a better one for all of the geneticists
on the list (one of them? :-):
anyone else seeing false-positives more often with 2.11?
Yes, I have had to roll back to 2.01.
Geoff Gibbs
UK-Human Genome Mapping Project-Resource Centre,
Hinxton, Cambridge, CB10 1SB, UK
Tel: +44 1223 494530 Fax: +44 1223 494512 E-mail: [EMAIL PROTECTED]
David G. Andersen wrote:
anyone else seeing false-positives more often with 2.11?
Yes, I have had to roll back to 2.01.
A bit of a suggestion, since you're seeing false positives in a highly
specific domain. I've been creating word-frequency-based whitelists
from various mailing
Ed Henderson wrote:
anyone else seeing false-positives more often with 2.11?
Yes, I have had to roll back to 2.01.
I have not seen more false positives but have seen a significant improvement
with false negatives. From my experience it is an improvement over 2.01.
Previously I have
One thing to try, for your particular situation.
This rule could match in some strange base-64
encoded files, but it's extremely unlikely -- I ran it through
my spam corpus, and it hit 7 lines out of 260 megabytes, so
you should be OK:
body GENETICS_DATA
Ahh, heck. Here's a better one for all of the geneticists
on the list (one of them? :-):
/\b([ACGT]{1,}\s*[CGT]\s*[ACGT]{1,}\s*){3,}\b/
The addition of the word boundary test also avoids all of the
false matches from my corpus. Requires that the sequence
be at least 9 bps, and have at least
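
A quick way to check the word-boundary behaviour and the 9 bp minimum outside SpamAssassin is a small Python sketch like the one below. It assumes Python's `re` handles `\b` and these character classes the same way Perl does for this pattern. The helper name `strict_hits` and the test strings are invented for illustration.

```python
import re

# Pattern copied verbatim from the word-boundary version quoted above.
STRICT = re.compile(r"\b([ACGT]{1,}\s*[CGT]\s*[ACGT]{1,}\s*){3,}\b")


def strict_hits(line: str) -> bool:
    """Return True if the word-boundary pattern matches anywhere in the line."""
    return STRICT.search(line) is not None


# Hypothetical test strings, not taken from any real corpus.
print(strict_hits("ACGTTGCA GGCTAACG TTGACCGT"))  # True: blocks of sequence data
print(strict_hits("Zm9vGGGGCCCCTTTTYmFy"))        # False: ACGT run inside a longer token
print(strict_hits("ACGTCGATG"))                   # True: 9 bases, three groups of three
print(strict_hits("ACGTCGAT"))                    # False: 8 bases cannot fill three groups
```

Each repetition of the group needs at least three letters (one or more, then one of C/G/T, then one or more), and the group must repeat three times, which is where the 9 bp floor comes from.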