Re: [SAtalk] last false positive from 2.11 (7/7)

2002-03-08 Thread Geoff Gibbs
David G. Andersen wrote:
body GENETICS_DATA /([ACGT]{3,}[CGT][ACGT]?\s*){3,}/
describe GENETICS_DATA A, C, T, G, who do we appreciate?
score GENETICS_DATA -5
Ahh, heck. Here's a better one for all of the geneticists on the list (one of them? :-):

Re: [SAtalk] last false positive from 2.11 (7/7)

2002-03-07 Thread Geoff Gibbs
anyone else seeing false-positives more often with 2.11? Yes, I have had to roll back to 2.01. Geoff Gibbs UK-Human Genome Mapping Project-Resource Centre, Hinxton, Cambridge, CB10 1SB, UK Tel: +44 1223 494530 Fax: +44 1223 494512 E-mail: [EMAIL PROTECTED]

Re: [SAtalk] last false positive from 2.11 (7/7)

2002-03-07 Thread Geoff Gibbs
David G. Andersen wrote: anyone else seeing false-positives more often with 2.11? Yes, I have had to roll back to 2.01. A bit of a suggestion, since you're seeing false positives in a highly specific domain. I've been creating word-frequency-based whitelists from various mailing

RE: [SAtalk] last false positive from 2.11 (7/7)

2002-03-07 Thread Geoff Gibbs
Ed Henderson wrote: anyone else seeing false-positives more often with 2.11? Yes, I have had to roll back to 2.01. I have not seen more false positives but have seen a significant improvement with false negatives. From my experience it is an improvement over 2.01. Previously I have

Re: [SAtalk] last false positive from 2.11 (7/7)

2002-03-07 Thread David G. Andersen
One thing to try, for your particular situation. This rule could match in some strange base-64 encoded files, but it's extremely unlikely -- I ran it through my spam corpus, and it hit 7 lines out of 260 megabytes, so you should be OK: body GENETICS_DATA

Re: [SAtalk] last false positive from 2.11 (7/7)

2002-03-07 Thread David G. Andersen
Ahh, heck. Here's a better one for all of the geneticists on the list (one of them? :-): /\b([ACGT]{1,}\s*[CGT]\s*[ACGT]{1,}\s*){3,}\b/ The addition of the word boundary test also avoids all of the false matches from my corpus. Requires that the sequence be at least 9 bps, and have at least
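For anyone wanting to try the two patterns outside SpamAssassin, here is a minimal sketch in Python. It assumes Python's `re` module treats these particular constructs the same way Perl does. The names `FIRST_RULE`, `REVISED_RULE`, and `hits` are made up for illustration, and the sample strings are invented, not taken from anyone's corpus.

```python
import re

# Patterns copied from the thread; the variable names are illustrative only.
FIRST_RULE = re.compile(r"([ACGT]{3,}[CGT][ACGT]?\s*){3,}")
REVISED_RULE = re.compile(r"\b([ACGT]{1,}\s*[CGT]\s*[ACGT]{1,}\s*){3,}\b")


def hits(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return pattern.search(text) is not None


# Invented sample lines (assumptions, not real mail).
samples = [
    "ACGTACGTACGT",       # plain run of bases
    "ACGT ACGT ACGT",     # bases separated by whitespace
    "ACGTACGT",           # only 8 bases, below the 9-base minimum
    "AAAAAAAAAAAA",       # no C, G, or T anywhere
    "xxACGTACGTACGTxx",   # bases buried inside a longer token
]

for line in samples:
    print(f"{line!r}: first={hits(FIRST_RULE, line)} "
          f"revised={hits(REVISED_RULE, line)}")
```

The last sample is the interesting one: the first pattern matches inside the longer token, while the revised pattern does not, because `\b` requires the run to start and end at a word boundary. That is the kind of embedded match the word-boundary test is meant to rule out.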