Craig Jackson wrote:
Hi,
Our small business never receives mail from top level domains other than
com, net, org, mil, edu, gov, and us -- except spam. Additionally, we never
receive email with links containing other top level domains -- except spam.
The logic is that we are small and do no business outside our geographic
area. So I wrote a body test for checking links that don't have these
top level domains:
m{https?://[^/\s]+?(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)(\/\[^\s])?}
I adapted this from the SpamAssassin test for odd ports, since the logic is
similar. However, I have never seen some of this notation. And of course
the test doesn't work -- too many false positives.
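
A minimal Python sketch of one likely reason for the false positives. It
assumes the optional trailing group of the rule can be dropped (being
optional, it cannot decide whether a match is found) and that Python's re
module treats these constructs the way Perl does. The names ALLOWED,
original, anchored and flags are made up for illustration, and the
anchored variant is only one possible fix, not a tested SpamAssassin rule.

```python
import re

ALLOWED = ("com", "net", "org", "gov", "us", "edu", "mil")

# One negative look-behind per allowed TLD, as in the rule above.
lookbehinds = "".join(rf"(?<!\.{tld})" for tld in ALLOWED)

# Same shape as the rule above: the lazy +? may stop after a single host
# character, where none of the look-behinds can possibly fail.
original = re.compile(r"https?://[^/\s]+?" + lookbehinds)

# Hypothetical variant: consume the host greedily, then insist that the
# host ends here (slash, colon, whitespace or end of text follows).
anchored = re.compile(r"https?://[^/\s:]+" + lookbehinds + r"(?=[/:\s]|$)")


def flags(pattern, text):
    """Return True if the pattern finds a match anywhere in text."""
    return pattern.search(text) is not None


print(flags(original, "http://www.example.com/index.html"))
print(flags(anchored, "http://www.example.com/index.html"))
print(flags(anchored, "http://spam.example.ru/buy"))
```

In this sketch the original shape flags the .com link as well, because the
look-behinds are checked right after the first host character rather than
at the end of the host name.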
1) What do the enclosing {} mean?
2) What is the ?<! supposed to do?
3) Does this work with line wrapped links?
4) Shouldn't the domains be separated by | instead of all enclosed in ()?
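
For questions 2) to 4), a small Python sketch of how these constructs
behave at the plain regex level. It says nothing about how SpamAssassin
prepares the message body before running a rule, and the names chained,
ends_outside, host_part and host_seen are hypothetical.

```python
import re

# Two chained negative look-behinds are checked at the same position and
# must both hold: the text just before it ends in neither ".com" nor ".us".
# No | is needed, because each (?<!...) is its own "not preceded by" test.
chained = re.compile(r"(?<!\.com)(?<!\.us)$")


def ends_outside(host):
    """True if host ends in neither .com nor .us."""
    return chained.search(host) is not None


# [^/\s] cannot cross whitespace, and \s includes the newline character,
# so a host that is split across two lines is only seen up to the break.
host_part = re.compile(r"https?://([^/\s]+)")


def host_seen(text):
    """Return the host portion the pattern captures, or None."""
    m = host_part.search(text)
    return m.group(1) if m else None


print(ends_outside("example.com"), ends_outside("example.ru"))
print(host_seen("http://www.exam\nple.com/x"))
```

Keeping the look-behinds separate also sidesteps a restriction in Python's
re module, which rejects a single look-behind whose alternatives differ in
width (such as \.com and \.us).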
If you would point to a tutorial that covers this I would be grateful. I
have checked a few beginner regex sites and even read most of the regex
book, but don't remember this particular syntax.
One of the amazing things about posting to lists is that shortly after
posting I usually find the answer to the question. Well, I've now
learned something about negative look-behind assertions that I did not
know about. But I still haven't found answers to questions 1) and 3) above.
Thanks