Craig Jackson wrote:
Hi,
Our small business never receives mail from top-level domains other than com, net, org, mil, edu, gov, and us -- except spam. Additionally, we never receive email with links containing other top-level domains -- except spam. The logic is that we are small and do no business outside our geographic area. So I wrote a body test to catch links that don't end in one of these top-level domains:


m{https?://[^/\s]+?(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)(\/\[^\s])?}

This I copied from the SpamAssassin test for odd ports. The logic is similar. However, I have never seen some of this notation. And of course the test doesn't work -- too many false positives.
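
One likely source of the false positives can be sketched in Python rather than Perl (the look-behind syntax is the same in both). The lazy `+?` stops after a single hostname character, and at that point none of the look-behinds can see a TLD yet, so they all pass. The sample URL and the name `NAIVE` are made up for illustration, and the rule's optional trailing group is left out because it does not change whether the pattern matches.

```python
import re

# The rule from the post, minus its optional trailing group.
NAIVE = re.compile(
    r"https?://[^/\s]+?"
    r"(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)"
)

# Hypothetical link to an allowed TLD; ideally the rule would not fire.
m = NAIVE.search("http://example.com/index.html")

# The match ends right after the first hostname character, long before
# the regex engine has reached ".com".
print(m.group(0))  # http://e
```

So the look-behinds are evaluated, but at the wrong place: nothing ties them to the end of the hostname.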

1) What do the enclosing {} mean?
2) What is the ?<! supposed to do?
3) Does this work with line wrapped links?
4) Shouldn't the domains be separated by | instead of all enclosed in ()?
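
On questions 2) and 4), a minimal sketch, again in Python for illustration: `(?<!X)` is a negative look-behind, which succeeds only if the text immediately before the current position is not X. Several in a row mean "none of these", so no | is needed between them. The sketch also adds a look-ahead to pin the check to the end of the hostname; that part is an assumption about the intent, not something in the original rule, and the names `ALLOWED`, `ODD_TLD`, and `has_odd_tld` are hypothetical.

```python
import re

# Allowed TLDs taken from the post; anything else is treated as suspicious.
ALLOWED = ("com", "net", "org", "gov", "us", "edu", "mil")

# One negative look-behind per TLD. All of them must pass, i.e. the
# hostname must end in none of the allowed TLDs.
lookbehinds = "".join(rf"(?<!\.{tld})" for tld in ALLOWED)

# [^/\s:]+ consumes the hostname; the look-ahead (?=[/\s:]|$) forces the
# look-behinds to be tested only where the hostname ends (assumption:
# the host is followed by "/", ":", whitespace, or end of text).
ODD_TLD = re.compile(
    rf"https?://[^/\s:]+{lookbehinds}(?=[/\s:]|$)", re.IGNORECASE
)

def has_odd_tld(text):
    """Return True if text contains a link whose host ends in an unlisted TLD."""
    return ODD_TLD.search(text) is not None

print(has_odd_tld("see http://example.com/page"))  # False
print(has_odd_tld("see http://example.ru/page"))   # True
```

This sketch does not handle every URL shape. For example, a hostname written with a trailing dot would still be flagged.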

If you would point to a tutorial that covers this I would be grateful. I have checked a few beginner regex sites and even read most of the regex book, but don't remember this particular syntax.


One of the amazing things about posting to lists is that shortly after posting I usually find the answer to the question. Well, I've now learned something about negative look-behind assertions that I did not know before. But I still haven't found answers to questions 1) and 3) above.
Thanks
