Re: Top level domain test -- somewhat OT

Craig Jackson Mon, 30 May 2005 18:39:01 -0700

Craig Jackson wrote:

Hi,
Our small business never receives mail from top level domains other than com, net, org, mil, edu, gov, and us -- except spam. Additionally, we never receive email with links containing other top level domains -- except spam. The logic is that we are small and do no business outside our geographic area. So I wrote a body test for checking links that don't have these top level domains:
m{https?://[^/\s]+?(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)(\/\[^\s])?}
This I copied from the SpamAssassin test for odd ports. The logic is similar. However, I have never seen some of this notation. And of course the test doesn't work -- too many false positives.
1) What do the enclosing {} mean?
2) What is the ?<! supposed to do?
3) Does this work with line wrapped links?
4) Shouldn't the domains be separated by | instead of all enclosed in ()?
If you would point to a tutorial that covers this I would be grateful. I have checked a few beginner regex sites and even read most of the regex book, but don't remember this particular syntax.

One of the amazing things about posting to lists is that shortly after posting I usually find the answer to the question. Well, I've now learned something about negative look-ahead assertions that I did not know about. But I still haven't found answers to questions 1) and 3) above.
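
For anyone following along, here is a rough sketch of what the rule above seems to be doing, and one possible reason for the false positives. It is written in Python rather than as a SpamAssassin rule, purely so it can be run standalone; the names ALLOWED_TLDS, ORIGINAL, ANCHORED and flags_link are made up for illustration. Two notes on the notation: in Perl, m{...} is the ordinary match operator with braces as delimiters instead of slashes, and (?<!...) is a negative look-behind ("the text just before this point must not be ..."). Because the host part is matched lazily and nothing pins down where the host ends, the look-behinds can be satisfied after the very first host character, so every link matches. The ANCHORED variant is only an assumption about what was intended, not a tested rule.

```python
import re

# TLDs the original rule treats as legitimate.
ALLOWED_TLDS = ("com", "net", "org", "gov", "us", "edu", "mil")

# One fixed-width negative look-behind per allowed TLD, as in the original rule.
LOOKBEHINDS = "".join(rf"(?<!\.{tld})" for tld in ALLOWED_TLDS)

# Same shape as the posted rule: lazy host match, end of host not anchored.
# (The optional trailing group is left out; being optional, it cannot
# change whether the pattern matches.)
ORIGINAL = re.compile(rf"https?://[^/\s]+?{LOOKBEHINDS}", re.IGNORECASE)

# Hypothetical variant: greedy host match, and a look-ahead requiring that
# the host really ends here (at '/', ':', whitespace, or end of text).
ANCHORED = re.compile(
    rf"https?://[^/\s:]+{LOOKBEHINDS}(?=[/\s:]|$)", re.IGNORECASE
)


def flags_link(text: str) -> bool:
    """Return True if the anchored variant finds a link outside ALLOWED_TLDS."""
    return ANCHORED.search(text) is not None


if __name__ == "__main__":
    sample = "see http://example.com/page"
    # The unanchored pattern stops after one host character.
    print(ORIGINAL.search(sample).group(0))
    print(flags_link(sample))
    print(flags_link("see http://example.ru/page"))
```

This sketch does not deal with links wrapped across lines (\s also matches a newline, so a wrapped host is cut at the line break), nor with trailing punctuation such as a full stop directly after the link.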

Thanks
