> > m{https?://[^/\s]+?(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)(\/\[^\s])?}
>
> One of the amazing things about posting to lists is that shortly after
> posting I usually find the answer to the question. Well, I've now
> learned something about negative look-ahead assertions that I did not

Actually, that is a negative lookBEHIND assertion that they are using. Negative lookAHEAD is (?!, not (?<!.

What this test is saying, in more or less English, is: Match 'http', possibly followed by 's', and then followed by '://'. Then match one or more characters that are not a / or whitespace, but don't be greedy about it (which means take as few characters as the rest of the pattern will allow, not as many as possible). Once that run stops (the intent being that it stops just before the first / or space), check that the preceding characters are not .com, and not .net, etc.

Then we get to the last part, which I suspect you added, since the coding style is different, and it does some odd things. In fact, I'm not at all sure exactly what the intent was here. I think perhaps it was trying to look for an optional / followed by a non-space character after the host name. But the host part can only end at a / or whitespace (or the end of the text) anyway, because the character class won't match either one.

In any case, if that was the intent, it should have been coded as "(?:/[^\s])?". The ?: after the ( says that you are only using the parens for grouping, and not as a capturing group. This is faster, according to the Perl pundits, because Perl doesn't have to save what the group matched. You don't need a backslash in front of the slash in this case, because the overall delimiter characters are {} instead of the more common //. And you certainly don't want a backslash in front of the [ character that starts the character class, unless you wanted to match a literal [ character. In that case it is clearer to put a backslash in front of the ] character as well, although a lone ] is taken literally anyway.

(I suspect that the appropriate match here would be simply "[/\s]" to match the slash or space that ends the host name. Making it required matters: with only an optional group after the lookbehinds, the non-greedy match is free to stop after the first character of the host name, where none of the lookbehinds can fail. For the same reason a plain dot won't do here.)

Loren
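
The points above can be checked with a few small patterns. This is only a sketch, written in Python rather than Perl on the assumption that the lookbehind, lookahead and non-capturing group syntax behaves the same in both for these cases. The names, the sample strings and the cut-down list of three domains are made up for illustration and are not part of the original rule.

```python
import re

# Lookbehind vs. lookahead: (?<!X) tests the text BEFORE the current
# position, (?!X) tests the text AFTER it.
not_after_foo = re.compile(r"(?<!foo)bar")    # "bar" not preceded by "foo"
not_before_bar = re.compile(r"foo(?!bar)")    # "foo" not followed by "bar"

# Capturing vs. non-capturing: (...) saves what it matched, (?:...) only groups.
capturing = re.compile(r"a(/[^\s])?")
grouping_only = re.compile(r"a(?:/[^\s])?")

# Escaping the [ turns the character class into literal text, so this
# pattern can no longer match a slash followed by a non-space character.
escaped_bracket = re.compile(r"/\[^\s]")

# Hypothetical cut-down version of the rule: three domains instead of seven,
# ending in a required [/\s]. The required ending forces the non-greedy host
# match to run to the end of the host name before the lookbehinds can succeed.
lookbehinds = "".join(rf"(?<!\.{tld})" for tld in ("com", "net", "org"))
odd_tld = re.compile(r"https?://[^/\s]+?" + lookbehinds + r"[/\s]")

if __name__ == "__main__":
    for text in ("see http://example.com/ now", "see http://example.biz/ now"):
        print(text, "->", bool(odd_tld.search(text)))
    print(capturing.search("a/b").groups())
    print(grouping_only.search("a/b").groups())
```

With these sample strings the .com URL is left alone and the .biz URL is flagged. The capturing pattern returns one group and the non-capturing one returns none, even though both match the same text.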