> >
> > m{https?://[^/\s]+?(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)(\/\[^\s])?}
> >
> >
> One of the amazing things about posting to lists is that shortly after
> posting I usually find the answer to the question. Well, I've now
> learned something about negative look-ahead assertions that I did not

Actually that is a negative lookBEHIND assertion that they are using.
Negative lookAHEAD is (?!, not (?<!.
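Here is a small illustration of the difference. A lookahead tests the text after the current position, and a lookbehind tests the text before it. The helper names and sample strings are made up for illustration only.

```perl
use strict;
use warnings;

# Negative lookahead (?!...): the text AFTER this point must not match.
# Here: the string must not start with "www.".
sub no_www_prefix { return $_[0] =~ m{^(?!www\.)} ? 1 : 0 }

# Negative lookbehind (?<!...): the text BEFORE this point must not match.
# Here: the first "/" must not be immediately preceded by ".com".
sub host_not_com { return $_[0] =~ m{^[^/]*(?<!\.com)/} ? 1 : 0 }

print no_www_prefix("www.example.com"), "\n";    # 0
print no_www_prefix("example.com"), "\n";        # 1
print host_not_com("example.com/page"), "\n";    # 0
print host_not_com("example.org/page"), "\n";    # 1
```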

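What this pattern says, in more or less plain English, is: match 'http',
possibly followed by 's', and then followed by '://'.  Then match one or
more characters that are not a / or whitespace, but don't be greedy about
it.  Non-greedy means "take as few characters as possible, and only take
more if the rest of the pattern fails", which is NOT the same as "stop at
the first / or space".  At the point where that run stops, check that the
preceding characters are not .com, not .net, and so on.  Because nothing
after the lookbehinds requires a / or space, the run as posted stops after
a single character (for http://www.example.com/ the whole match is just
"http://w"), so the lookbehinds never see the end of the host name.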
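A quick way to see what the non-greedy +? actually stops on is to print $& (the text of the last successful match) after matching. The sketch below uses the posted pattern unchanged, just rejoined onto one line. The sample URLs and the matched_text helper are hypothetical.

```perl
use strict;
use warnings;

# The pattern as posted, unchanged apart from being on one line.
my $posted = qr{https?://[^/\s]+?(?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)(\/\[^\s])?};

# Return the text the pattern actually matched, or undef if it did not match.
sub matched_text {
    my ($url) = @_;
    return $url =~ $posted ? $& : undef;
}

# Even a .com address matches: the lazy +? gives up after one character.
print matched_text("http://www.example.com/index.html"), "\n";   # http://w
```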

Then we get to the last part, which I suspect you added, since the coding
style is different, and it does some odd things.  In fact, I'm not at all
sure exactly what the intent was here.  I think perhaps it was trying to
match an optional / followed by a non-space character after the host
name.  If the host part really ran up to the next / or space, that would
add little, but as written nothing forces the non-greedy match to go that
far.

In any case, if that was the intent, it should have been coded as
"(?:/[^\s])?".  The ?: after the ( says that you are only using the parens
for grouping, and not as a capturing group, so Perl doesn't have to save
the matched text in $1.  This is faster, according to the Perl pundits.
You don't need a backslash in front of the slash in this case, because
the overall delimiter characters are {} instead of the more common //.
And you certainly don't want a backslash in front of the [ that starts the
character class: \[ matches a literal [ character, and the ^ after it is
then no longer a negation but a start-of-string anchor, so that group can
never match anything.  (If you did want a literal [, it is clearer to
escape the matching ] as well, although Perl treats a lone ] as literal.)
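A small sketch of what ends up in $1 for a capturing group, a non-capturing group, and the group exactly as posted. The sample URL is made up. A group that did not take part in the match, or a pattern with no groups at all, leaves $1 undefined.

```perl
use strict;
use warnings;

my $url = "http://example.org/x";

# Capturing group: on success, $1 holds what the parentheses matched.
my $cap = ($url =~ m{https?://[^/\s]+(/[^\s])?}) ? $1 : undef;

# Non-capturing group (?:...): groups for the ?, but sets no $1.
my $nocap = ($url =~ m{https?://[^/\s]+(?:/[^\s])?}) ? $1 : undef;

# As posted: \[ is a literal "[", and the ^ after it is a start-of-string
# anchor, so the optional group never matches and $1 stays undefined.
my $posted = ($url =~ m{https?://[^/\s]+(\/\[^\s])?}) ? $1 : undef;

printf "capturing:     %s\n", $cap    // "undef";   # /x
printf "non-capturing: %s\n", $nocap  // "undef";   # undef
printf "as posted:     %s\n", $posted // "undef";   # undef
```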

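(I suspect that the appropriate match here would be simply "[/\s]" to match
the slash or space that ends the host name.  Requiring that character is
what forces the non-greedy match to run all the way to the end of the
host, so the lookbehinds test the right characters; the catch is that a
URL at the very end of the string, with nothing after it, won't match.
A plain dot would not do: it matches any character, so the non-greedy
match could still stop after the first character of the host.)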
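One way to put the pieces together, offered as a sketch rather than a tested production filter. The helper name host_tld_not_listed and the sample URLs are hypothetical. Instead of "[/\s]" it uses the lookahead (?![^/\s]), which also accepts a URL at the very end of the string. A greedy + followed by that lookahead leaves exactly one place where the host can end, so the lookbehinds always test the real last characters of the host name.

```perl
use strict;
use warnings;

# True if the host does NOT end in one of the listed top-level domains.
# (?![^/\s]) requires the host to be followed by "/", whitespace, or the
# end of the string, so backtracking cannot shorten it to dodge a lookbehind.
my $unlisted_tld = qr{
    https?://
    [^/\s]+ (?![^/\s])
    (?<!\.com)(?<!\.net)(?<!\.org)(?<!\.gov)(?<!\.us)(?<!\.edu)(?<!\.mil)
}x;

sub host_tld_not_listed { return $_[0] =~ $unlisted_tld ? 1 : 0 }

for my $url ("http://www.example.co.uk/", "http://www.example.com/index.html",
             "https://example.org") {
    printf "%-36s %s\n", $url, host_tld_not_listed($url) ? "kept" : "skipped";
}
```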

        Loren
