On Thu, Jul 24, 2008 at 2:37 PM, John Campbell <[EMAIL PROTECTED]> wrote:
> On Thu, Jul 24, 2008 at 2:19 PM, Michael B Allen <[EMAIL PROTECTED]> wrote:
>> Does anyone have a good PCRE for matching URLs?
>>
>> Or perhaps someone can improve (or correct) the expression I'm using 
>> currently:
>>
>>  $expr = '[a-zA-Z0-9]{1,10}://[a-zA-Z0-9.-]+[\p{L}/~._-]*|mailto:[EMAIL PROTECTED]';
>>
>
> I am not sure I completely understand what you are trying to do, but
> it doesn't look like you are matching + or %.

You mean in the path? In the path I suppose I should permit quite a
few more characters (I forgot 0-9 too). This makes the expression:

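$expr = '[a-zA-Z0-9]{1,10}://[a-zA-Z0-9.-]+[\p{L}0-9!$%&\\()+./;=^_~-]*|mailto:[EMAIL PROTECTED]';

(The '-' sits at the end of the path class so that "+-." is not read as
a range, which would also let ',' through.)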
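
For anyone trying the expression as-is: it still needs delimiters, and
\p{L} only sees multi-byte letters when the pattern is compiled in UTF-8
mode. Here is a minimal sketch of that, assuming '#' as the delimiter
and a made-up sample string; the mailto branch is left out because the
archive obscures it.

```php
<?php
// Path class from the message, with '-' placed last so that "+-." is not
// interpreted as a character range. The mailto branch is omitted here.
$expr = '[a-zA-Z0-9]{1,10}://[a-zA-Z0-9.-]+[\p{L}0-9!$%&\\()+./;=^_~-]*';

// '#' is an arbitrary delimiter choice (the expression contains no '#').
// The 'u' modifier makes PCRE treat pattern and subject as UTF-8, which
// \p{L} needs in order to match letters such as "è".
$pattern = '#' . $expr . '#u';

// Hypothetical sample text, not from the original thread.
$text = 'See http://www.example.com/usèrs/~jerry/y_a-n.g for details';

if (preg_match($pattern, $text, $m)) {
    echo $m[0], "\n";
}
```

Without the 'u' modifier the match would stop at the first byte of "è",
so the flag matters as soon as the path contains non-ASCII letters.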

> What is the context for the matching?

This will be used to pick out URLs in Creole Wiki markup. Incidentally,
a Creole URL is not supposed to end with characters that can occur
naturally at the end of a sentence (,.?!:;"'), so I guess I need to
leave out '.' and ';' for my particular application.

So given markup:

  Please visit http://www.yahoo.com/usèrs+100%&lusers$/~jerry/y_a-n.g/Yahoo;=^(!)foo.

The regex should match (minus the dot at the end):

  [http://www.yahoo.com/usèrs+100%&lusers$/~jerry/y_a-n.g/Yahoo;=^(!)foo]

although in practice a URL this crazy should probably be formalized
with square brackets as defined by Creole for links.
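
One possible way to get that behaviour without removing '.' and ';'
from the path class altogether is a negative lookbehind after the path,
so the match backs off any trailing sentence punctuation. This is only
a sketch under my own assumptions: find_urls() is a hypothetical helper
name, '#' is an arbitrary delimiter, and the mailto branch is left out.

```php
<?php
// Hypothetical helper: returns every URL-like match found in $text.
function find_urls(string $text): array
{
    // Path characters; '-' is last so it is taken literally.
    $path = '[\p{L}0-9!$%&\\()+./;=^_~-]*';

    // The match may not end on sentence punctuation: , . ? ! : ; " '
    // The greedy path backtracks until this lookbehind is satisfied.
    $notTrailing = '(?<![,.?!:;"\'])';

    // 'u' enables UTF-8 mode so that \p{L} matches letters like "è".
    $pattern = '#[a-zA-Z0-9]{1,10}://[a-zA-Z0-9.-]+' . $path . $notTrailing . '#u';

    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$markup = 'Please visit http://www.yahoo.com/usèrs+100%&lusers$/~jerry/y_a-n.g/Yahoo;=^(!)foo.';
print_r(find_urls($markup));
```

The trade-off is that a URL which really does end in '!' or '.' loses
that last character, which seems acceptable for wiki text where such
URLs can be wrapped in explicit link markup instead.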

Mike

-- 
Michael B Allen
PHP Active Directory SPNEGO SSO
http://www.ioplex.com/
_______________________________________________
New York PHP Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk