Does anyone have a good PCRE for matching URLs?

All of the examples that I have looked at in various places are too
simple, exclude invalid characters rather than include valid ones
(and of course fail to exclude all bad characters), or don't properly
use escaping, etc.

Or perhaps someone can improve (or correct) the expression I'm using currently:

  $expr = '[a-zA-Z0-9]{1,10}://[a-zA-Z0-9.-]+[\p{L}/~._-]*|mailto:[EMAIL PROTECTED]';

The expression breakdown is:

  [a-zA-Z0-9]{1,10} - Protocol specifier (e.g. http, ftps, smb, gopher, ...)
  :// - Protocol host separator (mailto style handled by or condition)
  [a-zA-Z0-9.-]+ - The hostname (currently we assume only ASCII)
  [\p{L}/~._-]* - A UTF-8 path (probably needs to allow some other chars, but not '?')
  |mailto:[EMAIL PROTECTED] - Or a mailto URL
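
For anyone who wants to try the pieces listed above, here is a minimal sketch of how the scheme://host/path part might be run through preg_match_all(). It is only an illustration under a few assumptions: the '#' delimiter and the 'u' (UTF-8) modifier are my choices, the sample strings are made up, and the mailto alternative is left out because it is obscured in the archived message.

```php
<?php
// Only the "scheme://host/path" alternative from the post; the mailto
// alternative is omitted because it is obscured in the archived message.
$expr = '[a-zA-Z0-9]{1,10}://[a-zA-Z0-9.-]+[\p{L}/~._-]*';

// Assumption: '#' is used as the delimiter so '/' needs no escaping, and the
// 'u' modifier makes PCRE treat pattern and subject as UTF-8, which is what
// lets \p{L} match a multi-byte letter such as 'é' as one character.
$pattern = '#' . $expr . '#u';

// Hypothetical sample text (the source file must be saved as UTF-8).
$text = 'See http://example.com/docs/café.html?x=1 or smb://files.example.org/~mike';

preg_match_all($pattern, $text, $m);
$urls = $m[0];   // full matches, in order of appearance
print_r($urls);

// The path class contains no digits, so a digit ends the match early.
preg_match($pattern, 'http://example.com/page2', $m2);
echo $m2[0], "\n";
```

With these inputs the first call should pick out both URLs and stop before the '?', as intended. The second call stops before the '2', which suggests that 0-9 is one of the "other chars" the path class still needs.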

Mike

-- 
Michael B Allen
PHP Active Directory SPNEGO SSO
http://www.ioplex.com/
_______________________________________________
New York PHP Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk