Does anyone have a good PCRE for matching URLs? All of the examples that I have looked at in various places are too simple, exclude invalid characters rather than include valid ones (and of course fail to exclude all bad characters), or don't use escaping properly, and so on.
Or perhaps someone can improve (or correct) the expression I'm currently using:

    $expr = '[a-zA-Z0-9]{1,10}://[a-zA-Z0-9.-]+[\p{L}/~._-]*|mailto:[EMAIL PROTECTED]';

The expression breakdown is:

    [a-zA-Z0-9]{1,10}         - Protocol specifier (e.g. http, ftps, smb, gopher, ...)
    ://                       - Protocol/host separator (mailto style is handled by the or condition)
    [a-zA-Z0-9.-]+            - The hostname (currently we assume only ASCII)
    [\p{L}/~._-]*             - A UTF-8 path (probably need to allow some other chars but not '?')
    |mailto:[EMAIL PROTECTED] - Or a mailto URL

Mike

-- 
Michael B Allen
PHP Active Directory SPNEGO SSO
http://www.ioplex.com/
_______________________________________________
New York PHP Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk

NYPHPCon 2006 Presentations Online
http://www.nyphpcon.com

Show Your Participation in New York PHP
http://www.nyphp.org/show_participation.php
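
For anyone who wants to poke at the expression, here is a minimal test sketch. It is not a corrected pattern. It covers only the `://` branch, since the address part of the mailto branch is redacted in the archived post. The `#` delimiter, the `\A`/`\z` anchors, the `u` modifier, the helper name `matches_url_branch`, and the `example.org` sample URLs are all assumptions added for illustration.

```php
<?php
// Only the "scheme://host/path" branch of the posted expression. The mailto
// branch is left out because its address pattern is redacted in the post.
$branch = '[a-zA-Z0-9]{1,10}://[a-zA-Z0-9.-]+[\p{L}/~._-]*';

// '#' is used as the delimiter because the pattern itself contains '/' and '~'.
// \A and \z force the whole string to match; the 'u' modifier makes PCRE treat
// both pattern and subject as UTF-8, so \p{L} sees whole letters, not bytes.
$pattern = '#\A(?:' . $branch . ')\z#u';

// Hypothetical helper, named here for illustration only.
function matches_url_branch(string $pattern, string $candidate): bool
{
    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $candidate) === 1;
}

$samples = [
    'http://www.ioplex.com/',
    'ftp://example.org/~mike/docs',
    "http://example.org/p\u{00E1}gina", // non-ASCII letter in the path
    'http://example.org/page2',         // digits are not in the path class
    'http://example.org/a?b=c',         // '?' is excluded on purpose
    'example.org/path',                 // no protocol specifier
];

foreach ($samples as $sample) {
    $verdict = matches_url_branch($pattern, $sample) ? 'match' : 'no match';
    printf("%-32s %s\n", $sample, $verdict);
}
```

With the anchors in place, the first three samples should match and the last three should not. The `page2` case suggests the path class may also need `0-9` (or `\p{N}`), since `\p{L}` covers letters only.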