Re: URL parsing for the hard cases

2007-07-23 Thread Miles
On 7/23/07, Miles wrote:
> On 7/22/07, John Nagle wrote:
> > Is there any library function that correctly tests for an IP address vs. a
> > domain name based on syntax, i.e. without looking it up in DNS?
>
> import re, string
>
> NETLOC_RE = re.compile(r'''^ #start of string
> (?:([EMAIL PR

Re: URL parsing for the hard cases

2007-07-23 Thread Miles
On 7/22/07, John Nagle wrote:
> Is there any library function that correctly tests for an IP address vs. a
> domain name based on syntax, i.e. without looking it up in DNS?
    import re, string
    NETLOC_RE = re.compile(r'''^ #start of string
    (?:([EMAIL PROTECTED])+@)?# 1:
    (?:\[
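For readers on a current interpreter, the syntax-only test asked about here can be sketched with the standard-library ipaddress module. That module arrived in Python 3.3, so it was not an option when this thread was written, and this is not the regex approach the post above begins to show. The helper name classify_host is hypothetical, and the sketch only recognises canonical dotted-decimal IPv4 and standard IPv6 literals.

```python
import ipaddress

def classify_host(host):
    """Return 'ipv4', 'ipv6', or 'name' from syntax alone (no DNS lookup)."""
    # URLs wrap IPv6 literals in square brackets; strip them if present.
    if host.startswith("[") and host.endswith("]"):
        host = host[1:-1]
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Anything that is not a well-formed literal is treated as a name.
        return "name"
    return "ipv6" if addr.version == 6 else "ipv4"

for candidate in ("192.0.2.1", "[2001:db8::1]", "example.com", "0x7f.0.0.1"):
    print(candidate, "->", classify_host(candidate))
```

Note that hex and octal spellings such as 0x7f.0.0.1 fall into the 'name' bucket here, because ipaddress rejects them.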

Re: URL parsing for the hard cases

2007-07-22 Thread Miles
On 7/23/07, John Nagle wrote:
> Here's another hard case. This one might be a bug in urlparse:
>
> import urlparse
>
> s = 'ftp://administrator:[EMAIL PROTECTED]/originals/6 june 07/ebay/login/ebayisapi.html'
>
> urlparse.urlparse(s)
>
> yields:
>
> (u'ftp', u'administrator:[EMAIL PROTECTED]', u

Re: URL parsing for the hard cases

2007-07-22 Thread John Nagle
Here's another hard case. This one might be a bug in urlparse:
    import urlparse
    s = 'ftp://administrator:[EMAIL PROTECTED]/originals/6 june 07/ebay/login/ebayisapi.html'
    urlparse.urlparse(s)
yields:
    (u'ftp', u'administrator:[EMAIL PROTECTED]', u'/originals/6 june 07/ebay/login/ebayisapi.html
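The preview is cut off before it says what the suspected bug is, so the following only illustrates how a URL of this shape splits today. It uses urllib.parse, the Python 3 home of the old urlparse module. The password and host are hypothetical stand-ins, since the archive redacted the real ones.

```python
from urllib.parse import urlsplit, quote

# Hypothetical stand-in URL: the credentials and host in the post are redacted.
s = ("ftp://administrator:secret@ftp.example.com"
     "/originals/6 june 07/ebay/login/ebayisapi.html")

parts = urlsplit(s)

# netloc keeps the userinfo; the accessor attributes separate it out.
print(parts.netloc)
print(parts.username, parts.password, parts.hostname, parts.port)

# The unescaped spaces survive in the path; quote() percent-encodes them.
print(parts.path)
print(quote(parts.path))
```

The point of the sketch is that netloc is the raw authority component, so code that wants only the host should read hostname rather than netloc.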

Re: URL parsing for the hard cases

2007-07-22 Thread John Nagle
[EMAIL PROTECTED] wrote:
> Once you eliminate IPv6 addresses, parsing is simple. Is there a
> colon? Then there is a port number. Does the left over have any
> characters not in [0123456789.]? Then it is a name, not an IPv4
> address.
>
> --Michael Dillon
>
You wish. Hex input of IP address
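The reply is cut off after "Hex input of IP address", so the sketch below rests on an assumption about what is meant: the classic inet_aton-style notations, where each part may be hex (0x prefix), octal (leading 0), or decimal, and the last part fills the remaining bytes. Both function names are hypothetical. naive_classify implements the quoted rule literally, and parse_legacy_ipv4 shows inputs that the rule would label as names even though they denote an IPv4 address under that notation.

```python
import re

# One inet_aton-style number: hex (0x..), octal (leading 0), or decimal.
_NUMBER = re.compile(r"0[xX][0-9a-fA-F]+|0[0-7]*|[1-9][0-9]*")

def naive_classify(netloc):
    """The quoted rule: a colon means a port; only digits and dots mean IPv4."""
    host, _, port = netloc.partition(":")
    is_ipv4 = bool(host) and all(c in "0123456789." for c in host)
    return ("ipv4" if is_ipv4 else "name"), host, (port or None)

def parse_legacy_ipv4(host):
    """Return the 32-bit address for inet_aton-style input, else None."""
    parts = host.split(".")
    if not 1 <= len(parts) <= 4:
        return None
    values = []
    for part in parts:
        if not _NUMBER.fullmatch(part):
            return None
        if part[:2].lower() == "0x":
            values.append(int(part, 16))
        elif part.startswith("0"):
            values.append(int(part, 8))
        else:
            values.append(int(part, 10))
    *leading, last = values
    # Leading parts are single bytes; the last part fills the remaining bytes.
    if any(v > 0xFF for v in leading) or last >= 256 ** (4 - len(leading)):
        return None
    address = last
    for i, v in enumerate(leading):
        address |= v << (8 * (3 - i))
    return address

for text in ("127.0.0.1", "0x7f.0.0.1", "0x7f000001", "0177.0.0.1", "example.com"):
    print(text, naive_classify(text)[0], parse_legacy_ipv4(text))
```

Under these assumptions, 0x7f.0.0.1 and 0x7f000001 both parse to the same 32-bit value as 127.0.0.1, while the digits-and-dots rule reports them as names.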

Re: URL parsing for the hard cases

2007-07-22 Thread memracom
On 22 Jul, 18:56, John Nagle <[EMAIL PROTECTED]> wrote:
> Is there something available that will parse the "netloc" field as
> returned by URLparse, including all the hard cases? The "netloc" field
> can potentially contain a port number and a numeric IP address. The
> IP address may take man

Re: URL parsing for the hard cases

2007-07-22 Thread Miles
On 7/22/07, John Nagle wrote:
> Is there something available that will parse the "netloc" field as
> returned by URLparse, including all the hard cases? The "netloc" field
> can potentially contain a port number and a numeric IP address. The
> IP address may take many forms, including an IPv6

URL parsing for the hard cases

2007-07-22 Thread John Nagle
Is there something available that will parse the "netloc" field as returned by URLparse, including all the hard cases? The "netloc" field can potentially contain a port number and a numeric IP address. The IP address may take many forms, including an IPv6 address. I'm parsing URLs used b
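As a rough modern answer to the question, the netloc can be handed back to urllib.parse (Python 3) to separate host and port, and the host can then be checked with the ipaddress module. This is a minimal sketch and not something proposed in the thread. split_netloc is a hypothetical name, and non-canonical IPv4 spellings (hex, octal, fewer than four parts) are reported as names.

```python
import ipaddress
from urllib.parse import urlsplit

def split_netloc(netloc):
    """Split a netloc into (host, port, kind); kind is 'ipv4', 'ipv6' or 'name'."""
    # Prefixing '//' makes urlsplit treat the whole string as a netloc.
    parts = urlsplit("//" + netloc)
    host = parts.hostname or ""   # lower-cased; userinfo and brackets removed
    port = parts.port             # int or None; ValueError if not a valid port
    try:
        kind = "ipv%d" % ipaddress.ip_address(host).version
    except ValueError:
        kind = "name"
    return host, port, kind

for netloc in ("[2001:db8::1]:8080", "user:pw@192.0.2.7:21", "Example.COM"):
    print(netloc, "->", split_netloc(netloc))
```

Relying on the hostname and port attributes avoids hand-splitting on the colon, which would break on bracketed IPv6 literals.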