Re: [SQL] Extracting hostname from URI column

2007-09-18 Thread John Summerfield
Paul Lambert wrote: Paul Lambert wrote: chester c young wrote: I'm trying to use substr() and position() functions to extract the full host name (and later a domain) from a column that holds URLs. substring( href from '.*://\([^/]*)' ); Ok, your solution looks better than mine... but I hav

Re: [SQL] Extracting hostname from URI column

2007-09-17 Thread Andrew Sullivan
On Mon, Sep 17, 2007 at 12:18:56AM +0100, Gregory Stark wrote: > > Not all countries break up their national tld space into sections like .co.uk > or .com.au. Canadian domains can be bought directly under .ca like amazon.ca. > > I think you'll have to store a specific list of tlds and how deep yo

Re: [SQL] Extracting hostname from URI column

2007-09-16 Thread Gregory Stark
"Paul Lambert" <[EMAIL PROTECTED]> writes: > What would be the best way to do something like that? I assume it won't be a > simple regex like the above due to the country codes on the end of some > domains. My thought is look at the last portion of the domain, if it's 2 > characters long then assu

Re: [SQL] Extracting hostname from URI column

2007-09-16 Thread Paul Lambert
Paul Lambert wrote: chester c young wrote: I'm trying to use substr() and position() functions to extract the full host name (and later a domain) from a column that holds URLs. substring( href from '.*://\([^/]*)' ); Ok, your solution looks better than mine... but I have no idea how to int

Re: [SQL] Extracting hostname from URI column

2007-09-12 Thread Alvaro Herrera
Paul Lambert wrote: > Andrej Ricnik-Bay wrote: > >> Plain regex The key are the parenthesis () ... >> basically it will omit ANYTHING + two slashes at the beginning >> of a string. Then it will match everything BUT a slash, and as >> much of that as possible since regex are greedy by default >

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread Paul Lambert
Paul Lambert wrote: Just use distinct... test=# select distinct count(*),substring( href from '.*://([^/]*)' ) as domain from url group by domain order by domain; OK so distinct was redundant there... it gives the same results without it. AutoDRS=# select count(*) as occurances,substring( h

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread ogjunk-pgjedan
substring func, people! Otis - Original Message From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> To: pgsql-sql@postgresql.org Sent: Tuesday, September 11, 2007 11:16:15 PM Subject: Re: [SQL] Extracting hostname from URI column Hi, Thanks, perfect! (though I'll have

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread chester c young
> And what I'd like is something that would give me the counts for the > number of occurrences of each unique hostname. Something much like > `uniq -c'. Can anyone tell me how that's done or where I should look > for info? (I'm not sure what to look for, that's the problem). > select substring

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread Paul Lambert
[EMAIL PROTECTED] wrote: And what I'd like is something that would give me the counts for the number of occurrences of each unique hostname. Something much like `uniq -c'. Can anyone tell me how that's done or where I should look for info? (I'm not sure what to look for, that's the problem)

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread chester c young
> >> I'm trying to use substr() and position() functions to extract the > >> full host name (and later a domain) from a column that holds URLs. > > > > substring( href from '.*://\([^/]*)' ); > > typo: no backslash in front of left paren substring( href from '.*://([^/]*)' ) match up thru // wi

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread ogjunk-pgjedan
essage From: chester c young <[EMAIL PROTECTED]> To: [EMAIL PROTECTED] Cc: sql pgsql Sent: Tuesday, September 11, 2007 8:42:46 PM Subject: Re: [SQL] Extracting hostname from URI column > I'm trying to use substr() and position() functions to extract the > full host name

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread Paul Lambert
Andrej Ricnik-Bay wrote: Plain regex The key are the parenthesis () ... basically it will omit ANYTHING + two slashes at the beginning of a string. Then it will match everything BUT a slash, and as much of that as possible since regex are greedy by default (hence the host name he was lookin

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread Andrej Ricnik-Bay
On 9/12/07, Paul Lambert <[EMAIL PROTECTED]> wrote: > > substring( href from '.*://\([^/]*)' ); > Ok, your solution looks better than mine... but I have no idea how to > interpret that, time to consult some manuals. Plain regex The key are the parenthesis () ... basically it will omit ANYTHING

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread Paul Lambert
chester c young wrote: I'm trying to use substr() and position() functions to extract the full host name (and later a domain) from a column that holds URLs. substring( href from '.*://\([^/]*)' ); Ok, your solution looks better than mine... but I have no idea how to interpret that, time to

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread Paul Lambert
[EMAIL PROTECTED] wrote: Hi, I'm trying to use substr() and position() functions to extract the full host name (and later a domain) from a column that holds URLs. This is what I'm trying, but it clearly doesn't do the job. => select substr(href, position('://' in href)+3, position('://' in hr

Re: [SQL] Extracting hostname from URI column

2007-09-11 Thread chester c young
> I'm trying to use substr() and position() functions to extract the > full host name (and later a domain) from a column that holds URLs. substring( href from '.*://\([^/]*)' ); Pinpoint customers who a

[SQL] Extracting hostname from URI column

2007-09-11 Thread ogjunk-pgjedan
Hi, I'm trying to use substr() and position() functions to extract the full host name (and later a domain) from a column that holds URLs. This is what I'm trying, but it clearly doesn't do the job. => select substr(href, position('://' in href)+3, position('://' in href)+3+position('/' in href)