[algogeeks] Re: Judging whether a URL exists among millions, insert if not

2008-08-22 Thread [EMAIL PROTECTED]
I think you had better set a unique constraint on the URL column and create an index on it. All the methods above have to search the table at least once anyway, so adding the constraint and letting the DB handle it is the most efficient approach.
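A minimal sketch of the unique-constraint approach suggested above, using Python's built-in sqlite3 module as a stand-in for whatever database the original poster has. The table name urls, the column name url, and the helper names open_store and insert_if_absent are assumptions made for illustration; they do not come from the thread.

```python
import sqlite3


def open_store():
    """Create an in-memory table whose url column carries a UNIQUE constraint."""
    conn = sqlite3.connect(":memory:")
    # In SQLite, a UNIQUE constraint is backed by an index on the column,
    # so duplicate detection does not require a full table scan.
    conn.execute("CREATE TABLE urls (url TEXT NOT NULL UNIQUE)")
    return conn


def insert_if_absent(conn, url):
    """Try to insert url; return True if it was new, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO urls (url) VALUES (?)", (url,))
        return True
    except sqlite3.IntegrityError:
        # The database rejected the row because the URL is already present.
        return False


conn = open_store()
print(insert_if_absent(conn, "http://example.com/a"))  # first insert
print(insert_if_absent(conn, "http://example.com/a"))  # duplicate
```

The existence check and the insert happen in one statement here, so there is no separate lookup that could race with another writer.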

[algogeeks] Re: Judging whether a URL exists among millions, insert if not

2008-08-21 Thread Abdul Habra
Two things:
1. select count(*) from table where HASH_CODE=hc and select count(HASH_CODE) from table where HASH_CODE=hc are equivalent.
2. Hash code uniqueness is not guaranteed. Say your hash code is a 32-bit signed integer: you can have at most 2^32 distinct hash codes (roughly 4 billion). On
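To put a rough number on the point that hash codes are not unique, here is a small sketch of the standard birthday-problem approximation. It assumes the hash spreads URLs uniformly over all 32-bit values, which real hash functions only approximate; the function names are hypothetical.

```python
import math


def expected_colliding_pairs(n, bits=32):
    """Expected number of URL pairs sharing a hash code, assuming uniform hashing."""
    return n * (n - 1) / (2 * 2 ** bits)


def collision_probability(n, bits=32):
    """Approximate probability that at least two of n URLs share a hash code."""
    # Birthday approximation: 1 - exp(-n(n-1) / (2 * 2^bits)).
    return -math.expm1(-n * (n - 1) / (2 * 2 ** bits))


for n in (10_000, 100_000, 1_000_000):
    print(n, collision_probability(n), expected_colliding_pairs(n))
```

Under this assumption, one million URLs already give on the order of a hundred colliding pairs, so a lookup by hash code alone would report some URLs as present when they are not.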

[algogeeks] Re: Judging whether a URL exists among millions, insert if not

2008-08-21 Thread Fred
On Aug 21, 12:38 pm, Ashish Chugh [EMAIL PROTECTED] wrote: A few more suggestions. Instead of select count(*) from table where HASH_CODE=hc and URL='urlToFind', using select count(HASH_CODE) from table where HASH_CODE=hc is better, since HASH_CODE is unique. You can cache all hash codes or

[algogeeks] Re: Judging whether a URL exists among millions, insert if not

2008-08-20 Thread Ashish Chugh
Instead of MD5, I think hashCode will suffice. It would also be unique for each URL and would take fewer bytes.
Regards, /Ashish
On Wed, Aug 20, 2008 at 2:12 PM, Fred [EMAIL PROTECTED] wrote: Hi, all: I've got the following problem: there are millions of URLs in the database, and

[algogeeks] Re: Judging whether a URL exists among millions, insert if not

2008-08-20 Thread Abdul Habra
I agree with Ashish. Use hashCode. Here is my suggestion: add a new column to your DB table, let's call it HASH_CODE. Whenever you add a URL row, populate HASH_CODE with the hash code of the URL. When you want to check for the existence of a URL: select count(*) from table where HASH_CODE=hc
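The HASH_CODE column idea can be sketched as follows, again with Python's sqlite3 module standing in for the real database. This version compares the URL as well as the hash code, as in the HASH_CODE=hc and URL='urlToFind' form of the query quoted elsewhere in this thread, so that two URLs with the same hash code are not confused. zlib.crc32 is only a stand-in for a 32-bit hashCode, and the names url_hash, open_hashed_store, url_exists and add_url are assumptions for illustration.

```python
import sqlite3
import zlib


def url_hash(url):
    """Stand-in 32-bit hash; the thread's hashCode would play this role."""
    return zlib.crc32(url.encode("utf-8"))


def open_hashed_store():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE urls (hash_code INTEGER NOT NULL, url TEXT NOT NULL)"
    )
    # Index the short integer column so lookups do not scan the whole table.
    conn.execute("CREATE INDEX idx_urls_hash ON urls (hash_code)")
    return conn


def url_exists(conn, url, hash_fn=url_hash):
    """Narrow by hash code via the index, then confirm with the full URL."""
    row = conn.execute(
        "SELECT 1 FROM urls WHERE hash_code = ? AND url = ? LIMIT 1",
        (hash_fn(url), url),
    ).fetchone()
    return row is not None


def add_url(conn, url, hash_fn=url_hash):
    """Insert url if it is not already stored; return True when inserted."""
    if url_exists(conn, url, hash_fn):
        return False
    with conn:
        conn.execute(
            "INSERT INTO urls (hash_code, url) VALUES (?, ?)",
            (hash_fn(url), url),
        )
    return True


conn = open_hashed_store()
print(add_url(conn, "http://example.com/a"))
print(add_url(conn, "http://example.com/a"))
```

Note that the check and the insert are two separate statements in this sketch, so with concurrent writers it would still need a unique constraint or a transaction to avoid duplicate rows.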