I think you had better set a unique constraint on the URL column and
create an index on URL.
All the methods above have to search the table at least once anyway, so
adding the constraint and letting the DB handle it is the most efficient.
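For illustration, here is a minimal sketch of the constraint approach. It assumes SQLite through Python's standard sqlite3 module; the table name `urls` and the helper `add_url` are made up for the example and do not come from the original poster's schema. In SQLite a UNIQUE constraint is backed by an automatically created index, so no separate index statement is needed here; other databases may differ.

```python
import sqlite3

def add_url(conn, url):
    """Insert url; return True if it was new, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO urls (url) VALUES (?)", (url,))
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected a duplicate URL.
        return False

# Hypothetical schema: one table with a unique URL column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT NOT NULL UNIQUE)")

first = add_url(conn, "http://example.com/a")   # new row
second = add_url(conn, "http://example.com/a")  # duplicate, rejected by the DB
row_count = conn.execute("SELECT COUNT(*) FROM urls").fetchone()[0]
```

The point of the design is that the duplicate check and the insert are a single statement, so there is no separate lookup to get out of sync with the write.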
two things:
1.
select count(*) from table where HASH_CODE=hc
and
select count( HASH_CODE) from table where HASH_CODE=hc
are equivalent
2. Hash code uniqueness is not guaranteed. Say your hash code is a 32-bit
signed integer. You could have at most 2^32 distinct hash codes (roughly 4.3
billion), and two different URLs can still share the same hash code.
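To make the collision point concrete, here is a small sketch. It assumes the "hashCode" discussed in this thread is Java's String.hashCode (the thread does not say so explicitly) and reimplements that formula in Python; the function name `java_string_hashcode` and the example URLs are hypothetical.

```python
def java_string_hashcode(s):
    """Reimplementation of Java's String.hashCode for ASCII input.

    Java computes h = 31 * h + c over the string's characters using
    32-bit signed arithmetic; the mask below emulates the overflow.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed integer.
    return h - 0x100000000 if h >= 0x80000000 else h

# "Aa" and "BB" are a well-known colliding pair.
short_a = java_string_hashcode("Aa")
short_b = java_string_hashcode("BB")

# Two strings with the same prefix and colliding suffixes of equal
# length also collide, so distinct URLs can share a hash code.
url_a = java_string_hashcode("http://example.com/Aa")
url_b = java_string_hashcode("http://example.com/BB")
```

So a query that matches on HASH_CODE alone can report a URL as present when only a different, colliding URL was stored.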
On Aug 21, 12:38 pm, Ashish Chugh [EMAIL PROTECTED] wrote:
A few more suggestions:
Instead of
select count(*) from table where HASH_CODE=hc and URL='urlToFind'
using
select count( HASH_CODE) from table where HASH_CODE=hc
is better, since HASH_CODE is unique.
You can cache all hash codes or
Instead of MD5, I think hashCode will suffice. Also, it would be unique for
each URL and will take fewer bytes.
Regards,
/Ashish
On Wed, Aug 20, 2008 at 2:12 PM, Fred [EMAIL PROTECTED] wrote:
Hi, all:
I've got the following problem: there are millions of URLs in the
database, and
I agree with Ashish. Use hashCode.
Here is my suggestion:
Add a new column to your db table; let's call it HASH_CODE.
Whenever you add a URL row, populate HASH_CODE with the hash code of the
URL.
When you want to check for the existence of a URL:
select count(*) from table where HASH_CODE=hc
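A rough sketch of the HASH_CODE column idea, again assuming SQLite via Python's sqlite3 module. The table name `urls`, the index name, and the helpers `hash_code`, `add_url`, `url_exists` and `hash_exists` are all made up for the example, and `hash_code` is a stand-in that mimics Java's String.hashCode for ASCII input. Since hash codes can collide, the sketch compares a hash-only lookup with one that also checks the URL, as in the query with URL='urlToFind' quoted earlier in the thread.

```python
import sqlite3

def hash_code(s):
    """Stand-in for Java's String.hashCode (signed 32-bit), ASCII input assumed."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

# Hypothetical schema: the URL plus an indexed integer hash column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT NOT NULL, hash_code INTEGER NOT NULL)")
conn.execute("CREATE INDEX idx_urls_hash ON urls (hash_code)")

def add_url(url):
    conn.execute("INSERT INTO urls (url, hash_code) VALUES (?, ?)",
                 (url, hash_code(url)))

def hash_exists(url):
    """Hash-only lookup: fast, but true for any URL with the same hash."""
    row = conn.execute("SELECT COUNT(*) FROM urls WHERE hash_code = ?",
                       (hash_code(url),)).fetchone()
    return row[0] > 0

def url_exists(url):
    """Index narrows by hash; the URL comparison rules out collisions."""
    row = conn.execute(
        "SELECT COUNT(*) FROM urls WHERE hash_code = ? AND url = ?",
        (hash_code(url), url)).fetchone()
    return row[0] > 0

add_url("http://example.com/Aa")
stored = url_exists("http://example.com/Aa")     # the URL that was inserted
colliding = url_exists("http://example.com/BB")  # same hash, different URL
hash_only = hash_exists("http://example.com/BB") # false positive from the hash alone
```

The integer index keeps the lookup cheap, while the extra URL comparison only runs on the few rows that share the hash code.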