Re: [algogeeks] Efficient Way to Detect Duplicate Document

2011-05-08 Thread sourabh jakhar
Please elaborate.
On Wed, May 4, 2011 at 4:20 PM, Sathaiah Dontula don.sat...@gmail.com wrote:
> Hash each page and compare the hash values.
> Thanks & regards, Sathaiah Dontula
> On Tue, May 3, 2011 at 8:59 PM, bittu shashank7andr...@gmail.com wrote:
>> Suppose you have a billion URLs, where

Re: [algogeeks] Efficient Way to Detect Duplicate Document

2011-05-08 Thread rahul patil
Calculate the MD5 hash of each page, then check whether the MD5 values of two pages are equal.
On Sun, May 8, 2011 at 1:24 PM, sourabh jakhar sourabhjak...@gmail.com wrote:
> Please elaborate.
> On Wed, May 4, 2011 at 4:20 PM, Sathaiah Dontula don.sat...@gmail.com wrote:
>> Hash each page and compare the hash values. Thanks
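
The MD5 suggestion above can be sketched roughly as follows. This is only an illustration, assuming the page bodies have already been fetched as bytes; the names page_digest and find_duplicates and the example URLs are made up for this sketch and do not come from the thread.

```python
import hashlib
from collections import defaultdict


def page_digest(content: bytes) -> str:
    """Return the MD5 hex digest of a page body (hypothetical helper)."""
    return hashlib.md5(content).hexdigest()


def find_duplicates(pages: dict) -> list:
    """Group URLs whose page bodies have the same MD5 digest.

    `pages` maps URL -> raw page bytes (an assumed input format).
    Returns only the groups that contain more than one URL.
    """
    groups = defaultdict(list)
    for url, content in pages.items():
        groups[page_digest(content)].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]


# Made-up sample data: two of the three pages are byte-for-byte identical.
pages = {
    "http://a.example/1": b"<html>same body</html>",
    "http://b.example/2": b"<html>same body</html>",
    "http://c.example/3": b"<html>different body</html>",
}

for group in find_duplicates(pages):
    print(group)
```

Each page is read once, so the time is linear in the total size of all pages, and only one fixed-size digest per page has to be kept. Note that this finds exact (byte-identical) duplicates only: pages that differ by a single character get unrelated digests.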

Re: [algogeeks] Efficient Way to Detect Duplicate Document

2011-05-07 Thread Sathaiah Dontula
Hash each page and compare the hash values.
Thanks & regards, Sathaiah Dontula
On Tue, May 3, 2011 at 8:59 PM, bittu shashank7andr...@gmail.com wrote:
> Suppose you have a billion URLs, where each is a huge page. How do you detect the duplicate documents? On what criteria will you

[algogeeks] Efficient Way to Detect Duplicate Document

2011-05-03 Thread bittu
Suppose you have a billion URLs, where each is a huge page. How do you detect the duplicate documents? On what criteria will you detect them, what algorithm or approach would you use, and what will the complexity of each approach be? This has many applications in computer science ... I would like to have some good