Hello Puneet,

PK> Pre-process the log file, creating a hash with the unique field as the
PK> key. Then, loop over the hash and insert it in your db.
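
If I understand the hash approach right, it would be roughly like this in
Perl/DBI? (The file name, tab-separated layout, DSN and table are only my
guesses.)

use strict;
use warnings;
use DBI;

# Build a hash keyed on the unique field; duplicates simply overwrite each other.
my %seen;
open my $log, '<', 'access.log' or die "open access.log: $!";
while (my $line = <$log>) {
    chomp $line;
    my ($id, $rest) = split /\t/, $line, 2;   # unique field first, rest of the record after it
    $seen{$id} = $rest;
}
close $log;

# Loop over the hash and insert the unique records.
my $dbh = DBI->connect('dbi:mysql:test', 'user', 'password', { RaiseError => 1 });
my $sth = $dbh->prepare('INSERT INTO log_records (id, data) VALUES (?, ?)');
$sth->execute($_, $seen{$_}) for keys %seen;
$dbh->disconnect;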

PK> If memory is a constraint, don't bother even creating a hash. Loop over
PK> the log file, create an array, sort the array, remove the duplicates,
PK> then insert it in the db, making sure that you have AutoCommit off and
PK> commit every 10k or 100k records.
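
And for the array version, is this the idea? AutoCommit off and a commit
every 10k rows, as you say. (Schema and batch size are placeholders again.)

use strict;
use warnings;
use DBI;

# Collect just the unique field in an array (cheaper than a hash of whole records).
open my $log, '<', 'access.log' or die "open access.log: $!";
my @keys;
while (my $line = <$log>) {
    chomp $line;
    push @keys, (split /\t/, $line)[0];
}
close $log;

# Sort, then skip adjacent duplicates while inserting in batches.
@keys = sort @keys;

my $dbh = DBI->connect('dbi:mysql:test', 'user', 'password',
                       { RaiseError => 1, AutoCommit => 0 });
my $sth = $dbh->prepare('INSERT INTO log_records (id) VALUES (?)');

my ($prev, $count) = (undef, 0);
for my $key (@keys) {
    next if defined $prev and $key eq $prev;   # duplicate of the previous key
    $prev = $key;
    $sth->execute($key);
    $dbh->commit if ++$count % 10_000 == 0;    # commit every 10k records
}
$dbh->commit;      # commit whatever is left
$dbh->disconnect;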


PK> Should be done in a few seconds. To give you an idea, I once de-duped a
PK> file with 320 million rows of duplicate email addresses in about 120
PK> seconds on an ancient, creaking iBook. A million records should be a
PK> piece of cake.

What if the array is very big, more than 1GB? How do you sort it and remove the duplicates then?
External sorting?
Maybe a simple example or an article?

Or do you do the sorting in memory?
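
For example, is a two-pass scheme like this what external sorting means:
sort chunks that fit in RAM to temp files, then merge the chunks and drop
duplicates on the way out? (Chunk size and file names are only for
illustration.)

use strict;
use warnings;

my $chunk_size = 1_000_000;   # lines per chunk; choose so one chunk fits in RAM

# Pass 1: sort the input in memory-sized chunks, one temp file per chunk.
my @chunk_files;
open my $in, '<', 'access.log' or die "open access.log: $!";
my $i = 0;
while (!eof $in) {
    my @buf;
    while (defined(my $line = <$in>)) {
        push @buf, $line;
        last if @buf >= $chunk_size;
    }
    my $name = "chunk_" . $i++ . ".tmp";
    open my $out, '>', $name or die "open $name: $!";
    print {$out} sort @buf;
    close $out;
    push @chunk_files, $name;
}
close $in;

# Pass 2: k-way merge of the sorted chunks, skipping duplicates as they come out.
my @fhs   = map { open my $fh, '<', $_ or die "open $_: $!"; $fh } @chunk_files;
my @heads = map { scalar readline($_) } @fhs;
my $prev;
open my $merged, '>', 'deduped.txt' or die "open deduped.txt: $!";
while (grep { defined } @heads) {
    # index of the chunk whose current line sorts first
    my ($min) = sort { $heads[$a] cmp $heads[$b] }
                grep { defined $heads[$_] } 0 .. $#heads;
    print {$merged} $heads[$min]
        if !defined $prev or $heads[$min] ne $prev;
    $prev        = $heads[$min];
    $heads[$min] = readline($fhs[$min]);   # advance that chunk
}
close $merged;
unlink @chunk_files;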

Thank you.

-- 
Best regards,
 Yuriy                            mailto:[EMAIL PROTECTED]
