On Jun 20, 2005, at 11:24 AM, Yuriy wrote:

CS> What are you actually trying to do? And can you quantify "very slow" and
CS> tell us what you actually expect or what would be acceptable?
100,000 rows: all OK, about 7 seconds.
1,000,000 rows: the software halts :(

CS> Is this representative of what you are trying to do? Are you storing IP
CS> addresses, and you want to discard duplicates? Using the "on conflict"
CS> resolution is probably your fastest course of action.
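
(For illustration only, the "on conflict" route could look roughly like this
with the SQLite C API; the "ips"/"addr" names and the helper function are
made up for the sketch:)

    /* Untested sketch of the "on conflict" approach with the SQLite C API.
       Assumes a table created as: CREATE TABLE ips(addr TEXT PRIMARY KEY);
       "INSERT OR IGNORE" makes SQLite silently skip duplicate keys. */
    #include <sqlite3.h>

    int store_unique(sqlite3 *db, const char **vals, int n)
    {
        sqlite3_stmt *ins;
        int i;

        if (sqlite3_prepare(db,
                "INSERT OR IGNORE INTO ips(addr) VALUES(?);",
                -1, &ins, NULL) != SQLITE_OK)
            return -1;

        for (i = 0; i < n; i++) {
            sqlite3_bind_text(ins, 1, vals[i], -1, SQLITE_STATIC);
            sqlite3_step(ins);     /* duplicates are ignored, not errors */
            sqlite3_reset(ins);
        }
        sqlite3_finalize(ins);
        return 0;
    }
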
I am writing a log analyzer and want to use SQLite as the database.

All the operations in my software involve grouping big lists of strings,
and I need the fastest possible speed.
If I use GROUP BY it is slow :(

If all he's doing is discarding duplicate strings, with no requirement for
persistent storage, it is easily done with a primitive hash table
implementation. It could probably be done efficiently in less than a hundred
lines of C, most of which could be adapted from some example code.
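
Something along these lines, say (rough, untested sketch: fixed bucket
count, no freeing, no error handling):

    /* In-memory de-dup hash table: dedup_insert() returns 1 the first
       time a string is seen and 0 on every repeat. */
    #include <stdlib.h>
    #include <string.h>

    #define NBUCKETS 262144                 /* power of two, tune to taste */

    struct node { char *key; struct node *next; };
    static struct node *buckets[NBUCKETS];

    static unsigned long hash(const char *s)
    {
        unsigned long h = 5381;             /* djb2 */
        while (*s) h = h * 33 + (unsigned char)*s++;
        return h & (NBUCKETS - 1);
    }

    int dedup_insert(const char *s)
    {
        unsigned long b = hash(s);
        struct node *n;

        for (n = buckets[b]; n; n = n->next)
            if (strcmp(n->key, s) == 0)
                return 0;                   /* duplicate */

        n = malloc(sizeof *n);
        n->key = strdup(s);
        n->next = buckets[b];
        buckets[b] = n;
        return 1;                           /* first occurrence */
    }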

Or in two or three lines of Perl.

Yes, I need a disk-based hash or B-tree. But SQLite, at its lowest level,
is a disk-based B-tree.


Pre-process the log file, creating a hash with the unique field as the key. Then loop over the hash and insert its keys into your db.

If memory is a constraint, don't even bother creating a hash. Loop over the log file, build an array, sort it, remove the duplicates, then insert the result into the db, making sure that you have AutoCommit off and commit every 10k or 100k records.
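
The commit-batching part might look something like this with the C API
(untested sketch; "entries"/"val" and the keys[] array are stand-ins for
whatever the real schema and de-duped list are):

    /* Bulk insert with explicit transactions, committing every 10,000
       rows instead of once per row. */
    #include <sqlite3.h>

    void bulk_insert(sqlite3 *db, char **keys, int nkeys)
    {
        sqlite3_stmt *ins;
        int i;

        sqlite3_prepare(db, "INSERT INTO entries(val) VALUES(?);",
                        -1, &ins, NULL);

        sqlite3_exec(db, "BEGIN;", NULL, NULL, NULL);
        for (i = 0; i < nkeys; i++) {
            sqlite3_bind_text(ins, 1, keys[i], -1, SQLITE_STATIC);
            sqlite3_step(ins);
            sqlite3_reset(ins);

            if ((i + 1) % 10000 == 0) {     /* commit in batches */
                sqlite3_exec(db, "COMMIT;", NULL, NULL, NULL);
                sqlite3_exec(db, "BEGIN;", NULL, NULL, NULL);
            }
        }
        sqlite3_exec(db, "COMMIT;", NULL, NULL, NULL);

        sqlite3_finalize(ins);
    }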

Should be done in a few seconds. To give you an idea, I once de-duped a file with 320 million rows of duplicate email addresses in about 120 seconds on an ancient, creaking iBook. A million records should be a piece of cake.


--
Puneet Kishor
