First of all you should write to sqlite-users mailing list, not sqlite-dev.
> Why is the indexing so slow and bogs down as we proceed with insertions ? > Any suggestions ? > Also, how could I improve performance ? I can't say exactly why performance with index degrades so significantly with the size of the table. Probably it's something to do with cache - try to increase its size. Also it's quite known that creating index after inserting all rows is much faster than creating index before that. So it can be even beneficial in inserting huge amount of rows somewhere in the middle of the work: first delete all indexes, then insert rows, then create indexes once more. > Is there any way to have a UNIQUE > field but disable indexing till the end? How do you expect your uniqueness to be enforced? SQLite does that by looking into index - if value is there then it is repeated, if value is not there then it's unique and should be inserted into index for further check. > However, as I am using hash values instead of filenames I need to deal with > collisions. Hence, removing the UNIQUE field is risky as I cant detect > collisions. But I am also unable to use it because of the poor indexing > performance. What's wrong with creating unique index after all rows are inserted? It's the same as declaring UNIQUE field although checking for uniqueness is postponed till index creation and you will unable to create index if there's some not unique values. But I don't know if all this matters to you. Pavel On Fri, Mar 12, 2010 at 7:51 AM, rohan a <info1...@gmail.com> wrote: > Hello, > > I am using an SQLite database for storing certain file properties. Initially > I used a pathname (ex: D:\\1\2\file.txt) as a PRIMARY KEY. But as the > database created was very large, I decided to use hashes of the pathname > instead. This brought down the size occupied by the database on disk. I am > using the C language interface provided. > > The table looks like this: > > CREATE TABLE IF NOT EXISTS EITable (Crc1 INTEGER, Crc2 INTEGER, Ctime > INTEGER, Checksum INTEGER, UNIQUE(Crc1,Crc2)); > > I generate 2 64-bit hashes of the pathname and store them into the database. > With a combination of (Crc1,Crc2) being UNIQUE. > > The size of the database definitely came down using this method. However, > the time taken for insertions becomes abnormally large and un-acceptable. It > is quite fast initially but slows down drastically as the database gets > filled with records. > > When I remove the UNIQUE field and create an INDEX after all INSERTions are > completed the insertions are fast and complete quickly. I use the CREATE > INDEX when all insertions are completed. > However, as I am using hash values instead of filenames I need to deal with > collisions. Hence, removing the UNIQUE field is risky as I cant detect > collisions. But I am also unable to use it because of the poor indexing > performance. > > Why is the indexing so slow and bogs down as we proceed with insertions ? > Any suggestions ? > Also, how could I improve performance ?Is there any way to have a UNIQUE > field but disable indexing till the end? > > I am using Transactions and compiled statements for INSERT, SELECT etc. I > timed each of the activities like generation of the hash, insertion, time > taken for COMMIT. This indicated that the indexing is the culprit. Also > moving indexing to the end improved performance. > > Any help/suggestions on this ? > > Thanks > > _______________________________________________ > sqlite-dev mailing list > sqlite-...@sqlite.org > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-dev > > _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users