First of all you should write to sqlite-users mailing list, not sqlite-dev.

> Why is the indexing so slow and bogs down as we proceed with insertions ?
> Any suggestions ?
> Also, how could I improve performance ?

I can't say exactly why performance with index degrades so
significantly with the size of the table. Probably it's something to
do with cache - try to increase its size. Also it's quite known that
creating index after inserting all rows is much faster than creating
index before that. So it can be even beneficial in inserting huge
amount of rows somewhere in the middle of the work: first delete all
indexes, then insert rows, then create indexes once more.

> Is there any way to have a UNIQUE
> field but disable indexing till the end?

How do you expect your uniqueness to be enforced? SQLite does that by
looking into index - if value is there then it is repeated, if value
is not there then it's unique and should be inserted into index for
further check.

> However, as I am using hash values instead of filenames I need to deal with
> collisions. Hence, removing the UNIQUE field is risky as I cant detect
> collisions. But I am also unable to use it because of the poor indexing
> performance.

What's wrong with creating unique index after all rows are inserted?
It's the same as declaring UNIQUE field although checking for
uniqueness is postponed till index creation and you will unable to
create index if there's some not unique values. But I don't know if
all this matters to you.


Pavel

On Fri, Mar 12, 2010 at 7:51 AM, rohan a <info1...@gmail.com> wrote:
> Hello,
>
> I am using an SQLite database for storing certain file properties. Initially
> I used a pathname (ex: D:\\1\2\file.txt) as a PRIMARY KEY. But as the
> database created was very large, I decided to use hashes of the pathname
> instead. This brought down the size occupied by the database on disk. I am
> using the C language interface provided.
>
> The table looks like this:
>
> CREATE TABLE IF NOT EXISTS EITable (Crc1 INTEGER, Crc2 INTEGER, Ctime
> INTEGER, Checksum INTEGER, UNIQUE(Crc1,Crc2));
>
> I generate 2 64-bit hashes of the pathname and store them into the database.
> With a combination of (Crc1,Crc2) being UNIQUE.
>
> The size of the database definitely came down using this method. However,
> the time taken for insertions becomes abnormally large and un-acceptable. It
> is quite fast initially but slows down drastically as the database gets
> filled with records.
>
> When I remove the UNIQUE field and create an INDEX after all INSERTions are
> completed the insertions are fast and complete quickly. I use the CREATE
> INDEX when all insertions are completed.
> However, as I am using hash values instead of filenames I need to deal with
> collisions. Hence, removing the UNIQUE field is risky as I cant detect
> collisions. But I am also unable to use it because of the poor indexing
> performance.
>
> Why is the indexing so slow and bogs down as we proceed with insertions ?
> Any suggestions ?
> Also, how could I improve performance ?Is there any way to have a UNIQUE
> field but disable indexing till the end?
>
> I am using Transactions and compiled statements for INSERT, SELECT etc. I
> timed each of the activities like generation of the hash, insertion, time
> taken for COMMIT. This indicated that the indexing is the culprit. Also
> moving indexing to the end improved performance.
>
> Any help/suggestions on this ?
>
> Thanks
>
> _______________________________________________
> sqlite-dev mailing list
> sqlite-...@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-dev
>
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to