One more test I would try is first principles.
Load 1200 records and just do "select * from items" -- you aren't going to
get any faster than that.
Then add the index query.
You should find a performance knee as you add records (try adding them in
powers of 2).
To test I would use "select * from items where rowid % 2 == 0" for 2400 records,
rowid % 4 for 4800 records, etc.
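The doubling test above can be sketched with Python's built-in sqlite3 module. This is a minimal harness, not Mohit's actual setup: the items schema, the payload column, and the in-memory database are all assumptions for illustration. Each round doubles the table but widens the modulus so the result set stays at ~1200 rows; only how spread out the hits are changes.

```python
# Hypothetical first-principles benchmark: full scan vs. a modulus filter,
# doubling the row count each round.  Schema and sizes are assumptions.
import sqlite3
import time

def build_db(n_rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO items (id, payload) VALUES (?, ?)",
        ((i, "x" * 100) for i in range(n_rows)),
    )
    conn.commit()
    return conn

def timed(conn, sql):
    # Return (elapsed seconds, number of rows fetched).
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return time.perf_counter() - start, len(rows)

for k, n in enumerate([1200, 2400, 4800, 9600]):
    conn = build_db(n)
    mod = 2 ** k  # 1, 2, 4, 8 ... keeps the hit count constant at 1200
    scan_t, scan_rows = timed(conn, "SELECT * FROM items")
    pick_t, pick_rows = timed(conn, f"SELECT * FROM items WHERE rowid % {mod} == 0")
    print(f"n={n:5d}  scan={scan_t:.4f}s ({scan_rows} rows)  "
          f"filtered={pick_t:.4f}s ({pick_rows} rows)")
    conn.close()
```

The knee, if there is one, shows up when the filtered query's time stops tracking the scan's time and starts growing with the table size instead.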

Also, what happens if you don't encrypt?

Also, what if you turn off SQLite caching completely and let CE have a bit
more cache space?
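If the CEROD build honours the same pragmas as stock SQLite (an assumption), shrinking the page cache is one PRAGMA. A tiny positive value approximates "off" and leaves the RAM to the OS file cache instead:

```python
# Sketch: shrink SQLite's own page cache so the OS cache does the work.
# Assumes the CEROD build accepts the standard cache_size pragma.
import sqlite3

conn = sqlite3.connect(":memory:")

# cache_size is in pages (positive N) or KiB (negative N); the default
# is a couple of thousand pages.  A tiny value all but disables it.
conn.execute("PRAGMA cache_size = 10")
(size,) = conn.execute("PRAGMA cache_size").fetchone()
print(size)
conn.close()
```

Comparing the 1200-lookup timing with a tiny cache vs. the 2MB default would tell you whether SQLite's cache is helping at all on that device.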

You could also create 2 tables -- one for your frequent data and one for the
non-frequent.
That's 2 selects but might be noticeably faster if the frequent is small
enough.
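A rough sketch of the two-table split, again with made-up table and column names: copy the frequently-hit rows into a small "hot" table, query it first, and fall back to the big table on a miss. The small table's B-tree is shallow and likely stays fully cached.

```python
# Hedged sketch of a hot/cold table split.  Names and the "first 15%
# of ids is hot" rule are illustrative assumptions only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items     (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE items_hot (id INTEGER PRIMARY KEY, payload TEXT);
""")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"row-{i}") for i in range(1, 1001)])
# Suppose ids 1..150 are the frequently accessed 15%.
conn.execute("INSERT INTO items_hot SELECT * FROM items WHERE id <= 150")
conn.commit()

def lookup(conn, item_id):
    # Try the small, cache-friendly table first; fall back on a miss.
    row = conn.execute("SELECT * FROM items_hot WHERE id = ?",
                       (item_id,)).fetchone()
    if row is None:
        row = conn.execute("SELECT * FROM items WHERE id = ?",
                           (item_id,)).fetchone()
    return row

print(lookup(conn, 10))   # served from items_hot
print(lookup(conn, 900))  # falls back to the big table
```

The cost of the fallback is a second prepared statement, which is cheap compared to the page reads it saves when most hits land in the hot table.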

Mike

-----Original Message-----
From: sqlite-users-boun...@sqlite.org
[mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Mohit Sindhwani
Sent: Thursday, July 11, 2013 11:20 PM
To: General Discussion of SQLite Database
Subject: [sqlite] Understanding how data is stored and the index is managed

Hi All,

We have a system in which there are around 3 million records in a 
particular table within SQLite3 on Windows CE.  There is a primary key 
index on the table.

We are selecting 1200 records from the table using a prepare - bind - 
step - reset approach.  We find the time seems unreasonably long given 
that the query is of the type select * from items where id = ? (id is 
the primary key).  On this device, it takes around 6 seconds.

We think it's because the table is probably quite large, as may be the 
index but the cache is only 2MB.  If we are hitting the table at random 
places, it could be that every step actually takes us to a different 
part of the table, so the cache is not helping.

To test this hypothesis, we did the following:
* Forced the queries to be far apart (basically every query was to a 
record 3 million / 1200 records away from the previous one) - this would 
be like the worst case, and it was.  The time went up to 9 seconds (+3 seconds)
* Forced the queries to be within the first 15% of the space - we expect 
that this would increase the cache hit.  The time came down to around 4 
seconds (-2 seconds).

Given this, are there things that we can do?  We know that when we do 
the lookup, there is a high chance that a significant portion of the 
1200 results will always lie within the same 15% of the database.  
Can we make use of that knowledge somehow?

We could try to renumber the IDs so that all the IDs are in sequence, 
but that is not the easiest thing to do.  Does insertion order have an 
impact on how the data is stored?  If we inserted the most frequently 
accessed records first, would it mean that they would be closer to each 
other in the table and the index and therefore, we could get a better 
performance?

The database is read only and we are using CEROD, so we don't have to 
worry about data changing.

(The page size is 4KB and that matches the file system block size).

Thanks for any thoughts.

Best Regards,
Mohit.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
