On 3/29/06, Andy Spencer <[EMAIL PROTECTED]> wrote:
> I have a sqlite database with about 3 GB of data, most of which is stored
> in a data table with about 75 million records, having three columns
> (EntryId INTEGER, PropertyId INTEGER, Value NUMERIC) and
> PRIMARY KEY(EntryId, PropertyId).
>
> This table is not indexed, to allow faster updates.

Why not index the table? The overhead for updates is fairly small.


>
> The problem is that it takes over an hour to access all Values, for a
> specified PropertyId, when the value is obtained for each EntryId
> separately (using "SELECT Value FROM Data WHERE PropertyId=? AND
> EntryId=?", bound to the specified PropertyId and EntryId) and
> the EntryId values for successive database queries are in essentially
> random order (taken from an external list of entries that has been
> sorted by property values).
>
> This same query (getting the property value for each EntryId,
> separately) only takes about 7 minutes when the EntryId values for
> successive database queries are in the same ascending order as
> the data orginally inserted into the table.
>
> I assume that this has to do with better pager caching of successive
> records in the database, whereas random access may re-read the same
> page multiple times (due to the limited cache).
>
> My question is whether it should be faster to
>
> A) create an index for the table before the query,
>    query the value (for the specified PropertyId) for each EntryId
>    (in essentially random order, from external list of entries),
>    and delete the index after the queries (for each EntryId) are done
>
> or
>
> B) issue a single "SELECT EntryId, Value FROM Data WHERE PropertyId=?" query
>    (bound to the specified PropertyId) and step through the results,
>    using something like a hash table lookup to map the EntryId values
>    (returned from the query) back to an index into the external list of
>    entries.
>
> The values extracted from the database are to be copied into an entry
> property data structure, having the same order as the external list of
> entries.

If raw blazing data collection speed is required try a data warehouse.
Collect your data to flat files (this would be even faster than using sqlite
without indexes). Then import that data to an database without indexes.
Then create the index (this is faster than a mass insert into an indexed table).
Then do all your reporting from the database.



---
SqliteImporter: Command line fixed and delimited text import.
http://www.reddawn.net/~jsprenkl/Sqlite

Reply via email to