On 6/20/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
"Scott Hess" <[EMAIL PROTECTED]> wrote: > On 6/20/07, Andrew Finkenstadt <[EMAIL PROTECTED]> wrote: > > How difficult do you think it would be to support an alternative method of > > indexing within SQLite specifically to support O(1) retrieval of the rowid > > for a table, and then potentially O(1) retrieval of the row data for a > > table, when in-order retrieval is undesired? It's a major rewrite. SQLite assumes btree access in a number of important places.
Thanks for that info. I will save myself the heartache then....
> > > My database design is highly specialized to never, ever retrieve data except > > by whole-key equality, and would benefit greatly from this optimization. > > I'm willing to code it up myself, although I'm not quite set up to do SQL > > syntax parser changes, preferring to use the amalgamation. > > Would you be able to live with converting your primary key to a 64-bit > hash value? In development of fts2, I found that direct rowid -> row > access is pretty fast. It's still technically O(logN), but the log's > base is absurdly large, so it comes reasonably close to O(1). I concur with Scott's assessment. The typical fanout in the table btrees used by SQLite is around 100 (assuming a 1K page) so you can get to one of 10 million pages in only 3 page reads. And the first page is almost guaranteed to be in cache, so really only 2 page reads.
My default page size is 32k, with the number of rows entries at about a hundred thousand... assuming base-100 for 1k, that would be base 3000 or so for 32k, which would mean 1 I/O if uncached, followed by a search for the actual row containing the data. (I use an indirect table in order to keep my small data separate from my blobs, perhaps this was unnecessary.) create table index_data ( id integer not null primary key autoincrement, rod blob not null unique /* alternate key of 20-bytes fixed */, ... ); create table file_data ( id integer not null primary key, compression_method integer not null, file_size integer not null, file_bytes blob not null); The retrieval of looking up index_data.id via rod equality takes longer than I would like. file_data retrieval has its own issues which I am still tuning. --a