Andrew Finkenstadt wrote:
On 6/20/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
"Scott Hess" <[EMAIL PROTECTED]> wrote:
> On 6/20/07, Andrew Finkenstadt <[EMAIL PROTECTED]> wrote:
> > How difficult do you think it would be to support an alternative
method of
> > indexing within SQLite specifically to support O(1) retrieval of the
rowid
> > for a table, and then potentially O(1) retrieval of the row data
for a
> > table, when in-order retrieval is undesired?
It's a major rewrite. SQLite assumes btree access in a number of
important places.
Thanks for that info. I will save myself the heartache then....
>
> > My database design is highly specialized to never, ever retrieve data
except
> > by whole-key equality, and would benefit greatly from this
optimization.
> > I'm willing to code it up myself, although I'm not quite set up to do
SQL
> > syntax parser changes, preferring to use the amalgamation.
>
> Would you be able to live with converting your primary key to a 64-bit
> hash value? In development of fts2, I found that direct rowid -> row
> access is pretty fast. It's still technically O(logN), but the log's
> base is absurdly large, so it comes reasonably close to O(1).
I concur with Scott's assessment. The typical fanout in the table
btrees used by SQLite is around 100 (assuming a 1K page) so you can
get to one of 10 million pages in only 3 page reads. And the first
page is almost guaranteed to be in cache, so really only 2 page
reads.
My default page size is 32k, with the number of rows entries at about a
hundred thousand... assuming base-100 for 1k, that would be base 3000 or so
for 32k, which would mean 1 I/O if uncached, followed by a search for the
actual row containing the data. (I use an indirect table in order to keep
my small data separate from my blobs, perhaps this was unnecessary.)
create table index_data ( id integer not null primary key autoincrement,
rod
blob not null unique /* alternate key of 20-bytes fixed */, ... );
create table file_data ( id integer not null primary key,
compression_method
integer not null, file_size integer not null, file_bytes blob not null);
The retrieval of looking up index_data.id via rod equality takes longer
than
I would like. file_data retrieval has its own issues which I am still
tuning.
--a
I ran some trials on b-tree versus hashing on another project. It
convinced me that the hashing access was only beneficial in a very low
memory environment such as a database designed 40 years ago. With the
upper levels of the B-tree always in cache and no clashes it was the
clearly the right approach.
What did give improved speed was larger node sizes and a binary search
on keys within the node.
-----------------------------------------------------------------------------
To unsubscribe, send email to [EMAIL PROTECTED]
-----------------------------------------------------------------------------