On 6/20/07, Scott Hess <[EMAIL PROTECTED]> wrote:
On 6/20/07, Andrew Finkenstadt <[EMAIL PROTECTED]> wrote: > How difficult do you think it would be to support an alternative method of > indexing within SQLite specifically to support O(1) retrieval of the rowid > for a table, and then potentially O(1) retrieval of the row data for a > table, when in-order retrieval is undesired? > > My database design is highly specialized to never, ever retrieve data except > by whole-key equality, and would benefit greatly from this optimization. > I'm willing to code it up myself, although I'm not quite set up to do SQL > syntax parser changes, preferring to use the amalgamation. Would you be able to live with converting your primary key to a 64-bit hash value?
Yes. I could... or even smaller, if I can do collision exclusion by a select against the "real primary key". In development of fts2, I found that direct rowid -> row
access is pretty fast. It's still technically O(logN), but the log's base is absurdly large, so it comes reasonably close to O(1). Then you could do something like: SELECT x FROM t WHERE rowid = hashfn(?0) AND real_primary = ?0 hashfn() would take a string and return a 64-bit rowid. The test against real_primary is to guard against hash collisions, which you may-or-may-not care to bother with (that said, it should be essentially free, at least in disk I/O terms).
It's funny you should say that, as my current "best" (fastest) 64-bit hash of a string involves concatenating Ye Olde Standard CRC-32 and Paul Hsieh's SuperFastHash function, and I've been considering using that internally as an in-memory surrogate of what would otherwise be a long path/filename.
BTW, this may be an example of a case where *dbm or bdb would be justified. It would also be interesting for someone to build a virtual table interface implementing this idiom.
I gave those options thought as well, but ended up liking the vdbe/SQL interface for development convenience, and the fact that "out of the box" all of the cross-thread cross-process consistency issues were dealt with. I'm a 17-year veteran of Oracle databases, and (perl)dbm databases before that, so I really liked the lesser "mental surface area" needed to program real programs with these options. --a