Brannon King wrote:
John Stanton wrote:
You don't seem to need a data manipulation system like Sqlite, more a
form of high volume storage. Do you really need elaborate SQL,
journalling, ROLLBACK and assured disk storage?
Did you consider some form of hashed storage, perhaps linear hashing,
to build a compact and high-performance associative array for your
sparsely keyed data?
Do you really need the overhead of B-trees if you are just storing a
sparse array?
JS
I don't need journaling or rollback. I'd love a way to shut them off.
But elaborate SQL, that sure is handy. I'm not just storing, I'm
viewing stored, compressed data. I definitely need some way of querying
sparse matrix data that is larger than my DRAM. Sqlite sure seems like
the quickest route to a workable product for that to happen. It has all
the streaming/caching built in. Because of that, I assume it is faster
than random file access. It supports complex data queries and indexes,
both things I would need anyway. In the world of programming, I think
many will agree you should get a working product, then make it faster.
I'm just trying to get the most speed out of the easiest tool. If I need
to rewrite the file storage for the next version, we can consider the
cost to benefit for that separately.
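For what it's worth, SQLite does let you relax the durability machinery via pragmas: `PRAGMA synchronous = OFF` skips the fsync calls, and on reasonably recent versions `PRAGMA journal_mode = OFF` disables the rollback journal entirely. A minimal sketch using Python's stdlib `sqlite3` binding (the `cell` table and the sparse-matrix schema are illustrative assumptions, not anything from this thread):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trade durability for speed: no rollback journal, no fsync.
# (journal_mode = OFF needs a reasonably recent SQLite.)
conn.execute("PRAGMA journal_mode = OFF")
conn.execute("PRAGMA synchronous = OFF")

# Hypothetical sparse-matrix schema: one row per nonzero cell, with a
# composite primary key so point lookups are indexed.
conn.execute(
    "CREATE TABLE cell (row INTEGER, col INTEGER, val REAL, "
    "PRIMARY KEY (row, col))"
)
conn.executemany("INSERT INTO cell VALUES (?, ?, ?)",
                 [(0, 5, 1.5), (7, 2, -3.0)])
val = conn.execute(
    "SELECT val FROM cell WHERE row = 0 AND col = 5").fetchone()[0]
print(val)
```

With a file-backed database the same pragmas apply; you keep the query language and indexing while dropping the journaling cost the poster wants to avoid.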
I saw your performance requirements and data rate, which look difficult
to achieve when you are writing journals and ensuring the integrity of
disk records.
You will find that Sqlite is much slower than random file access,
because Sqlite is built on top of random file access. You get random
file access speed less all the overhead of Sqlite's journals and B-trees.
We have an application using storage something like yours and we use
memory mapped areas with AVL trees for indexing. If it needed to run as
fast as yours we would probably use hashing rather than the binary
trees. A sparse index is realized by concatenated keys. This method
dynamically uses memory for caching, but is not limited to physical
memory size, only virtual memory. It assumes POSIX capabilities from
the OS.
With hashing you avoid the overhead of B-tree balancing on insertion,
but pay for it by not having the keys accessible in an ordered sequence.
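The trade-off can be seen in a few lines (a simple illustration, assuming a plain hash table): point lookups cost O(1) on average with no rebalancing work on insert, but any ordered or range access needs an explicit sort, which a B-tree would deliver for free.

```python
import random

random.seed(42)
# A hash table (dict) of 50 random keys; inserts do no balancing work.
table = {k: k * k for k in random.sample(range(1000), 50)}

# Point lookup: O(1) on average.
some_key = next(iter(table))
lookup = table[some_key]

# Range scan: keys are not stored in order, so sort first, O(n log n).
in_range = [k for k in sorted(table) if 100 <= k < 200]
```

Whether that sort cost matters depends on how often the workload scans ranges versus probing single cells.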
Ask yourself and your users/customers whether the easiest solution or
the best solution is the most satisfactory.