On 2014/05/31 04:45, Hayden Livingston wrote:
I have a binary format that is effectively structured data.

I currently have multiple indexes but none of them are sorted, because in
my toy system ORDER BY's are not supported, an implicit ORDER BY time of
record inserted exists because it is a single threaded application.

My indexing story is that I know a priori what my index will be, and I just
double write them.

I want to support ORDER BY on basically one additional field, and I'm
having a hard time coming up with a solution that doesn't involved
effectively writing on my B+Tree which has knowledge of my serialization
binary format.

Simply add a column with a sort specifier, like an integer key value to indicate the sort rank, with an index defined for it. Unique index even if you need. If you do not know the sort rank before-hand then add the actual value to the extra column (I mean from your data structure's internal value which you want to be sorting by).

I've thought to myself, maybe I could just store a duplicate copy of the
data sorted by this field into SQLite. The data footprint is not small, but
I'm willing to pay that disk cost if that's the only option.

Much more expensive than an extra column, and I'm positive it wouldn't be 
needed.

My only concern with moving to SQLite is performance design decisions that
are built into a database. My system is basically only doing the work
needed to iterate over my data. No concurrency, sequential reads.

When/If I move to SQLite, what kind of things will I start to pay for?
Concurrency? Other fluff data structure needs for SQLite?

I think you might have some skewed information. Using SQLite will be a gain, with less fluff and amazing speed, portability and portability. (not sure why those two words are the same, one intends to mean the ability to change between systems or upgrade over time, the other means to have in a light package that can easily be transported or moved to a different device etc.) What you give up for these gains is networked client-server style usage and user-access control. If you need networking or user-controls, use another engine, if not, SQLite is one of the best.

Obviously I imagine that whatever field I choose to get a sorted index on,
I expose as it as a field.

True for any DB engine you choose, not just SQLite... Though in SQLite you could devise a virtual table mechanism that encapsulates your data structure and exposes it as if it were a normal DB table, which is what a lot of people do having your specific kind of set-up. You can read up on it on the SQLite site some more. (Just search "Virtual Table") - and let me add this would be a great deal more easy and helpful than devising your own B-Tree storage medium.

Then the non SQLite question: how much work would it be to duplicate a
B-tree like data structure for my binary data?

I can only imagine you ask the question because you are not completely familiar with how B-Trees work (else you probably could easily judge how long it would take you, unless I'm missing some nuance). B-Tree structures are fairly widely used for all kinds of things. I coded a simple one in a day some time ago but I imagine it can take a lot longer for one with a lot of optimizations. There are a lot of articles on the net, but the Wiki article on B-Trees should suffice to give you an idea of what it entails.


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to