On 16 Sep 2013, at 10:04pm, Jason H <scorp...@yahoo.com> wrote:

> As the table is viewed though, it is void as a join between the key and all
> column families. What denotes a column family (cf) is not specified, however
> the idea is to group columns into cfs by usage. That is cf1 is your most
> commonly needed data, and cfN is the least often needed.
>
> HBase is queried by a specialized API. This API is written to work over very
> large datasets, working directly with the data. However not all uses of HBase
> need this. The majority of queries are distributed just because they are over
> a huge dataset, with a modest amount of rows returned. Distribution allows
> for much more paralleled disk reading. For this case, a SQLite cluster makes
> perfect sense.
>
> Mapping all of this to SQLite, I could see a bit of work could go a long way.
> Column families can be implemented as separate files, which are ATTACHed and
> joined as needed. The most complicated operation is a join, where we have to
> coordinate the list of distinct values of the join to all other nodes, for
> join matching. We then have to move all of that data to the same node for the
> join.

Just a quick couple of things you didn't mention that might help. You
probably already know about them, but you mentioned ATTACH and didn't mention
them, so I thought I might niggle you.

First, look into VIEWs. You can save any SELECT as a VIEW, then consult it
like you would a table. So if you have split your data up into separate
tables, and even separate databases using ATTACH, you can reunite it by
defining a VIEW that includes one or more JOINs. This can dramatically
simplify the syntax of the SELECTs you actually want to do. Since it's easy
to create and destroy VIEWs, you can bracket your queries with ATTACH and
CREATE VIEW beforehand and DROP VIEW and DETACH afterwards, linking the
auxiliary storage into what looks like normal tables. Unfortunately SQLite
does not implement making changes (INSERT, UPDATE, DELETE) directly to VIEWs,
which is understandable because doing it correctly is very tricky. But it's
great for SELECTs.

Second, there are many ways of implementing clusters, depending on how tricky
you want to get. You can write a SQLite server app which takes commands and
returns results as JSON. Or, at the other end of the spectrum, you can
implement your own Virtual File System (VFS) which takes disk accesses
addressed to an individual file system and understands how to distribute the
operations over your cluster. (I have no idea how to do the latter and, thank
goodness, my job doesn't require it.)

So the first solution you see is not necessarily the right one. If you can't
pick one yourself, I'm hoping that people here might be able to help
recommend an implementation that would suit you.

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
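
For anyone who wants to try the ATTACH / CREATE VIEW / DROP VIEW / DETACH
sequence described above, here is a minimal sketch using Python's standard
sqlite3 module. It is only an illustration under stated assumptions: the file
names (cf1.db, cf2.db), the table names (cf1, cf2), the schema name "rare" and
the view name "everything" are all made up for the example and do not come
from the thread. One detail worth knowing: SQLite only lets a view reference
tables in another attached database if the view is created as a TEMP view, so
the sketch uses CREATE TEMP VIEW. It also shows that writing to the view
fails unless you add INSTEAD OF triggers, which the sketch does not do.

```python
import os
import sqlite3
import tempfile


def make_family(path, table, column, rows):
    """Create one hypothetical 'column family' file holding key + one column."""
    db = sqlite3.connect(path)
    db.execute(f"CREATE TABLE {table} (key TEXT PRIMARY KEY, {column} TEXT)")
    db.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    db.commit()
    db.close()


# Hypothetical layout: one database file per column family, same row keys.
workdir = tempfile.mkdtemp()
cf1_path = os.path.join(workdir, "cf1.db")  # commonly needed data
cf2_path = os.path.join(workdir, "cf2.db")  # rarely needed data
make_family(cf1_path, "cf1", "name", [("r1", "alice"), ("r2", "bob")])
make_family(cf2_path, "cf2", "notes", [("r1", "likes hbase"), ("r2", "likes sqlite")])

# Open the frequently used family as the main database.
db = sqlite3.connect(cf1_path)

# ATTACH the second family under the (assumed) schema name "rare".
db.execute("ATTACH DATABASE ? AS rare", (cf2_path,))

# A TEMP view may span attached databases; an ordinary view may not.
db.execute(
    """
    CREATE TEMP VIEW everything AS
    SELECT cf1.key AS key, cf1.name AS name, cf2.notes AS notes
    FROM main.cf1 AS cf1
    JOIN rare.cf2 AS cf2 ON cf2.key = cf1.key
    """
)

# The view can now be queried as if it were one wide table.
rows = db.execute(
    "SELECT key, name, notes FROM everything ORDER BY key"
).fetchall()

# Views are read-only without INSTEAD OF triggers, so this INSERT is rejected.
write_failed = False
try:
    db.execute("INSERT INTO everything VALUES ('r3', 'carol', 'n/a')")
except sqlite3.Error:
    write_failed = True
db.rollback()  # harmless if no transaction was opened

# Tear down in the reverse order: drop the view, then detach the file.
db.execute("DROP VIEW everything")
db.execute("DETACH DATABASE rare")
db.close()

print(rows)
print("write to view rejected:", write_failed)
```

The view lives only in the connection's temp schema, so nothing about the
join is stored in either data file, and each file can still be opened on its
own by a process that needs just that column family.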