On 16 Sep 2013, at 10:04pm, Jason H <scorp...@yahoo.com> wrote:

> Logically, though, the table is viewed as a join between the key and all 
> column families. What denotes a column family (cf) is not specified; however, 
> the idea is to group columns into cfs by usage. That is, cf1 is your most 
> commonly needed data, and cfN is the least often needed.
> 
> HBase is queried by a specialized API. This API is written to work over very 
> large datasets, working directly with the data. However not all uses of HBase 
> need this. The majority of queries are distributed just because they are over 
> a huge dataset, with a modest number of rows returned. Distribution allows 
> for much more parallel disk reading.  For this case, a SQLite cluster makes 
> perfect sense. 
> 
> Mapping all of this to SQLite, I can see that a bit of work could go a long 
> way. Column families can be implemented as separate files, which are ATTACHed 
> and joined as needed. The most complicated operation is a join, where we have 
> to coordinate the list of distinct values of the join key with all other 
> nodes, for join matching. We then have to move all of that data to the same 
> node for the join. 
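
Here's a rough sketch of the join step you describe: gather the distinct join-key values from every node, keep only the keys that appear on both sides, then ship just the rows that can match to one node and join there (roughly what's usually called a semi-join). Plain in-process sqlite3 connections stand in for cluster nodes; the orders/customers tables, the cust_id key and all the function names are made up for illustration, and a real cluster would move the key list and rows over the network instead.

```python
import sqlite3

# Hypothetical two-table schema; every node holds a shard of each table.
SCHEMA = """
    CREATE TABLE orders    (order_id INTEGER, cust_id INTEGER, amount REAL);
    CREATE TABLE customers (cust_id INTEGER, name TEXT);
"""


def make_node(orders, customers):
    """Stand-in for one cluster node: an in-memory SQLite DB with its shards."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", orders)
    con.executemany("INSERT INTO customers VALUES (?, ?)", customers)
    con.commit()
    return con


def distinct_keys(nodes, table):
    """Phase 1: collect every node's distinct join-key values for one side."""
    keys = set()
    for node in nodes:
        # Table names come from our own code here, never from user input.
        keys.update(k for (k,) in node.execute(f"SELECT DISTINCT cust_id FROM {table}"))
    return keys


def matching_rows(node, table, columns, keys):
    """Phase 2: send the key list to a node, get back only rows that can join."""
    node.execute("CREATE TEMP TABLE join_keys (k INTEGER PRIMARY KEY)")
    node.executemany("INSERT INTO join_keys VALUES (?)", [(k,) for k in keys])
    rows = node.execute(
        f"SELECT {columns} FROM {table} WHERE cust_id IN (SELECT k FROM join_keys)"
    ).fetchall()
    node.execute("DROP TABLE join_keys")
    node.commit()
    return rows


def distributed_join(nodes):
    # Keys present on both sides; rows with any other key cannot match.
    keys = distinct_keys(nodes, "orders") & distinct_keys(nodes, "customers")
    # Phase 3: move the surviving rows to one coordinator and join there.
    coord = sqlite3.connect(":memory:")
    coord.executescript(SCHEMA)
    for node in nodes:
        coord.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                          matching_rows(node, "orders", "order_id, cust_id, amount", keys))
        coord.executemany("INSERT INTO customers VALUES (?, ?)",
                          matching_rows(node, "customers", "cust_id, name", keys))
    result = coord.execute("""
        SELECT o.order_id, c.name, o.amount
        FROM orders AS o JOIN customers AS c ON c.cust_id = o.cust_id
        ORDER BY o.order_id
    """).fetchall()
    coord.close()
    return result


if __name__ == "__main__":
    node_a = make_node(orders=[(1, 10, 5.0), (2, 20, 7.5)], customers=[(30, "carol")])
    node_b = make_node(orders=[(3, 30, 1.0), (4, 99, 2.0)],
                       customers=[(10, "alice"), (20, "bob")])
    print(distributed_join([node_a, node_b]))
    # [(1, 'alice', 5.0), (2, 'bob', 7.5), (3, 'carol', 1.0)]
```

Order 4 (cust_id 99) never leaves its node, because no customer row carries that key.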

Just a quick couple of things you didn't mention that might help.  You probably 
already know about them, but you mentioned ATTACH and didn't mention them, so I 
thought I might niggle you.

First, look into VIEWs.  You can save any SELECT as a VIEW, then consult it 
like you would a table.  So if you have split your data up into separate tables, 
and even separate databases using ATTACH, you can reunite it by defining a VIEW 
that includes one or more JOINs.  This can dramatically simplify the syntax of 
the SELECTs you actually want to do.  Since it's easy to create and destroy 
VIEWs, you can bracket your work with ATTACH and CREATE VIEW at the start and 
DROP VIEW and DETACH at the end, linking the auxiliary storage into what looks 
like normal tables.
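
A minimal sketch of that lifecycle using Python's built-in sqlite3 module, assuming two column-family files: cf1.db holding a hypothetical "hot" table and cf2.db holding a hypothetical "cold" table, both keyed on row_key.  I've made it a TEMP view so it lives only in the connection's temp schema and nothing gets written into either file, which suits the create-then-drop pattern.

```python
import os
import sqlite3
import tempfile


def make_family(path, ddl, insert_sql, rows):
    """Write one hypothetical column family to its own database file."""
    con = sqlite3.connect(path)
    with con:  # commits on success
        con.execute(ddl)
        con.executemany(insert_sql, rows)
    con.close()


def read_wide(cf1_path, cf2_path):
    """ATTACH, CREATE VIEW, query, then DROP VIEW, DETACH."""
    con = sqlite3.connect(cf1_path)  # "main" is cf1, the most-used family
    con.execute("ATTACH DATABASE ? AS cf2", (cf2_path,))
    # A TEMP view lives in the connection's temp schema, so neither file changes.
    con.execute("""
        CREATE TEMP VIEW wide AS
        SELECT h.row_key, h.name, c.notes
        FROM main.hot AS h
        LEFT JOIN cf2.cold AS c ON c.row_key = h.row_key
    """)
    rows = con.execute("SELECT * FROM wide ORDER BY row_key").fetchall()
    con.execute("DROP VIEW wide")
    con.execute("DETACH DATABASE cf2")
    con.close()
    return rows


def demo():
    with tempfile.TemporaryDirectory() as d:
        cf1 = os.path.join(d, "cf1.db")
        cf2 = os.path.join(d, "cf2.db")
        make_family(cf1, "CREATE TABLE hot (row_key INTEGER PRIMARY KEY, name TEXT)",
                    "INSERT INTO hot VALUES (?, ?)", [(1, "alice"), (2, "bob")])
        make_family(cf2, "CREATE TABLE cold (row_key INTEGER PRIMARY KEY, notes TEXT)",
                    "INSERT INTO cold VALUES (?, ?)", [(1, "rarely read")])
        return read_wide(cf1, cf2)


if __name__ == "__main__":
    print(demo())  # [(1, 'alice', 'rarely read'), (2, 'bob', None)]
```

Because the view is TEMP, closing the connection would get rid of it anyway; the explicit DROP VIEW and DETACH just mirror the steps above.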

Unfortunately SQLite does not let you make changes (INSERT, UPDATE, DELETE) 
directly to VIEWs, which is understandable because doing it correctly is very 
tricky.  But VIEWs are great for SELECTs.
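
If you do need writes, SQLite does support INSTEAD OF triggers on views, where you spell out yourself what a write should mean.  A minimal sketch, reusing the same made-up hot/cold split in an in-memory database: it first shows a plain INSERT into the view being refused, then adds a trigger that routes the INSERT to the two underlying tables.

```python
import sqlite3


def view_write_demo():
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE hot  (row_key INTEGER PRIMARY KEY, name  TEXT);
        CREATE TABLE cold (row_key INTEGER PRIMARY KEY, notes TEXT);
        CREATE VIEW wide AS
            SELECT h.row_key, h.name, c.notes
            FROM hot AS h LEFT JOIN cold AS c ON c.row_key = h.row_key;
    """)

    # 1. Without a trigger, SQLite refuses to write through the view.
    try:
        con.execute("INSERT INTO wide VALUES (1, 'alice', 'note')")
        plain_write_allowed = True
    except sqlite3.OperationalError:
        plain_write_allowed = False

    # 2. An INSTEAD OF trigger says what an INSERT on the view means:
    #    here, split the row across the two underlying tables.
    con.executescript("""
        CREATE TRIGGER wide_insert INSTEAD OF INSERT ON wide
        BEGIN
            INSERT INTO hot  (row_key, name)  VALUES (NEW.row_key, NEW.name);
            INSERT INTO cold (row_key, notes) VALUES (NEW.row_key, NEW.notes);
        END;
    """)
    with con:
        con.execute("INSERT INTO wide VALUES (1, 'alice', 'note')")
    rows = con.execute("SELECT row_key, name, notes FROM wide").fetchall()
    con.close()
    return plain_write_allowed, rows


if __name__ == "__main__":
    print(view_write_demo())  # (False, [(1, 'alice', 'note')])
```

UPDATE and DELETE would each need their own INSTEAD OF trigger, and getting that logic right is up to you, which is exactly the tricky part.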

Second, there are many ways of implementing clusters, depending on how tricky 
you want to get.  You can write a SQLite server app which takes commands and 
returns results as JSON.  Or at the other end of the spectrum you can implement 
your own Virtual File System (VFS) which takes disk accesses addressed to an 
individual file system and understands how to distribute the operations over 
your cluster.  (I have no idea how to do the latter and, thank goodness, my job 
doesn't require it.)  So the first solution you see is not necessarily the 
right one.  If you can't pick one yourself I'm hoping that people here might be 
able to help recommend an implementation that would suit you.
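
The first option is less work than it sounds.  Here's a minimal sketch of the request handler such a server app might wrap, assuming a made-up wire format of {"sql": ..., "params": [...]} in and {"columns": [...], "rows": [...]} or {"error": ...} out.  The format and the name handle_request are my own invention for illustration; the transport (http.server, a raw socket, whatever) is left out, and there's no authentication or limiting, which you'd want before exposing it on a network.

```python
import json
import sqlite3


def handle_request(con, request_text):
    """Run one SQL command sent as JSON and return the result as JSON.

    Hypothetical request:  {"sql": "...", "params": [...]}
    Hypothetical response: {"columns": [...], "rows": [[...], ...]} or {"error": "..."}
    """
    try:
        req = json.loads(request_text)
        cur = con.execute(req["sql"], req.get("params", []))
        # description is None for statements that return no columns (DDL, INSERT, ...).
        columns = [d[0] for d in cur.description] if cur.description else []
        rows = [list(r) for r in cur.fetchall()]
        con.commit()  # make writes visible to later requests and other connections
        return json.dumps({"columns": columns, "rows": rows})
    except (ValueError, KeyError, TypeError, sqlite3.Error) as exc:
        con.rollback()
        return json.dumps({"error": str(exc)})


if __name__ == "__main__":
    con = sqlite3.connect(":memory:")
    handle_request(con, '{"sql": "CREATE TABLE t (k INTEGER, v TEXT)"}')
    handle_request(con, '{"sql": "INSERT INTO t VALUES (?, ?)", "params": [1, "x"]}')
    print(handle_request(con, '{"sql": "SELECT k, v FROM t"}'))
```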

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
