Hi All,
New to Cassandra, so apologies if I don't fully grok stuff just yet.
I have data keyed by both a key and a date. I want to run a query that gets
multiple keys across multiple contiguous date ranges at once. I'm currently
storing the date as part of the row key, like this:
key1|2011-05-15 { c1 : , c2 :, c3 : ... }
key1|2011-05-16 { c1 : , c2 :, c3 : ... }
key2|2011-05-15 { c1 : , c2 :, c3 : ... }
key2|2011-05-16 { c1 : , c2 :, c3 : ... }
...
I generate all the key/date combinations that I'm interested in and use
multiget_slice to retrieve them, pulling in all the columns for each row (I
need all the data, but the number of columns is small: fewer than 100). The
total number of row keys retrieved will only be around 100.
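A minimal sketch of the key-generation step, assuming row keys are plain strings of the form key|YYYY-MM-DD as in the layout above. row_keys is a hypothetical helper; its output is the key list you would hand to multiget_slice (the client call itself is left out, since it depends on which client library you use). For several date ranges you would call it once per range and concatenate the results.

```python
from datetime import date, timedelta

def row_keys(keys, start, end):
    """Expand each key across every day in [start, end] (inclusive)
    into row keys of the form 'key|YYYY-MM-DD' (hypothetical helper)."""
    days = (end - start).days
    return [
        f"{k}|{(start + timedelta(days=i)).isoformat()}"  # isoformat() zero-pads month/day
        for k in keys
        for i in range(days + 1)
    ]

if __name__ == "__main__":
    print(row_keys(["key1", "key2"], date(2011, 5, 15), date(2011, 5, 16)))
```

With 100 or so rows per request, building the full key list client-side like this is cheap.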
Now it strikes me I could also store this using composite columns, like
this:
key1 { 2011-05-15|c1 : , 2011-05-16|c1 : , 2011-05-15|c2 : , 2011-05-16|c2 : ,
       2011-05-15|c3 : , 2011-05-16|c3 : , ... }
key2 { 2011-05-15|c1 : , 2011-05-16|c1 : , 2011-05-15|c2 : , 2011-05-16|c2 : ,
       2011-05-15|c3 : , 2011-05-16|c3 : , ... }
...
Then I'd use multiget_slice again (but with fewer keys), with a slice range
that retrieves only the dates I'm interested in.
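A local sketch of why a slice range can select a date range here, assuming column names are 'YYYY-MM-DD|colname' compared as plain byte strings (as with a UTF8 or bytes comparator). It models the slice with Python's bisect on a sorted list rather than calling Cassandra; slice_columns is a hypothetical helper. The start/finish strings it computes correspond to what you would put in the slice range, whose count limit would need to be at least (number of days × columns per day).

```python
from bisect import bisect_left, bisect_right

def slice_columns(columns, start_date, end_date):
    """Return the column names whose date prefix lies in [start_date, end_date].

    Assumes names look like 'YYYY-MM-DD|colname' and sort byte-wise, so
    sorting by name also groups columns by date (hypothetical helper).
    """
    names = sorted(columns)
    # start_date + '|' sorts just before every name for that date.
    lo = bisect_left(names, start_date + "|")
    # '}' (0x7D) sorts just after '|' (0x7C), so end_date + '}' is an upper
    # bound for every name that starts with end_date + '|'.
    hi = bisect_right(names, end_date + "}")
    return names[lo:hi]

if __name__ == "__main__":
    cols = ["2011-05-15|c1", "2011-05-16|c1", "2011-05-15|c2",
            "2011-05-16|c2", "2011-05-17|c1"]
    print(slice_columns(cols, "2011-05-15", "2011-05-16"))
```

The zero-padding matters: an unpadded name like 2011-5-16|c1 sorts after every 2011-05-.. name, so it would fall outside the slice.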
Another alternative, I guess, would be to use OPP (OrderPreservingPartitioner)
with the first storage approach and get_range_slices, but as I understand it
this would not be great for performance, because contiguous keys end up
clustered together on a single node?
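A toy ring model of the clustering concern, not Cassandra's actual code: with an order-preserving partitioner the key itself acts as the token, while RandomPartitioner derives the token from an MD5 hash of the key. The node tokens, the unsigned 128-bit hash space, and the helper names below are simplifying assumptions made up for illustration.

```python
import hashlib
from bisect import bisect_left

# Toy ring: each node has a token and owns (previous token, own token].
# Both token lists are hypothetical.
OPP_TOKENS = ["f", "m", "s", "z"]                          # string tokens
HASH_TOKENS = [(i + 1) * 2**128 // 4 for i in range(4)]    # evenly spaced over MD5 space

def owner(tokens, token):
    """Index of the node owning `token`: first node token >= it, wrapping around."""
    return bisect_left(tokens, token) % len(tokens)

def opp_node(key):
    # Order-preserving placement: the key itself is the token.
    return owner(OPP_TOKENS, key)

def hash_node(key):
    # Hash placement (RandomPartitioner-style, simplified): MD5 of the key as the token.
    return owner(HASH_TOKENS, int(hashlib.md5(key.encode()).hexdigest(), 16))

if __name__ == "__main__":
    keys = [f"key1|2011-05-{d:02d}" for d in range(1, 31)]
    print("OPP nodes: ", sorted({opp_node(k) for k in keys}))
    print("hash nodes:", sorted({hash_node(k) for k in keys}))
```

In this model every key1|... row lands on the same node under ordered placement, which keeps a range scan local but concentrates load for popular keys; hashing scatters the same rows across the ring.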
So my question is: which approach is best? One downside of the composite-column
approach, I guess, is that the number of columns per row grows without bound
(although with 2 billion to play with, this isn't gonna be a problem any time
soon). Also, multiget_slice supports only one slice predicate, so I'd have to
use multiple queries to get multiple date ranges.
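Since multiget_slice takes a single SlicePredicate, one way to keep the number of calls down is to merge overlapping or adjacent date ranges first and issue one slice per remaining range. This is a sketch assuming the 'YYYY-MM-DD|colname' column names from the composite layout; merge_ranges and slice_bounds are hypothetical helpers, and the Cassandra calls themselves are not shown.

```python
from datetime import date, timedelta

def merge_ranges(ranges):
    """Merge overlapping or adjacent inclusive (start, end) date ranges."""
    merged = []
    for start, end in sorted(ranges):
        # Adjacent means the next range starts the day after the last one ends.
        if merged and start <= merged[-1][1] + timedelta(days=1):
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def slice_bounds(ranges):
    """One (start, finish) column-name pair per merged range, i.e. one
    multiget_slice call each. '|' opens a date's columns; '}' sorts just
    after '|', so it closes them."""
    return [(s.isoformat() + "|", e.isoformat() + "}") for s, e in merge_ranges(ranges)]

if __name__ == "__main__":
    wanted = [(date(2011, 5, 15), date(2011, 5, 16)),
              (date(2011, 5, 17), date(2011, 5, 20)),
              (date(2011, 6, 1), date(2011, 6, 3))]
    print(slice_bounds(wanted))
```

If the ranges are close together, a single slice from the earliest to the latest date with client-side filtering may be simpler than several round trips; which is cheaper depends on how many unwanted columns the gaps contain.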
Anyway, any thoughts/tips appreciated.
Thanks,
Charles