Hello Mark,
On 26/04/2010, at 07:17, Mark Robson wrote:
> I think the solution to this would be to choose your nodes' tokens wisely
> before you start inserting data, and if possible, modify the keys to split
> them better between the nodes.
>
> For example, suppose your key has two parts, one of which you want to range
> scan and another which you don't. Say you have customer_id and a timestamp. The
> customer ID does not need to be range scanned, so you can hash it into a hex
> value (say), then append the timestamp (in a lexically sortable way of
> course). So you'd end up with keys like
>
> HHHH-0012345-0001234567890
>
> Where HHHH is a hash of the customer ID, 0012345 is the customer ID, and the
> rest is a timestamp.
>
> You'd be able to do a time range scan by using the known prefixes, and
> distributing your nodes equally from 0000 to ffff would result in fairly even
> data (provided you don't have a very small number of very large customers).
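To make sure I understand the key layout, here is a minimal sketch of how such a key could be built. The helper name make_key, the use of MD5 for the hash, and the field widths (7 digits for the customer ID, 13 for the timestamp, taken from the example key) are my assumptions, not something stated above:

```ruby
require 'digest/md5'

# Build a row key of the form HHHH-CCCCCCC-TTTTTTTTTTTTT.
# Assumptions: MD5 as the hash, 4 hex chars of prefix, and the
# zero-padded widths seen in the example key.
def make_key(customer_id, timestamp)
  padded_id = format('%07d', customer_id)
  # The first 4 hex chars of the hash spread customers over 0000..ffff.
  hash_prefix = Digest::MD5.hexdigest(padded_id)[0, 4]
  # Zero-padding keeps timestamps in lexical order within one customer.
  "#{hash_prefix}-#{padded_id}-#{format('%013d', timestamp)}"
end
```

All keys for one customer share the same HHHH-CCCCCCC- prefix, so they sit next to each other in key order and sort by timestamp.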
How do you ask Cassandra to do a range scan with a prefix? As far as I can
tell, you can't do something like:
db.get_range('SomeCF', :start => 'HHHH-0012345-*')
...can you?
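The closest thing I can think of is to turn the prefix into an explicit start key and end key, instead of a wildcard. A rough sketch, assuming an order-preserving partitioner and keys that only contain hex digits, decimal digits and '-' (all of which sort below '~'). The helpers prefix_bounds and scan are hypothetical, and scan is only a local stand-in for the cluster, not a client call:

```ruby
# Turn a key prefix into inclusive [start, end] bounds for a range scan.
# Assumption: every key character sorts below '~' in byte order.
def prefix_bounds(prefix)
  [prefix, prefix + '~']
end

# Local stand-in for an ordered key space: keep keys within the bounds.
def scan(keys, start_key, end_key)
  keys.sort.select { |k| k >= start_key && k <= end_key }
end

keys = [
  'a1b2-0012344-0001234567890',
  'a1b2-0012345-0001234567890',
  'a1b2-0012345-0001234567999',
  'a1b2-0012346-0000000000001',
  'ffff-0000001-0000000000001'
]

start_key, end_key = prefix_bounds('a1b2-0012345-')
matches = scan(keys, start_key, end_key)
```

Here matches holds only the two keys for customer 0012345. Whether the client lets me pass both bounds to get_range, and under which option names, is what I'm not sure about.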
Regards
--
Lucas Di Pentima - Santa Fe, Argentina
Jabber: [email protected]
MSN: [email protected]