Re: When I use the OrderPreservingPartitioner, the load is very imbalanced

Lucas Di Pentima Mon, 26 Apr 2010 04:50:07 -0700

Hello Mark,

On 26/04/2010, at 07:17, Mark Robson wrote:


> I think the solution to this would be to choose your nodes' tokens carefully
> before you start inserting data and, if possible, to modify the keys so they
> split more evenly between the nodes.
> 
> For example, suppose your key has two parts: one you want to range scan and
> another you don't. Say you have a customer_id and a timestamp. The customer
> ID does not need to be range scanned, so you can hash it into a hex value
> (say), put that hash first, followed by the customer ID itself, then append
> the timestamp (in a lexically sortable form, of course). So you'd end up with
> keys like:
> 
> HHHH-0012345-0001234567890
> 
> Where HHHH is a hash of the customer ID, 0012345 is the customer ID, and the 
> rest is a timestamp.
> 
> You'd be able to do a time range scan by using the known prefixes, and
> spreading your nodes' tokens evenly from 0000 to ffff would result in a fairly
> even data distribution (provided you don't have a very small number of very
> large customers).

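A minimal Ruby sketch of the key layout described above, assuming MD5 truncated to four hex digits as the hash, a seven-digit customer ID and a 13-digit timestamp to match the example key. The helpers customer_prefix, row_key and time_range_keys are hypothetical and not part of any Cassandra client. Zero-padding keeps lexical order equal to numeric order, so one customer's rows for a time window fall between two computable keys.

```ruby
require 'digest/md5'

# Hypothetical helper: a 4-hex-digit hash prefix plus the zero-padded
# customer ID. MD5 is an assumption; any stable hash would do.
def customer_prefix(customer_id)
  digest = Digest::MD5.hexdigest(customer_id.to_s)[0, 4]
  format('%s-%07d', digest, customer_id)
end

# Hypothetical helper: append a 13-digit zero-padded timestamp so that
# lexical order matches chronological order within one customer.
def row_key(customer_id, timestamp)
  format('%s-%013d', customer_prefix(customer_id), timestamp)
end

# Hypothetical helper: start and end keys covering one customer's rows
# with timestamps from from_ts to to_ts.
def time_range_keys(customer_id, from_ts, to_ts)
  [row_key(customer_id, from_ts), row_key(customer_id, to_ts)]
end

puts row_key(12345, 1234567890)
p time_range_keys(12345, 1234567890, 1234599999)
```

Because every row for a customer shares the same HHHH-0012345- prefix, the pair returned by time_range_keys brackets that customer's time window; how a particular client accepts a start and end key is not shown here.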

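A second hypothetical Ruby sketch, for the token side: spreading N initial tokens evenly over the four-hex-digit prefix space 0000 to ffff. It assumes the usual Cassandra ring convention that a node owns the keys after the previous node's token up to and including its own, with the first node also taking the wrap-around slice, and it treats tokens and keys as plain strings, which is how an order-preserving partitioner compares them (worth checking for your version). owner is a hypothetical helper applying that rule.

```ruby
# Hypothetical helper: evenly spaced initial tokens over the hex prefix
# space ("0000".."ffff" when digits = 4), one per node.
def initial_tokens(node_count, digits = 4)
  space = 16**digits
  (0...node_count).map { |i| format("%0#{digits}x", space * i / node_count) }
end

# Hypothetical helper: the node owning a key is the first token >= key
# (string comparison); keys past the last token wrap to the first node.
def owner(key, tokens)
  tokens.sort.find { |t| t >= key } || tokens.min
end

tokens = initial_tokens(4)
p tokens                                            # => ["0000", "4000", "8000", "c000"]
puts owner('3fff-0012345-0001234567890', tokens)    # prints 4000
```

With four nodes each token ends up owning exactly a quarter of the possible hash prefixes; as noted above, the real load is only as even as the data behind those prefixes.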
How do you ask Cassandra to do a range scan with a prefix? As far as I can
tell, you can't do something like:

db.get_range('SomeCF', :start => 'HHHH-0012345-*')

...do you?


Regards
--
Lucas Di Pentima - Santa Fe, Argentina