According to http://wiki.apache.org/cassandra/Operations, nodetool repair performs a major compaction and compares data between the nodes, repairing any conflicts. I'm not sure that would improve the load balance, though it may reclaim some wasted space on the nodes.
nodetool loadbalance will remove the node from the ring after streaming its data to the remaining nodes and then add it back in the busiest part. I've used it before and it seems to do the trick.

Also consider the size of the rows. Are they generally similar, or do you have some that are much bigger? The keys will be distributed without considering the size of the data.

The RP (RandomPartitioner) is random though; I do not think it tries to distribute the keys evenly. So some variance with a small number of nodes should be expected, IMHO.

Aaron

On 21 Jun 2010, at 02:31, James Golick wrote:

> I ran cleanup on all of them and the distribution looked roughly even after
> that, but a couple of days later, it's looking pretty uneven.
>
> On Sun, Jun 20, 2010 at 10:21 AM, Jordan Pittier - Rezel <jor...@rezel.net>
> wrote:
> Hi,
> Have you tried nodetool repair (or cleanup) on your nodes?
>
> On Sun, Jun 20, 2010 at 4:16 PM, James Golick <jamesgol...@gmail.com> wrote:
> I just increased my cluster from 2 to 4 nodes, and RF=2 to RF=3, using RP.
>
> The tokens seem pretty even on the ring, but two of the nodes are far more
> heavily loaded than the others. I understand that there are a variety of
> possible reasons for this, but I'm wondering whether anybody has suggestions
> for how to tweak the tokens such that this problem is alleviated. Would it be
> better to just add 2 more nodes?
>
> Address       Status  Load      Range                                    Ring
>                                 170141183460469231731687303715884105728
> 10.36.99.140  Up      61.73 GB  43733172796241720623128947447312912170   |<--|
> 10.36.99.134  Up      69.7 GB   85070591730234615865843651857942052864   |   |
> 10.36.99.138  Up      54.08 GB  128813844387867495544257452469445200073  |   |
> 10.36.99.136  Up      54.75 GB  170141183460469231731687303715884105728  |-->|
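
On the question of tweaking the tokens: a rough way to check whether the tokens are the problem is to compute how much of the ring each node owns as primary and compare that with evenly spaced tokens. The sketch below is only an illustration and is not part of Cassandra. It assumes the RandomPartitioner token space runs from 0 to 2**127, that a node owns the range from the previous token (exclusive) up to its own token (inclusive), and that evenly spaced tokens are i * 2**127 / N. The helper names ideal_tokens and primary_ownership are made up for this example. The tokens are copied from the ring output quoted above.

```python
RING_SIZE = 2 ** 127  # assumed RandomPartitioner token space: 0 .. 2**127


def ideal_tokens(node_count):
    """Evenly spaced tokens for node_count nodes: i * 2**127 / node_count."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def primary_ownership(tokens_by_node):
    """Return the fraction of the ring each node owns as primary.

    A node is assumed to own (previous token, own token], wrapping
    around the ring for the node with the smallest token.
    """
    ordered = sorted(tokens_by_node.items(), key=lambda item: item[1])
    shares = {}
    for i, (node, token) in enumerate(ordered):
        prev_token = ordered[i - 1][1]  # i == 0 wraps to the last node
        # A single-node ring yields width 0 here, so fall back to the full ring.
        width = (token - prev_token) % RING_SIZE or RING_SIZE
        shares[node] = width / RING_SIZE
    return shares


# Tokens taken from the ring output in the quoted message.
ring = {
    "10.36.99.140": 43733172796241720623128947447312912170,
    "10.36.99.134": 85070591730234615865843651857942052864,
    "10.36.99.138": 128813844387867495544257452469445200073,
    "10.36.99.136": 170141183460469231731687303715884105728,
}

for node, share in primary_ownership(ring).items():
    print(f"{node}: {share:.2%} of the ring")

print("evenly spaced tokens for 4 nodes:", ideal_tokens(4))
```

Under those assumptions every node's primary range comes out between 24% and 26% of the ring, so the tokens are already close to even. Primary ranges are only part of the picture with RF=3, since each node also stores replicas of other ranges, but this suggests the difference in load is more likely down to row sizes or data that has not been compacted or cleaned up yet than to token placement.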