This page has a guide to setting the initial tokens for the nodes: http://wiki.apache.org/cassandra/Operations#Ring_management
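As a rough illustration of the idea behind that guide, evenly spaced tokens can be computed as below. This is only a sketch: it assumes the cluster uses RandomPartitioner with a token space of 0 to 2**127, and the function name initial_tokens is made up for this example. Check the wiki page for how to apply the values to each node.

```python
def initial_tokens(node_count):
    """Return evenly spaced initial tokens, one per node.

    Assumption: RandomPartitioner, whose token space is taken to be
    0 .. 2**127. Other partitioners use different token types.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    ring_size = 2 ** 127
    # Integer division keeps the tokens exact; Python ints do not overflow.
    return [i * ring_size // node_count for i in range(node_count)]


# Example: a two-node ring, as in the cluster described in this thread.
for index, token in enumerate(initial_tokens(2)):
    print("node %d: %d" % (index, token))
```

With tokens spaced like this, nodetool ring should report roughly equal ownership for each node.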
You can also use the bin/nodetool cfstats command or JConsole to check the maximum row size on each node, to see if you have a monster row.

Aaron

On 3/02/2011, at 10:22 PM, abhinav prakash rai wrote:

> Hi Peter,
>
> Thanks for your reply.
>
> Our application is multi-threaded. We are using an 8 core machine. In our
> application we are using 4 column families, one of which contains rows
> that are huge relative to the rows in the other column families.
>
> In the ring the balance is highly skewed. Can you suggest how we can ensure
> even balancing of the load across the cluster?
>
> The row IDs in one column family are a combination of cell numbers (i.e.
> 9883240354_9885430354) and the other row IDs are like thread_name_12234 etc.
>
> How do we ensure the data is spread across rows?
>
> Thanks & Regards,
> abhinav
>
> On Thu, Feb 3, 2011 at 1:46 PM, Peter Schuller <peter.schul...@infidyne.com>
> wrote:
> > The first time, I ran a single instance of Cassandra and my application
> > on one system (16GB RAM and 8 cores), and the time taken was 480 sec.
> > When I added one more system (meaning this time I was running 2 instances
> > of Cassandra in a cluster) and ran the application from a single client,
> > I found the time taken increased to 1000 sec. I also found that the data
> > distribution was very odd across the two systems (one system held about
> > 2.5GB and the other 140MB).
> > Is any configuration required when running Cassandra in a cluster other
> > than adding seeds?
>
> For starters:
>
> (1) Are you spreading your data around evenly across rows? Rows
> determine where data is placed in the cluster.
> (2) Is your ring actually balanced? (nodetool ring, they should have 50/50)
> (3) Is your test concurrent/multi-threaded? Increasing total time
> would be expected if you're moving from local traffic only to running
> against remote machines, if your test is a sequential workload.
> Adding machines increases aggregate throughput across multiple
> clients; it won't make individual requests faster (except indirectly
> of course by avoiding overloaded conditions).
>
> --
> / Peter Schuller
>
> --
> Regards,
> Abhinav P. Rai