This page has a guide to setting the initial tokens for the nodes so that the ring is balanced:
http://wiki.apache.org/cassandra/Operations#Ring_management
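
For a cluster like yours, evenly spaced tokens can be worked out with a few lines of Python. This is a minimal sketch. It assumes the default RandomPartitioner, whose tokens run from 0 to 2**127, a single data centre, and that every node should own an equal share of the ring. Each value would go into that node's initial_token setting (cassandra.yaml on 0.7) before the node first joins the ring.

```python
# Sketch: evenly spaced initial tokens for RandomPartitioner, whose
# tokens run from 0 to 2**127. Assumes a single data centre and that
# every node should own an equal share of the ring.

def initial_tokens(node_count: int) -> list:
    """Return one evenly spaced token per node, starting at 0."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # (i * 2**127) // node_count for each node index i
    return [i * 2 ** 127 // node_count for i in range(node_count)]

if __name__ == "__main__":
    for i, token in enumerate(initial_tokens(2)):
        print(f"node {i}: initial_token: {token}")
```

For two nodes this gives 0 and 85070591730234615865843651857942052864 (2**126). A node that already holds data can be moved to its new token with nodetool move instead of editing its config.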

You can also use the bin/nodetool cfstats command or JConsole to check the
maximum row size on each node and see whether you have a monster row.
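
To see why a single monster row can skew one node even when keys are otherwise well spread, here is a minimal sketch of how a row key picks its node. It is modelled on RandomPartitioner (the MD5 of the key read as a signed integer, made non-negative) and ignores replication. The helper names and the generated keys are hypothetical, shaped like the two key formats described below.

```python
import bisect
import hashlib
from collections import Counter

def key_token(key: bytes) -> int:
    """Token for a row key, modelled on RandomPartitioner: the MD5
    digest read as a signed big-endian integer, made non-negative."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))

def owner(token: int, node_tokens: list) -> int:
    """Return the token of the node owning `token`: the first node whose
    token is >= it, wrapping around to the lowest node token."""
    ring = sorted(node_tokens)
    idx = bisect.bisect_left(ring, token)
    return ring[idx % len(ring)]

def spread(keys, node_tokens) -> Counter:
    """Count how many row keys land on each node (replicas ignored)."""
    return Counter(owner(key_token(k), node_tokens) for k in keys)

if __name__ == "__main__":
    nodes = [0, 2 ** 126]  # two evenly spaced tokens
    # Hypothetical keys in the two formats mentioned in this thread.
    keys = [f"98832{i:05d}_9885430354".encode() for i in range(5000)]
    keys += [f"thread_name_{i}".encode() for i in range(5000)]
    print(spread(keys, nodes))
    # One huge row has one key, hence one token and one owning node,
    # however many columns it holds.
    print(owner(key_token(b"9883240354_9885430354"), nodes))
```

Many distinct keys split roughly evenly between the two nodes, but all of a row's columns follow its single key, so a very large row lands entirely on one node (and its replicas).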

Aaron

On 3/02/2011, at 10:22 PM, abhinav prakash rai wrote:

> Hi Peter,
> 
> Thanks for your reply.
> 
> Our application is multi-threaded, and we are using an 8-core machine. The
> application uses 4 column families, one of which contains rows that are
> huge relative to the rows in the other column families.
> 
> The ring is highly unbalanced. Can you suggest how we can ensure the load
> is balanced evenly across the cluster?
> 
> The row IDs in one column family are a combination of cell numbers (e.g.
> 9883240354_9885430354), and the other row IDs look like thread_name_12234.
> 
> How can we ensure the data is spread evenly across rows?
> 
> Thanks & Regards,
> abhinav
>
> On Thu, Feb 3, 2011 at 1:46 PM, Peter Schuller <peter.schul...@infidyne.com> wrote:
> > The first time I ran a single instance of Cassandra and my application on
> > one system (16GB RAM, 8 cores), the run took 480 sec.
> > When I added one more system (i.e. two Cassandra instances in a cluster)
> > and ran the application from a single client, the time taken increased to
> > 1000 sec. I also found that the data distribution across the two systems
> > was very uneven (about 2.5GB on one and 140MB on the other).
> > Is any configuration required to run Cassandra in a cluster, other than
> > adding seeds?
> 
> For starters:
> 
> (1) Are you spreading your data evenly across rows? The row key
> determines where data is placed in the cluster.
> (2) Is your ring actually balanced? (Check with nodetool ring; with two
> nodes, each should own 50%.)
> (3) Is your test concurrent/multi-threaded? If your test is a sequential
> workload, an increase in total time is expected when you move from
> purely local traffic to running against remote machines.
> Adding machines increases aggregate throughput across multiple
> clients; it won't make individual requests faster (except indirectly,
> of course, by avoiding overload).
> 
> 
> --
> / Peter Schuller
> 
> 
> 
> -- 
> Regards,
> Abhinav P. Rai
