Storage proxy will give you the total writes through the server, for all CFs.

The CommitLog thread pool is not what you want. It's not designed to measure column or row throughput; it's just how many tasks have run through the thread pool.

The closest thing to recording the number of columns is the MemtableColumnCount in the per-CF stats in JMX (and cfstats in nodetool). It is updated here:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/Memtable.java#L196

Note:
- it only counts top level columns, not sub columns
- it includes deletes
- it is per Memtable, so it is cleared when a new memtable is switched in
- the number is also included in the logs when the memtable is flushed

Hope that helps.

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 12/10/2011, at 8:31 PM, Alexandru Dan Sicoe wrote:

> Thanks for the quick replies guys!
>
> Just to explain to you why I wanted to understand these two measures, I do
> batch inserts to Cassandra but the batches are not fixed in size i.e. the
> number of columns in a batch varies and also the data type of the values
> placed in the columns varies (the name of the columns is always a long -
> timestamp) => this also makes it hard to predict the actual data rate I am
> sending to Cassandra. I thought that if I can get a cluster wide measurement
> of the batch insertions per second and also of the individual column
> insertions per second I can understand better what's happening.
>
> So, from what you guys said I understand that:
> - the StorageProxy WriteOperations attribute gives me the batch insertions
> per second sent to the cluster (so this is fine)
> - the Commitlog CompletedTasks attribute is definitely a closer measurement
> to the single column insertions but it is not accurate (i.e. it will be
> higher) because several types of row mutations can happen when any column is
> inserted - How close is this measurement to the single column insertions per
> second I want to obtain?
> Is there anything I can use to get a more accurate
> measurement of the single column insertions per sec or is it good enough?
>
> Cheers,
> Alexandru
>
> On Wed, Oct 12, 2011 at 4:18 AM, Tyler Hobbs <ty...@datastax.com> wrote:
> The OpsCenter graph you're referring to basically does the following:
>
> 1. For each node, find out how much the WriteOperations attribute of the
> StorageProxy increased during the last minute.
> 2. Sum these values to get a total for the cluster.
> 3. Divide by 60 to get an average number of WriteOperations per second for
> the cluster.
>
>
> On Tue, Oct 11, 2011 at 3:55 PM, aaron morton <aa...@thelastpickle.com> wrote:
> It's the number of mutations; a mutation is a collection of changes for a
> single row across one or more column families.
>
> Take a look at nodetool cfstats, this is where I assume OpsCenter is
> getting its data from.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 12/10/2011, at 3:44 AM, Alexandru Dan Sicoe wrote:
>
>> Hello everyone,
>> I was trying to get some cluster wide statistics of the total insertions
>> performed in my 3 node Cassandra 0.8.6 cluster. So I wrote a nice little
>> program that gets the CompletedTasks attribute of
>> org.apache.cassandra.db:type=Commitlog from every node, sums up the values
>> and records them in a .csv every 10 sec or so. Everything works and I get my
>> stats but later I found out that I am not really sure what this measure
>> means. I think it is the individual column insertions performed! Am I
>> correct?
>> In the meantime I installed the trial version of the DataStax Operations
>> Center. The cluster wide dashboard, showing Writes performed as a function
>> of time, gives me much smaller values of the rates, compared to the
>> measurement I described before. The Datastax writes/sec are of the same
>> order of magnitude as the batch writes I perform on the cluster.
>> But somehow I cannot relate this rate to the rate of my CompletedTasks
>> measurement.
>>
>> How do people usually measure insertion rates for their clusters? Per batch,
>> per single column, or is actual data rate more important to know?
>>
>> Cheers,
>> Alexandru
>>
>
> --
> Tyler Hobbs
> Software Engineer, DataStax
> Maintainer of the pycassa Cassandra Python client library
>
> --
> Alexandru Dan Sicoe
> MEng, CERN Marie Curie ACEOLE Fellow
>
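
For anyone who wants to reproduce the two measurements discussed in this thread from their own polling program, here is a rough Python sketch of the arithmetic only. It assumes you already collect the raw JMX values yourself (the StorageProxy WriteOperations counter per node, and MemtableColumnCount for one CF on one node). The function names and the dict/list shapes are made up for illustration and are not part of Cassandra or OpsCenter. The first function follows the three steps Tyler describes. The second is only an approximation based on Aaron's note that the count is cleared when a new memtable is switched in.

```python
from typing import Dict, Sequence


def cluster_writes_per_sec(before: Dict[str, int],
                           after: Dict[str, int],
                           window_seconds: float = 60.0) -> float:
    """Sum each node's counter increase over the window, then average per second.

    `before` and `after` map a node name to its WriteOperations value at the
    start and end of the window (hypothetical input shape).
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    total = 0
    # Only nodes present in both polls contribute to the total.
    for node in before.keys() & after.keys():
        earlier, now = before[node], after[node]
        # Assumption: a value lower than the previous one means the node
        # restarted and its counter began again from zero.
        total += now - earlier if now >= earlier else now
    return total / window_seconds


def columns_written(samples: Sequence[int]) -> int:
    """Approximate columns written from successive MemtableColumnCount polls."""
    total = 0
    for earlier, now in zip(samples, samples[1:]):
        # A drop means a new memtable was switched in. Columns written between
        # the last poll and the switch are missed, so this is a lower bound.
        total += now - earlier if now >= earlier else now
    return total
```

With three nodes polled a minute apart, `cluster_writes_per_sec({"n1": 1000, "n2": 2000, "n3": 500}, {"n1": 1600, "n2": 2300, "n3": 800})` averages the combined increase of 1200 over 60 seconds. Keep in mind that the column figure still only counts top level columns and includes deletes, as noted above.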