Right, row stats in 0.6 are just "what I've seen during the compactions that happened to run since this node last restarted."
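Concretely: a minor compaction merges only a subset of SSTables, so for a row whose large original column sits in an older SSTable it may see just the freshly written timestamp column, and the min/mean it records reflect those partial rows. A toy sketch of that effect (illustrative sizes and data structures, not Cassandra's actual serialization or code):

```python
# Toy model: row-size statistics as seen by a compaction.
# Sizes are made up for illustration, not real Cassandra on-disk sizes.

# Each SSTable maps row key -> serialized row size in bytes.
old_sstable = {key: 100_323 for key in range(10_000)}    # original ~100 kB rows
new_sstable = {key: 238 for key in range(0, 10_000, 3)}  # timestamp-only updates

def compaction_stats(sstables):
    """Merge rows across the given SSTables and report (min, max, mean) size."""
    merged = {}
    for table in sstables:
        for key, size in table.items():
            merged[key] = merged.get(key, 0) + size
    sizes = list(merged.values())
    return min(sizes), max(sizes), sum(sizes) // len(sizes)

# A major compaction sees every SSTable, so the minimum stays ~100 kB.
print(compaction_stats([old_sstable, new_sstable]))

# A minor compaction that happens to include only the new SSTable sees
# timestamp-only partial rows, so the reported minimum drops to 238.
print(compaction_stats([new_sstable]))  # -> (238, 238, 238)
```

That second case is the 238-byte "Compacted row minimum size" in the cfstats output below: the statistic describes rows as merged in whatever compactions have run, not the full logical rows.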
0.7 has persistent (and more fine-grained) statistics.

On Thu, Aug 12, 2010 at 1:28 PM, Ryan King <r...@twitter.com> wrote:
> On Thu, Aug 12, 2010 at 9:08 AM, Julie <julie.su...@nextcentury.com> wrote:
>> I am chasing down a row size discrepancy and am confused.
>>
>> I populated a single-node Cassandra cluster with 10,000 rows of data, using
>> numeric keys 1-10,000, where each row is a little over 100kB in length and
>> has a single column in it.
>>
>> When I perform a cfstats on the node immediately after writing the data, it
>> reports that the Compacted row minimum size = Compacted row maximum size,
>> which is a little over 100,000 bytes. This is what I expect.
>>
>> I then run an application that randomly reads rows and adds a timestamp
>> column to each row read. This timestamp column name and column value add
>> just a few bytes to the row.
>>
>> But after running my reading app for a few hours, cfstats reports a very odd
>> minimum row size (and compacted mean row size):
>>
>> [r...@ec2-server1 ~]# /mnt/server/apache-cassandra-0.6.2/bin/nodetool -h
>> ec2-server1 -p 8080 cfstats
>> Keyspace: Keyspace1
>>         Read Count: 670434
>>         Read Latency: 36.22349047035205 ms.
>>         Write Count: 1519933
>>         Write Latency: 0.02940705741634664 ms.
>>         Pending Tasks: 0
>>                 Column Family: Standard1
>>                 SSTable count: 6
>>                 Space used (live): 11130225642
>>                 Space used (total): 11130225642
>>                 Memtable Columns Count: 1435
>>                 Memtable Data Size: 40344907
>>                 Memtable Switch Count: 1329
>>                 Read Count: 670434
>>                 Read Latency: 41.768 ms.
>>                 Write Count: 1519933
>>                 Write Latency: 0.025 ms.
>>                 Pending Tasks: 0
>>                 Key cache capacity: 200000
>>                 Key cache size: 200000
>>                 Key cache hit rate: 0.48049934471509675
>>                 Row cache: disabled
>>                 Compacted row minimum size: 238
>>                 Compacted row maximum size: 100323
>>                 Compacted row mean size: 67548
>>
>> I thought I had a bug in my code, so I wrote another app to read every row
>> in the database, keys 1-10,000. I get the size of each row after reading it
>> (by adding up all column names and column values in the row and the size of
>> the key string) and this matches what I expect -- every single key in this
>> table has a size of just over 100,000 bytes. (I know there are some
>> overhead columns in each row, but I assume these will only make the row
>> larger, not smaller.)
>>
>> So I am confused about where cfstats is getting the row sizes it is
>> working with.
>>
>> When I add the timestamp column to each row, I am not deleting the other
>> (large) column in the row, but I am not rewriting the large column either.
>
> I'm guessing (haven't read this part of the source) that the min size
> is being generated in minor compaction, which doesn't see the whole
> row.
>
> -ryan

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com