Hi, I have a small 2-node Cassandra cluster that seems to be constrained by read throughput. There are about 100 writes/s and 60 reads/s, mostly against a skinny column family. Here's the cfstats output for that family:

    SSTable count: 13
    Space used (live): 231920026568
    Space used (total): 231920026568
    Number of Keys (estimate): 356899200
    Memtable Columns Count: 1385568
    Memtable Data Size: 359155691
    Memtable Switch Count: 26
    Read Count: 40705879
    Read Latency: 25.010 ms.
    Write Count: 9680958
    Write Latency: 0.036 ms.
    Pending Tasks: 0
    Bloom Filter False Postives: 28380
    Bloom Filter False Ratio: 0.00360
    Bloom Filter Space Used: 874173664
    Compacted row minimum size: 61
    Compacted row maximum size: 152321
    Compacted row mean size: 1445

iostat shows almost no write activity; here's a typical line:

    Device:  rrqm/s  wrqm/s     r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz   await  svctm  %util
    sdb        0.00    0.00  312.87   0.00   6.61   0.00     43.27     23.35  105.06   2.28  71.19

nodetool tpstats also always shows pending tasks in the ReadStage.

The data set has grown beyond physical memory (250GB/node with 64GB of RAM), so I know disk access is required, but are there particular settings I should experiment with that could help relieve some read I/O pressure? I already put memcached in front of Cassandra, so the row cache probably won't help much.

Also, this column family stores smallish documents (usually 1-100K) along with metadata. The document is only occasionally accessed; usually only the metadata is read or written. Would splitting the document out into a separate column family help?

Thanks,
Kireet