I wanted to share this with the community in the hopes that it might help someone with their schema design.
Early on there were no red flags telling us to limit the number of columns we use. If anything, the community pushes for dynamic schema because Cassandra has a super nice online ALTER TABLE. In practice, though, Cassandra started to use a LOT more CPU than anything else in our stack, including Elasticsearch. ES uses about 8% of our total CPU whereas Cassandra uses about 70% of it. It's not an apples-to-apples comparison, mind you, but Cassandra definitely warrants some attention in this scenario.

I put Cassandra into a profiler (Java Mission Control) to see if anything weird was happening and didn't see any red flags. There were some issues with CAS, so I rewrote that code to do a plain read before the CAS operation: we first check if the row is already there, and only use a CAS insert if it's missing. That was a BIG performance bump; it probably reduced our C* usage by 40%.

However, I started to suspect that Cassandra was getting overwhelmed by the raw number of columns per row. I fired up cassandra_stress to verify and split the test into two configurations: 10 columns of 150 bytes each vs. 150 columns of 10 bytes each. In this synthetic benchmark C* was actually 5-6x faster on the 10-column run, so this tentatively confirmed my hypothesis.

So I decided to get a bit more aggressive and test it with a less synthetic benchmark. I wrote my own benchmark which uses our own schema in two forms:

- INLINE_ONLY: 150 columns
- DATA_ONLY: 4 columns (two primary key columns, one data_format column, and one data_blob column)

It creates T threads, writes W rows, then reads R rows. I set T=50, W=50,000, R=50,000. It does a write pass, then a read pass. I didn't implement a mixed workload, but I don't think that would change the results much.

The results were similarly impressive, though not as dramatic as the synthetic benchmark above: the DATA_ONLY form was 2x faster (3 minutes vs. 6 minutes). In the INLINE_ONLY benchmark, C* spends 70% of the time under high CPU; in DATA_ONLY it's about 50/50.
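For reference, the read-before-CAS optimization can be sketched roughly like this. This is just the logic, using an in-memory dict as a stand-in for the table; in the real version the cheap read is a plain SELECT and the fallback is an INSERT ... IF NOT EXISTS (Paxos). Names here are illustrative, not our actual schema:

```python
# Sketch: only pay for the expensive CAS (Paxos) round trip when a
# cheap ordinary read misses. The dict stands in for a Cassandra table.

table = {}

def read_query(key):
    # Stand-in for: SELECT v FROM t WHERE k = ?  (ordinary read, no Paxos)
    return table.get(key)

def cas_insert(key, value):
    # Stand-in for: INSERT INTO t (k, v) VALUES (?, ?) IF NOT EXISTS
    if key in table:
        return False  # [applied] = false
    table[key] = value
    return True

def insert_if_missing(key, value):
    """Check for the row first; only fall back to CAS if it's missing."""
    if read_query(key) is not None:
        return False  # row already there, CAS skipped entirely
    return cas_insert(key, value)
```

Most of our inserts hit rows that already exist, so skipping the Paxos round for those is where the savings came from.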
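And the DATA_ONLY idea, collapsing what used to be ~150 individual columns into one JSON blob, looks roughly like this (field and column names are made up for illustration; in production the encoding is done with Jackson on the Java side):

```python
import json

def to_data_only(pk1, pk2, record):
    """Collapse a wide row (a dict of ~150 small fields) into the
    4-column DATA_ONLY shape: (pk1, pk2, data_format, data_blob).
    Cassandra then manages 2 data cells per row instead of ~150."""
    return {
        "pk1": pk1,
        "pk2": pk2,
        "data_format": "json",
        "data_blob": json.dumps(record),
    }

def from_data_only(row):
    """Decode the blob back into the wide-row dict on read."""
    assert row["data_format"] == "json"
    return json.loads(row["data_blob"])

# Hypothetical wide row; the real schema has ~150 small fields.
record = {"title": "example", "lang": "en", "published": "2015-07-20"}
row = to_data_only("source-a", "http://example.com/1", record)
```

The trade-off is that you lose per-column reads and updates: every write rewrites the whole blob, which is fine for our write-once access pattern.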
I think we're going to move to this model and rewrite all our C* tables to support this inline JSON.

The second benchmark was run under 2.0.16 (our production version). The cassandra_stress run was under a 3.0 beta, since I wanted to see if a later version of Cassandra fixed the problem. It doesn't. This was done on a 128GB box with two Samsung SSDs in RAID0. I didn't test it with any replicas.

This brings up some interesting issues:

- It's still interesting that C* spends as much time as it does under high CPU load. I'd like to profile it again.
- Looks like there's room for improvement in the JSON encoder/decoder. I'm not sure how much we would see, though, because it's already using the latest Jackson, which I've tuned significantly. I might be able to get some performance out of it by avoiding allocations and GC pressure.
- A later C* might improve our CPU regardless, so upgrading our Cassandra might be something we do anyway.

--
We're hiring if you know of any awesome Java Devops or Linux Operations Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile <https://plus.google.com/102718274791889610666/posts>