Generally I've seen it recommended to use a composite CF, since it gives you more flexibility and is easier to debug. You can get some performance improvement by storing a serialized blob to represent your entity (a lot of data can be represented much smaller this way, by a factor of 10 or more if you are clever), but the complexity is rarely worth it. It is likely a premature optimization, though I have seen cases where it has shown a good improvement.
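To make the blob option a bit more concrete, here is a rough sketch in Python using only the standard `struct` module rather than protobufs. The entity fields (`user_id`, `created_at`, `score`, `active`), the version byte, and the helper names are all made up for illustration. `columns_size` is only a crude stand-in for "one text name/value pair per column" and ignores Cassandra's own per-column overhead, so treat the comparison as indicative at best.

```python
import struct

# Hypothetical entity layout; field names and types are illustrative only.
# v1: user_id (uint64), created_at (uint32 epoch seconds), score (float32)
# v2: v1 plus an 'active' flag (bool)
_V1 = struct.Struct("<QIf")
_V2 = struct.Struct("<QIf?")


def encode(entity):
    """Pack an entity dict into a version-tagged blob (always writes v2)."""
    body = _V2.pack(entity["user_id"], entity["created_at"],
                    entity["score"], entity["active"])
    return bytes([2]) + body


def decode(blob):
    """Unpack a blob written by any known version."""
    version, body = blob[0], blob[1:]
    if version == 1:
        user_id, created_at, score = _V1.unpack(body)
        active = True  # assumed default for rows written before v2
    elif version == 2:
        user_id, created_at, score, active = _V2.unpack(body)
    else:
        raise ValueError("unknown blob version: %d" % version)
    return {"user_id": user_id, "created_at": created_at,
            "score": score, "active": active}


def columns_size(entity):
    """Rough size of the same data as one text name/value pair per column."""
    return sum(len(name) + len(str(value)) for name, value in entity.items())
```

Even in this tiny case, `decode` already needs one branch per format version that was ever written, which is where most of the extra complexity of the blob approach comes from.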
In either case, the data will ultimately be read sequentially from disk per sstable (the normal bottleneck), so the only benefits you gain are:

- potentially disk space (if the serialization is efficient) and network bandwidth
- Cassandra won’t have to deserialize as many columns, but I’m fairly certain this is utterly irrelevant
- if stored in a format that you can deserialize efficiently (like protobufs), it can make a big difference on your app side

Keep in mind, though, that if you serialize data you will always have to maintain code that can read old versions. It can become very complex and lead to weird bugs.

---
Chris Lohfink

On Apr 21, 2014, at 3:53 AM, Jagan Ranganathan <ja...@zohocorp.com> wrote:

> Dear All,
>
> We have a requirement to store 'N' columns of an entity in a CF. Mostly this
> is write once and read many times. What is the best way to store the data?
> Composite CF
> Simple CF with value as protobuf extracted data
> Both provides extendable columns which is a requirement for our usage.
>
> But I want to know which one is efficient, assuming there is bound to be say
> 5% of updates?
>
> Regards,
> Jagan