Generally I've seen a composite CF recommended, since it gives you more 
flexibility and is easier to debug.  You can get some performance improvement 
by storing a serialized blob to represent your entity (a lot of data can be 
represented much more compactly this way, by a factor of 10 or more if you 
are clever), but the complexity is rarely worth it.  It is likely a premature 
optimization, though I have seen cases where it showed a good improvement.

In either case, the data will ultimately be read sequentially from disk per 
sstable (the normal bottleneck), so the only benefits you gain are:
- potentially disk space (if the serialization is efficient) and network bandwidth
- Cassandra won’t have to deserialize as many columns, but I’m fairly certain 
this is utterly irrelevant
- if stored in a format that you can deserialize efficiently (like 
protobufs), it can make a big difference on your app side
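
To make the size argument concrete, here is a rough Python sketch. It is not 
Cassandra code: the entity fields, the binary layout, and the helper names 
(pack_entity, unpack_entity, per_column_size) are all made up for 
illustration. It compares one fixed-layout binary blob against the same 
values stored as text, one per column, and it ignores the per-column overhead 
(timestamps and so on) that a real column store adds on top.

```python
import struct

# Hypothetical entity: field names and types are assumptions for illustration.
ENTITY = {"user_id": 123456789, "age": 42, "score": 98.6, "active": True}

# Fixed binary layout: unsigned 64-bit int, unsigned 16-bit int,
# 64-bit float, bool. "<" means little-endian with no padding.
LAYOUT = struct.Struct("<QHd?")


def pack_entity(entity):
    """Serialize the entity into one compact blob (a single column value)."""
    return LAYOUT.pack(
        entity["user_id"], entity["age"], entity["score"], entity["active"]
    )


def unpack_entity(blob):
    """Inverse of pack_entity: rebuild the dict from the blob."""
    user_id, age, score, active = LAYOUT.unpack(blob)
    return {"user_id": user_id, "age": age, "score": score, "active": active}


def per_column_size(entity):
    """Rough size if each field were its own column: name plus value as text."""
    return sum(len(k.encode()) + len(str(v).encode()) for k, v in entity.items())


blob = pack_entity(ENTITY)
blob_size = len(blob)                   # bytes in the single serialized value
column_size = per_column_size(ENTITY)   # bytes of names + text values
```

The gap grows with the number of fields and the length of the column names, 
since the blob layout carries no names at all. The price is that nothing but 
your own code can tell what the bytes mean.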

Keep in mind, though, that if you serialize the data you will always have to 
maintain code that can read old versions. This can become very complex and 
lead to weird bugs.
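
As a minimal sketch of what "code that can read old versions" tends to look 
like, assume every blob is wrapped in an envelope carrying a version number, 
with one decoder per version. The field names, the version history, and the 
helpers (decode_v1, decode_v2, encode, decode) are hypothetical. JSON is used 
only to keep the example dependency-free; protobuf handles compatibility 
differently, through stable field numbers.

```python
import json

CURRENT_VERSION = 2


def decode_v1(payload):
    # Assumed history: v1 stored a single "name" field.
    # Split it so callers always see the current shape.
    first, _, last = payload["name"].partition(" ")
    return {"first_name": first, "last_name": last}


def decode_v2(payload):
    # v2 is the current shape; copy only the fields we know about.
    return {"first_name": payload["first_name"], "last_name": payload["last_name"]}


# Every version ever written needs an entry here for as long as old rows exist.
DECODERS = {1: decode_v1, 2: decode_v2}


def encode(entity):
    """Always write the newest version."""
    return json.dumps({"v": CURRENT_VERSION, "data": entity}).encode("utf-8")


def decode(blob):
    """Read any known version and return the current shape."""
    envelope = json.loads(blob.decode("utf-8"))
    version = envelope["v"]
    if version not in DECODERS:
        raise ValueError(f"unsupported blob version: {version}")
    return DECODERS[version](envelope["data"])


# A row written long ago and a row written today decode to the same shape.
old_blob = json.dumps({"v": 1, "data": {"name": "Ada Lovelace"}}).encode("utf-8")
new_blob = encode({"first_name": "Ada", "last_name": "Lovelace"})
```

Each schema change adds another decoder that has to be kept and tested, which 
is where the complexity comes from. With a composite CF, adding a column needs 
none of this.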

---
Chris Lohfink

On Apr 21, 2014, at 3:53 AM, Jagan Ranganathan <ja...@zohocorp.com> wrote:

> Dear All,
> 
> We have a requirement to store 'N' columns of an entity in a CF. Mostly this 
> is write once and read many times. What is the best way to store the data?
> - Composite CF
> - Simple CF with value as protobuf extracted data
> Both provides extendable columns which is a requirement for our usage. 
> 
> But I want to know which one is efficient, assuming there is bound to be say 
> 5% of updates?
> 
> Regards,
> Jagan
