I need to generate output from a 2TB dataset, exploding it into 4 column families.

The resulting dataset is likely to be 20TB or more. I'm currently using Spark,
so I sort by (rk, cf, cq) myself. The sort is huge, and I'm considering how to
optimize it.

My question is:
Should I sort and write each column family one by one, or should I put them
all together and then sort and write once?
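
To make the two options concrete, here is a minimal sketch in plain Python
(not Spark, and not my actual job). The sample cells and the function names
sort_per_family and sort_all_together are made up for illustration; each cell
is assumed to be a (rk, cf, cq, value) tuple.

```python
# Hypothetical sample cells: (row key, column family, column qualifier, value).
cells = [
    ("r2", "cf1", "a", 1),
    ("r1", "cf2", "b", 2),
    ("r1", "cf1", "b", 3),
    ("r1", "cf1", "a", 4),
    ("r2", "cf2", "a", 5),
]

def sort_per_family(cells):
    """Option 1: split by column family, then sort each family by (rk, cq)."""
    by_cf = {}
    for cell in cells:
        by_cf.setdefault(cell[1], []).append(cell)
    return {cf: sorted(group, key=lambda c: (c[0], c[2]))
            for cf, group in by_cf.items()}

def sort_all_together(cells):
    """Option 2: one global sort by (rk, cf, cq), then split output by family."""
    ordered = sorted(cells, key=lambda c: (c[0], c[1], c[2]))
    by_cf = {}
    for cell in ordered:
        by_cf.setdefault(cell[1], []).append(cell)
    return by_cf

per_family = sort_per_family(cells)
together = sort_all_together(cells)
```

On this toy input both options give the same order within each column family,
so as far as I can tell the difference is only in how much data each sort has
to handle at once: several smaller sorts versus one big one.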

Does my question make sense?

-- 
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
