I know HBase will set the TotalOrderPartitioner in MR, but in Spark, I need to sort the rows myself.
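
To make the ordering concrete, here is a minimal sketch in plain Python (not Spark, and not my actual job). It assumes each cell is a (rk, cf, cq) tuple of raw bytes and that every output file ends up holding a single column family, as HFiles do. The sample cells and the helper names `sort_together` and `sort_per_family` are made up for illustration. It compares the two strategies from the question quoted below: sort everything together and then split by family, or split by family first and sort each one.

```python
from collections import defaultdict

# Hypothetical sample cells: (row key, column family, column qualifier) as bytes.
cells = [
    (b"row2", b"cf2", b"a"),
    (b"row10", b"cf1", b"b"),
    (b"row2", b"cf1", b"a"),
    (b"row10", b"cf2", b"a"),
    (b"row10", b"cf1", b"a"),
    (b"row1", b"cf1", b"z"),
]


def sort_together(cells):
    """Sort every cell by (rk, cf, cq), then split the sorted stream by family."""
    by_family = defaultdict(list)
    # Tuples of bytes compare element by element, each element byte-wise
    # (lexicographically), which is the order HBase uses for row keys.
    for rk, cf, cq in sorted(cells):
        by_family[cf].append((rk, cq))
    return dict(by_family)


def sort_per_family(cells):
    """Split the cells by family first, then sort each family by (rk, cq)."""
    by_family = defaultdict(list)
    for rk, cf, cq in cells:
        by_family[cf].append((rk, cq))
    return {cf: sorted(kvs) for cf, kvs in by_family.items()}


together = sort_together(cells)
per_family = sort_per_family(cells)
```

In this toy case both strategies give the same per-family order, so as far as I can tell the choice is about cost (one big sort versus several smaller ones) rather than correctness. Note also that byte-wise ordering puts `row10` before `row2`.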

Jianshi

On Sat, Aug 2, 2014 at 12:24 AM, Arun Allamsetty <arun.allamse...@gmail.com> wrote:

> Hi Jianshi,
>
> Do you mean that you want to sort the row keys? If yes, then you don't have
> to worry about it because HBase sorts the row keys on its own, but
> lexicographically.
>
> Cheers,
> Arun
>
> Sent from a mobile device. Please don't mind the typos.
> On Jul 30, 2014 9:02 PM, "Jianshi Huang" <jianshi.hu...@gmail.com> wrote:
>
> > I need to generate from a 2TB dataset and explode it into 4 Column
> > Families.
> >
> > The resulting dataset is likely to be 20TB or more. I'm currently using
> > Spark, so I sorted the (rk, cf, cq) myself. It's huge and I'm considering
> > how to optimize it.
> >
> > My question is:
> > Should I sort and write each column family one by one, or should I put
> > them all together, then sort and write?
> >
> > Does my question make sense?
> >
> > --
> > Jianshi Huang
> >
> > LinkedIn: jianshi
> > Twitter: @jshuang
> > Github & Blog: http://huangjs.github.com/

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/