Oh, btw, does the latest HDP 2.1 (0.98.0.2.1.7.0-784-hadoop2) have this fix?

Jianshi

On Fri, Nov 14, 2014 at 9:37 AM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:

> Thanks Ted.
>
> I think the fix you mentioned is this one HBASE-12078
> <https://issues.apache.org/jira/browse/HBASE-12078>.
>
> Not sure when our Hadoop admin would upgrade it, ahhh....
>
> Jianshi
>
> On Thu, Nov 13, 2014 at 11:15 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> Keep in mind that Prefix Tree encoding has higher overhead in write path
>> compared to other data block encoding methods.
>>
>> Please use 0.98.7 which has the latest fixes for Prefix Tree encoding.
>>
>> Cheers
>>
>> On Thu, Nov 13, 2014 at 1:27 AM, Jianshi Huang <jianshi.hu...@gmail.com>
>> wrote:
>>
>> > Thanks Ram,
>> >
>> > How about Prefix Tree based encoding then? HBASE-4676
>> > <https://issues.apache.org/jira/browse/HBASE-4676> says it's also
>> > possible to do suffix tries? Then it could be a nice fit for JSON
>> > String (or any long value where changes are small).
>> >
>> > Maybe I should just flatten JSON to columns, hmm...what's the overhead
>> > for a column?
>> >
>> > Jianshi
>> >
>> > On Thu, Nov 13, 2014 at 4:49 PM, ramkrishna vasudevan <
>> > ramkrishna.s.vasude...@gmail.com> wrote:
>> >
>> > > >>So is it possible to specify FASTDIFF for rowkey/column and DIFF for
>> > > >>value cell?
>> > > No that is not possible now. All the encoding is per KV only.
>> > > But what you say is definitely worth trying.
>> > >
>> > > >>So would you recommend storing JSON flattened as many columns?
>> > > May be yes. But I have practically not used JSON formats so I may not
>> > > be the best person to comment on this.
>> > >
>> > > Regards
>> > > Ram
>> > >
>> > > On Thu, Nov 13, 2014 at 2:01 PM, Jianshi Huang <jianshi.hu...@gmail.com>
>> > > wrote:
>> > >
>> > > > Thanks Ram,
>> > > >
>> > > > So is it possible to specify FASTDIFF for rowkey/column and DIFF for
>> > > > value cell?
>> > > >
>> > > > So would you recommend storing JSON flattened as many columns?
>> > > >
>> > > > Jianshi
>> > > >
>> > > > On Thu, Nov 13, 2014 at 2:08 PM, ramkrishna vasudevan <
>> > > > ramkrishna.s.vasude...@gmail.com> wrote:
>> > > >
>> > > > > Hi
>> > > > >
>> > > > > >> Since I'm storing historical data (snapshot data) and changes
>> > > > > >> between adjacent value cells are relatively small.
>> > > > >
>> > > > > If the values are changing even if it is smaller the FASTDIFF will
>> > > > > rewrite the value part. Only if there are exact matches then it
>> > > > > would skip the value part. JFYI.
>> > > > >
>> > > > > Regards
>> > > > > Ram
>> > > > >
>> > > > > On Thu, Nov 13, 2014 at 11:23 AM, Jianshi Huang <
>> > > > > jianshi.hu...@gmail.com> wrote:
>> > > > >
>> > > > > > I thought FASTDIFF was only for rowkey and columns, great if it
>> > > > > > also works in value cell.
>> > > > > >
>> > > > > > And thanks for the bjson link!
>> > > > > >
>> > > > > > Jianshi
>> > > > > >
>> > > > > > On Thu, Nov 13, 2014 at 1:18 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>> > > > > >
>> > > > > > > There is FASTDIFF data block encoding.
>> > > > > > >
>> > > > > > > See also http://bjson.org/
>> > > > > > >
>> > > > > > > Cheers
>> > > > > > >
>> > > > > > > On Nov 12, 2014, at 9:08 PM, Jianshi Huang <
>> > > > > > > jianshi.hu...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > Hi,
>> > > > > > > >
>> > > > > > > > I'm currently saving JSON in pure String format in the value
>> > > > > > > > cell and depends on HBase' block compression to reduce the
>> > > > > > > > overhead of JSON.
>> > > > > > > >
>> > > > > > > > I'm wondering if there's a more space efficient way to store
>> > > > > > > > JSON? (there're lots of 0s and 1s, JSON String actually is an
>> > > > > > > > OK format)
>> > > > > > > >
>> > > > > > > > I want to keep the value as a Map since the schema of source
>> > > > > > > > data might change over time.
>> > > > > > > >
>> > > > > > > > Also is there a DIFF based encoding for values? Since I'm
>> > > > > > > > storing historical data (snapshot data) and changes between
>> > > > > > > > adjacent value cells are relatively small.
>> > > > > > > >
>> > > > > > > > Thanks,
>> > > > > > > > --
>> > > > > > > > Jianshi Huang
>> > > > > > > >
>> > > > > > > > LinkedIn: jianshi
>> > > > > > > > Twitter: @jshuang
>> > > > > > > > Github & Blog: http://huangjs.github.com/

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
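
For anyone following the "flatten JSON to columns" idea from the thread, here is a minimal sketch of what that could look like on the client side. It does not use any HBase API. The helper names (`flatten`, `changed_cells`) and the dotted qualifier convention are assumptions made up for illustration, and whether this actually saves space is not something the thread settles: each cell carries its own row key, family, qualifier and timestamp, so the gain depends on how well the chosen data block encoding and compression handle that repetition.

```python
import json


def flatten(value, prefix=""):
    """Flatten nested JSON objects into {qualifier: JSON-encoded leaf} pairs.

    Nested keys are joined with "." (an assumed convention; it is ambiguous
    if a key itself contains a dot). Lists and scalars are kept as leaves.
    """
    if isinstance(value, dict) and value:
        cells = {}
        for key, child in value.items():
            qualifier = f"{prefix}.{key}" if prefix else key
            cells.update(flatten(child, qualifier))
        return cells
    # Leaf (scalar, list, or empty object): store it JSON-encoded.
    return {prefix: json.dumps(value)}


def changed_cells(previous, current):
    """Return only the cells of `current` whose value differs from `previous`.

    Qualifiers that disappeared in `current` are not reported here.
    """
    return {q: v for q, v in current.items() if previous.get(q) != v}


# Two hypothetical adjacent snapshots that differ in a single field.
old = flatten(json.loads('{"id": 7, "flags": {"a": 0, "b": 1}, "tags": ["x"]}'))
new = flatten(json.loads('{"id": 7, "flags": {"a": 0, "b": 0}, "tags": ["x"]}'))

print(old)
print(changed_cells(old, new))
```

With one column per leaf, a snapshot that changes a single field only needs to write that one cell (here `flags.b`) instead of the whole JSON string, which is the kind of value-level diffing that FASTDIFF does not provide according to Ram's reply above.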