Re: Storing JSON in HBase value cell, which serialization format is most compact?

2014-11-13 Thread Jianshi Huang
Thanks Ram, So is it possible to specify FASTDIFF for rowkey/column and DIFF for value cell? So would you recommend storing JSON flattened as many columns? Jianshi On Thu, Nov 13, 2014 at 2:08 PM, ramkrishna vasudevan < ramkrishna.s.vasude...@gmail.com> wrote: > Hi > > >> Since I'm storing > h

Re: Trying to connect HBase Java Client I get: "Failed to locate the winutils binary in the hadoop binary path"

2014-11-13 Thread Dima Spivak
Nestor, No, you don't need a full distribution of Hadoop installed on your client machine as long as you have the necessary dependencies on the classpath when you run the client. -Dima On Wed, Nov 12, 2014 at 4:47 PM, Néstor Boscán wrote: > Yes I already applied that. > > I just wanted to unde

Re: Storing JSON in HBase value cell, which serialization format is most compact?

2014-11-13 Thread ramkrishna vasudevan
>>So is it possible to specify FASTDIFF for rowkey/column and DIFF for value cell? No, that is not possible now. All the encoding is per KV only. But what you say is definitely worth trying. >>So would you recommend storing JSON flattened as many columns? Maybe yes. But I have practically not use
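
For readers weighing the "flattened as many columns" option, here is a minimal sketch of what the flattening step could look like. It assumes the JSON has already been parsed into nested java.util.Map / java.util.List values by whatever parser you use. The class name JsonColumnFlattener, the dotted qualifier scheme and the sample document are all hypothetical and not from this thread, and nothing here calls the HBase API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: flattens an already-parsed JSON document into qualifier -> value pairs. */
final class JsonColumnFlattener {

    /** Returns one entry per leaf; nested keys are joined with '.', list items use their index. */
    static Map<String, String> flatten(Map<String, ?> document) {
        Map<String, String> columns = new LinkedHashMap<>();
        walk("", document, columns);
        return columns;
    }

    private static void walk(String path, Object node, Map<String, String> out) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                walk(join(path, String.valueOf(e.getKey())), e.getValue(), out);
            }
        } else if (node instanceof List) {
            List<?> items = (List<?>) node;
            for (int i = 0; i < items.size(); i++) {
                walk(join(path, Integer.toString(i)), items.get(i), out);
            }
        } else {
            // Leaf: a string, number, boolean, or null becomes one column value.
            out.put(path, String.valueOf(node));
        }
    }

    private static String join(String prefix, String key) {
        return prefix.isEmpty() ? key : prefix + "." + key;
    }

    public static void main(String[] args) {
        // Made-up sample document, for illustration only.
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Shanghai");
        address.put("zip", "200000");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "example");
        doc.put("address", address);
        doc.put("tags", Arrays.asList("a", "b"));
        flatten(doc).forEach((qualifier, value) -> System.out.println(qualifier + " = " + value));
    }
}
```

Each resulting pair would become one column in the row. Whether that ends up smaller on disk than a single serialized cell depends on the encoding and compression in use, which is what the rest of this thread is about.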

Re: Storing JSON in HBase value cell, which serialization format is most compact?

2014-11-13 Thread Jianshi Huang
Thanks Ram, How about Prefix Tree based encoding then? HBASE-4676 says it's also possible to do suffix tries? Then it could be a nice fit for JSON String (or any long value where changes are small). Maybe I should just flatten JSON to columns, hm
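
To make the intuition behind prefix-style encodings concrete, the sketch below counts how many leading bytes each entry shares with the one before it in a sorted run. This is only an illustration of the general prefix-sharing idea; it is not HBase's PREFIX_TREE, DIFF or FAST_DIFF implementation, and the class name and sample keys are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical estimator: how many bytes repeat the previous entry's prefix. */
final class PrefixSharingEstimate {

    /** Length of the common leading byte run of a and b. */
    static int commonPrefixLength(byte[] a, byte[] b) {
        int limit = Math.min(a.length, b.length);
        int i = 0;
        while (i < limit && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    /** Sums, over the entries in the given order, the bytes shared with the preceding entry. */
    static long sharedPrefixBytes(List<String> sortedEntries) {
        long shared = 0;
        byte[] previous = new byte[0];
        for (String entry : sortedEntries) {
            byte[] current = entry.getBytes(StandardCharsets.UTF_8);
            shared += commonPrefixLength(previous, current);
            previous = current;
        }
        return shared;
    }

    public static void main(String[] args) {
        // Made-up keys, already in sorted order.
        List<String> keys = Arrays.asList("user:1001:name", "user:1001:zip", "user:1002:name");
        System.out.println("bytes shared with previous entry: " + sharedPrefixBytes(keys));
    }
}
```

The more bytes consecutive entries share, the more an encoder that stores only the differing suffix can save. Long values that differ mostly near the end are the favourable case described above.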

Re: Trying to connect HBase Java Client I get: "Failed to locate the winutils binary in the hadoop binary path"

2014-11-13 Thread Néstor Boscán
Hi Dima I've added the dependencies to hbase-client using Maven. So, in theory, all dependencies should be in my classpath. But I still get the error. Setting HADOOP_HOME to point to a Hadoop distribution fixes the problem, but this means that in my web application layer I will have to install the had

Re: Trying to connect HBase Java Client I get: "Failed to locate the winutils binary in the hadoop binary path"

2014-11-13 Thread Dima Spivak
Nestor, You mention building your client with Maven, but how are you running it (i.e. are you sure the dependencies are actually made available to the client after it's built)? -Dima On Thu, Nov 13, 2014 at 3:37 AM, Néstor Boscán wrote: > Hi Dima > > I've added the dependencies to hbase-client

Re: Storing JSON in HBase value cell, which serialization format is most compact?

2014-11-13 Thread ramkrishna vasudevan
Yes, I forgot to mention that. PrefixTree would best suit you. On Thu, Nov 13, 2014 at 2:57 PM, Jianshi Huang wrote: > Thanks Ram, > > How about Prefix Tree based encoding then? HBASE-4676 > says it's also possible > to do suffix tries? Then it

How to configure HTTPS for HBase servers web browser?

2014-11-13 Thread Rohith Sharma K S
Hi I am setting up a Hadoop + HBase cluster in HTTPS mode for web access. I was able to set this up for the Hadoop cluster, but for HBase I could not find documentation or configuration for running the cluster in HTTPS mode. I read some documents on setting up the HBase REST API with SSL and WebHBase

Re: Trying to connect HBase Java Client I get: "Failed to locate the winutils binary in the hadoop binary path"

2014-11-13 Thread Néstor Boscán
Hi Dima Thanks for your quick answers. I'm running it directly in the Java IDE. And the project has all the dependencies from Maven. Regards, Néstor On Thu, Nov 13, 2014 at 7:12 AM, Dima Spivak wrote: > Nestor, > > You mention building your client with Maven, but how are you running it > (i.

Re: Storing JSON in HBase value cell, which serialization format is most compact?

2014-11-13 Thread Ted Yu
Keep in mind that Prefix Tree encoding has higher overhead in write path compared to other data block encoding methods. Please use 0.98.7 which has the latest fixes for Prefix Tree encoding. Cheers On Thu, Nov 13, 2014 at 1:27 AM, Jianshi Huang wrote: > Thanks Ram, > > How about Prefix Tree ba

Correlation between GETS, PUTS and system metrics to identify performance degradation

2014-11-13 Thread rohit kumar
Hi, I have a question regarding HBase metrics. RAM consumption on an HBase node can sometimes be very high. Could this be due to the avg_time taken by PUT/GET requests being very long? For example: can the "compaction queue" metrics be a factor in such time-consuming operations?

Re: Trying to connect HBase Java Client I get: "Failed to locate the winutils binary in the hadoop binary path"

2014-11-13 Thread Dima Spivak
Have you tried using the Maven exec plugin outside the IDE? If all the right dependencies are available, it simply shouldn't complain about dependencies. :) -Dima On Thursday, November 13, 2014, Néstor Boscán wrote: > Hi Dima > > Thanks for your quick answers. > > I'm running it directly in the

Re: Correlation between GETS, PUTS and system metrics to identify performance degradation

2014-11-13 Thread Ted Yu
Rohit: Below are some metrics you should monitor: compactionQueueLength flushQueueLength numCallsInGeneralQueue numCallsInPriorityQueue QueueCallTime_mean QueueCallTime_75th_percentile QueueCallTime_99th_percentile Cheers On Thu, Nov 13, 2014 at 5:05 AM, rohit kumar wrote: > Hi, > > I

Re: Trying to connect HBase Java Client I get: "Failed to locate the winutils binary in the hadoop binary path"

2014-11-13 Thread Néstor Boscán
I don't think winutils.exe is something that is in the dependency tree. And it's looking for that program using the HADOOP_HOME environment variable. So it looks like it's searching for a Hadoop installation. Regards, Nestor On Thursday, November 13, 2014, Dima Spivak wrote: > Have you tried using
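
A small stand-alone check can confirm what the client will find before it starts. The sketch below looks for bin/winutils.exe under a Hadoop home directory taken from HADOOP_HOME, as described in this thread. It also consults the hadoop.home.dir JVM system property first, which is an assumption on my part and not something mentioned in the thread. The class and method names are hypothetical, and none of this is Hadoop's own code.

```java
import java.io.File;

/** Hypothetical pre-flight check for the winutils.exe lookup discussed in this thread. */
final class WinutilsCheck {

    /** Prefers the explicit JVM setting, then the environment variable; returns null if neither is set. */
    static String resolveHadoopHome(String systemProperty, String environmentVariable) {
        if (systemProperty != null && !systemProperty.isEmpty()) {
            return systemProperty;
        }
        if (environmentVariable != null && !environmentVariable.isEmpty()) {
            return environmentVariable;
        }
        return null;
    }

    /** Builds the expected location: <hadoopHome>/bin/winutils.exe. */
    static File winutilsPath(String hadoopHome) {
        return new File(new File(hadoopHome, "bin"), "winutils.exe");
    }

    public static void main(String[] args) {
        // "hadoop.home.dir" is assumed here; HADOOP_HOME is the variable named in the thread.
        String home = resolveHadoopHome(System.getProperty("hadoop.home.dir"), System.getenv("HADOOP_HOME"));
        if (home == null) {
            System.out.println("Neither hadoop.home.dir nor HADOOP_HOME is set");
            return;
        }
        File winutils = winutilsPath(home);
        System.out.println(winutils + (winutils.isFile() ? " found" : " missing"));
    }
}
```

If that assumption holds, a directory containing only bin\winutils.exe, referenced through the system property, may be enough for a client-only machine, rather than a full Hadoop distribution.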

Logging for HBase tests

2014-11-13 Thread Stephen Boesch
How can logging be enabled/viewed when launching the hbase tests via command line maven? Given the following mvn command, I am able to set breakpoints within an IDE (intellij): mvn -Dmaven.surefire.debug="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=y,address=5005 -Xnoagent -Djava.compi

Re: Logging for HBase tests

2014-11-13 Thread Ted Yu
TestTableSnapshotInputFormat is a unit test which you can run from your IDE directly. BTW there was a typo in the command line below w.r.t. the test name. Cheers On Thu, Nov 13, 2014 at 4:57 PM, Stephen Boesch wrote: > How can logging be enabled/viewed when launching the hbase tests via > comm

Re: Logging for HBase tests

2014-11-13 Thread Stephen Boesch
Hi Ted, as mentioned in that SOF post (a) breakpoints are not respected when launched inside IJ (but they ARE respected when launching mvn command line) and (b) logging is not working in both IJ and command line 2014-11-13 17:13 GMT-08:00 Ted Yu : > TestTableSnapshotInputFormat is a unit test

Re: Storing JSON in HBase value cell, which serialization format is most compact?

2014-11-13 Thread Jianshi Huang
Thanks Ted. I think the fix you mentioned is this one: HBASE-12078. Not sure when our Hadoop admin would upgrade it, ahhh Jianshi On Thu, Nov 13, 2014 at 11:15 PM, Ted Yu wrote: > Keep in mind that Prefix Tree encoding has higher overhead

Re: Storing JSON in HBase value cell, which serialization format is most compact?

2014-11-13 Thread Jianshi Huang
Oh, btw, does the latest HDP 2.1 (0.98.0.2.1.7.0-784-hadoop2) have this fix? Jianshi On Fri, Nov 14, 2014 at 9:37 AM, Jianshi Huang wrote: > Thanks Ted. > > I think the fix you mentioned is this one HBASE-12078 > . > > Not sure when our Hadoop admin w

Re: Storing JSON in HBase value cell, which serialization format is most compact?

2014-11-13 Thread Ted Yu
No. The upcoming HDP 2.2 does have that fix. Cheers On Thu, Nov 13, 2014 at 5:38 PM, Jianshi Huang wrote: > Oh, btw, is latest HDP 2.1(0.98.0.2.1.7.0-784-hadoop2) have this fix? > > Jianshi > > On Fri, Nov 14, 2014 at 9:37 AM, Jianshi Huang > wrote: > > > Thanks Ted. > > > > I think the fix yo

Avoid GC Pauses on Scan MapReduces

2014-11-13 Thread Pere Kyle
Hi there, Recently I have been experiencing instability when scanning our HBASE cluster. The table we are trying to scan is 1.5B records 1TB, we have 12GB heap and 17 servers. Our GC options are as so: -XX:OnOutOfMemoryError=kill -9 %p -Xmx12000m -XX:+UseConcMarkSweepGC -Xmx12g -Xmx12g The err

Re: Storing JSON in HBase value cell, which serialization format is most compact?

2014-11-13 Thread Jianshi Huang
But HDP 2.2 uses HDFS 2.6.0... very hard to convince our admins to upgrade. Would you recommend us to upgrade to 2.6.0? I'll ask them to consult HWX if you say yes. :) Jianshi On Fri, Nov 14, 2014 at 9:42 AM, Ted Yu wrote: > No. > The upcoming HDP 2.2 does have that fix. > > Cheers > > On Thu,

RE: How to configure HTTPS for HBase servers web browser?

2014-11-13 Thread Kiran Kumar.M.R
Hi Rohith, 1. To enable HTTPS mode for the web UI, you need to configure "hadoop.ssl.enabled" in 0.98.x, or "hbase.ssl.enabled" in the trunk version. 2. The web UI ports configured in "hbase.regionserver.info.port" will now work over HTTPS. Ensure that you use https:// in the URL. If you give http:// it

Re: Storing JSON in HBase value cell, which serialization format is most compact?

2014-11-13 Thread Ted Yu
You can use HBase from HDP 2.2 on HDFS 2.5. If you have further questions, let's take it offline. Cheers On Thu, Nov 13, 2014 at 6:12 PM, Jianshi Huang wrote: > But HDP 2.2 uses HDFS 2.6.0... very hard to convince our admins to upgrade. > > Would you recommend us to upgrade to 2.6.0? I'll ask th

RE: Avoid GC Pauses on Scan MapReduces

2014-11-13 Thread Dhaval Shah
You can call scan.setCacheBlocks(false) to disable block caching on MapReduce scans. Also use the parallel collector for the new generation. That will help reduce stop-the-world pauses with CMS. Original message From: Pere Kyle Date: 11/13/2014 8:54

Re: Storing JSON in HBase value cell, which serialization format is most compact?

2014-11-13 Thread Ted Yu
w.r.t. the effect of data block encoding on HFile size, take a look at Doug Meil's blog 'The Effect of ColumnFamily, RowKey and KeyValue Design on HFile Size': http://blogs.apache.org/hbase/ Cheers On Thu, Nov 13, 2014 at 1:27 AM, Jianshi Huang wrote: > Thanks Ram, > > How about Prefix Tree bas

Re: Storing JSON in HBase value cell, which serialization format is most compact?

2014-11-13 Thread Jianshi Huang
Oh, that article, I've read that before. I'm using the approach of a single KV that holds all my columns (mostly read-only). So, conclusion: the saving in disk space is not that huge. One HBase column per column: 1,350,483 1000 SNAPPY DIFF vs one HBase column for all columns: 1,119,330 1000

Re: Version in HBase

2014-11-13 Thread Krishna Kalyan
Thanks Anoop. This worked On Wed, Nov 12, 2014 at 4:50 PM, Anoop John wrote: > So you want one version with ts<= give ts? > > Have a look at Scan#setTimeRange(long minStamp, long maxStamp) > If you know the exact ts for cells, you can use Scan#setTimeStamp(long > timestamp) > > -Anoop- > > On We
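
The selection rule being asked about ("one version with ts <= given ts") can be modelled in plain Java as below. This is a sketch of the semantics only and does not call HBase. The class name, method name and sample timestamps are hypothetical. On the HBase side, my understanding is that Scan#setTimeRange treats maxStamp as exclusive, so a bound of ts + 1 would be needed to include ts itself. Please verify that against the API docs for your version.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of "latest version at or before a timestamp" for one cell. */
final class VersionAsOf {

    /** Returns the value with the largest timestamp <= asOf, or null if no such version exists. */
    static String latestAtOrBefore(TreeMap<Long, String> versions, long asOf) {
        Map.Entry<Long, String> entry = versions.floorEntry(asOf);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        // Made-up versions of a single cell, keyed by timestamp.
        TreeMap<Long, String> versions = new TreeMap<>();
        versions.put(100L, "v1");
        versions.put(200L, "v2");
        versions.put(300L, "v3");
        System.out.println(latestAtOrBefore(versions, 250L));
    }
}
```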