Re:Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-17 Thread allanwin
Here is the thing. We have backported DLR (HBASE-7006) to our 0.94 clusters in our production environment (of course, a lot of bugs were fixed and it is working well). It has proven to be a huge gain. When a large cluster crashes, the MTTR improves from several hours to less than an hour. Now,

Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-17 Thread Anoop John
Agree with your observation. But we wanted the DLR feature removed, because it is known to have issues. Otherwise we would need major work to correct all these issues. -Anoop- On Tue, Oct 18, 2016 at 7:41 AM, Ted Yu wrote: > If you have a cluster, I suggest you turn on DLR and observe the effect >

Re: [ANNOUNCE] Stephen Yuan Jiang joins Apache HBase PMC

2016-10-17 Thread Anoop John
Congrats Stephen... On Tue, Oct 18, 2016 at 10:20 AM, ramkrishna vasudevan wrote: > Congrats Stephen!! > > On Tue, Oct 18, 2016 at 2:37 AM, Stack wrote: > >> Wahoo! >> >> On Fri, Oct 14, 2016 at 11:27 AM, Enis Söztutar wrote: >> >> > On behalf of the Apache HBase PMC, I am happy to announce tha

Re: Accessing Hbase tables through Spark, this seems to work

2016-10-17 Thread Mich Talebzadeh
Yes, the Hive external table is partitioned on a daily basis (datestamp below) CREATE EXTERNAL TABLE IF NOT EXISTS ${DATABASE}.externalMarketData ( KEY string , SECURITY string , TIMECREATED string , PRICE float ) COMMENT 'From prices Kafka delivered by Flume location by day' ROW FORMAT s

Re: [ANNOUNCE] Stephen Yuan Jiang joins Apache HBase PMC

2016-10-17 Thread ramkrishna vasudevan
Congrats Stephen!! On Tue, Oct 18, 2016 at 2:37 AM, Stack wrote: > Wahoo! > > On Fri, Oct 14, 2016 at 11:27 AM, Enis Söztutar wrote: > > > On behalf of the Apache HBase PMC, I am happy to announce that Stephen > has > > accepted our invitation to become a PMC member of the Apache HBase > projec

Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-17 Thread Ted Yu
If you have a cluster, I suggest you turn on DLR and observe the effect where fewer than half the region servers are up after the crash. You would have first hand experience that way. On Mon, Oct 17, 2016 at 6:33 PM, allanwin wrote: > > > > Yes, region replica is a good way to improve MTTR. Spec

Re:Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-17 Thread allanwin
Yes, region replica is a good way to improve MTTR. Especially if one or two servers are down, region replica can improve data availability. But for a big disaster, like 1/3 or 1/2 of the region servers shutting down, I think DLR is still useful to bring regions online more quickly and with less IO usage. Rega

Re: Accessing Hbase tables through Spark, this seems to work

2016-10-17 Thread ayan guha
I do not see a rationale to have HBase in this scheme of things. Maybe I am missing something? If data is delivered in HDFS, why not just add a partition to an existing Hive table? On Tue, Oct 18, 2016 at 8:23 AM, Mich Talebzadeh wrote: > Thanks Mike, > > My test csv data comes as > > UUID,

Re: Accessing Hbase tables through Spark, this seems to work

2016-10-17 Thread Mich Talebzadeh
Thanks Mike, My test csv data comes as UUID, ticker, timecreated, price a2c844ed-137f-4820-aa6e-c49739e46fa6, S01, 2016-10-17T22:02:09, 53.36665625650533484995 a912b65e-b6bc-41d4-9e10-d6a44ea1a2b0, S02, 2016-10-17T22:02:09, 86.31917515824627016510
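[Editor's sketch] The CSV layout quoted above (UUID, ticker, timecreated, price) could be read roughly as follows. This is only a minimal sketch, not the poster's code: the `Tick` type and `parse_ticks` function are hypothetical names, and the sample text simply repeats the two rows from the message. Prices are kept as `Decimal` so the long fractional part is not rounded.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal
from typing import NamedTuple


class Tick(NamedTuple):
    """One row of the test data (hypothetical type name)."""
    uuid: str
    ticker: str
    timecreated: datetime
    price: Decimal


# The two sample rows quoted in the message, preceded by the header line.
SAMPLE = """UUID, ticker, timecreated, price
a2c844ed-137f-4820-aa6e-c49739e46fa6, S01, 2016-10-17T22:02:09, 53.36665625650533484995
a912b65e-b6bc-41d4-9e10-d6a44ea1a2b0, S02, 2016-10-17T22:02:09, 86.31917515824627016510
"""


def parse_ticks(text):
    """Parse comma-separated rows, ignoring the space after each comma."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header line
    return [
        Tick(u, t, datetime.fromisoformat(ts), Decimal(p))
        for u, t, ts, p in (row for row in reader if row)
    ]
```

Calling `parse_ticks(SAMPLE)` yields one `Tick` per data row; the same function would apply to a file's contents read with `open(path).read()`.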

Re: [ANNOUNCE] Stephen Yuan Jiang joins Apache HBase PMC

2016-10-17 Thread Stack
Wahoo! On Fri, Oct 14, 2016 at 11:27 AM, Enis Söztutar wrote: > On behalf of the Apache HBase PMC, I am happy to announce that Stephen has > accepted our invitation to become a PMC member of the Apache HBase project. > > Stephen has been working on HBase for a couple of years, and is already a >

Re: [ANNOUNCE] Stephen Yuan Jiang joins Apache HBase PMC

2016-10-17 Thread Esteban Gutierrez
Congrats, Stephen! -- Cloudera, Inc. On Sun, Oct 16, 2016 at 6:16 PM, 张铎 wrote: > Congratulations! > > 2016-10-17 9:07 GMT+08:00 Heng Chen : > > > Congrats! :) > > > > 2016-10-16 8:19 GMT+08:00 Jerry He : > > > Congratulations, Stephen. > > > > > > Jerry > > > > > > On Fri, Oct 14, 2016 at 12

Re: HBase restart without region reassigning

2016-10-17 Thread Ted Yu
Since you are using 1.1.2, you may want to look at HBASE-14531 which was fixed in 1.1.3 FYI On Mon, Oct 17, 2016 at 9:13 AM, Alexander Ilyin wrote: > I'm restarting it through Ambari. First time I specified a delay between > regionserver restarts, second time I didn't. Not sure whether Ambari u

Re: HBase restart without region reassigning

2016-10-17 Thread Alexander Ilyin
I'm restarting it through Ambari. First time I specified a delay between regionserver restarts, second time I didn't. Not sure whether Ambari uses graceful restart script internally but I can try to use it directly. On Mon, Oct 17, 2016 at 6:00 PM, Jeremy Carroll wrote: > How are you restarting

Re: HBase restart without region reassigning

2016-10-17 Thread Jeremy Carroll
How are you restarting the cluster? From my experience a graceful rolling restart retains locality. For each regionserver (one at a time) run the graceful restart script to retain local blocks. The master configuration option you specified only works on a full cluster reboot (or master reboot) On
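[Editor's sketch] The one-server-at-a-time procedure described above might be scripted along these lines. It is a dry-run sketch under assumptions: the "graceful restart script" is taken to be `bin/graceful_stop.sh` from the HBase distribution, run with its `--restart` and `--reload` options; the install path and host names are placeholders, and `rolling_restart_commands` is a hypothetical helper. The code only builds and prints the commands and does not run them.

```python
import shlex


def rolling_restart_commands(hosts, hbase_home="/usr/lib/hbase"):
    """Build one graceful restart command per region server, in order.

    hbase_home is a placeholder path (assumption); adjust to the real install.
    """
    script = f"{hbase_home}/bin/graceful_stop.sh"
    # --restart brings the server back up; --reload moves its regions back.
    return [[script, "--restart", "--reload", host] for host in hosts]


# Dry run: print the commands instead of executing them.
for cmd in rolling_restart_commands(["rs1.example.com", "rs2.example.com"]):
    print(shlex.join(cmd))
```

To actually perform the restart, each command would be run to completion before the next one starts, for example with `subprocess.run(cmd, check=True)`, so that only one region server is down at any time.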

Re: HBase restart without region reassigning

2016-10-17 Thread Alexander Ilyin
Hi Dima, These are instances in the cloud and we're using Consul for name resolution. Regarding network settings, your question is a bit broad... Which settings would you recommend to check first? On Mon, Oct 17, 2016 at 5:28 PM, Dima Spivak wrote: > Hey Alexander, > > Could something be amiss

Re: HBase restart without region reassigning

2016-10-17 Thread Dima Spivak
Hey Alexander, Could something be amiss in your network settings? Seeing phantom datanodes could be tripping things up. Are these physical machines or instances in the cloud? On Monday, October 17, 2016, Alexander Ilyin wrote: > Hi, > > We have a 7-node HBase cluster (version 1.1.2) and we chan

HBase restart without region reassigning

2016-10-17 Thread Alexander Ilyin
Hi, We have a 7-node HBase cluster (version 1.1.2) and we change some of its settings from time to time, which requires a restart. The problem is that after every restart the load balancer reassigns the regions, making data locality low. To address this issue we tried the settings described he

Re: Hbase Coprocessor postPut not triggered for

2016-10-17 Thread Ted Yu
Your scenario should be covered by unit tests already. Take a look at hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/SimpleRegionObserver.java which is used by the following tests: TestRegionObserverInterface TestIncrementTimeRange If you can show the issue through (modified) un

RE: Hbase Coprocessor postPut not triggered for

2016-10-17 Thread Begar, Veena
We are using version 1.1.2.2.4.0.0-169. Yes, by additional logging, I determined that the postPut method is not triggered for the update. Here is the code snippet: List puts = new ArrayList<>(dataObjects.size()); Table table = connection.getTable(TableName.valueOf(qualifiedTableName)); for (DataOb

Re: Maximum size of HBase row

2016-10-17 Thread Jean-Marc Spaggiari
Hi Sreeram, If only 0.01% of the rows are reaching 1GB, then HBase should be able to handle that... However, there are a few things you might want to keep in mind. 1) What is the distribution of those 0.01%? Any risk of most of them ending up in the same region? 2) How is data ingested into the reg

Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-17 Thread Ted Yu
Here is the thread discussing DLR: http://search-hadoop.com/m/YGbbOxBK2n4ES12&subj=Re+DISCUSS+retiring+current+DLR+code > On Oct 17, 2016, at 4:15 AM, allanwin wrote: > > Hi, All > DLR can improve MTTR dramatically, but since it has many bugs like > HBASE-13567, HBASE-12743, HBASE-13535, HB

What way to improve MTTR other than DLR(distributed log replay)

2016-10-17 Thread allanwin
Hi, All DLR can improve MTTR dramatically, but since it has many bugs like HBASE-13567, HBASE-12743, HBASE-13535, HBASE-14729 (any more I don't know of?), it has proved unreliable and has been deprecated in almost all branches now. My question is, is there any other way other than DLR to impro

Re: Maximum size of HBase row

2016-10-17 Thread Jean-Marc Spaggiari
Hi Sreeram, HBase will not split a region within a row. So if a row gets WAY too many columns, its size can grow higher than the configured max region size. Which, of course, is not recommended because your region will serve a single row. If you think your row will become bigger than 1% or your m
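[Editor's sketch] The point above, that a region is never split inside a row, can be illustrated with a small check. This is a toy model only, not HBase code: `rows_exceeding_region` and `splittable` are hypothetical helpers, row sizes are supplied by hand, and the max region size is assumed to be whatever value is configured for the table (the property usually meant is `hbase.hregion.max.filesize`).

```python
def rows_exceeding_region(row_sizes, max_region_bytes):
    """Return the keys of rows that are, on their own, larger than the
    configured max region size (such rows cannot be helped by splitting)."""
    return sorted(key for key, size in row_sizes.items() if size > max_region_bytes)


def splittable(row_sizes):
    """Split points fall between rows, so a region holding a single row
    cannot be split however large that row becomes."""
    return len(row_sizes) >= 2
```

For example, with an illustrative 10 GiB limit, a region holding one 12 GiB row is reported as oversized by `rows_exceeding_region` and `splittable` returns `False` for it, which matches the situation described in the reply.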

Maximum size of HBase row

2016-10-17 Thread Sreeram
Hi All, Please let me know if the maximum size of an HBase row (in terms of storage space) will be equal to the configured size of a region. I understand the parameter hbase.table.max.rowsize to be the maximum bytes that can be transferred in a single get/scan operation and not related to the actu

Followup on the presentations at HBaseCon East and Strata Conference

2016-10-17 Thread Daniel Vimont
Thanks very much to all those last month (at HBaseCon East and at the Strata Conference) who gave feedback regarding *ColumnManager for HBase*[1]. By my reckoning, the greatest enthusiasm was expressed for the package's "*Column Aliasing*" function (which could save a LOT of storage space in a "nar