Re: Re: Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-31 Thread Enis Söztutar
inlined. On Thu, Oct 27, 2016 at 3:01 PM, Stack wrote: > On Fri, Oct 21, 2016 at 3:24 PM, Enis Söztutar wrote: > > > A bit late, but let me give my perspective. This can also be moved to > jira > > or dev@ I think. > > > > DLR was nice and had pretty good gains for MTTR. However, dealing with

Re: Re: Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-27 Thread Stack
On Thu, Oct 27, 2016 at 3:01 PM, Stack wrote: > On Fri, Oct 21, 2016 at 3:24 PM, Enis Söztutar wrote: > >> A bit late, but let me give my perspective. This can also be moved to jira >> or dev@ I think. >> >> DLR was nice and had pretty good gains for MTTR. However, dealing with >> the sequence

Re: Re: Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-27 Thread Stack
On Fri, Oct 21, 2016 at 3:24 PM, Enis Söztutar wrote: > A bit late, but let me give my perspective. This can also be moved to jira > or dev@ I think. > > DLR was nice and had pretty good gains for MTTR. However, dealing with > the sequence ids, onlining regions etc and the replay paths proved t

Re: Re: Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-27 Thread Phil Yang
Hi all, We are also considering how to improve MTTR these days and have a plan. I just noticed this thread; I hope I am not late :) My original thought is simple: why must we put entries into the MemStore of the new RS? We can read/write WAL entries only once during failover, from log entries to HFiles, which is

Re: Re: Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-21 Thread Enis Söztutar
A bit late, but let me give my perspective. This can also be moved to jira or dev@ I think. DLR was nice and had pretty good gains for MTTR. However, dealing with the sequence ids, onlining regions, etc., and the replay paths proved to be too difficult in practice. I think the way forward would be

Re: Re: Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-18 Thread Ted Yu
Allan: One factor to consider is that the assignment manager in HBase 2.0 would be quite different from those in the 0.98 and 1.x branches. Meaning, you may need to come up with two solutions for a single problem. FYI On Tue, Oct 18, 2016 at 6:11 PM, Allan Yang wrote: > Hi, Ted > These issues I me

Re: Re: Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-18 Thread Allan Yang
Hi, Ted These issues I mentioned above (HBASE-13567, HBASE-12743, HBASE-13535, HBASE-14729) are ALL reproduced in our HBase 1.x test environment. Fixing them is exactly what I'm going to do. I haven't found the root cause yet, but I will update if I find solutions. What I am afraid of is that there a

Re: Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-18 Thread Ted Yu
Allan: I wonder how you deal with open issues such as HBASE-13535. From your description, it seems your team fixed more DLR issues. Cheers On Mon, Oct 17, 2016 at 11:37 PM, allanwin wrote: > > > > Here is the thing. We have backported DLR(HBASE-7006) to our 0.94 > clusters in production envir

Re: Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-17 Thread allanwin
Here is the thing. We have backported DLR (HBASE-7006) to our 0.94 clusters in our production environment (of course, a lot of bugs have been fixed and it is working well). It has proven to be a huge gain. For a large cluster crash, the MTTR improved from several hours to less than an hour. Now,

Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-17 Thread Anoop John
Agree with your observation. But we wanted to get the DLR feature removed, because it is known to have issues. Otherwise we need major work to correct all these issues. -Anoop- On Tue, Oct 18, 2016 at 7:41 AM, Ted Yu wrote: > If you have a cluster, I suggest you turn on DLR and observe the effect >

Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-17 Thread Ted Yu
If you have a cluster, I suggest you turn on DLR and observe the effect where fewer than half the region servers are up after the crash. You would have first hand experience that way. On Mon, Oct 17, 2016 at 6:33 PM, allanwin wrote: > > > > Yes, region replica is a good way to improve MTTR. Spec

Re: Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-17 Thread allanwin
Yes, region replica is a good way to improve MTTR. Especially if one or two servers are down, region replica can improve data availability. But for a big disaster, like 1/3 or 1/2 of the region servers shutting down, I think DLR is still useful to bring regions online more quickly and with less IO usage. Rega

Re: What way to improve MTTR other than DLR(distributed log replay)

2016-10-17 Thread Ted Yu
Here is the thread discussing DLR: http://search-hadoop.com/m/YGbbOxBK2n4ES12&subj=Re+DISCUSS+retiring+current+DLR+code > On Oct 17, 2016, at 4:15 AM, allanwin wrote: > > Hi, All > DLR can improve MTTR dramatically, but since it has many bugs like > HBASE-13567, HBASE-12743, HBASE-13535, HB

What way to improve MTTR other than DLR(distributed log replay)

2016-10-17 Thread allanwin
Hi, All DLR can improve MTTR dramatically, but since it has many bugs like HBASE-13567, HBASE-12743, HBASE-13535, HBASE-14729 (any more I don't know of?), it has proved unreliable and has been deprecated in almost all branches now. My question is, is there any way other than DLR to impro