Re: Re: [Shadow Regions / Read Replicas ]

2013-12-12 Thread 谢良
not about the main design point (HA for reads), but focused on latency. Thanks, From: Enis Söztutar [enis@gmail.com] Sent: 2013-12-10 5:24 To: dev@hbase.apache.org Subject: Re: Re: [Shadow Regions / Read Replicas ] We are also proposing to implement

Re: [Shadow Regions / Read Replicas ]

2013-12-12 Thread Jonathan Hsieh
On Wed, Dec 4, 2013 at 3:56 PM, Stack wrote: > On Thu, Dec 5, 2013 at 7:46 AM, Enis Söztutar wrote: > > > I did not know that we were reopening the log file for tailing. From what > > Nicolas talks about in > > https://issues.apache.org/jira/browse/HDFS-3219 it seems that the > visible > > lengt

Re: [Shadow Regions / Read Replicas ] External replication disqualified?

2013-12-12 Thread Jonathan Hsieh
A little delayed but more questions. On Tue, Dec 3, 2013 at 10:41 PM, Devaraj Das wrote: > On Tue, Dec 3, 2013 at 6:49 PM, Jonathan Hsieh wrote: > > > The read replicas doc mentions something a little more intrusive in the > "3 > > options" section but doesn't seem to disqualify it. > > > > >

Re: Re: [Shadow Regions / Read Replicas ]

2013-12-09 Thread Enis Söztutar
cy tail, > but no result still. > > Enis, could you share your thoughts on it? thanks > > Thanks, > > > From: Enis Söztutar [enis@gmail.com] > Sent: 2013-12-04 6:18 > To: dev@hbase.apache.org > Subject: Re: [Shadow Regions

Re: [Shadow Regions / Read Replicas ]

2013-12-07 Thread 谢良
tar [enis@gmail.com] Sent: 2013-12-04 6:18 To: dev@hbase.apache.org Subject: Re: [Shadow Regions / Read Replicas ] On Tue, Dec 3, 2013 at 12:31 PM, Vladimir Rodionov wrote: > The downside: > > - Double/Triple memstore usage > - Increased block cache usage (effectively, block cac

Re: [Shadow Regions / Read Replicas ]

2013-12-04 Thread Stack
On Thu, Dec 5, 2013 at 7:46 AM, Enis Söztutar wrote: > I did not know that we were reopening the log file for tailing. From what > Nicolas talks about in > https://issues.apache.org/jira/browse/HDFS-3219 it seems that the visible > length is not > updated for the open stream which is a shame. How

Re: [Shadow Regions / Read Replicas ]

2013-12-04 Thread Enis Söztutar
I did not know that we were reopening the log file for tailing. From what Nicolas talks about in https://issues.apache.org/jira/browse/HDFS-3219 it seems that the visible length is not updated for the open stream which is a shame. However in the append design, the primary can send the committed len

Re: [Shadow Regions / Read Replicas ]

2013-12-04 Thread Stack
A few comments after reading through this thread: + Thanks for moving the (good) discussion here out of the issue. + Testing WAL 'tailing'* would be a good input to have. My sense is that a WALpr would make for about the same load on HDFS (and if so, let's just go there altogether). + I like the n

Re: [Shadow Regions / Read Replicas ]

2013-12-04 Thread Enis Söztutar
On Wed, Dec 4, 2013 at 12:25 PM, Jimmy Xiang wrote: > I am concerned about reading stale data. I understand some people may want > this feature. One of the reasons is region availability. If we > make sure those regions are always available, we don't have to compromise, > right? How abo

Re: [Shadow Regions / Read Replicas ]

2013-12-04 Thread Enis Söztutar
> > > Thanks for adding it there -- I really think it is a big headline caveat on > my expectation of "eventual consistency". Other systems out there > give you eventual consistency at the millisecond level for most cases, > while this initial implementation would have eventual mean 10's of

Re: [Shadow Regions / Read Replicas ]

2013-12-04 Thread Jimmy Xiang
I am concerned about reading stale data. I understand some people may want this feature. One of the reasons is region availability. If we make sure those regions are always available, we don't have to compromise, right? How about we support something like a region pipeline? For each importa

Re: [Shadow Regions / Read Replicas ] External replication disqualified?

2013-12-03 Thread Devaraj Das
On Tue, Dec 3, 2013 at 6:49 PM, Jonathan Hsieh wrote: > The read replicas doc mentions something a little more intrusive in the "3 > options" section but doesn't seem to disqualify it. > > I don't quite see what you are referring to actually... Can you please copy-paste a relevant line from the d

Re: [Shadow Regions / Read Replicas ]

2013-12-03 Thread Devaraj Das
On Tue, Dec 3, 2013 at 6:47 PM, Jonathan Hsieh wrote: > On Tue, Dec 3, 2013 at 2:04 PM, Enis Söztutar wrote: > > > On Tue, Dec 3, 2013 at 11:51 AM, Jonathan Hsieh > wrote:> > > > > > > On Tue, Dec 3, 2013 at 11:07 AM, Enis Söztutar > wrote: > > > > > > > Thanks Jon for bringing this to dev@.

Re: [Shadow Regions / Read Replicas ]

2013-12-03 Thread Jimmy Xiang
We don't have to ship the edits one by one. We can use a configurable batch to control the impact on network. On Tue, Dec 3, 2013 at 7:59 PM, Jimmy Xiang wrote: > A separate branch similar to that for snapshot is great. +1. > > For wal tailing, we can just skip those edits not for the shadow r

Re: [Shadow Regions / Read Replicas ]

2013-12-03 Thread Jimmy Xiang
A separate branch similar to that for snapshot is great. +1. For wal tailing, we can just skip those edits not for the shadow regions, right? To tail the wal, we need to wait till the wal block is available. There seems to be a hard latency. Is it better to have a pool of daemon threads to ship

Re: [Shadow Regions / Read Replicas ] External replication disqualified?

2013-12-03 Thread Jonathan Hsieh
The read replicas doc mentions something a little more intrusive in the "3 options" section but doesn't seem to disqualify it. Relatedly, just as another strawman, for the "mostly read only" and "bulk load only" use cases, why not use normal replication against two clusters in the same HDFS

Re: [Shadow Regions / Read Replicas ]

2013-12-03 Thread Jonathan Hsieh
On Tue, Dec 3, 2013 at 2:04 PM, Enis Söztutar wrote: > On Tue, Dec 3, 2013 at 11:51 AM, Jonathan Hsieh wrote:> > > > > On Tue, Dec 3, 2013 at 11:07 AM, Enis Söztutar wrote: > > > > > Thanks Jon for bringing this to dev@. > > > > > > > > > On Mon, Dec 2, 2013 at 10:01 PM, Jonathan Hsieh > > wr

Re: [Shadow Regions / Read Replicas ]

2013-12-03 Thread Jonathan Hsieh
On Tue, Dec 3, 2013 at 2:48 PM, Vladimir Rodionov wrote: > >MTTR and this work are orthogonal. In a distributed system, you cannot > >differentiate between > >a process not responding because it is down or it is busy or network is > >down, or whatnot. Having > >a couple of seconds detection time is

Re: [Shadow Regions / Read Replicas ] Block Affinity

2013-12-03 Thread Jonathan Hsieh
On Tue, Dec 3, 2013 at 3:46 PM, Nick Dimiduk wrote: > On Tue, Dec 3, 2013 at 11:37 AM, Enis Söztutar wrote: > > > I think we do not want to differentiate between RS's by splitting them > between > > primaries and shadows. This will complicate provisioning, administration, > > monitoring and load

Re: [Shadow Regions / Read Replicas ] Wal per region?

2013-12-03 Thread Jonathan Hsieh
On Tue, Dec 3, 2013 at 3:07 PM, Enis Söztutar wrote: > On Tue, Dec 3, 2013 at 2:03 PM, Jonathan Hsieh wrote: > > > On Tue, Dec 3, 2013 at 11:42 AM, Enis Söztutar > wrote: > > > > > On Mon, Dec 2, 2013 at 10:20 PM, Jonathan Hsieh > > wrote: > > > > > > > > Devaraj: > > > > > Jonathan Hsieh, WAL

Re: [Shadow Regions / Read Replicas ] Block Affinity

2013-12-03 Thread Jonathan Hsieh
On Tue, Dec 3, 2013 at 11:37 AM, Enis Söztutar wrote: > Responses inlined. > > On Mon, Dec 2, 2013 at 10:00 PM, Jonathan Hsieh wrote: > > > For the most efficient consistent read-recovery (shadow > regions/memstores), > > it would make sense to have them assigned to the rs's where the Hlogs are

Re: [Shadow Regions / Read Replicas ] Block Affinity

2013-12-03 Thread Nick Dimiduk
On Tue, Dec 3, 2013 at 11:37 AM, Enis Söztutar wrote: > I think we do not want to differentiate between RS's by splitting them between > primaries and shadows. This will complicate provisioning, administration, > monitoring and load balancing a lot, and will not achieve very cheap > secondary reg

Re: [Shadow Regions / Read Replicas ] Wal per region?

2013-12-03 Thread Enis Söztutar
On Tue, Dec 3, 2013 at 2:03 PM, Jonathan Hsieh wrote: > On Tue, Dec 3, 2013 at 11:42 AM, Enis Söztutar wrote: > > > On Mon, Dec 2, 2013 at 10:20 PM, Jonathan Hsieh > wrote: > > > > > > Devaraj: > > > > Jonathan Hsieh, WAL per region (WALpr) would give you the locality > (and > > > hence HDFS sh

Re: [Shadow Regions / Read Replicas ]

2013-12-03 Thread Vladimir Rodionov
>MTTR and this work are orthogonal. In a distributed system, you cannot >differentiate between >a process not responding because it is down or it is busy or network is >down, or whatnot. Having >a couple of seconds detection time is unrealistic. You will end up in a >very unstable state where >you wi

Re: [Shadow Regions / Read Replicas ]

2013-12-03 Thread Enis Söztutar
On Tue, Dec 3, 2013 at 12:31 PM, Vladimir Rodionov wrote: > The downside: > > - Double/Triple memstore usage > - Increased block cache usage (effectively, block cache will have 50% > capacity, maybe less) These are covered in the tradeoff section of the design doc. > > These downsides are pret

Re: [Shadow Regions / Read Replicas ]

2013-12-03 Thread Devaraj Das
On Tue, Dec 3, 2013 at 12:31 PM, Vladimir Rodionov wrote: > The downside: > > - Double/Triple memstore usage > - Increased block cache usage (effectively, block cache will have 50% > capacity, maybe less) > > These downsides are pretty serious ones. This will result: > > 1. in decreased overall pe

Re: [Shadow Regions / Read Replicas ]

2013-12-03 Thread Enis Söztutar
On Tue, Dec 3, 2013 at 11:51 AM, Jonathan Hsieh wrote: > To keep the discussion focused on the design goals, I'm going to start > referring to Enis and Devaraj's eventually consistent read replicas as the > *read replica* design, and consistent fast read recovery mechanism based on > shadowing/taili

Re: [Shadow Regions / Read Replicas ] Wal per region?

2013-12-03 Thread Jonathan Hsieh
On Tue, Dec 3, 2013 at 11:42 AM, Enis Söztutar wrote: > On Mon, Dec 2, 2013 at 10:20 PM, Jonathan Hsieh wrote: > > > > Devaraj: > > > Jonathan Hsieh, WAL per region (WALpr) would give you the locality (and > > hence HDFS short > > > circuit) of reads if you were to couple it with the favored nod

Re: [Shadow Regions / Read Replicas ] Wal per region?

2013-12-03 Thread Jonathan Hsieh
On Tue, Dec 3, 2013 at 11:21 AM, Devaraj Das wrote: > On Mon, Dec 2, 2013 at 10:20 PM, Jonathan Hsieh wrote: > > > > With this in mind, I am actually making the case that we would group > all > > the regions from RS-A onto the same set of preferred region servers. > This > > way we only nee

Re: [Shadow Regions / Read Replicas ]

2013-12-03 Thread Vladimir Rodionov
The downsides: - Double/triple memstore usage - Increased block cache usage (effectively, the block cache will have 50% capacity, maybe less) These downsides are pretty serious. This will result: 1. in decreased overall performance due to a smaller effective block cache size 2. in more frequent

Re: [Shadow Regions / Read Replicas ]

2013-12-03 Thread Jonathan Hsieh
To keep the discussion focused on the design goals, I'm going to start referring to Enis and Devaraj's eventually consistent read replicas as the *read replica* design, and the consistent fast read recovery mechanism based on shadowing/tailing the WALs as *shadow regions* or *shadow memstores*. Can we ag

Re: [Shadow Regions / Read Replicas ] Wal per region?

2013-12-03 Thread Enis Söztutar
On Mon, Dec 2, 2013 at 10:20 PM, Jonathan Hsieh wrote: > > Devaraj: > > Jonathan Hsieh, WAL per region (WALpr) would give you the locality (and > hence HDFS short > > circuit) of reads if you were to couple it with the favored nodes. The > cost is of course more WAL > > files... In the current si

Re: [Shadow Regions / Read Replicas ] Block Affinity

2013-12-03 Thread Enis Söztutar
Responses inlined. On Mon, Dec 2, 2013 at 10:00 PM, Jonathan Hsieh wrote: > > Enis: > > I was trying to refer to not having co-location constraints for secondary > replicas whose primaries are hosted by the same > > RS. For example, if R1(replica=0), and R2(replica=0) are hosted on RS1, > R1(rep

Re: [Shadow Regions / Read Replicas ]

2013-12-03 Thread Devaraj Das
On Tue, Dec 3, 2013 at 11:07 AM, Enis Söztutar wrote: > Thanks Jon for bringing this to dev@. > > > On Mon, Dec 2, 2013 at 10:01 PM, Jonathan Hsieh wrote: > > > Fundamentally, I'd prefer focusing on making HBase "HBasier" instead of > > tackling a feature that other systems architecturally can d

Re: [Shadow Regions / Read Replicas ] Wal per region?

2013-12-03 Thread Devaraj Das
On Mon, Dec 2, 2013 at 10:20 PM, Jonathan Hsieh wrote: > > Devaraj: > > Jonathan Hsieh, WAL per region (WALpr) would give you the locality (and > hence HDFS short > > circuit) of reads if you were to couple it with the favored nodes. The > cost is of course more WAL > > files... In the current si

Re: [Shadow Regions / Read Replicas ]

2013-12-03 Thread Enis Söztutar
Thanks Jon for bringing this to dev@. On Mon, Dec 2, 2013 at 10:01 PM, Jonathan Hsieh wrote: > Fundamentally, I'd prefer focusing on making HBase "HBasier" instead of > tackling a feature that other systems architecturally can do better > (inconsistent reads). I consider consistent reads/writ

Re: [Shadow Regions / Read Replicas ] Wal per region?

2013-12-02 Thread Jonathan Hsieh
> Devaraj: > Jonathan Hsieh, WAL per region (WALpr) would give you the locality (and hence HDFS short > circuit) of reads if you were to couple it with the favored nodes. The cost is of course more WAL > files... In the current situation (no WALpr) it would create quite a lot of cross-machine traffic,

Re: [Shadow Regions / Read Replicas ]

2013-12-02 Thread Jonathan Hsieh
Fundamentally, I'd prefer focusing on making HBase "HBasier" instead of tackling a feature that other systems architecturally can do better (inconsistent reads). I consider consistent reads/writes to be one of HBase's defining features. That said, I think read replicas makes sense and is a nice f

Re: [Shadow Regions / Read Replicas ] Block Affinity

2013-12-02 Thread Jonathan Hsieh
> Enis: > I was trying to refer to not having co-location constraints for secondary replicas whose primaries are hosted by the same > RS. For example, if R1(replica=0), and R2(replica=0) are hosted on RS1, R1(replica=1) and R2(replica=1) can be hosted by RS2 > and RS3 respectively. This can definit

[Shadow Regions / Read Replicas ]

2013-12-02 Thread Jonathan Hsieh
HBASE-10070 [1] looks to be heading into a discussion more apt for the mailing list than the JIRA. Moving this to the dev list for threaded discussion. I'll start a few threads by replying to this thread with edited titles. [1] https://issues.apache.org/jira/browse/HBASE-10070 -- // Jonathan