Thanks Yu!
It was certainly helpful.

> Regarding the issue you met, what's the setting of
> hbase.regionserver.maxlogs in your env? By default it's 32, which means
> for each RS the number of un-archived WALs shouldn't exceed 32. However,
> when multiwal is enabled, it allows 32 logs for each group, thus becoming
> 64 WALs allowed for a single RS.

I used the default configuration for this. By multiWal, I understand there
is a different WAL per region. Can you please explain how you got 64 WALs
for a single RegionServer?
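
Here is my rough mental model of the knobs involved (a sketch based on my
reading of the 1.x defaults; the group arithmetic is my guess, please
correct me if I'm wrong):

    // Rough sketch of my understanding; not verified against the source.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class WalBudget {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();
            // Switch from the single default WAL provider to multiwal.
            conf.set("hbase.wal.provider", "multiwal");
            // WAL groups per RS when multiwal is on (2 by default, I believe).
            int groups = conf.getInt("hbase.wal.regiongrouping.numgroups", 2);
            // Max un-archived WALs before forced flushes kick in.
            int maxLogs = conf.getInt("hbase.regionserver.maxlogs", 32);
            // Is this where the 64 comes from: 32 per group * 2 groups?
            System.out.println("Per-RS WAL budget: " + (groups * maxLogs));
        }
    }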

> when multiwal is enabled, it allows 32 logs for each group, thus becoming
> 64 WALs allowed for a single RS.

I thought one of the side effects of having multiwal enabled is that there
will be a *large amount of data waiting in unarchived WALs.*
So if a region server fails, it would take more time to replay the WAL
files, and hence it could *compromise availability.*

Wdyt?

Thanks
-Sachin


On Tue, Jun 6, 2017 at 2:04 PM, Yu Li <car...@gmail.com> wrote:

> Hi Sachin,
>
> We have been using multiwal in production here at Alibaba for over 2 years
> and have seen no problems. Facebook is also running multiwal online. Please
> refer to HBASE-14457 <https://issues.apache.org/jira/browse/HBASE-14457>
> for more details.
>
> There's also a JIRA, HBASE-15131
> <https://issues.apache.org/jira/browse/HBASE-15131>, proposing to turn on
> multiwal by default; it's still under discussion, so please feel free to
> leave your voice there.
>
> Regarding the issue you met, what's the setting of
> hbase.regionserver.maxlogs in your env? By default it's 32, which means
> for each RS the number of un-archived WALs shouldn't exceed 32. However,
> when multiwal is enabled, it allows 32 logs for each group, thus becoming
> 64 WALs allowed for a single RS.
>
> Let me further explain how it leads to RegionTooBusyException (the
> thresholds are sketched in code after the list):
> 1. if the number of un-archived WALs exceeds the setting, HBase checks the
> oldest WAL and flushes all regions involved in it
> 2. if the data ingestion speed is high and the WAL keeps rolling, so many
> small hfiles get flushed out that compaction cannot catch up
> 3. when the hfile count of one store exceeds the setting of
> hbase.hstore.blockingStoreFiles (10 by default), HBase will delay the
> flush for hbase.hstore.blockingWaitTime (90s by default)
> 4. when data ingestion continues but the flush is delayed, the memstore
> size might exceed the upper limit and thus throw RegionTooBusyException
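>
> To make those thresholds concrete, here is a rough sketch using the
> standard config keys and their 1.x defaults (illustrative only, not the
> actual internal code path):
>
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>
>     public class BlockingThresholds {
>         public static void main(String[] args) {
>             Configuration conf = HBaseConfiguration.create();
>             // Step 3: flushes get delayed once a store has too many hfiles.
>             int blockingFiles = conf.getInt("hbase.hstore.blockingStoreFiles", 10);
>             long blockingWaitMs = conf.getLong("hbase.hstore.blockingWaitTime", 90000L);
>             // Step 4: writes are rejected once the memstore passes
>             // flush size * block multiplier.
>             long flushSize = conf.getLong("hbase.hregion.memstore.flush.size", 134217728L);
>             long multiplier = conf.getLong("hbase.hregion.memstore.block.multiplier", 4L);
>             System.out.println("delay flush after " + blockingFiles
>                 + " hfiles, for up to " + blockingWaitMs + " ms");
>             System.out.println("blockingMemstoreSize = " + (flushSize * multiplier));
>         }
>     }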
>
> Hope this information helps.
>
> Best Regards,
> Yu
>
> On 6 June 2017 at 13:39, Sachin Jain <sachinjain...@gmail.com> wrote:
>
> > Hi,
> >
> > I was in the middle of a situation where I was getting
> > *RegionTooBusyException* with a log line something like:
> >
> >     *Above Memstore limit, regionName = X ... memstore size = Y and
> > blockingMemstoreSize = Z*
> >
> > This potentially hinted me towards *hotspotting* of a particular
> > region. So I fixed my keyspace partitioning to have a more uniform
> > distribution per region. It did not completely fix the problem but
> > definitely delayed it a bit.
> >
> > Next, I enabled *multiWal*. As I remember, there is a configuration
> > (hbase.regionserver.maxlogs, I believe) that leads to flushing of
> > memstores when the WAL threshold is reached. Upon doing this, the
> > problem seems to have gone away.
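> >
> > To be concrete about what I mean, a hypothetical sketch (my own naming,
> > not the actual HBase internals):
> >
> >     import org.apache.hadoop.conf.Configuration;
> >
> >     // Hypothetical helper sketching the trigger as I understand it.
> >     class WalThreshold {
> >         static boolean reached(int unarchivedWals, Configuration conf) {
> >             int maxLogs = conf.getInt("hbase.regionserver.maxlogs", 32);
> >             // Once the un-archived WAL count passes maxlogs, the regions
> >             // whose edits pin the oldest WAL get flushed so that file
> >             // can be archived.
> >             return unarchivedWals > maxLogs;
> >         }
> >
> >         public static void main(String[] args) {
> >             // Example: 40 un-archived WALs vs. the default budget of 32.
> >             System.out.println(reached(40, new Configuration()));
> >         }
> >     }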
> >
> > But this raises a couple of questions:
> >
> > 1. Are there any repercussions of using *multiWal* in a production
> > environment?
> > 2. If there are no repercussions and only benefits to using *multiWal*,
> > why is it not turned on by default? Let other consumers turn it off in
> > certain (whatever) scenarios.
> >
> > PS: *HBase Configuration*
> > Single node (local setup), v1.3.1, Ubuntu, 16-core machine.
> >
> > Thanks
> > -Sachin
> >
>
