One additional data point: I tried to manually reassign the region in
question from the shell. For some reason that caused the region server to
restart, and the region did get assigned to another region server. But then
the problem moved to that region server almost immediately.

Does that just mean our write load is disproportionately hitting that one
region? That would be surprising, since we have a prefix scheme in place
where we prepend a 4-digit prefix derived from an MD5 hash to every key to
make sure we get good randomization.
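
To make the prefix scheme concrete, here is a minimal sketch, not our
actual code. It assumes the prefix is the first 4 hex digits of the MD5 of
the original key; salted_key and bucket_counts are hypothetical names used
only for illustration, and 16 equal prefix ranges stand in for regions.

```python
import hashlib
from collections import Counter

PREFIX_DIGITS = 4  # assumption: 4 hex digits taken from the MD5 digest


def salted_key(key: str) -> str:
    """Prepend the first PREFIX_DIGITS hex characters of MD5(key) to key."""
    prefix = hashlib.md5(key.encode("utf-8")).hexdigest()[:PREFIX_DIGITS]
    return prefix + key


def bucket_counts(keys, buckets: int = 16) -> Counter:
    """Count how many salted keys fall into each of `buckets` equal
    ranges of the prefix space (a stand-in for region boundaries)."""
    space = 16 ** PREFIX_DIGITS
    counts = Counter()
    for key in keys:
        prefix = int(salted_key(key)[:PREFIX_DIGITS], 16)
        counts[prefix * buckets // space] += 1
    return counts


if __name__ == "__main__":
    sample = [f"row-{i}" for i in range(10000)]
    for bucket, count in sorted(bucket_counts(sample).items()):
        print(bucket, count)
```

Running something like this over a sample of real keys would show whether
the keys themselves spread evenly across prefix ranges. It says nothing
about how often each key is written, so it cannot by itself rule out skew
in the write load.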

As usual, any feedback would be appreciated.

Cheers.

----
Saad



On Wed, Feb 28, 2018 at 9:31 PM, Saad Mufti <saad.mu...@gmail.com> wrote:

> Hi,
>
> We are running HBase 1.4.0 on Amazon EMR. We are currently seeing a
> situation where a particular region sometimes gets into a state in which
> many write requests to any row in that region time out, saying they failed
> to obtain a lock on a row in the region, and eventually they hit an IPC
> timeout. This causes the IPC queue to blow up in size as requests get
> backed up, and that region server experiences a much higher than normal
> timeout rate for all requests, not just those timing out for failing to
> obtain the row lock.
>
> The strange thing is that the rows are always different but the region is
> always the same. So the question is: is there a region component to how
> long a row write lock would be held? I looked at the debug dump, and the
> RowLocks section shows a long list of write row locks held, all of them
> from the same region but on different rows.
>
> Will trying to obtain a write row lock experience delays if no one else
> holds a lock on the same row but the region itself is experiencing read
> delays? We do have an incremental compaction tool running that major
> compacts one region per region server at a time, so that will drive
> pages out of the bucket cache. For most regions the impact is
> transient, lasting only until the bucket cache is repopulated with pages
> from the new HFile. For this one region, however, we start timing out
> trying to obtain write locks on rows in that region.
>
> Any insight anyone can provide would be most welcome.
>
> Cheers.
>
> ----
> Saad
>
>
