Bryan,

We ran into that same condition here last week doing pretty much the
same thing. Maybe you're hitting it too.

We found that the region server wasn't blocked all the time, but when it
was blocked there was an associated message in our logs:

    INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 9009' on region WorldcatXmlFragments,7233395,1335296580834.c7ecef95084b8babca50f405afbb177f.: memstore size 256.1m is >= than blocking 256.0m size

We had the same IPC Server messages that you described, too.

It turned out that whenever that region blocked, the mappers were
holding all of the RPC handler slots on the region server (visible from
the region server directly under "Show Active RPC Calls"). Since the
mappers had all the slots, our gets for other tables had to wait just to
get into the region server.

The RPC handler count is configurable; see:
http://hbase.apache.org/book/config.files.html#hbase.regionserver.handler.count

We upped our value for that from its default of 10 to 50 (more than the
number of mappers we were running) and the problem went away.
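
For reference, here is a minimal sketch of how that setting would look in
hbase-site.xml on each region server, inside the top-level <configuration>
element. The property name is the one from the link above; the value 50 is
just what worked for us, not a recommendation, and as far as we know the
region servers need a restart to pick it up.

```xml
<!-- hbase-site.xml fragment (sketch): raise the RPC handler count.
     50 is the value we settled on; size it for your own workload. -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>50</value>
</property>
```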

I'm sure there's an art to setting that value; we think 50 will work
well for us. YMMV.

-----Original Message-----
From: Bryan Beaudreault [mailto:bbeaudrea...@hubspot.com] 
Sent: Tuesday, May 15, 2012 5:39 PM
To: user@hbase.apache.org
Subject: Heavy Writes Block Reads to RegionServer

We are running a job that does heavy writes into a new table.  The table
is not pre-split so it has 1 region.  I know this is not recommended; we
were doing it partially to test this particular case.

Here's what we're seeing:


   1. Reads are entirely blocked.  No reads to any region on that server
      make it through.
   2. Writes are insanely slow.  Some writes appear to be taking over 10
      minutes.
   3. All of the box's resources are quiet:  Around < 20% CPU usage,
      plenty of memory to spare, iostat looked normal
   4. ngrep showed only writes coming through.  no reads
   5. The logs showed lots of WARN org.apache.hadoop.ipc.HBaseServer:
      IPC Server Responder, call
      multi(org.apache.hadoop.hbase.client.MultiAction@714cd947) from
      10.211.117.161:34380: output error

Any ideas what's up?  Is there some sort of global lock that might halt
reads during heavy writes?  Anything else we can look for during this?
We can rerun the job to reproduce this, as this is a test cluster which
can afford to be brought down.
