Bryan,

We ran into that same condition here last week doing pretty much the same thing. Maybe you're hitting it too.

We found that the region server wasn't blocked all the time, but when it was blocked there was an associated log message in our logs:

INFO org.apache.hadoop.hbase.regionserver.HRegion: Blocking updates for 'IPC Server handler 1 on 9009' on region WorldcatXmlFragments,7233395,1335296580834.c7ecef95084b8babca50f405afbb177f.: memstore size 256.1m is >= than blocking 256.0m size

We saw the same IPC Server messages that you described too. It turned out that when that region would block, the mappers were taking all the RPC listener slots into the region server (visible from the region server directly under "Show Active RPC Calls"). Since the mappers had all the slots, our gets for other tables would wait just to get into the region server.

The RPC handler count is configurable, see:
http://hbase.apache.org/book/config.files.html#hbase.regionserver.handler.count

We upped our value for that from its default of 10 to 50 (more than the number of mappers we were running) and the problem went away.
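For what it's worth, the change itself is just one property in hbase-site.xml on the region servers (the 50 below is simply the value that happened to work for us, not a tuned recommendation, and as far as I know the region servers need a restart to pick it up):

  <!-- Number of RPC handler threads per region server; the default is 10. -->
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>50</value>
  </property>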
I'm sure there's an art to setting that value; we think 50 will work well for us. YMMV.

-----Original Message-----
From: Bryan Beaudreault [mailto:bbeaudrea...@hubspot.com]
Sent: Tuesday, May 15, 2012 5:39 PM
To: user@hbase.apache.org
Subject: Heavy Writes Block Reads to RegionServer

We are running a job that does heavy writes into a new table. The table is not pre-split, so it has 1 region. I know this is not recommended; we were doing it partially to test this particular case. Here's what we're seeing:

1. Reads are entirely blocked. No reads to any region on that server make it through.
2. Writes are insanely slow. Some writes appear to be taking over 10 minutes.
3. All of the box's resources are quiet: around <20% CPU usage, plenty of memory to spare, and iostat looked normal.
4. ngrep showed only writes coming through, no reads.
5. The logs showed lots of: WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call multi(org.apache.hadoop.hbase.client.MultiAction@714cd947) from 10.211.117.161:34380: output error

Any ideas what's up? Is there some sort of global lock that might halt reads during heavy writes? Anything else we can look for during this? We can rerun the job to reproduce this, as this is a test cluster which can afford to be brought down.
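P.S. On the single-region table mentioned in the original message above: if the table is going to live on, pre-splitting it at creation time spreads the initial write load over several regions instead of one. A rough sketch with the plain Java client, in case it's useful (the table name, column family, and split points below are made up; pick splits that match your row key distribution):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Hypothetical table and column family names.
    HTableDescriptor desc = new HTableDescriptor("mytable");
    desc.addFamily(new HColumnDescriptor("cf"));

    // Three split points -> four initial regions, so writes don't all
    // land on a single region while the table is young.
    byte[][] splits = new byte[][] {
      Bytes.toBytes("2500000"),
      Bytes.toBytes("5000000"),
      Bytes.toBytes("7500000"),
    };

    admin.createTable(desc, splits);
    admin.close();
  }
}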