Thanks. I appear to have resolved this problem by restarting the HBase Master and the RegionServers that were reporting the failure.
Brian

On Nov 11, 2014, at 12:13 PM, Ted Yu <yuzhih...@gmail.com> wrote:

> For your first question, the region server web UI,
> rs-status#regionRequestStats, shows the Write Request Count.
>
> You can monitor that value for the underlying region to see if it receives
> above-normal writes.
>
> Cheers
>
> On Mon, Nov 10, 2014 at 4:06 PM, Brian Jeltema <bdjelt...@gmail.com> wrote:
>
>>> Was the region containing this row hot around the time of failure ?
>>
>> How do I measure that?
>>
>>> Can you check in the region server log (along with a monitoring tool)
>>> what the memstore pressure was ?
>>
>> I didn't see anything in the region server logs to indicate a problem.
>> And given the reproducibility of the behavior, it's hard to see how
>> dynamic parameters such as memory pressure could be at the root of the
>> problem.
>>
>> Brian
>>
>> On Nov 10, 2014, at 3:22 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Was the region containing this row hot around the time of failure ?
>>>
>>> Can you check in the region server log (along with a monitoring tool)
>>> what the memstore pressure was ?
>>>
>>> Thanks
>>>
>>> On Nov 10, 2014, at 11:34 AM, Brian Jeltema
>>> <brian.jelt...@digitalenvoy.net> wrote:
>>>
>>>>> How many tasks may write to this row concurrently ?
>>>>
>>>> Only 1 mapper should be writing to this row. Is there a way to check
>>>> which locks are being held?
>>>>
>>>>> Which 0.98 release are you using ?
>>>>
>>>> 0.98.0.2.1.2.1-471-hadoop2
>>>>
>>>> Thanks
>>>> Brian
>>>>
>>>> On Nov 10, 2014, at 2:21 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> There could be more than one reason why RegionTooBusyException is
>>>>> thrown. Below are two (from HRegion):
>>>>>
>>>>> /**
>>>>>  * We throw RegionTooBusyException if above memstore limit
>>>>>  * and expect client to retry using some kind of backoff
>>>>>  */
>>>>> private void checkResources()
>>>>>
>>>>> /**
>>>>>  * Try to acquire a lock. Throw RegionTooBusyException
>>>>>  * if failed to get the lock in time. Throw InterruptedIOException
>>>>>  * if interrupted while waiting for the lock.
>>>>>  */
>>>>> private void lock(final Lock lock, final int multiplier)
>>>>>
>>>>> How many tasks may write to this row concurrently ?
>>>>>
>>>>> Which 0.98 release are you using ?
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Mon, Nov 10, 2014 at 11:10 AM, Brian Jeltema
>>>>> <brian.jelt...@digitalenvoy.net> wrote:
>>>>>
>>>>>> I'm running a map/reduce job against a table that is performing a
>>>>>> large number of writes (probably updating every row). The job is
>>>>>> failing with the exception below. This is a solid failure; it dies at
>>>>>> the same point in the application, and at the same row in the table.
>>>>>> So I doubt it's a conflict with compaction (and the UI shows no
>>>>>> compaction in progress), or that there is a load-related cause.
>>>>>>
>>>>>> 'hbase hbck' does not report any inconsistencies. The
>>>>>> 'waitForAllPreviousOpsAndReset' frame leads me to suspect that there
>>>>>> is an operation in progress that is hung and blocking the update. I
>>>>>> don't see anything suspicious in the HBase logs. The data at the
>>>>>> point of failure is not unusual, and is identical to many preceding
>>>>>> rows. Does anybody have any ideas of what I should look for to find
>>>>>> the cause of this RegionTooBusyException?
>>>>>>
>>>>>> This is Hadoop 2.4 and HBase 0.98.
>>>>>>
>>>>>> 14/11/10 13:46:13 INFO mapreduce.Job: Task Id :
>>>>>> attempt_1415210751318_0010_m_000314_1, Status : FAILED
>>>>>> Error: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
>>>>>> Failed 1744 actions: RegionTooBusyException: 1744 times,
>>>>>>   at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:207)
>>>>>>   at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:187)
>>>>>>   at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1568)
>>>>>>   at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:1023)
>>>>>>   at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:995)
>>>>>>   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:953)
>>>>>>
>>>>>> Brian
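For context on the RetriesExhaustedWithDetailsException in the stack trace: the checkResources() comment quoted earlier says the server expects the client to "retry using some kind of backoff", and this exception is what surfaces once those retries run out (bounded by hbase.client.retries.number). A rough sketch of that delay schedule, modeled on what I believe ConnectionUtils.getPauseTime does in the 0.98 client, with the base delay taken from hbase.client.pause (default 100 ms); the random jitter the real client adds is omitted, so treat the exact numbers as an approximation:

```python
# Sketch of the HBase 0.98 client's retry backoff (an approximation of
# ConnectionUtils.getPauseTime; the real client also adds random jitter).
RETRY_BACKOFF = [1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200]

def pause_time(pause_ms, tries):
    """Delay in ms before retry number `tries`, given base hbase.client.pause."""
    idx = min(tries, len(RETRY_BACKOFF) - 1)  # schedule flattens out at the end
    return pause_ms * RETRY_BACKOFF[idx]

# With the default hbase.client.pause of 100 ms, the first five retries wait:
schedule = [pause_time(100, t) for t in range(5)]
print(schedule)  # [100, 200, 300, 500, 1000]
```

Since the job dies at the same row every time, the backoff itself is unlikely to be the culprit; it just explains why all 1744 buffered actions report RegionTooBusyException at once when backgroundFlushCommits finally gives up. If I'm reading 0.98's HRegion correctly, the server-side knobs behind the two throw sites Ted quoted are the memstore blocking limit (hbase.hregion.memstore.flush.size times hbase.hregion.memstore.block.multiplier) and the lock wait timeout (hbase.busy.wait.duration).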