[ https://issues.apache.org/jira/browse/HBASE-10882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Purtell resolved HBASE-10882.
------------------------------------
    Resolution: Invalid

Please ask for assistance on the u...@hbase.apache.org mailing list.

> Bulkload process hangs on regions randomly and finally throws
> RegionTooBusyException
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-10882
>                 URL: https://issues.apache.org/jira/browse/HBASE-10882
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.94.10
>         Environment: rhel 5.6, jdk1.7.0_45, hadoop-2.2.0-cdh5.0.0
>            Reporter: Victor Xu
>         Attachments: jstack_5105.log
>
>
> I came across this problem in the early morning several days ago, while using the hadoop completebulkload command to bulk load some HDFS files into an HBase table. Several regions hung, and after three retries they all threw RegionTooBusyExceptions. Fortunately, I captured the jstack output of one affected region's HRegionServer process just in time.
> I found that the bulkload process was waiting for a write lock:
> at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.tryLock(ReentrantReadWriteLock.java:1115)
> The lock id is 0x00000004054ecbf0.
> Meanwhile, many other Get/Scan operations were waiting on the same lock id, in their case for the read lock:
> at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:873)
> The strangest thing is that NO ONE OWNED THE LOCK! I searched the jstack output carefully but could not find any thread that claimed to own it.
> When I restarted the bulk load process, it failed on different regions, but with the same RegionTooBusyExceptions.
> My guess was that the region was doing a compaction at the time and owned the lock, but I couldn't find any compaction info in the HBase logs.
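The waiting pattern in the jstack output above can be reproduced in isolation. The sketch below is illustrative only, not HBase's actual locking code: it shows how a timed `tryLock` on a `ReentrantReadWriteLock` write lock gives up while readers are active, which is the general pattern behind a "too busy" style failure. The class name `RegionLockDemo` is hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RegionLockDemo {
    // Hypothetical stand-in for a region's internal read/write lock.
    static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public static void main(String[] args) throws InterruptedException {
        // A reader (e.g. a Get/Scan) holds the read lock.
        lock.readLock().lock();

        // A writer (e.g. bulkload) uses a timed tryLock so it fails fast
        // instead of blocking forever. With a reader active, this returns
        // false -- the kind of failure surfaced as RegionTooBusyException.
        boolean gotWrite = lock.writeLock().tryLock(100, TimeUnit.MILLISECONDS);
        System.out.println("write lock while reader active: " + gotWrite);   // false

        // Once the reader releases, the writer succeeds.
        lock.readLock().unlock();
        gotWrite = lock.writeLock().tryLock(100, TimeUnit.MILLISECONDS);
        System.out.println("write lock after reader released: " + gotWrite); // true
        if (gotWrite) {
            lock.writeLock().unlock();
        }
    }
}
```

Note that `ReentrantReadWriteLock` does not support upgrading a read lock to a write lock, so a thread that acquires the write lock while itself holding the read lock will always time out; a jstack taken at that moment shows threads parked in `tryLock` even though the lock can appear unowned between retries.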
> Finally, after several days' hard work, I found the only temporary workaround: TRIGGERING A MAJOR COMPACTION BEFORE THE BULKLOAD.
> So which process owned the lock? Has anyone come across the same problem before?

--
This message was sent by Atlassian JIRA
(v6.2#6252)