hi all
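    I'd like to ask the community for help with a problem. For the past two days we have hit the same exception at around 12:50 each day. Details:
    HBase version: 2.2.6
    Hadoop version: 3.3.1
    Symptom: on one node of an isolation group (the group holds only one table), the write call queue becomes blocked at some point. From that moment the table's read and write QPS both drop to 0, and clients can neither read from nor write to the table. Starting at the time the RS call queue blocks, the following error is logged repeatedly: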
2023-05-08 12:42:27,310 ERROR [MemStoreFlusher.2] regionserver.MemStoreFlusher: Cache flush failed for region user_feature_v2,eacf_1658057555,1660314723816.2376cc2326b5372131cc530b115d959a.
org.apache.hadoop.hbase.exceptions.TimeoutIOException: Failed to get sync result after 300000 ms for txid=16920651960, WAL system stuck?
        at org.apache.hadoop.hbase.regionserver.wal.SyncFuture.get(SyncFuture.java:155)
        at org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.blockOnSync(AbstractFSWAL.java:743)
        at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:625)
        at org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.sync(AsyncFSWAL.java:602)
        at org.apache.hadoop.hbase.regionserver.HRegion.doSyncOfUnflushedWALChanges(HRegion.java:2754)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2691)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2549)
        at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2523)
        at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2409)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:611)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:580)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$1000(MemStoreFlusher.java:68)
        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:360)
        at java.lang.Thread.run(Thread.java:748)
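The memstore flush on this node cannot complete because the sync to the WAL file times out. All other metrics on the node look normal, and HDFS is not under pressure. After we restart the blocked node, the table returns to normal. I captured a jstack dump during the incident and have attached it.
Could someone in the community help pinpoint the cause when you have time?
The jstack file is attached to this issue: https://issues.apache.org/jira/browse/HBASE-27850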
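
In case it helps anyone narrow down when the stall begins, here is a minimal sketch (not something we ran as part of this report) that scans a RegionServer log for the TimeoutIOException message shown in the excerpt above and lists each occurrence. It assumes the log file is not hard-wrapped the way this email is, so the timeout message sits on a single line. The function names and the log path are hypothetical. Subtracting the timeout from the logged timestamp only approximates when the stuck sync was requested.

```python
import re
from datetime import datetime, timedelta

# Timestamp at the start of a log line, in the format seen in the excerpt.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")
# Text of the TimeoutIOException message from the excerpt.
TIMEOUT_RE = re.compile(r"Failed to get sync result after (\d+) ms for txid=(\d+)")


def find_sync_timeouts(lines):
    """Return (timestamp, timeout_ms, txid) for each WAL sync timeout found.

    The exception text follows the timestamped ERROR line, so the most
    recently seen timestamp is attached to each match.
    """
    results = []
    last_ts = None
    for line in lines:
        ts_match = TS_RE.match(line)
        if ts_match:
            last_ts = ts_match.group(1)
        timeout_match = TIMEOUT_RE.search(line)
        if timeout_match:
            results.append(
                (last_ts, int(timeout_match.group(1)), int(timeout_match.group(2)))
            )
    return results


def approx_sync_start(timestamp, timeout_ms):
    """Estimate when the stuck sync was requested: log time minus the timeout."""
    logged_at = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S,%f")
    return logged_at - timedelta(milliseconds=timeout_ms)


def scan_log(path):
    """Read a log file (hypothetical path) and return its sync timeouts."""
    with open(path, encoding="utf-8", errors="replace") as log_file:
        return find_sync_timeouts(log_file)
```

Calling `scan_log` on the RegionServer log and passing the earliest entry to `approx_sync_start` gives a rough lower bound on when the WAL stopped acknowledging syncs, which can then be lined up with the jstack dump and the HDFS DataNode logs.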
