Hi, just a bump on this post to check if anyone knows more about this...

On Mon, Oct 26, 2015 at 11:06 PM, Gaurav Agarwal <gau...@arkin.net> wrote:
> Hi All,
>
> We are running HBase *Version 1.0.0-cdh5.4.2, rUnknown, Tue May 19
> 17:04:41 PDT 2015* and are hitting the problem described in
> https://issues.apache.org/jira/browse/HBASE-12074, where the
> regionserver crashes due to a concurrent roll of the WAL file.
>
> Below are the failure logs from one of the instances in our environment:
>
> 2015-10-25 22:09:41,885 INFO [regionserver/localhost/127.0.0.1:60020.logRoller] wal.FSHLog: Rolled WAL /var/lib/hbase/data/WALs/localhost,60020,1445796437179/localhost%2C60020%2C1445796437179.default.1445810949648 with entries=11826, filesize=30.40 MB; new WAL /var/lib/hbase/data/WALs/localhost,60020,1445796437179/localhost%2C60020%2C1445796437179.default.1445810981882
> 2015-10-25 22:10:09,177 INFO [regionserver/localhost/127.0.0.1:60020.logRoller] wal.FSHLog: Rolled WAL /var/lib/hbase/data/WALs/localhost,60020,1445796437179/localhost%2C60020%2C1445796437179.default.1445810981882 with entries=7796, filesize=30.41 MB; new WAL /var/lib/hbase/data/WALs/localhost,60020,1445796437179/localhost%2C60020%2C1445796437179.default.1445811009174
> 2015-10-25 22:10:09,189 ERROR [sync.2] wal.FSHLog: Error syncing, request close of wal
> java.io.IOException: java.lang.NullPointerException
>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:176)
>     at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1334)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:173)
>     ... 2 more
> 2015-10-25 22:10:09,226 FATAL [regionserver/localhost/127.0.0.1:60020.logRoller] regionserver.HRegionServer: ABORTING region server localhost,60020,1445796437179: IOE in log roller
> java.io.IOException: java.lang.NullPointerException
>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:176)
>     at org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:1334)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogWriter.sync(ProtobufLogWriter.java:173)
>     ... 2 more
> 2015-10-25 22:10:09,226 FATAL [regionserver/localhost/127.0.0.1:60020.logRoller] regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
>
> Does anyone know whether there is a workaround for this problem, or a
> patch for it?
> If there is no workaround or patch, I can help create one, but I would
> need some general guidance on what could be going on here.
>
> --cheers, gaurav

--
--cheers, gaurav