On Mon, Sep 21, 2015 at 5:58 PM, Shrijeet <[email protected]> wrote:
> HBase Version: 0.94.26
> HDFS version: 2.5.x
>
> We backported HBASE-8755 onto 0.94.27

Have you put much work into it? If you went to HBASE-10156, write
throughput is better again, and you'd be on the same system as current
hbase-1.x, so it would be easier to help you.

> and are seeing a corner case that I wish to run by the list. In one of
> the write-heavy use cases we noticed a region server hanging forever
> (all server handlers busy on HLog.sync, plus a few other odd things).
> From the logs we could see we had hit what looks like
> https://issues.apache.org/jira/browse/HDFS-7765
>
> *15/09/04 19:55:32 INFO wal.HLog: AsyncHLogWriter exiting*
> Exception in thread "AsyncHLogWriter" 15/09/04 19:55:32 INFO
> regionserver.StoreFile$Reader: Loaded ROW (CompoundBloomFilter) metadata
> for e10b96d0b1b94675b9bbe60b9ce8e220
> 15/09/04 19:55:32 INFO regionserver.Store: Added
> hdfs://localhost/hbase/table/a032227ad1500eec1d0ed108c52bc31c/t/e10b96d0b1b94675b9bbe60b9ce8e220,
> entries=609619, sequenceid=8449834686, filesize=4.5 M
> *java.lang.ArrayIndexOutOfBoundsException: 4608*
>     *at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:76)*
>     at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:50)
>     at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
>     at org.apache.hadoop.io.SequenceFile$Writer.sync(SequenceFile.java:1229)
>     at org.apache.hadoop.io.SequenceFile$Writer.checkAndWriteSync(SequenceFile.java:1290)
>     at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1330)
>     at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1297)
>     at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.append(SequenceFileLogWriter.java:284)
>     at org.apache.hadoop.hbase.regionserver.wal.HLog$AsyncWriter.run(HLog.java:1303)
>     at java.lang.Thread.run(Thread.java:745)
>
> Since AsyncWriter got aborted due to an unhandled exception, the
> AsyncSyncer etc. will not be notified, thus deadlocking the server. I am
> not familiar with HLog's life cycle; how does it handle errors in its
> worker threads?

My guess is that they are not supposed to have exited... the exception is
unexpected. They are not supposed to die in normal operation.

> I see some IOE handling, but it looks like it's a deferred action. Fair
> to assume we can't do the same (as IOE) for all Throwables?

At least catch the above, I'd say (a sketch of what I mean is at the end
of this mail).

> This is how we handle IOE in AsyncWriter:
>
> // 3. write all buffered writes to HDFS (append, without sync)
> try {
>   for (Entry e : pendingWrites) {
>     writer.append(e);
>   }
> } catch (IOException e) {
>   LOG.fatal("Error while AsyncWriter write, request close of hlog ", e);
>   requestLogRoll();
>   asyncIOE = e;
>   failedTxid.set(this.lastWrittenTxid);
> }

IIRC, append is not supposed to throw an exception (later, we learned that
it can...). In later versions of the WAL subsystem, there is careful
handling to make sure we don't do things like fail an append but
successfully sync (see the second sketch at the end of this mail).

St.Ack

> // 4. update 'lastWrittenTxid' and notify AsyncSyncer to do 'sync'
> asyncSyncer.setWrittenTxid(this.lastWrittenTxid);
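To be concrete about "catch the above": a minimal sketch, not the actual
0.94 code, of widening the handler you quoted so an unexpected Throwable
(like the AIOOBE above) takes the same deferred-failure path instead of
silently killing the thread. asyncIOE, failedTxid, lastWrittenTxid, and
requestLogRoll() are the names from your snippet; the extra catch block
and the IOException wrapping are my addition:

  // 3. write all buffered writes to HDFS (append, without sync)
  try {
    for (Entry e : pendingWrites) {
      writer.append(e);
    }
  } catch (IOException e) {
    LOG.fatal("Error while AsyncWriter write, request close of hlog ", e);
    requestLogRoll();
    asyncIOE = e;
    failedTxid.set(this.lastWrittenTxid);
  } catch (Throwable t) {
    // Sketch: wrap the unexpected error in an IOException so the
    // existing failedTxid plumbing wakes up blocked sync() callers
    // instead of leaving them parked forever.
    LOG.fatal("Unexpected error in AsyncWriter, request close of hlog ", t);
    requestLogRoll();
    asyncIOE = new IOException("AsyncWriter aborted", t);
    failedTxid.set(this.lastWrittenTxid);
  }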
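And on the append-vs-sync invariant: a hedged sketch of the idea only (the
actual hbase-1.x handling is more involved); lowestFailedAppendTxid and
these method names are hypothetical, for illustration:

  // Never acknowledge a sync that covers a txid whose append failed.
  private volatile long lowestFailedAppendTxid = Long.MAX_VALUE;

  synchronized void noteAppendFailure(long txid) {
    // Remember the earliest txid that never made it into the file.
    if (txid < lowestFailedAppendTxid) {
      lowestFailedAppendTxid = txid;
    }
  }

  boolean canAckSync(long syncedTxid) {
    // A sync only counts as a success if everything up to syncedTxid
    // was actually appended; otherwise the caller must see the error.
    return syncedTxid < lowestFailedAppendTxid;
  }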
