Hi Ted,

Thanks for looking into this. I'm not an admin of this cluster, so I probably won't be able to help with testing.

Just to clarify: the client code I sent succeeds here. The HBase region server crashes later, when flushing the WAL. The region is then failed over to another server, which also crashes; every crash leaves a 64MB temp file behind, and these add up quickly, since the region servers are restarted automatically. I'll put that in a JIRA.

Regards,
Daniel
2017-07-14 14:26 GMT+02:00 Ted Yu <yuzhih...@gmail.com>:

> I put up a quick test (need to find a better place) exercising the snippet
> you posted:
>
> https://pastebin.com/FNh245LD
>
> I got past where "written large put" is logged.
>
> Can you log an HBase JIRA?
>
> On Fri, Jul 14, 2017 at 4:47 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > If possible, can you try the following fix?
> >
> > Thanks
> >
> > diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
> > index feddc2c..ea01f76 100644
> > --- a/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
> > +++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
> > @@ -834,7 +834,9 @@ public class HFile {
> >        int read = in.read(pbuf);
> >        if (read != pblen) throw new IOException("read=" + read + ", wanted=" + pblen);
> >        if (ProtobufUtil.isPBMagicPrefix(pbuf)) {
> > -        parsePB(HFileProtos.FileInfoProto.parseDelimitedFrom(in));
> > +        HFileProtos.FileInfoProto.Builder builder = HFileProtos.FileInfoProto.newBuilder();
> > +        ProtobufUtil.mergeDelimitedFrom(builder, in);
> > +        parsePB(builder.build());
> >        } else {
> >          if (in.markSupported()) {
> >            in.reset();
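Why this patch helps: FileInfoProto.parseDelimitedFrom parses through a CodedInputStream that enforces protobuf's default 64MB message size limit, which is exactly what the oversized file info trips (see the "Caused by" line in the stack trace below). Merging through a CodedInputStream with a raised limit avoids that. A minimal sketch of the idea against plain protobuf-java; this is an illustration, not HBase's actual ProtobufUtil.mergeDelimitedFrom:

    import com.google.protobuf.CodedInputStream;
    import com.google.protobuf.Message;
    import java.io.IOException;
    import java.io.InputStream;

    public final class DelimitedReads {
      // Parse one length-delimited message without the default 64MB size cap.
      public static void mergeDelimitedFrom(Message.Builder builder, InputStream in)
          throws IOException {
        CodedInputStream cis = CodedInputStream.newInstance(in);
        cis.setSizeLimit(Integer.MAX_VALUE);  // lift the 64MB default limit
        int size = cis.readRawVarint32();     // length prefix written by writeDelimitedTo
        int oldLimit = cis.pushLimit(size);   // confine parsing to this one message
        builder.mergeFrom(cis);
        cis.popLimit(oldLimit);
      }
    }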
> > On Fri, Jul 14, 2017 at 4:01 AM, Daniel Jeliński <djelins...@gmail.com> wrote:
> >
> >> Hello,
> >> While playing with the MOB feature (on HBase 1.2.0-cdh5.10.0), I accidentally
> >> created a table that killed every region server it was assigned to. I can't
> >> test it with other revisions, and I couldn't find it in JIRA.
> >>
> >> I'm reporting it here; let me know if there's a better place.
> >>
> >> Gist of the code used to create the table:
> >>
> >> private String table = "poisonPill";
> >> private byte[] familyBytes = Bytes.toBytes("cf");
> >>
> >> private void createTable(Connection conn) throws IOException {
> >>   Admin hbase_admin = conn.getAdmin();
> >>   HTableDescriptor htable = new HTableDescriptor(TableName.valueOf(table));
> >>   HColumnDescriptor hfamily = new HColumnDescriptor(familyBytes);
> >>   hfamily.setMobEnabled(true);
> >>   htable.setConfiguration("hfile.format.version", "3");
> >>   htable.addFamily(hfamily);
> >>   hbase_admin.createTable(htable);
> >> }
> >>
> >> private void killTable(Connection conn) throws IOException {
> >>   Table tbl = conn.getTable(TableName.valueOf(table));
> >>   byte[] data = new byte[1 << 26];
> >>   byte[] smalldata = new byte[0];
> >>   Put put = new Put(Bytes.toBytes("1"));
> >>   put.addColumn(familyBytes, data, smalldata);
> >>   tbl.put(put);
> >> }
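The poison here is the 64MB column qualifier (new byte[1 << 26]) used as the column name: the qualifier is part of every cell key, and the flushed file's metadata records whole keys (presumably entries such as the last-key entry in the HFile file info), so the serialized FileInfo protobuf grows past protobuf's 64MB parse limit. Until a server-side fix is in place, a client can refuse oversized cells before they reach a region server. A minimal sketch; MAX_CELL_BYTES is a hypothetical cap, not an HBase constant (the hbase.client.keyvalue.maxsize client setting serves a similar purpose where configured):

    import java.io.IOException;
    import java.util.List;
    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;

    public final class CellSizeGuard {
      // Hypothetical cap; anything near protobuf's 64MB limit is dangerous.
      private static final int MAX_CELL_BYTES = 8 * 1024 * 1024;

      public static void safePut(Table tbl, Put put) throws IOException {
        for (List<Cell> cells : put.getFamilyCellMap().values()) {
          for (Cell c : cells) {
            // Approximate key+value footprint of the cell.
            long approx = (long) c.getRowLength() + c.getFamilyLength()
                + c.getQualifierLength() + c.getValueLength();
            if (approx > MAX_CELL_BYTES) {
              throw new IllegalArgumentException(
                  "Refusing cell of ~" + approx + " bytes (cap " + MAX_CELL_BYTES + ")");
            }
          }
        }
        tbl.put(put);
      }
    }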
> >> Resulting exception on the region server:
> >>
> >> 2017-07-11 09:34:54,704 WARN org.apache.hadoop.hbase.regionserver.HStore: Failed validating store file hdfs://sandbox/hbase/data/default/poisonPill/f82e20f32302dfdd95c89ecc3be5a211/.tmp/7858d223eddd4199ad220fc77bb612eb, retrying num=0
> >> org.apache.hadoop.hbase.io.hfile.CorruptHFileException: Problem reading HFile Trailer from file hdfs://sandbox/hbase/data/default/poisonPill/f82e20f32302dfdd95c89ecc3be5a211/.tmp/7858d223eddd4199ad220fc77bb612eb
> >>   at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:497)
> >>   at org.apache.hadoop.hbase.io.hfile.HFile.createReader(HFile.java:525)
> >>   at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1105)
> >>   at org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:265)
> >>   at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:404)
> >>   at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:509)
> >>   at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:499)
> >>   at org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:675)
> >>   at org.apache.hadoop.hbase.regionserver.HStore.createStoreFileAndReader(HStore.java:667)
> >>   at org.apache.hadoop.hbase.regionserver.HStore.validateStoreFile(HStore.java:1746)
> >>   at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:942)
> >>   at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2299)
> >>   at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2372)
> >>   at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2102)
> >>   at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4139)
> >>   at org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:3934)
> >>   at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:828)
> >>   at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:799)
> >>   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6480)
> >>   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6441)
> >>   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6412)
> >>   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6368)
> >>   at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6319)
> >>   at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:362)
> >>   at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:129)
> >>   at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
> >>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >>   at java.lang.Thread.run(Thread.java:745)
> >> Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message was too large. May be malicious. Use CodedInputStream.setSizeLimit() to increase the size limit.
> >>   at com.google.protobuf.InvalidProtocolBufferException.sizeLimitExceeded(InvalidProtocolBufferException.java:110)
> >>   at com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:755)
> >>   at com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
> >>   at com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
> >>   at org.apache.hadoop.hbase.protobuf.generated.HFileProtos$FileInfoProto.<init>(HFileProtos.java:82)
> >>   at org.apache.hadoop.hbase.protobuf.generated.HFileProtos$FileInfoProto.<init>(HFileProtos.java:46)
> >>   at org.apache.hadoop.hbase.protobuf.generated.HFileProtos$FileInfoProto$1.parsePartialFrom(HFileProtos.java:135)
> >>   at org.apache.hadoop.hbase.protobuf.generated.HFileProtos$FileInfoProto$1.parsePartialFrom(HFileProtos.java:130)
> >>   at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
> >>   at com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
> >>   at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:253)
> >>   at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:259)
> >>   at com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:49)
> >>   at org.apache.hadoop.hbase.protobuf.generated.HFileProtos$FileInfoProto.parseDelimitedFrom(HFileProtos.java:297)
> >>   at org.apache.hadoop.hbase.io.hfile.HFile$FileInfo.read(HFile.java:752)
> >>   at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.<init>(HFileReaderV2.java:161)
> >>   at org.apache.hadoop.hbase.io.hfile.HFileReaderV3.<init>(HFileReaderV3.java:77)
> >>   at org.apache.hadoop.hbase.io.hfile.HFile.pickReaderVersion(HFile.java:487)
> >>   ... 28 more
> >>
> >> After a number of retries, the RegionServer service is aborted.
> >>
> >> I wasn't able to reproduce this issue with MOB disabled.
> >>
> >> Regards,
> >>
> >> Daniel
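The root cause named in the "Caused by" line can be demonstrated without a cluster. A minimal sketch, assuming HBase 1.2's generated protobuf classes are on the classpath and the JVM has generous heap (e.g. -Xmx1g); the map_entry field and BytesBytesPair layout follow HFile.proto, while "hfile.LASTKEY" is used here only as an illustrative key name:

    import com.google.protobuf.ByteString;
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import org.apache.hadoop.hbase.protobuf.generated.HBaseProtos;
    import org.apache.hadoop.hbase.protobuf.generated.HFileProtos;

    public class FileInfoSizeLimitDemo {
      public static void main(String[] args) throws IOException {
        byte[] big = new byte[1 << 26]; // 64MB, same size as the poison qualifier
        // Build a FileInfoProto whose single entry pushes the message past 64MB.
        HFileProtos.FileInfoProto info = HFileProtos.FileInfoProto.newBuilder()
            .addMapEntry(HBaseProtos.BytesBytesPair.newBuilder()
                .setFirst(ByteString.copyFromUtf8("hfile.LASTKEY"))
                .setSecond(ByteString.copyFrom(big)))
            .build();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        info.writeDelimitedTo(out);
        // Expected: InvalidProtocolBufferException "Protocol message was too large."
        HFileProtos.FileInfoProto.parseDelimitedFrom(
            new ByteArrayInputStream(out.toByteArray()));
      }
    }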