Hey again,

I'm running a bulk load against S3 and I'm seeing region servers being instructed to load the same HFile multiple times. I've seen this behavior manifest in two different ways. The first time was after I decided to brute-force my way around the problem in HBASE-20774 [1] by just letting the cluster copy the data down from S3 and back up. After letting this run overnight, the 15TB of input data had grown to over 60TB in the HBase catalog. In the logs I noticed files being copied multiple times (around 10 times for some files on a single region server), which is consistent with duplicate copies of the same HFiles accumulating in the stores.

After writing a patch for HBASE-20774 and deploying it to the cluster, on the next attempt I'm seeing the same behavior, but with a different symptom: when you bulk load from the same filesystem as the HBase root, HBase renames the files rather than copying them, so the repeated load causes FileNotFoundExceptions the next time a region server is instructed to load the same file. I've cherry-picked the logs of one occurrence of this from a region server (a sketch of the rename-vs-copy behavior I'm describing follows the logs):

2018-06-27 19:34:18,170 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] regionserver.HStore: Validating hfile at s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe for inclusion in store d region z3_v2,\x00\x09\xD9 \x00\x00\x00\x00\x00\x00\x00,1530122295326.f6e3567551881fbd48baed14bfe2252b.
2018-06-27 19:34:18,236 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] s3n.S3NativeFileSystem: Opening 's3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe' for reading
2018-06-27 19:34:45,915 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] s3n.S3NativeFileSystem: rename s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe s3://bucket_name/data/default/z3_v2/f6e3567551881fbd48baed14bfe2252b/d/47aef50a986340908a8242b256adbedf_SeqId_4_
2018-06-27 19:34:59,005 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] regionserver.HStore: Loaded HFile s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe into store 'd' as s3://bucket_name/data/default/z3_v2/f6e3567551881fbd48baed14bfe2252b/d/47aef50a986340908a8242b256adbedf_SeqId_4_ - updating store file list.
2018-06-27 19:34:59,101 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] regionserver.HStore: Successfully loaded store file s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe into store d (new location: s3://bucket_name/data/default/z3_v2/f6e3567551881fbd48baed14bfe2252b/d/47aef50a986340908a8242b256adbedf_SeqId_4_)
2018-06-27 19:40:03,463 INFO [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=16020] regionserver.HStore: Validating hfile at s3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe for inclusion in store d region z3_v2,\x00\x09\xD9 \x00\x00\x00\x00\x00\x00\x00,1530122295326.f6e3567551881fbd48baed14bfe2252b.
org.apache.hadoop.io.MultipleIOException: 18 exceptions [java.io.FileNotFoundException: No such file or directory 's3://bucket_name/data/bulkload/z3/d/19bca1eee5ff45a38cca616696b92ffe', ....]
at org.apache.hadoop.io.MultipleIOException.createIOException(MultipleIOException.java:53)
at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:5782)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.bulkLoadHFile(RSRpcServices.java:2170)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:36615)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2354)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:297)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:277)
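To make the two behaviors concrete, here's a minimal sketch of the commit step as I understand it. This is my own simplification, not the actual HBase source (the class, method, and same-filesystem check are all assumptions), but it shows why a replayed load request duplicates data in the copy case and throws FileNotFoundException in the rename case:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

// My own simplification of the bulk-load commit step, NOT the actual HBase
// source. It only illustrates why a replayed load request behaves so
// differently depending on where the source HFile lives.
final class BulkLoadCommitSketch {
  static void commit(Configuration conf, Path srcHFile, Path dstStoreFile) throws IOException {
    FileSystem srcFs = srcHFile.getFileSystem(conf);
    FileSystem dstFs = dstStoreFile.getFileSystem(conf);
    if (srcFs.getUri().equals(dstFs.getUri())) {
      // Source is on the same FS as hbase.rootdir (both s3://bucket_name in
      // my case): the HFile is moved with a rename, so replaying the same
      // load request later fails with FileNotFoundException.
      dstFs.rename(srcHFile, dstStoreFile);
    } else {
      // Source is on a different FS: the HFile is copied, so a replay
      // quietly succeeds and the store accumulates duplicates - which is
      // how 15TB of input turned into over 60TB on my first attempt.
      FileUtil.copy(srcFs, srcHFile, dstFs, dstStoreFile, false, conf);
    }
  }
}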

I'm not sure where the problem is coming from here: whether the master is adding files to the queue multiple times or just not removing them after a successful load (a toy sketch of what I mean is below). If anyone has knowledge of this area or an idea of where things are going wrong, it would be very helpful, since I'm not super familiar with the code base.
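Here's that toy sketch (again my own code, nothing from HBase) of the kind of driver bug I'm imagining, where a failed batch is re-queued wholesale, including the files that were already committed:

import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy illustration of the failure mode I'm guessing at - my own sketch,
// not HBase code. If the driver re-queues a whole batch whenever the
// bulk-load call fails partway through, the HFiles the server already
// committed get replayed on the next pass.
final class RequeueSketch {
  interface Loader {
    // Returns false if the load failed after committing some of the files.
    boolean bulkLoad(List<String> hfilePaths);
  }

  static void drive(Deque<String> queue, Loader loader, int batchSize) {
    while (!queue.isEmpty()) {
      List<String> batch = new ArrayList<>();
      while (batch.size() < batchSize && !queue.isEmpty()) {
        batch.add(queue.pollFirst());
      }
      if (!loader.bulkLoad(batch)) {
        // Suspected bug: everything goes back on the queue, including files
        // that were already committed. With copy semantics those duplicate
        // data; with rename semantics the retry hits FileNotFoundException.
        batch.forEach(queue::addLast);
      }
    }
  }
}

If it's roughly this, then only re-queueing the items that didn't actually commit would explain (and fix) both symptoms I've seen.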

Thanks,
Austin

[1] https://issues.apache.org/jira/browse/HBASE-20774
