alan.zhao created HBASE-27733:
---------------------------------

             Summary: hfile split occurs during bulkload, the new HFile file does not specify favored nodes
                 Key: HBASE-27733
                 URL: https://issues.apache.org/jira/browse/HBASE-27733
             Project: HBase
          Issue Type: Improvement
            Reporter: alan.zhao
            Assignee: alan.zhao

## BulkLoadHFilesTool.class

  /**
   * Copy half of an HFile into a new HFile.
   */
  private static void copyHFileHalf(Configuration conf, Path inFile, Path outFile,
    Reference reference, ColumnFamilyDescriptor familyDescriptor) throws IOException {
    FileSystem fs = inFile.getFileSystem(conf);
    CacheConfig cacheConf = CacheConfig.DISABLED;
    HalfStoreFileReader halfReader = null;
    StoreFileWriter halfWriter = null;
    try {
      ReaderContext context = new ReaderContextBuilder().withFileSystemAndPath(fs, inFile).build();
      StoreFileInfo storeFileInfo =
        new StoreFileInfo(conf, fs, fs.getFileStatus(inFile), reference);
      storeFileInfo.initHFileInfo(context);
      halfReader = (HalfStoreFileReader) storeFileInfo.createReader(context, cacheConf);
      storeFileInfo.getHFileInfo().initMetaAndIndex(halfReader.getHFileReader());
      Map<byte[], byte[]> fileInfo = halfReader.loadFileInfo();

      int blocksize = familyDescriptor.getBlocksize();
      Algorithm compression = familyDescriptor.getCompressionType();
      BloomType bloomFilterType = familyDescriptor.getBloomFilterType();
      HFileContext hFileContext = new HFileContextBuilder().withCompression(compression)
        .withChecksumType(StoreUtils.getChecksumType(conf))
        .withBytesPerCheckSum(StoreUtils.getBytesPerChecksum(conf)).withBlockSize(blocksize)
        .withDataBlockEncoding(familyDescriptor.getDataBlockEncoding()).withIncludesTags(true)
        .withCreateTime(EnvironmentEdgeManager.currentTime()).build();
      *halfWriter = new StoreFileWriter.Builder(conf, cacheConf, fs).withFilePath(outFile)*
        *.withBloomType(bloomFilterType).withFileContext(hFileContext).build();*
      HFileScanner scanner = halfReader.getScanner(false, false, false);
      scanner.seekTo();
      ...

When an HFile has to be split during bulk load, the new HFiles are written without favored nodes, which hurts data locality.

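As a rough illustration of the idea, here is a minimal sketch of how a favored-nodes array could be assembled for the half writer. The helper name favoredNodesFor, the class name, and the example host and port are hypothetical and not part of HBase. It also assumes that the host and port of the region server serving the target region are already known, for example from a region location lookup. This is not necessarily how the actual patch does it.

```java
import java.net.InetSocketAddress;

public class FavoredNodesSketch {

  /**
   * Hypothetical helper (not HBase API): builds a one-element favored-nodes array
   * from the host and port of the region server expected to serve the split HFile.
   */
  public static InetSocketAddress[] favoredNodesFor(String hostname, int port) {
    if (hostname == null || hostname.isEmpty()) {
      // Location unknown: return null so the caller can keep default block placement.
      return null;
    }
    // createUnresolved keeps this sketch free of DNS lookups; a real implementation
    // may prefer a resolved address.
    return new InetSocketAddress[] { InetSocketAddress.createUnresolved(hostname, port) };
  }

  public static void main(String[] args) {
    // Example values only; in practice they would come from the region location.
    InetSocketAddress[] nodes = favoredNodesFor("rs1.example.com", 16020);
    System.out.println(nodes[0].getHostString() + ":" + nodes[0].getPort());

    // Assumed use inside copyHFileHalf(): pass the array to the writer builder, e.g.
    //   new StoreFileWriter.Builder(conf, cacheConf, fs).withFilePath(outFile)
    //     .withBloomType(bloomFilterType).withFileContext(hFileContext)
    //     .withFavoredNodes(nodes).build();
  }
}
```

Returning null when the location is unknown lets the caller skip the favored-nodes hint and fall back to the current behavior.
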
Internally, we have implemented a version of this code that lets us specify the favored nodes for the split HFiles in copyHFileHalf(), so that locality is not compromised.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)