alan.zhao created HBASE-27733:
---------------------------------

             Summary: hfile split occurs during bulkload, the new HFile file 
does not specify favored nodes
                 Key: HBASE-27733
                 URL: https://issues.apache.org/jira/browse/HBASE-27733
             Project: HBase
          Issue Type: Improvement
            Reporter: alan.zhao
            Assignee: alan.zhao


## BulkLoadHFilesTool.class

/**
 * Copy half of an HFile into a new HFile.
 */
private static void copyHFileHalf(Configuration conf, Path inFile, Path outFile,
  Reference reference, ColumnFamilyDescriptor familyDescriptor) throws IOException {
  FileSystem fs = inFile.getFileSystem(conf);
  CacheConfig cacheConf = CacheConfig.DISABLED;
  HalfStoreFileReader halfReader = null;
  StoreFileWriter halfWriter = null;
  try {
    ReaderContext context =
      new ReaderContextBuilder().withFileSystemAndPath(fs, inFile).build();
    StoreFileInfo storeFileInfo =
      new StoreFileInfo(conf, fs, fs.getFileStatus(inFile), reference);
    storeFileInfo.initHFileInfo(context);
    halfReader = (HalfStoreFileReader) storeFileInfo.createReader(context, cacheConf);
    storeFileInfo.getHFileInfo().initMetaAndIndex(halfReader.getHFileReader());
    Map<byte[], byte[]> fileInfo = halfReader.loadFileInfo();

    int blocksize = familyDescriptor.getBlocksize();
    Algorithm compression = familyDescriptor.getCompressionType();
    BloomType bloomFilterType = familyDescriptor.getBloomFilterType();
    HFileContext hFileContext = new HFileContextBuilder().withCompression(compression)
      .withChecksumType(StoreUtils.getChecksumType(conf))
      .withBytesPerCheckSum(StoreUtils.getBytesPerChecksum(conf)).withBlockSize(blocksize)
      .withDataBlockEncoding(familyDescriptor.getDataBlockEncoding()).withIncludesTags(true)
      .withCreateTime(EnvironmentEdgeManager.currentTime()).build();
    // Statement highlighted in the report: the writer is built without favored nodes.
    halfWriter = new StoreFileWriter.Builder(conf, cacheConf, fs).withFilePath(outFile)
      .withBloomType(bloomFilterType).withFileContext(hFileContext).build();
    HFileScanner scanner = halfReader.getScanner(false, false, false);
    scanner.seekTo();

...

 

When an HFile is split during bulk load, the resulting HFiles are written 
without favored nodes, which hurts data locality. Internally, we implemented 
a version of the code that specifies the favored nodes for the split HFiles 
in copyHFileHalf(), so that locality is not compromised.
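
One possible shape for such a change, sketched under assumptions rather than taken from the internal patch: StoreFileWriter.Builder offers withFavoredNodes(InetSocketAddress[]), so copyHFileHalf() could pass it the addresses of the region servers that should hold the new file's blocks. The helper below only shows the conversion from "host:port" strings into that array. The class name FavoredNodesSketch, the method toFavoredNodes, and the host names used with it are hypothetical, and how the host list is obtained for the target region is left open.

```java
import java.net.InetSocketAddress;
import java.util.List;

/** Hypothetical helper, not part of HBase: builds a favored-nodes array from "host:port" strings. */
final class FavoredNodesSketch {

  private FavoredNodesSketch() {
  }

  /** Converts each "host:port" entry into an unresolved InetSocketAddress, preserving order. */
  static InetSocketAddress[] toFavoredNodes(List<String> hostPorts) {
    InetSocketAddress[] nodes = new InetSocketAddress[hostPorts.size()];
    for (int i = 0; i < nodes.length; i++) {
      String hostPort = hostPorts.get(i);
      int sep = hostPort.lastIndexOf(':');
      if (sep <= 0 || sep == hostPort.length() - 1) {
        throw new IllegalArgumentException("Expected host:port but got: " + hostPort);
      }
      String host = hostPort.substring(0, sep);
      int port = Integer.parseInt(hostPort.substring(sep + 1));
      // Unresolved addresses avoid a DNS lookup here; resolution is left to the consumer.
      nodes[i] = InetSocketAddress.createUnresolved(host, port);
    }
    return nodes;
  }
}
```

Under these assumptions, the array returned by toFavoredNodes(List.of("rs1.example.com:16020")) would be handed to the writer builder in copyHFileHalf() through .withFavoredNodes(...) before .build() is called.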



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
