Aman Poonia created HBASE-28836:
-----------------------------------

             Summary: Parallelize the archival of compacted files
                 Key: HBASE-28836
                 URL: https://issues.apache.org/jira/browse/HBASE-28836
             Project: HBase
          Issue Type: Improvement
          Components: regionserver
    Affects Versions: 2.5.10
            Reporter: Aman Poonia
            Assignee: Aman Poonia
While splitting a region, HBase has to clean up the compacted files it tracks for bookkeeping. Currently we archive them sequentially, which is good enough on HDFS, where moving a file into the archive directory is a fast rename. On S3, however, a rename is a copy followed by a delete, so archiving files one at a time becomes a bottleneck. We need to parallelize this step to make it faster.

{code:java}
for (File file : toArchive) {
  // if it's a file, archive it
  try {
    LOG.trace("Archiving {}", file);
    if (file.isFile()) {
      // attempt to archive the file
      if (!resolveAndArchiveFile(baseArchiveDir, file, startTime)) {
        LOG.warn("Couldn't archive " + file + " into backup directory: " + baseArchiveDir);
        failures.add(file);
      }
    } else {
      // otherwise it's a directory and we need to archive all files
      LOG.trace("{} is a directory, archiving children files", file);
      // so we add the directory name to the one base archive
      Path parentArchiveDir = new Path(baseArchiveDir, file.getName());
      // and then get all the files from that directory and attempt to
      // archive those too
      Collection<File> children = file.getChildren();
      failures.addAll(resolveAndArchive(fs, parentArchiveDir, children, start));
    }
  } catch (IOException e) {
    LOG.warn("Failed to archive {}", file, e);
    failures.add(file);
  }
}
{code}
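A minimal sketch of one way to parallelize the loop above, assuming the surrounding HFileArchiver context ({{File}}, {{resolveAndArchiveFile}}, {{baseArchiveDir}}, {{startTime}}, {{start}}, {{fs}}, {{LOG}}, {{toArchive}}); the fixed pool size of 8 is an arbitrary placeholder, not a proposed value:

{code:java}
// Sketch only: archive top-level entries concurrently instead of one at a time.
// Needs java.util.concurrent.{ExecutorService, Executors, Future, ExecutionException};
// everything else comes from the existing HFileArchiver context shown above.
ExecutorService pool = Executors.newFixedThreadPool(8); // placeholder size
List<File> failures = Collections.synchronizedList(new ArrayList<>());
List<Future<?>> pending = new ArrayList<>();
for (File file : toArchive) {
  pending.add(pool.submit(() -> {
    try {
      LOG.trace("Archiving {}", file);
      if (file.isFile()) {
        if (!resolveAndArchiveFile(baseArchiveDir, file, startTime)) {
          LOG.warn("Couldn't archive " + file + " into backup directory: " + baseArchiveDir);
          failures.add(file);
        }
      } else {
        // recurse into directories exactly as the sequential version does
        Path parentArchiveDir = new Path(baseArchiveDir, file.getName());
        failures.addAll(resolveAndArchive(fs, parentArchiveDir, file.getChildren(), start));
      }
    } catch (IOException e) {
      LOG.warn("Failed to archive {}", file, e);
      failures.add(file);
    }
  }));
}
pool.shutdown();
// Wait for every archival task so the failures list is complete before returning it.
for (Future<?> f : pending) {
  try {
    f.get();
  } catch (ExecutionException e) {
    LOG.warn("Archival task failed unexpectedly", e);
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    break;
  }
}
{code}

On S3 this turns N sequential copy-and-delete round trips into roughly N / poolSize, at the cost of creating a pool per call; an actual patch would presumably share a region-server-wide executor and make the parallelism configurable.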