Aman Poonia created HBASE-28836:
-----------------------------------

             Summary: Parallelize the archival of compacted files 
                 Key: HBASE-28836
                 URL: https://issues.apache.org/jira/browse/HBASE-28836
             Project: HBase
          Issue Type: Improvement
          Components: regionserver
    Affects Versions: 2.5.10
            Reporter: Aman Poonia
            Assignee: Aman Poonia


While splitting a region, HBase has to clean up the compacted files of the parent region for bookkeeping.

Currently we do this sequentially, and that is good enough for HDFS, where moving a file into the archive is a fast metadata operation. When we do the same on S3 it becomes an issue, because every per-file move is a remote call (a copy plus a delete). We need to parallelize this to make it faster. The sequential loop (inside resolveAndArchive) looks like this:
{code:java}
for (File file : toArchive) {
  try {
    LOG.trace("Archiving {}", file);
    if (file.isFile()) {
      // it's a plain file, so attempt to archive it directly
      if (!resolveAndArchiveFile(baseArchiveDir, file, startTime)) {
        LOG.warn("Couldn't archive " + file + " into backup directory: " + baseArchiveDir);
        failures.add(file);
      }
    } else {
      // otherwise it's a directory and we need to archive all of its files
      LOG.trace("{} is a directory, archiving children files", file);
      // so we add the directory name to the base archive path
      Path parentArchiveDir = new Path(baseArchiveDir, file.getName());
      // and then get all the files from that directory and attempt to
      // archive those too
      Collection<File> children = file.getChildren();
      failures.addAll(resolveAndArchive(fs, parentArchiveDir, children, start));
    }
  } catch (IOException e) {
    LOG.warn("Failed to archive {}", file, e);
    failures.add(file);
  }
}
{code}
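
A minimal sketch of the intended change, fanning the per-file work out to a fixed-size thread pool. This reuses the variables from the loop above and needs the java.util.concurrent imports; the pool size (archiveThreads) and the archiveOne helper (standing in for the per-file / per-directory logic above) are illustrative names, not existing HBase APIs:
{code:java}
// Sketch only: run each top-level archival as an independent task and
// collect failures from the futures. archiveOne(...) is a hypothetical
// helper wrapping the file/directory branch of the sequential loop.
ExecutorService pool = Executors.newFixedThreadPool(archiveThreads);
Map<File, Future<List<File>>> pending = new LinkedHashMap<>();
for (File file : toArchive) {
  pending.put(file, pool.submit(() -> archiveOne(fs, baseArchiveDir, file, start)));
}
List<File> failures = new ArrayList<>();
for (Map.Entry<File, Future<List<File>>> entry : pending.entrySet()) {
  try {
    // each task returns the files it failed to archive
    failures.addAll(entry.getValue().get());
  } catch (ExecutionException e) {
    LOG.warn("Failed to archive {}", entry.getKey(), e.getCause());
    failures.add(entry.getKey());
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    failures.add(entry.getKey());
  }
}
pool.shutdown();
{code}
Bounding the pool size keeps the number of concurrent S3 requests under control; an unbounded pool could overwhelm the object store during a large split.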



--
This message was sent by Atlassian Jira
(v8.20.10#820010)