dannycjones commented on code in PR #5308: URL: https://github.com/apache/hadoop/pull/5308#discussion_r1081452359
########## hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java: ########## @@ -354,7 +354,14 @@ private boolean canSkip(FileSystem sourceFS, CopyListingFileStatus source, boolean sameLength = target.getLen() == source.getLen(); boolean sameBlockSize = source.getBlockSize() == target.getBlockSize() || !preserve.contains(FileAttribute.BLOCKSIZE); - if (sameLength && sameBlockSize) { + // checksum check to be done if same file len(greater than 0), same block + // size and the target file has been updated more recently than the source + // file. + // Note: For Different cloud stores with different checksum algorithms, + // checksum comparisons are not performed so we would be depending on the + // file size and modification time. + if (sameLength && (source.getLen() > 0) && sameBlockSize && + source.getModificationTime() < target.getModificationTime()) { Review Comment: Why the addition of the `getLen() > 0`? We want to always copy if its an empty file? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org