dannycjones commented on code in PR #5308:
URL: https://github.com/apache/hadoop/pull/5308#discussion_r1081452359


##########
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java:
##########
@@ -354,7 +354,14 @@ private boolean canSkip(FileSystem sourceFS, 
CopyListingFileStatus source,
     boolean sameLength = target.getLen() == source.getLen();
     boolean sameBlockSize = source.getBlockSize() == target.getBlockSize()
         || !preserve.contains(FileAttribute.BLOCKSIZE);
-    if (sameLength && sameBlockSize) {
+    // checksum check to be done if same file len(greater than 0), same block
+    // size and the target file has been updated more recently than the source
+    // file.
+    // Note: For Different cloud stores with different checksum algorithms,
+    // checksum comparisons are not performed so we would be depending on the
+    // file size and modification time.
+    if (sameLength && (source.getLen() > 0) && sameBlockSize &&
+        source.getModificationTime() < target.getModificationTime()) {

Review Comment:
   Why the addition of the `getLen() > 0`? We want to always copy if its an 
empty file?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org

Reply via email to