[ https://issues.apache.org/jira/browse/HADOOP-17256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran resolved HADOOP-17256.
-------------------------------------
    Resolution: Duplicate

This was caused by HADOOP-8143, which has now been rolled back everywhere it went in. That change can also cause 404 errors, so the rollback was critical.

Closing as a duplicate of HADOOP-8143.

All future releases of Hadoop branch 3 will contain this fix.

> DistCp -update option will be invalid when distcp files from hdfs to S3
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-17256
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17256
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools/distcp
>            Reporter: liuxiaolong
>            Priority: Major
>         Attachments: image-2020-09-10-17-25-46-354.png,
> image-2020-09-10-17-33-50-505.png, image-2020-09-10-17-45-16-998.png,
> image-2020-09-10-17-47-01-653.png, image-2020-09-10-17-52-32-290.png
>
>
> We use distcp with the -update option to copy a directory from HDFS to S3.
> When we run the distcp job a second time, it overwrites the S3 directory
> instead of skipping the files that are unchanged.
>
> Test Case:
> Run the following command twice. The modification time of the S3 files
> changes on every run.
> hadoop distcp -update /test/ s3a://${s3_buckect}/test/
>
> Check the code in CopyMapper.java and S3AFileSystem.java:
> (1) On the first run, the distcp job creates the files in S3, but blockSize
> is unused!
> !image-2020-09-10-17-45-16-998.png|width=542,height=485!
>
> (2) On the second run, the distcp job compares fileSize and blockSize
> between the HDFS file and the S3 file.
> !image-2020-09-10-17-47-01-653.png|width=524,height=248!
>
> (3) Because blockSize is unused, getting the blockSize of an S3 file returns
> a default value.
> In S3AFileSystem.java, we find that the default value of fs.s3a.block.size is
> 32 * 1024 * 1024.
> !image-2020-09-10-17-33-50-505.png|width=451,height=762!
>
> !image-2020-09-10-17-52-32-290.png|width=527,height=87!
>
> The HDFS blockSize appears to have no meaning in an object store such as S3,
> so I think there is no need to compare blockSize when running distcp with
> the -update option.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
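
The behaviour reported in the issue can be illustrated with a small, self-contained sketch. It is not the code from CopyMapper.java or S3AFileSystem.java: the class `UpdateSkipSketch`, the `FileInfo` type, and both method names are hypothetical, and the 128 MiB HDFS block size is an assumed value for the source cluster. Only the 32 * 1024 * 1024 default comes from the issue text. The sketch models just the size and block-size comparison the reporter describes, and ignores any other checks the real tool may perform.

```java
public class UpdateSkipSketch {

    // Default that the issue reports for fs.s3a.block.size: 32 * 1024 * 1024 bytes.
    static final long S3A_DEFAULT_BLOCK_SIZE = 32L * 1024 * 1024;

    // Hypothetical stand-in for the file metadata read by the comparison.
    static final class FileInfo {
        final long length;
        final long blockSize;

        FileInfo(long length, long blockSize) {
            this.length = length;
            this.blockSize = blockSize;
        }
    }

    // Skip check as described in the issue: the target is treated as up to date
    // only if both the file length and the block size match the source.
    static boolean canSkipComparingBlockSize(FileInfo source, FileInfo target) {
        return source.length == target.length
                && source.blockSize == target.blockSize;
    }

    // The reporter's proposal: leave block size out of the comparison.
    static boolean canSkipIgnoringBlockSize(FileInfo source, FileInfo target) {
        return source.length == target.length;
    }

    public static void main(String[] args) {
        // Assumption: the source cluster uses a 128 MiB HDFS block size.
        long hdfsBlockSize = 128L * 1024 * 1024;

        // Same content length on both sides; S3 reports its default block size.
        FileInfo hdfsFile = new FileInfo(1_000_000L, hdfsBlockSize);
        FileInfo s3File = new FileInfo(1_000_000L, S3A_DEFAULT_BLOCK_SIZE);

        System.out.println("skip (length + blockSize): "
                + canSkipComparingBlockSize(hdfsFile, s3File));
        System.out.println("skip (length only): "
                + canSkipIgnoringBlockSize(hdfsFile, s3File));
    }
}
```

Under these assumptions the first check never allows a skip, because the block size reported for the S3 object differs from the HDFS one even though the lengths match, which is consistent with the files being rewritten on every run. The second check allows the skip.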