[ 
https://issues.apache.org/jira/browse/HADOOP-17256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17256.
-------------------------------------
    Resolution: Duplicate

Caused by HADOOP-8143, which has now been rolled back everywhere it went in. It
can also cause 404 errors, so the rollback was critical. Closing as a duplicate
of HADOOP-8143.

All future releases of Hadoop branch 3 will contain this fix.

> DistCp -update option is ineffective when copying files from HDFS to S3
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-17256
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17256
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools/distcp
>            Reporter: liuxiaolong
>            Priority: Major
>         Attachments: image-2020-09-10-17-25-46-354.png, 
> image-2020-09-10-17-33-50-505.png, image-2020-09-10-17-45-16-998.png, 
> image-2020-09-10-17-47-01-653.png, image-2020-09-10-17-52-32-290.png
>
>
> We use distcp with the -update option to copy a directory from HDFS to S3.
> When we run the distcp job again, it overwrites the S3 directory outright
> instead of skipping files that have not changed.
>   
>  Test Case:
> Run the following command twice; the modification time of the S3 files is
> updated on every run.
>  hadoop distcp -update /test/ s3a://${s3_bucket}/test/
>  
> Checking the code in CopyMapper.java and S3AFileSystem.java:
> (1) On the first run, the distcp job creates the files in S3, but the
> blockSize is unused (a simplified sketch follows the screenshot).
> !image-2020-09-10-17-45-16-998.png|width=542,height=485!
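> For illustration, a minimal sketch of the point above (the bucket name and
> class name are hypothetical; the create() overload is the public FileSystem
> API): the blockSize argument is accepted, but the object store has no blocks
> to apply it to, and the value reported later comes from configuration instead.
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> // Sketch only: the blockSize hint passed to create() has no effect on how the
> // object is stored; S3A later reports a block size taken from fs.s3a.block.size.
> public class CreateBlockSizeHint {
>   public static void main(String[] args) throws Exception {
>     Configuration conf = new Configuration();
>     Path target = new Path("s3a://example-bucket/test/file.txt"); // hypothetical
>     FileSystem fs = target.getFileSystem(conf);
>
>     // Pass a 128 MB blockSize hint; the object store ignores it.
>     try (FSDataOutputStream out =
>         fs.create(target, true, 4096, (short) 1, 128L * 1024 * 1024)) {
>       out.writeUTF("hello");
>     }
>
>     // Prints the connector's configured block size, not the 128 MB hint.
>     System.out.println("reported block size: "
>         + fs.getFileStatus(target).getBlockSize());
>   }
> }
> {code}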
>  
> (2) On the second run, the distcp job compares the fileSize and blockSize of
> each HDFS file against the corresponding S3 file (a simplified sketch of the
> check follows the screenshot).
> !image-2020-09-10-17-47-01-653.png|width=524,height=248!
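> Roughly, the skip decision boils down to a length-and-block-size comparison.
> A simplified, self-contained restatement of that check (not the verbatim
> CopyMapper source):
> {code:java}
> import org.apache.hadoop.fs.FileStatus;
>
> // Simplified shape of the -update skip check: a target file is only skipped
> // when both its length and its block size match the source.
> public final class UpdateSkipCheck {
>   private UpdateSkipCheck() {
>   }
>
>   static boolean canSkip(FileStatus source, FileStatus target) {
>     boolean sameLength = source.getLen() == target.getLen();
>     // An HDFS source (e.g. 128 MB blocks) never matches an S3A target that
>     // reports the fs.s3a.block.size default (32 MB), so the file is re-copied
>     // on every run even though its contents are unchanged.
>     boolean sameBlockSize = source.getBlockSize() == target.getBlockSize();
>     return sameLength && sameBlockSize;
>   }
> }
> {code}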
>  
> (3) Because blockSize is unused, getting the blockSize of an S3 file just
> returns a default value. In S3AFileSystem.java, the default value of
> fs.s3a.block.size is 32 * 1024 * 1024 (32 MB).
> !image-2020-09-10-17-33-50-505.png|width=451,height=762!
>  
> !image-2020-09-10-17-52-32-290.png|width=527,height=87!
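> For reference, a minimal sketch of where the 32 MB comes from (the property
> name is real, the surrounding code is simplified): the block size S3A reports
> in a FileStatus is a configured constant, not something stored with the object.
> {code:java}
> import org.apache.hadoop.conf.Configuration;
>
> // Sketch only: S3A's reported block size is read from configuration.
> public final class S3ABlockSizeDefault {
>   // Matches the default noted above: 32 * 1024 * 1024 bytes.
>   private static final long DEFAULT_BLOCKSIZE = 32 * 1024 * 1024;
>
>   static long reportedBlockSize(Configuration conf) {
>     return conf.getLong("fs.s3a.block.size", DEFAULT_BLOCKSIZE);
>   }
>
>   public static void main(String[] args) {
>     // Prints 33554432 unless fs.s3a.block.size is overridden.
>     System.out.println(reportedBlockSize(new Configuration()));
>   }
> }
> {code}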
>   
> The HDFS blockSize has no meaning in an object store such as S3, so I think
> there is no need to compare blockSize when running distcp with the -update
> option (one possible shape of that change is sketched below).
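> One possible shape of that change, as a rough sketch only (the FileAttribute
> enum here is a hypothetical stand-in, not DistCp's actual type): only treat a
> block-size mismatch as a difference when block sizes are explicitly being
> preserved, so object-store targets no longer force a re-copy.
> {code:java}
> import java.util.EnumSet;
> import org.apache.hadoop.fs.FileStatus;
>
> // Rough sketch of the suggestion above, not an actual patch: ignore the block
> // size unless the job was asked to preserve it (-pb).
> public final class UpdateSkipCheckRevised {
>   // Hypothetical stand-in for DistCp's preserve-attribute set.
>   enum FileAttribute { BLOCKSIZE, REPLICATION, PERMISSION }
>
>   static boolean canSkip(FileStatus source, FileStatus target,
>                          EnumSet<FileAttribute> preserve) {
>     boolean sameLength = source.getLen() == target.getLen();
>     // Only compare block sizes when the user asked to preserve them; an object
>     // store such as S3 only reports a configured default anyway.
>     boolean blockSizeOk = !preserve.contains(FileAttribute.BLOCKSIZE)
>         || source.getBlockSize() == target.getBlockSize();
>     return sameLength && blockSizeOk;
>   }
> }
> {code}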



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org

Reply via email to