[ https://issues.apache.org/jira/browse/HADOOP-13654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran resolved HADOOP-13654.
-------------------------------------
    Resolution: Won't Fix

> S3A create() to support asynchronous check of dest & parent paths
> -----------------------------------------------------------------
>
>                 Key: HADOOP-13654
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13654
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.7.3
>            Reporter: Steve Loughran
>
> One source of delays in S3A is the need to check if a destination path exists
> in create(); this makes sure the operation isn't trying to overwrite a
> directory.
> # This is slow: 1-4 HTTPS requests.
> # The code doesn't seem to check the entire parent path to make sure there
> isn't a file as a parent (which raises the question: shouldn't we have a
> contract test for this?)
> # Even with the create overwrite=false check, the fact that the new object
> isn't created until the output stream is close()'d means that the check has
> race conditions.
> Instead of doing a synchronous check in create(), we could do an asynchronous
> check of the parent directory tree. If any error surfaced, it could be
> cached and then thrown on the next call to write(), flush() or close(); that
> is, the failure of a create due to path problems would not surface
> immediately on the create() call, *but before any writes were committed*.
> The full directory tree can/should be checked, and its results remembered.
> This would allow the post-commit cleanup to issue delete() requests
> purely for those paths (if any) which referred to directories.
> As well as the need to use the AWS thread pool, there's a bit of complexity
> with cancelling multipart uploads: the output stream needs to know that the
> request failed, and that the multipart upload should be aborted.
> If the complexity of the asynchronous calls can be coped with, *and client
> code is happy to accept errors in any IO call to the output stream*, then
> the initial overhead at file creation could be skipped.
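
The deferred-failure idea in the description can be sketched in plain Java. This is an illustration under stated assumptions, not S3A code: DeferredCheckOutputStream, pathCheck and the in-memory buffer are hypothetical names. The CompletableFuture stands in for the asynchronous check of the destination and parent paths, and the committed/aborted flags stand in for completing or aborting a multipart upload.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

/**
 * Hypothetical sketch: create() returns at once, the path check runs in the
 * background, and a failed check surfaces on the next write(), flush() or
 * close(), always before anything is committed.
 */
class DeferredCheckOutputStream extends OutputStream {
  /** Assumed to be started by create() on some thread pool. */
  private final CompletableFuture<Void> pathCheck;
  /** Stand-in for the real upload buffer. */
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private boolean closed;
  private boolean committed;
  private boolean aborted;

  DeferredCheckOutputStream(CompletableFuture<Void> pathCheck) {
    this.pathCheck = pathCheck;
  }

  /** Non-blocking: rethrow the cached failure only if the check already failed. */
  private void failIfCheckFailed() throws IOException {
    if (pathCheck.isCompletedExceptionally()) {
      awaitCheck();
    }
  }

  /** Blocking: wait for the check; on failure mark the upload aborted and rethrow. */
  private void awaitCheck() throws IOException {
    try {
      pathCheck.join();
    } catch (CompletionException | CancellationException e) {
      aborted = true; // a real stream would abort the multipart upload here
      Throwable cause = e.getCause() != null ? e.getCause() : e;
      throw cause instanceof IOException ? (IOException) cause : new IOException(cause);
    }
  }

  @Override
  public void write(int b) throws IOException {
    if (closed) {
      throw new IOException("stream is closed");
    }
    failIfCheckFailed();
    buffer.write(b);
  }

  @Override
  public void flush() throws IOException {
    failIfCheckFailed();
  }

  @Override
  public void close() throws IOException {
    if (closed) {
      return;
    }
    closed = true;
    awaitCheck();     // the check must be resolved before the commit
    committed = true; // a real stream would complete the upload here
  }

  boolean isCommitted() { return committed; }
  boolean isAborted() { return aborted; }
  int size() { return buffer.size(); }
}
```

In this sketch a caller sees no error from create() itself. If the background check fails with, say, an IOException reporting that a parent path is a file, the next write(), flush() or close() rethrows it and the stream is marked aborted rather than committed. Note that close() still blocks on the check, so the saving is limited to overlapping the check with the writes.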