Mohammad Kamrul Islam created MAPREDUCE-6713:
------------------------------------------------

             Summary: Distcp doesn't provide the option to override the default 
staging directory
                 Key: MAPREDUCE-6713
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6713
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: distcp
    Affects Versions: 2.5.1
            Reporter: Mohammad Kamrul Islam


*Current state and shortcoming*
=======================
By default, distcp writes temporary files into 
$TARGET_PATH/.distcp.tmp/$taskattemptid (see 
RetriableFileCopyCommand#getTmpFile). There is no way for a user to override 
this staging/tmp directory. The problem is most visible on S3 with versioning 
enabled. For example, a user may want to turn on S3 versioning only for the 
target directory, not for the staging/tmp directory. Because the staging 
directory currently lives under the target path, its files are versioned too, 
and it can contain a lot of temporary files. If the user could point staging 
at a non-versioned S3 path, the versioned target would hold only the final 
files.
  
*Proposed solution*
==============
Provide a new option (-stage) that lets the user optionally specify a path on 
the target FS. Distcp mapper tasks will write their temporary files into that 
directory.
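
The intended behaviour can be sketched roughly as below. This is only an 
illustration under stated assumptions, not the actual DistCp code: the class 
StagingPathSketch, the method resolveTmpPath and the sample attempt id are 
hypothetical, paths are modelled as plain strings rather than Hadoop Path 
objects, and placing the temporary file directly under the staging directory 
is one possible reading of the proposal.

```java
import java.util.Optional;

/** Hypothetical sketch of the proposed -stage behaviour; not real DistCp code. */
public class StagingPathSketch {

    // Default temp directory name under the target path, as described in this issue.
    static final String DEFAULT_TMP_DIR = ".distcp.tmp";

    /** Joins a parent and child path segment with exactly one separator. */
    static String join(String parent, String child) {
        return parent.endsWith("/") ? parent + child : parent + "/" + child;
    }

    /**
     * Resolves where a task attempt would write its temporary file.
     *
     * @param targetPath    the distcp target path
     * @param stagingDir    value of the proposed -stage option; empty if not given
     * @param taskAttemptId id of the running task attempt
     */
    public static String resolveTmpPath(String targetPath,
                                        Optional<String> stagingDir,
                                        String taskAttemptId) {
        if (stagingDir.isPresent()) {
            // Assumption: with -stage, temp files go directly into the staging dir.
            return join(stagingDir.get(), taskAttemptId);
        }
        // Current behaviour as described above: $TARGET_PATH/.distcp.tmp/<attempt id>.
        return join(join(targetPath, DEFAULT_TMP_DIR), taskAttemptId);
    }

    public static void main(String[] args) {
        String attempt = "attempt_0001_m_000000_0"; // illustrative id only
        System.out.println(resolveTmpPath("s3n://bucket/target", Optional.empty(), attempt));
        System.out.println(resolveTmpPath("s3n://bucket/target",
                Optional.of("s3n://bucket/staging"), attempt));
    }
}
```

Without the override the temporary file stays under the (possibly versioned) 
target path; with it, the file is written under the separate staging path, 
which must still be on the target FS.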

*Possible Confusions* 
=================
There is another distcp option (-tmp) that could be assumed to serve the same 
purpose. However, that option works only together with the "-atomic" option, 
which uses temporary files in a different sense.
Another possible confusion is the staging directory used by the MapReduce 
framework. The proposed temp directory is specific to distcp.



