[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated MAPREDUCE-7331:
--------------------------------------
    Issue Type: New Feature  (was: Bug)

> Make temporary directory used by FileOutputCommitter configurable
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-7331
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7331
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: mrv2
>    Affects Versions: 3.0.0
>         Environment: CDH 6.2.1 Hadoop 3.0.0
>            Reporter: Bimalendu Choudhary
>            Priority: Major
>
> Spark SQL applications use FileOutputCommitter to commit and merge their
> files under a table directory. Because the temporary directory name is
> hardcoded (PENDING_DIR_NAME = _temporary), multiple applications writing to
> the same table directory share one temporary directory. As a result, one
> application can interfere with another application's temporary files, or
> even delete them. There is currently no way for an application to use its
> own unique path for temporary files and so avoid interference from other,
> completely independent applications. I think the temporary directory used
> by FileOutputCommitter should be made configurable, so that each caller can
> supply its own unique value and keep its files from being deleted or
> overwritten by other applications.
> Something like:
> {quote}public static final String PENDING_DIR_NAME_DEFAULT = "_temporary";
>  public static final String PENDING_DIR_NAME =
>  "mapreduce.fileoutputcommitter.tempdir";
> {quote}
>  
> Spark applications could use this to handle stage failures, where temporary
> directories left over from previous attempts cause problems, and in other
> situations where independent writers share a table directory.
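
A minimal sketch of how the proposed setting might be resolved. This is not the actual Hadoop change. It assumes java.util.Properties as a stand-in for Hadoop's Configuration and java.nio.file.Path in place of org.apache.hadoop.fs.Path. The class name PendingDirSketch, the method pendingDir, and the use of PENDING_DIR_NAME as the name of the key constant are hypothetical; only the default value "_temporary" and the key string come from the description above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class PendingDirSketch {
    // Configuration key suggested in the issue; the constant name is an assumption.
    public static final String PENDING_DIR_NAME =
        "mapreduce.fileoutputcommitter.tempdir";

    // Today's hardcoded directory name, kept as the fallback value.
    public static final String PENDING_DIR_NAME_DEFAULT = "_temporary";

    // Resolve the temporary directory under the job output path.
    // Falls back to "_temporary" when the key is not set.
    public static Path pendingDir(Path outputPath, Properties conf) {
        String name = conf.getProperty(PENDING_DIR_NAME, PENDING_DIR_NAME_DEFAULT);
        return outputPath.resolve(name);
    }

    public static void main(String[] args) {
        Path table = Paths.get("warehouse", "sales");

        // No override: behaves like the current committer.
        Properties defaults = new Properties();
        System.out.println(pendingDir(table, defaults));

        // Per-application override: each writer gets its own directory.
        // The value below is an illustrative name, not a Spark convention.
        Properties appConf = new Properties();
        appConf.setProperty(PENDING_DIR_NAME, "_temporary-app-0001");
        System.out.println(pendingDir(table, appConf));
    }
}
```

With a distinct value per application, cleanup of one application's temporary directory would no longer touch the files of another application writing to the same table directory.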



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

