Hi,

MR does use multiple disks when spilling. The container work directory is also assigned round-robin across the local disks to spread I/O.

YARN sets an environment variable (ApplicationConstants.LOCAL_DIR_ENV) that holds a comma-separated list of directories your app container can use together. Perhaps read it in with
StringUtils.getTrimmedStrings(System.getenv(ApplicationConstants.LOCAL_DIR_ENV));
and then round-robin internally over those paths (with free-space handling)?

Perhaps you can even reuse the org.apache.hadoop.fs.LocalDirAllocator class, which is what MR uses. It has not been declared publicly stable yet, but we can do that over a JIRA.

On Mon, Oct 21, 2013 at 2:05 AM, John Lilley <john.lil...@redpoint.net> wrote:
> Harsh, thanks for the quick response. These files don't need to be on the
> DFS (although we use that too). These are local files used during sorting,
> joining, transitive closure.
>
> The task-relative folder might be good enough, but our app *can* make use of
> multiple temp folders if they are available. Our YARN app can be fairly I/O
> intensive; is it possible to allocate more than one temp folder on different
> physical devices?
>
> Or perhaps YARN might help us. Will YARN assign tasks to CWD folders on
> different disks so that they do not compete with each other on I/O?
>
> For that matter, where does MR allocate the temporary files generated by
> Mapper output? Presumably MR has the same I/O parallelism requirements that
> we do.
>
> Thanks
> John
>
>
> -----Original Message-----
> From: Harsh J [mailto:ha...@cloudera.com]
> Sent: Sunday, October 20, 2013 10:49 AM
> To: <user@hadoop.apache.org>
> Subject: Re: temporary file locations for YARN applications
>
> Every container gets its own local work directory (You can use the relative
> ./) that's auto-cleaned up at the end of the container's life.
> This is the best place to store the temporary files. This is not something
> you need custom configuration for.
>
> Do the files need to be on a distributed FS or a local one?
>
> On Sun, Oct 20, 2013 at 8:54 PM, John Lilley <john.lil...@redpoint.net> wrote:
>> We have a pure YARN application (no MapReduce) that has need to store
>> a significant amount of temporary data. How can we know the best
>> location for these files? How can we ensure that our YARN tasks have
>> write access to these locations? Is this something that must be configured
>> outside of YARN?
>> Thanks,
>> John
>
> --
> Harsh J

--
Harsh J
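
P.S. In case it helps, here is a rough sketch of the "round-robin over the local dirs, with free-space handling" idea from the top of this mail. It uses only the JDK, so it is not LocalDirAllocator and does not call any Hadoop API. The class name LocalDirRotation and its methods are made up for illustration, and the environment variable name "LOCAL_DIRS" in main() is an assumption; substitute whatever constant your Hadoop version exposes.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: rotates over a comma-separated list of local directories. */
class LocalDirRotation {
    private final List<File> dirs = new ArrayList<>();
    private int next = 0;

    /** Splits the list on commas and trims each entry, skipping empty ones. */
    LocalDirRotation(String commaSeparatedDirs) {
        if (commaSeparatedDirs != null) {
            for (String part : commaSeparatedDirs.split(",")) {
                String trimmed = part.trim();
                if (!trimmed.isEmpty()) {
                    dirs.add(new File(trimmed));
                }
            }
        }
        if (dirs.isEmpty()) {
            throw new IllegalArgumentException("no local directories given");
        }
    }

    /**
     * Returns the next directory in rotation that exists and reports at least
     * minFreeBytes of usable space. Tries each directory at most once per call.
     */
    synchronized File nextDir(long minFreeBytes) throws IOException {
        for (int i = 0; i < dirs.size(); i++) {
            File candidate = dirs.get(next);
            next = (next + 1) % dirs.size();
            if (candidate.isDirectory() && candidate.getUsableSpace() >= minFreeBytes) {
                return candidate;
            }
        }
        throw new IOException("no local directory with " + minFreeBytes + " bytes free");
    }

    /** Creates an empty temp file in the next suitable directory. */
    File createTempFile(String prefix, long minFreeBytes) throws IOException {
        return File.createTempFile(prefix, ".tmp", nextDir(minFreeBytes));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the container environment exposes the directory list under
        // this name. Outside a container we fall back to the JVM temp directory.
        String fromEnv = System.getenv("LOCAL_DIRS");
        String list = (fromEnv != null) ? fromEnv : System.getProperty("java.io.tmpdir");
        LocalDirRotation rotation = new LocalDirRotation(list);
        File spill = rotation.createTempFile("spill", 1024L * 1024L);
        System.out.println("Writing temp data to " + spill);
        spill.deleteOnExit();
    }
}
```

Each call moves on to the next directory, so consecutive temp files land on different disks when the list spans several devices. The free-space check is only a snapshot taken at allocation time; it does not reserve the space.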