Using the distributed cache is a good idea for MR-based tasks, but not all tasks are MR-based.
For example, I might need to run a shell script action followed by a Java action. Neither of them does anything with MR, and both need to work on files on the local filesystem. It would be useful to have a "compound action" that runs a shell action and a Java action consecutively on the same node. I was hoping that is what a sub-workflow is for.

One could argue that "compound things" should just be managed via your own shell action, but I like the Java action because it sets up your classpath (including the Hadoop jars). I'm not sure how to do that in my own shell script when launching a Java program. So it is more convenient to run a shell action that does some bash work and then launch a Java program that does more work on the result before putting the final output into HDFS.

Any other ideas on ways to do this?

-Michael

On Wed, Oct 30, 2013 at 12:20 PM, Serega Sheypak <serega.shey...@gmail.com> wrote:

> It's MapReduce's job to select which TT (TaskTracker) node runs a task.
> Try to put your local stuff into HDFS and use the distributed cache.
>
> On 30.10.2013 19:22, <mpeters...@gmail.com> wrote:
>
> > I have two actions that need to run on the same datanode (due to stuff on
> > the local filesystem). Is there any way to ensure this in Oozie?
> >
> > For instance, if I put them into the same sub-workflow, will that work?
> > Does a sub-workflow run two or more actions on the same node?
> >
> > Thanks,
> > -Michael
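One possible way to get the shell-then-Java "compound action" described above is to run the shell step from inside the Java action's main class. An Oozie Java action runs its main class inside a single launcher task, so a child process started with `ProcessBuilder` should run on the same node and see the same local working directory, while the Java code keeps the classpath the Java action sets up. This is a minimal sketch under assumptions: the class name `CompoundStep`, the inline `bash -c` command standing in for a real script (which could be shipped to the working directory with the action's `<file>` element), and the `countLines` step are all hypothetical placeholders, and `bash` is assumed to be on the node's PATH.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class CompoundStep {

    // Runs an external command in workDir, forwarding its stdout/stderr to this
    // process, and throws if the command exits with a non-zero status.
    static void runShell(File workDir, String... command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.directory(workDir);
        pb.inheritIO();
        int exitCode = pb.start().waitFor();
        if (exitCode != 0) {
            throw new IOException("Shell step failed with exit code " + exitCode);
        }
    }

    // Hypothetical Java step: counts the lines of a file produced by the shell step.
    static long countLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.count();
        }
    }

    public static void main(String[] args) throws Exception {
        // Local working directory shared by both steps (defaults to the current one).
        File workDir = new File(args.length > 0 ? args[0] : ".");

        // Step 1: the "bash stuff". Placeholder for running a shipped script,
        // e.g. runShell(workDir, "bash", "prepare.sh") with a hypothetical prepare.sh.
        runShell(workDir, "bash", "-c", "printf 'a\\nb\\n' > step1.txt");

        // Step 2: Java work on the same node, reading the local file directly.
        long n = countLines(workDir.toPath().resolve("step1.txt"));
        System.out.println("step1.txt has " + n + " lines");

        // Step 3 (not shown): copy the final result into HDFS, e.g. with
        // org.apache.hadoop.fs.FileSystem#copyFromLocalFile, which needs the
        // Hadoop jars that the Java action already puts on the classpath.
    }
}
```

Run standalone with a writable directory as the only argument, it prints `step1.txt has 2 lines`. Because both steps live in one process tree, there is no scheduling decision between them, which sidesteps the question of which node a second action would land on.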