Using the distributed cache is a good idea for MR-based tasks, but not all
tasks are MR-based.

For example, I might need to run a shell script action followed by a Java
action, neither of which does anything with MR and both of which need to
work on files on the local filesystem.  It would be useful to have a
"compound action" that can run a shell action and a Java action
consecutively on the same node.  I was hoping this is what a sub-workflow
is for.

One could argue that "compound things" should just be managed in your own
shell action, but I like the Java action because it sets up your classpath
(including the Hadoop jars) for you.  I'm not sure how to reproduce that
classpath in my own shell script in order to launch a Java program.  So it
would be more convenient to run a shell action that does some bash work and
then launch a Java program that processes the result further before putting
the final output into HDFS.
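
One workaround I can think of is to invert the arrangement: keep a single
Java action and have its main class launch the bash step as a child
process.  Below is a minimal sketch of that idea, assuming the Java action
runs the whole main method in one JVM on one node.  The class name
CompoundAction, the helper names, and the file staged.txt are hypothetical
and only there for illustration; the HDFS upload is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical main class for a single Java action that performs the
// shell step and the Java step in the same JVM, and so on the same node.
public class CompoundAction {

    // Runs a shell snippet inside workDir and returns its combined
    // stdout/stderr. Throws if the snippet exits non-zero, so that the
    // enclosing action fails visibly. Uses "sh"; swap in "bash" if the
    // script relies on bash-only features.
    public static String runShellStep(Path workDir, String script)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder("sh", "-c", script)
                .directory(workDir.toFile())
                .redirectErrorStream(true)
                .start();
        String output = new String(
                process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException(
                    "shell step exited with " + exitCode + ": " + output);
        }
        return output;
    }

    // Stand-in for the real Java processing: reports the size of the
    // file that the shell step left on the local filesystem.
    public static long javaStep(Path workDir) throws IOException {
        return Files.size(workDir.resolve("staged.txt"));
    }

    public static void main(String[] args) throws Exception {
        Path workDir = Files.createTempDirectory("compound-action");
        // Step 1: the shell work, writing to the node-local directory.
        runShellStep(workDir, "echo 'local data' > staged.txt");
        // Step 2: the Java work, reading the same local file.
        long bytes = javaStep(workDir);
        System.out.println("staged.txt is " + bytes + " bytes");
        // Step 3 (omitted): copy the final result into HDFS.
    }
}
```

Because both steps share one process tree, the local files written by the
shell step are still there when the Java step runs, and the Java step keeps
whatever classpath the action set up.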

Any other ideas on ways to do this?
-Michael


On Wed, Oct 30, 2013 at 12:20 PM, Serega Sheypak
<serega.shey...@gmail.com> wrote:

> It's MapReduce's job to select which TaskTracker (TT) node runs a task.
> Try putting your local files into HDFS and using the distributed cache.
> On 30.10.2013 at 19:22, <mpeters...@gmail.com> wrote:
>
> > I have two actions that need to run on the same datanode (due to stuff on
> > the local filesystem).  Is there any way to ensure this in Oozie?
> >
> > For instance, if I put them into the same sub-workflow, will that work?
> >  Does a sub-workflow run two or more actions on the same node?
> >
> > Thanks,
> > -Michael
> >
>
