> Caring about a concrete TaskTracker node is not the Hadoop approach.
With the arrival of YARN and Hadoop 2, the "Hadoop approach" is no longer
a pure MapReduce approach. Hadoop is now a parallel processing platform.
Not every piece of data that goes into a parallel processing system is
massively huge "b
This looks like a global design issue.
Caring about a concrete TaskTracker node is not the Hadoop approach.
It's hard to imagine what kind of problem you are trying to solve.
What would you do if your cluster grows to 50 nodes? To 100 nodes?
2013/10/30
> Using the distributed cache is a good idea for MR-ba
Michael,
Currently it is not possible to make two actions run on the same node.
Eventually, when Oozie starts leveraging YARN capabilities, this could be
possible.
Today you can work around this limitation by using HDFS as the shared
filesystem: one action leaves its data there and the next picks it up.
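
As a rough sketch of that HDFS handoff (not anything built into Oozie): the
first action could copy its local output to HDFS, and the second action,
wherever it lands, could copy it back down before working on it. The helper
names and the idea of calling the `hdfs dfs` command line from a small Python
script are assumptions for illustration; the paths are whatever your workflow
passes in.

```python
import subprocess


def put_cmd(local_path, hdfs_path):
    """Command the first action would run to publish its output to HDFS."""
    # -f overwrites an existing file, so a retried action does not fail here.
    return ["hdfs", "dfs", "-put", "-f", local_path, hdfs_path]


def get_cmd(hdfs_path, local_path):
    """Command the second action would run to fetch that output locally."""
    return ["hdfs", "dfs", "-get", hdfs_path, local_path]


def run(cmd):
    # check=True raises CalledProcessError when the hdfs command fails,
    # so the calling action fails instead of continuing with missing data.
    subprocess.run(cmd, check=True)
```

The first action would end with something like `run(put_cmd(local_file, hdfs_file))`
and the second would start with `run(get_cmd(hdfs_file, local_file))`, so neither
depends on which node it was scheduled on.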
Thanks.
Using the distributed cache is a good idea for MR-based tasks, but not all
tasks are MR-based.
For example, I might need to run a shell script action followed by a Java
action, neither of which does anything with MR, and both need to work on files on
the local filesystem. It would be useful to have a "
It's MapReduce's job to select which TaskTracker node runs a task.
Try putting your local files into HDFS and using the distributed cache.
On 30.10.2013 at 19:22, the user wrote:
> I have two actions that need to run on the same datanode (due to stuff on
> the local filesystem). Is there any way to ensure this
I have two actions that need to run on the same datanode (due to stuff on
the local filesystem). Is there any way to ensure this in Oozie?
For instance, if I put them into the same sub-workflow, will that work?
Does a sub-workflow run two or more actions on the same node?
Thanks,
-Michael