Re: Is there a way to ensure two actions run on the same datanode?

2013-10-30 Thread mpeterson2
> Taking care of a specific task tracker node is not the Hadoop approach. The "Hadoop approach" is no longer a pure MapReduce approach with the coming of YARN and Hadoop 2. It is a parallel processing platform. Not every piece of data that goes into a parallel processing system is massively huge "b

Re: Is there a way to ensure two actions run on the same datanode?

2013-10-30 Thread Serega Sheypak
Looks like a global design issue. Taking care of a specific task tracker node is not the Hadoop approach. It's hard to imagine what kind of problem you are trying to solve. What would you do if your cluster grows to 50 nodes? To 100 nodes? 2013/10/30 > Using the distributed cache is a good idea for MR-ba

Re: Is there a way to ensure two actions run on the same datanode?

2013-10-30 Thread Alejandro Abdelnur
Michael, Currently it is not possible to make two actions run on the same node. Eventually, when Oozie starts leveraging YARN capabilities, this could be possible. Today you can overcome this limitation by using HDFS as the filesystem for your actions to leave and pick up data from. Thanks.
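
[Editor's sketch, not part of the original message.] The HDFS hand-off suggested above could look roughly like the workflow below: a shell action writes its output to an HDFS directory, and a Java action that may run on a different node reads it back. The names produce.sh, com.example.ConsumeHandoff and the handoffDir property are hypothetical placeholders, and the schema versions are assumptions that should be matched to the Oozie release in use.

```xml
<workflow-app name="hdfs-handoff-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="produce"/>

    <!-- Step 1: hypothetical script that builds files locally and copies
         them into ${handoffDir} on HDFS (e.g. with "hadoop fs -put"). -->
    <action name="produce">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>produce.sh</exec>
            <argument>${handoffDir}</argument>
            <file>produce.sh#produce.sh</file>
        </shell>
        <ok to="consume"/>
        <error to="fail"/>
    </action>

    <!-- Step 2: hypothetical Java main class that reads its input from
         ${handoffDir}; it does not need to land on the same node. -->
    <action name="consume">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <main-class>com.example.ConsumeHandoff</main-class>
            <arg>${handoffDir}</arg>
        </java>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Hand-off workflow failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Because both actions address the same HDFS path, the workflow stays correct regardless of which nodes the launcher tasks are scheduled on, at the cost of copying the data in and out of HDFS.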

Re: Is there a way to ensure two actions run on the same datanode?

2013-10-30 Thread mpeterson2
Using the distributed cache is a good idea for MR-based tasks, but not all tasks are MR-based. For example, I might need to run a shell script action followed by a Java action, neither of which does anything with MR and both of which need to work on files on the local filesystem. It would be useful to have a "

Re: Is there a way to ensure two actions run on the same datanode?

2013-10-30 Thread Serega Sheypak
It's MapReduce's duty to select which TT node is used to run a task. Try putting your local files into HDFS and using the distributed cache. 30.10.2013 19:22 user wrote: > I have two actions that need to run on the same datanode (due to stuff on the local filesystem). Is there any way to ensure this

Is there a way to ensure two actions run on the same datanode?

2013-10-30 Thread mpeterson2
I have two actions that need to run on the same datanode (due to stuff on the local filesystem). Is there any way to ensure this in Oozie? For instance, if I put them into the same sub-workflow, will that work? Does a subworkflow run two or more actions at the same node? Thanks, -Michael