I would suggest looking at how the Pig/Hive/Sqoop/Distcp actions work if you
want to have a custom <cascading> action. Which, BTW, would be a great
contribution to Oozie.

If you are going down this path, you'll have to write a
CascadingActionExecutor class that runs in the Oozie server and a
CascadingMain class that runs in the Launcher job, plus an XSD defining the
cascading XML syntax.

If you want to start simpler, you could do it via the Java action. You
will only need the CascadingMain for this. You can cannibalize the
Pig/Hive/Sqoop/Distcp Oozie main classes for this. The most important thing
here is to ensure the tokens are propagated to the Cascading MR jobs.
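
A rough sketch of the token part of such a CascadingMain, in case it helps.
It assumes the usual Hadoop convention that the launcher environment exposes
the token file through HADOOP_TOKEN_FILE_LOCATION and that the MR jobs pick
it up from the mapreduce.job.credentials.binary property. The withTokens
helper and the class layout are made up for illustration, and the hand-off to
Cascading is left as a comment because it depends on how your flow is built.

```java
import java.util.Properties;

// Hypothetical launcher-side main class for an Oozie Java action.
class CascadingMain {

    // Environment variable pointing at the token file in a secure cluster.
    static final String TOKEN_FILE_ENV = "HADOOP_TOKEN_FILE_LOCATION";
    // Job property that tells submitted MR jobs where to load credentials from.
    static final String CREDENTIALS_PROPERTY = "mapreduce.job.credentials.binary";

    // Returns a copy of props with the token file location added, if one is given.
    static Properties withTokens(Properties props, String tokenFileLocation) {
        Properties result = new Properties();
        result.putAll(props);
        if (tokenFileLocation != null && !tokenFileLocation.isEmpty()) {
            result.setProperty(CREDENTIALS_PROPERTY, tokenFileLocation);
        }
        return result;
    }

    public static void main(String[] args) {
        Properties flowProps =
                withTokens(new Properties(), System.getenv(TOKEN_FILE_ENV));
        // Assumption: the real flow would be started here, handing flowProps
        // to the Cascading flow connector and args to the flow itself.
        System.out.println("Flow properties: " + flowProps);
        System.out.println("Flow arguments: " + String.join(" ", args));
    }
}
```

On an unsecured cluster the environment variable is normally not set, so the
properties are passed through unchanged.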

Hope this helps.


On Tue, Oct 8, 2013 at 3:19 PM, <mpeters...@gmail.com> wrote:

> Follow up.  I've tried to run a Cascading job in oozie a couple of ways,
> but they all fail for various reasons.
>
> I tried to put it in a map-reduce action with
> oozie.launcher.action.main.class defined pointing to my Cascading class,
> but I can't see any way to pass all the arguments to it that it needs.
>
> I also tried to use a shell action using oozie.launcher.action.main.class.
>  That launches my class but doesn't pass any arguments to it even though I
> specified arguments in the shell action.
>
> Finally, I tried to do it with a shell command where I don't specify
> oozie.launcher.action.main.class and instead put '/usr/bin/hadoop' as the
> exec action and then put all the rest of the invocation as commands.  This
> invokes my Cascading class with the right arguments, but then dies for no
> apparent reason that I can tell from the Hadoop logs (it never launches the
> Cascading MR jobs).
>
> If anyone has an example of a working oozie workflow where they wrap a
> Cascading job, I'd love to see it.
>
> -Michael
>
>
>
> On Tue, Oct 8, 2013 at 4:08 PM, <mpeters...@gmail.com> wrote:
>
> > Apologies if this has been asked before, but I can't figure out how to
> > search the archives of this mailing list and 20 minutes of googling
> > yielded no useful results.
> >
> > I'm on a team that uses Cascading to do our MapReduce flows.  However, we
> > are investigating using Oozie to do additional types of actions (hive,
> > shell, etc.) and use its scheduler.  For this to work, we'll need to be
> > able to run a Cascading job as an oozie action.  Which is what I can't
> > figure out how to do.
> >
> > Typically to run a Cascading job, we'll do this:
> >
> > hadoop jar mycascading_uberjar.jar com.company.MyCascadingFlow arg1 arg2
> > arg3 argN
> >
> > My first thought was to use an oozie map-reduce action, since I run this
> > with "hadoop jar" and Cascading creates MRs under the hood, but the oozie
> > map-reduce action wants things like mapred.mapper.class
> > and mapred.reducer.class.  Well MyCascadingFlow runs two dozen different
> > mappers and a few different reducers!
> >
> > What is the best way to do this?  The java action seems wrong since it
> > won't run it with "hadoop jar".  Which leaves me with just a shell action
> > and putting the "hadoop jar ...." line in a shell script and invoking it.
> >
> > Other ideas?
> >
> > -Michael
> >
>



-- 
Alejandro
