Hello,

I was wondering what the correct way is to submit a job to Hadoop using the
Pipes API -- currently, I invoke a command similar to this:

/usr/local/hadoop/bin/hadoop pipes \
  -conf /usr/local/mapreduce/reports/reports.xml \
  -input /store/requests/archive/*/*/* \
  -output out

However, this way of invoking the job has a few problems. First, it is a
shell command, which makes it awkward to embed this kind of job submission
in a C++ program. Second, retrieving the JobId would require parsing all of
the command's output. And last, the command keeps polling in a loop until
the job finishes, instead of returning once the job has been submitted.

Now, in an ideal case, there would be some kind of HTTP URL I could submit
jobs to, which in turn returns some data about the new job, including the
JobId.

I need the JobId to be able to match my system's task IDs with Hadoop
JobIds when the job.end.notification.url is visited.
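
For reference, the notification hook I mean is configured in the job's XML
like this (the callback URL is just a placeholder of mine; as I understand
it, Hadoop fills in $jobId and $jobStatus when it calls the URL):

  <property>
    <name>job.end.notification.url</name>
    <value>http://my.server/hadoop/notify?jobId=$jobId&amp;status=$jobStatus</value>
  </property>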

I was wondering whether these requirements can be met without having to
write a custom Java application, or whether the native Java API is the only
way to retrieve the JobId upon job submission.
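
If it does come down to the Java API, I assume the submission would look
roughly like the sketch below: JobClient.submitJob() returns a RunningJob
handle right away, and that handle exposes the JobId. Note that I'm
assuming reports.xml already carries everything (hadoop.pipes.executable
and friends) that the pipes driver -- org.apache.hadoop.mapred.pipes.Submitter,
as far as I can tell -- would otherwise set up; the class name and paths
are just illustration.

  import java.io.IOException;

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.RunningJob;

  public class SubmitPipesJob {
    public static void main(String[] args) throws IOException {
      JobConf conf = new JobConf();
      // Same settings the -conf flag applies on the command line;
      // assumed to also contain the hadoop.pipes.* properties.
      conf.addResource(new Path("/usr/local/mapreduce/reports/reports.xml"));

      FileInputFormat.setInputPaths(conf, new Path("/store/requests/archive/*/*/*"));
      FileOutputFormat.setOutputPath(conf, new Path("out"));

      // submitJob() returns immediately, unlike runJob(), which polls
      // until the job finishes -- the loop the pipes command gets stuck in.
      RunningJob job = new JobClient(conf).submitJob(conf);
      System.out.println("JobId: " + job.getJobID());
    }
  }

That would solve both the blocking and the output parsing at once, but it
still means maintaining a separate Java program, which is exactly what I'd
like to avoid.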

Thanks in advance!

Regards,

Leon Mergen
