Hello, I was wondering what the correct way to submit a job to Hadoop using the Pipes API is. Currently, I invoke a command similar to this:
/usr/local/hadoop/bin/hadoop pipes -conf /usr/local/mapreduce/reports/reports.xml -input /store/requests/archive/*/*/* -output out

However, this way of invoking the job has a few problems. First, it is a shell command, which makes it awkward to embed this kind of job submission in a C++ program. Secondly, it would be awkward to retrieve the JobId from this command, since all of its output would have to be properly parsed. And last, the command blocks for as long as the job is running, instead of returning immediately and letting the job run in the background.

Ideally, I would have some kind of HTTP URL to which I can submit jobs, and which returns some data about the new job, including its JobId. I need the JobId to be able to match my system's task ids with Hadoop JobIds when the job.end.notification.url is visited.

Can these requirements be met without having to write a custom Java application, or is the native Java API the only way to retrieve the JobId upon job submission?

Thanks in advance!

Regards,

Leon Mergen