Can I do that with s3distcp/distcp? The job is configured inside s3distcp's run() method (it implements Tool), so I don't think I can use this approach. I do use it for the jobs I control, of course. The problem is jobs like distcp, where I don't control the configuration.
Dave

From: Manoj Babu [mailto:manoj...@gmail.com]
Sent: Friday, December 14, 2012 12:57 PM
To: user@hadoop.apache.org
Subject: Re: How to submit Tool jobs programatically in parallel?

David,

Instead of runJob(), you can try submitJob():

    JobClient jc = new JobClient(job);
    jc.submitJob(job);

Cheers!
Manoj.

On Fri, Dec 14, 2012 at 10:09 AM, David Parks <davidpark...@yahoo.com> wrote:

I'm submitting unrelated jobs programmatically (using AWS EMR) so they run in parallel. I'd like to run an s3distcp job in parallel as well, but the interface to that job is a Tool, i.e. ToolRunner.run(...). ToolRunner blocks until the job completes, though, so presumably I'd need a thread pool to run these jobs in parallel. But creating multiple threads to submit concurrent jobs via ToolRunner, each blocking until its job completes, just feels improper. Is there an alternative?
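For Tool-based jobs like s3distcp, where submitJob() isn't reachable, the thread-pool workaround David describes can be kept fairly small. Below is a minimal sketch using only java.util.concurrent. Assumptions: each Callable stands in for a blocking call such as ToolRunner.run(conf, tool, args), which returns the Tool's int exit code; the class name ParallelToolJobs and the method runAll are hypothetical; the sleeping jobs in main are placeholders, not real Hadoop jobs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelToolJobs {

    /**
     * Runs each blocking job on its own pool thread and returns the exit
     * codes in submission order. Hypothetical helper: in a real driver each
     * Callable would wrap something like ToolRunner.run(conf, tool, args).
     */
    public static List<Integer> runAll(List<Callable<Integer>> jobs) throws Exception {
        // One thread per job so no job waits in the queue behind another.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, jobs.size()));
        try {
            // invokeAll starts every job and returns once all have finished.
            List<Future<Integer>> futures = pool.invokeAll(jobs);
            List<Integer> exitCodes = new ArrayList<>();
            for (Future<Integer> f : futures) {
                exitCodes.add(f.get()); // throws ExecutionException if the job threw
            }
            return exitCodes;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder jobs: each sleeps to simulate a blocking ToolRunner.run()
        // call; job 2 "fails" with exit code 1.
        List<Callable<Integer>> jobs = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            final int id = i;
            jobs.add(() -> {
                Thread.sleep(200);
                return id == 2 ? 1 : 0;
            });
        }
        long start = System.nanoTime();
        List<Integer> codes = runAll(jobs);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("exit codes " + codes + " after ~" + ms + " ms");
    }
}
```

The blocked threads mostly sit idle waiting for the cluster to finish the job. With a handful of jobs, one thread per job is usually a modest cost. For many jobs, a smaller fixed pool size would cap how many run at once.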