Confirmed that ToolRunner is NOT thread-safe.

*Original code (which runs into problems):*

    public static int run(Configuration conf, Tool tool, String[] args)
        throws Exception {
      if (conf == null) {
        conf = new Configuration();
      }
      GenericOptionsParser parser = new GenericOptionsParser(conf, args);
      // set the configuration back, so that Tool can configure itself
      tool.setConf(conf);

      // get the args w/o generic hadoop args
      String[] toolArgs = parser.getRemainingArgs();
      return tool.run(toolArgs);
    }

*New code (which works):*

    public static int run(Configuration conf, Tool tool, String[] args)
        throws Exception {
      if (conf == null) {
        conf = new Configuration();
      }
      GenericOptionsParser parser = getParser(conf, args);
      tool.setConf(conf);

      // get the args w/o generic hadoop args
      String[] toolArgs = parser.getRemainingArgs();
      return tool.run(toolArgs);
    }

    // The only change: parser construction is now synchronized, so only
    // one thread at a time runs the GenericOptionsParser constructor.
    private static synchronized GenericOptionsParser getParser(
        Configuration conf, String[] args) throws Exception {
      return new GenericOptionsParser(conf, args);
    }

On Wed, Mar 19, 2014 at 10:15 AM, Something Something
<mailinglist...@gmail.com> wrote:

> I would like to trigger a few Hadoop jobs simultaneously. I've created a
> pool of threads using Executors.newFixedThreadPool. The idea is that if the
> pool size is 2, my code will trigger 2 Hadoop jobs at the same exact time
> using 'ToolRunner.run'. In my testing, I noticed that these 2 threads
> keep stepping on each other.
>
> When I looked under the hood, I noticed that ToolRunner creates
> GenericOptionsParser which in turn calls a static method
> 'buildGeneralOptions'. This method uses 'OptionBuilder.withArgName'
> which uses an instance variable called 'argName'. This doesn't look
> thread safe to me and I believe is the root cause of issues I am running
> into.
>
> Any thoughts?
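
For anyone who wants to see the pattern without a Hadoop classpath, here is a minimal standalone sketch of the same idea. StaticBuilder is a hypothetical stand-in I made up for a builder that keeps its pending state in a static field (which, as far as I can tell, is how OptionBuilder in Commons CLI 1.x behaves); buildUnsafe, buildSafe and countMismatches are likewise illustrative names, not Hadoop or Commons CLI APIs. It only shows that funnelling the build step through one synchronized static method keeps threads from overwriting each other's state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SynchronizedParserSketch {

    // Hypothetical stand-in for a builder whose pending state lives in a
    // static field shared by every thread in the JVM.
    static final class StaticBuilder {
        private static String argName;

        static void withArgName(String name) {
            argName = name;
        }

        // Returns the pending name and resets the shared state.
        static String create() {
            String result = argName;
            argName = null;
            return result;
        }
    }

    // Two-step use of the shared state. Another thread can overwrite or
    // reset argName between the two calls, so this is not thread-safe.
    static String buildUnsafe(String name) {
        StaticBuilder.withArgName(name);
        Thread.yield(); // widen the window between the two steps
        return StaticBuilder.create();
    }

    // Same idea as getParser(...) above: one class-level lock, so only
    // one thread at a time touches the shared static state.
    static synchronized String buildSafe(String name) {
        return buildUnsafe(name);
    }

    // Submits `tasks` builds to a fixed pool and counts how many came
    // back with a name other than the one that was passed in.
    static int countMismatches(int tasks, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final String name = "arg-" + i;
                results.add(pool.submit(() -> name.equals(buildSafe(name))));
            }
            int mismatches = 0;
            for (Future<Boolean> f : results) {
                if (!f.get()) {
                    mismatches++;
                }
            }
            return mismatches;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("mismatches: " + countMismatches(1000, 8));
    }
}
```

Note that, as in the patched run method, only the construction step is serialized; whatever runs afterwards (tool.run in the real code) still executes concurrently. The lock also only helps if every caller goes through the synchronized method.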