Hello. I'm not sure whether this is a bug or an oversight, but since I haven't found any reference to it anywhere, I figured I would bring it to light.
I've been using MultipleInputs for several of my MapReduce jobs that join different forms of data. However, some uses of MultipleInputs in Hadoop 0.20.2 fail with the following exception:

java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.hadoop.mapred.lib.MultipleInputs.getInputFormatMap(MultipleInputs.java:94)
    at org.apache.hadoop.mapred.lib.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:51)
    at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)

After tracing through the source code, it appears that this occurs when the input path passed to MultipleInputs#addInputPath() contains a comma, which most often happens when using globs (for example, "/months/{March,April,May}.txt"). The comma is one of the two special delimiters used in MultipleInputs#getInputFormatMap(), so when the input format map is being created, the path/input-format data is parsed incorrectly.

Could someone verify this behavior in other versions of Hadoop? And, possibly the more important question: should this actually be considered a bug in MultipleInputs?

Thanks.
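
To make the failure easier to reproduce without a cluster, here is a minimal sketch of what I believe the parsing amounts to. It is not the Hadoop source: the class CommaPathDemo and the method parse() are hypothetical stand-ins, and I am assuming the mappings are stored as a single string of comma-separated "path;inputformat" entries, with ';' being the second delimiter.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the parsing in MultipleInputs#getInputFormatMap().
// Assumption: entries are joined with ',' and each entry is "path;inputformat".
class CommaPathDemo {

    static Map<String, String> parse(String mappings) {
        Map<String, String> result = new LinkedHashMap<>();
        // Splitting on ',' also splits inside a glob such as {March,April,May}.
        for (String entry : mappings.split(",")) {
            String[] parts = entry.split(";");
            // A fragment like "/months/{March" has no ';', so parts has
            // length 1 and parts[1] throws ArrayIndexOutOfBoundsException.
            result.put(parts[0], parts[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Paths without commas parse as expected.
        System.out.println(parse("/data/a.txt;TextInputFormat,/data/b;SequenceFileInputFormat"));

        // A glob containing commas is cut into three fragments and fails.
        try {
            parse("/months/{March,April,May}.txt;TextInputFormat");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Failed while parsing glob path: " + e);
        }
    }
}
```

If that assumption holds, calling addInputPath() once per concrete path (March.txt, April.txt, May.txt) instead of passing a glob with commas might avoid the problem, though I have not confirmed this on other versions.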