I am assuming you have looked at this already: https://issues.apache.org/jira/browse/MAPREDUCE-5186
You do have a workaround here: increase the mapreduce.job.max.split.locations value in the Hive configuration. Or do we need more than that here?

-Rahul

On Thu, Sep 19, 2013 at 11:00 AM, Murtaza Doctor <murtazadoc...@gmail.com> wrote:

> It used to throw a warning in 1.03 and has now become an IOException. I
> was more trying to figure out why it is exceeding the limit even though
> the replication factor is 3. Also, Hive may use CombineInputSplit or some
> version of it; are we saying it will always exceed the limit of 10?
>
> On Thu, Sep 19, 2013 at 10:05 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>> We have this job submit property buried in Hive that defaults to 10. We
>> should make that configurable.
>>
>> On Wed, Sep 18, 2013 at 9:34 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>>> Do your input files carry a replication factor of 10+? That could be
>>> one cause behind this.
>>>
>>> On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor <murtazadoc...@gmail.com> wrote:
>>> >
>>> > Folks,
>>> >
>>> > Has anyone run into this issue before?
>>> >
>>> > java.io.IOException: Max block location exceeded for split: Paths:
>>> > "/foo/bar...."
>>> > ....
>>> > InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
>>> > splitsize: 15 maxsize: 10
>>> >     at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
>>> >     at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
>>> >     at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
>>> >     at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
>>> >     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
>>> >     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
>>> >     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
>>> >     at java.security.AccessController.doPrivileged(Native Method)
>>> >     at javax.security.auth.Subject.doAs(Subject.java:415)
>>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>> >     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
>>> >     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
>>> >     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
>>> >     at java.security.AccessController.doPrivileged(Native Method)
>>> >     at javax.security.auth.Subject.doAs(Subject.java:415)
>>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>> >     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
>>> >     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
>>> >     at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
>>> >
>>> > When we set the property to something higher than the value it failed
>>> > on, as suggested (mapreduce.job.max.split.locations = more than what it
>>> > failed on), the job runs successfully.
>>> >
>>> > I am trying to dig up additional documentation on this, since the
>>> > default seems to be 10; I am not sure how that limit was set.
>>> > Additionally, what is the recommended value, and what factors does it
>>> > depend on?
>>> >
>>> > We are running YARN; the actual query is Hive on CDH 4.3, with Hive
>>> > version 0.10.
>>> >
>>> > Any pointers in this direction will be helpful.
>>> >
>>> > Regards,
>>> > md
>>>
>>> --
>>> Harsh J
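For reference, the workaround discussed in this thread can be applied per session from the Hive CLI, so the global default need not change. A minimal sketch (the value 20 is illustrative; it only needs to exceed the `splitsize` reported in the exception, 15 in this case):

```sql
-- Raise the split-location cap for this Hive session only.
-- 20 is an illustrative value; any value >= the reported splitsize works.
SET mapreduce.job.max.split.locations=20;
-- ...then re-run the query that previously failed, in the same session.
```

The same property can also be set cluster-wide in the job configuration, but a per-session override keeps the safety limit in place for other jobs.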