Matt,

It would be better for you to do a global config update: set mapreduce.job.max.split.locations to at least the number of datanodes in your cluster, either in hive-site.xml or mapred-site.xml. In either case, this is a sensible configuration update if you are going to use CombineFileInputFormat to read input data in Hive.
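For example (hypothetical values throughout; 50 is only a placeholder, size it to your own cluster), the global setting would look something like this in mapred-site.xml or hive-site.xml:

  <!-- Illustrative only: raise the per-split block-location limit
       cluster-wide. Use at least the number of datanodes. -->
  <property>
    <name>mapreduce.job.max.split.locations</name>
    <value>50</value>
  </property>

If you would rather not touch the global config, the same property can also be set at the top of each Hive script, which covers your 1-day vs. 30-day case without baking a value into the cluster config:

  -- illustrative placeholder value again; set before running the query
  set mapreduce.job.max.split.locations=50;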
-Rahul

On Thu, Sep 19, 2013 at 3:31 PM, Matt Davies <m...@mattdavies.net> wrote:

> What are the ramifications of setting a hard-coded value in our scripts
> and then changing parameters which influence the input data size? E.g.,
> one day I want to run across 1 day's worth of data, and a different day
> I want to run against 30 days.
>
>
> On Thu, Sep 19, 2013 at 3:11 PM, Rahul Jain <rja...@gmail.com> wrote:
>
>> I am assuming you have looked at this already:
>>
>> https://issues.apache.org/jira/browse/MAPREDUCE-5186
>>
>> You do have a workaround here: increase the
>> mapreduce.job.max.split.locations value in the Hive configuration. Or
>> do we need more than that here?
>>
>> -Rahul
>>
>>
>> On Thu, Sep 19, 2013 at 11:00 AM, Murtaza Doctor
>> <murtazadoc...@gmail.com> wrote:
>>
>>> It used to throw a warning in 1.0.3 and has now become an IOException.
>>> I was more trying to figure out why it is exceeding the limit even
>>> though the replication factor is 3. Also, Hive may use CombineInputSplit
>>> or some version of it; are we saying it will always exceed the limit
>>> of 10?
>>>
>>>
>>> On Thu, Sep 19, 2013 at 10:05 AM, Edward Capriolo
>>> <edlinuxg...@gmail.com> wrote:
>>>
>>>> We have this job-submit property buried in Hive that defaults to 10.
>>>> We should make that configurable.
>>>>
>>>>
>>>> On Wed, Sep 18, 2013 at 9:34 PM, Harsh J <ha...@cloudera.com> wrote:
>>>>
>>>>> Do your input files carry a replication factor of 10+? That could be
>>>>> one cause behind this.
>>>>>
>>>>> On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor
>>>>> <murtazadoc...@gmail.com> wrote:
>>>>> > Folks,
>>>>> >
>>>>> > Has anyone run into this issue before?
>>>>> >
>>>>> > java.io.IOException: Max block location exceeded for split: Paths:
>>>>> > "/foo/bar...."
>>>>> > ....
>>>>> > InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
>>>>> > splitsize: 15 maxsize: 10
>>>>> >   at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
>>>>> >   at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
>>>>> >   at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
>>>>> >   at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
>>>>> >   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
>>>>> >   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
>>>>> >   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
>>>>> >   at java.security.AccessController.doPrivileged(Native Method)
>>>>> >   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>>>> >   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
>>>>> >   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
>>>>> >   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
>>>>> >   at java.security.AccessController.doPrivileged(Native Method)
>>>>> >   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>>>> >   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
>>>>> >   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
>>>>> >   at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
>>>>> >
>>>>> > When we set the property to something higher, as suggested, e.g.
>>>>> > mapreduce.job.max.split.locations = (more than the value on which
>>>>> > it failed), then the job runs successfully.
>>>>> >
>>>>> > I am trying to dig up additional documentation on this, since the
>>>>> > default seems to be 10; I am not sure how that limit was chosen.
>>>>> > Additionally, what is the recommended value, and what factors does
>>>>> > it depend on?
>>>>> >
>>>>> > We are running YARN; the actual query is Hive on CDH 4.3, with
>>>>> > Hive version 0.10.
>>>>> >
>>>>> > Any pointers in this direction will be helpful.
>>>>> >
>>>>> > Regards,
>>>>> > md
>>>>>
>>>>> --
>>>>> Harsh J