I am assuming you have looked at this already: https://issues.apache.org/jira/browse/MAPREDUCE-5186
You do have a workaround here: increase the mapreduce.job.max.split.locations value in the Hive configuration. Or do we need more than that here?

-Rahul

On Thu, Sep 19, 2013 at 11:00 AM, Murtaza Doctor <murtazadoc...@gmail.com> wrote:

> It used to throw a warning in 1.03 and has now become an IOException. I
> was more trying to figure out why it is exceeding the limit even though
> the replication factor is 3. Also, Hive may use CombineInputSplit or some
> version of it; are we saying it will always exceed the limit of 10?
>
> On Thu, Sep 19, 2013 at 10:05 AM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
>
>> We have this job submit property buried in Hive that defaults to 10. We
>> should make that configurable.
>>
>> On Wed, Sep 18, 2013 at 9:34 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>>> Do your input files carry a replication factor of 10+? That could be
>>> one cause behind this.
>>>
>>> On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor <murtazadoc...@gmail.com> wrote:
>>> >
>>> > Folks,
>>> >
>>> > Has anyone run into this issue before?
>>> >
>>> > java.io.IOException: Max block location exceeded for split: Paths:
>>> > "/foo/bar...."
>>> > ....
>>> > InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
>>> > splitsize: 15 maxsize: 10
>>> >     at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
>>> >     at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
>>> >     at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
>>> >     at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
>>> >     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
>>> >     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
>>> >     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
>>> >     at java.security.AccessController.doPrivileged(Native Method)
>>> >     at javax.security.auth.Subject.doAs(Subject.java:415)
>>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>> >     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
>>> >     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
>>> >     at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
>>> >     at java.security.AccessController.doPrivileged(Native Method)
>>> >     at javax.security.auth.Subject.doAs(Subject.java:415)
>>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>> >     at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
>>> >     at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
>>> >     at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
>>> >
>>> > When we set the property to something higher than the value it failed
>>> > on, as suggested (mapreduce.job.max.split.locations = more than what it
>>> > failed on), the job runs successfully.
>>> >
>>> > I am trying to dig up additional documentation on this, since the
>>> > default seems to be 10; I am not sure how that limit was set.
>>> > Additionally, what is the recommended value, and what factors does it
>>> > depend on?
>>> >
>>> > We are running YARN; the actual query is Hive on CDH 4.3, with Hive
>>> > version 0.10.
>>> >
>>> > Any pointers in this direction will be helpful.
>>> >
>>> > Regards,
>>> > md
>>>
>>> --
>>> Harsh J
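For reference, the workaround discussed in this thread can be applied per session from the Hive CLI, so the global default need not change. A minimal sketch (the value 20 is illustrative; it only needs to exceed the `splitsize` reported in the exception, 15 in this case):

```sql
-- Raise the split-location cap for this Hive session only.
-- 20 is an illustrative value; any value >= the reported splitsize works.
SET mapreduce.job.max.split.locations=20;
-- ...then re-run the query that previously failed, in the same session.
```

The same property can also be set cluster-wide in the job configuration, but a per-session override keeps the safety limit in place for other jobs.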