Matt,

It would be better for you to do a global config update: set mapreduce.job.max.split.locations to at least the number of datanodes in your cluster, either in hive-site.xml or mapred-site.xml. In either case, this is a sensible configuration update if you are going to use CombineFileInputFormat to read input data in Hive.
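For example (hypothetical values throughout; 50 is only a placeholder, size it to your own cluster), the global setting would look something like this in mapred-site.xml or hive-site.xml:

  <!-- Illustrative only: raise the per-split block-location limit
       cluster-wide. Use at least the number of datanodes. -->
  <property>
    <name>mapreduce.job.max.split.locations</name>
    <value>50</value>
  </property>

If you would rather not touch the global config, the same property can also be set at the top of each Hive script, which covers your 1-day vs. 30-day case without baking a value into the cluster config:

  -- illustrative placeholder value again; set before running the query
  set mapreduce.job.max.split.locations=50;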
-Rahul

On Thu, Sep 19, 2013 at 3:31 PM, Matt Davies <m...@mattdavies.net> wrote:

> What are the ramifications of setting a hard-coded value in our scripts
> and then changing parameters which influence the input data size? E.g.,
> one day I want to run across 1 day's worth of data, and a different day
> I want to run against 30 days.
>
>
> On Thu, Sep 19, 2013 at 3:11 PM, Rahul Jain <rja...@gmail.com> wrote:
>
>> I am assuming you have looked at this already:
>>
>> https://issues.apache.org/jira/browse/MAPREDUCE-5186
>>
>> You do have a workaround here: increase the
>> mapreduce.job.max.split.locations value in the Hive configuration. Or
>> do we need more than that here?
>>
>> -Rahul
>>
>>
>> On Thu, Sep 19, 2013 at 11:00 AM, Murtaza Doctor
>> <murtazadoc...@gmail.com> wrote:
>>
>>> It used to throw a warning in 1.0.3 and has now become an IOException.
>>> I was more trying to figure out why it is exceeding the limit even
>>> though the replication factor is 3. Also, Hive may use CombineInputSplit
>>> or some version of it; are we saying it will always exceed the limit
>>> of 10?
>>>
>>>
>>> On Thu, Sep 19, 2013 at 10:05 AM, Edward Capriolo
>>> <edlinuxg...@gmail.com> wrote:
>>>
>>>> We have this job-submit property buried in Hive that defaults to 10.
>>>> We should make that configurable.
>>>>
>>>>
>>>> On Wed, Sep 18, 2013 at 9:34 PM, Harsh J <ha...@cloudera.com> wrote:
>>>>
>>>>> Do your input files carry a replication factor of 10+? That could be
>>>>> one cause behind this.
>>>>>
>>>>> On Thu, Sep 19, 2013 at 6:20 AM, Murtaza Doctor
>>>>> <murtazadoc...@gmail.com> wrote:
>>>>> > Folks,
>>>>> >
>>>>> > Has anyone run into this issue before?
>>>>> >
>>>>> > java.io.IOException: Max block location exceeded for split: Paths:
>>>>> > "/foo/bar...."
>>>>> > ....
>>>>> > InputFormatClass: org.apache.hadoop.mapred.TextInputFormat
>>>>> > splitsize: 15 maxsize: 10
>>>>> >   at org.apache.hadoop.mapreduce.split.JobSplitWriter.writeOldSplits(JobSplitWriter.java:162)
>>>>> >   at org.apache.hadoop.mapreduce.split.JobSplitWriter.createSplitFiles(JobSplitWriter.java:87)
>>>>> >   at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:501)
>>>>> >   at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:471)
>>>>> >   at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:366)
>>>>> >   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1269)
>>>>> >   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1266)
>>>>> >   at java.security.AccessController.doPrivileged(Native Method)
>>>>> >   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>>>> >   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1266)
>>>>> >   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:606)
>>>>> >   at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:601)
>>>>> >   at java.security.AccessController.doPrivileged(Native Method)
>>>>> >   at javax.security.auth.Subject.doAs(Subject.java:415)
>>>>> >   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>>>>> >   at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:601)
>>>>> >   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:586)
>>>>> >   at org.apache.hadoop.hive.ql.exec.ExecDriver.execute(ExecDriver.java:447)
>>>>> >
>>>>> > When we set the property to something higher, as suggested, e.g.
>>>>> > mapreduce.job.max.split.locations = (more than the value on which
>>>>> > it failed), then the job runs successfully.
>>>>> >
>>>>> > I am trying to dig up additional documentation on this, since the
>>>>> > default seems to be 10; I am not sure how that limit was chosen.
>>>>> > Additionally, what is the recommended value, and what factors does
>>>>> > it depend on?
>>>>> >
>>>>> > We are running YARN; the actual query is Hive on CDH 4.3, with
>>>>> > Hive version 0.10.
>>>>> >
>>>>> > Any pointers in this direction will be helpful.
>>>>> >
>>>>> > Regards,
>>>>> > md
>>>>>
>>>>> --
>>>>> Harsh J