I use the following setting:

To specify the Mapper Input Split Size (in bytes; 134217728 = 128 MB)
=====================================================================
SET mapreduce.input.fileinputformat.split.maxsize=134217728;
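As a rough illustration of why the split size matters here: if each input split gets one mapper, the mapper count is approximately the total input size divided by the split size. The sketch below is only a back-of-the-envelope estimate, not how Hive or Hadoop actually compute splits. The 64 MB block size is an assumption (the thread does not state it), and estimate_mappers is a hypothetical helper name.

```python
MB = 1024 * 1024


def estimate_mappers(total_bytes: int, split_bytes: int) -> int:
    """Rough estimate: one mapper per split, rounding up.

    Ignores file boundaries and any slack the input format applies.
    """
    if split_bytes <= 0:
        raise ValueError("split_bytes must be positive")
    # Integer ceiling division, avoids floating-point rounding.
    return -(-total_bytes // split_bytes)


# Assumption: 64 MB HDFS blocks (not stated in the thread).
block_bytes = 64 * MB
# The file from the question has 370 blocks.
total_bytes = 370 * block_bytes

for split_mb in (64, 128, 256, 512):
    mappers = estimate_mappers(total_bytes, split_mb * MB)
    print(f"split size {split_mb} MB -> about {mappers} mappers")
```

Under that assumed block size, a 128 MB split would still give roughly 185 mappers, so a larger value may be needed to get under a limit of 50. Whether the setting takes effect at all also depends on the input format the job uses.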

From: Frank Luo <[email protected]>
Reply-To: [email protected]
Date: Wednesday, April 24, 2013 7:02 AM
To: [email protected]
Subject: how to limit mappers for a hive job

I am trying to query a huge file with 370 blocks, but it errors out with the 
message “number of mappers exceeds limit”. My cluster has 
“mapred.tasktracker.map.tasks.maximum” set to 50.

I have tried setting parameters such as hive.exec.mappers.max, 
mapred.tasktracker.tasks, and mapred.tasktracker.map.tasks.maximum through 
Beeswax, but none of them seems to have any effect.

I can change “mapred.tasktracker.map.tasks.maximum” and the query then goes 
through, but what I really want is to limit the number of concurrent tasks per job.

Any suggestions, please? I am running Cloudera 4.5.

