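Also make sure Hive is using CombineHiveInputFormat (not just HiveInputFormat); the split size limit only reduces the mapper count when inputs can be combined into larger splits. CombineHiveInputFormat is the default in newer versions.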
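
For reference, a minimal sketch of the two settings together as they might be entered in a Hive session. The 134217728 value is just the one from the quoted message below, not a recommendation, and the older property name in the comment is an assumption about what a pre-YARN (MR1) cluster may expect, so check which name your version honors.

```sql
-- Use the combining input format so several blocks/files can share one split.
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Upper bound on split size in bytes (134217728 = 128 MB).
-- A larger value generally means fewer, bigger splits and so fewer mappers.
SET mapreduce.input.fileinputformat.split.maxsize=134217728;

-- Assumption: on older MR1-based releases the same limit goes by this name.
-- SET mapred.max.split.size=134217728;
```

Note that this lowers the total number of map tasks for the query; it does not by itself cap how many of them run concurrently.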

On Wed, Apr 24, 2013 at 10:51 AM, Sanjay Subramanian <[email protected]> wrote:

> I use the following
>
> To specify the Mapper Input Split Size (134217728 is in bytes)
> ==============================================================
> SET mapreduce.input.fileinputformat.split.maxsize=134217728;
>
> From: Frank Luo <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Wednesday, April 24, 2013 7:02 AM
> To: "[email protected]" <[email protected]>
> Subject: how to limit mappers for a hive job
>
> I am trying to query a huge file with 370 blocks, but it errors out
> with message of “number of mappers exceeds limit” and my cluster has a
> “mapred.tasktracker.map.tasks.maximum” set to 50.
>
> I have tried to set parameters such as
> hive.exec.mappers.max/mapred.tasktracker.tasks/mapred.tasktracker.map.tasks.maximum
> through beeswax and seems none of them is effective.
>
> I can change “mapred.tasktracker.map.tasks.maximum” and the query can go
> through, but I really want to limit concurrent number of tasks per job.
>
> So any suggestions please? I am running cloudera 4.5.
