Re: How does Hive determine the number of mapred tasks?

2010-02-19 Thread Edward Capriolo
The maximum number of tasks running at once per node is dictated by two settings: mapred.tasktracker.map.tasks.maximum = 6 and mapred.tasktracker.reduce.tasks.maximum = 4. I do not work with EC2, so I do not know how to adjust them there. Hive prints a message like this during the query: Number of reduce tasks not
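
To make the per-node limits concrete, here is a rough sketch of the arithmetic they imply. It assumes the values 6 and 4 quoted above apply to every node, and that all 3 nodes of the cluster from the original question run tasks; neither is confirmed in the thread, and an EMR cluster may be configured differently. The function names are hypothetical helpers, not part of Hadoop or Hive.

```python
import math


def cluster_slots(nodes, per_node_max):
    """Total tasks of one kind that can run at once across the cluster."""
    return nodes * per_node_max


def waves(tasks, slots):
    """Number of scheduling rounds needed to run `tasks` with `slots` available."""
    return math.ceil(tasks / slots)


# Assumed values: 3 task-running nodes, with the per-node limits quoted above.
NODES = 3
MAP_SLOTS = cluster_slots(NODES, 6)      # mapred.tasktracker.map.tasks.maximum
REDUCE_SLOTS = cluster_slots(NODES, 4)   # mapred.tasktracker.reduce.tasks.maximum

# The job in the original question had 34 map tasks and 2 reduce tasks.
print("map slots:", MAP_SLOTS, "waves:", waves(34, MAP_SLOTS))
print("reduce slots:", REDUCE_SLOTS, "waves:", waves(2, REDUCE_SLOTS))
```

Under these assumptions the 34 map tasks cannot all run at once, so some wait for a free slot, while both reduce tasks fit in a single round. These settings only cap concurrency per node; they do not decide how many tasks a job is split into.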

How does Hive determine the number of mapred tasks?

2010-02-19 Thread Saurabh Nanda
Hi, is there any page or document that describes the techniques Hive uses to arrive at the optimum number of map tasks and reduce tasks? I'm running a 3-node Amazon EMR cluster, and Hive has determined that 34 map and 2 reduce tasks are optimum. Out of the 34 map tasks only