Hi Cheng,

The property applies to CombineFileInputFormat and is not a general one. It is not listed as a property in the defaults, but it is calculated within the framework.
'This property is not present in the old MapReduce API (with the exception of CombineFileInputFormat).'

You can use mapred.max.split.size to control the split size while using CombineFileInputFormat.

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Cheng Su <scarcer...@gmail.com>
Date: Mon, 19 Nov 2012 15:01:03
To: <user@hive.apache.org>; <bejoy...@yahoo.com>
Reply-To: user@hive.apache.org
Subject: Re: How does hive decide to launch how many map tasks?

Hi, Bejoy.

I found mapred.min.split.size in mapred-default.xml, but there is no mapred.max.split.size property.
I'm using hadoop 0.20.205.0. Maybe only newer versions support mapred.max.split.size?

On Fri, Nov 16, 2012 at 8:08 PM, Cheng Su <scarcer...@gmail.com> wrote:
> Thank you so much :)
>
> On Fri, Nov 16, 2012 at 5:49 PM, Bejoy KS <bejoy...@yahoo.com> wrote:
>> Hi Cheng
>>
>> The number of input splits (and therefore map tasks) is determined
>> entirely by the InputFormat used and by the split size.
>>
>> Hive uses CombineHiveInputFormat, so you may not get one mapper per
>> file if your files are small. You can control the number of maps by
>> controlling the split sizes:
>> mapred.min.split.size
>> mapred.max.split.size
>>
>> Regards
>> Bejoy KS
>>
>> Sent from handheld, please excuse typos.
>>
>> -----Original Message-----
>> From: Cheng Su <scarcer...@gmail.com>
>> Date: Fri, 16 Nov 2012 14:39:57
>> To: <user@hive.apache.org>
>> Reply-To: user@hive.apache.org
>> Subject: How does hive decide to launch how many map tasks?
>>
>> Hi, all
>>
>> How does hive decide to launch how many map tasks?
>> I know there are some configs to help hive decide how many reduce
>> tasks to launch.
>> But how about map tasks?
>>
>> I thought that the number of map tasks equals the number of stored files.
>> I have a table now with 2 partitions, and one has 4 files in it, the
>> other has 2,
>> when I execute "select count(*) from table", only one map is launched.
>>
>> How can I increase the number of map tasks to improve the performance?
>>
>> Thanks.
>>
>> --
>>
>> Regards,
>> Cheng Su
>
>
>
> --
>
> Regards,
> Cheng Su

--

Regards,
Cheng Su
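
To illustrate the point made in the thread (small files get combined, and a maximum split size limits how much goes into one map task), here is a rough Python sketch. It is a simplified model and not the Hadoop implementation: it ignores node and rack locality and does not break large files into blocks. The function name `combine_splits` and the file sizes are hypothetical, chosen only to mirror the six-file table in the question.

```python
MB = 1024 * 1024


def combine_splits(file_sizes, max_split_size):
    """Greedy, locality-blind model of combining small files into splits.

    Files are added to the current split until its total size reaches
    max_split_size. A max_split_size of 0 is treated as "no limit", so
    everything ends up in a single split (assumption of this sketch).
    """
    splits = []
    current = []
    current_size = 0
    for size in file_sizes:
        current.append(size)
        current_size += size
        if max_split_size > 0 and current_size >= max_split_size:
            splits.append(current)
            current = []
            current_size = 0
    if current:
        # Whatever is left over forms the final split.
        splits.append(current)
    return splits


# Hypothetical table: 2 partitions holding 4 + 2 files of 10 MB each.
files = [10 * MB] * 6

no_limit = combine_splits(files, 0)          # one split -> one map task
capped = combine_splits(files, 20 * MB)      # three splits -> three map tasks
print(len(no_limit), len(capped))
```

In a Hive session the corresponding knob would be set with something like `set mapred.max.split.size=<bytes>;` before running the query; a suitable value depends on the file sizes and the cluster.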