Re: how is number of mappers determined in mapside join?

2012-03-20 Thread Bruce Bian
Thanks Bejoy! That helps. On Tue, Mar 20, 2012 at 12:10 AM, Bejoy Ks bejoy...@yahoo.com wrote: Hi Bruce, from my understanding, that formula is not for CombineFileInputFormat but for the other basic input formats. I'll briefly describe CombineFileInputFormat to make things clearer.

how is number of mappers determined in mapside join?

2012-03-19 Thread Bruce Bian
Hi there, when I execute the following queries in Hive:

    set hive.auto.convert.join = true;
    CREATE TABLE IDAP_ROOT AS
    SELECT a.*, b.acnt_no
    FROM idap_pi_root a
    LEFT OUTER JOIN idap_pi_root_acnt b ON a.acnt_id = b.acnt_id

the number of mappers run in the map-side join is 3. How is it determined?

Re: how is number of mappers determined in mapside join?

2012-03-19 Thread Bejoy Ks
Hi Bruce, in a map-side join the smaller table is loaded into memory, and hence the number of mappers depends only on the data in the larger table. Say CombineHiveInputFormat is used and we have an HDFS block size of 32 MB, a min split size of 1 B, and a max split size of 256 MB. Which means one

Re: how is number of mappers determined in mapside join?

2012-03-19 Thread Bruce Bian
Hi Bejoy, thanks for your reply. The formula is from the book Hadoop: The Definitive Guide, 2nd edition. On page 203 it says: The split size is calculated by the formula (see the computeSplitSize() method in FileInputFormat): max(minimumSize, min(maximumSize, blockSize)) by default:minimumSize
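
For reference, the formula quoted from the book can be sketched in a few lines of Java. This is only an illustration of max(minimumSize, min(maximumSize, blockSize)): the class name SplitSizeSketch and the MB constant are hypothetical, and the sample values are borrowed from Bejoy's example (32 MB block size, 1 B min split size, 256 MB max split size). As Bejoy notes in his follow-up, the formula applies to the basic input formats, not to CombineFileInputFormat.

```java
// Hypothetical helper class; only the formula itself comes from the thread.
final class SplitSizeSketch {
    // Convenience constant (an assumption for this sketch): bytes per megabyte.
    static final long MB = 1024L * 1024L;

    // Split size as quoted in the thread:
    // max(minimumSize, min(maximumSize, blockSize)), all values in bytes.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }
}
```

With a 32 MB block size, a 1 B minimum, and a 256 MB maximum, computeSplitSize(32 * MB, 1, 256 * MB) evaluates to max(1, min(256 MB, 32 MB)), which is 32 MB, so the split size follows the block size unless the minimum is raised above it or the maximum is lowered below it.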