Thanks Bejoy! That helps.
On Tue, Mar 20, 2012 at 12:10 AM, Bejoy Ks bejoy...@yahoo.com wrote:
Hi Bruce
As I understand it, that formula applies to the basic input formats, not
to CombineFileInputFormat.
Let me briefly describe CombineFileInputFormat to make things clearer.
Hi there,
When I execute the following queries in Hive:
set hive.auto.convert.join = true;
CREATE TABLE IDAP_ROOT as
SELECT a.*,b.acnt_no
FROM idap_pi_root a LEFT OUTER JOIN idap_pi_root_acnt b ON
a.acnt_id=b.acnt_id
the map-side join runs with 3 mappers. How is that number
determined?
Hi Bruce
In a map-side join the smaller table is loaded into memory, and hence the
number of mappers depends only on the data in the larger table. Say
CombineHiveInputFormat is used and we have an HDFS block size of 32 MB, a min
split size of 1 B and a max split size of 256 MB. Which means one
Hi Bejoy,
Thanks for your reply.
The formula is from the book Hadoop: The Definitive Guide, 2nd edition. On
page 203 it says:
The split size is calculated by the formula (see the computeSplitSize()
method in FileInputFormat): max(minimumSize, min(maximumSize, blockSize))
by default:minimumSize
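
For anyone who wants to try the numbers, here is a minimal standalone sketch of the quoted formula, max(minimumSize, min(maximumSize, blockSize)). The class name SplitSizeSketch and the sample values are only illustrative assumptions, and the code does not depend on Hadoop. As noted earlier in the thread, this formula is for the basic input formats, not for CombineFileInputFormat, so it probably does not explain the mapper count of the map-side join by itself.

```java
// Illustrative sketch only: mirrors the formula quoted from the book,
// not the actual Hadoop source. The class name is hypothetical.
class SplitSizeSketch {

    // max(minimumSize, min(maximumSize, blockSize)), all values in bytes.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;

        // Sample values taken from the example in this thread:
        // 32 MB block, 1 byte min split size, 256 MB max split size.
        long blockSize = 32 * MB;
        long minSize = 1L;
        long maxSize = 256 * MB;

        // The block size lies between min and max, so it is the split size.
        System.out.println(computeSplitSize(blockSize, minSize, maxSize));

        // Raising the min above the block size makes the min win.
        System.out.println(computeSplitSize(blockSize, 64 * MB, maxSize));

        // Lowering the max below the block size makes the max win.
        System.out.println(computeSplitSize(blockSize, minSize, 16 * MB));
    }
}
```

With these sample values the split size equals the block size unless the minimum is raised above it or the maximum is lowered below it.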