Re: Map side join

Souvik Banerjee Thu, 13 Dec 2012 10:00:45 -0800

Hi Bejoy,

The input files are uncompressed text files.
There are enough free slots in the cluster.


Can you please let me know how I can increase the number of mappers?
I tried reducing the HDFS block size from 128 MB to 32 MB, expecting to get
more mappers. But the job still launches the same number of mappers as it
did when the HDFS block size was 128 MB. I have enough map slots available,
but I am not able to utilize them.
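
For reference, here is a minimal sketch of how the number of map tasks relates to block size. It assumes the split-size rule used by Hadoop's FileInputFormat (new API), max(minSize, min(maxSize, blockSize)); the function names and sample sizes are hypothetical and not taken from this cluster. Two caveats apply. Files already stored in HDFS keep the block size they were written with, so lowering the configured block size only affects files written afterwards. Also, if Hive is using a combining input format (an assumption about this setup), several blocks can be packed into one split, and the minimum and maximum split size settings then matter more than the block size.

```python
MB = 1024 * 1024

def compute_split_size(block_size, min_size, max_size):
    """Split size rule: max(min_size, min(max_size, block_size))."""
    return max(min_size, min(max_size, block_size))

def estimate_num_splits(file_size, block_size, min_size=1, max_size=2**63 - 1):
    """Rough number of splits (and so map tasks) for one splittable file.

    This is an approximation: the real implementation lets the last split
    be slightly larger than split_size instead of creating a tiny one.
    """
    split_size = compute_split_size(block_size, min_size, max_size)
    # Integer ceiling division
    return (file_size + split_size - 1) // split_size

if __name__ == "__main__":
    # Hypothetical 512 MB file under different settings
    print(estimate_num_splits(512 * MB, 128 * MB))
    print(estimate_num_splits(512 * MB, 32 * MB))
    # A large minimum split size cancels the effect of a smaller block size
    print(estimate_num_splits(512 * MB, 32 * MB, min_size=128 * MB))
```

Under these assumptions, a smaller block size only yields more mappers if the file was actually rewritten with that block size and no minimum split size or split combining overrides it.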


Thanks and regards,
Souvik.


On Thu, Dec 13, 2012 at 11:12 AM, <[email protected]> wrote:

> Hi Souvik
>
> Are your input files compressed using some non-splittable compression codec?
>
> Do you have enough free slots while this job is running?
>
> Make sure that the job is not running locally.
>
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> ------------------------------
> *From: * Souvik Banerjee <[email protected]>
> *Date: *Wed, 12 Dec 2012 14:27:27 -0600
> *To: *<[email protected]>; <[email protected]>
> *ReplyTo: * [email protected]
> *Subject: *Re: Map side join
>
> Hi Bejoy,
>
> Yes, I ran the pi example. It was fine.
> Regarding the Hive job, what I found is that it took 4 hours for the first
> map job to complete.
> The map tasks were doing their work but only reported status after
> completion. It is indeed taking too long to finish. I could not find
> anything relevant in the logs.
>
> Thanks and regards,
> Souvik.
>
> On Wed, Dec 12, 2012 at 8:04 AM, <[email protected]> wrote:
>
>> Hi Souvik
>>
>> Apart from Hive jobs, are normal MapReduce jobs like wordcount
>> running fine on your cluster?
>>
>> If they are, do you see anything suspicious in the task, TaskTracker or
>> JobTracker logs for the Hive jobs?
>>
>>
>> Regards
>> Bejoy KS
>>
>> Sent from remote device, Please excuse typos
>> ------------------------------
>> *From: * Souvik Banerjee <[email protected]>
>> *Date: *Tue, 11 Dec 2012 17:12:20 -0600
>> *To: *<[email protected]>; <[email protected]>
>> *ReplyTo: * [email protected]
>> *Subject: *Re: Map side join
>>
>> Hello Everybody,
>>
>> I need help with a Hive join. As we were talking about the map-side join,
>> I tried that.
>> I set the flag: set hive.auto.convert.join=true;
>>
>> I saw that Hive converts the join to a map join while launching the job.
>> But the problem is that none of the map jobs progress in my case. I made
>> the dataset smaller. Now it's only a 512 MB table joined with a 25 MB
>> table. I was expecting it to be done very quickly.
>> No luck with any change of settings.
>> Since it failed to progress with the default settings, I changed these
>> settings:
>> set hive.mapred.local.mem=1024; // Initially it was 216 I guess
>> set hive.join.cache.size=100000; // Initially it was 25000
>>
>> Also, on the Hadoop side I made this change:
>>
>> mapred.child.java.opts -Xmx1073741824
>>
>> But I don't see any progress. After more than 40 minutes of running, I am
>> still at 0% map completion.
>> Can you please shed some light on this?
>>
>> Thanks a lot once again.
>>
>> Regards,
>> Souvik.
>>
>>
>>
>> On Fri, Dec 7, 2012 at 2:32 PM, Souvik Banerjee <[email protected]
>> > wrote:
>>
>>> Hi Bejoy,
>>>
>>> That's wonderful. Thanks for your reply.
>>> What I was wondering is whether Hive can do a map-side join with more
>>> than one condition in the JOIN clause.
>>> I'll simply try it out and post the result.
>>>
>>> Thanks once again.
>>>
>>> Regards,
>>> Souvik.
>>>
>>>  On Fri, Dec 7, 2012 at 2:10 PM, <[email protected]> wrote:
>>>
>>>> Hi Souvik
>>>>
>>>> In earlier versions of Hive you had to give the map join hint. In later
>>>> versions you can just set hive.auto.convert.join = true; and Hive
>>>> automatically selects the smaller table. It is better to give the
>>>> smaller table as the first one in the join.
>>>>
>>>> You can use a map join if you are joining a small table with a large
>>>> one, in terms of data size. By small, I mean it is better for the
>>>> smaller table's size to be in the range of MBs.
>>>> Regards
>>>> Bejoy KS
>>>>
>>>> Sent from remote device, Please excuse typos
>>>> ------------------------------
>>>> *From: *Souvik Banerjee <[email protected]>
>>>> *Date: *Fri, 7 Dec 2012 13:58:25 -0600
>>>> *To: *<[email protected]>
>>>> *ReplyTo: *[email protected]
>>>> *Subject: *Map side join
>>>>
>>>> Hello everybody,
>>>>
>>>> I have got a question. I didn't come across any post that says
>>>> something about this.
>>>> I have got two tables. Let's say A and B.
>>>> I want to join A & B in Hive. I am currently using Hive 0.9.
>>>> The join would be on a few columns, like ON (A.id1 = B.id1) AND
>>>> (A.id2 = B.id2) AND (A.id3 = B.id3)
>>>>
>>>> Can I ask Hive to use a map-side join in this scenario? Should I give a
>>>> hint to Hive by saying /*+mapjoin(B)*/?
>>>>
>>>> Get back to me if you want any more information in this regard.
>>>>
>>>> Thanks and regards,
>>>> Souvik.
>>>>
>>>
>>>
>>
>
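
The idea behind the map-side join discussed in this thread can be sketched as follows. This is only an illustration of the general technique, not Hive's actual implementation: the small table is loaded into an in-memory hash table keyed on all join columns, and each mapper streams rows of the large table and probes that table, so no reduce phase is needed. The function names, the "A." and "B." column prefixes, and the sample rows are hypothetical. It also shows why a join on several equality conditions (id1, id2, id3) poses no conceptual problem: the columns simply form one composite key.

```python
from collections import defaultdict

def build_hash_table(small_rows, key_cols):
    """Load the small table into memory, keyed on the composite join key."""
    table = defaultdict(list)
    for row in small_rows:
        key = tuple(row[c] for c in key_cols)
        table[key].append(row)
    return table

def map_side_join(big_rows, small_table, key_cols):
    """Stream the large table and probe the in-memory table (inner join)."""
    for row in big_rows:
        key = tuple(row[c] for c in key_cols)
        for match in small_table.get(key, []):
            joined = {"A." + k: v for k, v in row.items()}
            joined.update({"B." + k: v for k, v in match.items()})
            yield joined

if __name__ == "__main__":
    keys = ("id1", "id2", "id3")
    # B is the small table, A is the large one (toy data)
    b_rows = [{"id1": 1, "id2": 2, "id3": 3, "name": "x"}]
    a_rows = [
        {"id1": 1, "id2": 2, "id3": 3, "val": 10},
        {"id1": 1, "id2": 2, "id3": 4, "val": 20},
    ]
    for joined_row in map_side_join(a_rows, build_hash_table(b_rows, keys), keys):
        print(joined_row)
```

The approach only works when the small table fits in each task's memory, which is why the smaller table should be in the range of MBs.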
