Re: What is implemented behind the PIG Joins

byambajav byambajargal Tue, 23 Aug 2011 00:08:16 -0700

Pig 0.8.1.

On Mon, Aug 22, 2011 at 10:58 PM, Thejas Nair <[email protected]> wrote:


> Hi Byambajargal,
> What version of Pig does your distribution use?
> -Thejas
>
>
> On 8/22/11 3:42 AM, byambaa wrote:
>
>> Hello,
>> I have a cluster of 11 nodes, each with 16 GB RAM, a 6-core CPU, and a
>> 1 TB HDD, and I am using the Cloudera distribution CHD4b with Pig. I have
>> two Pig join queries, a parallel and a replicated version of the Pig join,
>> and two MapReduce joins, a reduce-side and a map-side join.
>>
>> In theory the replicated join should be faster than the parallel join,
>> but in my case the parallel join is faster.
>> I have two questions:
>>
>> 1. Why is the replicated join so slow? How does it work, and what is
>> implemented behind it?
>> 2. The MR reduce-side join was faster than the parallel Pig join. What is
>> implemented behind the parallel Pig join? I guess Pig also implements an
>> MR reduce-side join.
>>
>> Could you explain how the Pig joins work and what runs behind the Pig
>> scripts?
>>
>>
>> Dataset           | Size     | Replicated Join in HDFS | Replicated Join in HBase | MR Reduce side join | MR Joins (Singleton pattern)
>> obr_wp_annotation | 1786 MB  | 29 sec                  | 50 sec                   | 36 sec              | 19
>> obr_ct_annotation | 5916 MB  | 799 sec                 | 523 sec                  | 108 sec             | 69
>> obr_pm_annotation | 16983 MB | 1794 sec                | 707 sec                  | 248 sec             | 138
>>
>> The relation file is 659 MB.
>>
>> Thank you very much,
>>
>> Byambajargal
>>
>>
>>
>
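
For readers following the thread, here is a minimal sketch of the two join strategies being compared, written in plain Python rather than Pig Latin or MapReduce. It is not Pig's actual implementation: the function names and the sample rows are hypothetical, and the "shuffle" is simulated with an in-memory dictionary. It only illustrates the general idea that a reduce-side join groups both inputs by key before pairing them, while a replicated (map-side) join holds the smaller input in memory and streams the larger input past it.

```python
from collections import defaultdict


def reduce_side_join(left, right):
    """Simulate a reduce-side join: group both inputs by key, then pair per key."""
    # The dictionary stands in for the shuffle that routes equal keys together.
    groups = defaultdict(lambda: ([], []))
    for key, value in left:
        groups[key][0].append(value)
    for key, value in right:
        groups[key][1].append(value)

    joined = []
    for key in sorted(groups):
        left_values, right_values = groups[key]
        for lv in left_values:
            for rv in right_values:
                joined.append((key, lv, rv))
    return joined


def replicated_join(big, small):
    """Simulate a replicated join: hash the small input, stream the big one."""
    # Assumption: the small input fits entirely in memory.
    table = defaultdict(list)
    for key, value in small:
        table[key].append(value)

    joined = []
    for key, value in big:
        for match in table.get(key, []):
            joined.append((key, value, match))
    return joined


# Hypothetical sample rows: (join key, payload)
annotations = [(1, "a1"), (2, "a2"), (1, "a3"), (3, "a4")]
relation = [(1, "r1"), (2, "r2"), (4, "r4")]

# Both strategies produce the same rows; only the execution plan differs.
assert sorted(reduce_side_join(annotations, relation)) == sorted(
    replicated_join(annotations, relation)
)
```

The replicated form avoids the grouping step but depends on the small input fitting in memory on every task, so the size of the relation file is one thing worth checking when it turns out slower than expected.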
