Thanks for your quick reply. In that case, I can use the LIMIT operator to
compare the good and bad join plans, since dumping the full result takes a
long time.
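
A minimal sketch of such a comparison, assuming the three relations are
already loaded with schemas as in the original query. The alternative join
order and the Alt_* and alt_limit aliases are hypothetical, chosen only for
illustration; which order is actually faster is not established in this
thread.

```
-- Variant A: the order from the original question.
Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number),
    catalog_returns BY (cr_item_sk, cr_order_number);
limit_data = LIMIT Bad_OrderRes 4;
DUMP limit_data;

-- Variant B (hypothetical alternative): join catalog_sales with
-- catalog_returns first, then join the result with inventory.
Alt_OrderIn = JOIN catalog_sales BY (cs_item_sk, cs_order_number),
    catalog_returns BY (cr_item_sk, cr_order_number);
Alt_OrderRes = JOIN Alt_OrderIn BY catalog_sales::cs_item_sk,
    inventory BY inv_item_sk;
alt_limit = LIMIT Alt_OrderRes 4;
DUMP alt_limit;
```

Running each variant as a separate script and comparing the elapsed times
keeps the two measurements independent of each other.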

Bests,
Mingda


On Tue, Dec 6, 2016 at 5:23 PM, Zhang, Liyun <liyun.zh...@intel.com> wrote:

> Hi:
>    I think the execution time of the multiple-join part does not depend on
> the number given to the limit operator (4 in your case). When the query is
> executed, limit_data runs after Bad_OrderRes: the limit (limit_data) starts
> only after the join (Bad_OrderRes) has finished.
> If I have missed something, please tell me.
>
>
> Best Regards
> Kelly Zhang/Zhang,Liyun
>
>
>
> -----Original Message-----
> From: mingda li [mailto:limingda1...@gmail.com]
> Sent: Wednesday, December 7, 2016 8:18 AM
> To: d...@pig.apache.org; user@pig.apache.org
> Subject: How to test the efficiency of multiple join
>
> Dear all,
>
> I want to compare the efficiency of different orders for a multiple join.
> However, since Pig executes queries lazily, I need to use DUMP or STORE to
> trigger execution.
>
> Now, I use the following query to test the efficiency.
>
> Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
> Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number),
>     catalog_returns BY (cr_item_sk, cr_order_number);
> limit_data = LIMIT Bad_OrderRes 4;
> DUMP limit_data;
>
> Do you think it is OK to show just 4 of the results? Would this query's
> execution time reflect the efficiency of the multiple join? I am not sure
> whether it will fetch just 4 items and stop without processing the rest.
>
> Bests,
> Mingda
>
