Thanks for your quick reply. If so, I can use the LIMIT operator to compare the good and bad join plans, since dumping all of the results takes a long time.
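A minimal sketch of such a comparison, assuming the three relations are already loaded (the LOAD statements do not appear in this thread). The first plan is the one from the original question. The second join order and the alias names Alt_OrderIn, Alt_OrderRes, limit_bad and limit_alt are hypothetical, chosen only to illustrate the idea; whether that order is actually the better plan depends on the data.

```pig
-- Assumption: inventory, catalog_sales and catalog_returns were loaded earlier.

-- Plan A: the join order from the original question.
Bad_OrderIn  = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number),
                    catalog_returns BY (cr_item_sk, cr_order_number);
limit_bad    = LIMIT Bad_OrderRes 4;
DUMP limit_bad;

-- Plan B (hypothetical alternative order): join sales with returns first,
-- then join the result with inventory.
Alt_OrderIn  = JOIN catalog_sales BY (cs_item_sk, cs_order_number),
                    catalog_returns BY (cr_item_sk, cr_order_number);
Alt_OrderRes = JOIN Alt_OrderIn BY cs_item_sk, inventory BY inv_item_sk;
limit_alt    = LIMIT Alt_OrderRes 4;
DUMP limit_alt;
```

Putting each plan in its own script and comparing the wall-clock time of the two runs would keep the measurements separate.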

Bests,
Mingda

On Tue, Dec 6, 2016 at 5:23 PM, Zhang, Liyun <liyun.zh...@intel.com> wrote:

> Hi,
>
> I think the query time of the multiple-join part is not related to the
> number given to the LIMIT operator (4 in your case). When the query is
> executed, limit_data is evaluated after Bad_OrderRes: the limit
> (limit_data) starts only after the join (Bad_OrderRes) has finished.
>
> If I have missed something, please tell me.
>
> Best Regards
> Kelly Zhang/Zhang,Liyun
>
> -----Original Message-----
> From: mingda li [mailto:limingda1...@gmail.com]
> Sent: Wednesday, December 7, 2016 8:18 AM
> To: d...@pig.apache.org; user@pig.apache.org
> Subject: How to test the efficiency of multiple join
>
> Dear all,
>
> I want to test the efficiency of different multiple-join orders. However,
> since Pig queries are executed lazily, I need to use DUMP or STORE to
> trigger execution.
>
> Now, I use the following query to test the efficiency.
>
> Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
> Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number),
>     catalog_returns BY (cr_item_sk, cr_order_number);
> limit_data = LIMIT Bad_OrderRes 4;
> Dump limit_data;
>
> Do you think it is OK to show just 4 of the results? Could this query's
> execution time represent the efficiency of the multiple join? I am not sure
> whether it will just get 4 items and stop without processing the others.
>
> Bests,
> Mingda