LIMIT 4 would make the join stop processing after 4 records. It is not a good idea to add it if you are testing join performance.
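
If dumping the full result takes too long, one option is to count the joined rows rather than limit them, so that only a single row is printed. This is a minimal sketch, not a tested script. It assumes inventory, catalog_sales and catalog_returns are already loaded under the names used in the original query, and the aliases all_rows and row_count are illustrative names that do not appear in the original. It is worth checking the plan with EXPLAIN to confirm what your execution engine actually runs.

```pig
-- Assumption: inventory, catalog_sales and catalog_returns were loaded earlier in the script.
Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number),
               catalog_returns BY (cr_item_sk, cr_order_number);

-- Collapse the join output into one group and count every tuple in it.
-- COUNT_STAR also counts tuples whose first field is null.
all_rows = GROUP Bad_OrderRes ALL;
row_count = FOREACH all_rows GENERATE COUNT_STAR(Bad_OrderRes);

-- Only one row is printed, so output time should be small.
DUMP row_count;
```

The same script with the other join order can then be timed in the same way, and the two counts should match if both plans compute the same result.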

On Tue, Dec 6, 2016 at 8:13 PM mingda li <limingda1...@gmail.com> wrote:

> Thanks for your quick reply. If so, I can use the limit operator to compare
> good and bad join plans. It takes time to dump all.
>
> Bests,
> Mingda
>
> On Tue, Dec 6, 2016 at 5:23 PM, Zhang, Liyun <liyun.zh...@intel.com> wrote:
>
> > Hi:
> > I think the query time of the multiple join part is not related to the
> > number in the limit operator (in your case the number is 4). When the
> > query is executed, limit_data is executed after Bad_OrderRes: after the
> > join (Bad_OrderRes) is finished, the limit (limit_data) starts.
> > If I have missed something, please tell me.
> >
> > Best Regards
> > Kelly Zhang/Zhang,Liyun
> >
> > -----Original Message-----
> > From: mingda li [mailto:limingda1...@gmail.com]
> > Sent: Wednesday, December 7, 2016 8:18 AM
> > To: d...@pig.apache.org; user@pig.apache.org
> > Subject: How to test the efficiency of multiple join
> >
> > Dear all,
> >
> > I want to test the efficiency of different multiple join orders. However,
> > since the pig query is executed lazily, I need to use dump or store to let
> > the query be executed.
> >
> > Now, I use the following query to test the efficiency.
> >
> > Bad_OrderIn = JOIN inventory BY inv_item_sk, catalog_sales BY cs_item_sk;
> > Bad_OrderRes = JOIN Bad_OrderIn BY (cs_item_sk, cs_order_number),
> >     catalog_returns BY (cr_item_sk, cr_order_number);
> > limit_data = LIMIT Bad_OrderRes 4;
> > Dump limit_data;
> >
> > Do you think it is OK to show just 4 of the results? Could this query's
> > execution time represent the efficiency of the multiple join? I am not sure
> > if it will just get 4 items and stop without executing the other items.
> >
> > Bests,
> > Mingda