Re: Request for feedback: cost-based optimizer

2009-09-11 Thread Dmitriy Ryaboy
Hi Alan, Thanks for the detailed review. After getting Daniel's feedback (and grokking the relationship between Pig's logical and physical operators, which is a little different than that described in the literature), we agree that the proper place to put the optimizer is at the logical layer, alt

Re: Request for feedback: cost-based optimizer

2009-09-11 Thread Alan Gates
This is a good start at adding a cost-based optimizer to Pig. I have a number of comments: 1) Your argument for putting it in the physical layer rather than the logical is that the logical layer does not know physical statistics. This need not be true. You suggest adding a getStatistics

Re: Request for feedback: cost-based optimizer

2009-09-03 Thread Dmitriy Ryaboy
Daniel, thanks for the information, this is useful. On Wed, Sep 2, 2009 at 2:06 PM, Jianyong Dai wrote: > Yes, physical properties are important for an optimizer. To optimize Pig well, we need to know the underlying Hadoop execution environment, such as # of map-reduce jobs, how many maps/redu

Re: Request for feedback: cost-based optimizer

2009-09-02 Thread Jianyong Dai
Yes, physical properties are important for an optimizer. To optimize Pig well, we need to know the underlying Hadoop execution environment, such as # of map-reduce jobs, how many maps/reducers, how the job is configured, etc. This is true even for a rule-based optimizer. Unfortunately, physical

Re: Request for feedback: cost-based optimizer

2009-09-01 Thread Dmitriy Ryaboy
Our initial survey of related literature showed that the usual place for a CBO tends to be between the physical and logical layer (in fact, the famous Cascades paper advocates removing the distinction between physical and logical operators altogether, and using an "is_logical" and "is_physical" fla

Re: Request for feedback: cost-based optimizer

2009-09-01 Thread Jianyong Dai
I am still reading, but one interesting question is why you decided to put the CBO in the physical layer. Dmitriy Ryaboy wrote: Whoops :-) Here's the Google doc: http://docs.google.com/Doc?docid=0Adqb7pZsloe6ZGM4Z3o1OG1fMjFrZjViZ21jdA&hl=en -Dmitriy On Tue, Sep 1, 2009 at 12:51 PM, Santhosh Srinivasan

Re: Request for feedback: cost-based optimizer

2009-09-01 Thread Dmitriy Ryaboy
Whoops :-) Here's the Google doc: http://docs.google.com/Doc?docid=0Adqb7pZsloe6ZGM4Z3o1OG1fMjFrZjViZ21jdA&hl=en -Dmitriy On Tue, Sep 1, 2009 at 12:51 PM, Santhosh Srinivasan wrote: > Dmitriy and Gang, The mailing list does not allow attachments. Can you post it on a website and just send t

RE: Request for feedback: cost-based optimizer

2009-09-01 Thread Santhosh Srinivasan
Dmitriy and Gang, The mailing list does not allow attachments. Can you post it on a website and just send the URL? Thanks, Santhosh -----Original Message----- From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com] Sent: Tuesday, September 01, 2009 9:48 AM To: pig-dev@hadoop.apache.org Subject: Requ