Hi, Sorry for the delay in sending this. Following are the notes from the last developer's meeting.
Arvind

-----------

*Attendees*
- From Y!: Alan, Santosh, Romain, Daniel, Richard, Ashutosh, Ben, Julian
- From Cloudera: Arvind

*Agenda*
- Error Handling
- Brainstorming Ideas For 0.9
- Brainstorming Ideas Beyond 0.9

*Error Handling Suggestions/Proposal Discussion:*
- Allow each statement to declare an ONERROR clause with a UDF that takes control when an error occurs.
  - This would be better than the current behavior of exiting on error.
- Alternatively, allow ONERROR to be declared for an entire script/session, with individual statements able to override it and provide a more specialized UDF for error handling.
- Yet another alternative: allow the specification of a threshold number of errors that Pig ignores before exiting.
- Key idea is to ensure that the error handling is focused on data error handling and not control flow.
- Action Item: Post the key proposal on the Wiki.

*Brainstorming Ideas For 0.9:*
- Internal development done by March
- Release tentatively by May
- Support for ILLUSTRATE.
- Current status:
  - Parser rewrite almost complete
  - Working on loading data according to schema - support for padding missing values
  - No support for Boolean type planned yet.
- Big features in 0.9
  - Parser change
  - Macro support
  - Jython/Script support
- Penny (formerly Inspector Gadget): framework to instrument scripts. Allows detection of bad records that cause failures and implementation of constraints.
  - Works by integrating with the optimizer to produce wrappers for key UDFs of interest.
  - Agents can be added in different parts of the query
  - Prepackaged agents available, but the framework allows the creation of custom agents as needed.
  - Pending work - implementation of unit tests, and turning this into a patch.

*Brainstorming Ideas Beyond 0.9:*
- Support for different backends for Pig (MR, Piranha, Local, Oozie)
- Execution engine that can generate plans specific to the underlying architecture and allow controlling routines to rewrite/re-optimize the plan mid-execution.
- Thread safety when running local jobs - to allow better embedding of Pig as a light-weight tool in web applications and other multi-threaded environments.
  - Work includes making UDF context thread-safe and removing statics from the implementation.
  - Will benefit Oozie and other systems that embed Pig without having to worry about side effects.
- Allow execution to resume from where it left off after a runtime failure.
  - May be done by allowing Oozie as a backend where the plan is converted into an Oozie workflow.
  - Alternatively, Pig could delegate blocks of execution to Oozie.
- Scalability: Pig should support users who may not know the intricate details of the job/architecture. Things such as memory allocation, skew handling, etc. should be handled automatically without user involvement.
- Allow Pig to kill jobs already submitted if the shell exits due to a Control+C or other failures.
- UDF 2.0 - simplify UDF interfaces, along with support for multiple versions of the UDF at the same time.

*General*
- Loops in Pig: No direct support, but available indirectly by integration with scripting environments.
- Would be good to allow Pig to be provisioned across the cluster for faster job startup.
- Pig-pen: not under active development and not supported.

On Fri, Feb 11, 2011 at 6:30 PM, Santhosh Srinivasan <s...@yahoo-inc.com> wrote:

> Arvind from Cloudera took excellent notes. You should see it next week
> after Alan gets a chance to review them.
>
> Santhosh
>
> -----Original Message-----
> From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
> Sent: Friday, February 11, 2011 5:34 PM
> To: dev@pig.apache.org
> Subject: Re: REMINDER: Pig developer meeting in February
>
> Hi folks,
> Any chance someone took notes? :)
>
> D
>
> On Tue, Feb 8, 2011 at 9:38 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> > Hi All,
> > I got sick and won't be able to make it. Would love to see some notes
> > after the meeting :).
> >
> > D
> >
> > On Tue, Feb 8, 2011 at 10:29 AM, Olga Natkovich <ol...@yahoo-inc.com> wrote:
> >> Hi Guys,
> >>
> >> We are looking forward to see you tomorrow at 4 pm at Yahoo campus in Sunnyvale.
> >>
> >> Yahoo address is
> >>
> >> 701 First Ave.
> >> Sunnyvale, CA 94089
> >>
> >> We are in building E. Please, ask for Alan or me at the reception.
> >>
> >> Olga
> >>
> >> -----Original Message-----
> >> From: Olga Natkovich [mailto:ol...@yahoo-inc.com]
> >> Sent: Thursday, February 03, 2011 10:42 AM
> >> To: dev@pig.apache.org
> >> Subject: REMINDER: Pig developer meeting in February
> >>
> >> Hi guys,
> >>
> >> This is just a reminder that the meeting will be held next Wednesday, 2/9 4-6 pm at Yahoo campus.
> >>
> >> If you have not yet responded but planning to attend, please, let me know.
> >>
> >> Olga
> >>
> >> -----Original Message-----
> >> From: Santhosh Srinivasan [mailto:s...@yahoo-inc.com]
> >> Sent: Friday, January 28, 2011 3:36 PM
> >> To: dev@pig.apache.org
> >> Subject: RE: Pig developer meeting in February
> >>
> >> I am planning to attend.
> >>
> >> -----Original Message-----
> >> From: Olga Natkovich [mailto:ol...@yahoo-inc.com]
> >> Sent: Friday, January 28, 2011 12:58 PM
> >> To: dev@pig.apache.org
> >> Subject: RE: Pig developer meeting in February
> >>
> >> I believe we have critical mass so the meeting is on!
> >>
> >> If you have not responded yet but planning to attend, please, let me know.
> >>
> >> Thanks,
> >>
> >> Olga
> >>
> >> -----Original Message-----
> >> From: Julien Le Dem [mailto:led...@yahoo-inc.com]
> >> Sent: Thursday, January 27, 2011 5:21 PM
> >> To: dev@pig.apache.org
> >> Subject: Re: Pig developer meeting in February
> >>
> >> Me too.
> >> Julien
> >>
> >>
> >> On 1/27/11 4:09 PM, "Dmitriy Ryaboy" <dvrya...@gmail.com> wrote:
> >>
> >> Ok yeah I'll come :).
> >>
> >>
> >>
> >> On Thu, Jan 27, 2011 at 3:17 PM, Olga Natkovich <ol...@yahoo-inc.com> wrote:
> >>
> >>> While there is a lively discussion on this thread, I have not
> >>> actually gotten any responses to having the meeting with exception of 1 person :).
> >>>
> >>> Please, let me know by the end of the week if you are planning to attend.
> >>> If we don't get at least a few more responses I suggest we postpone
> >>> the meeting.
> >>>
> >>> Thanks,
> >>>
> >>> Olga
> >>>
> >>> -----Original Message-----
> >>> From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
> >>> Sent: Wednesday, January 26, 2011 6:04 PM
> >>> To: dev@pig.apache.org
> >>> Subject: Re: Pig developer meeting in February
> >>>
> >>> Right, we do partition filtering, but not true predicate pushdown.
> >>>
> >>> On Wed, Jan 26, 2011 at 5:59 PM, Daniel Dai <jiany...@yahoo-inc.com> wrote:
> >>>
> >>> > Are you talking about LoadMetadata.setPartitionFilter?
> >>> > PartitionFilterOptimizer will do that.
> >>> >
> >>> > Daniel
> >>> >
> >>> >
> >>> > Dmitriy Ryaboy wrote:
> >>> >
> >>> >> I may be wrong but I think predicate pushdown is designed for,
> >>> >> but not actually implemented in the current LoadPushdown
> >>> >> interface (you can only push projections). If I am wrong, that's
> >>> >> great.. but if not, that would be
> >>> >> an important feature to add, as people are trying to connect Pig
> >>> >> to "smart"
> >>> >> storage systems like rdbmses, HBase, and Cassandra more and more.
> >>> >> I think
> >>> >> we only kind of simulate this with partition keys info, which is
> >>> >> not always sufficient
> >>> >>
> >>> >> D
> >>> >>
> >>> >> On Wed, Jan 26, 2011 at 2:41 PM, Julien Le Dem <led...@yahoo-inc.com> wrote:
> >>> >>
> >>> >>
> >>> >>
> >>> >>> If making Pig Thread safe (i.e.: two threads running a different pig
> >>> >>> script) is important then we need to change some of the APIs
> >>> >>> from static
> >>> >>> singleton access to a dependency injection pattern.
> >>> >>> In that case, this should probably be done before 1.0 For example:
> >>> >>> UDFContext should be passed to the UDF after construction
> >>> >>> (similar to the ServletContext in Servlet or the way Hadoop
> >>> >>> passes the context to tasks) Also a clearly separated API that
> >>> >>> does not depend on the Pig implementation would help.
> >>> >>> For example UDFContext is in org.apache.pig.impl.util when it
> >>> >>> would be better in org.apache.pig.api (Or at least an interface
> >>> >>> defining it)
> >>> >>>
> >>> >>> Julien
> >>> >>>
> >>> >>> On 1/24/11 10:14 AM, "Olga Natkovich" <ol...@yahoo-inc.com> wrote:
> >>> >>>
> >>> >>> Hi Guys,
> >>> >>>
> >>> >>> I think it is time for us to have another meeting. Yahoo would
> >>> >>> be happy to host if this works for everybody. How about
> >>> >>> Wednesday, 2/9 4-6 pm.
> >>> >>> Please,
> >>> >>> let us know if you are planning to attend and if the date/time
> >>> >>> works for
> >>> >>> you.
> >>> >>>
> >>> >>> Things that come to mind to discuss and as always feel free to
> >>> >>> suggest others:
> >>> >>>
> >>> >>> - Error handling proposal - this might be easier to
> >>> >>> finalize face-to-face
> >>> >>> - Pig 0.9 plan
> >>> >>> - Pig Roadmap beyond 0.9
> >>> >>>   o What do we want to do in Pig.next?
> >>> >>>   o Are we ready for Pig 1.0
> >>> >>>
> >>> >>> Olga
> >>> >>>
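
For reference, the thread-safety suggestion quoted above (pass the context to the UDF after construction instead of reading it from a static singleton) can be sketched roughly as follows. This is only an illustration in Python, not Pig code: the UDFContext and TagUDF classes, the set_context method, and the run_script/run_concurrently helpers are hypothetical names made up for this sketch.

```python
import threading


class UDFContext:
    """Hypothetical per-script context; a stand-in, not Pig's real class."""

    def __init__(self, script_name, properties):
        self.script_name = script_name
        self.properties = dict(properties)


class TagUDF:
    """Hypothetical UDF that receives its context after construction."""

    def __init__(self):
        self._context = None

    def set_context(self, context):
        # Injection point (assumed name): the runtime hands the UDF its own
        # context, so the UDF never reads a process-wide static singleton.
        self._context = context

    def exec(self, value):
        prefix = self._context.properties["prefix"]
        return f"{prefix}:{value}"


def run_script(script_name, properties, rows, results):
    """Simulate one script: build a context, inject it, apply the UDF."""
    context = UDFContext(script_name, properties)
    udf = TagUDF()
    udf.set_context(context)
    results[script_name] = [udf.exec(row) for row in rows]


def run_concurrently():
    """Run two simulated scripts in separate threads with separate contexts."""
    results = {}
    threads = [
        threading.Thread(
            target=run_script,
            args=("script_a", {"prefix": "A"}, ["x", "y"], results),
        ),
        threading.Thread(
            target=run_script,
            args=("script_b", {"prefix": "B"}, ["x", "y"], results),
        ),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because each simulated script builds and injects its own context, the two threads never share mutable context state, which is the property the static-singleton approach cannot guarantee.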