Is that a PMC position? I also do AV and can bounce #credentials :D

Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com

On Jun 15, 2012, at 3:35 PM, Jonathan Coveney <[email protected]> wrote:

> +1
>
> 2012/6/15 Alan Gates <[email protected]>
>
>> Thanks Russell. I move we make you the official Apache Pig secretary. :)
>>
>> Alan.
>>
>> On Jun 12, 2012, at 9:45 PM, Russell Jurney wrote:
>>
>>> Tuesday, Pig Meetup
>>>
>>> Alan Gates - upcoming improvements in operators/backend physical plan.
>>> Despaghettification.
>>> Reworking UDF interface, keep backward compatibility.
>>> Hadoop 2 coming, will be slow adoption.
>>>
>>> Bill Graham, Julien & Twitter - Optimization oriented. Cluster is at
>>> capacity. Detect skew, cost-based optimizers, dynamic tuning. Gathering
>>> performance metrics, will be in HCatalog. Look at previous executions of
>>> same job to optimize on the fly.
>>>
>>> Companies: Yahoo, consultants, Salesforce, Twitter, Hortonworks,
>>> Cloudera, Zocalo Systems?, Trend Micro
>>>
>>> Bill presented Ambrose. Motivation: Pig scripts with 40 MR jobs, added
>>> DAG view. Shows you progress of your script as percentage and stepwise
>>> view. Helps with debug, optimization. Major progress.
>>>
>>> Pig users talk - using Pig in local mode on sample, then pushing to
>>> cluster. Using illustrate to cut developer iterations. No counters in
>>> local mode. Embedded Pig in loops for ML. Java embedding.
>>> Java API PigServer to run scripts from apps. Macros are helping remove
>>> ugly blocks of code, but UDFs are more solved by JRuby. Mortar Data
>>> fixed Python UDFs.
>>>
>>> Reducing friction around using Pig with tools is important. Slowness of
>>> batch is hard for new users. A sample that will do joins is hard to
>>> prepare. Illustrate was invented for this purpose.
>>>
>>> Scheduling Pig jobs is still a problem. Oozie is unpopular and too hard.
>>> Azkaban is inadequate for the enterprise. People hack things together. It
>>> sucks.
>>>
>>> HCatalog is maturing. REST API. Hive and Pig together. REST interface is
>>> for metadata so far. People are wanting to extend it to grab UDFs, etc.
>>>
>>> Russell Jurney http://datasyndrome.com
