Thanks Russell. I move we make you the official Apache Pig secretary. :)

Alan.

On Jun 12, 2012, at 9:45 PM, Russell Jurney wrote:

> Tuesday, Pig Meetup
>
> Alan Gates - upcoming improvements in operators/backend physical plan.
> Despaghettification.
> Reworking UDF interface, keeping backward compatibility.
> Hadoop 2 coming, will be slow adoption.
>
> Bill Graham, Julien & Twitter - Optimization oriented. Cluster is at
> capacity. Detect skew, cost-based optimizers, dynamic tuning. Gathering
> performance metrics, will be in HCatalog. Look at previous executions of
> same job to optimize on the fly.
>
> Companies: Yahoo, consultants, Salesforce, Twitter, Hortonworks, Cloudera,
> Zocalo Systems?, Trend Micro
>
> Bill presented Ambrose. Motivation: 40-MR-job Pig scripts, added DAG view.
> Shows you progress of your script as percentage and stepwise view. Helps
> with debugging, optimization. Major progress.
>
> Pig users talk - using Pig in local mode on a sample, then pushing to
> cluster. Using illustrate to cut developer iterations. No counters in local
> mode. Embedded Pig in loops for ML. Java embedding.
> Java API PigServer to run scripts from apps. Macros are helping remove ugly
> blocks of code, but UDFs are better solved by JRuby. Mortar Data fixed Python
> UDFs.
>
> Reducing friction around using Pig with tools is important. Slowness of
> batch is hard for new users. Hard to prepare a sample that will do joins.
> Illustrate was invented for this purpose.
>
> Scheduling Pig jobs is still a problem. Oozie is unpopular and too hard.
> Azkaban is inadequate for the enterprise. People hack things together. It
> sucks.
>
> HCatalog is maturing. REST API. Hive and Pig together. REST interface is
> for metadata so far. People want to extend it to grab UDFs, etc.
>
> Russell Jurney http://datasyndrome.com
