Notes of interest from Apache Pig Hackday, Austin edition

Jeremy Hanna Sat, 12 May 2012 11:23:53 -0700

Thanks again to Twitter for doing their event and inspiring ours.  I just 
wanted to report on some things we did in Austin for anyone interested.  We 
had a good turnout of about 30 people.


Kevin Safford presented an introduction to Pig, or Pig 101.  The slides are 
available here: 
http://www.slideshare.net/ktsafford/dachis-group-pigout101-12895911

Timothy Potter, down from Colorado, gave a presentation on intermediate Pig, or 
Pig 202.  His slides are available here: 
http://www.slideshare.net/thelabdude/dachis-group-pig-hackday-pig-202

Clint Miller gave an introduction to unit testing with Pig with these slides: 
http://www.slideshare.net/clintmiller1/unit-testing-pig

After that we had some lunch and linked up remotely for a bit to the Twitter 
hackday in the Bay Area.  Their group is mostly Pig committers and contributors, 
so they worked on Pig tickets.  One thing that Twitter open-sourced as part of 
the event was a workflow visualization tool called Ambrose: 
https://github.com/twitter/ambrose

Also mentioned was Alan Gates's excellent reference, Programming Pig.  The web 
version can be found here: http://ofps.oreilly.com/titles/9781449302641/index.html

We started the afternoon with a list of things we could work on:

        • Pig mahout integration (pigout) led by Timothy Potter
        • Pig Unit improvements led by Clint Miller
        • David Boney wanted to get his KDD data preparation going with Pig for 
a competition
        • Kevin wanted to help people get the presentation examples running
        • Brandon Kearby led a group on helping get the IntelliJ IDEA Pig 
plugin working.
        • Josh Levy wanted to see about getting grunt to recognize parameters 
passed in.
        • Josh also wanted to look more at the python udf scripting and see if 
it could be improved.
        • John Prior wanted to see if there could be a grunt pretty print when 
using describe
        • John also wanted to see if bash command history facilities could be 
added to grunt
        • John also brought up that knime is a really cool visual workflow 
creator for machine learning and that something similar could be developed for 
Pig.
        • The CassandraStorage loadstorefunc was also brought up as something 
Brandon Williams might work on, specifically the way to have it automatically 
use secondary indexes.

What actually happened?

Tim is going to continue working on the pig-vector integration into Mahout 
pending some feedback from Tim and the mahout folks.

Clint worked on getting the Pig 0.10 branch downloaded and built locally in 
order to have something to patch against for the Pig Unit improvements outlined 
on this ticket:  https://issues.apache.org/jira/browse/PIG-2692

David Boney got his data loaded up in CFS, the Cassandra file system, and made 
some progress there.

Several people talked about Pig generally and about getting things running on 
their own laptops and environments.

Brandon Kearby and others forked 
https://github.com/brandonkearby/three-little-piggies and the jar in that 
project can now be added to your IntelliJ IDEA plugins directory to associate 
.pig files and provide source coloring.  There's still some work to do there, 
but it's nice to have that working and available for IntelliJ 11 users.

Josh Levy got some ideas together with a couple of other attendees on how to 
improve the Pig/Python UDF scripting.  Josh and Jeremy contacted Julien from 
Twitter, who had written the Python UDF support, and he is reviewing Josh's 
proposed changes with the possibility of creating a ticket for them.

Grunt pretty print?  Coincidentally, someone in the Bay Area had the same 
thought and, independently of our efforts, created a ticket and submitted a 
patch to do just that: https://issues.apache.org/jira/browse/PIG-2697

Brandon Williams is working on the CassandraStorage ticket - 
https://issues.apache.org/jira/browse/CASSANDRA-4238

Besides that, there was great interaction among everyone until people went their 
own ways around 4 PM.  Thanks to Twitter for doing their hackathon.  We didn't 
interact too much with them because their group was more advanced and we didn't 
want to slow them down.  Several of us chatted in the #hadoop-pig channel on 
freenode (IRC), along with Russell Jurney and Jonathan Coveney from the Bay 
Area.

Cheers,

Jeremy
