great... this was exactly what I was hoping for ... (although I have a bit of sadness as I was just about ready to get my hands dirty)
On Tue, Apr 19, 2011 at 2:57 PM, Xavier Stevens <xstev...@mozilla.com> wrote:

> For what it's worth I have one as well. This one uses Jackson to parse
> everything.
>
> https://github.com/xstevens/akela/blob/master/src/java/com/mozilla/pig/eval/json/JsonMap.java
>
> On 4/19/11 11:55 AM, Dmitriy Ryaboy wrote:
> > YES :)
> >
> > On Tue, Apr 19, 2011 at 11:49 AM, John Hui <john.m....@gmail.com> wrote:
> >
> >> I have a JSON library and pig script working. Should I just contribute it
> >> instead of reinventing the wheel?
> >>
> >> John
> >>
> >> On Tue, Apr 19, 2011 at 2:44 PM, Daniel Eklund <doekl...@gmail.com> wrote:
> >>
> >>> Bill, thanks...
> >>>
> >>> so that is a confirmation... people have rolled their own, and it's not in
> >>> piggybank.
> >>> I would absolutely be willing to work with you to get a contribution going,
> >>> but (as a warning) I am extremely new to Pig.
> >>>
> >>> I was looking at this:
> >>> http://wiki.apache.org/pig/UDFManual
> >>> to get my mind wrapped around the framework. And I also discovered this
> >>>
> >>> https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/piggybank/JsonStringToMap.java
> >>> (I am assuming this was the UDF you mentioned that inspired you)...
> >>>
> >>> A quick question about the UDFs registered at the top of a pig script:
> >>>
> >>> does
> >>> REGISTER myJar.jar
> >>> distribute the jar across HDFS (like a Hadoop job jar) so that the
> >>> distribution of the code to the cluster nodes is transparent?
> >>> In other words, do we NOT have to distribute myJar.jar to each node on the
> >>> cluster?
> >>>
> >>> thanks more,
> >>> daniel
> >>>
> >>> On Tue, Apr 19, 2011 at 1:57 PM, Bill Graham <billgra...@gmail.com> wrote:
> >>>> We're doing the same thing using a JsonToMap UDF followed by a
> >>>> MapToBag UDF. The former was similarly inspired by the elephant bird
> >>>> JSONLoader. I'd be glad to collaborate on a contribution if you'd
> >>>> like.
> >>>>
> >>>> Here's what our scripts look like:
> >>>>
> >>>> define mapToBag cnwk.hadoop.mapreduce.pig.udf.MapToBag();
> >>>> define jsonToMap cnwk.hadoop.mapreduce.pig.udf.JsonToMap();
> >>>> define concat org.apache.pig.builtin.StringConcat();
> >>>>
> >>>> raw = LOAD 'hbase://user_info'
> >>>>     USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('events:*')
> >>>>     AS (events_map:map[]);
> >>>>
> >>>> -- Convert our maps to bags so we can flatten them out
> >>>> B = FOREACH raw GENERATE mapToBag(events_map) AS event_bag;
> >>>>
> >>>> C = FOREACH B GENERATE FLATTEN(event_bag) AS (event_k:chararray,
> >>>>     event_v:chararray);
> >>>>
> >>>> -- Convert the JSON events into maps
> >>>> D = FOREACH C GENERATE event_k, jsonToMap(event_v) AS event_map:map[];
> >>>>
> >>>> -- Example showing how to filter on a given field
> >>>> E = FILTER D BY (event_map#'levt.astid' IS NOT NULL AND
> >>>>     event_map#'levt.asid' IS NOT NULL);
> >>>>
> >>>> -- Example showing how to pull data out of a map
> >>>> F = FOREACH E GENERATE event_map#'levt.asid' AS asid,
> >>>>     event_map#'levt.astid' AS astid;
> >>>>
> >>>> thanks,
> >>>> Bill
> >>>>
> >>>> On Tue, Apr 19, 2011 at 10:08 AM, Daniel Eklund <doekl...@gmail.com> wrote:
> >>>>> I noticed that there is a Pig JSON Loader (which might or might not be in
> >>>>> piggybank).
> >>>>> Could anyone confirm the existence or absence of a JSONToTuple UDF? (not a
> >>>>> loader)
> >>>>>
> >>>>> I am inspired by the UDF mentioned on Slide 23 here:
> >>>>> http://www.slideshare.net/danharvey/hbase-at-mendeley
> >>>>>
> >>>>> doc = FOREACH rawdocs GENERATE DocumentProtobufBytesToTuple(protodoc) as
> >>>>> DOC;
> >>>>>
> >>>>> My desire is to store a raw JSON doc in a cell in HBase and run pig queries
> >>>>> against the tuples generated by the UDF.
> >>>>> I used the HBase Loader already to get the cell-data, and now I need a
> >>>>> JSON-deserializer.
> >>>>>
> >>>>> I would be willing to roll my own, (and contribute), but I figure I'd see if
> >>>>> there was anything out there first.
> >>>>>
> >>>>> thanks,
> >>>>> daniel
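
For anyone reading the quoted script cold: the mapToBag + FLATTEN step seems to boil down to "one map in, one (key, value) row out per entry". Here is a rough plain-Java picture of that idea. It is only a sketch and not Pig API code; the class name MapToRowsSketch, the method toRows and the sample event strings are all made up for illustration, and I have not seen the actual cnwk MapToBag source.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the map -> bag -> FLATTEN step in the quoted script.
// Hypothetical names throughout; this does not use or imitate the Pig UDF API.
final class MapToRowsSketch {

    // Emits one (key, value) pair per map entry, in the map's iteration order.
    static List<Map.Entry<String, String>> toRows(Map<String, String> eventsMap) {
        List<Map.Entry<String, String>> rows = new ArrayList<>();
        if (eventsMap == null) {
            return rows; // assumption: a missing map simply yields no rows
        }
        for (Map.Entry<String, String> e : eventsMap.entrySet()) {
            // Copy into an immutable pair so rows do not alias the source map.
            rows.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data: column qualifier -> raw JSON string.
        Map<String, String> eventsMap = new LinkedHashMap<>();
        eventsMap.put("evt1", "{\"levt.asid\":\"a1\"}");
        eventsMap.put("evt2", "{\"levt.asid\":\"a2\"}");
        for (Map.Entry<String, String> row : toRows(eventsMap)) {
            System.out.println(row.getKey() + "\t" + row.getValue());
        }
    }
}
```

Each row's value would then be what gets handed to the JSON-to-map UDF in the next step of the script.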