great...

this was exactly what I was hoping for ... (although I have a bit of sadness
as I was just about ready to get my hands dirty)

On Tue, Apr 19, 2011 at 2:57 PM, Xavier Stevens <xstev...@mozilla.com> wrote:

> For what it's worth I have one as well. This one uses Jackson to parse
> everything.
>
>
> https://github.com/xstevens/akela/blob/master/src/java/com/mozilla/pig/eval/json/JsonMap.java
>
>
> On 4/19/11 11:55 AM, Dmitriy Ryaboy wrote:
> > YES :)
> >
> > On Tue, Apr 19, 2011 at 11:49 AM, John Hui <john.m....@gmail.com> wrote:
> >
> >> I have a JSON library and pig script working.  Should I just contribute it
> >> instead of reinventing the wheel?
> >>
> >> John
> >>
> >> On Tue, Apr 19, 2011 at 2:44 PM, Daniel Eklund <doekl...@gmail.com> wrote:
> >>
> >>> Bill,  thanks...
> >>>
> >>> so that is a confirmation... people have rolled their own, and it's not in
> >>> piggybank.
> >>> I would absolutely be willing to work with you to get a contribution going,
> >>> but (as a warning) I am extremely new to Pig.
> >>>
> >>> I was looking at this:
> >>> http://wiki.apache.org/pig/UDFManual
> >>> to get my mind wrapped around the framework.  And I also discovered this:
> >>>
> >>> https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/piggybank/JsonStringToMap.java
> >>> (I am assuming this was the UDF you mentioned that inspired you)...
> >>>
> >>> A quick question about the UDFs registered at the top of a pig script:
> >>>
> >>> does
> >>> REGISTER myJar.jar
> >>> distribute the jar across HDFS (like a Hadoop job jar) so that the
> >>> distribution of the code to the cluster nodes is transparent?
> >>> In other words, do we NOT have to distribute myJar.jar to each node on the
> >>> cluster?
> >>>
> >>> thanks more,
> >>> daniel
> >>>
> >>>
> >>>
> >>> On Tue, Apr 19, 2011 at 1:57 PM, Bill Graham <billgra...@gmail.com> wrote:
> >>>> We're doing the same thing using a JsonToMap UDF followed by a
> >>>> MapToBag UDF. The former was similarly inspired by the elephant bird
> >>>> JSONLoader. I'd be glad to collaborate on a contribution if you'd
> >>>> like.
> >>>>
> >>>> Here's what our scripts look like:
> >>>>
> >>>> define mapToBag cnwk.hadoop.mapreduce.pig.udf.MapToBag();
> >>>> define jsonToMap cnwk.hadoop.mapreduce.pig.udf.JsonToMap();
> >>>> define concat org.apache.pig.builtin.StringConcat();
> >>>>
> >>>> raw = LOAD 'hbase://user_info'
> >>>>      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('events:*')
> >>>>      AS (events_map:map[]);
> >>>>
> >>>> -- Convert our maps to bags so we can flatten them out
> >>>> B = FOREACH raw GENERATE mapToBag(events_map) AS event_bag;
> >>>>
> >>>> C = FOREACH B GENERATE FLATTEN(event_bag) AS (event_k:chararray,
> >>>> event_v:chararray);
> >>>>
> >>>> -- Convert the JSON events into maps
> >>>> D = FOREACH C GENERATE event_k, jsonToMap(event_v) AS event_map:map[];
> >>>>
> >>>> -- Example showing how to filter on a given field
> >>>> E = FILTER D BY (event_map#'levt.astid' IS NOT NULL AND
> >>>> event_map#'levt.asid' IS NOT NULL);
> >>>>
> >>>> -- Example showing how to pull data out of a map
> >>>> F = FOREACH E GENERATE event_map#'levt.asid' AS asid,
> >>>>                        event_map#'levt.astid' AS astid;
> >>>>
> >>>>
> >>>> thanks,
> >>>> Bill
> >>>>
> >>>> On Tue, Apr 19, 2011 at 10:08 AM, Daniel Eklund <doekl...@gmail.com> wrote:
> >>>>> I noticed that there is a Pig JSON Loader (which might or might not be in
> >>>>> piggybank).
> >>>>> Could anyone confirm the existence or absence of a JSONToTuple UDF (not a
> >>>>> loader)?
> >>>>>
> >>>>> I am inspired by the UDF mentioned on Slide 23 here:
> >>>>> http://www.slideshare.net/danharvey/hbase-at-mendeley
> >>>>>
> >>>>>  doc = FOREACH rawdocs GENERATE DocumentProtobufBytesToTuple(protodoc) as DOC;
> >>>>>
> >>>>> My desire is to store a raw JSON doc in a cell in HBase and run pig queries
> >>>>> against the tuples generated by the UDF.
> >>>>> I used the HBase Loader already to get the cell-data, and now I need a
> >>>>> JSON-deserializer.
> >>>>>
> >>>>> I would be willing to roll my own (and contribute), but I figure I'd see if
> >>>>> there was anything out there first.
> >>>>>
> >>>>> thanks,
> >>>>> daniel
> >>>>>
>
