Thanks. I am able to parse the JSON using Elephant Bird.
Now I can get source, destination and fraction in separate relations (bags).

But how can I pass these values to my Pig script?


--Load Json
loadJson =  LOAD '$inputJson' USING
com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad=true') AS
(json:map []);

--PARSING JSON
--Source
a = FOREACH loadJson GENERATE json#'source' AS ParsedInput;

--Destination
b = FOREACH loadJson GENERATE json#'destination' AS ParsedOutput;

--Delimiter
c = FOREACH loadJson GENERATE json#'delimiter' AS ParsedDelimiter;

--Reservoir fraction
d = FOREACH loadJson GENERATE json#'reservoirSize' AS ParsedFraction;



--Load data
--How do I load my source path, which is in relation a? When I give 'a',
--Pig looks for a file named a in my current directory.
inputdata = LOAD 'a' using PigStorage('c');
store inputdata into '/home/sreeveni/myfiles/pig/OUT/ab';
--Group data
--groupedByAll = group inputdata all;

--output into hdfs
--sampled = SAMPLE inputdata $fraction;
--store sampled into '$output' using PigStorage('$delimiter');


How can I achieve this in Apache Pig?
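As far as I know, the relations a, b, c and d above cannot be used as LOAD locations: a LOAD path has to be a literal string, resolved (after parameter substitution) before the script runs, so Pig cannot read it from another relation's data. One workaround is to parse the JSON outside Pig and pass the values in with -param, exactly as the original Sampling.pig already does. This is a minimal Python sketch of such a wrapper; the function names and the mapping from the JSON keys (source, destination, delimiter, reservoirSize) to the parameters $input, $output, $delimiter and $fraction are assumptions for illustration.

```python
import json
import subprocess

# Hypothetical mapping: Pig parameter name -> JSON key in the config file.
# The parameter names match $input, $output, $delimiter and $fraction in
# Sampling.pig; the JSON keys match the ones parsed in the script above.
PARAM_KEYS = {
    "input": "source",
    "output": "destination",
    "delimiter": "delimiter",
    "fraction": "reservoirSize",
}


def build_pig_command(config_text, script="Sampling.pig"):
    """Parse the JSON config and return the pig command line as a list."""
    config = json.loads(config_text)
    cmd = ["pig", "-x", "mapreduce", "-f", script]
    for param, key in PARAM_KEYS.items():
        cmd += ["-param", "%s=%s" % (param, config[key])]
    return cmd


def run_sampling(config_path):
    """Hypothetical entry point: read the JSON file and launch Pig."""
    with open(config_path) as f:
        command = build_pig_command(f.read())
    # A list argument (no shell) keeps values like "," or paths unquoted.
    subprocess.check_call(command)
```

Passing the command as a list rather than a shell string avoids quoting problems with values such as the "," delimiter.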






On Sat, Jul 26, 2014 at 5:38 AM, Ryan Prociuk <[email protected]> wrote:

> I would recommend using the elephant-bird-pig JsonLoader
>
> I have used it extensively to parse nested JSON datasets without issues.
>
> You can download the jar files from Maven and REGISTER them in the script.
>
>
> http://mvnrepository.com/artifact/com.twitter.elephantbird/elephant-bird-pig
>
> https://github.com/kevinweil/elephant-bird/
>
> It has dependencies on the following jars
> json-simple-1.1.x.jar;
> elephant-bird-pig-4.x.jar;
> elephant-bird-hadoop-compat-4.x.jar;
> elephant-bird-core-4.x.jar;
>
> Parse the file
>
> A = LOAD '/hdfs-directory/' USING
> com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS
>  (json:map[]);
>
> B = FOREACH A GENERATE
>        json#'col1' AS col1;
>
> Ryan
>
>
>
>
> On Fri, Jul 25, 2014 at 4:55 PM, Satish Kolli <[email protected]> wrote:
>
> > Did you try the standard JsonLoader? I haven't used it personally, but it
> > looks like you can specify the schema to extract from your JSON.
> >
> > http://pig.apache.org/docs/r0.13.0/func.html#jsonloadstore
> >
> > If not, you can also look at the following example I found by googling:
> >
> > https://gist.github.com/kimsterv/601331
> >
> >
> > Thanks.
> >
> >
> >
> >
> > On Fri, Jul 25, 2014 at 8:01 AM, praveenesh kumar <[email protected]>
> > wrote:
> >
> > > One simple way is to write a UDF that acts as a JSON parser. Load your
> > > data and then call your UDF to parse and extract whatever you want from
> > > the JSON. You need to build what you want to get: Pig doesn't do that
> > > for you, but it gives you the capability to do it. How you do it is up
> > > to you.
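A rough sketch of the UDF approach, assuming Pig 0.12 or later, where Python UDFs can be registered USING streaming_python and the outputSchema decorator comes from pig_util. The function name json_field and its dotted-path argument are hypothetical; the fallback decorator exists only so the file can also be tested with plain Python.

```python
import json

try:
    # Provided by Pig for UDFs registered USING streaming_python.
    from pig_util import outputSchema
except ImportError:
    # Fallback no-op decorator so the file can be tested with plain Python.
    def outputSchema(schema):
        def decorate(func):
            return func
        return decorate


@outputSchema("value:chararray")
def json_field(json_text, path):
    """Return the value at a dotted key path such as 'elementInfo.fraction',
    or None if the input is null or the path does not exist."""
    if json_text is None:
        return None
    node = json.loads(json_text)
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return None if node is None else str(node)
```

In Pig this might be used roughly as REGISTER 'json_udfs.py' USING streaming_python AS jsonudf; followed by raw = LOAD '$inputJson' USING TextLoader() AS (line:chararray); and FOREACH raw GENERATE jsonudf.json_field(line, 'elementInfo.fraction'); (file and alias names are hypothetical). Note that the result is still a relation, so this helps with extracting fields but not with feeding a value into a LOAD path.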
> > >
> > >
> > > On Fri, Jul 25, 2014 at 12:09 PM, unmesha sreeveni <[email protected]> wrote:
> > >
> > > > Hi
> > > >
> > > > This is my code for sampling:
> > > >
> > > > --Sampling.pig
> > > > --pig -x mapreduce -f Sampling.pig -param input=foo.csv -param output=OUT/pig
> > > > --    -param delimiter="," -param fraction='0.05'
> > > >
> > > > --Load data
> > > > inputdata = LOAD '$input' using PigStorage('$delimiter');
> > > >
> > > > --Group data
> > > > groupedByAll = group inputdata all;
> > > >
> > > > --output into hdfs
> > > > sampled = SAMPLE inputdata $fraction;
> > > > store sampled into '$output' using PigStorage('$delimiter');
> > > >
> > > > I pass the input parameters on the command line:
> > > > pig -x mapreduce -f Sampling.pig -param input=foo.csv -param output=OUT/pig
> > > > -param delimiter="," -param fraction='0.05'
> > > >
> > > > I would like to modify this so that the script takes its input
> > > > parameters from JSON instead.
> > > >
> > > > Sample JSON:
> > > >
> > > > {"Name":"sampling","elementInfo":{"fraction":"3"},"destination":"/user/sree/OUT","source":"/user/sree/foo.txt"}
> > > >
> > > > Now I need to parse the above JSON and extract the parameters I need.
> > > > How can I do that? I know we can load JSON in Apache Pig, but how do I
> > > > extract the fields I need from it?
> > > >
> > > > From this JSON I only need fraction, destination and source.
> > > >
> > > > Please suggest a way.
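A minimal sketch of one way to do this, assuming the JSON is parsed before Pig runs (a small Python helper outside Pig, not something Pig itself provides). It pulls source, destination and the nested elementInfo.fraction out of the sample JSON and renders them in the name=value format that Pig's -param_file option reads. The key-to-parameter mapping and the function names are hypothetical.

```python
import json

# Hypothetical mapping: Pig parameter name -> path of keys in the JSON.
# fraction is nested under elementInfo in the sample JSON.
PARAM_PATHS = {
    "input": ("source",),
    "output": ("destination",),
    "fraction": ("elementInfo", "fraction"),
}


def extract_params(json_text):
    """Pull the needed values out of the (nested) sampling JSON."""
    doc = json.loads(json_text)
    params = {}
    for name, path in PARAM_PATHS.items():
        value = doc
        for key in path:
            value = value[key]
        params[name] = str(value)
    return params


def to_param_file(params):
    """Render params as a Pig parameter file: one name=value per line."""
    return "".join("%s=%s\n" % (k, v) for k, v in sorted(params.items()))
```

Saved as, say, sampling.params, the output could then be used with pig -x mapreduce -param_file sampling.params -param delimiter="," -f Sampling.pig. Keep in mind that SAMPLE expects a fraction between 0 and 1, so a value like "3" in the sample JSON would presumably need converting (for example, if it is meant as a percentage) before it is passed on.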
> > > >
> > > > --
> > > > *Thanks & Regards *
> > > >
> > > >
> > > > *Unmesha Sreeveni U.B*
> > > > *Hadoop, Bigdata Developer*
> > > > *Center for Cyber Security | Amrita Vishwa Vidyapeetham*
> > > > http://www.unmeshasreeveni.blogspot.in/
> > > >
> > >
> >
>
>
>
> --
> Ryan Prociuk | Engineering Distributed Data
>



