I would recommend using the elephant-bird-pig JsonLoader. I have used it quite extensively to parse nested JSON datasets with no issues.
You can download the jar files from Maven and REGISTER them in the script.

http://mvnrepository.com/artifact/com.twitter.elephantbird/elephant-bird-pig
https://github.com/kevinweil/elephant-bird/

It has dependencies on the following jars:

json-simple-1.1.x.jar
elephant-bird-pig-4.x.jar
elephant-bird-hadoop-compat-4.x.jar
elephant-bird-core-4.x.jar

Parse the file:

fileA = LOAD '/hdfs-directory/' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);
B = FOREACH fileA GENERATE json#'col1' AS col1;

Ryan


On Fri, Jul 25, 2014 at 4:55 PM, Satish Kolli <[email protected]> wrote:

> Did you try the standard JsonLoader? I didn't personally use it but it
> looks like you can specify the schema to extract/parse from your json.
>
> http://pig.apache.org/docs/r0.13.0/func.html#jsonloadstore
>
> If not, you can also look at the following example I found googling:
>
> https://gist.github.com/kimsterv/601331
>
>
> Thanks.
>
>
> On Fri, Jul 25, 2014 at 8:01 AM, praveenesh kumar <[email protected]>
> wrote:
>
> > One simple way is to write a UDF that will act as Json parser. Load your
> > data and then call your UDF to parse and extract whatever you want from the
> > Json. You need to build what you want to get. Pig doesn't do that for you,
> > it gives you the capability to do that. How you do is upto you.
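
For the UDF route praveenesh describes, the parse-and-extract step could look roughly like the sketch below. It is plain Python showing only the logic such a UDF would wrap, not a registered Pig UDF; the function name extract_params is made up for illustration, and the key names are taken from the sample JSON in the original question.

```python
import json


def extract_params(line):
    """Parse one JSON record and return (fraction, destination, source).

    extract_params is a hypothetical helper name, not part of Pig or
    elephant-bird. Missing keys come back as None.
    """
    record = json.loads(line)
    # "fraction" is nested under "elementInfo"; the other two are top-level keys.
    fraction = record.get("elementInfo", {}).get("fraction")
    destination = record.get("destination")
    source = record.get("source")
    return fraction, destination, source


# Sample record copied from the original question.
sample = (
    '{"Name":"sampling","elementInfo":{"fraction":"3"},'
    '"destination":"/user/sree/OUT","source":"/user/sree/foo.txt"}'
)

print(extract_params(sample))
```

Note that the fraction comes back as the string "3", exactly as it appears in the record; any conversion is up to the caller. Wiring this into Pig as a UDF (registration, output schema) is left out here.
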
> >
> > On Fri, Jul 25, 2014 at 12:09 PM, unmesha sreeveni <[email protected]>
> > wrote:
> >
> > > Hi
> > >
> > > This is my code for sampling
> > >
> > > --Sampling.pig
> > > --pig -x mapreduce -f Sampling.pig -param input=foo.csv -param output=OUT/pig -param delimiter="," -param fraction='0.05'
> > >
> > > --Load data
> > > inputdata = LOAD '$input' using PigStorage('$delimiter');
> > >
> > > --Group data
> > > groupedByAll = group inputdata all;
> > >
> > > --output into hdfs
> > > sampled = SAMPLE inputdata $fraction;
> > > store sampled into '$output' using PigStorage('$delimiter');
> > >
> > > I am taking input parameters as customized
> > > pig -x mapreduce -f Sampling.pig -param input=foo.csv -param output=OUT/pig -param delimiter="," -param fraction='0.05'
> > >
> > > I would like to do a modification in the same
> > > I am trying to take my input as json
> > >
> > > sample json:
> > >
> > > {"Name":"sampling","elementInfo":{"fraction":"3"},"destination":"/user/sree/OUT","source":"/user/sree/foo.txt"}
> > >
> > > Now I need to parse the above json and take the needful params.
> > > How to do the same
> > > I know we can load json in apache pig but how to extract the needful from
> > > the json
> > >
> > > from here I only need
> > > fraction,destination,source
> > >
> > > Please suggest a way
> > >
> > > --
> > > Thanks & Regards
> > >
> > > Unmesha Sreeveni U.B
> > > Hadoop, Bigdata Developer
> > > Center for Cyber Security | Amrita Vishwa Vidyapeetham
> > > http://www.unmeshasreeveni.blogspot.in/
> > >


--
Ryan Prociuk | Engineering Distributed Data
