Hi,
This is my code for sampling:
--Sampling.pig
--pig -x mapreduce -f Sampling.pig -param input=foo.csv -param output=OUT/pig
--    -param delimiter="," -param fraction='0.05'
--Load data
inputdata = LOAD '$input' using PigStorage('$delimiter');
--Group data (currently not used by the sampling step)
groupedByAll = group inputdata all;
--Sample data
sampled = SAMPLE inputdata $fraction;
--Output into HDFS
store sampled into '$output' using PigStorage('$delimiter');
I pass the input parameters on the command line:
pig -x mapreduce -f Sampling.pig -param input=foo.csv -param output=OUT/pig \
    -param delimiter="," -param fraction='0.05'
I would like to modify this so that the parameters come from a JSON file instead.
Sample JSON:
{"Name":"sampling","elementInfo":{"fraction":"3"},"destination":"/user/sree/OUT","source":"/user/sree/foo.txt"}
I need to parse this JSON and pass the required values to the script.
How can I do that?
I know Apache Pig can load JSON, but how do I extract only the values I need
from it?
From this JSON I only need:
fraction, destination, source
Please suggest a way.
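
For illustration, here is a rough sketch of one possible direction. It assumes the JSON is parsed outside Pig by a small wrapper and the three values are forwarded as -param arguments to the existing Sampling.pig. The function name build_pig_command and the default delimiter "," are hypothetical (the sample JSON carries no delimiter), and the fraction value is passed through unchanged because the post does not say how "3" should be interpreted.

```python
import json


def build_pig_command(config_text, script="Sampling.pig", delimiter=","):
    """Build a pig argument list from the JSON config (hypothetical helper)."""
    config = json.loads(config_text)
    # Only the three fields named in the question are read from the JSON.
    params = {
        "input": config["source"],
        "output": config["destination"],
        "delimiter": delimiter,  # assumption: not present in the JSON
        "fraction": config["elementInfo"]["fraction"],  # passed through as-is
    }
    command = ["pig", "-x", "mapreduce", "-f", script]
    for name, value in params.items():
        command += ["-param", f"{name}={value}"]
    return command


sample = (
    '{"Name":"sampling","elementInfo":{"fraction":"3"},'
    '"destination":"/user/sree/OUT","source":"/user/sree/foo.txt"}'
)
command = build_pig_command(sample)
print(" ".join(command))
# To actually launch Pig, the list could be handed to
# subprocess.run(command, check=True) on a machine where pig is installed.
```

Building an argument list instead of a shell string avoids quoting problems with the delimiter. Note that Pig's SAMPLE expects a fraction between 0 and 1, so a value such as "3" would need converting first if it is meant as a percentage.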
--
Thanks & Regards
Unmesha Sreeveni U.B
Hadoop, Bigdata Developer
Center for Cyber Security | Amrita Vishwa Vidyapeetham
http://www.unmeshasreeveni.blogspot.in/