Hi Satya,

Thanks, I have already started checking out the JSON SerDe. Let's see if it works. By the way, can we write UDFs in Python/Ruby?
Regards,
Ajay T

On Sun, Nov 13, 2016 at 12:30 PM, Satya Harish Appana <satyaharish.app...@gmail.com> wrote:
> You can use these:
>
> JSON SerDe: https://github.com/rcongiu/Hive-JSON-Serde
>
> or else you can write a Hive UDTF (e.g. http://beekeeperdata.com/posts/hadoop/2015/07/26/Hive-UDTF-Tutorial.html).
>
> On Sun, Nov 13, 2016 at 12:22 PM, Ajay Tirpude <tirpudeaj...@gmail.com> wrote:
>
>> Hi Dudu,
>>
>> I want to parse my JSON file and produce the desired CSV output that I pasted in the output section. Currently I can do this with bash (the jq command), but that does not scale to JSON files that are terabytes in size. So I am looking for a solution in Pig or Hive.
>>
>> Regards,
>> Ajay T
>>
>> On Sun, Nov 13, 2016 at 12:10 PM, Markovitz, Dudu <dmarkov...@paypal.com> wrote:
>>
>>> And your issue/question is?
>>>
>>> From: Ajay Tirpude [mailto:tirpudeaj...@gmail.com]
>>> Sent: Sunday, November 13, 2016 4:46 AM
>>> To: user@hive.apache.org
>>> Subject: Nested JSON Parsing
>>>
>>> Dear All,
>>>
>>> I am trying to parse the JSON file given below, and my intention is to convert it into a CSV.
>>>
>>> {
>>>   "devicetype": "SmartPhone",
>>>   "uuid": "sg76fdhh7gfxhxfhgxf67x",
>>>   "ts": {
>>>     "date": "2016-03-23T10:58:34.660Z"
>>>   },
>>>   "events": [
>>>     {
>>>       "timestamp": "2016-03-23T10:58:37Z",
>>>       "evt": "first",
>>>       "ad": "v6v75v88n98778mn",
>>>       "tkey": "ngbbc76fbc6fb6fb66fb6",
>>>       "mtp": "Wed Mar 23 2016 19:04:22 GMT 0800 (PHT)",
>>>       "eventid": "eytuy"
>>>     },
>>>     {
>>>       "timestamp": "2016-03-23T10:58:35Z",
>>>       "evt": "second",
>>>       "ad": "v6v75v88n98778mn",
>>>       "tkey": "ngbbc76fbc6fb6fb66fb6"
>>>     },
>>>     {
>>>       "timestamp": "2016-03-23T10:58:36Z",
>>>       "evt": "third",
>>>       "ad": "v6v75v88n98778mn",
>>>       "tkey": "ngbbc76fbc6fb6fb66fb6"
>>>     }
>>>   ],
>>>   "adid": "v6v75v88n98778mn",
>>>   "ad_tz": {
>>>     "date": "2016-03-23T10:58:34.660Z"
>>>   },
>>>   "ua": "Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30"
>>> }
>>>
>>> There are a few conditions that I need to apply while parsing:
>>>
>>> 1. I want all the fields except "timestamp" inside the nested "events" key.
>>>
>>> 2. I want to loop over the "events" key for each evt. The input file above has three evts, but that number is not fixed in the actual input; there can be any number of evts, not just 3.
>>>
>>> 3. Not every evt block has the same keys. Each evt block can have different extra fields, and we need to extract every key. If an evt lacks a key, the value should be blank for that evt. For example, evt "first" has two extra key/value pairs (eventid and mtp), and these values should be blank for the other evts. Likewise, other evts can have extra key/value pairs of their own, and those should be blank in the remaining evts.
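The three conditions above describe an "explode" of the events array: one output row per event, the parent fields repeated on each row, and the union of all event keys (minus timestamp) as columns, blank where an event lacks a key. In Hive this is the kind of job a UDTF (Satya's suggestion) would do. To make the logic itself concrete, here is a minimal Python sketch that works on one record already parsed with json.loads. The function names explode_events and to_csv are hypothetical, and the sketch assumes nested objects such as ts are only one level deep.

```python
import csv
import io
import json
from typing import Any


def explode_events(record: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one flat row per element of record["events"].

    Columns follow the record's own key order: nested objects such as "ts"
    become dotted names ("ts.date"), and the events array is expanded in
    place as "events.<key>" columns. Assumes nesting is one level deep.
    """
    events = record.get("events") or []

    # Conditions 1 and 3: union of keys over all events of this record,
    # in first-seen order, leaving out "timestamp".
    event_keys: list[str] = []
    for evt in events:
        for key in evt:
            if key != "timestamp" and key not in event_keys:
                event_keys.append(key)

    rows = []
    # Condition 2: one output row per event, however many there are.
    # A record with an empty or missing "events" array yields no rows.
    for evt in events:
        row: dict[str, Any] = {}
        for key, value in record.items():
            if key == "events":
                for ek in event_keys:
                    # Condition 3: blank when this event lacks the key.
                    row["events." + ek] = evt.get(ek, "")
            elif isinstance(value, dict):
                for sub_key, sub_value in value.items():
                    row[f"{key}.{sub_key}"] = sub_value
            else:
                row[key] = value
        rows.append(row)
    return rows


def to_csv(rows: list[dict[str, Any]]) -> str:
    """Render the rows of one record as CSV text with a header line."""
    if not rows:
        return ""
    buf = io.StringIO()
    # All rows of one record share the same columns, so the first row's keys
    # give the header; the csv module quotes fields that contain commas.
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # A shortened copy of the sample record from the message above.
    sample = json.loads('''{"devicetype": "SmartPhone", "uuid": "sg76fdhh7gfxhxfhgxf67x",
      "ts": {"date": "2016-03-23T10:58:34.660Z"},
      "events": [{"timestamp": "2016-03-23T10:58:37Z", "evt": "first", "eventid": "eytuy"},
                 {"timestamp": "2016-03-23T10:58:35Z", "evt": "second"}],
      "adid": "v6v75v88n98778mn", "ad_tz": {"date": "2016-03-23T10:58:34.660Z"},
      "ua": "Mozilla/5.0 (KHTML, like Gecko)"}''')
    print(to_csv(explode_events(sample)), end="")
```

Because the column set is the union of keys within a single record, two records can produce different columns; writing many records into one CSV would need either a fixed column list or a first pass over the data to collect every event key.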
>>>
>>> Finally, I want the output to look like this:
>>>
>>> devicetype,uuid,ts.date,events.evt,events.ad,events.tkey,events.mtp,events.eventid,adid,ad_tz.date,ua
>>> SmartPhone,sg76fdhh7gfxhxfhgxf67x,2016-03-23T10:58:34.660Z,first,v6v75v88n98778mn,ngbbc76fbc6fb6fb66fb6,Wed Mar 23 2016 19:04:22 GMT 0800 (PHT),eytuy,v6v75v88n98778mn,2016-03-23T10:58:34.660Z,"Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30"
>>> SmartPhone,sg76fdhh7gfxhxfhgxf67x,2016-03-23T10:58:34.660Z,second,v6v75v88n98778mn,ngbbc76fbc6fb6fb66fb6,,,v6v75v88n98778mn,2016-03-23T10:58:34.660Z,"Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30"
>>> SmartPhone,sg76fdhh7gfxhxfhgxf67x,2016-03-23T10:58:34.660Z,third,v6v75v88n98778mn,ngbbc76fbc6fb6fb66fb6,,,v6v75v88n98778mn,2016-03-23T10:58:34.660Z,"Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30"
>>>
>>> Regards,
>>> Ajay T
>
> --
>
> Regards,
> Satya Harish Appana,
> Software Development Engineer II,
> Flipkart, Bangalore,
> Ph: +91-9538797174.
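On the question of writing UDFs in Python: Hive's own UDFs and UDTFs are Java classes, but its TRANSFORM clause can stream rows through an external script, which is one way to run this flattening in Python at scale. The sketch below is hypothetical throughout: the file name flatten_events.py, its --stream flag, the table raw_json and its column json_line are all made up for illustration. It assumes each JSON document sits on a single line (unlike the pretty-printed sample), and it hard-codes the eleven columns of the desired output above, because TRANSFORM ... AS (...) needs a fixed schema; an event key outside that list would be dropped.

```python
#!/usr/bin/env python3
"""Hypothetical streaming script (flatten_events.py) for Hive's TRANSFORM clause.

Assumed HiveQL usage; raw_json and its single STRING column json_line
are hypothetical names:

    ADD FILE flatten_events.py;
    SELECT TRANSFORM (json_line)
      USING 'python3 flatten_events.py --stream'
      AS (devicetype, uuid, ts_date, evt, ad, tkey, mtp, eventid,
          adid, ad_tz_date, ua)
    FROM raw_json;
"""
import json
import sys
from typing import Any

# Assumption: the event columns are fixed up front to the ones in the
# desired output, because the AS (...) clause needs a fixed schema.
EVENT_FIELDS = ["evt", "ad", "tkey", "mtp", "eventid"]


def clean(value: Any) -> str:
    """Render a value as text that cannot break Hive's tab/newline framing."""
    if value is None:
        return ""
    text = value if isinstance(value, str) else json.dumps(value)
    return text.replace("\t", " ").replace("\r", " ").replace("\n", " ")


def flatten_line(line: str) -> list[str]:
    """Turn one single-line JSON document into tab-separated rows, one per event."""
    record = json.loads(line)
    head = [record.get("devicetype"), record.get("uuid"),
            (record.get("ts") or {}).get("date")]
    tail = [record.get("adid"), (record.get("ad_tz") or {}).get("date"),
            record.get("ua")]
    rows = []
    for evt in record.get("events") or []:
        # A key missing from this event gives None, which clean() turns into a blank.
        middle = [evt.get(name) for name in EVENT_FIELDS]
        rows.append("\t".join(clean(v) for v in head + middle + tail))
    return rows


def main() -> None:
    # Hive writes one input row per line on stdin and splits each stdout
    # line on tabs into the AS (...) columns.
    for line in sys.stdin:
        line = line.strip()
        if line:
            for row in flatten_line(line):
                sys.stdout.write(row + "\n")


# Only stream when given the (hypothetical) --stream flag, so importing or
# test-running this file does not block waiting on stdin.
if __name__ == "__main__" and sys.argv[1:] == ["--stream"]:
    main()
```

Because Hive frames the script's output with tabs and newlines by default, clean() replaces those characters inside values; without it, a stray tab in a field such as ua would shift every following column.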