Hi Satya,

Thanks, I have already started checking the JSON SerDe. Let's see if it
works. By the way, can we write UDFs in Python/Ruby?
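
As a rough illustration of the Python route: Hive can stream rows through an
external script with its TRANSFORM ... USING clause, so the flattening rules
from the original question quoted below (one output row per evt, timestamp
dropped, blanks for keys an evt lacks) could live in a plain Python script.
This is only a sketch under stated assumptions: the function names (flatten,
record_to_rows, rows_to_csv) are made up for illustration, the sample values
are copied from the quoted JSON, and nothing here is wired into Hive or tested
at scale. It also derives the column list from a single record; for a real
multi-record file the column list would have to be fixed up front.

```python
import csv
import io

# Sample values copied from the JSON in the original question.
SAMPLE_RECORD = {
    "devicetype": "SmartPhone",
    "uuid": "sg76fdhh7gfxhxfhgxf67x",
    "ts": {"date": "2016-03-23T10:58:34.660Z"},
    "events": [
        {"timestamp": "2016-03-23T10:58:37Z", "evt": "first",
         "ad": "v6v75v88n98778mn", "tkey": "ngbbc76fbc6fb6fb66fb6",
         "mtp": "Wed Mar 23 2016 19:04:22 GMT 0800 (PHT)", "eventid": "eytuy"},
        {"timestamp": "2016-03-23T10:58:35Z", "evt": "second",
         "ad": "v6v75v88n98778mn", "tkey": "ngbbc76fbc6fb6fb66fb6"},
        {"timestamp": "2016-03-23T10:58:36Z", "evt": "third",
         "ad": "v6v75v88n98778mn", "tkey": "ngbbc76fbc6fb6fb66fb6"},
    ],
    "adid": "v6v75v88n98778mn",
    "ad_tz": {"date": "2016-03-23T10:58:34.660Z"},
    "ua": ("Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) "
           "AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile "
           "Safari/534.30"),
}


def flatten(prefix, value):
    """Yield (dotted_name, value) pairs, descending into nested dicts."""
    if isinstance(value, dict):
        for key, child in value.items():
            name = f"{prefix}.{key}" if prefix else key
            yield from flatten(name, child)
    else:
        yield prefix, value


def record_to_rows(record, list_key="events", skip=("timestamp",)):
    """Return (columns, rows) with one row per element of record[list_key]."""
    # Rule 1: drop the skipped keys (timestamp) from every event.
    flat_events = []
    for event in record.get(list_key, []):
        kept = {k: v for k, v in event.items() if k not in skip}
        flat_events.append(dict(flatten(list_key, kept)))

    columns = []  # output column order follows the key order of the record
    shared = {}   # top-level values repeated on every row
    for key, value in record.items():
        if key == list_key:
            # Rule 3: union of event keys, in first-seen order.
            for flat in flat_events:
                for name in flat:
                    if name not in columns:
                        columns.append(name)
        else:
            for name, scalar in flatten(key, value):
                columns.append(name)
                shared[name] = scalar

    # Rule 2: one row per event, however many events there are.
    rows = [{**shared, **flat} for flat in flat_events]
    return columns, rows


def rows_to_csv(columns, rows):
    """Render rows as CSV text; keys missing from a row become blanks."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    cols, data = record_to_rows(SAMPLE_RECORD)
    print(rows_to_csv(cols, data), end="")
```

Run on the sample, this prints a header line followed by three data lines,
with the events.mtp and events.eventid fields empty for the second and third
evts. The ua value is quoted by the csv module because it contains a comma.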

Regards,
Ajay T

On Sun, Nov 13, 2016 at 12:30 PM, Satya Harish Appana <
satyaharish.app...@gmail.com> wrote:

> You can use this:
>
> JSON SerDe: https://github.com/rcongiu/Hive-JSON-Serde
>
> Or else you can write a Hive UDTF (e.g.
> http://beekeeperdata.com/posts/hadoop/2015/07/26/Hive-UDTF-Tutorial.html)
>
>
>
> On Sun, Nov 13, 2016 at 12:22 PM, Ajay Tirpude <tirpudeaj...@gmail.com>
> wrote:
>
>> Hi Dudu,
>>
>> I want to parse my JSON file and get the desired output as a CSV file, as
>> pasted in the output section. Currently I am able to achieve this using
>> bash (the jq command), but that does not scale to JSON files that are
>> terabytes in size. So I am looking for a solution in Pig or Hive.
>>
>> Regards,
>> Ajay T
>>
>> On Sun, Nov 13, 2016 at 12:10 PM, Markovitz, Dudu <dmarkov...@paypal.com>
>> wrote:
>>
>>> And your issue/question is?
>>>
>>>
>>>
>>> *From:* Ajay Tirpude [mailto:tirpudeaj...@gmail.com]
>>> *Sent:* Sunday, November 13, 2016 4:46 AM
>>> *To:* user@hive.apache.org
>>> *Subject:* Nested JSON Parsing
>>>
>>>
>>>
>>> Dear All,
>>>
>>>
>>>
>>> I am trying to parse the JSON file given below; my intention is to
>>> convert it into CSV.
>>>
>>>
>>>
>>> {
>>>   "devicetype": "SmartPhone",
>>>   "uuid": "sg76fdhh7gfxhxfhgxf67x",
>>>   "ts": {
>>>     "date": "2016-03-23T10:58:34.660Z"
>>>   },
>>>   "events": [
>>>     {
>>>       "timestamp": "2016-03-23T10:58:37Z",
>>>       "evt": "first",
>>>       "ad": "v6v75v88n98778mn",
>>>       "tkey": "ngbbc76fbc6fb6fb66fb6",
>>>       "mtp": "Wed Mar 23 2016 19:04:22 GMT 0800 (PHT)",
>>>       "eventid": "eytuy"
>>>     },
>>>     {
>>>       "timestamp": "2016-03-23T10:58:35Z",
>>>       "evt": "second",
>>>       "ad": "v6v75v88n98778mn",
>>>       "tkey": "ngbbc76fbc6fb6fb66fb6"
>>>     },
>>>     {
>>>       "timestamp": "2016-03-23T10:58:36Z",
>>>       "evt": "third",
>>>       "ad": "v6v75v88n98778mn",
>>>       "tkey": "ngbbc76fbc6fb6fb66fb6"
>>>     }
>>>   ],
>>>   "adid": "v6v75v88n98778mn",
>>>   "ad_tz": {
>>>     "date": "2016-03-23T10:58:34.660Z"
>>>   },
>>>   "ua": "Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30"
>>> }
>>>
>>>
>>>
>>> There are a few conditions that I need to apply when parsing:
>>>
>>>
>>>
>>> 1. I want to get all the fields except timestamp inside the nested
>>> events key.
>>>
>>> 2. I want to loop over the events key, once for each evt. The input file
>>> above has three evts, but that number is not fixed in the actual input
>>> file; there can be any number of evts, not just 3.
>>>
>>> 3. Not every evt block is the same. Each evt block can have different
>>> extra fields, but we need to extract every key. If a key is missing from
>>> an evt, its value should be blank for that evt. For example, evt "first"
>>> has two extra key/value pairs, eventid and mtp, and these values should
>>> be blank for the other evts. Similarly, other evts can have their own
>>> extra key/value pairs, and those should be blank in the evts that lack
>>> them.
>>>
>>>
>>>
>>> Finally, I want the output to look like this:
>>>
>>>
>>>
>>> devicetype | uuid | ts.date | events.evt | events.ad | events.tkey | events.mtp | events.eventid | adid | ad_tz.date | ua
>>> SmartPhone | sg76fdhh7gfxhxfhgxf67x | 2016-03-23T10:58:34.660Z | first | v6v75v88n98778mn | ngbbc76fbc6fb6fb66fb6 | Wed Mar 23 2016 19:04:22 GMT 0800 (PHT) | eytuy | v6v75v88n98778mn | 2016-03-23T10:58:34.660Z | Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30
>>> SmartPhone | sg76fdhh7gfxhxfhgxf67x | 2016-03-23T10:58:34.660Z | second | v6v75v88n98778mn | ngbbc76fbc6fb6fb66fb6 | | | v6v75v88n98778mn | 2016-03-23T10:58:34.660Z | Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30
>>> SmartPhone | sg76fdhh7gfxhxfhgxf67x | 2016-03-23T10:58:34.660Z | third | v6v75v88n98778mn | ngbbc76fbc6fb6fb66fb6 | | | v6v75v88n98778mn | 2016-03-23T10:58:34.660Z | Mozilla/5.0 (Linux; U; Android 4.3; en-gb; SM-N9005 Build/JSS15J) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30
>>>
>>>
>>>
>>> Regards,
>>>
>>> Ajay T
>>>
>>
>>
>
>
> --
>
>
> Regards,
> Satya Harish Appana,
> Software Development Engineer II,
> Flipkart,Bangalore,
> Ph:+91-9538797174.
>
