Hello Sam, Please find attached PIG script for the same. You may find the necessary jars below.
http://mvnrepository.com/artifact/com.twitter.elephantbird/elephant-bird-pig Note: Same functionality can be achieved in Hive as well. Thanks and Regards Nishant Aggarwal, PMP Cell No:- +91 99588 94305 http://in.linkedin.com/pub/nishant-aggarwal/53/698/11b On Wed, Oct 28, 2015 at 8:30 AM, Sam Joe <[email protected]> wrote: > Thanks Nishant! Will try using Pig json loader too to achieve this > requirement. If you have any tutorial for extracting data from complex > nested json arrays (as the example given in my previous email), please send > it. > > Appreciate your help! > > Thanks, > Joel > > On Tue, Oct 27, 2015 at 10:20 PM, Nishant Aggarwal <[email protected]> > wrote: > >> Hello Sam, >> You can easily achieve this by using elephant-bird.jars in pig. We are >> also caturing tweets via flume and filter them using pig and elephant-jars. >> You can find the related jars over internet. >> >> Cheers, >> Nishant Aggarwal >> On 28 Oct 2015 00:50, "Sam Joe" <[email protected]> wrote: >> >>> Hi, >>> >>> Is it possible to use json_tuple function to extract data from json >>> arrays (nested too). I am trying to process json data as string and avoid >>> using serdes since user data may be malformed. >>> >>> Please see a sample json data given below: >>> >>> >>> { >>> >>> "filter_level": "low", >>> >>> "retweeted": false, >>> >>> "in_reply_to_screen_name": null, >>> >>> "possibly_sensitive": false, >>> >>> "truncated": false, >>> >>> "lang": "en", >>> >>> "in_reply_to_status_id_str": null, >>> >>> "id": 654395184428515332, >>> >>> "extended_entities": { >>> >>> "media": [{ >>> >>> "sizes": { >>> >>> "thumb": { >>> >>> "w": 150, >>> >>> "resize": "crop", >>> >>> "h": 150 >>> >>> }, >>> >>> "small": { >>> >>> "w": 340, >>> >>> "resize": "fit", >>> >>> "h": 255 >>> >>> }, >>> >>> "large": { >>> >>> "w": 1024, >>> >>> "resize": "fit", >>> >>> "h": 768 >>> >>> }, >>> >>> "medium": { >>> >>> "w": 600, >>> >>> "resize": "fit", >>> >>> "h": 450 >>> >>> } >>> >>> }, >>> >>> "source_user_id": 16864598, >>> >>> "media_url": "http://pbs.twimg.com/media/CRSL2MPWsAAOnZo.jpg", >>> >>> "display_url": "pic.twitter.com/i3004WyF4g", >>> >>> "type": "photo", >>> >>> "url": "http://t.co/i3004WyF4g", >>> >>> "id": 654301608990388224, >>> >>> "media_url_https": "https://pbs.twimg.com/media/CRSL2MPWsAAOnZo.jpg", >>> >>> "expanded_url": " >>> http://twitter.com/lordlancaster/status/654301626665189376/photo/1", >>> >>> "source_user_id_str": "16864598", >>> >>> "indices": [143, >>> >>> 144], >>> >>> "source_status_id_str": "654301626665189376", >>> >>> "source_status_id": 654301626665189376, >>> >>> "id_str": "654301608990388224" >>> >>> }, >>> >>> { >>> >>> "sizes": { >>> >>> "thumb": { >>> >>> "w": 150, >>> >>> "resize": "crop", >>> >>> "h": 150 >>> >>> }, >>> >>> "small": { >>> >>> "w": 340, >>> >>> "resize": "fit", >>> >>> "h": 255 >>> >>> }, >>> >>> "large": { >>> >>> "w": 1024, >>> >>> "resize": "fit", >>> >>> "h": 768 >>> >>> }, >>> >>> "medium": { >>> >>> "w": 600, >>> >>> "resize": "fit", >>> >>> "h": 450 >>> >>> } >>> >>> }, >>> >>> "source_user_id": 16864598, >>> >>> "media_url": "http://pbs.twimg.com/media/CRSL2MRWgAAGOcj.jpg", >>> >>> "display_url": "pic.twitter.com/i3004WyF4g", >>> >>> "type": "photo", >>> >>> "url": "http://t.co/i3004WyF4g", >>> >>> "id": 654301608998764544, >>> >>> "media_url_https": "https://pbs.twimg.com/media/CRSL2MRWgAAGOcj.jpg", >>> >>> "expanded_url": " >>> http://twitter.com/lordlancaster/status/654301626665189376/photo/1", >>> >>> "source_user_id_str": "16864598", >>> >>> "indices": [143, >>> >>> 144], >>> >>> "source_status_id_str": "654301626665189376", >>> >>> "source_status_id": 654301626665189376, >>> >>> "id_str": "654301608998764544" >>> >>> }, >>> >>> { >>> >>> "sizes": { >>> >>> "thumb": { >>> >>> "w": 150, >>> >>> "resize": "crop", >>> >>> "h": 150 >>> >>> }, >>> >>> "small": { >>> >>> "w": 340, >>> >>> "resize": "fit", >>> >>> "h": 255 >>> >>> }, >>> >>> "large": { >>> >>> "w": 1024, >>> >>> "resize": "fit", >>> >>> "h": 768 >>> >>> }, >>> >>> "medium": { >>> >>> "w": 600, >>> >>> "resize": "fit", >>> >>> "h": 450 >>> >>> } >>> >>> }, >>> >>> "source_user_id": 16864598, >>> >>> "media_url": "http://pbs.twimg.com/media/CRSL2MQWwAAP4Qo.jpg", >>> >>> "display_url": "pic.twitter.com/i3004WyF4g", >>> >>> "type": "photo", >>> >>> "url": "http://t.co/i3004WyF4g", >>> >>> "id": 654301608994586624, >>> >>> "media_url_https": "https://pbs.twimg.com/media/CRSL2MQWwAAP4Qo.jpg", >>> >>> "expanded_url": " >>> http://twitter.com/lordlancaster/status/654301626665189376/photo/1", >>> >>> "source_user_id_str": "16864598", >>> >>> "indices": [143, >>> >>> 144], >>> >>> "source_status_id_str": "654301626665189376", >>> >>> "source_status_id": 654301626665189376, >>> >>> "id_str": "654301608994586624" >>> >>> }, >>> >>> { >>> >>> "sizes": { >>> >>> "thumb": { >>> >>> "w": 150, >>> >>> "resize": "crop", >>> >>> "h": 150 >>> >>> }, >>> >>> "small": { >>> >>> "w": 340, >>> >>> "resize": "fit", >>> >>> "h": 255 >>> >>> }, >>> >>> "large": { >>> >>> "w": 1024, >>> >>> "resize": "fit", >>> >>> "h": 768 >>> >>> }, >>> >>> "medium": { >>> >>> "w": 600, >>> >>> "resize": "fit", >>> >>> "h": 450 >>> >>> } >>> >>> }, >>> >>> "source_user_id": 16864598, >>> >>> "media_url": "http://pbs.twimg.com/media/CRSL2M8WcAEXowZ.jpg", >>> >>> "display_url": "pic.twitter.com/i3004WyF4g", >>> >>> "type": "photo", >>> >>> "url": "http://t.co/i3004WyF4g", >>> >>> "id": 654301609179115521, >>> >>> "media_url_https": "https://pbs.twimg.com/media/CRSL2M8WcAEXowZ.jpg", >>> >>> "expanded_url": " >>> http://twitter.com/lordlancaster/status/654301626665189376/photo/1", >>> >>> "source_user_id_str": "16864598", >>> >>> "indices": [143, >>> >>> 144], >>> >>> "source_status_id_str": "654301626665189376", >>> >>> "source_status_id": 654301626665189376, >>> >>> "id_str": "654301609179115521" >>> >>> }] >>> >>> } >>> >>> } >>> >>> >>> Appreciate any help! >>> >>> >>> *Thanks,* >>> >>> *Joel* >>> >>> >>> >>> >
FILTER_TWEETS.pig
Description: Binary data
