I mean that I don't know what the return hive datatype of json_tuple is. If it is returning a json array based on the hive string datatype, explode() isn't going to be happy with that and it will throw the error that you are getting.
if you do a create table as statement, you can then check the resulting datatype of that table to see exactly what is returned from json_tuple(). If json_tuple() is returning a hive array datatype, then I'm not sure why your original query isn't working correctly unless you possibly need to wrap it in a sub-query...either way, the next step I would take in troubleshooting would be to use CTAS to a) check the datatype of the 'media' column if it isn't an array, you might checkout https://github.com/brndnmtthws/facebook-hive-udfs/blob/master/src/main/java/com/facebook/hive/udf/UDFJsonAsArray.java b) if 'media' turns out to be an array datatype you just need to troubleshoot the explode().... From: Sam Joe [mailto:[email protected]] Sent: Tuesday, October 27, 2015 7:37 PM To: [email protected] Subject: Re: Using json_tuple for Nested json Arrays Hi Ryan, I think tr3.media a complex json array having nested json tuple objects. For example, sizes is a json tuple object present inside the array which I think the function EXPLODE is not expecting. May be the explode function is expecting a closing brace '}' corresponding to the first brace '{' instead of another opening brace. So, could this be a bug? Thanks, Joel On Tue, Oct 27, 2015 at 5:22 PM, Ryan Harris <[email protected]<mailto:[email protected]>> wrote: hmmm...I'm not sure what the return value type of json_tuple is... I'd probably try creating a temporary table from your working query below and then work on getting the lateral view explode to work against the temp table. FAILED: UDFArgumentException explode() takes an array or a map as a parameter Apparently, hive doesn't think tr3.media is an array or map..so what is it? From: Sam Joe [mailto:[email protected]<mailto:[email protected]>] Sent: Tuesday, October 27, 2015 2:56 PM To: [email protected]<mailto:[email protected]> Subject: Re: Using json_tuple for Nested json Arrays Hi Ryan, The statement returns null for media as shown below: hive> SELECT tr2.id<http://tr2.id>, tr2.possibly_sensitive, tr2.media > FROM tweets_raw tr1 > LATERAL VIEW json_tuple(tr1.text_col, 'id', 'extended_entities', 'possibly_sensitive', 'extended_entities.media') tr2 as id, extended_entities, possibly_sensitive, media > where tr2.id<http://tr2.id>='654395184428515332' > LIMIT 1; OK 654395184428515332 false NULL I am able to return the media json array using the following script: hive> SELECT tr2.id<http://tr2.id>, tr2.possibly_sensitive, tr3.media > FROM tweets_raw tr1 > LATERAL VIEW json_tuple(tr1.text_col, 'id', 'extended_entities', 'possibly_sensitive') tr2 as id, extended_entities, possibly_sensitive > LATERAL VIEW json_tuple(tr2.extended_entities, 'media') tr3 as media > where tr2.id<http://tr2.id>='654395184428515332' > LIMIT 1; OK 654395184428515332 false [{"sizes":{"thumb":{"w":150,"resize":"crop","h":150},"small":{"w":340,"resize":"fit","h":255},"large":{"w":1024,"resize":"fit","h":768},"medium":{"w":600,"resize":"fit","h":450}},"source_user_id":16864598,"media_url":"http://pbs.twimg.com/media/CRSL2MPWsAAOnZo.jpg","display_url":"pic.twitter.com/i3004WyF4g<http://pic.twitter.com/i3004WyF4g>","type":"photo","url":"http://t.co/i3004WyF4g","id":654301608990388224,"media_url_https":"https://pbs.twimg.com/media/CRSL2MPWsAAOnZo.jpg","expanded_url":"http://twitter.com/lordlancaster/status/654301626665189376/photo/1","source_user_id_str":"16864598","indices":[143,144],"source_status_id_str":"654301626665189376","source_status_id":654301626665189376,"id_str":"654301608990388224"},{"sizes":{"thumb":{"w":150,"resize":"crop","h":150},"small":{"w":340,"resize":"fit","h":255},"large":{"w":1024,"resize":"fit","h":768},"medium":{"w":600,"resize":"fit","h":450}},"source_user_id":16864598,"media_url":"http://pbs.twimg.com/media/CRSL2MRWgAAGOcj.jpg","display_url":"pic.twitter.com/i3004WyF4g<http://pic.twitter.com/i3004WyF4g>","type":"photo","url":"http://t.co/i3004WyF4g","id":654301608998764544,"media_url_https":"https://pbs.twimg.com/media/CRSL2MRWgAAGOcj.jpg","expanded_url":"http://twitter.com/lordlancaster/status/654301626665189376/photo/1","source_user_id_str":"16864598","indices":[143,144],"source_status_id_str":"654301626665189376","source_status_id":654301626665189376,"id_str":"654301608998764544"},{"sizes":{"thumb":{"w":150,"resize":"crop","h":150},"small":{"w":340,"resize":"fit","h":255},"large":{"w":1024,"resize":"fit","h":768},"medium":{"w":600,"resize":"fit","h":450}},"source_user_id":16864598,"media_url":"http://pbs.twimg.com/media/CRSL2MQWwAAP4Qo.jpg","display_url":"pic.twitter.com/i3004WyF4g<http://pic.twitter.com/i3004WyF4g>","type":"photo","url":"http://t.co/i3004WyF4g","id":654301608994586624,"media_url_https":"https://pbs.twimg.com/media/CRSL2MQWwAAP4Qo.jpg","expanded_url":"http://twitter.com/lordlancaster/status/654301626665189376/photo/1","source_user_id_str":"16864598","indices":[143,144],"source_status_id_str":"654301626665189376","source_status_id":654301626665189376,"id_str":"654301608994586624"},{"sizes":{"thumb":{"w":150,"resize":"crop","h":150},"small":{"w":340,"resize":"fit","h":255},"large":{"w":1024,"resize":"fit","h":768},"medium":{"w":600,"resize":"fit","h":450}},"source_user_id":16864598,"media_url":"http://pbs.twimg.com/media/CRSL2M8WcAEXowZ.jpg","display_url":"pic.twitter.com/i3004WyF4g<http://pic.twitter.com/i3004WyF4g>","type":"photo","url":"http://t.co/i3004WyF4g","id":654301609179115521,"media_url_https":"https://pbs.twimg.com/media/CRSL2M8WcAEXowZ.jpg","expanded_url":"http://twitter.com/lordlancaster/status/654301626665189376/photo/1","source_user_id_str":"16864598","indices":[143,144],"source_status_id_str":"654301626665189376","source_status_id":654301626665189376,"id_str":"654301609179115521"}] Time taken: 0.191 seconds, Fetched: 1 row(s) hive> However, I would like to know how I can access the json tuples present inside the media json array. Appreciate your help! Thanks, Joel On Tue, Oct 27, 2015 at 4:44 PM, Ryan Harris <[email protected]<mailto:[email protected]>> wrote: I see where you are going with this now.... Not sure if you might be bumping into this bug: https://issues.apache.org/jira/browse/HIVE-1575 since this line LATERAL VIEW json_tuple(tr2.extended_entities, 'media') tr3 as media pulls the JSON array as a "top-level" object... does this not work? LATERAL VIEW json_tuple(tr1.text_col, 'id', 'extended_entities', 'possibly_sensitive', 'extended_entities.media') tr2 as id, extended_entities, possibly_sensitive, media From: Sam Joe [mailto:[email protected]<mailto:[email protected]>] Sent: Tuesday, October 27, 2015 2:25 PM To: [email protected]<mailto:[email protected]> Subject: Re: Using json_tuple for Nested json Arrays Hi Ryan, The simple query is running fine as shown below: hive> SELECT tr2.id<http://tr2.id>, tr2.possibly_sensitive > FROM tweets_raw tr1 > LATERAL VIEW json_tuple(tr1.text_col, 'id', 'extended_entities', 'possibly_sensitive') tr2 as id, extended_entities, possibly_sensitive > where tr2.id<http://tr2.id>='654395184428515332' > LIMIT 1; OK 654395184428515332 false Time taken: 1.813 seconds, Fetched: 1 row(s) hive> However, if I try to get any data from the json array it's failing. Thanks, Joel On Tue, Oct 27, 2015 at 4:21 PM, Ryan Harris <[email protected]<mailto:[email protected]>> wrote: looking at your sample data, you shouldn't need to use lateral view explode unless you are trying to get 1 entry per row for your media sizes (thumb, small, large, medium, etc) ... Try starting with something simple like : SELECT get_json_object(text_col, '$.id') as id FROM tweets_raw limit 10; You should also be able to use json_tuple(), but start simple.... From: Sam Joe [mailto:[email protected]<mailto:[email protected]>] Sent: Tuesday, October 27, 2015 1:43 PM To: [email protected]<mailto:[email protected]> Subject: Re: Using json_tuple for Nested json Arrays Hi Ryan, Thanks for your reply! I didn't try using get_json_object() UDF. I will try using that and let you know the results. I tried using the following script which failed : SELECT tr2.id<http://tr2.id>, tr2.possibly_sensitive, tr3.media, media_object.source_user_id FROM tweets_raw tr1 LATERAL VIEW json_tuple(tr1.text_col, 'id', 'extended_entities', 'possibly_sensitive') tr2 as id, extended_entities, possibly_sensitive LATERAL VIEW json_tuple(tr2.extended_entities, 'media') tr3 as media LATERAL VIEW EXPLODE(tr3.media) media_exploded as media_object where tr2.id<http://tr2.id>='654395184428515332' LIMIT 1; FAILED: UDFArgumentException explode() takes an array or a map as a parameter Thanks, Joel On Tue, Oct 27, 2015 at 3:37 PM, Ryan Harris <[email protected]<mailto:[email protected]>> wrote: Do you have an example of the query that you tried (which failed). In short, you probably want to use the get_json_object() UDF: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-get_json_object if you need the JSON array broken into individual records, you might require lateral view explode as in these examples: http://mechanics.flite.com/blog/2014/04/16/using-explode-and-lateral-view-in-hive/ http://stackoverflow.com/questions/28716165/how-to-query-struct-array-with-hive-get-json-object From: Sam Joe [mailto:[email protected]<mailto:[email protected]>] Sent: Tuesday, October 27, 2015 1:29 PM To: [email protected]<mailto:[email protected]> Subject: Re: Using json_tuple for Nested json Arrays I tried using EXPLODE function on the nested json array but it doesn't work and throws following error: FAILED: UDFArgumentException explode() takes an array or a map as a parameter Thanks, Joel On Tue, Oct 27, 2015 at 3:20 PM, Sam Joe <[email protected]<mailto:[email protected]>> wrote: Hi, Is it possible to use json_tuple function to extract data from json arrays (nested too). I am trying to process json data as string and avoid using serdes since user data may be malformed. Please see a sample json data given below: { "filter_level": "low", "retweeted": false, "in_reply_to_screen_name": null, "possibly_sensitive": false, "truncated": false, "lang": "en", "in_reply_to_status_id_str": null, "id": 654395184428515332, "extended_entities": { "media": [{ "sizes": { "thumb": { "w": 150, "resize": "crop", "h": 150 }, "small": { "w": 340, "resize": "fit", "h": 255 }, "large": { "w": 1024, "resize": "fit", "h": 768 }, "medium": { "w": 600, "resize": "fit", "h": 450 } }, "source_user_id": 16864598, "media_url": "http://pbs.twimg.com/media/CRSL2MPWsAAOnZo.jpg", "display_url": "pic.twitter.com/i3004WyF4g<http://pic.twitter.com/i3004WyF4g>", "type": "photo", "url": "http://t.co/i3004WyF4g", "id": 654301608990388224, "media_url_https": "https://pbs.twimg.com/media/CRSL2MPWsAAOnZo.jpg", "expanded_url": "http://twitter.com/lordlancaster/status/654301626665189376/photo/1", "source_user_id_str": "16864598", "indices": [143, 144], "source_status_id_str": "654301626665189376", "source_status_id": 654301626665189376, "id_str": "654301608990388224" }, { "sizes": { "thumb": { "w": 150, "resize": "crop", "h": 150 }, "small": { "w": 340, "resize": "fit", "h": 255 }, "large": { "w": 1024, "resize": "fit", "h": 768 }, "medium": { "w": 600, "resize": "fit", "h": 450 } }, "source_user_id": 16864598, "media_url": "http://pbs.twimg.com/media/CRSL2MRWgAAGOcj.jpg", "display_url": "pic.twitter.com/i3004WyF4g<http://pic.twitter.com/i3004WyF4g>", "type": "photo", "url": "http://t.co/i3004WyF4g", "id": 654301608998764544, "media_url_https": "https://pbs.twimg.com/media/CRSL2MRWgAAGOcj.jpg", "expanded_url": "http://twitter.com/lordlancaster/status/654301626665189376/photo/1", "source_user_id_str": "16864598", "indices": [143, 144], "source_status_id_str": "654301626665189376", "source_status_id": 654301626665189376, "id_str": "654301608998764544" }, { "sizes": { "thumb": { "w": 150, "resize": "crop", "h": 150 }, "small": { "w": 340, "resize": "fit", "h": 255 }, "large": { "w": 1024, "resize": "fit", "h": 768 }, "medium": { "w": 600, "resize": "fit", "h": 450 } }, "source_user_id": 16864598, "media_url": "http://pbs.twimg.com/media/CRSL2MQWwAAP4Qo.jpg", "display_url": "pic.twitter.com/i3004WyF4g<http://pic.twitter.com/i3004WyF4g>", "type": "photo", "url": "http://t.co/i3004WyF4g", "id": 654301608994586624, "media_url_https": "https://pbs.twimg.com/media/CRSL2MQWwAAP4Qo.jpg", "expanded_url": "http://twitter.com/lordlancaster/status/654301626665189376/photo/1", "source_user_id_str": "16864598", "indices": [143, 144], "source_status_id_str": "654301626665189376", "source_status_id": 654301626665189376, "id_str": "654301608994586624" }, { "sizes": { "thumb": { "w": 150, "resize": "crop", "h": 150 }, "small": { "w": 340, "resize": "fit", "h": 255 }, "large": { "w": 1024, "resize": "fit", "h": 768 }, "medium": { "w": 600, "resize": "fit", "h": 450 } }, "source_user_id": 16864598, "media_url": "http://pbs.twimg.com/media/CRSL2M8WcAEXowZ.jpg", "display_url": "pic.twitter.com/i3004WyF4g<http://pic.twitter.com/i3004WyF4g>", "type": "photo", "url": "http://t.co/i3004WyF4g", "id": 654301609179115521, "media_url_https": "https://pbs.twimg.com/media/CRSL2M8WcAEXowZ.jpg", "expanded_url": "http://twitter.com/lordlancaster/status/654301626665189376/photo/1", "source_user_id_str": "16864598", "indices": [143, 144], "source_status_id_str": "654301626665189376", "source_status_id": 654301626665189376, "id_str": "654301609179115521" }] } } Appreciate any help! Thanks, Joel ________________________________ THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS CONFIDENTIAL and may contain information that is privileged and exempt from disclosure under applicable law. If you are neither the intended recipient nor responsible for delivering the message to the intended recipient, please note that any dissemination, distribution, copying or the taking of any action in reliance upon the message is strictly prohibited. If you have received this communication in error, please notify the sender immediately. Thank you. ________________________________ THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS CONFIDENTIAL and may contain information that is privileged and exempt from disclosure under applicable law. If you are neither the intended recipient nor responsible for delivering the message to the intended recipient, please note that any dissemination, distribution, copying or the taking of any action in reliance upon the message is strictly prohibited. If you have received this communication in error, please notify the sender immediately. Thank you. ________________________________ THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS CONFIDENTIAL and may contain information that is privileged and exempt from disclosure under applicable law. If you are neither the intended recipient nor responsible for delivering the message to the intended recipient, please note that any dissemination, distribution, copying or the taking of any action in reliance upon the message is strictly prohibited. If you have received this communication in error, please notify the sender immediately. Thank you. ________________________________ THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS CONFIDENTIAL and may contain information that is privileged and exempt from disclosure under applicable law. If you are neither the intended recipient nor responsible for delivering the message to the intended recipient, please note that any dissemination, distribution, copying or the taking of any action in reliance upon the message is strictly prohibited. If you have received this communication in error, please notify the sender immediately. Thank you. ====================================================================== THIS ELECTRONIC MESSAGE, INCLUDING ANY ACCOMPANYING DOCUMENTS, IS CONFIDENTIAL and may contain information that is privileged and exempt from disclosure under applicable law. If you are neither the intended recipient nor responsible for delivering the message to the intended recipient, please note that any dissemination, distribution, copying or the taking of any action in reliance upon the message is strictly prohibited. If you have received this communication in error, please notify the sender immediately. Thank you.
