Hi,

> If you have any tutorial for extracting data from complex nested json
> arrays (as the example given in my previous email), please send it.

90% of working with the real world is cleansing bad data. People under-sell
hive's flexibility in situations like this.

This is what I do

hive> compile `
import org.apache.hadoop.hive.ql.exec.UDF \;
import groovy.json.JsonSlurper \;
import org.apache.hadoop.io.Text \;
public class JsonExtract extends UDF {
  public int evaluate(Text a){
    def jsonSlurper = new JsonSlurper() \;
    def obj = jsonSlurper.parseText(a.toString())\;
    return obj.val1\;
  }
} ` AS GROOVY NAMED json_extract.groovy;

hive> CREATE TEMPORARY FUNCTION json_extract as 'JsonExtract';

hive> select json_extract('{"val1": 2}') from date_dim limit 1;

select json_extract('{"val1": 2}') from date_dim limit 1
OK
2
Time taken: 0.13 seconds, Fetched: 1 row(s)

Caveats - this generates bytecode at runtime, so keep an eye on the

hive> list jars;

Because there's no real namespacing, naming your classes/functions the same
while developing can drive you crazy (a little).

Cheers,
Gopal