Hi,

> If you have any tutorial for extracting data from complex nested json
>arrays (as the example given in my previous email), please send it.

90% of working with real-world data is cleansing bad data. People
undersell Hive's flexibility in situations like this.


This is what I do:

hive> compile `
import org.apache.hadoop.hive.ql.exec.UDF \;
import groovy.json.JsonSlurper \;
import org.apache.hadoop.io.Text \;
public class JsonExtract extends UDF {
  public int evaluate(Text a){
    def jsonSlurper = new JsonSlurper() \;
    def obj = jsonSlurper.parseText(a.toString())\;
    return  obj.val1\;
  }
} ` AS GROOVY NAMED json_extract.groovy;


hive> CREATE TEMPORARY FUNCTION json_extract as 'JsonExtract';


hive> select json_extract('{"val1": 2}') from date_dim limit 1;

select json_extract('{"val1": 2}') from date_dim limit 1
OK
2
Time taken: 0.13 seconds, Fetched: 1 row(s)
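
For nested arrays, the object returned by JsonSlurper can be walked
with ordinary Groovy list and map access, so the body of evaluate()
can reach as deep as it needs to. A minimal standalone sketch, run as
a plain Groovy script outside Hive; the sample document and its field
names (orders, sku, tags) are made up for illustration and are not
from your data:

```groovy
import groovy.json.JsonSlurper

// Hypothetical sample document; the field names are assumptions for illustration.
def json = '{"val1": 2, "orders": [{"sku": "a", "tags": ["x", "y"]}, {"sku": "b", "tags": ["z"]}]}'

// parseText returns nested Maps and Lists mirroring the JSON structure.
def obj = new JsonSlurper().parseText(json)

// Index into a nested array, then read a field of that element.
def firstSku = obj.orders[0].sku

// Spread operator: collect one field from every element of the array.
def skus = obj.orders*.sku

// Flatten the arrays nested inside each element into a single list.
def allTags = obj.orders.collectMany { it.tags }

println firstSku
println skus
println allTags
```

Inside the UDF you would return one of these values from evaluate(),
with a return type that matches what you pull out, and escape each
semicolon as \; as in the compile block above.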


Caveats - this generates bytecode at runtime, so keep an eye on the
output of

hive> list jars;

Because there's no real namespacing, naming your classes/functions the
same while developing can drive you crazy (a little).

Cheers,
Gopal







