Hi,

I have been dealing with some heavily nested and complex JSON data. It has all sorts of combinations, like: struct<..., results: array<struct<..., scores: array<int>>>, ...>
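
To make that shape concrete, here is a minimal Python sketch (an analogue, not Hive code) of a record with this structure. It also shows the flattening any query over it ends up doing: one output row per score. In Hive this is roughly what chaining LATERAL VIEW explode(results) with LATERAL VIEW explode(r.scores) produces. Only `results` and `scores` come from the structure above; `id` and `name` are hypothetical stand-ins for the elided fields.

```python
import json
from typing import Iterator, Tuple

# Hypothetical record shaped like
# struct<..., results: array<struct<..., scores: array<int>>>, ...>.
# "id" and "name" are made-up placeholders for the elided fields.
RAW = """
{"id": 1,
 "results": [
   {"name": "a", "scores": [90, 75]},
   {"name": "b", "scores": [60]},
   {"name": "c", "scores": []}
 ]}
"""


def flatten_scores(record: dict) -> Iterator[Tuple[int, str, int]]:
    """Yield one (id, name, score) row per score.

    The outer loop plays the role of exploding `results`, the inner loop
    of exploding each result's `scores`. As with a plain (non-OUTER)
    explode, a result whose `scores` array is empty yields no rows.
    """
    for result in record.get("results", []):
        for score in result.get("scores", []):
            yield record["id"], result["name"], score


if __name__ == "__main__":
    for row in flatten_scores(json.loads(RAW)):
        print(row)
```

Running it prints three rows, (1, 'a', 90), (1, 'a', 75) and (1, 'b', 60); result "c" contributes none because its scores array is empty.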

I wanted to know which approach you find better: using a SerDe or using UDFs. In my opinion, the two approaches compare as follows. Please correct me if your experience has been different in any respect.

Approach 1: UDF based
Approach 2: SerDe based

Table schema
  UDF based: Very small.
  SerDe based: Big. It mirrors the JSON structure, so its size is directly proportional to how complex/heavily nested the JSON is.

Size/verbosity of query
  UDF based: More verbose, especially if using 'lateral view json_tuple(..)'.
  SerDe based: Less verbose.

Maintenance effort when the JSON structure changes
  UDF based: Update the query (optional)*.
  SerDe based: Update the schema; update the query (optional)*.

Processing heavily nested/complex JSON
  UDF based: May need to write a couple of custom UDFs, but all in all it is possible.
  SerDe based: The SerDes available out there need patching, as they are not mature.**

* Only if the field that was added, or whose position changed, needs to be queried.
** Do you know any JSON SerDes that are robust enough to process complex JSON?

I seem to find more examples of the UDF-based approach. If the SerDes were less error-prone, the queries would end up being really succinct.

Regards,
Himanshu