It looks like I'm hitting this bug in jackson-core 2.2.3, which is bundled with the version of CDH I'm on: https://github.com/FasterXML/jackson-core/issues/115
Jackson-core 2.3.0 has the fix.

On Tue, Jan 24, 2017 at 5:14 PM, Andrew Ehrlich <and...@aehrlich.com> wrote:

> On Spark 1.6.0, calling json_tuple() with an emoji character in one of the
> values returns nulls:
>
> Input:
> """
> "myJsonBody": {
>   "field1": "📻"
> }
> """
>
> Query:
> """
> ...
> LATERAL VIEW JSON_TUPLE(e.myJsonBody,'field1') k AS field1,
> ...
>
> """
>
> This looks like a platform-dependent issue; the parsing works fine on my
> local computer (OSX, 1.6.3) and fails on the remote cluster(Centos7, 1.6.0)
>
> I noticed that in 1.6.0, json_tuple was implemented this way:
> https://github.com/apache/spark/pull/7946/files
>
> So far I have:
>
>    - Checked all java system properties related to charsets on drivers
>      and executors
>    - Turned up logging to debug level and checked for relevant messages
>
> Any more input? Should I try the dev mailing list?
>
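One detail that may be relevant, offered as a rough sketch rather than a diagnosis: the emoji in the example input lies outside the Basic Multilingual Plane, so it takes four bytes in UTF-8 and a surrogate pair in UTF-16. That this is what trips the older parser is my assumption; I have not confirmed it against the Jackson code. The Python snippet below only shows the encoding facts for that character, and the helper name `describe` is made up for illustration.

```python
def describe(ch: str) -> dict:
    """Return basic encoding facts for a single character."""
    utf16 = ch.encode("utf-16-be")
    # Split the big-endian UTF-16 bytes into 16-bit code units.
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf8_bytes": ch.encode("utf-8").hex(),
        "utf16_units": [f"{u:04X}" for u in units],
        # Code points above U+FFFF need a surrogate pair in UTF-16.
        "outside_bmp": ord(ch) > 0xFFFF,
    }


# The radio emoji from the example input, written as an escape so the
# snippet does not depend on the encoding of the file it is saved in.
info = describe("\U0001F4FB")
print(info)
```

Running the same helper on a plain ASCII character such as "a" gives one UTF-8 byte and one UTF-16 unit, which is a quick way to compare values that parse correctly against the one that comes back null.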