Hi all,

I wanted to let you know about a potential bug I've run into when loading custom JARs dynamically (i.e. "ADD JAR /path/to/jar"). Hopefully someone can tell me whether this is a bug, expected behavior, or something I'm causing myself :)

We have a custom StorageHandler that we're updating from Hive 1.2.1 to Hive 3.0.0. During testing we found that, under some circumstances, queries against tables backed by our StorageHandler return result sets with 'NULL' in every cell. Digging in, we found that our SerDe's deserialize() method was returning null after a failed "instanceof" sanity check on the input Writable. Debugging further, we found that the two "instanceof" operands had the same class name and package, but had been loaded by two different UDFClassLoader instances. Since the JVM treats the same class loaded by two different classloaders as two distinct types, the check fails.

This behavior seems suspiciously like what was warned against in an early comment on HIVE-11878, when UDFClassLoader was introduced, so I'm 99% sure it is unintended. (see: https://issues.apache.org/jira/browse/HIVE-11878?focusedCommentId=14876858&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14876858)

The behavior is reproducible with the following steps:

1. Find a custom StorageHandler to use. I wrote a stub StorageHandler here (https://github.com/gerlowskija/hive-bug-serde/) which reproduces the issue.

2. Create a table using the StorageHandler:

   hive -n $hive_user -p $hive_pass -e "ADD JAR /tmp/mycustomserde.jar; CREATE EXTERNAL TABLE my_ext_table (hello_col STRING, world_col STRING) STORED BY 'com.helloworld.serde.HelloWorldStorageHandler' LOCATION '/tmp/some_dir';"

3. Put some data in your external table:

   hive -n $hive_user -p $hive_pass -e "ADD JAR /tmp/mycustomserde.jar; INSERT INTO my_ext_table VALUES ('hello', 'world');"

4. Query your external table:

   hive -n $hive_user -p $hive_pass -e "ADD JAR /tmp/mycustomserde.jar; SELECT * FROM my_ext_table;"

Depending on the custom SerDe you're using, the bug might show up differently. But most SerDes, which cast the "Writable" argument to a specific Writable implementation in their deserialize method, will print a table full of 'NULL' values. (The provided stub StorageHandler shows the bug this way. It also logs the "instanceof" operands out to hiveserver2.log, making the behavior clearer: "Received unexpected Writable class. Expected com.helloworld.serde.HelloWorldWritable from classloader org.apache.hadoop.hive.ql.exec.UDFClassLoader@489d24e9, but actually was com.helloworld.serde.HelloWorldWritable from classloader org.apache.hadoop.hive.ql.exec.UDFClassLoader@75517e2b").

I've written the behavior and reproduction steps up in more detail here: https://github.com/gerlowskija/hive-bug-serde/.

Please let me know if this is a true bug in Hive, as I suspect, or if there's something I can do to avoid these classloader conflicts.

Thanks,
Jason
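
P.S. In case it helps anyone see the underlying JVM behavior without a Hive install, below is a minimal standalone sketch of why an "instanceof" check can fail when one class is defined by two classloaders. Everything in it is hypothetical and made up for illustration: LoaderIdentityDemo, Payload (standing in for a custom Writable), and IsolatingLoader are not Hive classes, and IsolatingLoader is not how UDFClassLoader is implemented. It only assumes that the compiled .class files are on the normal classpath so they can be read back as resources.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class LoaderIdentityDemo {

    /** Hypothetical stand-in for a custom Writable implementation. */
    public static class Payload {}

    /**
     * Hypothetical loader that defines one target class itself instead of
     * delegating to its parent, so each instance yields its own copy.
     */
    static class IsolatingLoader extends ClassLoader {
        private final String target;

        IsolatingLoader(String target, ClassLoader parent) {
            super(parent);
            this.target = target;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            if (!name.equals(target)) {
                // Everything else (java.lang.Object, etc.) uses normal delegation.
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readClassBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Reads the target's bytecode from the parent's classpath. */
        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads the named class through a fresh IsolatingLoader. */
    public static Class<?> loadIsolated(String name) throws ClassNotFoundException {
        ClassLoader parent = LoaderIdentityDemo.class.getClassLoader();
        return new IsolatingLoader(name, parent).loadClass(name);
    }

    public static void main(String[] args) throws Exception {
        String name = Payload.class.getName();
        Class<?> a = loadIsolated(name);   // copy defined by loader A
        Class<?> b = loadIsolated(name);   // copy defined by loader B
        Object fromB = b.getDeclaredConstructor().newInstance();

        System.out.println("same name:           " + a.getName().equals(b.getName()));
        System.out.println("same Class object:   " + (a == b));
        // isInstance is the reflective equivalent of "instanceof".
        System.out.println("a.isInstance(fromB): " + a.isInstance(fromB));
        System.out.println("b.isInstance(fromB): " + b.isInstance(fromB));
    }
}
```

Compiled with javac and run from the output directory, this should print true, false, false, true: the two classes share a name, but the object created from loader B's copy is not an instance of loader A's copy. That mirrors the log message above, where the expected and actual Writable have identical names but different UDFClassLoader instances.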