Hi, I'm writing some Hive UDFs, using JNI to talk to a native C library. The C library requires some expensive initialization, and maintains its internal state via a handle. To avoid re-initializing this library at every row, I initialize the library on the first row, then store the handle as a static variable in the Java world and fetch that for subsequent rows. This is all working fine.
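For reference, here is a minimal sketch of the pattern described above. The names `FakeNativeLib`, `init`, and `process` are hypothetical stand-ins for the real JNI bindings (which would be `native` methods), and the class is plain Java rather than an actual Hive `UDF` subclass so that it runs on its own. The "0 means uninitialized" convention is also just an assumption for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyHandleSketch {

    // Hypothetical stand-in for the JNI layer; the real code would declare
    // `native` methods and load the C library with System.loadLibrary.
    static final class FakeNativeLib {
        static final AtomicInteger initCalls = new AtomicInteger();

        // Pretend this is the expensive native initialization.
        static long init() {
            initCalls.incrementAndGet();
            return 42L; // opaque handle value, arbitrary for this sketch
        }

        // Pretend this is the per-row native call that needs the handle.
        static String process(long handle, String input) {
            return input.toUpperCase();
        }
    }

    // Handle kept in a static field; 0 means "not initialized yet" (assumed).
    // Note there is no synchronization and no cleanup anywhere.
    private static long handle = 0L;

    // Mirrors what the UDF's evaluate() does for each row.
    public static String evaluate(String input) {
        if (input == null) {
            return null;
        }
        if (handle == 0L) {
            handle = FakeNativeLib.init(); // only happens on the first row
        }
        return FakeNativeLib.process(handle, input);
    }

    public static void main(String[] args) {
        for (String row : new String[] {"a", "b", "c"}) {
            System.out.println(evaluate(row));
        }
        System.out.println("init calls: " + FakeNativeLib.initCalls.get());
    }
}
```

The handle is created once and reused for every later row, but nothing ever releases it, and nothing stops two threads from touching it at the same time.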
The tough part is that the library also requires the caller to clean up and release that internal state. Java has no destructors, of course, and I can't rely on 'finalize', so I can't figure out where to clean up this library.

Q1: Is there anything in the Hive + UDF world that will tell my Java code when the query is finished, so that I can clean up the library? Or is there any Java mechanism I can use to do this? I'm using the 'UDF' class, not 'GenericUDF', but I don't think that matters. I don't see anything in either that looks like a cleanup hook, and GenericUDF's 'close' never seems to get called, AFAICT.

Q2: Because I'm storing the library's internal state handle as a static variable in the Java code, it is available to any thread that uses that code, which would be a problem. So my question is: will a single UDF instance ever be accessed by more than one thread? In other words, are UDFs thread-safe, even if the query contains multiple UDF calls? I need to know whether storing this C library's state in a Java 'static' is a safe assumption.

Thanks in advance
-c