Hi,

I'm writing some Hive UDFs, using JNI to talk to a native C library. The C 
library requires some expensive initialization, and maintains its internal 
state via a handle. To avoid re-initializing this library for every row, I 
initialize the library on the first row, then store the handle as a static 
variable in the Java world and fetch that for subsequent rows. This is all 
working fine.
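
To make the setup concrete, here is a minimal sketch of the pattern. The 
class and method names are placeholders, and the native call is stubbed in 
plain Java so the snippet runs on its own; the real code declares it as a 
JNI 'native' method instead.

```java
import java.util.concurrent.atomic.AtomicInteger;

class NativeHandleCache {
    // Counts how often the (stubbed) expensive initialization runs.
    static final AtomicInteger initCalls = new AtomicInteger();

    // Hypothetical stand-in for the JNI call. In the real UDF this would be
    // something like: private static native long nativeInit();
    static long nativeInit() {
        initCalls.incrementAndGet();
        return 42L; // pretend handle returned by the C library
    }

    // 0 means "library not initialized yet".
    private static long handle = 0L;

    // Initializes on first use, then returns the cached handle.
    // 'synchronized' only guards the initialization itself; it does not
    // make concurrent use of the handle safe.
    static synchronized long getHandle() {
        if (handle == 0L) {
            handle = nativeInit();
        }
        return handle;
    }

    // Stand-in for the UDF's evaluate(): called once per row.
    static int evaluate(String row) {
        long h = getHandle(); // would be passed to the native library
        return row.length();
    }
}
```

Calling evaluate() for many rows runs nativeInit() only once.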

The tough part is that the library also requires the caller to do cleanup, to 
release that internal state. Java has no destructors, of course, and I can't 
rely on 'finalize', so I can't figure out where to clean up this library.

Q 1: Is there anything in the Hive + UDF world that will tell my Java code when 
the query is finished, so that I can clean up that library? Or is there any 
Java mechanism that I can use to do this?
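
One candidate I'm aware of is a JVM shutdown hook 
(Runtime.addShutdownHook). It fires when the JVM exits, not when the query 
finishes, and I'm only assuming that the JVM running the UDF shuts down 
normally. A rough sketch, again with placeholder names and the native calls 
stubbed in Java:

```java
import java.util.concurrent.atomic.AtomicInteger;

class NativeCleanupHook {
    // Counts how often the (stubbed) native release runs.
    static final AtomicInteger releaseCalls = new AtomicInteger();

    private static long handle = 0L;           // 0 means "not initialized"
    private static boolean hookRegistered = false;

    // Hypothetical stand-ins for the JNI init/release calls.
    static long nativeInit() { return 42L; }
    static void nativeRelease(long h) { releaseCalls.incrementAndGet(); }

    static synchronized long getHandle() {
        if (handle == 0L) {
            handle = nativeInit();
            if (!hookRegistered) {
                // Ask the JVM to call release() during normal shutdown.
                Runtime.getRuntime().addShutdownHook(
                        new Thread(NativeCleanupHook::release));
                hookRegistered = true;
            }
        }
        return handle;
    }

    // Idempotent: releases the native state at most once per initialization.
    static synchronized void release() {
        if (handle != 0L) {
            nativeRelease(handle);
            handle = 0L;
        }
    }
}
```

The hook does not run if the process is killed abruptly, so this would be a 
best-effort cleanup at most.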

I'm using the 'UDF' class, not 'GenericUDF', but I don't think that matters. I 
don't see anything in either that looks like a cleanup hook, and GenericUDF's 
'close' doesn't ever get called, AFAICT.

Q 2: Because I'm storing the library's internal state handle as a static 
variable in the Java code, it would be available to any thread that uses the 
Java code. That would be a problem if several threads used it at once. So, my 
question is: Will a single UDF instance ever be accessed by more than one 
thread? In other words, are UDFs thread-safe? Even if the query contains 
multiple UDF calls? I need to know whether my assumption about being able to 
store this C library's state as a Java 'static' is safe or not.
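
If the answer turns out to be that threads can share the class, the fallback 
I'm considering is one handle per thread via ThreadLocal instead of a plain 
static. This is only a sketch under that assumption; the names are 
placeholders and the native call is stubbed so that each initialization 
returns a distinct handle.

```java
import java.util.concurrent.atomic.AtomicLong;

class PerThreadHandle {
    private static final AtomicLong nextHandle = new AtomicLong(1);

    // Hypothetical stand-in for the JNI init call; each call hands back a
    // new, distinct handle.
    static long nativeInit() {
        return nextHandle.getAndIncrement();
    }

    // Each thread lazily initializes and then keeps its own handle.
    private static final ThreadLocal<Long> HANDLE =
            ThreadLocal.withInitial(PerThreadHandle::nativeInit);

    static long getHandle() {
        return HANDLE.get();
    }
}
```

This multiplies the initialization cost by the number of threads and makes 
the cleanup question harder, since each thread's handle would need its own 
release.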



Thanks in advance

-c
