Hadoop tasks use a single thread, so there won't be multiple threads
accessing the UDF.
However, there's a flip side to thread safety if your UDF maintains state:
is it receiving all the data it should, or is the data being sharded over
multiple processes in a way that defeats the UDF? My favorite
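
To make the sharding concern concrete, here is a minimal, self-contained Java sketch. It does not use the Hive API: RowSequence is a hypothetical stand-in for a stateful UDF, and lastSequenceFor is a made-up helper that simulates one single-threaded task running its own instance over one shard of rows. The assumption illustrated is only that each task gets its own instance, so state is never shared across tasks.

```java
import java.util.Arrays;
import java.util.List;

public class ShardedStateDemo {

    // Hypothetical stand-in for a stateful UDF: numbers the rows it sees.
    static class RowSequence {
        // Unsynchronized state; fine only because a single thread calls evaluate().
        private long next = 0;

        long evaluate(String row) {
            return ++next;
        }
    }

    // Simulates one task: a fresh instance processes one shard on one thread
    // and returns the last sequence number it handed out.
    static long lastSequenceFor(List<String> shard) {
        RowSequence udf = new RowSequence();
        long last = 0;
        for (String row : shard) {
            last = udf.evaluate(row);
        }
        return last;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c", "d", "e", "f");

        // Two tasks, each seeing half the rows: both number their rows 1..3,
        // so the sequence is not unique across the whole data set.
        System.out.println(lastSequenceFor(rows.subList(0, 3))); // prints 3
        System.out.println(lastSequenceFor(rows.subList(3, 6))); // prints 3

        // One task seeing every row: the numbering covers the full data set.
        System.out.println(lastSequenceFor(rows)); // prints 6
    }
}
```

No locking is needed in either case; the difference is purely in how much of the data a single instance gets to see.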
Yes, in a map-only query your UDF will be executed on the mapper side.
I don't know how you can make your UDF thread safe. What I do is set the
number of reducers to 1 and make sure that I write a query which has both map
and reduce phases.
Then the UDF will be executed in the reduce phase and suf
Hi All,
Could anyone describe what the required thread safety for a UDF is? I
understand that one is instantiated for each use of the function in an
expression, but can there be multiple threads executing the methods of a
single UDF object at once?
Thanks,
Shaun