Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "Hive/GenericUDAFCaseStudy" page has been changed by ArvindPrabhakar.
http://wiki.apache.org/hadoop/Hive/GenericUDAFCaseStudy?action=diff&rev1=1&rev2=2

--------------------------------------------------

  
  == Writing the source ==
  
- As stated above, create a new file called 
`ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFHistogram.java`, 
relative to the Hive root directory. Please see the 
`ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFHistogramNumeric.java`
 for a detailed example of a UDAF.
+ This section gives a high-level outline of how to implement your own generic 
UDAF. For a concrete example, look at any of the existing UDAF sources in the 
`ql/src/java/org/apache/hadoop/hive/ql/udf/generic/` directory.
+ 
+ At a high level, there are two parts to implementing a generic UDAF. The 
first is to write an ''evaluator'', and the second is to create a ''resolver''. 
The evaluator is the actual implementation of the generic UDAF and contains the 
processing logic. The resolver, on the other hand, is how the query processing 
framework gets hold of the evaluator: it checks the types of the arguments in 
the function invocation and returns a suitable evaluator instance.
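+ 
+ The split can be pictured with a small, self-contained Java sketch. It is a 
hypothetical model, not Hive's API: the names `ResolverSketch`, `Evaluator`, 
`SumEvaluator` and `SumResolver` are made up for illustration, argument types 
are plain strings instead of Hive's `TypeInfo` objects, and errors are reported 
with `IllegalArgumentException` rather than Hive's `SemanticException`. The 
resolver only validates the argument types and hands back an evaluator; the 
evaluator holds the aggregation logic.

```java
import java.util.List;

public class ResolverSketch {

    // Hypothetical evaluator interface: holds the aggregation logic.
    interface Evaluator {
        void iterate(double value);
        double terminate();
    }

    // Hypothetical evaluator that sums its inputs.
    static final class SumEvaluator implements Evaluator {
        private double sum = 0.0;

        @Override
        public void iterate(double value) {
            sum += value;
        }

        @Override
        public double terminate() {
            return sum;
        }
    }

    // Hypothetical resolver: validates argument types, then returns an evaluator.
    static final class SumResolver {
        Evaluator getEvaluator(List<String> argumentTypes) {
            if (argumentTypes.size() != 1) {
                throw new IllegalArgumentException(
                        "Exactly one argument is expected, got " + argumentTypes.size());
            }
            String type = argumentTypes.get(0);
            switch (type) {
                case "int":
                case "bigint":
                case "double":
                    return new SumEvaluator();
                default:
                    throw new IllegalArgumentException("Unsupported argument type: " + type);
            }
        }
    }

    public static void main(String[] args) {
        // In Hive, resolution happens at query compile time; here it is just a call.
        Evaluator eval = new SumResolver().getEvaluator(List.of("int"));
        for (int v : new int[] {1, 2, 3}) {
            eval.iterate(v);
        }
        System.out.println(eval.terminate()); // prints 6.0
    }
}
```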
+ 
+ All evaluators must extend the abstract base class 
`org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator`. This class 
declares a few abstract methods that the extending class must implement 
(`getNewAggregationBuffer`, `reset`, `iterate`, `terminatePartial`, `merge` and 
`terminate`); together they establish the processing semantics of the UDAF. 
Please refer to the javadocs of these methods for their exact specifications.
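+ 
+ The way these methods fit together is easiest to see in a simplified model. 
The Java sketch below mirrors the method names of `GenericUDAFEvaluator` for an 
average, using a (sum, count) pair as the partial result. It is a deliberate 
simplification: real evaluators receive rows as `Object[]` read through 
`ObjectInspector`s, also override `init`, and have the framework decide which 
methods run in which task; the classes `EvaluatorSketch`, `AvgBuffer` and 
`AvgEvaluator` and the `average` driver are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class EvaluatorSketch {

    // Hypothetical per-group state, in the role of GenericUDAFEvaluator.AggregationBuffer.
    static final class AvgBuffer {
        double sum;
        long count;
    }

    // Hypothetical AVG evaluator using the same six method names.
    static final class AvgEvaluator {
        AvgBuffer getNewAggregationBuffer() {
            AvgBuffer buf = new AvgBuffer();
            reset(buf);
            return buf;
        }

        void reset(AvgBuffer buf) {
            buf.sum = 0.0;
            buf.count = 0L;
        }

        // Map side: consume one original value; NULLs are skipped, as AVG does.
        void iterate(AvgBuffer buf, Double value) {
            if (value != null) {
                buf.sum += value;
                buf.count++;
            }
        }

        // Map side: emit the partial aggregation as {sum, count}.
        double[] terminatePartial(AvgBuffer buf) {
            return new double[] {buf.sum, buf.count};
        }

        // Reduce side: fold in a partial produced by terminatePartial.
        void merge(AvgBuffer buf, double[] partial) {
            if (partial != null) {
                buf.sum += partial[0];
                buf.count += (long) partial[1];
            }
        }

        // Reduce side: final value, or null when no non-NULL input was seen.
        Double terminate(AvgBuffer buf) {
            return buf.count == 0 ? null : buf.sum / buf.count;
        }
    }

    // Simulates one "mapper" per split followed by a single "reducer".
    static Double average(List<List<Double>> splits) {
        AvgEvaluator eval = new AvgEvaluator();
        AvgBuffer reduceBuf = eval.getNewAggregationBuffer();
        for (List<Double> split : splits) {
            AvgBuffer mapBuf = eval.getNewAggregationBuffer();
            for (Double v : split) {
                eval.iterate(mapBuf, v);
            }
            eval.merge(reduceBuf, eval.terminatePartial(mapBuf));
        }
        return eval.terminate(reduceBuf);
    }

    public static void main(String[] args) {
        List<List<Double>> splits = Arrays.asList(
                Arrays.asList(1.0, 2.0, null),
                Arrays.asList(3.0, 6.0));
        System.out.println(average(splits)); // prints 3.0
    }
}
```

Because the partial result carries both the sum and the count, partials from any number of map tasks can be merged in any order; partial averages alone could not be combined correctly.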
+ 
+ A resolver is implemented either by implementing the interface 
`org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver2` or by extending the 
abstract class `org.apache.hadoop.hive.ql.udf.generic.AbstractGenericUDAFResolver`. 
There is also an interface `org.apache.hadoop.hive.ql.udf.generic.GenericUDAFResolver` 
that can be implemented, but it is deprecated as of the 0.6.0 release. The key 
difference between the `GenericUDAFResolver` and `GenericUDAFResolver2` 
interfaces is that the latter allows the resolver implementation to access 
extra information about the function invocation, such as the presence of the 
DISTINCT qualifier or the use of the wildcard syntax, as in FUNCTION(*). 
Resolvers that implement the deprecated `GenericUDAFResolver` interface cannot 
tell the difference between an invocation such as FUNCTION() and FUNCTION(*), 
since the information about the wildcard is not available to them. Similarly, 
they cannot tell the difference between FUNCTION(EXPR) and 
FUNCTION(DISTINCT EXPR), since the information about the DISTINCT qualifier is 
not available either.
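+ 
+ To make the difference concrete, here is a hypothetical Java sketch of a 
COUNT-like resolver that inspects the extra invocation information. 
`ParameterInfo` is a made-up stand-in for Hive's `GenericUDAFParameterInfo` 
(assumed here to expose the argument types plus `isDistinct()` and 
`isAllColumns()`), and the `Strategy` values are illustrative only. With the 
argument types alone, FUNCTION() and FUNCTION(*) would both arrive as an empty 
list and could not be told apart.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ParameterInfoSketch {

    // Hypothetical stand-in for Hive's GenericUDAFParameterInfo.
    static final class ParameterInfo {
        private final List<String> parameters;
        private final boolean distinct;
        private final boolean allColumns;

        ParameterInfo(List<String> parameters, boolean distinct, boolean allColumns) {
            this.parameters = parameters;
            this.distinct = distinct;
            this.allColumns = allColumns;
        }

        List<String> getParameters() { return parameters; }
        boolean isDistinct() { return distinct; }
        boolean isAllColumns() { return allColumns; }
    }

    // Hypothetical strategies a COUNT-like UDAF might choose between.
    enum Strategy { COUNT_ALL_ROWS, COUNT_NON_NULL }

    static Strategy resolve(ParameterInfo info) {
        if (info.isAllColumns()) {
            // FUNCTION(*): only isAllColumns() separates this from FUNCTION().
            if (info.isDistinct()) {
                throw new IllegalArgumentException("DISTINCT * is not supported");
            }
            return Strategy.COUNT_ALL_ROWS;
        }
        if (info.getParameters().isEmpty()) {
            // FUNCTION() with no arguments and no wildcard.
            throw new IllegalArgumentException("At least one argument is expected");
        }
        // FUNCTION(EXPR) and FUNCTION(DISTINCT EXPR) share one strategy here:
        // removing duplicate values is the framework's job, not the resolver's.
        return Strategy.COUNT_NON_NULL;
    }

    public static void main(String[] args) {
        List<String> none = Collections.emptyList();
        System.out.println(resolve(new ParameterInfo(none, false, true)));                    // prints COUNT_ALL_ROWS
        System.out.println(resolve(new ParameterInfo(Arrays.asList("string"), true, false))); // prints COUNT_NON_NULL
    }
}
```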
+ 
+ Note that while resolvers that implement the `GenericUDAFResolver2` interface 
are given the extra information about the presence of the DISTINCT qualifier 
or the wildcard syntax, they can choose to ignore it completely if it is of no 
significance to them. The underlying data manipulation that ensures the 
DISTINCT nature of the expression values is actually done by the framework, not 
by the evaluator or the resolver. UDAF implementations that do not care about 
this extra information can simply extend the `AbstractGenericUDAFResolver` 
class, which insulates the implementation from it. This class also offers an 
easy migration path for previously written UDAF implementations: changing a 
resolver from implementing the `GenericUDAFResolver` interface to extending the 
`AbstractGenericUDAFResolver` class requires only minimal changes and no 
rewrite of the implementation. Implementations that are part of an 
inheritance hierarchy may have trouble with this, since it may not be easy to 
change their base class.
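+ 
+ The adapter idea can be sketched in plain Java. All names below are 
hypothetical: `TypeOnlyResolver` plays the role of the deprecated 
`GenericUDAFResolver`, `InfoAwareResolver` that of `GenericUDAFResolver2`, and 
`AbstractResolver` that of `AbstractGenericUDAFResolver`. The sketch simply 
discards both flags before delegating; Hive's actual base class may validate 
them differently. In this model, migrating an old resolver such as 
`MaxResolver` means changing only its declaration, not its body.

```java
import java.util.Arrays;
import java.util.List;

public class AdapterSketch {

    // Hypothetical evaluator: just reports what it was built for.
    interface Evaluator {
        String name();
    }

    // Plays the role of the deprecated GenericUDAFResolver: argument types only.
    interface TypeOnlyResolver {
        Evaluator getEvaluator(List<String> argumentTypes);
    }

    // Plays the role of GenericUDAFResolver2: also receives the invocation flags.
    interface InfoAwareResolver extends TypeOnlyResolver {
        Evaluator getEvaluator(List<String> argumentTypes, boolean distinct, boolean allColumns);
    }

    // Plays the role of AbstractGenericUDAFResolver: drops the flags and
    // delegates to the old-style, type-only method.
    abstract static class AbstractResolver implements InfoAwareResolver {
        @Override
        public Evaluator getEvaluator(List<String> argumentTypes, boolean distinct, boolean allColumns) {
            return getEvaluator(argumentTypes);
        }
    }

    // An "old" resolver, migrated by switching from
    // "implements TypeOnlyResolver" to "extends AbstractResolver";
    // the body of getEvaluator is unchanged.
    static final class MaxResolver extends AbstractResolver {
        @Override
        public Evaluator getEvaluator(List<String> argumentTypes) {
            if (argumentTypes.size() != 1) {
                throw new IllegalArgumentException("Exactly one argument is expected");
            }
            String type = argumentTypes.get(0);
            return () -> "max(" + type + ")";
        }
    }

    public static void main(String[] args) {
        InfoAwareResolver resolver = new MaxResolver();
        // The DISTINCT flag is passed in but has no effect on the result.
        System.out.println(resolver.getEvaluator(Arrays.asList("int"), true, false).name()); // prints max(int)
    }
}
```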
  
  == Modifying the function registry ==
  
