Hi Robin, Igor,

Thanks for the suggestion and links. Based on the examples I found, I wrote
the UDF below. However, I get the following error when I try to run it, and I
am not sure what it means.

============= ERROR ====================
FAILED: Hive Internal Error: java.lang.RuntimeException(java.lang.NoSuchMethodException: [D.<init>())
java.lang.RuntimeException: java.lang.NoSuchMethodException: [D.<init>()
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
        at org.apache.hadoop.hive.serde2.objectinspector.ReflectionStructObjectInspector.create(ReflectionStructObjectInspector.java:170)
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.<init>(ObjectInspectorConverters.java:225)
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:127)
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters$StructConverter.<init>(ObjectInspectorConverters.java:221)
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:127)
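
A note on the error message: "[D" is the JVM's internal name for the type
double[], and "<init>()" denotes a no-argument constructor. The trace suggests
that Hive's reflection-based struct converter is trying to instantiate a
double[] field of PartialResult through ReflectionUtils.newInstance, and array
classes have no constructors. The stand-alone sketch below uses only the JDK
to reproduce the same exception message. The class and method names are made
up for illustration, and the code does not call Hive or Hadoop.

```java
import java.util.ArrayList;

// Stand-alone demo, not Hive code: mimics a reflective no-arg instantiation.
class ArrayCtorDemo {

    // Returns null if the type could be instantiated through its no-arg
    // constructor, otherwise a description of the failure.
    static String tryDefaultConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor().newInstance();
            return null;
        } catch (NoSuchMethodException e) {
            // Array classes such as double[] end up here.
            return e.getMessage();
        } catch (ReflectiveOperationException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryDefaultConstructor(double[].class));
        System.out.println(tryDefaultConstructor(ArrayList.class));
    }
}
```

Run as-is, the first line printed should be [D.<init>() and the second should
be null, since ArrayList does have a no-argument constructor.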


============= UDF CODE ==================
package com.netflix.hive.udaf;

import java.io.IOException;
import java.lang.reflect.Array;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDAF;
import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
import org.apache.hadoop.hive.serde2.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;

@Description(
                name = "MFFoldIn",
                value = "_FUNC_(expr, nb) - Computes latent features for a given item/user based on user/item vectors",
                extended = "Example:\n"
)

public class MFFoldIn extends UDAF {
        
        public static class MFFoldInEvaluator implements UDAFEvaluator{
                public static class PartialResult{
                        double[] c1;
                        double[][] c2;
                        double[][] c3;
                        double wm;
                        double lambda;
                        int itemCount;
                        double[][] varco;
                        Set<Long> observedShows;
                        
                        public int getDimensionsCount() throws Exception{
                                if(c1 != null) return c1.length;
                                throw new Exception("Unknown dimension count");
                        }
                }
                
                private UserVecBuilder builder;
                
                public void init() {
                        builder = null;
                }
                
                public boolean iterate(DoubleWritable wm, DoubleWritable lambda,
                                IntWritable itemCount, String itemSquaredFile, 
                                DoubleWritable weight, List<Double> lf,
                                Long item) throws IOException{
                        
                        double[] lflist = new double[lf.size()];
                        for(int i=0; i<lf.size(); i++)
                                lflist[i] = lf.get(i).doubleValue();
                        
                        if(builder == null) builder = new UserVecBuilder();
                        
                        if(!builder.isReady()){
                                builder.setW_m(wm.get());
                                builder.setLambda(lambda.get());
                                builder.setItemRowCount(itemCount.get());
                                
                                builder.readItemCovarianceMatFiles(itemSquaredFile, lflist.length);
                        }

                                
                        builder.add(item, lflist, weight.get());
                        
                        return true;
                        
                }
                
                public PartialResult terminatePartial(){
                        PartialResult partial = new PartialResult();
                        partial.c1 = builder.getComponent1();
                        partial.c2 = builder.getComponent2();
                        partial.c3 = builder.getComponent3();
                        partial.wm = builder.getW_m();
                        partial.lambda = builder.getLambda();
                        partial.observedShows = builder.getObservedShows();
                        partial.itemCount = builder.getItemRowCount();
                        partial.varco = builder.getVarCovar();
                        return partial;
                }
                
                public boolean merge(PartialResult other){
                        if(other == null) return true;
                        if(builder == null) builder = new UserVecBuilder();
                        
                        if(!builder.isReady()){
                                builder.setW_m(other.wm);
                                builder.setLambda(other.lambda);
                                builder.setItemRowCount(other.itemCount);
                                builder.setItemCovarianceMat(other.varco);
                                builder.setComponent1(other.c1);
                                builder.setComponent2(other.c2);
                                builder.setComponent3(other.c3);
                                builder.setObservedShows(other.observedShows);
                        }else{
                                builder.merge(other.c1, other.c2, other.c3, other.observedShows);
                        }
                        return true;
                }
                
                public double[] terminate(){
                        if(builder == null) return null;
                        return builder.build();
                }
                
        }

}
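
One possible workaround, offered as an assumption rather than a verified fix:
keep only types with a no-argument constructor in PartialResult, for example
ArrayList<Double> in place of double[] and a list of lists in place of
double[][], and convert at the boundaries in terminatePartial() and merge().
The helper below is hypothetical (PartialArrays and its methods are not part
of Hive or of the code above) and only shows the conversions. Whether Hive
then accepts the nested lists and the Set<Long> field has not been checked
here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical conversion helper; not part of Hive or of MFFoldIn.
class PartialArrays {

    // double[] -> ArrayList<Double>; null stays null.
    static ArrayList<Double> toList(double[] values) {
        if (values == null) return null;
        ArrayList<Double> out = new ArrayList<Double>(values.length);
        for (double v : values) out.add(v);
        return out;
    }

    // List<Double> -> double[]; null stays null.
    static double[] toArray(List<Double> values) {
        if (values == null) return null;
        double[] out = new double[values.size()];
        for (int i = 0; i < out.length; i++) out[i] = values.get(i);
        return out;
    }

    // double[][] -> list of rows, each row converted with toList.
    static ArrayList<ArrayList<Double>> toNestedList(double[][] matrix) {
        if (matrix == null) return null;
        ArrayList<ArrayList<Double>> out =
                new ArrayList<ArrayList<Double>>(matrix.length);
        for (double[] row : matrix) out.add(toList(row));
        return out;
    }

    // List of rows -> double[][], each row converted with toArray.
    static double[][] toMatrix(List<? extends List<Double>> rows) {
        if (rows == null) return null;
        double[][] out = new double[rows.size()][];
        for (int i = 0; i < out.length; i++) out[i] = toArray(rows.get(i));
        return out;
    }

    public static void main(String[] args) {
        double[][] c2 = {{1.0, 2.0}, {3.0, 4.0}};
        ArrayList<ArrayList<Double>> asLists = toNestedList(c2);
        System.out.println(asLists);
        System.out.println(Arrays.deepToString(toMatrix(asLists)));
    }
}
```

With such a helper, terminatePartial() could store
PartialArrays.toList(builder.getComponent1()) in a list-typed c1 field, and
merge() could pass PartialArrays.toArray(other.c1) back to the builder.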


====================
On Jul 29, 2013, at 4:37 PM, Igor Tatarinov wrote:

> I found this Cloudera example helpful:
> http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop.hive/hive-contrib/0.7.0-cdh3u0/org/apache/hadoop/hive/contrib/udaf/example/UDAFExampleMaxMinNUtil.java#UDAFExampleMaxMinNUtil.Evaluator
> 
> igor
> decide.com
> 
> 
> 
> On Mon, Jul 29, 2013 at 4:32 PM, Ritesh Agrawal <[email protected]> wrote:
> Hi Robin,
> 
> Thanks for the suggestion. I did find such an example in the book Hadoop: The
> Definitive Guide. However, I am now totally confused.
> 
> The book extends UDAF instead of AbstractGenericUDAFResolver. Which one is
> recommended?
> 
> Also, the example in the book uses DoubleWritable as the return type for the
> "terminate" function. However, I will be returning an ArrayList of doubles. Do
> I always need to return objects that are derived from WritableComponents?
> 
> Ritesh
> On Jul 29, 2013, at 4:15 PM, Robin Morris wrote:
> 
> > I believe a map will be passed correctly from the terminatePartial to the
> > merge functions.  But it seems a bit of overkill.
> >
> > Why not define a class within your UDAF which has 4 public data members,
> > and return instances of that class from terminatePartial()?
> >
> > Robin
> >
> >
> > On 7/29/13 3:19 PM, "Ritesh Agrawal" <[email protected]> wrote:
> >
> >> Hi all,
> >>
> >> I am writing my first UDAF. In my terminatePartial() function, I need to
> >> store different data having different data types. Below is a list of
> >> items that I need to store
> >> 1. C1 : list of doubles
> >> 2. C2: list of doubles
> >> 3. C3: double
> >> 4. Show: list of strings
> >>
> >>
> >> I am wondering whether I can use a simple HashMap and store these
> >> different objects in it. Will it serialize automatically, or will I need
> >> to write my own serialization method? Also, is there any example of a UDAF
> >> that shows how to use a map-type structure for storing partial results?
> >>
> >> Thanks
> >>
> >> Ritesh
> >
> 
> 
