Suppose I want to write this Pig script:

REGISTER myudfs.jar;
A = LOAD 'hist_data' AS (id: chararray, word: chararray, count: float);
B = GROUP A BY id;
C = CROSS B, B;
D = FOREACH C GENERATE $0, $2, myudfs.HIST($1, $3);
F = ORDER D BY $2 DESC;
DUMP F;

I take (id, histogram) pairs and would like to perform an all-to-all
comparison.

The CROSS operation is overkill because my measure myudfs.HIST($1,$3) is
symmetric (so I could cut the number of comparisons in half), but it will do.
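
To make the "cut in half" remark concrete, here is a minimal plain-Java sketch (no Pig involved) that enumerates only the unordered pairs of ids. The class and method names are hypothetical, chosen just for illustration. For n ids a full cross produces n*n pairs, while a symmetric measure only needs the n*(n-1)/2 pairs where the first id comes before the second.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: not part of Pig or of myudfs.
public class SymmetricPairs {

    /** Returns every unordered pair {a, b} where a appears before b in ids. */
    public static List<String[]> unorderedPairs(List<String> ids) {
        List<String[]> pairs = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            // Start at i + 1: skips (a, a) and the mirrored pair (b, a).
            for (int j = i + 1; j < ids.size(); j++) {
                pairs.add(new String[] {ids.get(i), ids.get(j)});
            }
        }
        return pairs;
    }
}
```

With four ids this yields 6 pairs instead of the 16 rows a full cross would produce.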

My real question is:
Where can I find a template for writing this myudfs.HIST($1,$3) function?

For example: 


package myudfs;
import java.io.IOException;
import org.apache.pig.Algebraic;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;
import org.apache.pig.impl.util.WrappedIOException;
// The import of my own Anomaly class goes here (it has to live in a named
// package to be importable from myudfs).

public class HIST extends EvalFunc<Float> implements Algebraic
{

    public String getInitial() {return Initial.class.getName();}
    public String getIntermed() {return Intermed.class.getName();}
    public String getFinal() {return Final.class.getName();}

    // Placeholders: count() and sum() are not defined in this class.
    static public class Initial extends EvalFunc<Tuple> {
        public Tuple exec(Tuple input) throws IOException {
            return TupleFactory.getInstance().newTuple(count(input));
        }
    }
    static public class Intermed extends EvalFunc<Tuple> {
        public Tuple exec(Tuple input) throws IOException {
            return TupleFactory.getInstance().newTuple(sum(input));
        }
    }
    static public class Final extends EvalFunc<Long> {
        public Long exec(Tuple input) throws IOException {return sum(input);}
    }

    public Float exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try {
            // TODO: need to retrieve the words and counts from $1 in the script above
            String[] words1 = new String[0];   // placeholder size
            double[] counts1 = new double[0];  // placeholder size
            // TODO: same for $3
            String[] words2 = new String[0];
            double[] counts2 = new double[0];

            return Anomaly.dist(words1, counts1, words2, counts2);
        } catch (Exception e) {
            throw WrappedIOException.wrap("Caught exception processing input row ", e);
        }
    }
}
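
For the "retrieve the words and counts" step, here is a rough plain-Java sketch of the unpacking logic, kept free of Pig classes so it can be compiled on its own. It assumes that after GROUP A BY id each of $1 and $3 is a bag of (id, word, count) rows in the field order of the LOAD schema. The class HistogramArrays and the use of Object[] as a stand-in for a row are assumptions made for illustration. In an actual UDF, exec(Tuple input) would receive the two bags as input.get(0) and input.get(1), each cast to org.apache.pig.data.DataBag, and each row would be a Tuple read with get(1) and get(2).

```java
import java.util.List;

// Hypothetical helper, for illustration only: Object[] stands in for a Pig Tuple.
public class HistogramArrays {
    public final String[] words;
    public final double[] counts;

    public HistogramArrays(String[] words, double[] counts) {
        this.words = words;
        this.counts = counts;
    }

    /**
     * Builds parallel word/count arrays from rows shaped like (id, word, count).
     * Assumes field 1 is the word (String) and field 2 is the count (any Number,
     * e.g. the Float that Pig uses for a float column).
     */
    public static HistogramArrays fromRows(List<Object[]> rows) {
        String[] words = new String[rows.size()];
        double[] counts = new double[rows.size()];
        int i = 0;
        for (Object[] row : rows) {
            words[i] = (String) row[1];
            counts[i] = ((Number) row[2]).doubleValue();
            i++;
        }
        return new HistogramArrays(words, counts);
    }
}
```

Calling fromRows once per bag would give the words1/counts1 and words2/counts2 arrays that Anomaly.dist expects.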
