Hi All,
I have a question about using Elephant Bird's (EB) VectorWritableConverter
in my Pig script for data vectorization.
I generate the tuples with a UDF, but for simplicity the code below loads
the data from a file instead. My UDF returns tuples of the form
(1,0,1,1...).

My map.dat file has the following format:

1,0,1,1
0,1,1,1,
0,0,1,1,
1,1,0,0,
.......
.......
........

I register the necessary jar files. 

%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
%declare LONG_CONVERTER 'com.twitter.elephantbird.pig.util.LongWritableConverter';
%declare VECTOR_CONVERTER 'com.twitter.elephantbird.pig.mahout.VectorWritableConverter';

/* Loading from a file instead of UDF for simplicity */

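/* map.dat is comma-separated; PigStorage splits on tabs unless told otherwise */
A = LOAD 'map.dat' USING PigStorage(',');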

/*
 I am not sure how to use VectorWritableConverter to convert each tuple
 in the relation A to a vector.
*/
B = FOREACH A GENERATE $VECTOR_CONVERTER();

DUMP B;
                                          
