IMHO, if you want this to be more generic, I would have the UDF just take the full line and parse it out itself. Why? Because what happens when you have an indeterminate number of columns? That's just my own personal opinion, though. As far as implementation goes, I would return a DataBag (because what you want is many rows, and a bag is a collection of rows).
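To make the "take the full line and parse it out" idea concrete, here is a minimal plain-Java sketch of just the parsing step, with no Pig classes involved. The names LineSplitter and splitLine are made up for illustration, and it assumes every field after the ID has the form name:value; anything else is skipped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Pig: shows only the line-parsing logic.
class LineSplitter {
    // Turns "ID|col1:val1|col2:val2|..." into rows of {ID, column, value},
    // for any number of column fields.
    static List<String[]> splitLine(String line) {
        List<String[]> rows = new ArrayList<>();
        // String.split takes a regex, so the pipe has to be escaped.
        String[] vals = line.split("\\|");
        String id = vals[0].trim();
        for (int i = 1; i < vals.length; i++) {
            // A limit of 2 keeps any further ':' characters inside the value.
            String[] colAndValue = vals[i].trim().split(":", 2);
            if (colAndValue.length < 2) {
                continue; // assumption: silently skip fields that are not name:value
            }
            rows.add(new String[] { id, colAndValue[0], colAndValue[1] });
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : splitLine("ID | ColumnName1:Value1 | ColumnName2:Value2")) {
            System.out.println(String.join(",", row));
        }
    }
}
```

In a real UDF each of these rows would become a Tuple added to the output bag rather than a String[] in a List.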
You want these two things to make the Tuples and the output bag:

    // imports needed: org.apache.pig.data.BagFactory, org.apache.pig.data.DataBag,
    // org.apache.pig.data.TupleFactory, java.util.ArrayList, java.util.List
    private static final TupleFactory mTupleFactory = TupleFactory.getInstance();
    private static final BagFactory mBagFactory = BagFactory.getInstance();

Their use is described in the Pig API docs, but essentially you'll have something like this (this is off the cuff and needs some love, but it is the general idea)...

    DataBag output = mBagFactory.newDefaultBag();
    // split() takes a regex, so the pipe has to be escaped
    String[] vals = ((String) input.get(0)).split("\\|");
    List<Object> protoTuple = new ArrayList<Object>(3);
    protoTuple.add(vals[0]); // the first will be the ID
    protoTuple.add(null);
    protoTuple.add(null);
    for (int i = 1; i < vals.length; i++) {
        String[] colAndValue = vals[i].split(":");
        protoTuple.set(1, colAndValue[0]); // the column name
        protoTuple.set(2, colAndValue[1]); // the value
        // the default of newTuple(List) is to copy the List over, which is what we want
        output.add(mTupleFactory.newTuple(protoTuple));
    }
    return output;

The output will always have ID, then col and val. You want to flatten the output of this UDF.

2012/7/2 naresh <meumanar...@gmail.com>

> Thanks for the suggestions.
>
> @Jonathan Coveney:
>
> input tuple : (id1,column1,column2)
> output : two tuples (id1,column1) and (id2,column2) so it is List<Tuple>
> or should I return a Bag?
>
> public class SPLITTUPPLE extends EvalFunc <List<Tuple>>
> {
>     public List<Tuple> exec(Tuple input) throws IOException {
>         if (input == null || input.size() == 0)
>             return null;
>         try{
>             // not sure how whether I can create tuples on my own. Looks like I should use TupleFactory.
>             // return list of tuples.
>         }catch(Exception e){
>             throw WrappedIOException.wrap("Caught exception processing input row ", e);
>         }
>     }
> }
>
> Can you point me to some example?
>
> Thanks for your time,
> Naresh.
>
> On Mon, Jul 2, 2012 at 9:34 AM, Jonathan Coveney <jcove...@gmail.com> wrote:
>
> > You can probably hack together something that will do exactly this without
> > writing a UDF, but I think a UDF will be most useful here...especially if
> > you want to add more columns, etc etc.
> >
> > 2012/7/1 Subir S <subir.sasiku...@gmail.com>
> >
> > > Would FLATTEN help?
> > >
> > > B = GROUP A by ID;
> > >
> > > C = FOREACH B GENERATE group, FLATTEN ($1);
> > >
> > > Might work i guess. Not tested.
> > >
> > > On Mon, Jul 2, 2012 at 8:04 AM, naresh <meumanar...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I am new to pig scripting. I like to generate multiple tuples from
> > > > a single tuple. What I mean is:
> > > >
> > > > I have file with following data in it.
> > > >
> > > > >> cat data
> > > >
> > > > ID | ColumnName1:Value1 | ColumnName2:Value2
> > > >
> > > > so I load it by the following command
> > > >
> > > > grunt >> A = load '$data' using PigStorage('|');
> > > >
> > > > grunt >> dump A;
> > > >
> > > > (ID,ColumnName1:Value1,ColumnName2:Value2)
> > > >
> > > > Now I want to split this tuple into two tuples.
> > > >
> > > > (ID, ColumnName1, Value1)
> > > > (ID, ColumnName2, Value2)
> > > >
> > > > Can I use UDF along with foreach and generate. Some thing like the
> > > > following?
> > > >
> > > > grunt >> foreach A generate SOMEUDF(A)
> > > >
> > > > Thanks for your time,
> > > > Naresh.