By extending an abstract class, you can reuse the generic validation of Pig's input Tuple and get a consistent hook for your DataBag parsing logic. Consider the following abstract class ParseBagAsBag. Your own class, say MyDatabagParserToDataBag, can extend it, override the method parser_logic(), and add its results to the inherited output bag (this.bag):
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

public abstract class ParseBagAsBag extends EvalFunc<DataBag> {

    public TupleFactory tuple_factory = TupleFactory.getInstance();
    public BagFactory bag_factory = BagFactory.getInstance();
    public DataBag bag;  // output bag, filled by parser_logic()

    /**
     * Wrapper for deconstructing the input Tuple to extract the DataBag component.
     * @param input Tuple containing a DataBag.
     * @return DataBag produced by the parser logic, NULL iff that bag is empty.
     * @throws IOException
     */
    @Override
    public DataBag exec(Tuple input) throws IOException {
        this.bag = this.bag_factory.newDefaultBag();  // fresh, empty output bag for every call
        if (input != null && input.size() > 0) {      // @precondition check; tuple is non-empty and interesting
            Object oBag = input.get(0);               // DataBag wrapped in a one-element Tuple
            if (oBag instanceof DataBag) {            // @precondition check; type pig.DataBag (also rejects null)
                DataBag databag = (DataBag) oBag;
                parser_logic(databag);
            }
        }
        // return the bag iff parser_logic() added to it, otherwise return NULL
        return (this.bag.size() > 0) ? this.bag : null;
    }

    /** Subclasses read tuples from databag and add their results to this.bag. */
    public abstract void parser_logic(DataBag databag) throws IOException;
}

Hope this helps.

-Dan

On Mon, Mar 18, 2013 at 11:01 AM, Jonathan Coveney <jcove...@gmail.com> wrote:

> Ah, I suppose I was just proving it could be done.
>
> To make a new one, you'd do:
>
> public class MyUdf extends EvalFunc<DataBag> {
>   private static final BagFactory mBagFactory = BagFactory.getInstance();
>   public DataBag exec(Tuple input) throws IOException {
>     DataBag output = mBagFactory.newDefaultBag();
>     for (Tuple t : (DataBag)input.get(0)) {
>       output.add(t);
>     }
>     return output;
>   }
> }
>
> 2013/3/18 Kris Coward <k...@melon.org>
>
> > But he asked for a function that returns *another* bag ;)
> >
> > Snark aside, when returning bags or tuples, it's also worthwhile to at
> > least consider defining the output schema, which for your example
> > code would probably mean
> >
> > public Schema outputSchema(Schema input){
> >   Schema output = new Schema();
> >   output.add(input.getField(0));
> >   return output;
> > }
> >
> > (possibly with some omitted exception handling)
> >
> > -Kris
> >
> > On Mon, Mar 18, 2013 at 11:19:17AM +0100, Jonathan Coveney wrote:
> > > Absolutely.
> > >
> > > public class MyUdf extends EvalFunc<DataBag> {
> > >   public DataBag exec(Tuple input) throws IOException {
> > >     return (DataBag)input.get(0);
> > >   }
> > > }
> > >
> > > A dummy example, but there you go. DataBag is a valid pig type like any
> > > other, so you just return it like you would normally.
> > >
> > > 2013/3/18 pranjal rajput <fighterjockey...@gmail.com>
> > >
> > > > Hi,
> > > > Can we define a UDF in pig that takes a bag as an input and returns
> > > > another bag as output?
> > > > How can this be done?
> > > > Thanks,
> > > > --
> > > > regards
> > > > Pranjal
> >
> > --
> > Kris Coward http://unripe.melon.org/
> > GPG Fingerprint: 2BF3 957D 310A FEEC 4733 830E 21A4 05C7 1FEB 12B3
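
For anyone who wants to try the "abstract wrapper plus hook" idea from the top of this message without Pig on the classpath, here is a rough sketch. It rests on loud assumptions: plain java.util lists stand in for Pig's Tuple and DataBag, and the names BagToBagSketch, parserLogic, DropNullFirstField and BagToBagDemo are made up for illustration. None of them are Pig APIs, and this is not a drop-in UDF.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// ASSUMPTION: a "tuple" is modeled as List<Object> and a "bag" as a list of
// such tuples. These are stand-ins for Pig's Tuple and DataBag, not Pig types.
abstract class BagToBagSketch {

    // Wrapper: validates the input once, then hands the bag to the hook.
    final List<List<Object>> exec(List<Object> input) {
        List<List<Object>> output = new ArrayList<>(); // fresh output per call
        if (input != null && !input.isEmpty() && input.get(0) instanceof List) {
            @SuppressWarnings("unchecked")
            List<List<Object>> bag = (List<List<Object>>) input.get(0);
            parserLogic(bag, output);
        }
        // Mirror the convention in the message: null when nothing was produced.
        return output.isEmpty() ? null : output;
    }

    // Hook: subclasses read from bag and append their results to output.
    abstract void parserLogic(List<List<Object>> bag, List<List<Object>> output);
}

// Hypothetical subclass: keeps only tuples whose first field is non-null.
class DropNullFirstField extends BagToBagSketch {
    @Override
    void parserLogic(List<List<Object>> bag, List<List<Object>> output) {
        for (List<Object> t : bag) {
            if (!t.isEmpty() && t.get(0) != null) {
                output.add(t);
            }
        }
    }
}

class BagToBagDemo {
    public static void main(String[] args) {
        List<List<Object>> bag = new ArrayList<>();
        bag.add(Arrays.<Object>asList("a", 1));
        bag.add(Arrays.<Object>asList(null, 2));

        List<Object> input = new ArrayList<>();
        input.add(bag); // the bag is wrapped in a one-element "tuple"

        System.out.println(new DropNullFirstField().exec(input)); // prints [[a, 1]]
        System.out.println(new DropNullFirstField().exec(null));  // prints null
    }
}
```

The sketch passes the output collection to the hook as a parameter instead of keeping it in a field, so one instance holds no state between calls. Whether that matters for a real UDF depends on how Pig reuses instances, so treat it as a design option rather than a requirement.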