Yeah. That works great. Thank you, Jonathan.

On Thu, Mar 1, 2012 at 5:14 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
> FLATTEN is kind of quirky. If you FLATTEN(null), it will return null, but
> if you FLATTEN a bag that is empty (i.e. size=0), it will throw away the row.
> I would have your UDF return an empty bag and let the FLATTEN wipe it out.
>
> 2012/3/1 Dexin Wang <wangde...@gmail.com>
>
> > Hi,
> >
> > I have a UDF that parses a line and then returns a bag, and sometimes the
> > line is bad, so I'm returning null in the UDF. In my Pig script, I'd like
> > to filter those nulls like this:
> >
> > raw = LOAD 'raw_input' AS (line:chararray);
> > parsed = FOREACH raw GENERATE FLATTEN(MyUDF(line)); -- get two fields in the tuple: id and name
> > DUMP parsed;
> >
> > (id1,name1)
> > (id2,name2)
> > ()
> > (id3,name3)
> >
> > parsed_no_nulls = FILTER parsed BY id IS NOT NULL;
> > DUMP parsed_no_nulls;
> >
> > (id1,name1)
> > (id2,name2)
> > (id3,name3)
> >
> > This works, but I'm getting this warning:
> >
> > WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject: Attempt to access field which was not found in the input
> >
> > When I try to use IsEmpty to filter, I get this error: "Cannot test a NULL
> > for emptiness".
> >
> > What's the correct way to filter out these null bags returned from my UDF?
> >
> > Thanks.
> > Dexin
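A minimal plain-Java sketch of the fix Jonathan suggests, written without any Pig dependency so it runs on its own. Lists stand in for Pig bags and tuples; `FlattenSketch`, `parseLine`, `flatten` and `foreachFlatten` are hypothetical names, and the tab-separated "id<TAB>name" input format is an assumption. The toy `flatten` models only the rule stated in the thread (a null bag still yields a row, an empty bag yields none), not Pig's actual implementation. In a real `EvalFunc<DataBag>` the equivalent move is to create the result with `BagFactory.getInstance().newDefaultBag()` and return it empty for a bad line instead of returning null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class FlattenSketch {

    // Hypothetical stand-in for MyUDF: parses "id<TAB>name" into a "bag"
    // holding one (id, name) tuple. A malformed line yields an EMPTY bag,
    // never null.
    static List<List<String>> parseLine(String line) {
        List<List<String>> bag = new ArrayList<>();
        if (line == null) {
            return bag;
        }
        String[] parts = line.split("\t", -1);
        if (parts.length != 2 || parts[0].isEmpty()) {
            return bag; // bad line: empty bag, so the flatten step drops it
        }
        bag.add(Arrays.asList(parts[0], parts[1]));
        return bag;
    }

    // Toy model of the FLATTEN rule described in the thread (not Pig code):
    // a null bag still produces one empty output row, like the "()" in the
    // DUMP above, while an empty bag produces no rows at all.
    static List<List<String>> flatten(List<List<String>> bag) {
        if (bag == null) {
            List<String> emptyRow = new ArrayList<>();
            return Collections.singletonList(emptyRow);
        }
        return bag;
    }

    // Mimics: parsed = FOREACH raw GENERATE FLATTEN(MyUDF(line));
    static List<List<String>> foreachFlatten(List<String> rawLines) {
        List<List<String>> out = new ArrayList<>();
        for (String line : rawLines) {
            out.addAll(flatten(parseLine(line)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("id1\tname1", "id2\tname2", "garbage", "id3\tname3");
        // Prints (id1,name1), (id2,name2), (id3,name3) on separate lines.
        for (List<String> row : foreachFlatten(raw)) {
            System.out.println("(" + String.join(",", row) + ")");
        }
    }
}
```

Because the bad line never produces a row in this model, there is nothing left for a `FILTER ... IS NOT NULL` step to remove, which matches Dexin's report that the empty-bag approach works.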