Hi, I have a UDF that parses a line and returns a bag. Sometimes the line is bad, so the UDF returns null. In my Pig script, I'd like to filter out those nulls like this:

raw = LOAD 'raw_input' AS (line:chararray);
parsed = FOREACH raw GENERATE FLATTEN(MyUDF(line)); -- get two fields in the tuple: id and name
DUMP parsed;
(id1,name1)
(id2,name2)
()
(id3,name3)
parsed_no_nulls = FILTER parsed BY id IS NOT NULL;
DUMP parsed_no_nulls;
(id1,name1)
(id2,name2)
(id3,name3)

This works, but I'm getting this warning:

WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigHadoopLogger - org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject: Attempt to access field which was not found in the input

When I try to use IsEmpty to filter, I get this error "Cannot test a NULL for emptiness".

What's the correct way to filter out these null bags returned from my UDF?

Thanks.
Dexin