Hi,

Q1: Maybe there is something wrong with the UDF itself?

Q2: How do you decide that a row is dirty? Is it that one of your 6 fields is null? If so, you could do something like this to keep only the complete rows: FILTER raw BY NOT ($0 IS NULL OR $1 IS NULL ...)

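To spell out the null check for Q2, here is a rough sketch rather than a tested script. It assumes data.txt has six '|'-separated columns, and the field names f1..f6 are placeholders I made up. It also relies on PigStorage padding rows that are shorter than the declared schema with nulls, which is my understanding of how it behaves.

```pig
-- Sketch only: f1..f6 are placeholder names for the six columns in data.txt.
raw = LOAD 'data.txt' USING PigStorage('|') AS (f1, f2, f3, f4, f5, f6);

-- Keep a row only when every field is present (no nulls).
clean = FILTER raw BY NOT (f1 IS NULL OR f2 IS NULL OR f3 IS NULL
                           OR f4 IS NULL OR f5 IS NULL OR f6 IS NULL);

DUMP clean;
```

As far as I know, empty fields are also loaded as null, so this check would drop the second and fourth sample rows as well. If those rows are acceptable, list only the fields that must be present.
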
Ruslan

On Fri, Apr 19, 2013 at 6:57 AM, 何琦 <h...@mobicloud.com.cn> wrote:
>
> Hi,
>
> Q1: I have a question about how to use FILTER on a tuple.
> The code is:
> --------------------------------------------------------
> REGISTER pig.jar;
> raw = LOAD 'data.txt' USING PigStorage('|') AS (phoneNum, tag, flow, duration, count);
> sumed = FOREACH (GROUP raw BY (phoneNum, tag)) {
>     totalFlow = SUM(raw.flow);
>     totalDuration = SUM(raw.duration);
>     totalCount = SUM(raw.count);
>     GENERATE flatten(group), TOTUPLE(tutalFlow, totalDuration, totalCount) AS condition;
> };
> filtered = FILTER sumed BY com.filter.TagFilter(condition);
> DUMP filtered;
> --------------------------------------------------------
> But I got an error:
> ERROR 1045:
> <file reduce.pig, line 9, column 23> Could not infer the matching function
> for com.filter.TagFilter as multiple or none of them fit. Please use an
> explicit cast.
> Is there anything wrong?
>
> Q2: How do I deal with dirty data?
> There is some dirty data in my files, such as:
> $ cat data.txt
> 1|2|3|4|5|6
> 2|3|4|5|6|
> 0|2|3|
> 7|7||0||
> The third line is dirty data for me. I want to filter it out. But whether I
> use SIZE(), COUNT(), or anything else, I can't filter it.
> Is there any function or method to solve this?
>