[ https://issues.apache.org/jira/browse/PIG-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Dai closed PIG-1252.
---------------------------

> Diamond splitter does not generate correct results when using Multi-query optimization
> --------------------------------------------------------------------------------------
>
>                 Key: PIG-1252
>                 URL: https://issues.apache.org/jira/browse/PIG-1252
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>            Assignee: Richard Ding
>             Fix For: 0.7.0
>
>         Attachments: PIG-1252-2.patch, PIG-1252.patch
>
>
> I have a script which uses SPLIT but never uses one of the split branches. The skeleton of the script is as follows:
> {code}
> loadData = load '/user/viraj/zebradata' using org.apache.hadoop.zebra.pig.TableLoader('col1,col2, col3, col4, col5, col6, col7');
> prjData = FOREACH loadData GENERATE (chararray) col1, (chararray) col2, (chararray) col3,
>     (chararray) ((col4 is not null and col4 != '') ? col4 : ((col5 is not null) ? col5 : '')) as splitcond,
>     (chararray) (col6 == 'c' ? 1 : IS_VALID ('200', '0', '0', 'input.txt')) as validRec;
> SPLIT prjData INTO trueDataTmp IF (validRec == '1' AND splitcond != ''), falseDataTmp IF (validRec == '1' AND splitcond == '');
> grpData = GROUP trueDataTmp BY splitcond;
> finalData = FOREACH grpData {
>     orderedData = ORDER trueDataTmp BY col1,col2;
>     GENERATE FLATTEN ( MYUDF (orderedData, 60, 1800, 'input.txt', 'input.dat','20100222','5', 'debug_on')) as (s,m,l);
> }
> dump finalData;
> {code}
> You can see that "falseDataTmp" is untouched.
> When I run this script with the no-Multiquery (-M) option, I get the right result.
> This could be the result of the complex BinConds in the POLoad. We can get rid of this error by using FILTER instead of SPLIT.
> Viraj

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
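
The FILTER workaround mentioned in the description could look roughly like the sketch below. It assumes prjData is defined exactly as in the reported script and simply restates each SPLIT condition as its own FILTER statement; it has not been verified against Pig 0.6.0, and the rest of the script is left unchanged.
{code}
-- Sketch only: replaces the single SPLIT statement with one FILTER per branch.
-- The conditions are copied verbatim from the SPLIT in the reported script.
trueDataTmp = FILTER prjData BY (validRec == '1' AND splitcond != '');

-- falseDataTmp is never consumed later in the reported script, so this
-- statement is kept only to mirror the second SPLIT branch.
falseDataTmp = FILTER prjData BY (validRec == '1' AND splitcond == '');

-- The downstream statements continue to read from trueDataTmp as before.
grpData = GROUP trueDataTmp BY splitcond;
{code}
With this form each branch is an independent relation, so a branch that is never used has no consumer in the plan.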