Thanks, it does look like this was the problem.

-----Original Message-----
From: Cheolsoo Park [mailto:[email protected]]
Sent: Monday, December 02, 2013 7:05 AM
To: [email protected]
Subject: Re: strange problem with count and distinct subscribers
Which version are you using? I am wondering whether PIG-3466 fixes your error: https://issues.apache.org/jira/browse/PIG-3466

You can reproduce the error only when loading more data, and you also see a random type cast error. My guess is that you ran into the race condition that PIG-3466 fixed, and your bag is corrupted, resulting in the type cast error.

On Mon, Nov 18, 2013 at 6:30 AM, Noam Lavie <[email protected]> wrote:

> Hi,
> I'm trying to run the following Pig script (its main purpose is to read
> inputs that contain info about phone calls; the script is supposed to
> count the different types of calls and the different subscribers that
> made them):
>
> SET default_parallel 40;
> allFiles = LOAD
>     'maprfs:///analytics/data/consumers/mapred/facts/done/FACT_VOICE_GE_Analytics9_1/20131114/'
>     USING PigStorage(',');
> allFilesFiltered = FILTER allFiles BY $11 MATCHES '.*On.*' AND $4 > 0;
> datesList = FOREACH allFilesFiltered GENERATE SUBSTRING($0, 0, 10) AS day,
>     $11 AS callType, $4 AS amount, $1 AS subscriberKey;
> datesGroups = GROUP datesList BY (day, callType);
> datesGroupsAmount = foreach datesGroups {
>     unique_seubscriber = DISTINCT datesList.subscriberKey;
>     GENERATE group.day, group.callType, COUNT(datesList),
>         SUM(datesList.amount), COUNT(unique_seubscriber);
> };
> dump datesGroupsAmount;
>
> The problem is with unique_seubscriber: the COUNT of the DISTINCT
> doesn't work. The strange thing is that if I run the script separately
> for each subfolder's input, each part succeeds, but if I give it the
> whole set of input folders together it fails and I get the following
> error:
>
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open
> iterator for alias datesGroupsAmount
>
> Another error that I get from time to time (if I make small changes in
> the script) is:
>
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open
> iterator for alias datesGroupsAmount. Backend error :
> java.lang.Boolean cannot be cast to org.apache.pig.data.Tuple
>
> (maybe there is a connection between the two errors?)
>
> Here is the log file:
>
> Pig Stack Trace
> ---------------
> ERROR 1066: Unable to open iterator for alias datesGroupsAmount
>
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable
> to open iterator for alias datesGroupsAmount
>         at org.apache.pig.PigServer.openIterator(PigServer.java:836)
>         at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>         at org.apache.pig.Main.run(Main.java:604)
>         at org.apache.pig.Main.main(Main.java:157)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:601)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> Caused by: java.io.IOException: Job terminated with anomalous status FAILED
>         at org.apache.pig.PigServer.openIterator(PigServer.java:828)
>         ... 12 more
>
> Any help would be appreciated.
> Thanks,
> Noam
>
> ________________________________
>
> This email contains proprietary and/or confidential information of Pontis.
> If you have received this email in error, please delete all copies
> without delay and do not copy, distribute, or rely on any information
> contained in this email.
> ________________________________
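
[Editor's note] For anyone stuck on a Pig version that predates the PIG-3466 fix, here is a minimal sketch of a workaround that avoids the nested DISTINCT altogether by de-duplicating subscribers in a flat pipeline and joining the two aggregates back together. This is not from the thread; the alias names are illustrative, and it assumes the same positional fields as the original script ($0 date, $1 subscriber key, $4 amount, $11 call type).

SET default_parallel 40;

allFiles = LOAD 'maprfs:///analytics/data/consumers/mapred/facts/done/FACT_VOICE_GE_Analytics9_1/20131114/'
    USING PigStorage(',');
allFilesFiltered = FILTER allFiles BY $11 MATCHES '.*On.*' AND $4 > 0;
datesList = FOREACH allFilesFiltered GENERATE SUBSTRING($0, 0, 10) AS day,
    $11 AS callType, $4 AS amount, $1 AS subscriberKey;

-- call count and total amount per (day, callType); no nested block needed
datesGroups = GROUP datesList BY (day, callType);
callStats = FOREACH datesGroups GENERATE
    group.day AS day, group.callType AS callType,
    COUNT(datesList) AS calls, SUM(datesList.amount) AS totalAmount;

-- distinct subscribers computed as a separate flat pipeline instead of a nested DISTINCT
subscriberProj = FOREACH datesList GENERATE day, callType, subscriberKey;
subscriberPairs = DISTINCT subscriberProj;
subscriberGroups = GROUP subscriberPairs BY (day, callType);
subscriberStats = FOREACH subscriberGroups GENERATE
    group.day AS day, group.callType AS callType,
    COUNT(subscriberPairs) AS uniqueSubscribers;

-- stitch the two aggregates back together on the grouping key
joined = JOIN callStats BY (day, callType), subscriberStats BY (day, callType);
result = FOREACH joined GENERATE
    callStats::day, callStats::callType, callStats::calls,
    callStats::totalAmount, subscriberStats::uniqueSubscribers;
dump result;

The trade-off is an extra MapReduce job for the join, but every operator runs in a plain (non-nested) pipeline.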
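[Editor's note] On the "java.lang.Boolean cannot be cast to org.apache.pig.data.Tuple" error: because the LOAD has no schema, every field is a bytearray and is cast implicitly in the comparison and in SUM. A minimal sketch of a typed load follows, assuming a 12-column layout; only positions $0, $1, $4 and $11 are known from the script, so the other names are placeholders to adjust to the real file layout.

allFiles = LOAD 'maprfs:///analytics/data/consumers/mapred/facts/done/FACT_VOICE_GE_Analytics9_1/20131114/'
    USING PigStorage(',')
    AS (callTime:chararray,      -- $0, date string; first 10 chars give the day
        subscriberKey:chararray, -- $1
        f2, f3,                  -- $2-$3, unused placeholders (default to bytearray)
        amount:double,           -- $4
        f5, f6, f7, f8, f9, f10, -- $5-$10, unused placeholders
        callType:chararray);     -- $11

allFilesFiltered = FILTER allFiles BY callType MATCHES '.*On.*' AND amount > 0;
datesList = FOREACH allFilesFiltered GENERATE SUBSTRING(callTime, 0, 10) AS day,
    callType, amount, subscriberKey;

With typed fields, a bad record surfaces as a cast warning or a null rather than a confusing backend failure, which makes it easier to separate data problems from the PIG-3466 bug discussed above.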