Thanks, it does look like this was the problem.

-----Original Message-----
From: Cheolsoo Park [mailto:[email protected]]
Sent: Monday, December 02, 2013 7:05 AM
To: [email protected]
Subject: Re: strange problem with count and distinct subscribers

Which version are you using? I am wondering whether PIG-3466 fixes your
error:
https://issues.apache.org/jira/browse/PIG-3466

The error reproduces only when you load more data, and you also see a sporadic
type cast error. My guess is that you ran into the race condition that PIG-3466
fixed, which corrupted your bag and resulted in the type cast error.
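For cross-checking the output on a small sample, here is a minimal plain-Python sketch of what the nested DISTINCT/COUNT in the quoted script is meant to compute for each (day, callType) group. It is not Pig's implementation and says nothing about the PIG-3466 fix itself. The function name `aggregate_calls` is hypothetical, and it assumes well-formed comma-separated rows with the column positions the script uses ($0 timestamp, $1 subscriber key, $4 numeric amount, $11 call type). Amounts are parsed as floats, and Pig's null and casting rules are not emulated.

```python
import re
from collections import defaultdict


def aggregate_calls(lines):
    """Hypothetical plain-Python reference for the quoted Pig script.

    Returns {(day, callType): (call_count, amount_sum, distinct_subscribers)}.
    Assumes every line is a well-formed comma-separated record with at least
    12 fields and a numeric amount in field 4.
    """
    call_counts = defaultdict(int)
    amount_sums = defaultdict(float)
    subscribers = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split(",")
        call_type = fields[11]
        amount = float(fields[4])  # assumption: amounts parse as floats
        # FILTER allFiles BY $11 MATCHES '.*On.*' AND $4 > 0
        # (Java's String.matches is a full match, hence re.fullmatch)
        if re.fullmatch(r".*On.*", call_type) is None or amount <= 0:
            continue
        key = (fields[0][:10], call_type)    # SUBSTRING($0, 0, 10) AS day
        call_counts[key] += 1                # COUNT(datesList)
        amount_sums[key] += amount           # SUM(datesList.amount)
        subscribers[key].add(fields[1])      # DISTINCT datesList.subscriberKey
    return {
        key: (call_counts[key], amount_sums[key], len(subscribers[key]))
        for key in call_counts
    }
```

Running this on a few hundred lines sampled from one subfolder and comparing the result with the `dump` output for the same sample may help tell a wrong distinct count apart from a job that simply fails.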

On Mon, Nov 18, 2013 at 6:30 AM, Noam Lavie <[email protected]> wrote:

> Hi,
> I'm trying to run the following Pig script. Its main purpose is to read
> input files that contain information about phone calls; the script is
> supposed to count the calls of each type and the distinct subscribers
> that made them:
>
> SET default_parallel 40;
> allFiles = LOAD
>     'maprfs:///analytics/data/consumers/mapred/facts/done/FACT_VOICE_GE_Analytics9_1/20131114/'
>     USING PigStorage(',');
> allFilesFiltered = FILTER allFiles BY $11 MATCHES '.*On.*' AND $4 > 0;
> datesList = FOREACH allFilesFiltered GENERATE SUBSTRING($0, 0, 10) AS day,
>     $11 AS callType, $4 AS amount, $1 AS subscriberKey;
> datesGroups = GROUP datesList BY (day, callType);
> datesGroupsAmount = foreach datesGroups {
>     unique_seubscriber = DISTINCT datesList.subscriberKey;
>     GENERATE group.day, group.callType, COUNT(datesList),
>         SUM(datesList.amount), COUNT(unique_seubscriber);
> };
> dump datesGroupsAmount;
>
> The problem is with unique_seubscriber: the nested DISTINCT and COUNT
> don't work. The strange thing is that if I run the script separately on
> each subfolder's input, every run succeeds, but if I give it the whole
> input folder at once, it fails with the following error:
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias datesGroupsAmount
>
> Another error that I get from time to time (when I make small changes
> to the script) is:
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias datesGroupsAmount. Backend error : java.lang.Boolean cannot be cast to org.apache.pig.data.Tuple
> (Maybe there is a connection between the two errors?)
>
> Here is the log file:
>
> Pig Stack Trace
> ---------------
> ERROR 1066: Unable to open iterator for alias datesGroupsAmount
>
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias datesGroupsAmount
>         at org.apache.pig.PigServer.openIterator(PigServer.java:836)
>         at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:696)
>         at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:320)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
>         at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
>         at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
>         at org.apache.pig.Main.run(Main.java:604)
>         at org.apache.pig.Main.main(Main.java:157)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:601)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> Caused by: java.io.IOException: Job terminated with anomalous status FAILED
>         at org.apache.pig.PigServer.openIterator(PigServer.java:828)
>         ... 12 more
>
>
> Any help would be appreciated.
> Thanks,
> Noam
>
>
> ________________________________
>
> This email contains proprietary and/or confidential information of Pontis.
> If you have received this email in error, please delete all copies
> without delay and do not copy, distribute, or rely on any information
> contained in this email.
>
