Hi,

I've been compiling some top-25 lists of the frequency with which
values appear in certain columns of a relation, and based on some of
the counts, I'm curious whether some of the values occur together
particularly often. To that end, I've been running the following code:

abandonSpike4pvnUid = GROUP abandonSpike3 BY (pv_num, user_id);
abandonSpike5pvnUid = FOREACH abandonSpike4pvnUid GENERATE
    group.pv_num AS pvn, group.user_id AS uid,
    COUNT(abandonSpike3) AS session_count;
abandonSpike6pvnUid = FILTER abandonSpike5pvnUid BY (session_count > 20);
abandonSpike7pvnUid = ORDER abandonSpike6pvnUid BY session_count DESC;
abandonSpikePvnUid = LIMIT abandonSpike7pvnUid 25;
dump abandonSpikePvnUid;

where abandonSpike3 is the same relation that I've already grouped
successfully by single columns to generate the existing top-25 lists.
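
For reference, the working single-column version follows the same
pattern, roughly like this (pv_num shown as an example; the other
columns are analogous):

singleCol4pvn = GROUP abandonSpike3 BY pv_num;
singleCol5pvn = FOREACH singleCol4pvn GENERATE
    group AS pvn, COUNT(abandonSpike3) AS session_count;
singleCol6pvn = ORDER singleCol5pvn BY session_count DESC;
singleColPvn = LIMIT singleCol6pvn 25;
dump singleColPvn;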

What's odd is that I was getting the error:

2012-11-27 23:17:00,112 [main] ERROR org.apache.pig.tools.grunt.Grunt -
ERROR 2997: Unable to recreate exception from backed error: Error: Java
heap space

and figured that I'd probably made the groups too small (and therefore
the list of groups too big) for ORDER to run properly. My solution was
to insert the FILTER line above, but even when I try to get the output
of that line without any ORDERing (i.e. dump abandonSpike6pvnUid), I
continue to get the same error message.
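
For completeness: one thing I've considered but not yet tried is simply
giving the task JVMs more heap from within the script, along the lines
of (the -Xmx value here is just a guess)

set mapred.child.java.opts '-Xmx1024m';

but I'd rather understand what's actually filling the heap than paper
over it with a bigger allocation.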

It would seem that I'm misjudging what's over-filling the heap, and I
would appreciate a tip on what else it might be. In case it helps, the
stack trace from the log file of my most recent attempt follows:

================================================================================
Backend error message
---------------------
java.lang.Throwable: Child Error
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:242)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:229)

Backend error message
---------------------
Error: GC overhead limit exceeded

Pig Stack Trace
---------------
ERROR 6015: During execution, encountered a Hadoop error.

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open 
iterator for alias abandonSpike6pvnUid. Backend error : During execution, 
encountered a Hadoop error.
        at org.apache.pig.PigServer.openIterator(PigServer.java:753)
        at
org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:615)
        at
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:303)
        at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
        at
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:76)
        at org.apache.pig.Main.run(Main.java:455)
        at org.apache.pig.Main.main(Main.java:107)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR
6015: During execution, encountered a Hadoop error.
        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:242)
Caused by: java.lang.Throwable: Child Error
        ... 1 more
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
================================================================================

Thanks,
Kris

-- 
Kris Coward                                     http://unripe.melon.org/
GPG Fingerprint: 2BF3 957D 310A FEEC 4733  830E 21A4 05C7 1FEB 12B3
