When you say "Store D into a tmp file", which store func are you using?

On 03/25/2011 10:44 AM, Andreas Paepcke wrote:
Hi,

Has anyone seen the following?

I am getting an error when running ORDER:
    ERROR 1071: Cannot convert a Unknown to a String

The error occurs in DataType.java:885. At the end of that switch
statement variable 'type' is -1, and variable 'o' is a string that looks
like a leftover from the prior statements A or B. The value of 'o' is:

%!PS-Adobe-2.0
%%Creator: dvips(k) 5.86 Copyright 1999 Radical Eye Software
%%Title: arXiv:astro-ph/0005123 v3   2 Oct 2000
%%Pages: 7
%%PageOrder: As...

Note that if I skip the ORDER statement, everything works, and looks
correct in the resulting file. Random order, of course.

The error does not occur if I make one simple change:
Store D into a tmp file, then LOAD that file and execute E without
any change to that statement.

Pseudocode below, followed by the stack trace.

A    = LOAD 'foo'
        USING aLoader()
        AS (url:chararray,
            date:chararray,
            pageSize:int,
            position:int,
            docidInCrawl:int,
            httpHeader:chararray,
            content:chararray);

B    = FOREACH A GENERATE
        udf();

-- B is of the form
-- {(chararray,chararray,int), (chararray,chararray,int), ... }

D = FOREACH B GENERATE flatten($0)
        AS (token:chararray, docID:chararray, tokenPos:int);
E = ORDER D BY token ASC;
STORE E INTO 'bar';
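
For reference, the workaround described above (store D, load it back, then run the unchanged ORDER) might be sketched as follows. This is only a sketch under assumptions: the path 'tmp_d', the alias D2, and the choice of PigStorage are illustrative and not taken from the original script, which does not say which store function was used.

```pig
-- Assumption: 'tmp_d' is a hypothetical scratch path; PigStorage is only
-- one possible store/load function, not necessarily the one used.
STORE D INTO 'tmp_d' USING PigStorage('\t');

-- Reload the stored tuples, declaring the same schema as D.
D2 = LOAD 'tmp_d' USING PigStorage('\t')
        AS (token:chararray, docID:chararray, tokenPos:int);

-- Same ORDER statement as E above, applied to the reloaded relation.
E = ORDER D2 BY token ASC;
STORE E INTO 'bar';
```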


org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot convert a Unknown to a String
     at org.apache.pig.data.DataType.toString(DataType.java:885)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:642)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:367)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
