When you say "Store D into a tmp file", which store func are you using?
On 03/25/2011 10:44 AM, Andreas Paepcke wrote:
Hi,

Has anyone seen the following? I am getting an error when running ORDER:

    ERROR 1071: Cannot convert a Unknown to a String

The error occurs in DataType.java:885. At the end of that switch statement, the variable 'type' is -1, and the variable 'o' is a string that looks like a leftover from the prior statements A or B. The value of 'o' is:

    %!PS-Adobe-2.0
    %%Creator: dvips(k) 5.86 Copyright 1999 Radical Eye Software
    %%Title: arXiv:astro-ph/0005123 v3 2 Oct 2000
    %%Pages: 7
    %%PageOrder: As...

Note that if I skip the ORDER statement, everything works, and the resulting file looks correct (in random order, of course).

The error does not occur if I make one simple change: store D into a tmp file, then LOAD that file and execute E without any change to that statement.

Pseudocode below, followed by the stack trace.

    A = LOAD 'foo'
        USING aLoader()
        AS (url:chararray,
            date:chararray,
            pageSize:int,
            position:int,
            docidInCrawl:int,
            httpHeader:chararray,
            content:chararray);

    B = FOREACH A GENERATE udf();
    -- B is of the form {(chararray,chararray,int), (chararray,chararray,int), ... }

    D = FOREACH B GENERATE flatten($0) AS (token:chararray, docID:chararray, tokenPos:int);
    E = ORDER D BY token ASC;
    STORE E INTO 'bar';

Stack trace:

    org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot convert a Unknown to a String
        at org.apache.pig.data.DataType.toString(DataType.java:885)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:642)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:367)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:276)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:240)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
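
For reference, the workaround described in the message (store D, load it back, then run the ORDER) might look roughly like the sketch below. This is only an illustration under assumptions: the original message does not say which store function or path was used, so the built-in PigStorage and the path 'tmp_d' are placeholders, and the reloaded relation is named D2 here purely to keep the two aliases apart.

```pig
-- Assumption: PigStorage and 'tmp_d' stand in for whatever store
-- function and temporary location were actually used.
STORE D INTO 'tmp_d' USING PigStorage();

-- Reload with the same schema that D declared after the flatten.
D2 = LOAD 'tmp_d' USING PigStorage()
     AS (token:chararray, docID:chararray, tokenPos:int);

-- Same ORDER as before, now applied to the reloaded data.
E = ORDER D2 BY token ASC;
STORE E INTO 'bar';
```

Reloading this way means the ORDER job reads fields that the loader has parsed against the declared schema, rather than whatever objects the UDF emitted, which may be why the cast error disappears.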