Hi all,
Type information apparently gets lost following ORDER BY in certain situations.
Am I doing something wrong or is this a bug? I am a Pig newbie.
I'm running Apache Pig version 0.12.1.2.1.1.0-385 (rexported) in local mode
(i.e., pig -x local).
I have a small test file with two fields:
grunt> ls /data
file:/data/.pig_schema<r 1> 242
file:/data/.pig_header<r 1> 11
file:/data/part-m-00000<r 1> 33
file:/data/_SUCCESS<r 1> 0
grunt> cat /data/part-m-00000
foo,3.0
bar,4.0
foo,10.0
hi,12.0
I load it, project a column, and order by that column. At that point the type gets lost:
grunt> A = LOAD '/data' USING PigStorage(',', '-schema');
grunt> DESCRIBE A;
A: {name: chararray,value: double}
grunt> B = foreach A generate $1 as value:double;
grunt> DESCRIBE B;
B: {value: double}
grunt> C = ORDER B by value;
grunt> DESCRIBE C;
C: {value: double}
grunt> DUMP C;
Pig fails during DUMP C:
java.lang.Exception: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Double
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to java.lang.Double
    at org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:109)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:111)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
The DESCRIBE right before the DUMP reports the type as 'double', but at runtime
during the DUMP the value is actually a 'bytearray' (DataByteArray).
Also,
- DUMP B; following the foreach statement works fine.
- Without the foreach the script works fine:
--
-- This script works
--
A = LOAD '/data' USING PigStorage(',', '-schema');
C = ORDER A by value;
DUMP C;
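A variation that may help narrow this down is sketched below. It is untested, and it rests on an assumption: that the `as value:double` annotation only relabels the schema, whereas an explicit `(double)` cast would force an actual conversion before ORDER BY compares the keys. Whether it avoids the exception on this Pig version is not confirmed.

```
--
-- Untested variation: explicit cast instead of "as value:double"
-- (assumption: the cast forces a real bytearray-to-double conversion)
--
A = LOAD '/data' USING PigStorage(',', '-schema');
B = FOREACH A GENERATE (double)$1 AS value;
C = ORDER B BY value;
DUMP C;
```

If this version runs cleanly, that would suggest the projected column reaches ORDER BY as a raw bytearray despite what DESCRIBE reports.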
Thanks!
Joe