Pig data objects returned by CassandraStorage behave irrationally.
------------------------------------------------------------------

                 Key: CASSANDRA-3552
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3552
             Project: Cassandra
          Issue Type: Bug
          Components: Contrib
    Affects Versions: 1.0.3
         Environment: Ubuntu
            Reporter: Chris Howe


When I try to perform computations on data that I get back from 
CassandraStorage in Pig, I see inexplicable results.

For example, on a column family that has UTF8Type as the key validator, I do 
the following:

A = LOAD 'cassandra://keyspace/colfam' USING CassandraStorage();
B = FOREACH A GENERATE (chararray) key;
STORE B INTO 'tempfile';
C = LOAD 'tempfile' AS (key:chararray);
D1 = FOREACH B GENERATE SUBSTRING(key,0,10);
D2 = FOREACH C GENERATE SUBSTRING(key,0,10);

DUMP D1;
DUMP D2;


For D1 I get:
()
()
()
()
()

For D2 I get:
(a)
(b x y)
(b)
(a b c)
(a c b)


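Clearly something has gone awry: the same keys produce empty tuples when 
read directly through CassandraStorage (D1), but the expected substrings 
after a round trip through a stored file (D2).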

I have tried many workarounds and other functions. TOKENIZE behaves entirely 
differently:

E = FOREACH B GENERATE TOKENIZE(key);

Ultimately this throws an exception:
2011-12-01 15:01:56,007 [Thread-149] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0010
org.apache.pig.backend.executionengine.ExecException: ERROR 2114: Expected input to be chararray, but got org.apache.pig.data.DataByteArray
        at org.apache.pig.builtin.TOKENIZE.exec(TOKENIZE.java:62)
        at org.apache.pig.builtin.TOKENIZE.exec(TOKENIZE.java:43)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:287)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:338)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:290)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:237)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
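The TOKENIZE error says the function received a DataByteArray (raw bytes) 
where it expected a chararray (a Java String), even though B casts the key to 
chararray. The Java sketch below only illustrates that kind of type mismatch; 
it is not Pig's or CassandraStorage's code. RawBytes, tokenize and 
decodeUtf8Key are hypothetical names, and decoding the bytes as UTF-8 is an 
assumption based on the UTF8Type key validator mentioned above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class KeyDecodingSketch {

    // Hypothetical stand-in for a byte-array wrapper such as Pig's DataByteArray:
    // it holds raw bytes and is NOT a java.lang.String.
    static final class RawBytes {
        final byte[] bytes;

        RawBytes(byte[] bytes) {
            this.bytes = bytes;
        }
    }

    // Hypothetical tokenizer that, like the check behind the error above,
    // accepts only a String and rejects any other type.
    static List<String> tokenize(Object input) {
        if (!(input instanceof String)) {
            throw new IllegalArgumentException(
                "Expected input to be chararray, but got " + input.getClass().getSimpleName());
        }
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer((String) input);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // Decode the raw key bytes as UTF-8 (assumption: keys use a UTF8Type validator).
    static String decodeUtf8Key(RawBytes key) {
        return new String(key.bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        RawBytes key = new RawBytes("b x y".getBytes(StandardCharsets.UTF_8));
        try {
            tokenize(key); // raw bytes are rejected, mirroring ERROR 2114
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        // Once decoded to a String, the same key tokenizes normally.
        System.out.println(tokenize(decodeUtf8Key(key)));
    }
}
```

Running main prints "Expected input to be chararray, but got RawBytes" followed 
by "[b, x, y]".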



--
This message is automatically generated by JIRA.