Pig data objects returned by CassandraStorage behave irrationally.
------------------------------------------------------------------

                 Key: CASSANDRA-3552
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3552
             Project: Cassandra
          Issue Type: Bug
          Components: Contrib
    Affects Versions: 1.0.3
         Environment: Ubuntu
            Reporter: Chris Howe

When I try to perform computations on data that I get back from CassandraStorage in Pig, I see inexplicable results. For example, on a column family that has UTF8Type as the key validator, I do the following:

    A = LOAD 'cassandra://keyspace/colfam' USING CassandraStorage();
    B = FOREACH A GENERATE (chararray) key;
    STORE B INTO 'tempfile';
    C = LOAD 'tempfile' AS (key:chararray);
    D1 = FOREACH B GENERATE SUBSTRING(key,0,10);
    D2 = FOREACH C GENERATE SUBSTRING(key,0,10);
    DUMP D1;
    DUMP D2;

For D1 I get:

    ()
    ()
    ()
    ()
    ()

For D2 I get:

    (a)
    (b x y)
    (b)
    (a b c)
    (a c b)

Clearly something has gone awry! B and C should hold the same keys, yet the same SUBSTRING call returns empty tuples on B and the expected strings on C. I have tried many workarounds and other functions. TOKENIZE has an entirely different behavior:

    E = FOREACH B GENERATE TOKENIZE(key);

Ultimately this throws an exception:

    2011-12-01 15:01:56,007 [Thread-149] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0010
    org.apache.pig.backend.executionengine.ExecException: ERROR 2114: Expected input to be chararray, but got org.apache.pig.data.DataByteArray
        at org.apache.pig.builtin.TOKENIZE.exec(TOKENIZE.java:62)
        at org.apache.pig.builtin.TOKENIZE.exec(TOKENIZE.java:43)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:245)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:287)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:338)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:290)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:237)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
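
One reading of the ERROR 2114 message is that the value reaching TOKENIZE is still a byte-array object even though the script applied a (chararray) cast. The sketch below models that situation in plain Java. It is not Pig's source: ByteArrayValue is a hypothetical stand-in for org.apache.pig.data.DataByteArray, and tokenize and toChararray are illustrative helpers invented for this example. It assumes the keys are UTF-8 bytes, as the UTF8Type validator suggests.

```java
import java.nio.charset.StandardCharsets;

class CastSketch {
    // Hypothetical stand-in for org.apache.pig.data.DataByteArray:
    // an opaque wrapper around raw bytes.
    static final class ByteArrayValue {
        private final byte[] data;
        ByteArrayValue(byte[] data) { this.data = data; }
        byte[] get() { return data; }
    }

    // Simplified model of a function that accepts only chararray
    // (java.lang.String) input and rejects everything else.
    static String[] tokenize(Object input) {
        if (!(input instanceof String)) {
            throw new IllegalArgumentException(
                "Expected input to be chararray, but got " + input.getClass().getName());
        }
        return ((String) input).split(" ");
    }

    // What an effective cast would have to do: decode the bytes as UTF-8.
    static String toChararray(ByteArrayValue value) {
        return new String(value.get(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteArrayValue key = new ByteArrayValue("a b c".getBytes(StandardCharsets.UTF_8));

        // The wrapper is passed through unconverted, so the type check fails.
        try {
            tokenize(key);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }

        // After an explicit decode the same function accepts the value.
        System.out.println(tokenize(toChararray(key)).length);
    }
}
```

If this reading is right, it would also be consistent with the store-and-reload path (relation C) behaving correctly, since the reloaded values are declared as chararray by the LOAD schema rather than produced by the cast.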