I'm using Cassandra 1.2.8 and Pig 0.11.1. I'm loading some simple data from Cassandra into Pig using CqlStorage. The CqlStorage loader defines a Pig schema based on the Cassandra schema, but the schema it reports does not match the data it actually returns.
If I do:

    data = LOAD 'cql://bookdata/books' USING CqlStorage();
    DESCRIBE data;

I get this:

    data: {isbn: chararray,bookauthor: chararray,booktitle: chararray,publisher: chararray,yearofpublication: int}

However, if I DUMP data, I get results like these:

    ((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))

Clearly the results from Cassandra are key/value pairs, as would be expected. I don't know why the schema generated by CqlStorage() would be so different.

This is really causing me problems trying to access the column values. I tried a naive approach of FLATTENing each tuple, then trying to access the values that way:

    flattened = FOREACH data GENERATE FLATTEN(isbn), FLATTEN(booktitle), ...
    values = FOREACH flattened GENERATE $1 AS ISBN, $3 AS BookTitle, ...

As soon as I try to access field $5, Pig complains about the index being out of bounds.

Is there a way to solve the schema/reality mismatch? Am I doing something wrong, or have I stumbled across a defect?

Thanks,
Chad
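To make the mismatch concrete outside of Pig, here is a minimal Python sketch of the two shapes involved. It only models the data as printed by DESCRIBE and DUMP; it is not a fix and does not call Pig or Cassandra. The names `declared_schema`, `row`, and `flatten` are illustrative, and treating the year as an integer is an assumption taken from the reported schema.

```python
# Fields reported by DESCRIBE: five scalar columns.
declared_schema = ["isbn", "bookauthor", "booktitle", "publisher", "yearofpublication"]

# One row as printed by DUMP: five (name, value) pairs rather than five scalars.
# The year is modelled as an int only because the reported schema says so.
row = (
    ("isbn", "0425093387"),
    ("bookauthor", "Georgette Heyer"),
    ("booktitle", "Death in the Stocks"),
    ("publisher", "Berkley Pub Group"),
    ("yearofpublication", 1986),
)


def flatten(pairs):
    """Mimic flattening every pair: each 2-tuple becomes two top-level fields."""
    return tuple(item for pair in pairs for item in pair)


flat = flatten(row)   # names at even positions, values at odd positions
names = flat[0::2]
values = flat[1::2]

print(len(declared_schema), len(row), len(flat))
print(values)
```

In this model the row has as many top-level fields as the schema declares, but each field is a pair rather than a scalar. If every pair were flattened, the column values would sit at the odd positions of a ten-field tuple.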