(I'm using Cassandra 1.2.8 and Pig 0.11.1)
I'm loading some simple data from Cassandra into Pig using CqlStorage. The
CqlStorage loader defines a Pig schema based on the Cassandra schema, but
that schema does not seem to match the data the loader actually returns.
If I do:
data = LOAD 'cql://bookdata/books' USING CqlStorage();
DESCRIBE data;
I get this:
data: {isbn: chararray,bookauthor: chararray,booktitle: chararray,publisher: chararray,yearofpublication: int}
However, if I DUMP data, I get results like these:
((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))
Clearly each field coming back from Cassandra is a (name, value) pair, as
would be expected, whereas the schema describes each field as a bare
chararray or int. I don't know why the schema generated by CqlStorage()
would be so different from the data.
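
To make the mismatch concrete, here is a rough model of it in plain Python. This is not Pig or Cassandra code, just the DESCRIBE and DUMP output above transcribed by hand. The variable names are mine, and I am assuming the year comes back as an integer.

```python
# Plain-Python model of the mismatch; not Pig or Cassandra code.

# Field names and types as reported by DESCRIBE.
declared_schema = [
    ("isbn", "chararray"),
    ("bookauthor", "chararray"),
    ("booktitle", "chararray"),
    ("publisher", "chararray"),
    ("yearofpublication", "int"),
]

# One row as printed by DUMP: every field is a (name, value) pair.
# Assumption: the year is an integer rather than a string.
dumped_row = (
    ("isbn", "0425093387"),
    ("bookauthor", "Georgette Heyer"),
    ("booktitle", "Death in the Stocks"),
    ("publisher", "Berkley Pub Group"),
    ("yearofpublication", 1986),
)

# What the declared schema implies a row should look like: bare values.
expected_row = tuple(value for _name, value in dumped_row)

# The field names line up with the schema, but each field is a pair, not a scalar.
names_match = [n for n, _ in declared_schema] == [n for n, _ in dumped_row]
fields_are_pairs = all(isinstance(f, tuple) and len(f) == 2 for f in dumped_row)

print(names_match, fields_are_pairs)
print(expected_row)
```

In other words, the names and their order agree with the schema, but every field is nested one level deeper than the schema says.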
This mismatch makes it hard to access the column values. I tried a naive
approach: FLATTEN each tuple, then access the values by position:
flattened = FOREACH data GENERATE
    FLATTEN(isbn),
    FLATTEN(booktitle),
    ...
values = FOREACH flattened GENERATE
    $1 AS ISBN,
    $3 AS BookTitle,
    ...
As soon as I try to access field $5, Pig complains about the index being
out of bounds.
Is there a way to solve the schema/reality mismatch? Am I doing something
wrong, or have I stumbled across a defect?
Thanks,
Chad