(I'm using Cassandra 1.2.8 and Pig 0.11.1)

I'm loading some simple data from Cassandra into Pig using CqlStorage. The
CqlStorage loader defines a Pig schema based on the Cassandra schema, but
that schema does not seem to match the data it actually returns.

If I do:

data = LOAD 'cql://bookdata/books' USING CqlStorage();
DESCRIBE data;

I get this:

data: {isbn: chararray,bookauthor: chararray,booktitle: chararray,publisher: chararray,yearofpublication: int}

However, if I DUMP data, I get results like these:

((isbn,0425093387),(bookauthor,Georgette Heyer),(booktitle,Death in the Stocks),(publisher,Berkley Pub Group),(yearofpublication,1986))

Clearly the results from Cassandra are (name, value) pairs, as would be
expected, but the schema generated by CqlStorage() declares each field as a
bare value. I don't know why the two would be so different.

This mismatch makes it hard to access the column values. I tried a naive
approach: FLATTEN each tuple, then access the values by position:

flattened = FOREACH data GENERATE
  FLATTEN(isbn),
  FLATTEN(booktitle),
  ...
values = FOREACH flattened GENERATE
  $1 AS ISBN,
  $3 AS BookTitle,
  ...

As soon as I try to access field $5, Pig complains about the index being
out of bounds.

Is there a way to solve the schema/reality mismatch? Am I doing something
wrong, or have I stumbled across a defect?

Thanks,
Chad
