Hi,

I'm trying to figure out what's going on with my cassandra/hadoop/pig
system.  I created a "mini" copy of my main cassandra data by randomly
subsampling to get ~50,000 keys.  I then wrote pig scripts, plus the
equivalent operation in simple single-threaded code, so I could
double-check pig.

Of course my very first test failed.  After doing a pig DUMP on the raw
data, it appears I'm only getting the first 1024 columns of each key.
After some googling, this seems to be known behavior
unless you add "?widerows=true" to the pig load URI. I tried this, but
it didn't seem to fix anything :-(   Here's the start of my pig script:
foo = LOAD 'cassandra://KEYSPACE/COLUMN_FAMILY?widerows=true'
      USING CassandraStorage()
      AS (key:chararray, columns:bag {column:tuple (name, value)});
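
In case it helps, the cap is easy to see with just builtin pig (COUNT and
FILTER here are stock pig operators, nothing cassandra-specific):

-- count columns per key; with the truncation, every wide key
-- reports exactly 1024 columns instead of its true width
counts = FOREACH foo GENERATE key, COUNT(columns) AS ncols;
capped = FILTER counts BY ncols >= 1024;
DUMP capped;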

I'm using cassandra 1.1.5 from datastax rpms.  I'm using hadoop
(0.20.2+923.418-1) and pig (0.8.1+28.39-1) from cloudera rpms.

What am I doing wrong?  Or, how can I enable debugging/logging to figure
out what is going on next?  I haven't had to debug hadoop+pig+cassandra
much, other than doing DUMP/ILLUSTRATE from pig.
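
For reference, this is the extent of my pig-side poking so far; the "set
debug on" line is a guess on my part at the right knob for more verbose
logging:

-- 'set debug on' is my guess at raising pig's log verbosity
set debug on;
DUMP foo;
ILLUSTRATE foo;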

will
