Hi, I'm trying to figure out what's going on with my Cassandra/Hadoop/Pig system. I created a "mini" copy of my main Cassandra data by randomly subsampling down to ~50,000 keys. I've been writing Pig scripts, and also the equivalent operation as simple single-threaded code, so I can double-check Pig's results.
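To give an idea of what I mean by the single-threaded check, a minimal version of it would look something like this (just a sketch using pycassa; KEYSPACE/COLUMN_FAMILY and the host are placeholders, not my real names):

    # Count columns per key, single threaded, straight off Cassandra.
    import pycassa

    pool = pycassa.ConnectionPool('KEYSPACE', ['localhost:9160'])
    cf = pycassa.ColumnFamily(pool, 'COLUMN_FAMILY')

    # xget() pages through a row in chunks, so wide rows are not
    # silently truncated the way a bounded get() would truncate them.
    for key, _ in cf.get_range(column_count=1):
        n_columns = sum(1 for _ in cf.xget(key))
        print("%s\t%d" % (key, n_columns))

The idea is to diff this output against the key/count pairs that come out of the Pig job.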
Of course my very first test failed. After doing a Pig DUMP on the raw data, what appears to be happening is that I'm only getting the first 1024 columns of each key. After some googling, this seems to be known behavior unless you add "?widerows=true" to the Pig load URI. I tried that, but it didn't seem to fix anything :-( Here's the start of my Pig script:

    foo = LOAD 'cassandra://KEYSPACE/COLUMN_FAMILY?widerows=true'
          USING CassandraStorage()
          AS (key:chararray, columns:bag {column:tuple (name, value)});

I'm using Cassandra 1.1.5 from the DataStax RPMs, and Hadoop (0.20.2+923.418-1) and Pig (0.8.1+28.39-1) from the Cloudera RPMs.

What am I doing wrong? Or, how can I enable debugging/logging to figure out what is going on next? I haven't had to debug Hadoop+Pig+Cassandra much, other than doing DUMP/ILLUSTRATE from Pig.

Will
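P.S. In case it helps: the kind of spot check that makes me think the truncation is on the Pig side, and not in my mini data set, looks roughly like this (pycassa sketch again; 'some-wide-key' is a placeholder for a real key that Pig DUMPed with exactly 1024 columns):

    import pycassa

    pool = pycassa.ConnectionPool('KEYSPACE', ['localhost:9160'])
    cf = pycassa.ColumnFamily(pool, 'COLUMN_FAMILY')

    key = 'some-wide-key'  # placeholder for a key Pig showed at exactly 1024 columns

    # Page through the entire row; if this prints more than 1024,
    # the row itself is intact and the truncation is in the Pig load.
    print(sum(1 for _ in cf.xget(key)))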