I noticed in the jobtracker log that when the Pig job kicks off, I get the following info message:

2011-02-02 09:13:07,269 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201101241634_0193 = 0. Number of splits = 1

So I looked at the job.split file created for the Pig job and compared it to the one created for the plain map-reduce job. The map-reduce file contains an entry for each split, whereas the Pig job's file contains just the one split. I added some code to ColumnFamilyInputFormat to log what it sees when it should be creating the input splits for the Pig job, and the call to getSplits() appears to return the correct list of splits. I can't figure out where it goes wrong, though, at the point where the splits are written to the job.split file.

Does anybody know which class is responsible for creating that file in a Pig job, and why it might be affected by using the Pig CassandraStorage module? Is anyone else successfully running Pig jobs against a 0.7 cluster?

Thanks,
Matt
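
P.S. In case anyone wants to reproduce the check: below is a rough standalone sketch of what I did, not my exact change (my actual instrumentation was just logging inside getSplits() itself). It assumes the cassandra.* settings your job normally sets via ConfigHelper are passed in as -D properties on the command line, and that ColumnFamilyInputFormat comes from Cassandra 0.7's org.apache.cassandra.hadoop package.

    import java.util.List;

    import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class SplitCheck
    {
        public static void main(String[] args) throws Exception
        {
            // Pick up the same cassandra.* settings the real job uses,
            // passed as -D properties on the command line.
            Job job = new Job(new GenericOptionsParser(args).getConfiguration());

            // Ask the input format for its splits directly, bypassing Pig
            // and the job.split file entirely.
            List<InputSplit> splits = new ColumnFamilyInputFormat().getSplits(job);
            System.out.println("getSplits() returned " + splits.size() + " split(s)");
            for (InputSplit split : splits)
                System.out.println("  " + split);
        }
    }

Run against my cluster, this prints the full list of splits, which is what makes me think the problem is downstream of getSplits(), in whatever writes job.split for the Pig job.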
2011-02-02 09:13:07,269 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201101241634_0193 = 0. Number of splits = 1 So I looked at the job.split file that is created for the Pig job and compared it to the job.split file created for the map-reduce job. The map reduce file contains an entry for each split, whereas the job.split file for the Pig job contains just the one split. I added some code to the ColumnFamilyInputFormat to output what it thinks it sees as it should be creating input splits for the pig jobs, and the call to getSplits() appears to be returning the correct list of splits. I can't figure out where it goes wrong though when the splits should be written to the job.split file. Does anybody know the specific class responsible for creating that file in a Pig job, and why it might be affected by using the pig CassandraStorage module? Is anyone else successfully running Pig jobs against a 0.7 cluster? Thanks, Matt