There is some information on the wiki
http://wiki.apache.org/cassandra/HadoopSupport about a resource leak in
versions before 0.6.2 that can result in a TimedOutException. But you're on
0.6.5, so you should be OK.
I had a quick look at the Hadoop code and could not see where to change the
timeout (that would be the obvious thing to try). If you have a look in
ConfigHelper.java, though, it says:
    /**
     * The number of rows to request with each get range slices request.
     * Too big and you can either get timeouts when it takes Cassandra too
     * long to fetch all the data. Too small and the performance
     * will be eaten up by the overhead of each request.
     *
     * @param conf Job configuration you are about to run
     * @param batchsize Number of rows to request each time
     */
    public static void setRangeBatchSize(Configuration conf, int batchsize)
    {
        conf.setInt(RANGE_BATCH_SIZE_CONFIG, batchsize);
    }
The config item name is "cassandra.range.batch.size".
Try reducing the batch size first and see if the timeouts go away, though it
does not sound like you have a lot of data.
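To make the trade-off in that comment concrete, here is a rough sketch (plain
Java, no Cassandra or Hadoop classes) of how the number of range slice
requests grows as the batch size shrinks. The class name, the method name and
the 10,000-row split are made up for illustration, and it assumes every
request returns a full batch, which simplifies what the record reader really
does.

```java
public class BatchSizeSketch {
    // Hypothetical helper: number of range slice round trips needed to read
    // `rows` rows when each request asks for at most `batchSize` rows.
    // This is plain ceiling division, not anything taken from Cassandra.
    public static long requestsNeeded(long rows, int batchSize) {
        if (rows < 0 || batchSize <= 0) {
            throw new IllegalArgumentException(
                "rows must be >= 0 and batchSize must be > 0");
        }
        return (rows + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        long rowsInSplit = 10_000; // illustrative figure, not a real default
        // Smaller batches mean less work per request (fewer timeouts)
        // but more requests, so more per-request overhead.
        for (int batchSize : new int[] {4096, 1024, 256, 64}) {
            System.out.printf("batch size %d -> %d requests%n",
                batchSize, requestsNeeded(rowsInSplit, batchSize));
        }
    }
}
```

In the job setup itself, the call to use would be the
setRangeBatchSize(conf, batchsize) method quoted above, with whatever batch
size stops the timeouts without making the job too slow.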
A 0.7 beta2 may be out this week, but it's still a beta.
Hope that helps.
Aaron
On 25 Sep 2010, at 07:17, Saket Joshi wrote:
> Hi Experts,
>
> I need help on an exception integrating cassandra-hadoop. I am getting the
> following exception, when running a Hadoop Map reduce job
> http://pastebin.com/RktaqDnj
> I am using Cassandra 0.6.5 on a 3-node cluster. I don’t get any exception
> when the data I am processing is very small (< 5 rows and 100 columns), but
> I get the error with modest data (> 5 rows and 500 columns). I went through
> some of the forums where people have experienced the same issue.
> http://www.listware.net/201005/cassandra-user/21897-timeout-while-running-simple-hadoop-job.html
> Is this a bug with the Cassandra-Hadoop classes, and is it fixed in 0.7 for
> sure? How stable is the 0.7 beta? In the system.log I see a lot of "index has
> reached its threshold; switching in a fresh Memtable" messages.
>
> Has anyone faced a similar issue and solved it? Is migrating to 0.7 the only
> solution?
>
> Thanks,
> Saket
>
> Stack Trace of the Exception:
> {java.lang.RuntimeException: TimedOutException()
> at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:186)
> at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:236)
> at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:104)
> at
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
> at
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
> at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:98)
> at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
> at
> org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: TimedOutException()
> at
> org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11094)
> at
> org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:628)
> at
> org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:602)
> at
> org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:164)}