I was able to work around this problem by modifying the
ColumnFamilyRecordReader class from the org.apache.cassandra.hadoop package.
Since the errors were TimedOutExceptions, I added sleep-and-retry logic
around the call to

    rows = client.get_range_slices(keyspace,
                                   new ColumnParent(cfName),
                                   predicate,
                                   keyRange,
                                   ConsistencyLevel.ONE);

in the RowIterator.maybeInit() method. And it works. :)
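
Roughly, the retry loop looks like this (a simplified sketch, not the
exact code; the variable names maxRetries and sleepTime are mine, and the
full version is in the pastebin below):

    // Simplified sketch of the retry wrapper around get_range_slices().
    // maxRetries and sleepTime are read via the ConfigHelper additions
    // described below; the rest of maybeInit() is omitted.
    List<KeySlice> rows = null;
    int attempts = 0;
    while (true)
    {
        try
        {
            rows = client.get_range_slices(keyspace,
                                           new ColumnParent(cfName),
                                           predicate,
                                           keyRange,
                                           ConsistencyLevel.ONE);
            break;
        }
        catch (TimedOutException e)
        {
            // give the cluster a breather, then try again
            attempts++;
            if (attempts >= maxRetries)
                throw new RuntimeException(e);
            try
            {
                Thread.sleep(sleepTime);
            }
            catch (InterruptedException ie)
            {
                throw new RuntimeException(ie);
            }
        }
        catch (Exception e)
        {
            // other Thrift exceptions are wrapped as before
            throw new RuntimeException(e);
        }
    }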

Check out http://pastebin.com/FxV4Gw5U for the modified maybeInit()
function. Please note that I also made a slight modification to
ConfigHelper to pass in the sleep time and the maximum retry count; the
ConfigHelper.getRetryCount(conf) and ConfigHelper.getSleepTime(conf)
methods are not part of the original ConfigHelper.
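
The ConfigHelper additions are just thin wrappers over the Hadoop
Configuration object. Something along these lines (a sketch; the property
names and default values are mine, not from the Cassandra source):

    import org.apache.hadoop.conf.Configuration;

    // Hypothetical additions to org.apache.cassandra.hadoop.ConfigHelper.
    // Property names and defaults below are illustrative only.
    private static final String RETRY_COUNT_CONFIG = "cassandra.input.retry.count";
    private static final String SLEEP_TIME_CONFIG  = "cassandra.input.sleep.ms";

    public static void setRetryCount(Configuration conf, int retries)
    {
        conf.setInt(RETRY_COUNT_CONFIG, retries);
    }

    public static int getRetryCount(Configuration conf)
    {
        return conf.getInt(RETRY_COUNT_CONFIG, 5); // default: 5 retries
    }

    public static void setSleepTime(Configuration conf, long millis)
    {
        conf.setLong(SLEEP_TIME_CONFIG, millis);
    }

    public static long getSleepTime(Configuration conf)
    {
        return conf.getLong(SLEEP_TIME_CONFIG, 1000); // default: 1 second
    }

You can then call the setters from your job setup, right next to the other
ConfigHelper calls the word_count example already makes.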

Hope this helps anyone facing similar problems.

Regards,
Jairam

On 14 January 2011 11:59, Jairam Chandar <jairam.chan...@imagini.net> wrote:

> The cassandra logs strangely show no errors at the time of failure.
> Changing the RPCTimeoutInMillis seemed to help: with the timeout raised
> to 1 minute the job slows down considerably, but it does seem to finish.
> Unfortunately, I cannot be sure it will keep working if the data grows
> further. Hopefully we will be upgrading to the recently released final
> version of 0.7.0.
>
> Thanks for all the help and suggestions.
>
> Warm regards,
> Jairam Chandar
>
> On 13/01/2011 14:47, "Jeremy Hanna" <jeremy.hanna1...@gmail.com> wrote:
>
> >On Jan 12, 2011, at 12:40 PM, Jairam Chandar wrote:
> >
> >> Hi folks,
> >>
> >> We have a Cassandra 0.6.6 cluster running in production. We want to run
> >>Hadoop (version 0.20.2) jobs over this cluster in order to generate
> >>reports.
> >> I modified the word_count example in the contrib folder of the
> >>cassandra distribution. While the program is running fine for small
> >>datasets (in the order of 100-200 MB) on small clusters (2 machines), it
> >>starts to give errors while trying to run on a bigger cluster (5
> >>machines) with much larger dataset (400 GB). Here is the error that we
> >>get -
> >>
> >> java.lang.RuntimeException: TimedOutException()
> >>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:186)
> >>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:236)
> >>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:104)
> >>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
> >>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
> >>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:98)
> >>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
> >>     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
> >>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
> >>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >> Caused by: TimedOutException()
> >>     at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11094)
> >>     at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:628)
> >>     at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:602)
> >>     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:164)
> >>     ... 11 more
> >>
> >>
> >
> >I wonder if messing with RpcTimeoutInMillis in storage-conf.xml would
> >help.
> >
> >>
> >>
> >>
> >> I came across this page on the Cassandra wiki -
> >> http://wiki.apache.org/cassandra/HadoopSupport - and tried modifying
> >> the ulimit and changing batch sizes. These did not help: although the
> >> number of successful map tasks increased, the job eventually fails
> >> since the total number of map tasks is huge.
> >>
> >> Any idea what could be causing this? The program we are running is a
> >> very slight modification of the word_count example with respect to
> >> reading from Cassandra; the only changes are the specific keyspace,
> >> column family and columns. The rest of the reading code is the same
> >> as the word_count example in the Cassandra 0.6.6 source.
> >>
> >> Thanks and regards,
> >> Jairam Chandar
> >
>
>
>
