Huh? Isn't that the whole point of using Map/Reduce?
On Fri, May 7, 2010 at 8:44 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> Sounds like you need to configure Hadoop to not create a whole bunch
> of Map tasks at once.
>
> On Fri, May 7, 2010 at 3:47 AM, gabriele renzi <rff....@gmail.com> wrote:
>> Hi everyone,
>>
>> I am trying to develop a mapreduce job that does a simple
>> selection+filter on the rows in our store. Of course it is mostly
>> based on the WordCount example :)
>>
>> Sadly, while the app seems to run fine on a test keyspace with little
>> data, when run against a larger test index (though still on a single
>> node) I reliably see this error in the logs:
>>
>> 10/05/06 16:37:58 WARN mapred.LocalJobRunner: job_local_0001
>> java.lang.RuntimeException: TimedOutException()
>> at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:165)
>> at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:215)
>> at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:97)
>> at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:135)
>> at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:130)
>> at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:91)
>> at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
>> at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
>> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
>> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
>> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>> at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
>> Caused by: TimedOutException()
>> at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:11015)
>> at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:623)
>> at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:597)
>> at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:142)
>> ... 11 more
>>
>> After that the job seems to finish "normally", but no results are
>> produced.
>>
>> FWIW this is on 0.6.0 (we didn't move to 0.6.1 yet because, well, if
>> it ain't broke don't fix it).
>>
>> The single node has a data directory of about 127GB across two column
>> families, of which the one used in the mapred job is about 100GB.
>> The cassandra server runs with 6GB of heap on a box with 8GB
>> available and no swap enabled. Read/write latencies from cfstats are:
>>
>> Read Latency: 0.8535837762577986 ms.
>> Write Latency: 0.028849603764075547 ms.
>>
>> The row cache is not enabled and the key cache percentage is at its
>> default. Load on the machine is basically zero when the job is not
>> running.
>>
>> As my code is 99% that of the wordcount contrib, I should note that in
>> 0.6.1's contrib (and trunk) there is a RING_DELAY constant that can
>> supposedly be changed, but it is apparently not used anywhere; in any
>> case, running on a single node, that should not be an issue.
>>
>> Does anyone have suggestions, or has anyone seen this error before?
>> Alternatively, have people run this kind of job flawlessly in similar
>> conditions, so I can consider it just my problem?
>>
>> Thanks in advance for any help.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
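For anyone hitting the same trace: the TimedOutException comes from the Thrift get_range_slices call that ColumnFamilyRecordReader issues, so the two knobs that map onto the advice above are (a) capping how many map tasks hammer the node concurrently, and (b) giving Cassandra's RPC layer more headroom. A hedged sketch of both, as config fragments — the property name is standard Hadoop 0.20, the element is from 0.6's storage-conf.xml, but the values shown are purely illustrative and you should check the defaults shipped with your exact versions:

```xml
<!-- mapred-site.xml on each TaskTracker: cap concurrent map tasks
     per node so a single Cassandra node isn't scanned by many
     mappers at once (the value 2 is illustrative, not a tuning
     recommendation) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>

<!-- storage-conf.xml on the Cassandra 0.6 node: allow more time for
     each get_range_slices call before TimedOutException is thrown
     (30000 ms is illustrative; compare against your shipped default) -->
<RpcTimeoutInMillis>30000</RpcTimeoutInMillis>
```

Raising RpcTimeoutInMillis only papers over slow range scans, so the slot cap (or a smaller slice batch, if your version's ConfigHelper exposes one) is the more direct fix for the overload Jonathan describes.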