(Moving to u...@.) Isn't reducing the number of map tasks the easiest way to tune this?
Also: in 0.7 you can use NetworkTopologyStrategy to designate a group of
nodes as your hadoop "datacenter" so the workloads won't overlap.

On Tue, Oct 19, 2010 at 3:22 PM, Michael Moores <mmoo...@real.com> wrote:
> Does it make sense to add some kind of throttle capability on the
> ColumnFamilyRecordReader for Hadoop?
>
> If I have 60 or so Map tasks running at the same time when the cluster is
> already heavily loaded with OLTP operations, I can get some decreased on-line
> performance that may not be acceptable. (I'm loading an 8 node cluster with
> 2000 TPS.)
> By default my cluster of 8 nodes (which are also the Hadoop JobTracker nodes)
> has 8 Map tasks per node making the get_range_slices call, based on what the
> ColumnFamilyInputFormat has calculated from my token ranges.
> I can increase the inputSplitSize (ConfigHelper.setInputSplitSize()) so that
> there is only one Map task per node, and this helps quite a bit.
>
> But is it reasonable to provide a configurable sleep to cause a wait in
> between smaller size range queries? That would stretch out the Map time
> and let the OLTP processing be less affected.
>
> --Michael

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
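
For reference, the "configurable sleep between smaller range queries" idea from the quoted message could look roughly like the sketch below. It is plain Java with no Cassandra or Hadoop dependency. The names ThrottledRangeReader, BatchFetcher, and sleepMillis are hypothetical and chosen only for illustration; BatchFetcher stands in for a single get_range_slices call and is not part of ColumnFamilyRecordReader.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: read a range in small batches and pause between them.
class ThrottledRangeReader {

    // Hypothetical stand-in for one range query (e.g. one get_range_slices call).
    interface BatchFetcher {
        List<String> fetch(int start, int count);
    }

    // Fetches totalRows rows, batchSize at a time, sleeping sleepMillis
    // between consecutive batches (no sleep after the last batch).
    static List<String> readAll(BatchFetcher fetcher, int totalRows,
                                int batchSize, long sleepMillis) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> rows = new ArrayList<>();
        for (int start = 0; start < totalRows; start += batchSize) {
            int count = Math.min(batchSize, totalRows - start);
            rows.addAll(fetcher.fetch(start, count));
            boolean moreToRead = start + count < totalRows;
            if (moreToRead && sleepMillis > 0) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Restore the interrupt flag and stop reading.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while throttling", e);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Fake fetcher that fabricates row keys instead of querying a cluster.
        BatchFetcher fake = (start, count) -> {
            List<String> batch = new ArrayList<>();
            for (int i = start; i < start + count; i++) {
                batch.add("row" + i);
            }
            return batch;
        };
        List<String> rows = readAll(fake, 10, 3, 5);
        System.out.println("rows read: " + rows.size());
    }
}
```

The trade-off is the one described in the quoted message: a larger sleepMillis stretches out the map task and leaves more headroom for OLTP traffic, at the cost of a longer job.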