Hi all,

I am trying to write some map-reduce jobs to answer questions such as: how many records have status X? I am using Cassandra 0.7.0 and have 5 nodes with ~100 GB of data on each node.
I wrote the code based on the word_count example, and the map-reduce job runs successfully but is extremely slow (about 2 hours for the simplest key count). I am now trying to track down the cause of the slowness and tune the job, or explore alternative ways to achieve the same goal.

Can anyone point me to a way to tune my map-reduce job? Does anyone have experience exploring Cassandra data with the Hadoop cluster configuration (as suggested in http://wiki.apache.org/cassandra/HadoopSupport#ClusterConfig)?

Thanks,
Orr