On Thu, 2011-08-18 at 08:54 +0200, Patrik Modesto wrote:
> But there is the another problem with Hadoop-Cassandra, if there is no
> node available for a range of keys, it fails on RuntimeError. For
> example having a keyspace with RF=1 and a node is down all MapReduce
> tasks fail.

CASSANDRA-2388 is related but not the same.

Before 0.8.4 the behaviour was: if the local Cassandra node didn't have
the split's data, the tasktracker would connect to another Cassandra node
where the split's data could be found. So even before 0.8.4, with RF=1
and the owning node down, your Hadoop job would fail.

That said, I've reopened CASSANDRA-2388 (and reverted the code locally)
because the new behaviour in 0.8.4 leads to abysmal tasktracker
throughput (for me, task allocation doesn't seem to honour data locality
according to split.getLocations()).

> I've reworked my previous patch, that was addressing this
> issue and now there are ConfigHelper methods for enable/disable
> ignoring unavailable ranges.
> It's available here: http://pastebin.com/hhrr8m9P (for version 0.7.8)

I'm interested in this patch and see its usefulness, but no one will act
until you attach it to an issue. (I think a new issue is appropriate
here.)

~mck
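
For readers following the thread, here is a minimal sketch of what "ignoring unavailable ranges" could mean when splits are computed. It is not the code from the pastebin patch and does not use the real ConfigHelper or ColumnFamilyInputFormat APIs. SplitFilter, TokenRange, usableRanges and the ignoreUnavailable flag are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Cassandra.
class SplitFilter {
    /** Stand-in for a token range plus the replica nodes that own it. */
    static final class TokenRange {
        final String start;
        final String end;
        final List<String> endpoints;

        TokenRange(String start, String end, List<String> endpoints) {
            this.start = start;
            this.end = end;
            this.endpoints = endpoints;
        }
    }

    /**
     * Returns the ranges that have at least one live replica.
     * A range with no live replica is skipped when ignoreUnavailable is true;
     * otherwise it aborts the whole job with a RuntimeException.
     */
    static List<TokenRange> usableRanges(List<TokenRange> ranges,
                                         Set<String> liveNodes,
                                         boolean ignoreUnavailable) {
        List<TokenRange> usable = new ArrayList<>();
        for (TokenRange range : ranges) {
            boolean hasLiveReplica = false;
            for (String endpoint : range.endpoints) {
                if (liveNodes.contains(endpoint)) {
                    hasLiveReplica = true;
                    break;
                }
            }
            if (hasLiveReplica) {
                usable.add(range);
            } else if (!ignoreUnavailable) {
                throw new RuntimeException("no live replica for range ("
                        + range.start + ", " + range.end + "]");
            }
            // else: the range is dropped and the job runs over partial data
        }
        return usable;
    }
}
```

With RF=1 every range has exactly one endpoint, so a single dead node removes its ranges from the job when the flag is on and fails the job when it is off. Skipping means the job silently processes partial data, which is presumably why the patch makes it opt-in.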