I'm trying to do a bulk load from a Cassandra/Hadoop job using the BulkOutputFormat class. It appears that the reducers are generating the SSTables but failing to load them into the cluster:
```
12/09/14 14:08:13 INFO mapred.JobClient: Task Id : attempt_201208201337_0184_r_000004_0, Status : FAILED
java.io.IOException: Too many hosts failed: [/10.4.0.6, /10.4.0.5, /10.4.0.2, /10.4.0.1, /10.4.0.3, /10.4.0.4]
	at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:242)
	at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:207)
	at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:579)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
```

A brief look at the BulkOutputFormat class shows that it depends on SSTableLoader. My Hadoop cluster and my Cassandra cluster are co-located on the same set of machines. I haven't found any stated restrictions, but does this technique only work if the Hadoop cluster is distinct from the Cassandra cluster? Any suggestions on how to get past this problem?

Thanks in advance.

Brian
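Edit: for context, the output side of the job follows the usual BulkOutputFormat wiring. This is a simplified sketch, not my exact driver; the keyspace and column family names ("MyKeyspace"/"MyColumnFamily") are placeholders, and 10.4.0.1/9160 stand in for one of the cluster nodes and the Thrift RPC port:

```java
import org.apache.cassandra.hadoop.BulkOutputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class BulkLoadDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // The BulkRecordWriter contacts this node to discover the ring,
        // then streams the locally built SSTables to the owning replicas.
        ConfigHelper.setOutputInitialAddress(conf, "10.4.0.1");
        ConfigHelper.setOutputRpcPort(conf, "9160");
        ConfigHelper.setOutputPartitioner(conf,
                "org.apache.cassandra.dht.RandomPartitioner");
        // Placeholder keyspace/column family names.
        ConfigHelper.setOutputColumnFamily(conf, "MyKeyspace", "MyColumnFamily");

        Job job = new Job(conf, "cassandra-bulk-load");
        job.setOutputFormatClass(BulkOutputFormat.class);
        // ... mapper/reducer/input setup elided ...
    }
}
```

The "Too many hosts failed" list names every node in the ring, so the reducers apparently can't stream to any of them, which is why I'm wondering whether co-locating Hadoop and Cassandra is the issue.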