I'm trying to do a bulk load from a Cassandra/Hadoop job using the BulkOutputFormat class. It appears that the reducers are generating the SSTables but failing to load them into the cluster:
```
12/09/14 14:08:13 INFO mapred.JobClient: Task Id : attempt_201208201337_0184_r_000004_0, Status : FAILED
java.io.IOException: Too many hosts failed: [/10.4.0.6, /10.4.0.5, /10.4.0.2, /10.4.0.1, /10.4.0.3, /10.4.0.4]
	at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:242)
	at org.apache.cassandra.hadoop.BulkRecordWriter.close(BulkRecordWriter.java:207)
	at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:579)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
```

A brief look at the BulkOutputFormat class shows that it depends on SSTableLoader. My Hadoop cluster and my Cassandra cluster are co-located on the same set of machines. I haven't found any stated restrictions, but does this technique only work if the Hadoop cluster is distinct from the Cassandra cluster? Any suggestions on how to get past this problem?

Thanks in advance.

Brian
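Edit: for context, the output side of the job follows the usual BulkOutputFormat wiring. This is a simplified sketch, not my exact driver; the keyspace and column family names ("MyKeyspace"/"MyColumnFamily") are placeholders, and 10.4.0.1/9160 stand in for one of the cluster nodes and the Thrift RPC port:

```java
import org.apache.cassandra.hadoop.BulkOutputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class BulkLoadDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // The BulkRecordWriter contacts this node to discover the ring,
        // then streams the locally built SSTables to the owning replicas.
        ConfigHelper.setOutputInitialAddress(conf, "10.4.0.1");
        ConfigHelper.setOutputRpcPort(conf, "9160");
        ConfigHelper.setOutputPartitioner(conf,
                "org.apache.cassandra.dht.RandomPartitioner");
        // Placeholder keyspace/column family names.
        ConfigHelper.setOutputColumnFamily(conf, "MyKeyspace", "MyColumnFamily");

        Job job = new Job(conf, "cassandra-bulk-load");
        job.setOutputFormatClass(BulkOutputFormat.class);
        // ... mapper/reducer/input setup elided ...
    }
}
```

The "Too many hosts failed" list names every node in the ring, so the reducers apparently can't stream to any of them, which is why I'm wondering whether co-locating Hadoop and Cassandra is the issue.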