Hello, Corey

Neither of the two examples uses BatchWriter and context.write in the same job.
Data consistency is a good point. I need to rethink my task.

Thank you! It really helps.

Best,
Huanchen

On Oct 16, 2012, at 11:55 PM, Corey Nolet wrote:

> Huanchen,
>
> The AccumuloOutputFormat just passes along the connection information (i.e.
> username, password, instance, zookeepers) so that an Accumulo connector can
> be created in each output worker (that is, each mapper or reducer). You could
> do this on your own by passing the connection information around in the
> Configuration() and creating the BatchWriter in the mappers (map-only job) or
> the reducer, and then using your HDFS output format to emit the data elsewhere.
>
> I have not looked at these examples but I'm assuming they are doing the same
> thing? Though I haven't tried this myself, I can't see why it wouldn't work.
> When having 2 output endpoints, you will most likely want to think about a
> strategy to deal with a successful Accumulo write but a failure in writing to
> HDFS, if data consistency is something you need to guarantee.
>
> Corey
>
> On Oct 16, 2012, at 10:48 PM, Huanchen Zhang wrote:
>
>> Hello, Corey
>>
>> Thank you for your answer.
>>
>> Can I use InsertWithBatchWriter for this task? I mean, use context.write to
>> write to HDFS, and use batchwriter.addMutation to write to Accumulo.
>>
>> Huanchen
>>
>> On Oct 16, 2012, at 10:25 PM, Corey Nolet wrote:
>>
>>> You can extend the output format to write to both and have the resulting
>>> record writer underneath write to the correct endpoint depending on the
>>> items submitted from the job.
>>>
>>> On Oct 16, 2012, at 10:16 PM, Huanchen Zhang wrote:
>>>
>>>> Hello,
>>>>
>>>> Here I have a MapReduce job which needs to write to Accumulo. I checked
>>>> the examples. It seems there are two different ways to write to Accumulo:
>>>> one is InsertWithBatchWriter, the other is InsertWithOutputFormat.
>>>>
>>>> So, what is the difference between them? Which one should I choose?
>>>>
>>>> I actually need to write to Accumulo and HDFS in the same job. It seems
>>>> InsertWithOutputFormat cannot do this, because it needs to set the output
>>>> format as "AccumuloOutputFormat.class", and can only write to Accumulo in
>>>> one job, right?
>>>>
>>>> Thank you.
>>>>
>>>> Best,
>>>> Huanchen
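
For anyone reading this thread later: below is a minimal sketch of the consistency concern Corey raises (an Accumulo write that succeeds followed by an HDFS write that fails). It does not use the real Accumulo or Hadoop classes. AccumuloSink, HdfsSink and DualWriter are hypothetical stand-ins for the roles that batchwriter.addMutation and context.write play in a real job, and recording the affected rows is only one possible strategy, not something the thread prescribes.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Accumulo side (the role of batchwriter.addMutation).
interface AccumuloSink {
    void addMutation(String row, String value) throws Exception;
}

// Hypothetical stand-in for the HDFS side (the role of context.write).
interface HdfsSink {
    void write(String key, String value) throws Exception;
}

// Writes each record to both endpoints. If the Accumulo write succeeds but the
// HDFS write fails, the row is remembered so it can be reconciled later.
class DualWriter {
    private final AccumuloSink accumulo;
    private final HdfsSink hdfs;
    private final List<String> inconsistentRows = new ArrayList<>();

    DualWriter(AccumuloSink accumulo, HdfsSink hdfs) {
        this.accumulo = accumulo;
        this.hdfs = hdfs;
    }

    void write(String row, String value) throws Exception {
        // A failure here propagates: neither endpoint has the record yet.
        accumulo.addMutation(row, value);
        try {
            hdfs.write(row, value);
        } catch (Exception e) {
            // Accumulo has the record, HDFS does not: note it for later repair.
            inconsistentRows.add(row);
        }
    }

    List<String> inconsistentRows() {
        return inconsistentRows;
    }
}

class DualWriteSketch {
    public static void main(String[] args) throws Exception {
        List<String> accumuloRows = new ArrayList<>();
        List<String> hdfsRows = new ArrayList<>();

        AccumuloSink accumulo = (row, value) -> accumuloRows.add(row);
        HdfsSink hdfs = (key, value) -> {
            // Simulate an HDFS failure for one record.
            if (key.equals("row2")) {
                throw new IOException("simulated HDFS failure");
            }
            hdfsRows.add(key);
        };

        DualWriter writer = new DualWriter(accumulo, hdfs);
        writer.write("row1", "v1");
        writer.write("row2", "v2");
        writer.write("row3", "v3");

        System.out.println("Accumulo rows: " + accumuloRows);               // [row1, row2, row3]
        System.out.println("HDFS rows: " + hdfsRows);                       // [row1, row3]
        System.out.println("Needs repair: " + writer.inconsistentRows());   // [row2]
    }
}
```

In a real job the recorded rows might be retried, written to a side file, or deleted from Accumulo again, depending on which endpoint is treated as authoritative. That choice is left open here.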