Hello, Corey

Neither of the two examples uses BatchWriter and context.write in the same job.

Data consistency is a good point. I need to rethink my task.

Thank you! It really helps.

Best,
Huanchen

On Oct 16, 2012, at 11:55 PM, Corey Nolet wrote:

> Huanchen,
> 
> The AccumuloOutputFormat just passes along the connection information (i.e. 
> username, password, instance, zookeepers) so that an Accumulo connector can 
> be created in each output worker (that is, each mapper or reducer). You could 
> do this on your own by passing the connection information around in the 
> Configuration() and creating the BatchWriter in the mappers (map-only job) or 
> the reducer and then use your HDFS output format to emit the data elsewhere.
> 
> I have not looked at these examples, but I'm assuming they are doing the same 
> thing? Though I haven't tried this myself, I can't see why it wouldn't work. 
> With two output endpoints, you will most likely want a strategy for handling 
> a successful Accumulo write followed by a failed HDFS write, if data 
> consistency is something you need to guarantee.
> 
> 
> Corey
> 
> On Oct 16, 2012, at 10:48 PM, Huanchen Zhang wrote:
> 
>> Hello, Corey
>> 
>> Thank you for your answer.
>> 
>> Can I use InsertWithBatchWriter for this task? I mean, use context.write to 
>> write to HDFS and batchwriter.addMutation to write to Accumulo.
>> 
>> Huanchen
>> 
>> On Oct 16, 2012, at 10:25 PM, Corey Nolet wrote:
>> 
>>> You can extend the output format to write to both and have the resulting 
>>> record writer underneath write to the correct endpoint depending on the 
>>> items submitted from the job.
>>> 
>>> On Oct 16, 2012, at 10:16 PM, Huanchen Zhang wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I have a MapReduce job which needs to write to Accumulo. I checked 
>>>> the examples. It seems there are two different ways to write to Accumulo: 
>>>> one is InsertWithBatchWriter, the other is InsertWithOutputFormat.
>>>> 
>>>> So, what is the difference between them? Which one should I choose?
>>>> 
>>>> I actually need to write to Accumulo and HDFS in the same job. It seems 
>>>> InsertWithOutputFormat cannot do this, because it needs to set the output 
>>>> format to "AccumuloOutputFormat.class" and so can only write to Accumulo 
>>>> in one job, right?
>>>> 
>>>> Thank you.
>>>> 
>>>> Best,
>>>> Huanchen
>>> 
>> 
> 
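
The routing idea suggested earlier in the thread (one record writer that sends each record to the correct endpoint) could look roughly like the sketch below. It is plain Java with no Hadoop or Accumulo dependency. Endpoint, ListEndpoint, Target, and RoutingWriter are hypothetical names invented for illustration, not Accumulo or Hadoop APIs. In a real job the two endpoints would presumably wrap an Accumulo BatchWriter and an HDFS record writer.

```java
import java.util.ArrayList;
import java.util.List;

public class RoutingWriterDemo {

    // Hypothetical stand-in for one output endpoint (for example a wrapper
    // around an Accumulo BatchWriter or an HDFS record writer).
    interface Endpoint {
        void write(String key, String value);
    }

    // In-memory endpoint, used only so the sketch can run without a cluster.
    static class ListEndpoint implements Endpoint {
        final List<String> records = new ArrayList<>();

        @Override
        public void write(String key, String value) {
            records.add(key + "=" + value);
        }
    }

    // Hypothetical tag the job attaches to each record to pick a destination.
    enum Target { ACCUMULO, HDFS }

    // The "record writer underneath": a single write() call, routed by tag.
    static class RoutingWriter {
        private final Endpoint accumulo;
        private final Endpoint hdfs;

        RoutingWriter(Endpoint accumulo, Endpoint hdfs) {
            this.accumulo = accumulo;
            this.hdfs = hdfs;
        }

        void write(Target target, String key, String value) {
            switch (target) {
                case ACCUMULO:
                    accumulo.write(key, value);
                    break;
                case HDFS:
                    hdfs.write(key, value);
                    break;
                default:
                    throw new IllegalArgumentException("Unknown target: " + target);
            }
        }
    }

    public static void main(String[] args) {
        ListEndpoint accumulo = new ListEndpoint();
        ListEndpoint hdfs = new ListEndpoint();
        RoutingWriter writer = new RoutingWriter(accumulo, hdfs);

        // Each record names its destination; the writer does the dispatch.
        writer.write(Target.ACCUMULO, "row1", "value1");
        writer.write(Target.HDFS, "row2", "value2");

        System.out.println("accumulo endpoint: " + accumulo.records);
        System.out.println("hdfs endpoint: " + hdfs.records);
    }
}
```

This sketch does not address the consistency concern raised above: if one endpoint succeeds and the other fails, the job still needs its own recovery or retry strategy.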
