The Accumulo (Data Center) Replication feature lets you run multiple
active Accumulo clusters that all contain the same data.

HDFS provides replication as a means of durability for the data it
stores: the files that Accumulo creates on an HDFS instance are
replicated by HDFS within that instance. This does not help if your
entire cluster becomes unavailable. That is the problem Accumulo's data
center replication feature solves.

While both can be called "replication", they serve very different purposes.

Yamini Joshi wrote:
Hello
I was going through some Accumulo docs and found out about replication.
To enable replication, one needs to make some config settings as
described in
https://github.com/apache/accumulo/blob/master/docs/src/main/asciidoc/chapters/replication.txt.
I cannot seem to grasp the difference between this replication conf and
the replication on HDFS level. What exactly is the use case for
replication? Are the replicated instances visible to the clients?
Best regards,
Yamini Joshi