Re: Data Replication

Josh Elser Sun, 16 Oct 2016 11:00:47 -0700

Exactly right, Vaibhav.

vaibhav thapliyal wrote:

I think neither of these would contribute much to load balancing. HDFS
replication is mostly a safeguard against single points of failure within
a Hadoop cluster. Data center replication, however, would ensure the
availability of an Accumulo instance.
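
To make the distinction concrete, here is a toy Python sketch. It is not Accumulo or HDFS code: the Cluster class, its fields, and the rule "data is lost once as many nodes fail as there are HDFS copies" are all assumptions made up for illustration (a worst case in which the failed nodes held every copy). It only shows that HDFS replication covers node failures inside one cluster, while a second cluster is needed when the whole cluster is unreachable.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for one Accumulo instance on one HDFS instance."""
    name: str
    hdfs_replication: int = 3   # copies of each block kept inside this cluster
    failed_nodes: int = 0       # nodes lost inside this cluster
    offline: bool = False       # the whole cluster (data center) is unreachable
    rows: dict = field(default_factory=dict)

    def read(self, row):
        # A cluster-wide outage defeats HDFS replication entirely.
        if self.offline:
            return None
        # Worst-case assumption: every failed node held a copy of the block.
        if self.failed_nodes >= self.hdfs_replication:
            return None
        return self.rows.get(row)


primary = Cluster("primary", rows={"r1": "v1"})
# Assume data center replication has already copied the table to the peer.
peer = Cluster("peer", rows=dict(primary.rows))

primary.failed_nodes = 2
print(primary.read("r1"))   # v1: one HDFS copy is still left (durability)

primary.offline = True
print(primary.read("r1"))   # None: HDFS replication cannot help here
print(peer.read("r1"))      # v1: the peer cluster is still available
```

Note that the client in this sketch reads from the peer explicitly; nothing in it fails over automatically, and nothing in it spreads load.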


On 16 October 2016 at 21:02, Yamini Joshi <yamini.1...@gmail.com> wrote:

    In other words, what helps in load balancing? HDFS replication or
    Data center replication?

    Best regards,
    Yamini Joshi

    On Sat, Oct 15, 2016 at 10:44 PM, Yamini Joshi
    <yamini.1...@gmail.com> wrote:

        So HDFS is for durability while replication is for availability?
        I'm assuming that the client is unaware of the replicated
        instance and queries the DB with no knowledge of which
        instance/table will return the result.

        Best regards,
        Yamini Joshi

        On Thu, Oct 13, 2016 at 11:46 AM, Josh Elser
        <josh.el...@gmail.com> wrote:

            I'm not familiar with MongoDB. Perhaps someone else can
            confirm this for you.

            Yamini Joshi wrote:

                So, can I say that if I have a table split across nodes
                (i.e. num tablets > 1) and HDFS replication in my system,
                it is sort of equivalent to a sharded and replicated mongo
                architecture?

                Best regards,
                Yamini Joshi

                On Thu, Oct 13, 2016 at 11:06 AM, Josh Elser
                <josh.el...@gmail.com> wrote:

                     The Accumulo (Data Center) Replication feature is for
                     having multiple active Accumulo clusters all containing
                     the same data.

                     HDFS provides replication as a means for durability of
                     the data it is storing. The files that Accumulo creates
                     on one HDFS instance are replicated by HDFS. This does
                     not help if your entire cluster becomes unavailable.
                     That is the problem Accumulo's data center replication
                     feature solves.

                     While both can be called "replication", they serve very
                     different purposes.


                     Yamini Joshi wrote:

                         Hello

                         I was going through some Accumulo docs and found out
                         about replication. To enable replication, one needs
                         to make some config settings as described in
                         <https://github.com/apache/accumulo/blob/master/docs/src/main/asciidoc/chapters/replication.txt>.
                         I cannot seem to grasp the difference between this
                         replication config and replication at the HDFS level.
                         What exactly is the use case for replication? Are the
                         replicated instances visible to the clients?

                         Best regards,
                         Yamini Joshi
