Increasing the number of replicas is unlikely to improve read performance (unless the number of replicas approaches the number of datanodes, which is infeasible except for very, very small instances).

Given Accumulo's lax policy of Tablet placement with respect to HDFS block location, the only benefit is a better chance of rack-local or node-local network communication instead of cross-rack communication. How much that helps depends heavily on the network bandwidth between the nodes and racks in your system.

Accumulo tries to keep Tablets assigned to the same TabletServer under the assumption that there should be a local copy of all blocks for the files a Tablet references. However, once a TabletServer dies or the HDFS balancer is run, there's likely zero HDFS block locality until the files for the Tablet are compacted, since a compaction rewrites those files from the TabletServer currently hosting the Tablet.
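
To put a rough number on the replica-count point above, here is a minimal back-of-the-envelope sketch. It assumes replicas of a block land on distinct datanodes chosen uniformly at random, which is a simplification and not HDFS's actual rack-aware placement policy; the function name and the cluster sizes are made up for illustration.

```python
from fractions import Fraction


def local_replica_probability(replicas: int, datanodes: int) -> Fraction:
    """Chance that one given datanode holds a replica of a block.

    Assumption (not real HDFS behavior): the replicas are placed on
    distinct datanodes picked uniformly at random.
    """
    if not 1 <= replicas <= datanodes:
        raise ValueError("need 1 <= replicas <= datanodes")
    return Fraction(replicas, datanodes)


# Hypothetical 100-datanode cluster: the odds of a node-local read only
# become large once the replica count gets close to the cluster size.
for replicas in (3, 10, 50, 100):
    p = local_replica_probability(replicas, 100)
    print(f"{replicas:>3} replicas / 100 datanodes: {float(p):.0%} chance of a local copy")
```

Under that assumption, a Tablet that has moved to an arbitrary TabletServer sees a local copy of a given block only a few percent of the time at the usual replica counts, so raising replication by one or two buys very little.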

Christopher wrote:
HDFS replication is transparent to Accumulo (though, the number of
replicas is configurable in Accumulo, on a per-table basis). Its primary
purpose is failure tolerance, but it *may* have an impact on read
performance. I'm not certain how significant that is, though.

There are no separate read-only and write-only copies of data on HDFS.
HDFS replication is at the block level, and files are updated by
appending new blocks to the files. All blocks are readable, and only new
blocks are written.

On Thu, Nov 10, 2016 at 11:28 AM Yamini Joshi <yamini.1...@gmail.com> wrote:

    Hello all

    Does the HDFS replication improve performance of queries on Accumulo
    or is it transparent to the Accumulo system? If it does improve the
    performance by some notion of load balancing, is there a Read
    Only or Write Only copy of data on HDFS for Accumulo?

    Best regards,
    Yamini Joshi
