Increasing the number of replicas is unlikely to improve read
performance (unless the number of replicas approaches the number of
datanodes, which is infeasible except for very, very small instances).
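To put a rough number on that, here is a back-of-the-envelope sketch. It
assumes each block's replicas land on distinct datanodes chosen uniformly
at random, which ignores HDFS's rack-aware placement and the
writer-local first replica, so treat it as an illustration and not a
model of a real cluster. The function name is made up for this example.

```python
from fractions import Fraction


def local_replica_probability(replicas: int, datanodes: int) -> Fraction:
    """Chance that one given node holds a replica of one given block.

    Assumption (not from HDFS itself): replicas are placed on distinct
    datanodes picked uniformly at random, so the chance is simply
    replicas / datanodes, capped at 1.
    """
    if replicas < 1 or datanodes < 1:
        raise ValueError("replicas and datanodes must both be >= 1")
    # HDFS cannot place more replicas of a block than there are datanodes.
    return Fraction(min(replicas, datanodes), datanodes)


if __name__ == "__main__":
    for datanodes in (5, 50, 500):
        for replicas in (3, 5):
            p = local_replica_probability(replicas, datanodes)
            print(f"{datanodes:>4} datanodes, {replicas} replicas: "
                  f"{float(p):.1%} chance of a node-local copy")
```

Under that assumption, going from 3 to 5 replicas on a 500-node cluster
only moves the chance of a node-local copy from 0.6% to 1%; it only gets
large when the replica count is close to the number of datanodes.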
Given Accumulo's lax policy of Tablet placement with respect to HDFS
block location, the only benefit is a better chance of rack-local or
node-local reads instead of cross-rack reads. How much that matters
depends heavily on the network bandwidth between the nodes and racks in
your system.
Accumulo tries to keep Tablets assigned to the same TabletServer under
the assumption that there should be a local copy of all blocks for the
files a Tablet references. However, once a TabletServer dies or the HDFS
balancer is run, there's likely zero HDFS block locality until the files
for the Tablet are compacted (a compaction rewrites the files from the
hosting TabletServer, and HDFS normally puts the first replica of a new
block on the writer's own datanode).
Christopher wrote:
HDFS replication is transparent to Accumulo (though, the number of
replicas is configurable in Accumulo, on a per-table basis). Its primary
purpose is failure tolerance, but it *may* have an impact on read
performance. I'm not certain how significant that is, though.
There are no separate read-only and write-only copies of data on HDFS.
HDFS replication is at the block level, and files are updated by
appending new blocks to the files. All blocks are readable, and only new
blocks are written.
On Thu, Nov 10, 2016 at 11:28 AM Yamini Joshi <yamini.1...@gmail.com>
wrote:
Hello all
Does the HDFS replication improve performance of queries on Accumulo
or is it transparent to the Accumulo system? If it does improve the
performance by some notion of load balancing, is there a Read
Only or Write Only copy of data on HDFS for Accumulo?
Best regards,
Yamini Joshi