The delay due to replication is rarely a large problem in traditional
map-reduce programs, since many writes are occurring at once. The real
problem is that you are consuming 3x the total disk bandwidth, so
the theoretical maximum equilibrium write bandwidth is limited to the
lesser of ha
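
A rough sketch of the disk side of that bound only, to make the arithmetic
concrete. It assumes every client byte lands on disk once per replica and
ignores network limits entirely; the function name and the cluster figures
(40 disks at 100 MB/s) are hypothetical, not measurements.

```python
def max_client_write_bandwidth(num_disks, disk_mb_per_s, replication):
    """Upper bound (MB/s) on sustained client writes imposed by disks alone."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    # Each client byte is written `replication` times across the cluster,
    # so the aggregate disk bandwidth is shared among the replicas.
    return num_disks * disk_mb_per_s / replication


# Hypothetical cluster: 40 disks, 100 MB/s each (assumed values).
for r in (1, 2, 3):
    bound = max_client_write_bandwidth(40, 100.0, r)
    print(f"replication={r}: at most {bound:.0f} MB/s of client writes")
```

With replication 3, the same disks sustain about a third of the client
write rate they would at replication 1.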
I think higher replication only makes reads easier, as the client can choose to
read a block from the nearest node.
Writes are done using a replication pipeline, so the client does wait for acks
from all nodes but writes to only the first node. It would be interesting to see if there
are any benchmarks for delay cau
Hi all,
Is anyone aware of any survey/paper/report showing the relationship
between a replication factor and its penalty/benefit on write/read
operations?
BR,
George