Hi,
I have a question about Spark Streaming resiliency, because the
documentation is ambiguous:

The documentation says that the default configuration uses a replication
factor of 2 for received data, but the recommendation is to enable write
ahead logs to guarantee data resiliency with receivers.

"Additionally, it is recommended that the replication of the received data
within Spark be disabled when the write ahead log is enabled as the log is
already stored in a replicated storage system."
The doc says it is useless to replicate when the WAL is enabled, but what is
the benefit of using the WAL instead of the internal in-memory replication?
I would assume that, regarding performance, replicating in memory is better
than writing to a replicated file system...
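
To make the comparison concrete, here is roughly how I understand the two
setups. This is only a sketch: the dictionary and function names are mine
(hypothetical), the storage levels are the PySpark ones, and I am assuming
the checkpoint directory lives on a replicated file system such as HDFS.

```python
# The two receiver setups being compared, written as plain settings so the
# difference is easy to see. All names below are illustrative only.

WAL_KEY = "spark.streaming.receiver.writeAheadLog.enable"

# Setup A: default behaviour, each received block is copied to a second executor.
IN_MEMORY_REPLICATION = {
    WAL_KEY: "false",
    "storage_level": "MEMORY_AND_DISK_2",  # "_2" means two replicas
}

# Setup B: write ahead log enabled, replication inside Spark disabled.
WRITE_AHEAD_LOG = {
    WAL_KEY: "true",
    "storage_level": "MEMORY_AND_DISK",  # single copy inside Spark
}


def build_context(setup, checkpoint_dir, batch_seconds=10):
    """Create a StreamingContext for one of the setups above (requires pyspark)."""
    from pyspark import SparkConf, SparkContext, StorageLevel
    from pyspark.streaming import StreamingContext

    conf = SparkConf().setAppName("wal-vs-replication").set(WAL_KEY, setup[WAL_KEY])
    ssc = StreamingContext(SparkContext(conf=conf), batch_seconds)
    # The write ahead log is stored under the checkpoint directory.
    ssc.checkpoint(checkpoint_dir)
    # Look up the storage level to pass to the receiver-based input stream.
    level = getattr(StorageLevel, setup["storage_level"])
    return ssc, level
```

The returned level would then be passed to the receiver, for example
ssc.socketTextStream(host, port, storageLevel=level).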

Could a streaming expert explain this to me?
BR
