Hey,

We have a Spark job that is hitting an OutOfMemoryError on the master,
which we haven't seen before.

The heap dump shows 70 byte[] arrays, owned by various Akka threads,
each 48 MB (about 3.3 GB total), which I assume comes from the
spark.reducer.maxMbInFlight setting (48 MB by default).

We have 30 slaves in the cluster, running Spark 0.9 in standalone mode.

Is it expected to have this many Akka buffers around? I suppose with
30 slaves and 2 executors per slave, that would be 60
connections/threads, so overshooting to 70 would not be a big stretch.
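
As a rough sanity check on those numbers, here is the arithmetic as a
small Python sketch. It assumes one 48 MB buffer per executor
connection, which is my guess and not something I have confirmed; the
names are made up for illustration.

```python
# Back-of-the-envelope check. Assumes one buffer per executor connection
# (an unverified guess), each sized at 48 MB.
BUFFER_MB = 48
SLAVES = 30
EXECUTORS_PER_SLAVE = 2


def total_gb(buffer_count, buffer_mb=BUFFER_MB):
    """Total size in GB (1 GB = 1024 MB) of buffer_count buffers."""
    return buffer_count * buffer_mb / 1024


expected_buffers = SLAVES * EXECUTORS_PER_SLAVE  # one per executor
observed_buffers = 70  # count seen in the heap dump

print(f"expected: {expected_buffers} buffers, {total_gb(expected_buffers):.2f} GB")
print(f"observed: {observed_buffers} buffers, {total_gb(observed_buffers):.2f} GB")
```

So 70 buffers come to roughly 3.3 GB, matching the heap dump, versus
roughly 2.8 GB if there were exactly 60.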

We can probably just lower maxMbInFlight, as we're not pulling any
results back to the master anyway.
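
Lowering it could look something like the sketch below, which just
builds the JVM system-property flag for spark.reducer.maxMbInFlight.
The helper name is hypothetical, the value 24 is arbitrary, and passing
the flag via SPARK_JAVA_OPTS is an assumption about our setup.

```python
# Hypothetical helper: renders Spark properties as JVM -D flags.
def to_java_opts(props):
    return " ".join(f"-D{key}={value}" for key, value in sorted(props.items()))


# 24 is an arbitrary example value (half of the 48 MB default).
opts = to_java_opts({"spark.reducer.maxMbInFlight": 24})

# Assumed to end up in conf/spark-env.sh or the job's environment.
print(f'SPARK_JAVA_OPTS="{opts}"')
```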

Does my reasoning make sense?

Thanks,
Stephen

