== Short version == A recent commit replaces Spark's networking subsystem with one based on Netty rather than raw sockets. Users running off of master can disable this change by setting "spark.shuffle.blockTransferService=nio". We will be testing the new implementation during the QA period for Spark 1.2. It is designed to increase stability and decrease GC pressure during shuffles.
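For anyone who wants to exercise the fallback during QA, one way to set the flag is at submit time; this is a sketch only, and the application class and jar names below are placeholders:

```shell
# Revert to the old NIO-based transfer service using the flag named above.
# com.example.MyApp and my-app.jar are placeholder names.
spark-submit \
  --conf spark.shuffle.blockTransferService=nio \
  --class com.example.MyApp \
  my-app.jar
```

The same property can also be set programmatically on a SparkConf or in spark-defaults.conf.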
== Long version == For those who haven't been following the associated PRs and JIRAs: we recently merged PR #2753, which creates a "network" package that does not depend on Spark core. It introduces a Netty-based BlockTransferService to replace the NIO-based ConnectionManager used for transferring shuffle and RDD cache blocks between executors (in other words, the transport layer of the BlockManager).

The new BlockTransferService is intended to provide increased stability, a decreased maintenance burden, and less garbage collection. By relying on Netty to take care of the low-level networking, the actual transfer code is simpler and easier to verify. By making use of ByteBuf pooling, we can lower both memory usage and memory churn by reusing buffers. This was actually a critical component of the petasort benchmark, where the code originated.

While building this component, we realized it was a good opportunity to extract the core transport functionality from Spark so we could reuse it for SPARK-3796, which calls for an external service that can serve Spark shuffle files. Thus, we created the "network/common" package, containing the functionality for setting up a simple control plane and an efficient data plane over a network. This part is functionally independent of Spark and is in fact written in Java to further minimize dependencies.

PR #3001 finishes the work of creating an external shuffle service by creating a "network/shuffle" package, which deals with serving Spark shuffle files from outside of an executor. The intention is that this server can be run anywhere -- including inside the Spark Standalone Worker or the YARN NodeManager, or as a separate service inside Mesos -- and provide the ability to scale executors up and down without losing shuffle data.

Thanks to Aaron, Reynold, and the others who have worked on these improvements over the last month.
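To illustrate why buffer pooling lowers memory churn, here is a deliberately simplified toy pool in plain Java. This is not Netty's API (the real mechanism is Netty's pooled ByteBuf allocator); the class and method names here are invented for illustration:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Toy sketch of the pooling idea: returned buffers are kept on a free list
// and handed back out, so steady-state transfers allocate (and garbage-
// collect) far fewer buffers. Netty's real pooled allocator is much more
// sophisticated (arenas, size classes, thread-local caches).
public class ToyBufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int bufferSize;

    public ToyBufferPool(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    // Reuse a previously released buffer when one is available,
    // allocating a fresh one only when the pool is empty.
    public ByteBuffer acquire() {
        ByteBuffer buf = free.pollFirst();
        return (buf != null) ? buf : ByteBuffer.allocate(bufferSize);
    }

    // Reset the buffer and put it back on the free list for reuse.
    public void release(ByteBuffer buf) {
        buf.clear();
        free.addFirst(buf);
    }

    // Number of idle buffers currently held by the pool.
    public int pooledCount() {
        return free.size();
    }
}
```

In a long-running shuffle, the same few buffers cycle through acquire/release instead of being allocated per transfer, which is the source of the reduced GC pressure described above.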
- Patrick