Hi all, Spark's 100ms+ stage-launching overhead limits its applicability to low-latency stream processing and deep learning. The Drizzle paper published at SOSP '17 appears to solve this problem well by submitting a group of stages together, amortizing the stage-launching overhead across the group. It is also used by the deep learning framework BigDL. Unfortunately, its open-source repository (https://github.com/amplab/drizzle-spark) is based on an old version of Spark (2.1.1).
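To illustrate the amortization argument, here is a minimal back-of-the-envelope sketch (not actual Spark or Drizzle code; the overhead and compute numbers are assumptions for illustration): if a fixed launch overhead is paid once per scheduling decision, submitting g stages in one decision divides that overhead by g.

```python
# Toy model of Drizzle-style group scheduling.
# Assumptions (illustrative only, not measured values):
LAUNCH_OVERHEAD_MS = 100.0  # fixed cost paid once per scheduling decision
STAGE_COMPUTE_MS = 10.0     # per-stage compute time

def per_stage_latency(group_size: int) -> float:
    """Average latency per stage when `group_size` stages are
    submitted together in a single scheduling decision."""
    return STAGE_COMPUTE_MS + LAUNCH_OVERHEAD_MS / group_size

print(per_stage_latency(1))   # stage-at-a-time scheduling: 110.0 ms
print(per_stage_latency(10))  # group of 10 stages: 20.0 ms
```

With a group size of 10, the scheduling overhead per stage drops from 100 ms to 10 ms in this model, which is the core of Drizzle's latency improvement.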
My question is: does Spark support a group-scheduling technique like Drizzle's? If not, is there a plan to develop this feature in the future?

Best,
Bowen Yu