I'm trying to understand dynamic allocation in Spark Streaming and Structured 
Streaming. It seems that if you set spark.dynamicAllocation.enabled=true, both 
frameworks use Spark Core's dynamic allocation algorithm: request executors when 
the task backlog exceeds a certain size, and remove executors when they have been 
idle for a certain period of time.
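
For reference, here is roughly how I'm enabling it. This is a minimal sketch, and the executor counts and timeouts are placeholder values, not recommendations:

    import org.apache.spark.SparkConf

    // Spark Core dynamic allocation knobs (values are placeholders)
    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")  // external shuffle service, so executors can be removed safely
      .set("spark.dynamicAllocation.minExecutors", "1")
      .set("spark.dynamicAllocation.maxExecutors", "20")
      .set("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")  // backlog duration before requesting executors
      .set("spark.dynamicAllocation.executorIdleTimeout", "60s")     // idle duration before removing an executor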


However, as this Cloudera post points out, that algorithm doesn't make much 
sense for streaming: 
https://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_streaming.html.
When I wrote a toy streaming job, I did run into the issue where executors are 
never removed -- I suspect the long-running receiver task pins its executor, and 
the steady arrival of micro-batch tasks keeps the others from ever hitting the 
idle timeout. Cloudera's suggestion of simply turning off dynamic allocation 
seems unreasonable -- Spark applications should grow and shrink to match their 
workload.
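
For context, my toy job is essentially the classic receiver-based socket word count. A sketch (not my exact code; the host and port are made up):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object ToyStreamingJob {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("ToyStreamingJob")
        val ssc = new StreamingContext(conf, Seconds(5))
        // socketTextStream launches a long-running receiver task on an executor
        val lines = ssc.socketTextStream("localhost", 9999)
        lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()
        ssc.start()
        ssc.awaitTermination()
      }
    }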


I see that Spark Streaming has its own (undocumented) configuration for dynamic 
allocation: https://issues.apache.org/jira/browse/SPARK-12133. Is that actually 
a supported feature, or was it just an experiment? I had trouble getting it to 
work, but I'll follow up in a separate thread.
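
From skimming the patch, the knobs appear to be the ones below. Since they're undocumented, treat this as my best guess at how it's meant to be used, not a supported recipe; the values are again placeholders:

    import org.apache.spark.SparkConf

    // Streaming-specific dynamic allocation from SPARK-12133 (undocumented;
    // the key names and semantics here are my reading of the patch).
    val conf = new SparkConf()
      .set("spark.streaming.dynamicAllocation.enabled", "true")
      // Core dynamic allocation apparently must be off -- the streaming
      // allocation manager refuses to start if both are enabled.
      .set("spark.dynamicAllocation.enabled", "false")
      .set("spark.streaming.dynamicAllocation.scalingInterval", "60s")   // how often to re-evaluate
      .set("spark.streaming.dynamicAllocation.scalingUpRatio", "0.9")    // processing time / batch interval above this => request executors
      .set("spark.streaming.dynamicAllocation.scalingDownRatio", "0.3")  // below this => remove an executor
      .set("spark.streaming.dynamicAllocation.minExecutors", "1")
      .set("spark.streaming.dynamicAllocation.maxExecutors", "20")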


Also, does Structured Streaming have its own dynamic allocation algorithm, or 
does it only use Core's?


Thanks,

Karthik Palaniappan
