I'm trying to understand dynamic allocation in Spark Streaming and Structured Streaming. My understanding is that if you set spark.dynamicAllocation.enabled=true, both frameworks use Spark Core's dynamic allocation algorithm: request more executors when the task backlog exceeds a threshold, and remove executors when they sit idle for a certain period of time.
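For reference, these are the Core knobs I mean, as a minimal spark-defaults.conf sketch (the values shown are the documented defaults, not recommendations):

```properties
# Spark Core dynamic allocation
spark.dynamicAllocation.enabled                  true
# Request more executors once tasks have been backlogged this long
spark.dynamicAllocation.schedulerBacklogTimeout  1s
# Remove an executor after it has been idle this long
spark.dynamicAllocation.executorIdleTimeout      60s
# Dynamic allocation requires the external shuffle service
spark.shuffle.service.enabled                    true
```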
However, as this Cloudera post points out, that algorithm doesn't really make sense for streaming: https://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_streaming.html. When writing a toy streaming job, I did run into the issue where executors are never removed, since a streaming job always has work scheduled, so executors never idle long enough to be released. Cloudera's suggestion of turning off dynamic allocation seems unreasonable -- Spark applications should grow and shrink to match their workload.

I see that Spark Streaming has its own (undocumented) configuration for dynamic allocation: https://issues.apache.org/jira/browse/SPARK-12133. Is that actually a supported feature, or was it just an experiment? I had trouble getting it to work, but I'll follow up in a different thread.

Also, does Structured Streaming have its own dynamic allocation algorithm?

Thanks,
Karthik Palaniappan