This has been brought up a few times. I will focus on Spark Structured
Streaming (SSS).

Autoscaling does not support Spark Structured Streaming because streaming
jobs are typically long-running and need to maintain state across
micro-batches: they continuously process incoming data and update their
state incrementally (see the checkpoint directory). Autoscaling is designed
to scale a Spark cluster up and down in response to workload changes, but
dynamically adding or removing worker nodes would disrupt this stateful
processing and cause the jobs to lose their state.

Although Spark itself supports dynamic allocation, which can add or remove
executors based on demand, that is not the same as autoscaling in a cloud
such as GCP, on Kubernetes or on managed clusters (see the configuration
sketch below). For now you need to plan your SSS workload accordingly.

My general advice: the usual thing to watch from the Spark GUI is

    Processing Time (Process Rate) + Reserved Capacity < Batch Interval (Batch Duration)

The Batch Interval reflects the rate at which the upstream source, Kafka or
otherwise, sends messages. If your sink has an issue absorbing data in a
timely manner as per the above formula, you will see the defect in the
Processing Rate.

We can start by assuming that an increase in the rate of messages processed
(and hence in processing time) will require *additional reserved capacity*.
As a heuristic, anticipate a 70% (~1 SD) increase in processing time; in
theory you should then be able to handle all this work within the batch
interval. A monitoring sketch follows below.
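For reference, this is roughly what executor-level dynamic allocation looks
like when enabled on a Spark job. It is a sketch only: the application name
and the executor bounds are illustrative assumptions, and note again that
this adds or removes executors within the cluster you already have; it does
not resize the cluster itself.

# Sketch: enable executor-level dynamic allocation for a Spark job.
# This is NOT cluster autoscaling: it adds/removes executors, not nodes.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("SSSDynamicAllocation")  # hypothetical app name
         .config("spark.dynamicAllocation.enabled", "true")
         # shuffle tracking is needed where there is no external shuffle
         # service, e.g. on Kubernetes
         .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
         .config("spark.dynamicAllocation.minExecutors", "2")   # assumed bound
         .config("spark.dynamicAllocation.maxExecutors", "10")  # assumed bound
         .getOrCreate())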
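To watch the Processing Time vs Batch Interval relationship programmatically
rather than eyeballing the GUI, you can poll query.lastProgress. This is a
minimal sketch: the 30-second trigger, the throwaway rate source and noop
sink, and the checkpoint location are all assumptions; swap in your own
Kafka source and real sink.

# Sketch: warn when a micro-batch's processing time, padded by the 70%
# (~1 SD) headroom heuristic, no longer fits within the batch interval.
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SSSCapacityCheck").getOrCreate()

batch_interval_ms = 30 * 1000  # trigger interval (Batch Duration), assumed

df = spark.readStream.format("rate").option("rowsPerSecond", "100").load()

query = (df.writeStream
           .format("noop")  # placeholder sink for the sketch
           .option("checkpointLocation", "/tmp/sss_checkpoint")  # state lives here
           .trigger(processingTime="30 seconds")
           .start())

while query.isActive:
    progress = query.lastProgress  # dict describing the last micro-batch
    if progress:
        duration_ms = progress["durationMs"]["triggerExecution"]
        # Processing Time + Reserved Capacity (70% of it) < Batch Interval
        if duration_ms * 1.7 > batch_interval_ms:
            print(f"WARNING: batch took {duration_ms} ms; a 70% spike "
                  f"would overrun the {batch_interval_ms} ms interval")
    time.sleep(30)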
HTH

Mich Talebzadeh,
Distinguished Technologist, Solutions Architect & Engineer
London
United Kingdom

view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

https://en.everybodywiki.com/Mich_Talebzadeh


On Tue, 10 Oct 2023 at 16:11, Kiran Biswal <biswalki...@gmail.com> wrote:

> Hello Experts
>
> Is there any true auto scaling option for spark? The dynamic auto scaling
> works only for batch. Any guidelines on spark streaming autoscaling and
> how that will be tied to any cluster level autoscaling solutions?
>
> Thanks
>