Load distribution in Structured Streaming

Eric Beabes Mon, 06 Jul 2020 13:52:56 -0700

In my structured streaming job I've noticed that a LOT of data keeps going
to one executor whereas other executors don't process that much data. As a
result, tasks on that executor take a lot of time to complete. In other
words, the distribution is skewed.


I believe that in Structured Streaming the partitions of the input Kafka
topic get evenly distributed amongst executors, right? In our input Kafka
topic the data is, I would think, fairly evenly distributed amongst
partitions. Is there any reason for this skew? Is there a way to fix it by
using a Partitioner or something like that? Please let me know.
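
To make the kind of skew I mean concrete, here is a minimal sketch in plain Python (not Spark, and not our actual job). It assumes the skew comes from a keyed shuffle in which one key dominates. The "hot" key, the record counts, the 8 partitions and the helper names are all made up for illustration, and CRC32 only stands in for whatever hash a real partitioner would use. It compares hashing on the raw key with hashing on a "salted" key:

```python
import zlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    # Deterministic stand-in for a hash partitioner (assumption: CRC32,
    # which is not what Spark itself uses).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_counts(keys, num_partitions):
    # Number of records that land in each partition.
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt_keys(keys, num_salts):
    # Append a rotating suffix so one hot key becomes num_salts distinct keys.
    return [f"{k}#{i % num_salts}" for i, k in enumerate(keys)]


# Hypothetical workload: one key carries 90% of the records.
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]

plain_counts = partition_counts(keys, 8)
salted_counts = partition_counts(salt_keys(keys, 8), 8)

print("records per partition, raw keys:   ", plain_counts)
print("records per partition, salted keys:", salted_counts)
```

With raw keys every "hot" record hashes to the same partition, so one task gets almost all of the work, which matches what I see. Salting spreads that key out, but any aggregation would then need a second step to merge the per-salt results, so I am not sure it is the right fix here.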

Thanks in advance for the help.
