In my Structured Streaming job I've noticed that a lot of data keeps going
to one executor, while the other executors process far less. As a result,
tasks on that executor take much longer to complete. In other words, the
distribution is skewed.

I believe that in Structured Streaming the partitions of the input Kafka
topic get evenly distributed amongst executors, right? As far as I can
tell, the data in our input Kafka topic is fairly evenly distributed
amongst partitions. Is there any reason for this skew? Is there a way to
fix it by using a Partitioner or something like that? Please let me know.
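
To make "skewed" concrete, here is a minimal sketch in plain Python of the
kind of check I have in mind. It is not our actual job: skew_ratio is a
hypothetical helper (not a Spark or Kafka API), and the per-partition
record counts are made-up numbers for illustration only.

```python
from statistics import mean


def skew_ratio(counts):
    """Return max/mean of per-partition record counts.

    A value of 1.0 means perfectly even; larger values mean one
    partition holds more than its share of the records.
    """
    if not counts:
        raise ValueError("counts must be non-empty")
    avg = mean(counts.values())
    if avg == 0:
        return 1.0  # no records anywhere, so nothing is skewed
    return max(counts.values()) / avg


# Hypothetical record counts keyed by partition id (made-up numbers).
even = {0: 1000, 1: 1000, 2: 1000, 3: 1000}
skewed = {0: 3400, 1: 200, 2: 200, 3: 200}

print(skew_ratio(even))
print(skew_ratio(skewed))
```

In the real job I assume the counts could come from grouping the Kafka
source's partition column, or from spark_partition_id() after any shuffle,
so the same ratio could be compared before and after the shuffle.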

Thanks in advance for the help.
