Re: Spark 2.4 Structured Streaming Kafka assign API polling same offsets

2019-03-01 Thread Kristopher Kane
I figured out why. We are not persisting the data at the end of .load(). Thus, every operation like count() is going back to Kafka for the data again.

On Fri, Mar 1, 2019 at 10:10 AM Kristopher Kane wrote:
> We are using the assign API to do batch work with Spark and Kafka.
> What I'm seeing
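
The diagnosis in this reply (each action re-reads the source unless the result of .load() is persisted) can be illustrated with a small Spark-free model. This is only a sketch under stated assumptions: `LazyReadDemo`, `load()`, `persist()` and the `polls` counter are hypothetical stand-ins written for this illustration, not Spark or Kafka APIs. In Spark itself the corresponding calls on a Dataset are `persist()` or `cache()`.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazyReadDemo {
    // Counts how many times the simulated Kafka source is polled.
    public static final AtomicInteger polls = new AtomicInteger();

    // Hypothetical stand-in for the lazy plan returned by .load():
    // nothing is read until an action calls get().
    public static Supplier<List<String>> load() {
        return () -> {
            polls.incrementAndGet();          // one poll of the source per action
            return List.of("a", "b", "c");
        };
    }

    // Hypothetical stand-in for persisting: the first action reads the
    // source, every later action reuses the stored result.
    public static <T> Supplier<T> persist(Supplier<T> source) {
        return new Supplier<T>() {
            private T cached;
            private boolean loaded;

            @Override
            public synchronized T get() {
                if (!loaded) {
                    cached = source.get();
                    loaded = true;
                }
                return cached;
            }
        };
    }

    public static void main(String[] args) {
        // Without persisting: two actions, two reads of the source.
        Supplier<List<String>> df = load();
        df.get().size();
        df.get().isEmpty();
        System.out.println("unpersisted polls: " + polls.get()); // prints 2

        // With persisting: two actions, one read of the source.
        polls.set(0);
        Supplier<List<String>> persisted = persist(load());
        persisted.get().size();
        persisted.get().isEmpty();
        System.out.println("persisted polls: " + polls.get());   // prints 1
    }
}
```

The model only captures the re-evaluation behaviour described in the reply; it says nothing about how offsets are committed.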

Spark 2.4 Structured Streaming Kafka assign API polling same offsets

2019-03-01 Thread Kristopher Kane
We are using the assign API to do batch work with Spark and Kafka. What I'm seeing is the Spark executor work happening in the background, constantly polling the same data over and over until the main thread commits the offsets. Is the below a blocking operation? Dataset df =