[ https://issues.apache.org/jira/browse/SPARK-35312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jungtaek Lim resolved SPARK-35312.
----------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 32653
[https://github.com/apache/spark/pull/32653]

> Introduce new Option in Kafka source to specify minimum number of records to
> read per trigger
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-35312
>                 URL: https://issues.apache.org/jira/browse/SPARK-35312
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.1.1
>            Reporter: Satish Gopalani
>            Assignee: Satish Gopalani
>            Priority: Major
>             Fix For: 3.2.0
>
>
> Kafka source currently provides options to set the maximum number of offsets
> to read per trigger.
> I would like to introduce a new option to specify the minimum number of
> offsets to read per trigger, i.e. *minOffsetsPerTrigger*.
> This new option will allow skipping a trigger/batch when the number of records
> available in Kafka is low. This is a very useful feature in cases where we
> have a sudden burst of data at certain intervals in a day and data volume is
> low for the rest of the day. Tuning such jobs is difficult: decreasing the
> trigger processing time increases the number of batches, and hence cluster
> resource usage, and adds to small file issues. Increasing the trigger
> processing time adds consumer lag. The new option will save cluster resources
> and also help solve small file issues, as the job runs fewer batches.
> Along with this, I would like to introduce a '*maxTriggerDelay*' option, which
> will help to avoid cases of infinite delay in scheduling a trigger: the
> trigger will happen irrespective of the records available if the time since
> the last trigger exceeds maxTriggerDelay. It would be an optional parameter
> with a default value of 15 minutes. _This option will be applicable only if
> minOffsetsPerTrigger is set._
> The *minOffsetsPerTrigger* option would be optional, of course, but once
> specified it would take precedence over *maxOffsetsPerTrigger*, which will be
> honored only after *minOffsetsPerTrigger* is satisfied.
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
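
The interplay of the three options described in the issue can be sketched as a small decision function. This is only an illustration of the semantics as stated above, not Spark's implementation: the function name `offsets_to_read`, its parameters, and the use of seconds are hypothetical, and treating a delay exactly equal to maxTriggerDelay as "exceeded" is an assumption.

```python
from typing import Optional

# Default maxTriggerDelay proposed in the issue: 15 minutes, expressed in seconds.
DEFAULT_MAX_TRIGGER_DELAY_S = 15 * 60


def offsets_to_read(
    available: int,
    min_offsets: Optional[int],
    max_offsets: Optional[int],
    seconds_since_last_trigger: float,
    max_trigger_delay_s: float = DEFAULT_MAX_TRIGGER_DELAY_S,
) -> int:
    """Hypothetical helper: how many offsets the next batch would read.

    Returns 0 when the trigger is skipped (or when nothing is available).
    """
    # Assumption: a delay equal to the limit already counts as exceeded.
    delay_exceeded = seconds_since_last_trigger >= max_trigger_delay_s

    # minOffsetsPerTrigger takes precedence: skip the batch while too few
    # records are available, unless the maximum delay has been reached.
    # maxTriggerDelay only matters when minOffsetsPerTrigger is set.
    if min_offsets is not None and available < min_offsets and not delay_exceeded:
        return 0

    # maxOffsetsPerTrigger is honored only after the minimum is satisfied
    # (or the delay forces a trigger).
    if max_offsets is not None:
        return min(available, max_offsets)
    return available
```

For example, with a minimum of 1000 and a maximum of 5000, a backlog of 100 records is skipped one minute after the last trigger but read once 16 minutes have passed, and a backlog of 8000 records is capped at 5000.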