[ 
https://issues.apache.org/jira/browse/SPARK-35312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-35312.
----------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 32653
[https://github.com/apache/spark/pull/32653]

> Introduce new Option in Kafka source to specify minimum number of records to 
> read per trigger
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-35312
>                 URL: https://issues.apache.org/jira/browse/SPARK-35312
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.1.1
>            Reporter: Satish Gopalani
>            Assignee: Satish Gopalani
>            Priority: Major
>             Fix For: 3.2.0
>
>
> Kafka source currently provides options to set the maximum number of offsets 
> to read per trigger.
> I will like to introduce a new option to specify the minimum number of 
> offsets to read per trigger i.e. *minOffsetsPerTrigger*.
> This new option will allow skipping trigger/batch when the number of records 
> available in Kafka is low. This is a very useful feature in cases where we 
> have a sudden burst of data at certain intervals in a day and data volume is 
> low for the rest of the day. Tunning such jobs is difficult as decreasing 
> trigger processing time increasing the number of batches and hence cluster 
> resource usage and adds to small file issues. Increasing trigger processing 
> time adds consumer lag. This will save cluster resources and also help solve 
> small file issues as it is running lesser batches.
>  Along with this, I would like to introduce '*maxTriggerDelay*' option which 
> will help to avoid cases of infinite delay in scheduling trigger and the 
> trigger will happen irrespective of records available if the maxTriggerDelay 
> time exceeds the last trigger. It would be an optional parameter with a 
> default value of 15 mins. _This option will be only applicable if 
> minOffsetsPerTrigger is set._
> *minOffsetsPerTrigger* option would be optional of course, but once specified 
> it would take precedence over *maxOffestsPerTrigger* which will be honored 
> only after *minOffsetsPerTrigger* is satisfied.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to