[ 
https://issues.apache.org/jira/browse/SPARK-29314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Burak Yavuz reassigned SPARK-29314:
-----------------------------------

    Assignee: Jungtaek Lim

> ProgressReporter.extractStateOperatorMetrics should not overwrite updated as 
> 0 when it actually runs a batch even with no data
> ------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-29314
>                 URL: https://issues.apache.org/jira/browse/SPARK-29314
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.4.4, 3.0.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Major
>
> SPARK-24156 brought the ability to run a batch without actual data to enable 
> fast state cleanup as well as emit evicted outputs without waiting actual 
> data to come.
> This breaks some assumption on 
> `ProgressReporter.extractStateOperatorMetrics`. See comment in source code:
> {code:java}
> // lastExecution could belong to one of the previous triggers if 
> `!hasNewData`.
> // Walking the plan again should be inexpensive.
> {code}
> and newNumRowsUpdated is replaced to 0 if hasNewData is false. It makes sense 
> if we copy progress from previous execution (which means no batch is run for 
> this time), but after SPARK-24156 the precondition is broken. 
> Spark should still replace the value of newNumRowsUpdated with 0 if there's 
> no batch being run and it needs to copy the old value from previous 
> execution, but it shouldn't touch the value if it runs a batch for no data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to