[jira] [Resolved] (SPARK-30900) FileStreamSource: Avoid reading compact metadata log twice if the query stops from compact batch and restarts

Jungtaek Lim (Jira) Mon, 30 Nov 2020 20:12:05 -0800


     [ https://issues.apache.org/jira/browse/SPARK-30900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jungtaek Lim resolved SPARK-30900.
----------------------------------
    Fix Version/s: 3.1.0
       Resolution: Fixed

Issue resolved by pull request 27649
[https://github.com/apache/spark/pull/27649]

> FileStreamSource: Avoid reading compact metadata log twice if the query stops 
> from compact batch and restarts
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-30900
>                 URL: https://issues.apache.org/jira/browse/SPARK-30900
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.1.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Minor
>             Fix For: 3.1.0
>
>
> When the query restarts, there is a case in which it starts from a compact
> batch, and that batch has a source metadata file to read. One such case is
> when the previous query succeeded in reading from the inputs but did not
> finalize the batch for some reason.
> In this case, FileStreamSource reads the compact metadata file twice: once
> to retrieve all files to build the seen-file map, and once more to retrieve
> the entries in the batch. If the query has processed a huge number of
> inputs so far, the compact metadata file becomes considerably large, so
> reading it a second time adds unnecessary latency when processing the
> startup batch.
> This issue tracks the effort to address this case.
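
The double read described above can be illustrated with a toy model. This is a minimal Python simulation, not Spark's code: `FileEntry`, `CompactLogReader`, and `restart_from_compact_batch` are hypothetical names, and it assumes (for illustration) that each entry in the compact file records the batch that added it. The sketch reads the compact file once, caches the entries, and derives both the seen-file map and the restarted batch's entries from that single read.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FileEntry:
    # Hypothetical stand-in for a source log entry: file path, modification
    # timestamp, and the batch that first picked the file up.
    path: str
    timestamp: int
    batch_id: int


class CompactLogReader:
    """Toy model of a file-source metadata log whose compact batch file holds
    every entry seen so far. Counts how often the compact file is read."""

    def __init__(self, compact_batch_id: int, entries: List[FileEntry]):
        self.compact_batch_id = compact_batch_id
        self._compact_file = entries  # contents of the (large) compact file
        self.reads = 0
        self._cache: Optional[List[FileEntry]] = None

    def _read_compact_file(self) -> List[FileEntry]:
        # Simulates an expensive read/deserialization of the compact file.
        self.reads += 1
        return list(self._compact_file)

    def all_entries(self) -> List[FileEntry]:
        # Read the compact file at most once and keep the result for reuse.
        if self._cache is None:
            self._cache = self._read_compact_file()
        return self._cache

    def batch_entries(self, batch_id: int) -> List[FileEntry]:
        # The batch's entries are a subset of the compact file, so filter the
        # cached copy instead of reading the file a second time.
        return [e for e in self.all_entries() if e.batch_id == batch_id]


def restart_from_compact_batch(log: CompactLogReader):
    # 1) Build the seen-file map (path -> timestamp) from every entry.
    seen: Dict[str, int] = {e.path: e.timestamp for e in log.all_entries()}
    # 2) Retrieve the files belonging to the batch being re-run.
    pending = log.batch_entries(log.compact_batch_id)
    return seen, pending


entries = [FileEntry("a", 1, 0), FileEntry("b", 2, 1), FileEntry("c", 3, 9)]
log = CompactLogReader(compact_batch_id=9, entries=entries)
seen, pending = restart_from_compact_batch(log)
print(log.reads, [e.path for e in pending])  # 1 ['c']
```

In this model the compact file is read once instead of twice. The trade-off is that the cached entries stay in memory; a real implementation would likely want to release them once the startup batch has been planned.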



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

