Sunil Kumar created SPARK-18157:
-----------------------------------

             Summary: CLONE - Support purging aged file entry for 
FileStreamSource metadata log
                 Key: SPARK-18157
                 URL: https://issues.apache.org/jira/browse/SPARK-18157
             Project: Spark
          Issue Type: Sub-task
          Components: SQL, Streaming
            Reporter: Sunil Kumar
            Priority: Minor


Currently, with SPARK-15698, the FileStreamSource metadata log is compacted
periodically (every 10 batches by default). This means each compacted batch
file contains every file entry processed so far. Over time, the compacted
batch file grows into a relatively large file.

With SPARK-17165, {{FileStreamSource}} no longer tracks aged file entries,
but the log still keeps the full records. This is unnecessary and makes
recovery quite time-consuming. This issue therefore proposes adding file entry
purging to the {{FileStreamSource}} metadata log as well.
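
To illustrate the idea, here is a minimal sketch (in Python, not Spark's actual code) of compaction that also purges aged entries. The names FileEntry, compact, and max_file_age_ms are hypothetical, and it assumes "aged" means older than the newest seen file timestamp minus a configured maximum age. The real implementation may define the cutoff differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FileEntry:
    """One processed-file record in the metadata log (hypothetical shape)."""
    path: str
    timestamp_ms: int  # file modification time, in milliseconds
    batch_id: int      # batch in which the file was processed


def compact(batches: Dict[int, List[FileEntry]],
            max_file_age_ms: int) -> List[FileEntry]:
    """Merge per-batch logs into one compacted log, dropping aged entries.

    Assumption: an entry is aged when its timestamp is older than the
    newest timestamp seen so far minus max_file_age_ms.
    """
    # Flatten all batches in batch order, as a plain compaction would.
    all_entries = [e for b in sorted(batches) for e in batches[b]]
    if not all_entries:
        return []
    cutoff = max(e.timestamp_ms for e in all_entries) - max_file_age_ms
    # Purge step: keep only entries that are still within the age window.
    return [e for e in all_entries if e.timestamp_ms >= cutoff]
```

Without the final filter, the compacted log keeps every entry ever processed, which is the unbounded growth described above. With it, the compacted file stays bounded by the number of files inside the age window.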

This depends on SPARK-15698.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

