HeartSaVioR opened a new pull request #31495: URL: https://github.com/apache/spark/pull/31495
### What changes were proposed in this pull request?

This PR proposes to optimize the WAL commit phase via the following changes:

* cache the offset log to avoid an FS get operation per batch
* use an FS exists operation instead of an FS list operation on purge (2 operations per batch)

### Why are the changes needed?

There are inefficiencies in the WAL commit phase that can be easily optimized by using a small amount of driver memory.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually tested with debug logging. (Verified that the cache is used, that the cache size stays at 2, and that only one exists call is made instead of a list call.)

Experiments on AWS S3 + S3Guard:

> before the patch

> after the patch

Experiments on Azure:

> before the patch

> after the patch
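The two changes described in this PR can be illustrated with a small, self-contained sketch. This is not Spark's implementation: `CountingStore`, `CachedOffsetLog`, `purge_one`, and `min_batches_to_retain` are hypothetical names, and the in-memory store only stands in for a remote file system so that the number of get, exists, and list calls can be counted. It assumes that purge only ever needs to remove the single batch that has just fallen out of the retention window.

```python
from collections import OrderedDict


class CountingStore:
    """Hypothetical stand-in for a remote file system; counts every call."""

    def __init__(self):
        self.files = {}
        self.calls = {"put": 0, "get": 0, "exists": 0, "list": 0, "delete": 0}

    def put(self, batch_id, value):
        self.calls["put"] += 1
        self.files[batch_id] = value

    def get(self, batch_id):
        self.calls["get"] += 1
        return self.files.get(batch_id)

    def exists(self, batch_id):
        self.calls["exists"] += 1
        return batch_id in self.files

    def list(self):
        self.calls["list"] += 1
        return sorted(self.files)

    def delete(self, batch_id):
        self.calls["delete"] += 1
        self.files.pop(batch_id, None)


class CachedOffsetLog:
    """Sketch of an offset log that keeps the newest entries in driver memory."""

    def __init__(self, store, cache_size=2):
        self.store = store
        self.cache_size = cache_size
        self.cache = OrderedDict()  # batch_id -> offsets, oldest first

    def add(self, batch_id, offsets):
        self.store.put(batch_id, offsets)
        self.cache[batch_id] = offsets
        # Evict the oldest entries so the cache never exceeds cache_size.
        while len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)

    def get(self, batch_id):
        # Serve recent batches from memory; fall back to the store otherwise.
        if batch_id in self.cache:
            return self.cache[batch_id]
        return self.store.get(batch_id)

    def purge_one(self, batch_id):
        # Probe one known file with exists() instead of listing the directory.
        if batch_id >= 0 and self.store.exists(batch_id):
            self.store.delete(batch_id)


store = CountingStore()
log = CachedOffsetLog(store)
min_batches_to_retain = 2  # hypothetical retention setting

for batch_id in range(5):
    log.add(batch_id, f"offsets-{batch_id}")
    if batch_id > 0:
        log.get(batch_id - 1)  # previous batch is read back from the cache
    log.purge_one(batch_id - min_batches_to_retain)

print(store.calls)
```

In this sketch the loop never issues a get or list call against the store, and the cache holds at most two entries, which mirrors the behavior the manual test in this PR checked for through debug logs.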