Burak Yavuz created SPARK-11419:
-----------------------------------

             Summary: WriteAheadLog recovery improvements for when closeFileAfterWrite is enabled
                 Key: SPARK-11419
                 URL: https://issues.apache.org/jira/browse/SPARK-11419
             Project: Spark
          Issue Type: Improvement
          Components: Streaming
            Reporter: Burak Yavuz
Support for closing WriteAheadLog files after each write was just merged. Closing the file after every write is expensive because it creates many small files on S3 (it is not necessary to enable it on HDFS anyway). When there are many small files on S3, recovery takes a very long time. We can parallelize the recovery process. In addition, files accumulate quickly and deletes may not be able to keep up, so we should add support for parallel deletes as well.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
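A minimal sketch of the proposed parallel recovery and cleanup. It is written in Python rather than Spark's Scala and assumes each log file is a small local text file holding one record per line. The helper names (`read_log_file`, `recover`, `delete_old_logs`) and the thread-pool approach are hypothetical illustrations, not Spark's WriteAheadLog API. A real implementation would read and delete S3 objects instead, where per-file latency is what makes concurrency pay off.

```python
import os
import tempfile  # used by the usage example / tests to create sample logs
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List

# Hypothetical helpers for illustration only; not Spark's WriteAheadLog API.


def read_log_file(path: str) -> List[str]:
    """Read every record (one per line) from a single small log file."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def recover(paths: List[str], max_workers: int = 8) -> List[str]:
    """Read many small log files concurrently.

    ThreadPoolExecutor.map yields results in input order, so records come
    back in the same order as the given file list even though the reads
    overlap in time. Threads suit this I/O-bound work despite the GIL.
    """
    records: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for file_records in pool.map(read_log_file, paths):
            records.extend(file_records)
    return records


def delete_old_logs(paths: Iterable[str], max_workers: int = 8) -> int:
    """Delete expired log files concurrently; return how many were removed."""

    def delete(path: str) -> bool:
        try:
            os.remove(path)
            return True
        except FileNotFoundError:
            return False  # already gone, e.g. removed by an earlier cleanup

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # sum() consumes the map iterator inside the with-block.
        return sum(pool.map(delete, paths))
```

Because results are collected in input order, a caller that passes the files sorted by time still gets records in log order, while the slow per-file reads and deletes overlap.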