nsivabalan opened a new pull request, #6561: URL: https://github.com/apache/hudi/pull/6561
### Change Logs

Apparently clustering is being triggered twice. We don't cache the write status, and for some validation we call `isEmpty` on the `JavaRDD<WriteStatus>`, which ends up evaluating the RDD, and therefore running clustering, a second time.

### Impact

Could improve clustering performance.

**Risk level: medium**

Without the fix, clustering could be triggered twice, but only one set of files is included in the final commit metadata. The duplicate copy is deleted during the marker reconciliation step.

Test/Verification: Manually verified that without the fix, markers are created twice (the two files differ only in the write token) and the later reconciliation step deletes one of them. With the fix, I don't see such duplicates: only one file is created for clustering, and nothing gets deleted during reconciliation.

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
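
To illustrate the double-execution problem described in the change log, here is a minimal plain-Java sketch. It is an analogy, not Hudi or Spark code: `LazyDataset`, `runClustering`, and the file-name format are hypothetical stand-ins for an uncached `JavaRDD<WriteStatus>`, the clustering job, and the write token. It assumes only that every action on an uncached lazy dataset re-runs the pipeline that produces it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazyRecomputeSketch {

    /** Hypothetical stand-in for a lazy dataset: each action re-runs the pipeline unless cached. */
    static final class LazyDataset<T> {
        private final Supplier<List<T>> pipeline;
        private boolean cacheRequested;
        private List<T> cached; // stays null until a cached evaluation has happened

        LazyDataset(Supplier<List<T>> pipeline) {
            this.pipeline = pipeline;
        }

        /** Marks the dataset so that the first evaluation is kept and reused. */
        LazyDataset<T> cache() {
            cacheRequested = true;
            return this;
        }

        private List<T> evaluate() {
            if (cached != null) {
                return cached;
            }
            List<T> result = pipeline.get();
            if (cacheRequested) {
                cached = result;
            }
            return result;
        }

        boolean isEmpty() {
            return evaluate().isEmpty();
        }

        List<T> collect() {
            return evaluate();
        }
    }

    /** Runs the simulated clustering and returns how many times the pipeline executed. */
    static int runClustering(boolean cacheWriteStatuses) {
        AtomicInteger executions = new AtomicInteger();

        // Each execution "writes" a file whose name differs only in the token.
        LazyDataset<String> writeStatuses = new LazyDataset<>(() -> {
            int token = executions.incrementAndGet();
            List<String> written = new ArrayList<>();
            written.add("file-group-1_token-" + token);
            return written;
        });

        if (cacheWriteStatuses) {
            writeStatuses.cache();
        }

        // Validation step: the first action on the dataset.
        if (writeStatuses.isEmpty()) {
            throw new IllegalStateException("clustering produced no write statuses");
        }

        // Commit step: the second action on the same dataset.
        writeStatuses.collect();

        return executions.get();
    }

    public static void main(String[] args) {
        System.out.println("without cache: " + runClustering(false) + " execution(s)");
        System.out.println("with cache: " + runClustering(true) + " execution(s)");
    }
}
```

In this sketch the uncached run executes the pipeline once for the `isEmpty` check and once more for the collect, which mirrors the two marker files that differ only in write token. Requesting the cache before the first action keeps it to a single execution.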