jonvex opened a new pull request, #7413: URL: https://github.com/apache/hudi/pull/7413
### Change Logs

Currently, all of the Custom Bulk Insert `ColumnSortPartitioner` implementations incorrectly return `true` from the `arePartitionRecordsSorted` method. That method requires records to be sorted by the partition-path column, and these partitioners do not guarantee that. I fixed the implementations so they return `true` only if the list of sort column names starts with the partition-path column name.

### Impact

When these partitioners are used and the sort column names do not start with the partition path, the current implementation can close Parquet writers prematurely while writing files, which creates a LOT of small files. This fix prevents that.

### Risk level (write none, low, medium, or high below)

low

### Documentation Update

The description of `hoodie.clustering.plan.strategy.sort.columns`, and of any other configs used to set the sort ordering, may need to be updated to explain this.

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
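The check described under Change Logs, and the small-file effect described under Impact, can be sketched in Java. This is a minimal illustration under stated assumptions: a single partition-path field and an array of sort column names. `SortCheckSketch`, `arePartitionRecordsSorted(String[], String)` and `countFilesOpened` are hypothetical stand-ins, not Hudi's actual API. `countFilesOpened` is a simplified model of a writer that trusts the "sorted" flag and closes its current file whenever the partition path changes.

```java
import java.util.Arrays;
import java.util.List;

public class SortCheckSketch {

  // Hypothetical helper (assumes one partition-path field): report "sorted by
  // partition" only when the first sort column is the partition-path column,
  // so all records of a partition arrive contiguously.
  static boolean arePartitionRecordsSorted(String[] sortColumnNames, String partitionPathField) {
    return sortColumnNames.length > 0 && sortColumnNames[0].equals(partitionPathField);
  }

  // Hypothetical, simplified writer model: it keeps one file open and, trusting
  // the flag above, closes it and opens a new one whenever the partition changes.
  static int countFilesOpened(List<String> partitionPathsInWriteOrder) {
    int files = 0;
    String current = null;
    for (String partitionPath : partitionPathsInWriteOrder) {
      if (!partitionPath.equals(current)) {
        files++; // previous file (if any) closed, new file opened
        current = partitionPath;
      }
    }
    return files;
  }

  public static void main(String[] args) {
    System.out.println(arePartitionRecordsSorted(new String[] {"partition", "ts"}, "partition")); // true
    System.out.println(arePartitionRecordsSorted(new String[] {"ts", "partition"}, "partition")); // false

    // Sorted by "ts" only: partitions interleave, so each switch opens a new file.
    List<String> interleaved = Arrays.asList("2022/12/01", "2022/12/02", "2022/12/01", "2022/12/02");
    // Sorted by partition path first: each partition is written once.
    List<String> grouped = Arrays.asList("2022/12/01", "2022/12/01", "2022/12/02", "2022/12/02");
    System.out.println(countFilesOpened(interleaved)); // 4
    System.out.println(countFilesOpened(grouped));     // 2
  }
}
```

With interleaved input the modeled writer opens one file per partition switch rather than one per partition, which is the small-file growth the Impact section describes; returning `true` only when the partition path leads the sort columns avoids that path.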