westonpace commented on issue #15138: URL: https://github.com/apache/arrow/issues/15138#issuecomment-1399076522
Internally we do this by hashing the column. There is a PR in progress (though it could use some review) to add a new hash compute function (https://github.com/apache/arrow/pull/13487). If that merges, you could approximate this pretty well by partitioning on `bit_wise_and(hash(x), mask)`, where `mask` is something like `0x7` (for 8 groups) or `0xF` (for 16 groups), etc. Supporting group counts that are not powers of two would require modulo. There is a PR in place for modulo, but it seems to have stalled.
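To make the masking idea concrete, here is a minimal sketch in plain Python. It does not use Arrow at all: `stand_in_hash` is a hypothetical placeholder (CRC32 over the value's `repr`) for the hash function proposed in the PR above, and the other helper names are likewise made up for illustration.

```python
import zlib
from collections import defaultdict


def stand_in_hash(value) -> int:
    # Assumption: CRC32 of repr(value) stands in for the proposed Arrow hash
    # function. It is deterministic, which is all this sketch needs.
    return zlib.crc32(repr(value).encode("utf-8"))


def bucket_by_mask(value, num_groups: int) -> int:
    # Masking only works when num_groups is a power of two,
    # e.g. 8 groups -> mask 0x7, 16 groups -> mask 0xF.
    if num_groups <= 0 or num_groups & (num_groups - 1):
        raise ValueError("num_groups must be a power of two")
    mask = num_groups - 1
    return stand_in_hash(value) & mask


def bucket_by_modulo(value, num_groups: int) -> int:
    # Modulo handles any positive group count, including non-powers-of-two.
    if num_groups <= 0:
        raise ValueError("num_groups must be positive")
    return stand_in_hash(value) % num_groups


def partition(values, num_groups: int) -> dict:
    # Group values by their masked hash bucket.
    groups = defaultdict(list)
    for v in values:
        groups[bucket_by_mask(v, num_groups)].append(v)
    return dict(groups)


if __name__ == "__main__":
    for bucket, members in sorted(partition(range(20), 8).items()):
        print(bucket, members)
```

For a power-of-two group count the mask and the modulo give the same bucket, which is why the mask is enough for 8 or 16 groups and modulo is only needed for other counts.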