westonpace commented on issue #15138:
URL: https://github.com/apache/arrow/issues/15138#issuecomment-1399076522

   Internally we do this by hashing the column.  There is a PR in progress
(though it could use some review) to add a new hash compute function
(https://github.com/apache/arrow/pull/13487).  If that merges, you could
approximate this pretty well by partitioning on `bit_wise_and(hash(x), mask)`,
where `mask` is the number of groups minus one: `0x7` for 8 groups, `0xF` for
16 groups, etc.  Supporting group counts that are not powers of two would
require modulo.  There is a PR in place for modulo, but it seems to have
stalled.
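
To make the mask idea concrete, here is a minimal pure-Python sketch of the bucketing logic. It does not use Arrow at all: `stand_in_hash` is a hypothetical placeholder built on `zlib.crc32`, standing in for the hash compute function proposed in the PR above, and `partition_by_mask` / `partition_by_modulo` are illustrative names rather than Arrow APIs.

```python
import zlib
from collections import defaultdict


def stand_in_hash(value):
    # Hypothetical stand-in for the proposed Arrow hash function:
    # a deterministic, non-negative 32-bit hash of the value's repr.
    return zlib.crc32(repr(value).encode("utf-8"))


def partition_by_mask(values, mask):
    # Group values by hash & mask. With mask = 2**k - 1 this yields
    # at most 2**k groups (0x7 -> 8 groups, 0xF -> 16 groups).
    groups = defaultdict(list)
    for v in values:
        groups[stand_in_hash(v) & mask].append(v)
    return dict(groups)


def partition_by_modulo(values, num_groups):
    # Group values by hash % num_groups. This works for any positive
    # group count, which is why modulo is needed for non-powers-of-two.
    groups = defaultdict(list)
    for v in values:
        groups[stand_in_hash(v) % num_groups].append(v)
    return dict(groups)


if __name__ == "__main__":
    rows = ["apple", "banana", "cherry", "apple", 42, 43]
    print(partition_by_mask(rows, 0x7))   # group keys fall in 0..7
    print(partition_by_modulo(rows, 5))   # group keys fall in 0..4
```

For a non-negative hash, `h & (2**k - 1)` equals `h % 2**k`, so the mask form is just the power-of-two special case of the modulo form. Equal values always land in the same group, but group sizes are only approximately balanced.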


