GitHub user amoeba added a comment to the discussion: Parquet row_group_size has confusing units
Hi @ThermodynamicBeta, thanks for starting this discussion.

The units are rows, and I believe the docstring was written with that knowledge and intent. You can see this by looking at https://github.com/apache/arrow/issues/34280 and https://github.com/apache/arrow/pull/34435.

> ...it also mentions 1024 * 1024 and 64Mi, which suggests its units are bytes.

Do you say that "Mi" suggests bytes because Mi is a base-2 prefix and we don't tend to use base 2 when counting? I want to make sure I understand your take on this since others are likely to be confused too. I think its current use here is correct and would be even more correct if we used `MiRows` (mebirows). Even better would be a small rewrite to avoid the confusion entirely.

Let me know on the above question and, additionally, if you'd be interested in filing a PR with an improvement. Otherwise, I'd be happy to have a go.

GitHub link: https://github.com/apache/arrow/discussions/46650#discussioncomment-13326600
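
To make the "units are rows" reading concrete, here is a minimal sketch in plain Python. It is not pyarrow's implementation: `split_row_counts` is a hypothetical helper, and the 2,500,000-row table is an invented example. It only illustrates what a row-count limit means and how the `1024 * 1024` and `64Mi` figures quoted above read when treated as counts of rows rather than bytes.

```python
# "Mi" is the binary prefix 2**20; here it scales a count of rows, not bytes.
MI_ROWS = 1024 * 1024


def split_row_counts(num_rows: int, row_group_size: int) -> list[int]:
    """Hypothetical helper: row counts per group when each group holds
    at most `row_group_size` rows (an assumption for illustration only)."""
    if num_rows < 0:
        raise ValueError("num_rows must be non-negative")
    if row_group_size <= 0:
        raise ValueError("row_group_size must be positive")
    full_groups, remainder = divmod(num_rows, row_group_size)
    counts = [row_group_size] * full_groups
    if remainder:
        counts.append(remainder)  # last, partially filled group
    return counts


if __name__ == "__main__":
    # A made-up table of 2.5 million rows split at 1Mi rows per group.
    print(split_row_counts(2_500_000, MI_ROWS))
    # 64Mi interpreted as a number of rows.
    print(64 * MI_ROWS)
```

Under this reading the byte size of a row group depends entirely on the schema and the data, which is why the same number cannot double as a byte limit.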
