GitHub user amoeba added a comment to the discussion: Parquet row_group_size 
has confusing units

Hi @ThermodynamicBeta, thanks for starting this discussion. The units are in 
terms of rows and I believe the docstring was written with that knowledge and 
intent. You can see this by looking at 
https://github.com/apache/arrow/issues/34280 and 
https://github.com/apache/arrow/pull/34435.

> ...it also mentions 1024 * 1024 and 64Mi, which suggests its units are bytes.

Are you saying that "Mi" suggests bytes because Mi is a base-2 prefix and we 
don't tend to use base 2 when counting rows? I want to make sure I understand 
your take on this, since others are likely to be confused too.

I think its current use here is correct, and it would be even more precise if 
we wrote `MiRows` (mebirows). Even better would be a small rewrite to avoid the 
confusion entirely.
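To make the units concrete, here is a small Python sketch. It is a hypothetical 
model, not pyarrow's code: it assumes the docstring's 1024 * 1024 is the default 
maximum and 64Mi is the upper cap, both counted in rows, and the helper names 
(`effective_row_group_size`, `row_group_lengths`) are made up for illustration.

```python
# Hypothetical model of the row_group_size rule as read from the docstring.
# NOT pyarrow's implementation; every size here is a count of ROWS, not bytes.
MIB_ROWS = 1024 * 1024                 # 1 Mi rows ("mebirows") = 1,048,576 rows
DEFAULT_ROW_GROUP_ROWS = 1 * MIB_ROWS  # assumed default maximum
MAX_ROW_GROUP_ROWS = 64 * MIB_ROWS     # assumed cap: 64 Mi rows = 67,108,864 rows


def effective_row_group_size(num_rows, row_group_size=None):
    """Return the assumed maximum number of rows per row group."""
    if row_group_size is None:
        # Assumed default: the smaller of the table length and 1 Mi rows.
        return min(num_rows, DEFAULT_ROW_GROUP_ROWS)
    if row_group_size <= 0:
        raise ValueError("row_group_size must be a positive number of rows")
    # Assumed cap: requests above 64 Mi rows are clamped to 64 Mi rows.
    return min(row_group_size, MAX_ROW_GROUP_ROWS)


def row_group_lengths(num_rows, row_group_size=None):
    """Split num_rows into consecutive row groups of at most the effective size."""
    size = effective_row_group_size(num_rows, row_group_size)
    lengths = []
    remaining = num_rows
    while remaining > 0:
        take = min(size, remaining)
        lengths.append(take)
        remaining -= take
    return lengths


print(row_group_lengths(10, 3))      # a size of 3 means 3 rows per group
print(row_group_lengths(3_000_000))  # default: groups of up to 1 Mi rows
```

Here `row_group_lengths(10, 3)` gives `[3, 3, 3, 1]`: a size of 3 means three 
rows per group, however many bytes those rows occupy. The "Mi" prefix just 
scales a count, so 64Mi rows is 67,108,864 rows.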

Let me know your answer to the question above and, additionally, whether you'd 
be interested in filing a PR with an improvement. Otherwise, I'd be happy to 
have a go.

GitHub link: 
https://github.com/apache/arrow/discussions/46650#discussioncomment-13326600
