Re: [DISCUSS] Clarify min-max truncation in Parquet statistics

2024-09-09 Thread Alkis Evlogimenos
TIL, thank you. In the case of aggregate pushdown, can't an engine use inexact min/max? I understand it will have to scan at least one row group. To be exact, it will have to scan all row groups that match the inexact min (or max) across all row groups. If we consider that the optimization is intere
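A rough sketch of the idea above, for the min case. It assumes each row group's min statistic is a lower bound on its true minimum (for example a truncated prefix). The names `RowGroup`, `min_lower_bound`, `values` and `exact_min` are hypothetical and not part of any Parquet API. Because different writers may truncate to different lengths, this variant keeps scanning while a row group's bound is still below the best value found so far, rather than only scanning the groups that share the smallest bound.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RowGroup:
    # Hypothetical stand-in for a row group's metadata and data.
    min_lower_bound: bytes  # possibly truncated min statistic (a lower bound)
    values: List[bytes]     # the column values; assumed non-empty


def exact_min(row_groups: List[RowGroup]) -> Tuple[Optional[bytes], int]:
    """Return (exact minimum, number of row groups scanned)."""
    # Visit row groups in ascending order of their lower bound.
    ordered = sorted(row_groups, key=lambda rg: rg.min_lower_bound)
    best: Optional[bytes] = None
    scanned = 0
    for rg in ordered:
        # Every value in rg is >= its bound, so once the bound reaches the
        # best value seen so far, this group and all later ones cannot win.
        if best is not None and rg.min_lower_bound >= best:
            break
        scanned += 1
        group_min = min(rg.values)  # stands in for actually scanning the data
        if best is None or group_min < best:
            best = group_min
    return best, scanned


groups = [
    RowGroup(b"ab", [b"abz", b"b"]),  # bound truncated to 2 bytes
    RowGroup(b"abc", [b"abcd"]),      # bound truncated to 3 bytes
    RowGroup(b"x", [b"xyz"]),
]
result = exact_min(groups)
```

In this example only two of the three row groups are scanned; the third is skipped because its lower bound is already above the best value found. Python compares `bytes` as unsigned bytes in lexicographic order, which is the ordering assumed here.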

Re: [DISCUSS] Clarify min-max truncation in Parquet statistics

2024-09-09 Thread Jan Finis
> Another option is to keep the complexity, make inexact the default (and thus not pay bytes for it on the wire) and allow engines to emit exact BINARY if they so desire.

I would argue that - for correctness - inexact *has to be* the default, as legacy writers will not write it. So you canno
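A minimal sketch of the reader-side consequence of that argument, assuming an optional boolean exactness flag on the statistics. The function and parameter names are hypothetical and chosen for illustration. A file written by a legacy writer carries no flag at all, so a reader can only treat the bound as exact when the flag is present and true.

```python
from typing import Optional


def min_is_exact(exact_flag: Optional[bool]) -> bool:
    """Decide whether a min statistic may be treated as the true minimum.

    exact_flag is None when the writer did not emit the flag (for example
    a legacy writer), so absence has to be read as "inexact".
    """
    return exact_flag is True


legacy = min_is_exact(None)     # flag missing: treat the bound as inexact
declared = min_is_exact(True)   # writer explicitly declared the value exact
```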

Re: [DISCUSS] Adopt Variant Spec from Spark?

2024-09-09 Thread Gang Wu
Update: The Spark community has successfully closed the vote [1] to move Variant spec to Parquet. I will start a formal vote on the Parquet side, though we have already received sufficient binding votes in this thread. [1] https://lists.apache.org/thread/pkybo148j6qyn2wsjnmyrhqs3crn9b89 Best, Gan

Re: [VOTE] Adopt Variant from Spark

2024-09-09 Thread Gang Wu
No, the proposed approach comes from a quick tally of all received responses and is acceptable to most people. If people feel strongly about using a dedicated repo for the variant spec and impl, would it be good to express "+1 for dedicated repo" in this vote as well? On Tue, Sep 10, 2024 at 10: