Hi,
> The exact types inside the dense_union would be chosen when encoding.
Ah, this approach doesn't standardize VALUE_SCHEMA (that is,
it doesn't use a fixed VALUE_SCHEMA). If it works in the real
world, it's more flexible.
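Here is a minimal pure-Python sketch of what "choosing the types when encoding" means for a dense union. Each slot records a type id and an offset into the child array selected for that value, so the set of children (and therefore VALUE_SCHEMA) depends on the data rather than being fixed up front. The type codes and class names are hypothetical. This is not the Arrow C data interface or any library API.

```python
from dataclasses import dataclass, field

# Hypothetical child type codes for this sketch; a real encoder would
# derive them from the Arrow type of each statistic value.
TYPE_CODES = {bool: 0, int: 1, float: 2, str: 3}


@dataclass
class DenseUnion:
    type_ids: list = field(default_factory=list)  # one type code per slot
    offsets: list = field(default_factory=list)   # index into the chosen child
    children: dict = field(default_factory=dict)  # type code -> child values


def encode_dense_union(values):
    """Build dense-union buffers, picking each slot's child from the value's type."""
    out = DenseUnion()
    for v in values:
        code = TYPE_CODES[type(v)]
        child = out.children.setdefault(code, [])
        out.type_ids.append(code)
        out.offsets.append(len(child))  # position the value will occupy in its child
        child.append(v)
    return out


def decode_slot(u, i):
    """Read slot i back through its type id and offset."""
    return u.children[u.type_ids[i]][u.offsets[i]]
```

For example, `encode_dense_union([10, 3.5, "a", 7, True])` only creates the four children it actually needs. Both integers share the `int` child at offsets 0 and 1.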
> Version markers in two-sided protocols never work well long term:
> see Parquet files l
Hi,
>> | Name   | Type | Comments |
>> |--------|------|----------|
>> | column | utf8 | (2)      |
>
> The utf8 type would presume that column names are unique (although I
> like it better than referring to columns by int
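To illustrate the uniqueness concern quoted above: Arrow schemas do not require field names to be unique, so a utf8 column name can match more than one column, while an integer index always identifies exactly one. A small sketch with a hypothetical schema and helper names:

```python
# Hypothetical schema with a duplicated field name.
FIELD_NAMES = ["id", "x", "x"]


def resolve_by_name(names, column):
    """Return every column index that a utf8 name could refer to."""
    return [i for i, n in enumerate(names) if n == column]


def resolve_by_index(names, column):
    """An integer index identifies exactly one column (or is out of range)."""
    if not 0 <= column < len(names):
        raise IndexError(column)
    return [column]
```

Here `resolve_by_name(FIELD_NAMES, "x")` is ambiguous (two matches), which is the case a name-keyed statistics table would have to rule out or handle.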
Hi,
We can use 4. for per-batch statistics because 4. uses a
separate API call. Users can design that separate API call
for per-batch statistics.
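A sketch of the idea, assuming 4. means exposing statistics through a call separate from the data itself (the proposal's exact API isn't shown here). Because statistics are fetched independently, that call can take a batch index and return values for just that batch. The class, method names, and statistic keys below are hypothetical.

```python
from typing import Dict, Iterator, List


class HypotheticalBatchProducer:
    """Serves data and statistics through two separate calls (illustrative names)."""

    def __init__(self, batches: List[List[int]]):
        self._batches = batches

    def batches(self) -> Iterator[List[int]]:
        # Data path: the consumer streams batches as usual.
        yield from self._batches

    def batch_statistics(self, index: int) -> Dict[str, object]:
        # Separate call: statistics for one batch, requested only when wanted.
        batch = self._batches[index]
        if not batch:
            return {"row_count": 0}  # min/max are undefined for an empty batch
        return {"row_count": len(batch), "min": min(batch), "max": max(batch)}
```

A consumer that ignores statistics never calls `batch_statistics`, so the data path stays unchanged.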
Thanks,
--
kou
In
"Re: [DISCUSS] Statistics through the C data interface" on Thu, 6 Jun 2024
13:14:08 +0200,
Alessandro Molina wrote:
> I bro
Hi,
>> Metadata:
>> | Name | Value | Comments |
>> |------|-------|----------|
>> | ARROW::statistics::version | 1.0.0 | (1) |
>
> I'm not sure this is useful, but it doesn't hurt.
The Apache Arrow columnar format uses semantic
versioning. So I th
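A minimal sketch of how a consumer could use the `ARROW::statistics::version` marker under semantic versioning. It assumes the consumer accepts any 1.x.y and rejects other major versions. That policy and `SUPPORTED_MAJOR` are assumptions, not part of the proposal.

```python
SUPPORTED_MAJOR = 1  # hypothetical: the major version this consumer understands


def parse_version(text):
    """Split "MAJOR.MINOR.PATCH" into ints; raises ValueError if malformed."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def can_read_statistics(metadata):
    version = metadata.get("ARROW::statistics::version")
    if version is None:
        return False
    major, _minor, _patch = parse_version(version)
    # Under semantic versioning, only a major bump signals an incompatible change.
    return major == SUPPORTED_MAJOR
```

With this policy, adding new optional fields only needs a minor bump, and older readers keep working.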
+1. I think the benefits outweigh the risks.
On Wed, Jun 5, 2024 at 3:05 PM Anja wrote:
>
> I did want to start off by acknowledging that all of the pros you listed
> for mimalloc are accurate.
>
> I did want to contribute the times that people have been caught off-guard
> by the perceived increa
> I just used quantiles as an example of a statistic that's not in the current
> proposed spec, but that some engines would like to expose.
All statistics are optional, so we can always add more to the spec.
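Because every statistic is optional, a consumer can keep the ones it understands and skip the rest. Adding, say, quantiles later then doesn't break older readers. A sketch with hypothetical statistic names (not the spec's identifiers):

```python
# Statistic names this hypothetical consumer knows how to use.
KNOWN = {"null_count", "distinct_count", "min", "max"}


def split_statistics(stats):
    """Return (known statistics, sorted names of ignored ones)."""
    known = {name: value for name, value in stats.items() if name in KNOWN}
    ignored = sorted(name for name in stats if name not in KNOWN)
    return known, ignored
```

An older consumer given `{"null_count": 0, "max": 42, "quantiles": [1, 5, 9]}` simply ignores `quantiles`.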
> In other words, a plain integer makes extensibility more difficult than a
> string.
O
On 07/06/2024 at 18:30, Felipe Oliveira Carvalho wrote:
On Fri, Jun 7, 2024 at 6:24 AM Antoine Pitrou wrote:
On 07/06/2024 at 04:27, Felipe Oliveira Carvalho wrote:
I've been thinking about how to encode statistics on Arrow arrays and
how to keep the set of statistics known by both pr