Re: [DISCUSS] Drop Java 8 support

2024-05-26 Thread Gang Wu
Hi, IMHO, Apache Parquet Java [1] cannot drop Java 8 in any 1.x release, in order to keep maximum backward compatibility. There was a discussion on the 2.x major release [2] and the v3 format [3]. I think the 2.x release is a good opportunity to drop Java 8. [1] https://github.com/apache/parquet-java [2]

Re: [DISCUSS] Statistics through the C data interface

2024-05-26 Thread Sutou Kouhei
Hi,
> It is usually fine but
> occasionally ends up with schema metadata that is lying (e.g., when
> unifying schemas from multiple files in a dataset, I believe pyarrow
> will sometimes assign metadata from one file to the entire dataset
> and/or

Re: [DISCUSS] Statistics through the C data interface

2024-05-26 Thread Sutou Kouhei
Hi,
> To start, data might be sourced in various manners:
>
> - Arrow IPC files may be mapped from shared memory
> - Arrow IPC streams may be received via some RPC framework (à la Flight)
> - The Arrow libraries may be used to read from file formats like Parquet or
>   CSV
> - ADBC drivers may be

Re: [DISCUSS] Statistics through the C data interface

2024-05-26 Thread Sutou Kouhei
Hi,
> ADBC might be too big of a leap in complexity now, but "we just need C
> Data Interface + statistics" is unlikely to remain true for very long
> as projects grow in complexity.
Does this mean that we will need C Data Interface + statistics + XXX + ... for query planning and so on? Or does