[DISCUSS] Encoding improvements (follow-up from Parquet "V3" discussion)

2024-05-27 Thread Micah Kornfield
As a follow-up to the "V3" Discussions [1][2] I wanted to start a thread on improvements to encodings. There are several areas to pursue here: 1. Curating a standard set of benchmarks and criteria for determining if a new encoding is worth adding. 2. Developing new encodings. 3. Better implement

[DISCUSS] Improvements to File Footer metadata (v3 discussion follow-up)

2024-05-27 Thread Micah Kornfield
As a follow-up to the "V3" Discussions [1][2] I wanted to start a thread on improvements to the footer metadata. Based on conversation so far, there have been a few proposals [3][4][5] to help better support files with wide schemas and many row-groups. I think there are a lot of interesting ideas

[DISCUSS] Infrastructure/Documentation improvement in Parquet

2024-05-27 Thread Micah Kornfield
As a follow-up to the "V3" Discussions [1][2] I wanted to start a discussion to see who is interested in improving Parquet infrastructure. In particular, as we consider newer features, I think we should be considering regular major version releases, to allow for new features to become default. The

Re: Interest in Parquet V3

2024-05-27 Thread Micah Kornfield
Hi Everyone, Just to follow up, conversations on the summary doc [1] have largely slowed down. In my mind I think of roughly three different tracks, and I'll start threads to get a sense of who is interested (please be on the lookout for discussion threads). I think as those conversations branch

Re: [DISCUSS] Arrow dropping Java 8 support

2024-05-27 Thread Gábor Szádovszky
Thanks a lot Weston for bringing this up. Last time we discussed a potential Java upgrade, Hadoop was the one not allowing us to do so. Hadoop is still on Java 8. If we want to keep Arrow on the latest version, we will need to upgrade to Java 11. In this case we won't be able to support Hadoop wit