Re: [DISCUSS] Apache Iceberg / Apache Hudi support in Arrow

2022-10-04 Thread Will Jones
And to clarify: IIUC FlightSQL is necessary to wrap a Java implementation, since it would be impractical to wrap Java code in the ADBC C interface. But for a Rust / C++ implementation, one could just directly implement ADBC. (Although a FlightSQL implementation would also be useful for distributed

Re: [DISCUSS] Apache Iceberg / Apache Hudi support in Arrow

2022-10-04 Thread David Li
It's possible we could wrap Iceberg et al. in Flight SQL to provide this, exposing Iceberg metadata via the Flight SQL endpoints, and table reads via Substrait plans. (Clients could send Substrait plans through ADBC, and we could integrate ADBC as a type of dataset.) I'm not familiar enough

Re: [DISCUSS] Apache Iceberg / Apache Hudi support in Arrow

2022-10-03 Thread Antoine Pitrou
Hi all, On 03/10/2022 at 17:03, Will Jones wrote: Hi Rusty, Note we discussed Iceberg a while ago [1]. I don't think we've discussed Hudi in any depth. As I see it, we are waiting on three things: 1. Someone willing to move forward the Iceberg / Hudi integration. 2. The Iceberg and Hudi

Re: [DISCUSS] Apache Iceberg / Apache Hudi support in Arrow

2022-10-03 Thread Matt Topol
I wanted to chime in that a current long-term goal I am working towards is a Golang Iceberg implementation that will also integrate with the Golang Arrow modules. I'm not sure how much desire there is for it, but I do know at least two consumers that would greatly benefit from it. But, at least

Re: [DISCUSS] Apache Iceberg / Apache Hudi support in Arrow

2022-10-03 Thread Will Jones
Hi Rusty, Note we discussed Iceberg a while ago [1]. I don't think we've discussed Hudi in any depth. As I see it, we are waiting on three things: 1. Someone willing to move forward the Iceberg / Hudi integration. 2. The Iceberg and Hudi projects need native libraries that we can use. The base

[DISCUSS] Apache Iceberg / Apache Hudi support in Arrow

2022-10-03 Thread Rusty Conover
Hi Arrow Team, Arrow is fantastic for manipulating the Parquet file format. There is an increasing desire to have the ability to update, delete and insert the rows stored in Parquet files, but without rewriting the Parquet files in their entirety. It is not uncommon to have gigabytes/petabytes