Re: [Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread Felipe Oliveira Carvalho
Algebraic Data Types (Sums and Products) are very abstract. This means they don't fully specify a concrete/physical layout [1]: different physical layouts can match the same algebraic definition. As an in-memory data format specification, Arrow doesn't and shouldn't rigidly specify concretization r
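As a rough illustration of the point about physical layouts, the sketch below encodes one logical sequence of sum-typed values in two ways, modelled on the dense and sparse union layouts of the Arrow columnar format. It uses plain Python lists in place of buffers and does not call any Arrow library; the helper names (`encode_dense`, `encode_sparse`, `decode_dense`, `decode_sparse`) and the sample values are hypothetical.

```python
# Plain-Python model of two physical layouts for one logical sum type.
# Not Arrow library code: lists stand in for buffers, helper names are made up.

# Logical values as (type_id, value). Type id 0 = int member, 1 = str member.
LOGICAL = [(0, 7), (1, "a"), (0, 9), (1, "b")]


def encode_dense(values, num_children):
    """Dense layout: a type id and an offset per slot; children hold only their own values."""
    type_ids, offsets = [], []
    children = [[] for _ in range(num_children)]
    for type_id, value in values:
        type_ids.append(type_id)
        offsets.append(len(children[type_id]))  # position inside the selected child
        children[type_id].append(value)
    return type_ids, offsets, children


def encode_sparse(values, num_children):
    """Sparse layout: a type id per slot; every child is as long as the union."""
    type_ids = []
    children = [[] for _ in range(num_children)]
    for type_id, value in values:
        type_ids.append(type_id)
        for child_id, child in enumerate(children):
            # Unselected children get a placeholder at this slot.
            child.append(value if child_id == type_id else None)
    return type_ids, children


def decode_dense(type_ids, offsets, children):
    return [(t, children[t][o]) for t, o in zip(type_ids, offsets)]


def decode_sparse(type_ids, children):
    return [(t, children[t][i]) for i, t in enumerate(type_ids)]


dense = encode_dense(LOGICAL, 2)
sparse = encode_sparse(LOGICAL, 2)

# Different buffers, same logical content.
print(dense)
print(sparse)
print(decode_dense(*dense) == decode_sparse(*sparse) == LOGICAL)
```

Both encodings decode to the same list, which is the sense in which one algebraic definition admits more than one concrete layout.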

Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

2024-04-02 Thread Weston Pace
Forgot link: [1] https://developer.mozilla.org/en-US/docs/WebAssembly/JavaScript_interface/Memory On Tue, Apr 2, 2024 at 11:38 AM Weston Pace wrote: > Thanks for taking the time to address my concerns. > > > I've split the S3/HTTP URI flight pieces out into a separate document and > > separate

Re: [VOTE] Protocol for Dissociated Arrow IPC Transports

2024-04-02 Thread Weston Pace
Thanks for taking the time to address my concerns. > I've split the S3/HTTP URI flight pieces out into a separate document and > separate thing to vote on at the request of several people who wanted to > view these as two separate proposals to vote on. So this vote *only* covers > adopting the pro

Re: [Format][ADBC] GetObjects semantics for system objects

2024-04-02 Thread Joel Lubinitsky
I'm fine with this approach as well. Mostly I think it will just be helpful to say something about it in the spec or docs. This could mean prescribing specific behavior, or simply stating that the behavior is implementation-specific. Otherwise the next best thing for implementers to do is compare w

Re: [ANNOUNCE] New Committer Joel Lubinitsky

2024-04-02 Thread Jacob Wujciak
Congratulations and welcome Joel! Joel Lubinitsky wrote on Tue, Apr 2, 2024, 14:21: > Thanks everyone! It's such a privilege to be working with you all, looking > forward to building even more. > > On Mon, Apr 1, 2024 at 5:34 PM David Li wrote: > > > Congrats Joel! > > > > On Tue, Apr 2, 202

Re: Upgrading Java version in build toolchain

2024-04-02 Thread Laurent Goujon
At the code level, we need to separate language features from library features? It should be possible to leverage the memory API, for example, through reflection and/or multi-release jar files, but record is a language feature and it would not be possible to use it without targeting Java 17 at the source level.

Re: [Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread James Henderson
Thanks @Antoine/@Weston - we've raised an issue [1] for the same in Arrow Java as suggested. Cheers, James [1]: https://github.com/apache/arrow/issues/40951 On Tue, 2 Apr 2024 at 14:29, Finn Völkel wrote: > @weston I think my mentioning of ADT was a mistake. I am just thinking of > sum types

Re: [Format][ADBC] GetObjects semantics for system objects

2024-04-02 Thread David Li
I don't see why we should exclude them; I would also caution against treating the driver behavior (especially only drivers in one language) as a reference. On Tue, Apr 2, 2024, at 23:04, Joel Lubinitsky wrote: > Hi, > > The ADBC spec does not currently define whether system > catalogs/schemas/tab

[Format][ADBC] GetObjects semantics for system objects

2024-04-02 Thread Joel Lubinitsky
Hi, The ADBC spec does not currently define whether system catalogs/schemas/tables (e.g. information_schema.columns, sqlite_master, etc) should be included in the result of ConnectionGetObjects. A survey of existing driver implementations such as sqlite and postgresql indicates that the current c

Re: [Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread Finn Völkel
@weston I think my mentioning of ADT was a mistake. I am just thinking of sum types (https://en.wikipedia.org/wiki/Tagged_union), which I should have called something else. You are thinking of a product type, which is better represented by a StructVector with nullable child vectors. @antoine Thank
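A minimal sketch of the product-type alternative mentioned here, assuming a struct column whose children are all nullable. It is a plain-Python model, not Arrow Java's StructVector API: `None` stands in for a null slot, the child names follow the thread, and the values and the `present_fields` helper are made up for illustration.

```python
# Plain-Python model of a struct column with nullable children (a product type).
# None stands in for a null slot; child names follow the thread, values are made up.

struct_column = {
    "put":    [{"id": 1}, None, {"id": 3}],
    "delete": [None,      2,    None],
    "erase":  [None,      None, 3],
}


def present_fields(column, i):
    """Names of the children that are non-null at slot i."""
    return [name for name, child in column.items() if child[i] is not None]


for i in range(3):
    print(i, present_fields(struct_column, i))
```

Slot 2 has both `put` and `erase` set, which a union with a single type id per slot cannot express.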

Re: [Discussion][C++][FlightRPC] What stage to submit a PR for Flight SQL ODBC driver

2024-04-02 Thread Jean-Baptiste Onofré
Hi folks, Here's a quick update on the ODBC donation: - I created the PR today - I updated the original code to use Arrow main and dependencies - I did a cleanup on the CMakeLists.txt files The PR is a draft for now as I'm still working on it: - finalize the updates and fix the build - verify on

Re: [Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread Weston Pace
Wouldn't support for ADT require expressing more than 1 type id per record? In other words, if `put` has type id 1, `delete` has type id 2, and `erase` has type id 3, then there is no way to express that something is (for example) both type id 1 and type id 3 because you can only have one type id per re

Re: [ANNOUNCE] New Committer Joel Lubinitsky

2024-04-02 Thread Joel Lubinitsky
Thanks everyone! It's such a privilege to be working with you all, looking forward to building even more. On Mon, Apr 1, 2024 at 5:34 PM David Li wrote: > Congrats Joel! > > On Tue, Apr 2, 2024, at 05:42, Weston Pace wrote: > > Congratulations Joel! > > > > On Mon, Apr 1, 2024 at 1:16 PM Bryce M

Re: [Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread Finn Völkel
I also meant Algebraic Data Type, not Abstract Data Type (too many acronyms). On Tue, 2 Apr 2024 at 13:28, Antoine Pitrou wrote: > > Thanks. The Arrow spec does support multiple union members with the same > type, but not all implementations do. The C++ implementation should > support it, though

Re: [Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread Antoine Pitrou
Thanks. The Arrow spec does support multiple union members with the same type, but not all implementations do. The C++ implementation should support it, though to my surprise we do not seem to have any tests for it. If the Java implementation doesn't, then you can probably open an issue for

Re: [Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread Finn Völkel
> Can you explain what ADT means? Sorry about that. ADT stands for Abstract Data Type. What do I mean by an ADT style vector? Let's take an example from the project I am on. We have an `op` union vector with three child vectors `put`, `delete`, `erase`. `delete` and `erase` have the same type bu
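The `op` example can be sketched roughly as follows: a dense-union-style container in which `delete` and `erase` share a value type and are told apart only by their type id. This is a plain-Python model and not the Arrow Java API. The type ids 1, 2 and 3 are the ones used elsewhere in the thread; the `OpUnion` class, the payload types (a dict for `put`, an int key for `delete` and `erase`) and the sample values are assumptions.

```python
# Plain-Python model of a dense union whose members `delete` and `erase`
# share one value type. Payload types and sample values are assumptions.
from dataclasses import dataclass, field

# Type ids as used in the thread: put = 1, delete = 2, erase = 3.
FIELD_NAMES = {1: "put", 2: "delete", 3: "erase"}


@dataclass
class OpUnion:
    type_ids: list = field(default_factory=list)  # one type id per slot
    offsets: list = field(default_factory=list)   # index into the selected child
    children: dict = field(default_factory=lambda: {t: [] for t in FIELD_NAMES})

    def append(self, type_id, value):
        if type_id not in FIELD_NAMES:
            raise ValueError(f"unknown type id: {type_id}")
        self.type_ids.append(type_id)
        self.offsets.append(len(self.children[type_id]))
        self.children[type_id].append(value)

    def get(self, i):
        """Return (member name, value); the name comes from the type id alone."""
        type_id = self.type_ids[i]
        return FIELD_NAMES[type_id], self.children[type_id][self.offsets[i]]


ops = OpUnion()
ops.append(1, {"id": 10, "name": "x"})  # put: assumed to carry a document
ops.append(2, 10)                       # delete: assumed to carry an int key
ops.append(3, 10)                       # erase: same value type as delete

for i in range(len(ops.type_ids)):
    print(ops.get(i))
```

The last two slots hold the same value of the same type, yet they read back as different operations because the member is selected by type id, not by value type.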

Re: [Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread Steve Kim
Thank you for asking this question. I have the same question. I noted a similar problem in the C++/Python implementation: https://github.com/apache/arrow/issues/19157#issuecomment-1528037394 On Tue, Apr 2, 2024, 04:30 Finn Völkel wrote: > Hi, > > my question primarily concerns the union layout

Re: [Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread Antoine Pitrou
Can you explain what ADT means? On 02/04/2024 at 11:31, Finn Völkel wrote: Hi, my question primarily concerns the union layout described at https://arrow.apache.org/docs/format/Columnar.html#union-layout There are two ways to use unions: - polymorphic vectors (world 1) - ADT st

[Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread Finn Völkel
Hi, my question primarily concerns the union layout described at https://arrow.apache.org/docs/format/Columnar.html#union-layout There are two ways to use unions: - polymorphic vectors (world 1) - ADT style vectors (world 2) In world 1 you have a vector that stores different types. In the

Re: [RESULT][VOTE] Release Apache Arrow ADBC 0.11.0 - RC0

2024-04-02 Thread David Li
Update on tasks: [x] Close the GitHub milestone/project [x] Add the new release to the Apache Reporter System [x] Upload source release artifacts to Subversion [x] Create the final GitHub release [x] Update website [x] Upload wheels/sdist to PyPI [x] Publish Maven packages [x] Update tags for Go m

Re: Upgrading Java version in build toolchain

2024-04-02 Thread Jean-Baptiste Onofré
Hi Laurent It makes sense to me. I started this "move" (on the plugin side of the thing) as part of the reproducible build effort. At code level, I think it would be great to leverage some features from Java 17+ (I'm thinking about record, memory API, etc). I would be more than happy to help on t