Re: [proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-29 Thread Micah Kornfield
Please note this message and the previous one from the author violate our Code of Conduct [1]. Specifically "Do not insult or put down other participants." Please try to be professional in communications and focus on the technical issues at hand. [1]

Re: [DISCUSS][Format] Dynamic data encodings in the IPC format and C ABI

2022-07-29 Thread Sasha Krassovsky
Hi, I’ve also had quite a few thoughts on this, as it is somewhat strange at the moment (within the context of Acero at least) that e.g. IntegerDictionary is not the same type as an Integer, meaning that we have to manually cast between the two or reject any operation that mixes the two. I was

[DISCUSS][Format] Dynamic data encodings in the IPC format and C ABI

2022-07-29 Thread Wes McKinney
hi all, Since we've been recently discussing adding new data types, memory formats, or data encodings to Arrow, I wanted to bring up a more "big picture" question around how we could support data whose encodings may change throughout the lifetime of a data stream sent via the IPC format (e.g.

[VOTE] Release Apache Arrow 9.0.0 - RC2

2022-07-29 Thread Krisztián Szűcs
Hi, I would like to propose the following release candidate (RC2) of Apache Arrow version 9.0.0. This is a release consisting of 507 resolved JIRA issues[1]. This release candidate is based on commit: ea6875fd2a3ac66547a9a33c5506da94f3ff07f2 [2] The source release rc2 is hosted at [3]. The

Re: [ARROW-17255] Logical JSON type in Arrow

2022-07-29 Thread Wes McKinney
I think either path: * Canonical extension type * First-class type in the Type union in Flatbuffers would be OK. The canonical extension type option is the preferable path here, I think, because it allows Arrow implementations without any special handling for JSON to allow the data to pass

Re: [ARROW-17255] Logical JSON type in Arrow

2022-07-29 Thread Micah Kornfield
Just to be clear, I think we are referring to a "well known"/canonical extension type [1] here? I'd also be in favor of this. (Disclaimer: I'm a colleague of Pradeep's.) [1] https://arrow.apache.org/docs/format/Columnar.html#extension-types On Fri, Jul 29, 2022 at 3:19 PM Wes McKinney wrote: >

Re: [ARROW-17255] Logical JSON type in Arrow

2022-07-29 Thread Wes McKinney
This seems like a common-enough data type that having a first-class logical type would be a good idea (perhaps even more so than UUID!). Compute engines would be able to implement kernels that provide manipulations of JSON data similar to what you can do with jq or GraphQL. On Fri, Jul 29, 2022

Re: CMake dependencies for arrow flight

2022-07-29 Thread Li Jin
(Never mind the libre2 error; it was my mistake.) On Fri, Jul 29, 2022 at 4:49 PM Li Jin wrote: > Also, if it is the google re2, is there a minimum version required? > Currently my system has re2 from 20201101. > > On Fri, Jul 29, 2022 at 4:45 PM Li Jin wrote: > >> Thanks David! >> >> I used the code

Re: CMake dependencies for arrow flight

2022-07-29 Thread David Li
Not sure what specifically is causing that error in your case, sorry. RE2 is indeed the regex engine. Arrow appears to select a newer version by default [1] but I'm not sure if this is *required*. However, the error indicates that the library just isn't there, or at least can't be found at

Re: CMake dependencies for arrow flight

2022-07-29 Thread Li Jin
Also, if it is the google re2, is there a minimum version required? Currently my system has re2 from 20201101. On Fri, Jul 29, 2022 at 4:45 PM Li Jin wrote: > Thanks David! > > I used the code in the sql flight Cmakelist. Unfortunately I hit another > error, I wonder if you happen to know a

Re: CMake dependencies for arrow flight

2022-07-29 Thread Li Jin
Thanks David! I used the code in the Flight SQL CMakeLists.txt. Unfortunately, I hit another error; I wonder if you happen to know a quick fix for this? (I don't know about libre2; is it https://github.com/google/re2 or something else?) "libre2.so: cannot open shared object file: No such file or directory"

Re: CMake dependencies for arrow flight

2022-07-29 Thread David Li
You'll also need to link to arrow_flight (and ditto for other libraries you may want to use). Note that due to ARROW-12175 you may need a bit of finagling if you're using CMake as your build system [1]. You can see a small workaround at [2]. [1]:

Re: CMake dependencies for arrow flight

2022-07-29 Thread Li Jin
(This is with Arrow 7.0.0) On Fri, Jul 29, 2022 at 3:52 PM Li Jin wrote: > Hi! > > I saw this error when linking my code against arrow flight and suspect I > didn't write my cmake correctly: > > "error: undefined reference to arrow::flight::Location::Location()" > > I followed

CMake dependencies for arrow flight

2022-07-29 Thread Li Jin
Hi! I saw this error when linking my code against Arrow Flight and suspect I didn't write my CMake correctly: "error: undefined reference to arrow::flight::Location::Location()" I followed https://arrow.apache.org/docs/cpp/build_system.html#cmake and linked my executable with arrow_shared. Is

Re: [RUST][Go][proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-29 Thread Gavin Ray
> there are scalar API functions that can be logically used to process rows of data, but they are executed on columnar batches of data. > As mentioned previously, it is better to have an API that applies row-level transformations than to have an intermediary row-level memory format. Another way of

Re: [RUST][Go][proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-29 Thread Lee, David
In pyarrow.compute, which is an extension of the C++ implementation, there are scalar API functions that can be logically used to process rows of data, but they are executed on columnar batches of data. As mentioned previously, it is better to have an API that applies row-level transformations

Re: [Flight][Java][JDBC] IP clearance of Flight JDBC Driver

2022-07-29 Thread David Li
The vote/form are now done [1]. (There were a few points of clarification required.) Up next: I will merge the PR into the branch, then merge master into the branch. After that we can open a final mega PR for review/merge into master. [1]:

[DISCUSS] [RUST] object_store release planning / schedule

2022-07-29 Thread Andrew Lamb
Hi, We have completed IP clearance, code merge, and CI for the Rust object store implementation. One final unresolved discussion is when to release new versions. I would like to invite anyone with an opinion to discuss the proposal for a new release on [1]. More details on the progress of the

Re: [proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-29 Thread Andrew Lamb
There has been a substantial amount of effort put into the arrow-rs Rust Parquet implementation to handle the corner cases of nested structs and lists, and all the fun of various levels of nullability. Do let us know if you happen to try writing nested structures directly to Parquet and have

Re: [RUST][Go][proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-29 Thread Andrew Lamb
I am +0 on a standard API -- in the Rust arrow-rs implementation we tend to borrow inspiration from the C++ / Java interfaces and then create appropriate Rust APIs. There is also a row based format in DataFusion [1] (Rust) and it is used to implement certain GroupBy and Sorts (similarly to what

Re: [VOTE] Release Apache Arrow 9.0.0 - RC1

2022-07-29 Thread Sutou Kouhei
-1 Sorry. I found a problem in Linux packages. I'm fixing this at https://github.com/apache/arrow/pull/13739 . Thanks, -- kou In "[VOTE] Release Apache Arrow 9.0.0 - RC1" on Thu, 28 Jul 2022 16:47:33 +0200, Krisztián Szűcs wrote: > Hi, > > I would like to propose the following release

Re: [RUST][Go][proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-29 Thread Laurent Quérel
Hi Julian, My intermediate representation is indeed an API and does not define a specific physical format (which could be different from one language to another, or even not exist at all in some cases). That being said, I didn't understand your feedback and I'm sure there's something to dig into

Re: [RUST][Go][proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-29 Thread Laurent Quérel
Hi Gavin, I was not aware of this initiative but indeed, these two proposals have much in common. The implementation I am working on is available here https://github.com/lquerel/otel-arrow-adapter (directory pkg/air). I would be happy to get your feedback and identify with you the possible gaps to

Re: [RUST][Go][proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-29 Thread Laurent Quérel
Hi Sasha, Thank you very much for this informative comment. It's interesting to see another use of a row-based API in the context of a query engine. I think that there is some thought to be given to whether or not it is possible to converge these two use cases into a single public row-based API.