Re: [RUST][Go][proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-29 Thread Gavin Ray
> there are scalar API functions that can be logically used to process rows of data, but they are executed on columnar batches of data.
> As mentioned previously, it is better to have an API that applies row-level transformations than to have an intermediary row-level memory format.
Another way of…

2022-07-29 Thread Lee, David
In pyarrow.compute, which is an extension of the C++ implementation, there are scalar API functions that can be logically used to process rows of data, but they are executed on columnar batches of data. As mentioned previously, it is better to have an API that applies row-level transformations…
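To illustrate the point about row-level logic running over columnar batches, here is a minimal plain-Python sketch (no pyarrow). The caller writes a per-row function, but it is evaluated by walking the column lists of a batch in lockstep, so the data itself stays column-oriented. The dict-of-lists `Batch` type and the `apply_row_fn` helper are hypothetical stand-ins for Arrow arrays and compute kernels; in pyarrow itself, this particular example would more naturally be a single vectorized kernel call such as `pyarrow.compute.multiply`.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for an Arrow record batch: column name -> column values.
Batch = Dict[str, List[Any]]


def apply_row_fn(batch: Batch, fn: Callable[..., Any], *cols: str) -> List[Any]:
    """Evaluate a per-row function by walking the named columns in lockstep.

    The caller writes row-level logic, but the data stays column-oriented:
    only a transient tuple of values exists for each row.
    """
    columns = [batch[c] for c in cols]
    # zip() would silently truncate ragged columns, so check lengths first.
    if len({len(c) for c in columns}) > 1:
        raise ValueError("columns must have equal length")
    return [fn(*values) for values in zip(*columns)]


batch: Batch = {"price": [10.0, 20.0, 5.0], "qty": [3, 1, 4]}
batch["total"] = apply_row_fn(batch, lambda p, q: p * q, "price", "qty")
print(batch["total"])  # [30.0, 20.0, 20.0]
```

A real engine would dispatch to a compiled kernel over contiguous buffers rather than a Python loop; the sketch only shows the shape of the API.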

2022-07-29 Thread Andrew Lamb
I am +0 on a standard API -- in the Rust arrow-rs implementation we tend to borrow inspiration from the C++ / Java interfaces and then create appropriate Rust APIs. There is also a row-based format in DataFusion [1] (Rust) and it is used to implement certain GroupBy and Sorts (similarly to what…

2022-07-29 Thread Laurent Quérel
Hi Julian, My intermediate representation is indeed an API and does not define a specific physical format (which could be different from one language to another, or even not exist at all in some cases). That being said, I didn't understand your feedback and I'm sure there's something to dig into…

2022-07-29 Thread Laurent Quérel
Hi Gavin, I was not aware of this initiative but indeed, these two proposals have much in common. The implementation I am working on is available here: https://github.com/lquerel/otel-arrow-adapter (directory pkg/air). I would be happy to get your feedback and identify with you the possible gaps to…

2022-07-29 Thread Laurent Quérel
Hi Sasha, Thank you very much for this informative comment. It's interesting to see another use of a row-based API in the context of a query engine. I think that there is some thought to be given to whether or not it is possible to converge these two use cases into a single public row-based API.

2022-07-28 Thread Julian Hyde
If the 'row-oriented format' is an API rather than a physical data representation, then it can be implemented via coroutines and could therefore have less scattered patterns of read/write access. By 'coroutines' I'm being rather imprecise, but I hope you get the general idea. An asynchronous API…
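As a loose illustration of a row API that is not a physical row format, here is a minimal sketch that uses a Python generator as a stand-in for the coroutine idea: rows are pushed through one at a time and their values are appended directly to per-column lists, and a batch is emitted every `batch_size` rows. The `rows_to_batches` name, the dict-based row and batch shapes, and the use of `None` for nulls are assumptions for illustration only, not part of any Arrow API or of the proposal's actual implementation.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, Any]
Batch = Dict[str, List[Optional[Any]]]


def rows_to_batches(rows: Iterable[Row], batch_size: int) -> Iterator[Batch]:
    """Consume rows one at a time and yield column-oriented batches.

    Values go straight into per-column lists, so no row-oriented buffer
    is kept; a batch is emitted every `batch_size` rows.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    columns: Batch = {}
    count = 0
    for row in rows:
        for name in row:
            if name not in columns:
                # A field first seen mid-batch is back-filled with nulls (None).
                columns[name] = [None] * count
        for name, col in columns.items():
            col.append(row.get(name))  # a field missing from this row becomes None
        count += 1
        if count == batch_size:
            yield columns
            columns, count = {}, 0
    if count:
        yield columns  # flush the final partial batch


rows = [{"a": 1, "b": "x"}, {"a": 2}, {"a": 3, "b": "z"}]
for batch in rows_to_batches(rows, batch_size=2):
    print(batch)
# {'a': [1, 2], 'b': ['x', None]}
# {'a': [3], 'b': ['z']}
```

Because the schema here is inferred per batch, a field that never appears within a batch is simply absent from it; a real converter would more likely fix the schema up front.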

2022-07-28 Thread Gavin Ray
This is essentially the same idea as the proposal here, I think -- a row/map-based representation and conversion functions for ease of use: [RFC] [Java] Higher-level "DataFrame"-like API. Lower barrier to entry, increase adoption/audience and productivity. (apache/arrow issue #12618)

2022-07-28 Thread Sasha Krassovsky
Hi everyone, I just wanted to chime in that we already do have a form of row-oriented storage inside of `arrow/compute/row/row_internal.h`. It is used to store rows inside of GroupBy and Join within Acero. We also have utilities for converting to/from columnar storage (and AVX2 implementations…
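For readers less familiar with row-oriented storage, here is a minimal sketch of a columnar-to-row round trip: two columns are packed into one contiguous buffer holding one fixed-width record per row, then unpacked back into columns. The `<qd` layout (little-endian int64 followed by float64, 16 bytes per row, no padding) and the helper names are assumptions chosen for illustration; the actual format in `row_internal.h` also has to deal with nulls, variable-length values, and alignment, and is not reproduced here.

```python
import struct
from typing import List, Tuple

# Hypothetical fixed-width row layout: little-endian int64 key, then float64 value.
ROW = struct.Struct("<qd")


def columns_to_rows(keys: List[int], values: List[float]) -> bytes:
    """Pack two equal-length columns into a buffer holding one record per row."""
    if len(keys) != len(values):
        raise ValueError("columns must have equal length")
    buf = bytearray(ROW.size * len(keys))
    for i, (k, v) in enumerate(zip(keys, values)):
        # Each row's fields sit next to each other at offset i * ROW.size.
        ROW.pack_into(buf, i * ROW.size, k, v)
    return bytes(buf)


def rows_to_columns(buf: bytes) -> Tuple[List[int], List[float]]:
    """Unpack fixed-width records back into one list per column."""
    keys: List[int] = []
    values: List[float] = []
    for k, v in ROW.iter_unpack(buf):
        keys.append(k)
        values.append(v)
    return keys, values


buf = columns_to_rows([1, 2, 3], [0.5, 1.5, 2.5])
print(len(buf))              # 48
print(rows_to_columns(buf))  # ([1, 2, 3], [0.5, 1.5, 2.5])
```

The row layout keeps all fields of one record adjacent, which suits hashing and comparing whole keys in GroupBy and Join; the columnar layout keeps all values of one field adjacent, which suits scans and vectorized kernels.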

2022-07-28 Thread Laurent Quérel
Thank you, Micah, for a very clear summary of the intent behind this proposal. Indeed, I think that clarifying from the beginning that this approach aims at facilitating experimentation more than efficiency in terms of performance of the transformation phase would have helped to better understand my…

[RUST][Go][proposal] Arrow Intermediate Representation to facilitate the transformation of row-oriented data sources into Arrow columnar representation

2022-07-28 Thread Micah Kornfield
Hi Laurent, I'm retitling this thread to include the specific languages you seem to be targeting in the subject line to hopefully get more eyes from maintainers in those languages. Thanks for clarifying the goals. If I can restate my understanding, the intended use-case here is to provide easy…