> there are scalar API functions that can be logically used to process rows
of data, but they are executed on columnar batches of data.
> As mentioned previously, it is better to have an API that applies row-level
transformations than to have an intermediary row-level memory format.
Another way of
In pyarrow.compute, which is an extension of the C++ implementation, there are
scalar API functions that can be logically used to process rows of data, but
they are executed on columnar batches of data.
As mentioned previously, it is better to have an API that applies row-level
transformations
I am +0 on a standard API -- in the Rust arrow-rs implementation we tend to
borrow inspiration from the C++ / Java interfaces and then create
appropriate Rust APIs.
There is also a row based format in DataFusion [1] (Rust) and it is used to
implement certain GroupBy and Sorts (similarly to what
Hi Julian,
My intermediate representation is indeed an API and does not define a
specific physical format (which could be different from one language to
another, or even not exist at all in some cases). That being said, I didn't
understand your feedback and I'm sure there's something to dig into
Hi Gavin,
I was not aware of this initiative but indeed, these two proposals have
much in common. The implementation I am working on is available here
https://github.com/lquerel/otel-arrow-adapter (directory pkg/air). I would
be happy to get your feedback and identify with you the possible gaps to
Hi Sasha,
Thank you very much for this informative comment. It's interesting to see
another use of a row-based API in the context of a query engine. I think
that there is some thought to be given to whether or not it is possible to
converge these two use cases into a single public row-based API.
If the 'row-oriented format' is an API rather than a physical data
representation, then it can be implemented via coroutines and could
therefore have less scattered patterns of read/write access.
By 'coroutines' I'm being rather imprecise, but I hope you get the
general idea. An asynchronous API
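
To make the "row-oriented API, not a row-oriented format" idea a bit more concrete, here is a minimal sketch in plain Python. It uses generators as a stand-in for coroutines and ordinary lists as a stand-in for Arrow columns, so none of the names below (`iter_rows`, `with_total`, `collect_columns`) come from any Arrow implementation; they are hypothetical and only illustrate the shape of the idea. Rows are produced one at a time from the columns and written straight back into columns, so no batch of rows is ever materialized in a row-level memory layout.

```python
from typing import Any, Dict, Iterable, Iterator, List

# Assumption: a "batch" is just a dict of equal-length Python lists,
# standing in for Arrow columns.
Columns = Dict[str, List[Any]]
Row = Dict[str, Any]


def iter_rows(columns: Columns) -> Iterator[Row]:
    """Lazily yield one row at a time, reading directly from the columns."""
    names = list(columns)
    for values in zip(*(columns[name] for name in names)):
        yield dict(zip(names, values))


def with_total(rows: Iterable[Row]) -> Iterator[Row]:
    """Row-level transformation: add a derived 'total' field to each row."""
    for row in rows:
        yield {**row, "total": row["price"] * row["qty"]}


def collect_columns(rows: Iterable[Row], names: List[str]) -> Columns:
    """Consume rows one by one and append each field to its output column."""
    out: Columns = {name: [] for name in names}
    for row in rows:
        for name in names:
            out[name].append(row[name])
    return out


batch = {"price": [2.0, 3.5, 1.0], "qty": [3, 2, 10]}
result = collect_columns(with_total(iter_rows(batch)), ["price", "qty", "total"])
```

Because each stage pulls a single row from the previous one, only one row is live at any time; the user writes row-level logic while the data at both ends stays columnar.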
This is essentially the same idea as the proposal here I think --
row/map-based representation & conversion functions for ease of use:
[RFC] [Java] Higher-level "DataFrame"-like API. Lower barrier to entry,
increase adoption/audience and productivity. · Issue #12618 · apache/arrow
(github.com)
Hi everyone,
I just wanted to chime in that we already do have a form of row-oriented
storage inside of `arrow/compute/row/row_internal.h`. It is used to store rows
inside of GroupBy and Join within Acero. We also have utilities for converting
to/from columnar storage (and AVX2 implementations
Thank you Micah for a very clear summary of the intent behind this
proposal. Indeed, I think that clarifying from the beginning that this
approach aims at facilitating experimentation more than efficiency in terms
of performance of the transformation phase would have helped to better
understand my
Hi Laurent,
I'm retitling this thread to include the specific languages you seem to be
targeting in the subject line to hopefully get more eyes from maintainers
in those languages.
Thanks for clarifying the goals. If I can restate my understanding, the
intended use-case here is to provide easy