I am +1 on either - imo:
* it is important to have either available
* both provide a non-trivial improvement over what we have
* the trade-off is difficult to decide upon - I trust whoever is
implementing it to experiment and decide which better fits Arrow and the
ecosystem.
Thank you so much
On Sun, Jul 31, 2022 at 8:05 AM Antoine Pitrou wrote:
>
>
> Hi Wes,
>
> On 31/07/2022 at 00:02, Wes McKinney wrote:
> >
> > I understand there are still some aspects of this project that cause
> > some squeamishness (like having arbitrary memory addresses embedded
> > within array values whose
Hi,
I am trying to follow the C++ implementation with respect to mmap IPC files
and reading them zero-copy, in the context of reproducing it in Rust.
My understanding from reading the source code is that we essentially:
* identify the memory regions (offset and length) of each of the buffers,
Hello!
We recently updated Arrow to 7.0.0 and hit an error with our old code
(details below). I wonder if there is a new way to do this with the current
version?
import pandas as pd
import pyarrow
import pyarrow.parquet as pq
df = pd.DataFrame({"aa": [1, 2, 3], "bb": [1, 2, 3]})
uri =
>
> It would be reasonable to restrict JSON to utf8, and tell people they
> need to transcode in the rare cases where some obnoxious software
> outputs utf16-encoded JSON.
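For the rare UTF-16 case, the transcoding step might look like this in Python (a sketch; the payload is fabricated for illustration):

```python
import json

# Hypothetical: some tool emitted UTF-16-encoded JSON; transcode it
# before handing it to a utf8-only reader.
utf16_payload = '{"key": 1}'.encode("utf-16")   # fabricated example input
decoded = utf16_payload.decode("utf-16")        # the transcoding step
obj = json.loads(decoded)                       # parse as usual
utf8_payload = decoded.encode("utf-8")          # what a utf8-only reader expects
```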
+1 I think this aligns with the latest JSON RFC [1] as well.
Sounds good to me too. +1 on the canonical extension type
Hello Everyone,
in ARROW-17224 [1] the painfully slow solve times of conda (while
installing arrow) were surfaced. The only solution seems to be to use mamba
(or wait until the mamba solver is integrated in conda...).
As such long waits make for a pretty bad user/developer experience (U/DX),
should we recommend mamba
Potentially extending the IPC format to support these additional
flexibilities is the easy part.
The difficult part is to shoehorn the newfound flexibility into
existing APIs, which also leaks into the expectations of downstream users.
For example, in C++ it is expected that a
Hi,
+1 (non-binding)
TL;DR: I've found an issue in the integration tests with Ubuntu 22.04 and
OpenJDK 18, but it seems to be around how Archery runs integration tests (see
JIRA [1]).
I've been able to verify the release without issues:
TEST_DEFAULT=0 TEST_SOURCE=1
I like the idea of adding these specialized encodings in some sort of
optional extension or wrapper around a RecordBatch, one that maybe isn't
even standardized in Arrow at all.
As Sasha observes, with the notable exception of Dictionary, Arrow has
exactly one physical encoding for each logical type.