>
> > 2. What do we do about different non-utf8 encodings? There does not
> > appear to be a consensus yet on this point. One option is to only allow
> > utf8 encoding and force implementers to convert non-utf8 to utf8. Second
> > option is to allow all encodings and capture the encoding in the
On 01/08/2022 at 22:53, Pradeep Gollakota wrote:
Thanks for all the great feedback.
To proceed forward, we seem to need decisions around the following:
1. Whether to use arrow extensions or first class types. The consensus is
building towards using arrow extensions.
+1
2. What do we do
I am +1 on either - imo:
* it is important to have either available
* both provide a non-trivial improvement over what we have
* the trade-off is difficult to decide upon - I trust whoever is
implementing it to experiment and decide which better fits Arrow and the
ecosystem.
Thank you so much fo
On Sun, Jul 31, 2022 at 8:05 AM Antoine Pitrou wrote:
>
>
> Hi Wes,
>
> > On 31/07/2022 at 00:02, Wes McKinney wrote:
> >
> > I understand there are still some aspects of this project that cause
> > some squeamishness (like having arbitrary memory addresses embedded
> > within array values whose l
Hi,
I am trying to follow the C++ implementation with respect to memory-mapping
(mmap) IPC files and reading them zero-copy, in the context of reproducing it
in Rust.
My understanding from reading the source code is that we essentially:
* identify the memory regions (offset and length) of each of the buffers,
via
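For reference, here is a minimal pyarrow sketch of the user-facing side of
that zero-copy path (the file path is made up, and it assumes an existing
Arrow IPC file):

import pyarrow as pa
import pyarrow.ipc as ipc

# Hypothetical path to an existing Arrow IPC file (the "file" format,
# i.e. what Feather V2 uses).
path = "data.arrow"

# Memory-map the file; the record batch buffers then reference the mapped
# region directly instead of being copied into process memory.
with pa.memory_map(path, "r") as source:
    reader = ipc.open_file(source)
    table = reader.read_all()  # buffers point into the mmap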
Hello!
We recently updated Arrow to 7.0.0 and hit an error with our old code
(details below). I wonder if there is a new way to do this with the current
version?
import pandas as pd
import pyarrow
import pyarrow.parquet as pq

df = pd.DataFrame({"aa": [1, 2, 3], "bb": [1, 2, 3]})
uri = "gs://amp_bucket_liao/tr
>
> It would be reasonable to restrict JSON to utf8, and tell people they
> need to transcode in the rare cases where some obnoxious software
> outputs utf16-encoded JSON.
+1 I think this aligns with the latest JSON RFC [1] as well.
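For illustration, the transcoding step itself is trivial in Python (a minimal
sketch; raw_bytes is a made-up name for JSON text that arrived as UTF-16):

import json

# Made-up example input: JSON text encoded as UTF-16.
raw_bytes = '{"key": "value"}'.encode("utf-16")

# Decode from UTF-16 and re-encode as UTF-8 before storing the value in a
# utf8-only JSON extension column.
utf8_bytes = raw_bytes.decode("utf-16").encode("utf-8")

print(json.loads(utf8_bytes))  # {'key': 'value'}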
Sounds good to me too. +1 on the canonical extension type option
Hello Everyone,
in ARROW-17224 [1] the painfully slow solve times of conda (while
installing arrow) were surfaced. The only solution seems to be to use mamba
(or wait until the mamba solver is integrated in conda...).
As waiting for a very long time is pretty bad U/DX (user/developer
experience), should we recommend mamba in
Potentially extending the IPC format to support these additional
flexibilities is the easy part.
The difficult part is to shoehorn the newfound flexibility into existing
APIs, which also leaks into the expectations of downstream users.
For example, in C++ it is expected that a RecordBatchRea
Hi,
+1 (non-binding)
TL;DR: I've found an issue with integration tests on Ubuntu 22.04 and
OpenJDK 18, but it seems to be around how archery runs integration tests (see
JIRA [1]).
I've been able to verify the release without issues:
TEST_DEFAULT=0 TEST_SOURCE=1 dev/release/verify-release-candidate.s
I like the idea of adding these specialized encodings in some sort of
optional extension or wrapper around a RecordBatch, which maybe isn't even
standardized in Arrow at all.
As Sasha observes, with the notable exception of Dictionary, Arrow has
exactly one physical encoding for each logical type.
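(As a small illustration of that exception, dictionary encoding in pyarrow
changes the physical layout while keeping the same logical values:)

import pyarrow as pa

arr = pa.array(["a", "b", "a", "a"])   # plain string array
dict_arr = arr.dictionary_encode()     # same values, dictionary-encoded layout

print(dict_arr.type)  # dictionary<values=string, indices=int32, ordered=0>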