Thank you for asking this; I have the same question.
I noted a similar problem in the C++/Python implementation:
https://github.com/apache/arrow/issues/19157#issuecomment-1528037394
On Tue, Apr 2, 2024, 04:30 Finn Völkel wrote:
> Hi,
>
> my question primarily concerns the union layout
Hello everyone,
I would like to be able to quickly seek to an arbitrary row in an Arrow
file.
With the current file format, reading the file footer alone is not enough to
determine the record batch that contains a given row index. The row counts
of the record batches are only found in the metadata of each record batch
message.
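As a workaround, one can scan the file once and index the cumulative row
counts. A rough pyarrow sketch (the helpers build_row_index and read_row
are made-up names, not part of the API):
```
import bisect
import pyarrow.ipc as ipc

def build_row_index(path):
    # One pass over the file: record the cumulative row count before each
    # record batch. This is exactly the index that the footer lacks.
    reader = ipc.open_file(path)
    offsets = [0]
    for i in range(reader.num_record_batches):
        offsets.append(offsets[-1] + reader.get_batch(i).num_rows)
    return offsets

def read_row(path, offsets, row):
    # Binary-search the cumulative counts for the batch containing `row`,
    # then slice that single row out of the batch.
    i = bisect.bisect_right(offsets, row) - 1
    batch = ipc.open_file(path).get_batch(i)
    return batch.slice(row - offsets[i], 1)
```
Note that build_row_index still has to read every batch once, which is
what a footer-level row index would avoid.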
I prefer the LZ4 frame format for the reasons that Antoine stated.
To be friendly to users, the Arrow IPC documentation could mention
that LZ4 compression may break Java interoperability. If block
dependency is the only obstacle to Java interoperability, the Arrow
IPC implementation could disable block dependence when writing.
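For reference, pyarrow already exposes the codec choice when writing IPC
files; a minimal sketch:
```
import pyarrow as pa
import pyarrow.ipc as ipc

table = pa.table({"x": [1, 2, 3]})
# "lz4" selects the LZ4 frame format for the compressed body buffers.
options = ipc.IpcWriteOptions(compression="lz4")
with ipc.new_file("data.arrow", table.schema, options=options) as writer:
    writer.write_table(table)
```
A file written this way is where the Java interoperability caveat above
would apply.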
I recommend that you direct these questions to u...@arrow.apache.org
(https://mail-archives.apache.org/mod_mbox/arrow-user/).
On Fri, Jan 29, 2021 at 7:07 AM Joris Peeters
wrote:
>
> Hello,
>
> I'm writing an HTTP server in Java that provides Arrow data to users. For
> performance, I keep the mo…
> This should be possible already, at least on git master but perhaps also
> in 2.0.0. Which problem are you encountering?
With pyarrow 2.0.0, I encountered the following:
```
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> import pyarrow.dataset as ds
>>> pa.__version__
'2.0.0'
>>> ...
```
… more generally by enabling users to specify type coercion/promotion
when mapping Parquet types to Arrow types.
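To illustrate: today this kind of coercion has to happen after the read;
the proposal would let users request it during the Parquet-to-Arrow
conversion itself. A minimal sketch (the column name "id" and the file
path are only for illustration):
```
import pyarrow as pa
import pyarrow.parquet as pq

table = pq.read_table("data.parquet")
# Promote the int32 column "id" to int64 after the fact. The proposed
# feature would push this coercion down into the Parquet reader.
target = table.schema.set(
    table.schema.get_field_index("id"),
    pa.field("id", pa.int64()),
)
table = table.cast(target)
```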
Are other users interested in this feature? Is anyone opposed?
Steve Kim
I have been following the discussion on a pull request (
https://github.com/apache/arrow/pull/7030) by Hongze Zhang to use the
high-level dataset API via JNI.
An obstacle encountered in this PR is that there is no good way to pass
a filter expression via JNI. Expressions have a defined …
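For context, this is the kind of expression that would need to cross the
JNI boundary (pyarrow syntax, purely for illustration):
```
import pyarrow.dataset as ds

# A typical dataset filter. There is no obvious way to hand this object
# across JNI to the C++ dataset implementation.
expr = (ds.field("price") > 100) & ds.field("symbol").isin(["A", "B"])
```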
> Would that keep compatibility with existing files produced by Parquet C++?
Changing the LZ4 implementation to be compatible with parquet-mr/hadoop
would break compatibility with any existing files that were written by
Parquet C++ using LZ4 compression. I believe that it is not possible to
reliably detect which LZ4 framing an existing file uses.
The Parquet format specification is ambiguous about the exact details of
LZ4 compression. However, the *de facto* reference implementation in Java
(parquet-mr) uses the Hadoop LZ4 codec.
I think that it is important for Parquet C++ to have compatibility and
feature parity with parquet-mr when possible.
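For anyone unfamiliar with the difference: the Hadoop codec wraps each raw
LZ4 block in its own length-prefixed framing rather than using the
standard LZ4 frame format. A simplified sketch of that framing (assuming a
single compressed chunk per block, which is not always the case):
```
import struct

def split_hadoop_lz4_block(buf):
    # Hadoop's LZ4 codec prefixes each block with two big-endian uint32
    # values: the decompressed length and the compressed length, followed
    # by a raw LZ4 block (no LZ4 frame header, magic bytes, or checksum).
    decompressed_len, compressed_len = struct.unpack(">II", buf[:8])
    raw_lz4 = buf[8:8 + compressed_len]
    return decompressed_len, raw_lz4
```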