Hi,
AFAIK, compressed Arrow IPC files do not support random access the way their
uncompressed counterparts do: you need to decompress the whole batch (or at
least the columns you need). A "RecordBatch" is the compression unit of the
file. Think of it like a Parquet file whose every row group has a single
Why aren't all the compressed batches the chunk size I specified in
write_feather (700)? How can I know which batch my slice resides in if
this is not a constant? Using pyarrow 9.0.0
This file contains 1.5 billion rows. I need a way to know where to look
for, say, rows [780567127, 922022522).
The following seems like good news... like I should be able to decompress
just one column of a RecordBatch in the middle of a compressed Feather V2
file. Is there a Python API for this kind of access? C++?
/// Provided for forward compatibility in case we need to support different
///
"Internal structure supports random access and slicing from the middle.
This also means that you can read a large file chunk by chunk without
having to pull the whole thing into memory."
https://ursalabs.org/blog/2020-feather-v2/
For a compressed v2 file, can I decompress just one column of a
Congratulations, Raphael!
-Original Message-
From: Matthew Topol
Sent: Wednesday, September 21, 2022 4:17 PM
To: dev@arrow.apache.org
Subject: Re: [ANNOUNCE] New Arrow PMC member: Raphael Taylor-Davies
Congrats!!
On Tue, Sep 20, 2022 at 7:23 PM Wes McKinney wrote:
> Congratulations!
+1 (non-binding)
On Wed, Sep 21, 2022 at 1:16 PM Matthew Topol
wrote:
> +1 (non-binding)!
>
> On Wed, Sep 21, 2022 at 1:34 PM Dewey Dunnington
> wrote:
>
> > +1! (non-binding)
> >
> > On Wed, Sep 21, 2022 at 12:47 PM Gavin Ray
> wrote:
> >
> > > +1 (non-binding/I'm not important)
> > >
> > >
Congrats!!
On Tue, Sep 20, 2022 at 7:23 PM Wes McKinney wrote:
> Congratulations!
>
> On Tue, Sep 20, 2022 at 12:37 PM Ashish wrote:
> >
> > Congratulations !!
> >
> > On Tue, Sep 20, 2022 at 10:17 AM Ian Joiner
> wrote:
> >
> > > Congrats Raphael!
> > >
> > > On Mon, Sep 19, 2022 at 9:56 PM
+1 (non-binding)!
On Wed, Sep 21, 2022 at 1:34 PM Dewey Dunnington
wrote:
> +1! (non-binding)
>
> On Wed, Sep 21, 2022 at 12:47 PM Gavin Ray wrote:
>
> > +1 (non-binding/I'm not important)
> >
> > On Wed, Sep 21, 2022 at 11:40 AM David Li wrote:
> >
> > > Hello,
> > >
> > > We have been
Thanks Weston. I have not rewritten a Python/C++ bridge, so this is also new
to me, and I am hoping to get some information from people who know how to
do this.
I will leave this open for other people to offer help :) and will ask some
internal folks as well.
Will circle back on this.
On Tue, Sep
Oh, thanks Weston, I am glad I am not the only one. I will wait for the PR
and will try to pull that in then.
Thanks,
Li
On Wed, Sep 21, 2022 at 1:54 PM Weston Pace wrote:
> Funny you should mention this, I just ran into the same problem :).
> We use StartAndCollect so much in our unit tests that
Funny you should mention this, I just ran into the same problem :).
We use StartAndCollect so much in our unit tests that there must be
some usefulness there. You are correct that it is not an API that can
be used outside of tests.
I added utility methods DeclarationToTable,
+1! (non-binding)
On Wed, Sep 21, 2022 at 12:47 PM Gavin Ray wrote:
> +1 (non-binding/I'm not important)
>
> On Wed, Sep 21, 2022 at 11:40 AM David Li wrote:
>
> > Hello,
> >
> > We have been discussing [1] standard interfaces for Arrow-based database
> > access and have been working on
+1 (non-binding/I'm not important)
On Wed, Sep 21, 2022 at 11:40 AM David Li wrote:
> Hello,
>
> We have been discussing [1] standard interfaces for Arrow-based database
> access and have been working on implementations of the proposed interfaces
> [2], all under the name "ADBC". This proposal
Hello,
We have been discussing [1] standard interfaces for Arrow-based database access
and have been working on implementations of the proposed interfaces [2], all
under the name "ADBC". This proposal aims to provide a unified client
abstraction across Arrow-native database protocols (like
Hello!
I am testing a custom data source node I added to Acero and found myself in
need of collecting the results from an Acero query into memory.
Searching the codebase, I found "StartAndCollect" is what many of the tests
and benchmarks are using, but I am not sure if that is the public API to
Congratulations, Dan!
-Original Message-
From: Yijie Shen
Sent: Wednesday, September 21, 2022 9:12 AM
To: dev@arrow.apache.org
Subject: Re: [ANNOUNCE] New Arrow committer: Dan Harris
Congratulations, Dan!
On Wed, Sep 21, 2022 at 3:30 AM Weston Pace wrote:
> Congratulations Dan
>
>
Congratulations, Dan!
On Wed, Sep 21, 2022 at 3:30 AM Weston Pace wrote:
> Congratulations Dan
>
> On Tue, Sep 20, 2022 at 10:52 AM David Li wrote:
> >
> > Congrats, Dan!
> >
> > On Tue, Sep 20, 2022, at 13:43, L. C. Hsieh wrote:
> > > Congratulations!
> > >
> > > On Tue, Sep 20, 2022 at 10:38