Re: compressed feather v2 "slicing from the middle"

2022-09-21 Thread Jorge Cardoso Leitão
Hi, AFAIK compressed IPC arrow files do not support random access (like uncompressed counterparts) - you need to decompress the whole batch (or at least the columns you need). A "RecordBatch" is the compression unit of the file. Think of it like a parquet file whose every row group has a single

Re: compressed feather v2 "slicing from the middle"

2022-09-21 Thread John Muehlhausen
Why aren't all the compressed batches the chunk size I specified in write_feather (700)? How can I know which batch my slice resides in if this is not a constant? Using pyarrow 9.0.0. This file contains 1.5 billion rows. I need a way to know where to look for, say, [780567127,922022522)
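
One way to approach the "which batch holds my slice" question when batch sizes are not constant is to keep a table of cumulative row offsets and binary-search it. The following is only a minimal sketch under stated assumptions, not something taken from the thread: the helper names `batch_offsets` and `locate_slice` are hypothetical, and the list of per-batch row counts is assumed to have been collected beforehand (for example in a one-time pass over the file that records each batch's `num_rows`).

```python
from bisect import bisect_right
from itertools import accumulate


def batch_offsets(batch_lengths):
    """Return the starting row of every batch, plus the total row count as a final sentinel."""
    return [0, *accumulate(batch_lengths)]


def locate_slice(offsets, start, stop):
    """Map the global row range [start, stop) to (batch_index, local_start, local_stop) pieces."""
    total = offsets[-1]
    if not 0 <= start <= stop <= total:
        raise IndexError(f"range [{start}, {stop}) is outside [0, {total})")
    pieces = []
    if start == stop:
        return pieces
    # Last batch whose starting offset is <= start; bisect_right also skips empty batches.
    i = bisect_right(offsets, start) - 1
    while start < stop:
        batch_start, batch_end = offsets[i], offsets[i + 1]
        if batch_end > start:  # ignore empty batches in the middle of the range
            hi = min(stop, batch_end)
            pieces.append((i, start - batch_start, hi - batch_start))
            start = hi
        i += 1
    return pieces


# Hypothetical batch lengths: mostly 700 rows, with one shorter batch.
lengths = [700, 700, 650, 700]
offsets = batch_offsets(lengths)
pieces = locate_slice(offsets, 1300, 2100)
print(pieces)
```

Each returned piece names a batch and the row range inside it, so only those batches (and only the needed columns) would have to be decompressed. Building the offset table once and storing it next to the file avoids rescanning 1.5 billion rows for every lookup.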

Re: compressed feather v2 "slicing from the middle"

2022-09-21 Thread John Muehlhausen
The following seems like good news... like I should be able to decompress just one column of a RecordBatch in the middle of a compressed feather v2 file. Is there a Python API for this kind of access? C++? /// Provided for forward compatibility in case we need to support different ///

compressed feather v2 "slicing from the middle"

2022-09-21 Thread John Muehlhausen
"Internal structure supports random access and slicing from the middle. This also means that you can read a large file chunk by chunk without having to pull the whole thing into memory." https://ursalabs.org/blog/2020-feather-v2/ For a compressed v2 file, can I decompress just one column of a

RE: [ANNOUNCE] New Arrow PMC member: Raphael Taylor-Davies

2022-09-21 Thread Matthew Turner
Congratulations, Raphael! -----Original Message----- From: Matthew Topol Sent: Wednesday, September 21, 2022 4:17 PM To: dev@arrow.apache.org Subject: Re: [ANNOUNCE] New Arrow PMC member: Raphael Taylor-Davies Congrats!! On Tue, Sep 20, 2022 at 7:23 PM Wes McKinney wrote: > Congratulations!

Re: [VOTE] Adopt ADBC database client connectivity specification

2022-09-21 Thread Ashish
+1 (non-binding) On Wed, Sep 21, 2022 at 1:16 PM Matthew Topol wrote: > +1 (non-binding)! > > On Wed, Sep 21, 2022 at 1:34 PM Dewey Dunnington > wrote: > > > +1! (non-binding) > > > > On Wed, Sep 21, 2022 at 12:47 PM Gavin Ray > wrote: > > > > > +1 (non-binding/I'm not important) > > > > > >

Re: [ANNOUNCE] New Arrow PMC member: Raphael Taylor-Davies

2022-09-21 Thread Matthew Topol
Congrats!! On Tue, Sep 20, 2022 at 7:23 PM Wes McKinney wrote: > Congratulations! > > On Tue, Sep 20, 2022 at 12:37 PM Ashish wrote: > > > > Congratulations !! > > > > On Tue, Sep 20, 2022 at 10:17 AM Ian Joiner > wrote: > > > > > Congrats Raphael! > > > > > > On Mon, Sep 19, 2022 at 9:56 PM

Re: [VOTE] Adopt ADBC database client connectivity specification

2022-09-21 Thread Matthew Topol
+1 (non-binding)! On Wed, Sep 21, 2022 at 1:34 PM Dewey Dunnington wrote: > +1! (non-binding) > > On Wed, Sep 21, 2022 at 12:47 PM Gavin Ray wrote: > > > +1 (non-binding/I'm not important) > > > > On Wed, Sep 21, 2022 at 11:40 AM David Li wrote: > > > > > Hello, > > > > > > We have been

Re: Register custom ExecNode factories

2022-09-21 Thread Li Jin
Thanks Weston - I have not rewritten Python/C++ bridge so this is also new to me and I am hoping to get some information from people that know how to do this. I will leave this open for other people to offer help :) and will ask some internal folks as well. Will circle back on this. On Tue, Sep

Re: Correct way to collect results from an Acero query

2022-09-21 Thread Li Jin
Oh thanks Weston, I am glad I am not the only one - I will wait for the PR and will try to pull that in then. Thanks, Li On Wed, Sep 21, 2022 at 1:54 PM Weston Pace wrote: > Funny you should mention this, I just ran into the same problem :). > We use StartAndCollect so much in our unit tests that

Re: Correct way to collect results from an Acero query

2022-09-21 Thread Weston Pace
Funny you should mention this, I just ran into the same problem :). We use StartAndCollect so much in our unit tests that there must be some usefulness there. You are correct that it is not an API that can be used outside of tests. I added utility methods DeclarationToTable,

Re: [VOTE] Adopt ADBC database client connectivity specification

2022-09-21 Thread Dewey Dunnington
+1! (non-binding) On Wed, Sep 21, 2022 at 12:47 PM Gavin Ray wrote: > +1 (non-binding/I'm not important) > > On Wed, Sep 21, 2022 at 11:40 AM David Li wrote: > > > Hello, > > > > We have been discussing [1] standard interfaces for Arrow-based database > > access and have been working on

Re: [VOTE] Adopt ADBC database client connectivity specification

2022-09-21 Thread Gavin Ray
+1 (non-binding/I'm not important) On Wed, Sep 21, 2022 at 11:40 AM David Li wrote: > Hello, > > We have been discussing [1] standard interfaces for Arrow-based database > access and have been working on implementations of the proposed interfaces > [2], all under the name "ADBC". This proposal

[VOTE] Adopt ADBC database client connectivity specification

2022-09-21 Thread David Li
Hello, We have been discussing [1] standard interfaces for Arrow-based database access and have been working on implementations of the proposed interfaces [2], all under the name "ADBC". This proposal aims to provide a unified client abstraction across Arrow-native database protocols (like

Correct way to collect results from an Acero query

2022-09-21 Thread Li Jin
Hello! I am testing a custom data source node I added to Acero and found myself in need of collecting the results from an Acero query into memory. Searching the codebase, I found "StartAndCollect" is what many of the tests and benchmarks are using, but I am not sure if that is the public API to

RE: [ANNOUNCE] New Arrow committer: Dan Harris

2022-09-21 Thread Matthew Turner
Congratulations, Dan! -----Original Message----- From: Yijie Shen Sent: Wednesday, September 21, 2022 9:12 AM To: dev@arrow.apache.org Subject: Re: [ANNOUNCE] New Arrow committer: Dan Harris Congratulations, Dan! On Wed, Sep 21, 2022 at 3:30 AM Weston Pace wrote: > Congratulations Dan > >

Re: [ANNOUNCE] New Arrow committer: Dan Harris

2022-09-21 Thread Yijie Shen
Congratulations, Dan! On Wed, Sep 21, 2022 at 3:30 AM Weston Pace wrote: > Congratulations Dan > > On Tue, Sep 20, 2022 at 10:52 AM David Li wrote: > > > > Congrats, Dan! > > > > On Tue, Sep 20, 2022, at 13:43, L. C. Hsieh wrote: > > > Congratulations! > > > > > > On Tue, Sep 20, 2022 at 10:38