Re: [VOTE][Format] Add experimental ArrowDeviceArray to C-Data API

2023-06-02 Thread Dewey Dunnington
I've already given my vote here, but wanted to share a proof-of-concept C implementation (i.e., copying an arbitrary valid ArrowArray to a device, given a suitable device implementation) of the proposed spec that includes Apple Metal [1] and could include CUDA as well (I did Metal first since Matt already worked u

Re: [VOTE][RUST] Release Apache Arrow Rust 41.0.0 RC1

2023-06-02 Thread Andrew Lamb
+1 (binding) Verified on x86_64 mac The content of this release looks very good 👌 Thank you Raphael Andrew On Fri, Jun 2, 2023 at 2:59 PM L. C. Hsieh wrote: > +1 (binding) > > Verified on M1 Mac. > > Thanks Raphael. > > On Fri, Jun 2, 2023 at 11:55 AM Raphael Taylor-Davies > wrote: > > > > H

Re: [VOTE][RUST] Release Apache Arrow Rust 41.0.0 RC1

2023-06-02 Thread L. C. Hsieh
+1 (binding) Verified on M1 Mac. Thanks Raphael. On Fri, Jun 2, 2023 at 11:55 AM Raphael Taylor-Davies wrote: > > Hi, > > I would like to propose a release of Apache Arrow Rust Implementation, > version 41.0.0. > > This release candidate is based on commit: > e1badc0542ca82e2304cc3f51a9d25ea2db

Re: [VOTE][RUST] Release Apache Arrow Rust Object Store 0.6.1 RC1

2023-06-02 Thread L. C. Hsieh
+1 (binding) Verified on M1 Mac. Thanks Raphael. On Fri, Jun 2, 2023 at 11:38 AM Andrew Lamb wrote: > > +1 (binding) > > I verified the signature and ran the verification script on mac x86_64 > Thank you Raphael > > On Fri, Jun 2, 2023 at 2:23 PM Raphael Taylor-Davies > wrote: > > > Hi, > > >

[VOTE][RUST] Release Apache Arrow Rust 41.0.0 RC1

2023-06-02 Thread Raphael Taylor-Davies
Hi, I would like to propose a release of Apache Arrow Rust Implementation, version 41.0.0. This release candidate is based on commit: e1badc0542ca82e2304cc3f51a9d25ea2dbb74eb [1] The proposed release tarball and signatures are hosted at [2]. The changelog is located at [3]. Please downloa

Re: [VOTE][RUST] Release Apache Arrow Rust Object Store 0.6.1 RC1

2023-06-02 Thread Andrew Lamb
+1 (binding) I verified the signature and ran the verification script on mac x86_64 Thank you Raphael On Fri, Jun 2, 2023 at 2:23 PM Raphael Taylor-Davies wrote: > Hi, > > I would like to propose a release of Apache Arrow Rust Object > Store Implementation, version 0.6.1. > > This release candi

[VOTE][RUST] Release Apache Arrow Rust Object Store 0.6.1 RC1

2023-06-02 Thread Raphael Taylor-Davies
Hi, I would like to propose a release of Apache Arrow Rust Object Store Implementation, version 0.6.1. This release candidate is based on commit: f323097584eaa8edb1193b4fb67bccadd39594f6 [1] The proposed release tarball and signatures are hosted at [2]. The changelog is located at [3]. Plea

Re: [DISCUSS] Acero's ScanNode and Row Indexing across Scans

2023-06-02 Thread Weston Pace
That makes sense! I can see how masked reads are useful in that kind of approach too. Thanks for the explanation. On Fri, Jun 2, 2023, 8:45 AM Will Jones wrote: > > The main downside with using the mask (or any solution based on a filter > > node / filtering) is that it requires that the delet

Re: Add limit and offset to ScannerOption

2023-06-02 Thread Weston Pace
The simplest way to do this sort of paging today would be to create multiple files; you could then read as few or as many files as you want. This approach also works regardless of format. With Parquet/ORC you can create multiple row groups / stripes within a single file, and then partition amon
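
The paging idea above (read only the files or row groups that overlap the requested rows) can be sketched in plain Python. This is only an illustration under assumptions: `plan_range_read` is a hypothetical helper, not an Arrow API, and it assumes the row count of each file or row group is already known, for example from file metadata.

```python
from bisect import bisect_right
from itertools import accumulate


def plan_range_read(row_counts, offset, limit):
    """Map a global row range onto per-chunk reads.

    `row_counts` holds the number of rows in each chunk (a file, row group,
    or stripe), in scan order. Returns a list of
    (chunk_index, start_row_in_chunk, num_rows) tuples that together cover
    rows [offset, offset + limit), clipped to the rows that exist.
    Hypothetical helper for illustration; not part of any Arrow library.
    """
    ends = list(accumulate(row_counts))      # cumulative end row of each chunk
    total = ends[-1] if ends else 0
    stop = min(offset + limit, total)
    reads = []
    i = bisect_right(ends, offset)           # first chunk ending after `offset`
    pos = offset
    while pos < stop and i < len(row_counts):
        chunk_start = ends[i] - row_counts[i]
        take = min(ends[i], stop) - pos      # rows to read from this chunk
        if take > 0:
            reads.append((i, pos - chunk_start, take))
            pos += take
        i += 1
    return reads


# Three chunks of 100 rows each; ask for rows 150..249.
print(plan_range_read([100, 100, 100], offset=150, limit=100))
```

Only the chunks named in the plan need to be opened, so the cost of a page depends on the chunk size rather than on the size of the whole dataset.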

Re: [DISCUSS] Acero's ScanNode and Row Indexing across Scans

2023-06-02 Thread Will Jones
> The main downside with using the mask (or any solution based on a filter > node / filtering) is that it requires that the delete indices go into the > plan itself. So you need to first read the delete files and then create > the plan. And, if there are many deleted rows, this can be costly. Ah

Re: [DISCUSS] Acero's ScanNode and Row Indexing across Scans

2023-06-02 Thread Weston Pace
Also, for clarity, I do agree with Gang that these are both valuable features in their own right. A mask makes a lot of sense for page indices. On Fri, Jun 2, 2023 at 7:36 AM Weston Pace wrote: > > then I think the incremental cost of adding the > > positional deletes to the mask is probably lo

Re: [DISCUSS] Acero's ScanNode and Row Indexing across Scans

2023-06-02 Thread Weston Pace
> then I think the incremental cost of adding the > positional deletes to the mask is probably lower than the anti-join. Do you mean developer cost? Then yes, I agree. Although there may be some subtlety in the pushdown to connect a dataset filter to a parquet reader filter. The main downside wi

Add limit and offset to ScannerOption

2023-06-02 Thread Wenbo Hu
Hi, I'm trying to implement a data management system in Python with Arrow Flight. The well-designed dataset with filesystem makes the data management even simpler. But I'm facing a situation: reading a range of rows in a dataset. Considering a dataset stored in Feather format with 1 million rows in a