Re: [C++][Parquet] Handling empty files while reading Parquet files using C++

2023-07-03 Thread Gang Wu
Hi Luca, It seems to me that the problem comes from node->type_length(). It should be 0 instead of 64. Could you please check the value of column_index_ in CheckColumn() before it throws? If you need further assistance, please create an issue on GitHub and it would be good to provide a file

Re: [ANNOUNCE] New Arrow committer: Kevin Gurney

2023-07-03 Thread Weston Pace
Congratulations Kevin! On Mon, Jul 3, 2023 at 5:18 PM Sutou Kouhei wrote: > On behalf of the Arrow PMC, I'm happy to announce that Kevin Gurney > has accepted an invitation to become a committer on Apache > Arrow. Welcome, and thank you for your contributions! > > -- > kou >

[ANNOUNCE] New Arrow committer: Kevin Gurney

2023-07-03 Thread Sutou Kouhei
On behalf of the Arrow PMC, I'm happy to announce that Kevin Gurney has accepted an invitation to become a committer on Apache Arrow. Welcome, and thank you for your contributions! -- kou

Re: [Python][Discuss] PyArrow Dataset as a Python protocol

2023-07-03 Thread Will Jones
Hello, After thinking about it, I think I understand the approach David Li and Ian are suggesting with respect to expressions. There will be some arguments that only PyArrow's own datasets support, but that aren't in the generic protocol. Passing PyArrow expressions to the filters argument should

[C++][Parquet] Handling empty files while reading Parquet files using C++

2023-07-03 Thread Luca Jones
Hi, I've been trying to read data from a Parquet file into a stream using the Parquet::StreamReader class for a while. The first column of my data consists of int64s - thus, I have been streaming data as follows: shared_ptr infile; PARQUET_ASSIGN_OR_THROW(infile,

Re: Webassembly?

2023-07-03 Thread Neal Richardson
Thanks, Joe. Looking forward to seeing this come together. Neal On Mon, Jul 3, 2023 at 11:29 AM Joe Marshall wrote: > Hi, > > I'm a pyodide developer amongst other things (webassembly cpython > interpreter) and I've got some PRs in progress on arrow relating to > webassembly support. I wondered

[RESULT][VOTE][RUST] Release Apache Arrow Rust 43.0.0 RC1

2023-07-03 Thread Raphael Taylor-Davies
With 5 +1 votes (5 binding), the release is approved. The release is available here: https://dist.apache.org/repos/dist/release/arrow/arrow-rs-43.0.0 It has also been released to crates.io. Thank you to everyone who helped verify this release. Raphael On 30/06/2023 16:26, Raphael Taylor-Davies

Re: Question about large exec batch in acero

2023-07-03 Thread Ruoxi Sun
That makes perfect sense, esp. seeing the zero-copy fashion for slicing the big input. Thanks Weston! *Rossi* On Mon, Jul 3, 2023 at 22:33, Weston Pace wrote: > > is this overflow considered a bug? Or is large exec batch something that > should be avoided? > > This is not a bug and it is something that

Webassembly?

2023-07-03 Thread Joe Marshall
Hi, I'm a pyodide developer amongst other things (webassembly cpython interpreter) and I've got some PRs in progress on arrow relating to webassembly support. I wondered if it might be worth discussing my broader ideas for this on the list or at the biweekly development meeting? So far I have

Re: [VOTE][RUST] Release Apache Arrow Rust 43.0.0 RC1

2023-07-03 Thread vin jake
+1 (binding) Verified on M1 macbook. Thanks! On Fri, Jun 30, 2023, 23:27 Raphael Taylor-Davies wrote: > Hi, > > I would like to propose a release of Apache Arrow Rust Implementation, > version 43.0.0. > > This release candidate is based on commit: > 414235e7630d05cccf0b9f5032ebfc0858b8ae5b

Re: [VOTE][RUST] Release Apache Arrow Rust 43.0.0 RC1

2023-07-03 Thread Daniël Heres
+1 (binding) Thanks! Verified on M1 Mac On Fri, Jun 30, 2023 at 19:46, Andrew Lamb wrote: > +1 (binding) > > Thank you Raphael > > Verified on x86_64 mac > > > On Fri, Jun 30, 2023 at 1:12 PM L. C. Hsieh wrote: > > > +1 (binding) > > > > Verified on M1 Mac. > > > > Thanks Raphael. > > > > > >

Re: Question about large exec batch in acero

2023-07-03 Thread Weston Pace
> is this overflow considered a bug? Or is large exec batch something that should be avoided? This is not a bug and it is something that should be avoided. Some of the hash-join internals expect small batches. I actually thought the limit was 32Ki and not 64Ki because I think there may be some
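One way to act on this advice is to split a large batch into smaller slices before it reaches the join. The following is a minimal sketch, not Acero's own logic. It assumes a 32Ki-row cap (the figure mentioned above as a guess, not a confirmed limit) and uses a hypothetical helper, `plan_slices`, that only computes `(offset, length)` pairs:

```python
from typing import Iterator, Tuple

# Assumed cap for illustration only; the actual limit is not confirmed here.
MAX_ROWS_PER_SLICE = 32 * 1024


def plan_slices(
    num_rows: int, max_rows: int = MAX_ROWS_PER_SLICE
) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs that cover num_rows rows in order."""
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    offset = 0
    while offset < num_rows:
        # The last slice may be shorter than max_rows.
        length = min(max_rows, num_rows - offset)
        yield offset, length
        offset += length
```

With PyArrow, each pair could be passed to `RecordBatch.slice(offset, length)`, which returns a zero-copy view of the original batch.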

Question about large exec batch in acero

2023-07-03 Thread Ruoxi Sun
Hi folks, I've encountered a bug when doing swiss join using a big exec batch, say, larger than 65535 rows, on the probe side. It turns out that the algorithm uses `uint16_t` to represent the index within the probe exec batch (the materialize_batch_ids_buf
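The wrap-around behind this report can be illustrated with a minimal sketch. It does not reproduce Acero's code; it only shows what happens to a row index past 65535 when it is stored in a 16-bit unsigned integer, with Python's `ctypes.c_uint16` standing in for `uint16_t`. The helper name `store_as_uint16` is hypothetical.

```python
import ctypes

UINT16_MAX = 0xFFFF  # 65535, the largest value a uint16_t can hold


def store_as_uint16(row_index: int) -> int:
    """Return the value that survives storing row_index in a 16-bit unsigned slot."""
    # ctypes integer types do no overflow checking, so the value wraps
    # modulo 2**16, much like an unchecked narrowing to uint16_t in C++.
    return ctypes.c_uint16(row_index).value


if __name__ == "__main__":
    for idx in (UINT16_MAX, UINT16_MAX + 1, 70000):
        print(idx, "->", store_as_uint16(idx))
```

Any index above 65535 silently maps back onto an earlier row, so the wrong probe-side rows would be referenced rather than an error being raised.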

Re: [Python][Discuss] PyArrow Dataset as a Python protocol

2023-07-03 Thread Fokko Driesprong
Hey everyone, Chiming in here from the PyIceberg side. I would love to see the protocol as proposed in the PR. I did a small test, and it seems to be quite straightforward to implement and it brings a lot of potential.