[DISCUSS][Python] combine_chunks and copies

2023-10-18 Thread Spencer Nelson
pyarrow.ChunkedArray.combine_chunks is a method which is documented as "Flatten this ChunkedArray into a single non-chunked array." Incidentally, it happens to *always* copy the underlying chunk data - even if the ChunkedArray is composed of just a single contiguous chunk which could be returned

Re: [VOTE][Format] C data interface format strings for Utf8View and BinaryView

2023-10-18 Thread Jonathan Keane
+1 -Jon On Wed, Oct 18, 2023 at 2:26 PM Felipe Oliveira Carvalho < felipe...@gmail.com> wrote: > +1 > > On Wed, Oct 18, 2023 at 2:49 PM Dewey Dunnington > wrote: > > > +1! > > > > On Wed, Oct 18, 2023 at 2:14 PM Matt Topol > wrote: > > > > > > +1 > > > > > > On Wed, Oct 18, 2023 at 1:05 PM

Re: [VOTE][RUST] Release Apache Arrow Rust 48.0.0 RC2

2023-10-18 Thread Andrew Lamb
+1 (binding) -- thank you Raphael Verified on x86 Mac Hint for anyone else verifying, this is RC*2* (RC1 hit an issue[1]) Andrew [1]: https://github.com/apache/arrow-rs/pull/4950 On Wed, Oct 18, 2023 at 12:39 PM L. C. Hsieh wrote: > +1 (binding) > > Verified on M1 Mac. > > Thanks Raphael. >

Re: Apache Arrow file format

2023-10-18 Thread Antoine Pitrou
The fact that they describe Arrow and Feather as distinct formats (they're not!) with different characteristics is a bit of a bummer. Le 18/10/2023 à 22:20, Andrew Lamb a écrit : If you are looking for a more formal discussion and empirical analysis of the differences, I suggest reading "A

Re: Apache Arrow file format

2023-10-18 Thread Andrew Lamb
If you are looking for a more formal discussion and empirical analysis of the differences, I suggest reading "A Deep Dive into Common Open Formats for Analytical DBMSs" [1], a VLDB 2023 paper (best paper runner-up!) that compares and contrasts the Arrow, Parquet, ORC and Feather file formats. [1]

Re: [VOTE][Format] C data interface format strings for Utf8View and BinaryView

2023-10-18 Thread Felipe Oliveira Carvalho
+1 On Wed, Oct 18, 2023 at 2:49 PM Dewey Dunnington wrote: > +1! > > On Wed, Oct 18, 2023 at 2:14 PM Matt Topol wrote: > > > > +1 > > > > On Wed, Oct 18, 2023 at 1:05 PM Antoine Pitrou > wrote: > > > > > +1 > > > > > > Le 18/10/2023 à 19:02, Benjamin Kietzman a écrit : > > > > Hello all, > >

Re: [VOTE][Format] C data interface format strings for Utf8View and BinaryView

2023-10-18 Thread Dewey Dunnington
+1! On Wed, Oct 18, 2023 at 2:14 PM Matt Topol wrote: > > +1 > > On Wed, Oct 18, 2023 at 1:05 PM Antoine Pitrou wrote: > > > +1 > > > > Le 18/10/2023 à 19:02, Benjamin Kietzman a écrit : > > > Hello all, > > > > > > I propose "vu" and "vz" as format strings for the Utf8View and > > > BinaryView

Re: [VOTE][Format] C data interface format strings for Utf8View and BinaryView

2023-10-18 Thread Matt Topol
+1 On Wed, Oct 18, 2023 at 1:05 PM Antoine Pitrou wrote: > +1 > > Le 18/10/2023 à 19:02, Benjamin Kietzman a écrit : > > Hello all, > > > > I propose "vu" and "vz" as format strings for the Utf8View and > > BinaryView types in the Arrow C data interface [1]. > > > > The vote will be open for at

Re: [VOTE][Format] C data interface format strings for Utf8View and BinaryView

2023-10-18 Thread Antoine Pitrou
+1 Le 18/10/2023 à 19:02, Benjamin Kietzman a écrit : Hello all, I propose "vu" and "vz" as format strings for the Utf8View and BinaryView types in the Arrow C data interface [1]. The vote will be open for at least 72 hours. [ ] +1 - I'm in favor of these new C data format strings [ ] +0 [ ]

[VOTE][Format] C data interface format strings for Utf8View and BinaryView

2023-10-18 Thread Benjamin Kietzman
Hello all, I propose "vu" and "vz" as format strings for the Utf8View and BinaryView types in the Arrow C data interface [1]. The vote will be open for at least 72 hours. [ ] +1 - I'm in favor of these new C data format strings [ ] +0 [ ] -1 - I'm against adding these new format strings
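As a rough illustration of how the proposed strings would sit next to the existing ones, here is a small lookup sketch. The `z`, `Z`, `u` and `U` entries are the binary and string format strings already in the C data interface; `vz` and `vu` are the ones proposed in this vote. The dictionary and function names are hypothetical and only for illustration.

```python
# Format strings already defined by the Arrow C data interface.
EXISTING_FORMATS = {
    "z": "binary",
    "Z": "large binary",
    "u": "utf-8 string",
    "U": "large utf-8 string",
}

# Format strings proposed in this vote (not yet accepted at the time of the email).
PROPOSED_FORMATS = {
    "vz": "binary view",
    "vu": "utf-8 view",
}


def describe_format(fmt: str) -> str:
    """Hypothetical helper: map a format string to a human-readable type name."""
    if fmt in EXISTING_FORMATS:
        return EXISTING_FORMATS[fmt]
    if fmt in PROPOSED_FORMATS:
        return PROPOSED_FORMATS[fmt] + " (proposed)"
    raise ValueError(f"unknown format string: {fmt!r}")


print(describe_format("u"))
print(describe_format("vu"))
```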

Re: [VOTE][RUST] Release Apache Arrow Rust 48.0.0 RC2

2023-10-18 Thread L. C. Hsieh
+1 (binding) Verified on M1 Mac. Thanks Raphael. On Wed, Oct 18, 2023 at 6:59 AM Raphael Taylor-Davies wrote: > > Hi, > > I would like to propose a release of Apache Arrow Rust Implementation, > version 48.0.0 *RC2*. > > Please note that there were issues with the first release candidate that

Re: Apache Arrow file format

2023-10-18 Thread Raphael Taylor-Davies
To further what others have already mentioned, the IPC file format is primarily optimised for IPC use-cases, that is exchanging the entire contents between processes. It is relatively inexpensive to encode and decode, and supports all arrow datatypes, making it ideal for things like

[VOTE][RUST] Release Apache Arrow Rust 48.0.0 RC2

2023-10-18 Thread Raphael Taylor-Davies
Hi, I would like to propose a release of Apache Arrow Rust Implementation, version 48.0.0 *RC2*. Please note that there were issues with the first release candidate that required cutting a second. This release candidate is based on commit: 51ac6fec8755147cd6b1dfe7d76bfdcfacad0463 [1]

Re: Help regarding setting up the r package in arrow apache

2023-10-18 Thread Jonathan Keane
For development of the R package with docker containers, the link [1] that Nic sent in this same thread is the place to go. In addition to that docker-focused one, there are a handful of others that might prove useful to you in getting your development environment set up [2]. If you run into any

Re: Apache Arrow file format

2023-10-18 Thread Dewey Dunnington
Plenty of opinions here already, but I happen to think that IPC streams and/or Arrow File/Feather are wildly underutilized. For the use-case where you're mostly just going to read an entire file into R or Python it's a bit faster (and far superior to a CSV or pickling or .rds files in R). >

Re: Help regarding setting up the r package in arrow apache

2023-10-18 Thread Divyansh Khatri
I am trying to contribute to the Arrow project, so I am trying to set up the project locally. On Tue, 17 Oct 2023 at 05:14, Bryce Mecum wrote: > That error makes it look like you're running `docker compose up` from > the root of the Arrow source tree which is likely not what you want. > Are

Re: [ANNOUNCE] New Arrow committer: Curt Hagenlocher

2023-10-18 Thread Alenka Frim
Congrats and welcome Curt! On Tue, Oct 17, 2023 at 3:06 PM Joris Van den Bossche < jorisvandenboss...@gmail.com> wrote: > Welcome to the team, Curt! > > On Mon, 16 Oct 2023 at 23:17, Curt Hagenlocher > wrote: > > > > Thanks, all! > > > > On Mon, Oct 16, 2023 at 9:19 AM Dane Pitkin > > > wrote: