Re: [VOTE] Release Apache Arrow 12.0.1 - RC1

2023-06-12 Thread Jacob Wujciak-Jens
+1 non-binding, verified Go and C++ on Manjaro. On Mon, Jun 12, 2023 at 6:17 PM Raúl Cumplido wrote: > +1 non-binding > > I've run the following: > > TEST_DEFAULT=0 TEST_SOURCE=1 > ARROW_CMAKE_OPTIONS="-DBoost_NO_BOOST_CMAKE=ON" > dev/release/verify-release-candidate.sh 12.0.1 1 > TEST_DEFAULT=0

Re: [ANNOUNCE] New Arrow PMC member: Jie Wen (jakevin / jackwener)

2023-06-12 Thread Raúl Cumplido
Congratulations Jie!!! On Mon, 12 Jun 2023, 20:35, Matt Topol wrote: > Congrats Jie! > > On Sun, Jun 11, 2023 at 9:20 AM Andrew Lamb wrote: > > > The Project Management Committee (PMC) for Apache Arrow has invited > > Jie Wen to become a PMC member and we are pleased to announce > > that

Re: [Python] Dataset scanner fragment skip options.

2023-06-12 Thread Jerald Alex
Hi Weston, Thank you so much for taking the time to respond. I really appreciate it. I'm using Parquet files. Could you elaborate on the statement below? I cannot seem to find any documentation for ParquetFileFragment. "there may even be a way to skip row groups by creating a fragment per

Re: [ANNOUNCE] New Arrow PMC member: Jie Wen (jakevin / jackwener)

2023-06-12 Thread Matt Topol
Congrats Jie! On Sun, Jun 11, 2023 at 9:20 AM Andrew Lamb wrote: > The Project Management Committee (PMC) for Apache Arrow has invited > Jie Wen to become a PMC member and we are pleased to announce > that Jie Wen has accepted. > > Congratulations and welcome! >

Re: Converting Pandas DataFrame <-> Struct Array?

2023-06-12 Thread Spencer Nelson
Here's a one-liner that does it, but I expect it's moderately slower than the RecordBatch version: pa.array(df.itertuples(index=False), type=pa.struct([pa.field(col, pa.from_numpy_dtype(df.dtypes[col])) for col in df.columns])) Most of the complexity is in the 'type'. It's less scary than it

Re: Converting Pandas DataFrame <-> Struct Array?

2023-06-12 Thread Li Jin
Gentle bump. Not a big deal if I need to use the API above to do so, but bump in case someone has a better way. On Fri, Jun 9, 2023 at 4:34 PM Li Jin wrote: > Hello, > > I am looking for the best way to convert Pandas DataFrame <-> Struct > Array. > > Currently I have: > >

Re: [VOTE] Release Apache Arrow 12.0.1 - RC1

2023-06-12 Thread Raúl Cumplido
+1 non-binding I've run the following: TEST_DEFAULT=0 TEST_SOURCE=1 ARROW_CMAKE_OPTIONS="-DBoost_NO_BOOST_CMAKE=ON" dev/release/verify-release-candidate.sh 12.0.1 1 TEST_DEFAULT=0 TEST_WHEELS=1 TEST_WHEEL_PLATFORM_TAGS="manylinux_2_17_x86_64.manylinux2014_x86_64"

Re: [Python] Dataset scanner fragment skip options.

2023-06-12 Thread Weston Pace
> I would like to know if it is possible to skip the specific set of batches, > for example, the first 10 batches and read from the 11th Batch. This sort of API does not exist today. You can skip files by making a smaller dataset with fewer files (and I think, with parquet, there may even be a
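Since no skip API exists, one client-side workaround is to discard the leading batches from the iterator that `to_batches()` returns. The sketch below uses `itertools.islice` from the standard library. The helper name `skip_batches` is hypothetical, and a plain generator of lists stands in for the scanner so the example runs without a dataset.

```python
from itertools import islice


def skip_batches(batches, n):
    """Return an iterator over `batches` that drops the first `n` items."""
    return islice(batches, n, None)


# Stand-in for scanner.to_batches(): 25 fake "batches" of 1000 values each.
fake_batches = (list(range(i * 1000, (i + 1) * 1000)) for i in range(25))

# Drop the first 10 batches and keep from the 11th onward.
remaining = list(skip_batches(fake_batches, 10))
```

With a real scanner the call would presumably look like `skip_batches(scanner.to_batches(), 10)`. Note that the skipped batches are still read and decoded before being thrown away, so this saves memory held by the caller but no I/O.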

[Python] Dataset scanner fragment skip options.

2023-06-12 Thread Jerald Alex
Hi Experts, I have been using dataset.scanner to read data with specific filter conditions and a batch_size of 1000: ds.scanner(filter=pc.field('a') != 3, batch_size=1000).to_batches() I would like to know if it is possible to skip the specific set of batches, for example,

Re: [VOTE] Release Apache Arrow 12.0.1 - RC1

2023-06-12 Thread Dewey Dunnington
+1! I ran TEST_DEFAULT=0 TEST_CPP=1 ARROW_CMAKE_OPTIONS="-DProtobuf_SOURCE=BUNDLED -DARROW_FLIGHT=OFF -DARROW_FLIGHT_SQL=OFF" ./verify-release-candidate.sh ...on macOS Ventura aarch64. (Flight disabled because of protobuf issues). On Mon, Jun 12, 2023 at 10:28 AM Joris Van den Bossche wrote:

Re: [VOTE] Release Apache Arrow 12.0.1 - RC1

2023-06-12 Thread Joris Van den Bossche
+1 (verified source release on Ubuntu 20.04, using conda) On Sat, 10 Jun 2023 at 22:31, Sutou Kouhei wrote: > > +1 > > I ran the following on Debian GNU/Linux sid: > > * TEST_DEFAULT=0 \ > TEST_SOURCE=1 \ > LANG=C \ > TZ=UTC \ > CUDAToolkit_ROOT=/usr \ >

Re: [VOTE][RUST][DataFusion] Release DataFusion Python Bindings 26.0.0 RC1

2023-06-12 Thread Jeremy Dyer
+1 (non-binding) Verified using the verification script on an Ubuntu 22 x86_64 machine. Thanks for getting the release together, Andy. -Jeremy Dyer From: vin jake Sent: Monday, June 12, 2023 3:59:25 AM To: dev@arrow.apache.org

Re: [VOTE][RUST][DataFusion] Release DataFusion Python Bindings 26.0.0 RC1

2023-06-12 Thread vin jake
+1 (binding) Verified on my M1 MacBook. Thanks Andy. On Mon, Jun 12, 2023 at 4:10 AM Andy Grove wrote: > Hi, > > I would like to propose a release of Apache Arrow DataFusion Python > Bindings, > version 26.0.0. > > This release candidate is based on commit: >

[RESULT][VOTE][Julia] Release Apache Arrow Julia 2.6.2 RC1

2023-06-12 Thread Sutou Kouhei
Hi, The vote carries with 3 +1 binding votes, 1 +1 non-binding vote and no -1 votes. I'll publish this release to https://dist.apache.org/repos/dist/release/arrow/ . Thanks, -- kou In <20230610.044039.1468288593045013710@clear-code.com> "[VOTE][Julia] Release Apache Arrow Julia 2.6.2

Re: [VOTE][Julia] Release Apache Arrow Julia 2.6.2 RC1

2023-06-12 Thread Nic Crane
+1 (Ubuntu 22.04) On Mon, 12 Jun 2023 at 01:50, Sutou Kouhei wrote: > Hi, > > Could PMC members vote on this? > > Thanks, > -- > kou > > In <20230610.044039.1468288593045013710@clear-code.com> > "[VOTE][Julia] Release Apache Arrow Julia 2.6.2 RC1" on Sat, 10 Jun 2023 > 04:40:39 +0900