Re: [ANNOUNCE] New Arrow committer: Will Ayd

2024-10-02 Thread Joris Van den Bossche
Congratulations Will, and we are happy to have you! On Wed, 2 Oct 2024 at 00:11, Benjamin Kietzman wrote: > > Congratulations, Will! > > On Tue, Oct 1, 2024 at 3:07 PM Weston Pace wrote: > > > Congratulations Will > > > > On Tue, Oct 1, 2024, 2:25 PM Bryce Mecum wrote: > > > > > Congrats Will!

[DISCUSS] Monorepo GitHub workflow: allow one issue with multiple PRs

2024-09-11 Thread Joris Van den Bossche
Hi all, This is a discussion specifically for the GitHub development workflow we use in the monorepo, i.e. https://github.com/apache/arrow/ We have the unwritten(?) rule (implied by our tooling) that we should always have exactly one issue per PR, and that the PR closes that issue. I would like to di

Re: [VOTE] Allow Decimal32 and Decimal64 bitwidths in Arrow Format

2024-09-05 Thread Joris Van den Bossche
+1 (binding) On Fri, 6 Sept 2024 at 03:57, Gang Wu wrote: > > +1 (non-binding) > > On Fri, Sep 6, 2024 at 3:57 AM Sutou Kouhei wrote: > > > +1 (binding) > > > > In > > "[VOTE] Allow Decimal32 and Decimal64 bitwidths in Arrow Format" on Wed, > > 4 Sep 2024 17:17:49 -0400, > > Matt Topol wro

Re: [VOTE] Split Go release process

2024-08-27 Thread Joris Van den Bossche
+1 (binding) On Mon, 26 Aug 2024 at 09:56, Antoine Pitrou wrote: > > +1 (binding) > > Le 26/08/2024 à 04:37, Sutou Kouhei a écrit : > > Hi, > > > > I would like to propose splitting Go release process. > > > > Motivation: > > > > * We want to reduce needless major releases because major > >re

Re: [VOTE][Format] Bool8 Canonical Extension Type

2024-08-06 Thread Joris Van den Bossche
+1 (binding) On Tue, 6 Aug 2024 at 17:41, Matt Topol wrote: > > +1 (binding) > > On Tue, Aug 6, 2024 at 11:40 AM Felipe Oliveira Carvalho < > felipe...@gmail.com> wrote: > > > +1 (non-binding) > > > > -- > > Felipe > > > > On Tue, Aug 6, 2024 at 6:24 AM Gang Wu wrote: > > > > > +1 (non-binding)

Re: [VOTE][Format] Opaque canonical extension type

2024-07-24 Thread Joris Van den Bossche
+1 (binding) On Wed, 24 Jul 2024 at 07:34, David Li wrote: > > Hello, > > I'd like to propose the 'Opaque' canonical extension type. Prior discussion > can be found at [1] and the proposal and implementations for C++, Go, Java, > and Python can be found at [2]. The proposal is additionally repr

[ANNOUNCE] New Arrow committer: Dane Pitkin

2024-05-07 Thread Joris Van den Bossche
On behalf of the Arrow PMC, I'm happy to announce that Dane Pitkin has accepted an invitation to become a committer on Apache Arrow. Welcome, and thank you for your contributions! Joris

Re: [VOTE][Format] UUID canonical extension type

2024-04-30 Thread Joris Van den Bossche
+1 (binding) On Tue, 30 Apr 2024 at 19:52, Jacob Wujciak wrote: > +1 (non-binding) > > Am Di., 30. Apr. 2024 um 17:48 Uhr schrieb Weston Pace < > weston.p...@gmail.com>: > > > +1 (binding) > > > > On Tue, Apr 30, 2024 at 7:53 AM Rok Mihevc wrote: > > > > > Thanks for all the reviews and comment

Re: [VOTE] Release Apache Arrow 16.0.0 - RC0

2024-04-17 Thread Joris Van den Bossche
+1 (binding) Tested source with conda on Ubuntu On Wed, 17 Apr 2024 at 16:28, Vibhatha Abeykoon wrote: > > I executed the following > > # Verifying C++ > > ```bash > TEST_DEFAULT=0 TEST_CPP=1 ./verify-release-candidate.sh 16.0.0 0 > ``` > > # Verifying C++ and Python > > ```bash > TEST_DEFAULT=0

Re: [ANNOUNCE] New Arrow committer: Sarah Gilmore

2024-04-11 Thread Joris Van den Bossche
Congrats! On Thu, 11 Apr 2024 at 22:56, Sarah Gilmore wrote: > > Thank you everyone! It's been awesome working with everyone and look > forwarding to continuing to do so! 😄 > > From: Ian Cook > Sent: Thursday, April 11, 2024 2:43 PM > To: dev@arrow.apache.org >

Re: [DISCUSS] Versioning and releases for apache/arrow components

2024-04-09 Thread Joris Van den Bossche
I am also in favor of this idea in general and in the principle, but (somewhat repeating others) I think we should be aware that this will create _more_ work overall for releasing (refactoring release scripts (at least initially), deciding which version to use for which component, etc), and not les

[DISCUSS] Expanding the Arrow PyCapsule Protocol with (non-CPU) Device support

2024-03-26 Thread Joris Van den Bossche
Hi all, Last year, we defined a protocol exposing the C Data Interface (schema, array and stream) in Python through PyCapsule objects and dunder methods `__arrow_c_schema/array/stream__` (https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html). A bit earlier last year, we als

Re: [INFO] Arrow 16.0.0 feature freeze - 8th April

2024-03-14 Thread Joris Van den Bossche
On Thu, 14 Mar 2024 at 17:28, Adam Lippai wrote: > > Pandas and NumPy will have major releases in the next month or so. Tracking > each other’s timelines might help avoiding unexpected breaks. > Yes, we are aware of that. Here is the issue for numpy 2.0 compatibility: https://github.com/apache/ar

Re: [VOTE] Move Arrow DataFusion Subproject to new Top Level Apache Project

2024-03-01 Thread Joris Van den Bossche
+1 (binding) On Fri, 1 Mar 2024 at 22:18, Sutou Kouhei wrote: > > +1 > > In > "[VOTE] Move Arrow DataFusion Subproject to new Top Level Apache Project" > on Fri, 1 Mar 2024 06:33:08 -0500, > Andrew Lamb wrote: > > > Hello, > > > > As we have discussed[1][2] I would like to vote on the prop

Re: [VOTE] Release Apache Arrow 14.0.2 - RC3

2023-12-14 Thread Joris Van den Bossche
+1 Successfully verified C++ and Python source and Python wheels on Ubuntu 20.04. On Wed, 13 Dec 2023 at 22:40, Raúl Cumplido wrote: > > Hi, > > A couple of minor nits for the release verification. > > The PR with the verification tasks [1] shows a couple of binary > verification failures. > > "

Re: [ANNOUNCE] New Arrow committer: Felipe Oliveira Carvalho

2023-12-11 Thread Joris Van den Bossche
Congrats Felipe! ;) On Mon, 11 Dec 2023 at 05:41, Alenka Frim wrote: > > Congratulations Felipe!! > > On Fri, Dec 8, 2023 at 12:25 PM Felipe Oliveira Carvalho < > felipe...@gmail.com> wrote: > > > Thank you everyone! > > > > -- > > Felipe > > github.com/felipecrv > > > > On Thu, Dec 7, 2023 at 11

Re: [ANNOUNCE] New Arrow committer: James Duong

2023-11-15 Thread Joris Van den Bossche
Congrats! On Thu, 16 Nov 2023 at 08:44, Sutou Kouhei wrote: > > On behalf of the Arrow PMC, I'm happy to announce that James Duong > has accepted an invitation to become a committer on Apache > Arrow. Welcome, and thank you for your contributions! > > -- > kou > >

Re: [ANNOUNCE] New Arrow PMC member: Raúl Cumplido

2023-11-13 Thread Joris Van den Bossche
Congrats, and thanks for all the releases you've already managed! Joris On Tue, 14 Nov 2023 at 08:15, Alenka Frim wrote: > > Yay! Congratulations Raul!!! > > On Tue, Nov 14, 2023 at 6:33 AM Vibhatha Abeykoon > wrote: > > > Congratulations Raúl !!! > > > > On Tue, Nov 14, 2023 at 10:54 AM wish m

Re: [ANNOUNCE] New Arrow committer: Xuwei Fu

2023-10-26 Thread Joris Van den Bossche
Congrats! On Wed, 25 Oct 2023 at 08:23, Ian Joiner wrote: > > Congrats! > > On Mon, Oct 23, 2023 at 2:33 AM Sutou Kouhei wrote: > > > On behalf of the Arrow PMC, I'm happy to announce that Xuwei Fu > > has accepted an invitation to become a committer on Apache > > Arrow. Welcome, and thank you f

Re: [VOTE][Format] C data interface format strings for Utf8View and BinaryView

2023-10-19 Thread Joris Van den Bossche
+1 On Wed, 18 Oct 2023 at 23:33, Jonathan Keane wrote: > > +1 > > -Jon > > > On Wed, Oct 18, 2023 at 2:26 PM Felipe Oliveira Carvalho < > felipe...@gmail.com> wrote: > > > +1 > > > > On Wed, Oct 18, 2023 at 2:49 PM Dewey Dunnington > > wrote: > > > > > +1! > > > > > > On Wed, Oct 18, 2023 at 2:1

Re: [ANNOUNCE] New Arrow committer: Curt Hagenlocher

2023-10-17 Thread Joris Van den Bossche
Welcome to the team, Curt! On Mon, 16 Oct 2023 at 23:17, Curt Hagenlocher wrote: > > Thanks, all! > > On Mon, Oct 16, 2023 at 9:19 AM Dane Pitkin > wrote: > > > Congrats Curt! > > > > On Mon, Oct 16, 2023 at 12:00 PM Kevin Gurney > > > > wrote: > > > > > Congratulations, Curt! > > > ___

Re: [ANNOUNCE] New Arrow PMC member: Jonathan Keane

2023-10-14 Thread Joris Van den Bossche
Congratulations! On Sat, 14 Oct 2023 at 20:02, Matt Topol wrote: > > Congrats Jon!!! > > On Sat, Oct 14, 2023, 1:42 PM David Li wrote: > > > Congrats Jon! > > > > On Sat, Oct 14, 2023, at 13:25, Ian Cook wrote: > > > Congratulations Jonathan! > > > > > > On Sat, Oct 14, 2023 at 13:24 Andrew Lamb

Re: [Vote][Format] (new proposal) C data interface format string for ListView and LargeListView arrays

2023-10-07 Thread Joris Van den Bossche
+1 On Sat, 7 Oct 2023 at 10:44, Antoine Pitrou wrote: > > > +1 from me. > > But I also reiterate my plea that these existing parsers get fixed so as > to entirely validate the format string instead of stopping early. > > Regards > > Antoine. > > > Le 06/10/2023 à 23:26, Felipe Oliveira Carvalho a

Re: [VOTE][Format] Variable shape tensor canonical extension type

2023-10-06 Thread Joris Van den Bossche
Worth noting that there were some minor changes made to the spec while the vote was active: - The "uniform_dimensions" metadata key was removed, since this can also be inferred from the "uniform_shape" information - The shape of non-constant dimensions in the "uniform_shape" entry is now represente

Re: [VOTE][Format] Add Utf8View Arrays to Arrow Format

2023-08-22 Thread Joris Van den Bossche
+1 On Mon, 21 Aug 2023 at 19:33, Weston Pace wrote: > > +1 > > Thanks to all for the discussion and thanks to Ben for all of the great > work. > > > On Mon, Aug 21, 2023 at 9:16 AM wish maple wrote: > > > +1 (non-binding) > > > > It would help a lot when processing UTF-8 related data! > > > > Xu

Re: Do we need CODEOWNERS ?

2023-07-04 Thread Joris Van den Bossche
I think it can be useful in certain cases, where the selection is specific enough (for example, if all Go-related PRs are not too much for Matt, this feature sounds useful for him. I can also imagine if you are working on flight, just getting notifications for changes to the flight-related files mig

Re: [ANNOUNCE] New Arrow committer: Kevin Gurney

2023-07-04 Thread Joris Van den Bossche
Congrats Kevin! On Tue, 4 Jul 2023 at 13:47, David Li wrote: > > Welcome Kevin! > > On Tue, Jul 4, 2023, at 05:55, Raúl Cumplido wrote: > > Congratulations Kevin!!! > > > > El mar, 4 jul 2023 a las 3:32, Weston Pace () > > escribió: > >> > >> Congratulations Kevin! > >> > >> On Mon, Jul 3, 2023

Re: [ANNOUNCE] New Arrow PMC member: Dewey Dunnington

2023-06-23 Thread Joris Van den Bossche
Congrats Dewey! On Fri, 23 Jun 2023 at 16:54, Jacob Wujciak-Jens wrote: > > Well deserved! Congratulations Dewey! > > Ian Cook schrieb am Fr., 23. Juni 2023, 16:32: > > > Congratulations Dewey! > > > > On Fri, Jun 23, 2023 at 10:03 AM Matt Topol > > wrote: > > > > > > Congrats Dewey!! > > > > >

Re: [Parquet C++] Plan to bump default write version from 2.4 -> 2.6 (include nanoseconds LogicalType)

2023-06-15 Thread Joris Van den Bossche
> Ian > > On Thu, Jun 15, 2023 at 12:25 PM Joris Van den Bossche > wrote: > > > > Hi all, > > > > Bringing up https://github.com/apache/arrow/issues/35746 to the > > mailing list: this issue proposes to bump the default Parquet version > > we use for writin

[Parquet C++] Plan to bump default write version from 2.4 -> 2.6 (include nanoseconds LogicalType)

2023-06-15 Thread Joris Van den Bossche
Hi all, Bringing up https://github.com/apache/arrow/issues/35746 to the mailing list: this issue proposes to bump the default Parquet version we use for writing to Parquet files in the C++ library (and in the various bindings including pyarrow and R arrow) from the current default of "2.4" to "2.6

Re: [ANNOUNCE] New Arrow PMC member: Jie Wen (jakevin / jackwener)

2023-06-13 Thread Joris Van den Bossche
Congratulations! On Mon, 12 Jun 2023 at 22:00, Raúl Cumplido wrote: > > Congratulations Jie!!! > > El lun, 12 jun 2023, 20:35, Matt Topol escribió: > > > Congrats Jie! > > > > On Sun, Jun 11, 2023 at 9:20 AM Andrew Lamb wrote: > > > > > The Project Management Committee (PMC) for Apache Arrow ha

Re: [Python] Dataset scanner fragment skip options.

2023-06-13 Thread Joris Van den Bossche
On Mon, 12 Jun 2023 at 21:30, Jerald Alex wrote: > > hi Weston, > > Thank you so much for taking the time to respond. Really appreciate it. > > I'm using parquet files. So would it be possible to elaborate the below.? I > cannot seem to find any documentation for ParquetFileFragment. > > "there ma

Re: Converting Pandas DataFrame <-> Struct Array?

2023-06-13 Thread Joris Van den Bossche
I think your original code roundtripping through RecordBatch (`pa.RecordBatch.from_pandas(df).to_struct_array()`) is the best option at the moment. The RecordBatch<->StructArray part is a cheap (zero-copy) conversion, and by using RecordBatch.from_pandas, you can rely on all pandas<->arrow conversi

Re: [VOTE] Release Apache Arrow 12.0.1 - RC1

2023-06-12 Thread Joris Van den Bossche
+1 (verified source release on Ubuntu 20.04, using conda) On Sat, 10 Jun 2023 at 22:31, Sutou Kouhei wrote: > > +1 > > I ran the followings on Debian GNU/Linux sid: > > * TEST_DEFAULT=0 \ > TEST_SOURCE=1 \ > LANG=C \ > TZ=UTC \ > CUDAToolkit_ROOT=/usr \ > ARROW_CMA

Re: [ANNOUNCE] New Arrow committer: Marco Neumann

2023-05-11 Thread Joris Van den Bossche
Congrats Marco! On Thu, 11 May 2023 at 15:05, Weston Pace wrote: > > Congratulations! > > On Thu, May 11, 2023 at 4:28 AM vin jake wrote: > > > Congratulations Marco! > > > > On Thu, May 11, 2023 at 7:18 AM Andrew Lamb wrote: > > > > > On behalf of the Arrow PMC, I'm happy to announce that Marc

Re: [ANNOUNCE] New Arrow PMC member: Matt Topol

2023-05-04 Thread Joris Van den Bossche
Congrats Matt! On Thu, 4 May 2023 at 06:31, Nic Crane wrote: > > Congratulations! > > On Thu, 4 May 2023, 05:24 Vibhatha Abeykoon, wrote: > > > Congratulations Matt! > > > > On Thu, May 4, 2023 at 7:35 AM Ian Cook wrote: > > > > > Congratulations Matt!!! > > > > > > On Wed, May 3, 2023 at 9:55 

Re: [DISCUSS][Format] Starting the draft implementation of the ArrayView array format

2023-04-26 Thread Joris Van den Bossche
On Wed, 26 Apr 2023 at 02:37, Weston Pace wrote: > > For context, there was some discussion on this back in [1]. At that time > this was called "sequence view" but I do not like that name. However, > array-view array is a little confusing. Given this is similar to list can > we go with list-vie

Re: [VOTE] Formalize how to change format

2023-04-26 Thread Joris Van den Bossche
+1 On Wed, 26 Apr 2023 at 04:18, Sutou Kouhei wrote: > > Hi, > > I've added one more note about documentation: > > We must update the corresponding documentation (files in > ``_) > too. > > https://github.com/apache/arrow/pull/35

Re: Proposal: add a bot to close PRs that haven't been updated in 30 days

2023-03-31 Thread Joris Van den Bossche
On Fri, 31 Mar 2023 at 17:38, Alessandro Molina wrote: > > .. > My question probably would be... If a PR was sitting ignored for 30 days > without anyone from the community feeling the need to review and merge it > and without its primary author feeling the need to push for getting it > merged. Is

Re: Proposal: add a bot to close PRs that haven't been updated in 30 days

2023-03-31 Thread Joris Van den Bossche
I am personally not a huge fan of auto-closing PRs. Especially not after a short period like 30 days (I think that's too short for an open source project), and we have to be careful with messaging. Very often such a PR is "stale" because it is waiting for reviews. I know we have the labels now that

Re: [ANNOUNCE] New Arrow PMC member: Will Jones

2023-03-13 Thread Joris Van den Bossche
Congrats Will! On Mon, 13 Mar 2023 at 22:01, Michel Miotto Barbosa wrote: > > Congratulations Wiil! > > A disposição | At your disposal > > Michel Miotto Barbosa > https://www.linkedin.com/in/michelmiottobarbosa/ > mmiottobarb...@gmail.com > +55 11 984 342 347 > > > > > On Mon, Mar 13, 2023 at 2:

Re: [VOTE][Format] Fixed shape tensor Canonical Extension Type

2023-03-07 Thread Joris Van den Bossche
+1 (binding) On Tue, 7 Mar 2023 at 23:35, David Li wrote: > > +1 (binding) > > Just one comment, though: since we also define a separate "Tensor" IPC > structure in Arrow, maybe we should state the relationship somewhere in the > documentation? (Even if the answer is "no relationship".) > > On

Re: [VOTE] Release Apache Arrow nanoarrow 0.1.0 - RC1

2023-03-02 Thread Joris Van den Bossche
+1 (binding) Verified on Ubuntu 20.04 It worked with conda R for me, I only needed to ensure to install a conda compiler to get it building (https://github.com/apache/arrow-nanoarrow/pull/142) On Thu, 2 Mar 2023 at 05:29, Jin Shang wrote: > > +1 (non-binding). Verified on macOS 12.5 aarch64 and

Re: [VOTE] Release Apache Arrow ADBC 0.2.0 - RC1

2023-02-21 Thread Joris Van den Bossche
On Wed, 22 Feb 2023 at 00:55, Sutou Kouhei wrote: > > Hi, > > In > "Re: [VOTE] Release Apache Arrow ADBC 0.2.0 - RC1" on Thu, 16 Feb 2023 > 09:19:50 +0100, > Joris Van den Bossche wrote: > > > current directory: > > /tmp/adbc-verification/a

Re: [VOTE] Format: Fixed shape tensor Canonical Extension Type

2023-02-21 Thread Joris Van den Bossche
On Tue, 21 Feb 2023 at 18:00, Rok Mihevc wrote: > > > > > Should we rule that `dim_names` and `permutation` are mutually exclusive? > > > > Since `dim_names` have to "map to the physical layout (row-major)" that > means permutation will always be trivial which indeed makes it unnecessary > to stor

Re: Proposal: renaming the 'master' branch to 'main'

2023-02-17 Thread Joris Van den Bossche
Also for https://github.com/apache/arrow the default branch is now renamed to "main". You will see some instructions the first time visiting the github repo since the rename, but copying them here below. You can rename the master branch on your fork as well (visiting https://github.com//arrow will

Re: [VOTE] Release Apache Arrow ADBC 0.2.0 - RC1

2023-02-16 Thread Joris Van den Bossche
On Wed, 15 Feb 2023 at 21:31, Sutou Kouhei wrote: > > > not finding /usr/bin/mkdir > > Could you show the log of this? Yes: current directory: /tmp/adbc-verification/apache-arrow-adbc-0.2.0/glib/vendor/bundle/ruby/3.1.0/gems/fiddle-1.1.1/ext/fiddle make DESTDIR\= install make: /usr/bin/mkdir: Co

Re: [VOTE] Release Apache Arrow ADBC 0.2.0 - RC1

2023-02-15 Thread Joris Van den Bossche
+1 (binding) I ran the verification on Ubuntu 20.04 using conda: $ USE_CONDA=1 ARROW_TMPDIR=/tmp/adbc-verification ./dev/release/verify-release-candidate.sh 0.2.0 1 ... Release candidate looks good! I only had a problem with installing some ruby dependencies (for GLIB tests), not finding /usr/bi

Re: [DISCUSS] Fixed shape tensor Canonical Extension Type

2023-02-14 Thread Joris Van den Bossche
On Tue, 7 Feb 2023 at 19:32, Quentin Lhoest wrote: > > Hi, > > If I remember correctly one can already pass `types_mapper` > to `pa.Table.to_pandas`, to allow Ray or HF Datasets to define > their own pandas extension types associated to the arrow > extension types. I guess this could also be used

Re: [DISCUSS] Fixed shape tensor Canonical Extension Type

2023-02-03 Thread Joris Van den Bossche
On Thu, 2 Feb 2023 at 16:06, Clark Zinzow wrote: > > Hi Alenka, > > Great work on the RFC, I'm super excited to see this! I was planning to > open a similar RFC at some point over the next few weeks, so this just > saved me a bunch of work. :D > > At the Ray project [1], we've developed two tensor

Re: [DISCUSS] The default commit message for merge button

2023-01-31 Thread Joris Van den Bossche
> > > > > > > On Tue, 31 Jan 2023 at 06:43 Antoine Pitrou > > > wrote: > > > > > > > > > >> > > > > >> +1 for "pull request title *and* description". > > > > >> > > > > >>

Re: [DISCUSS] The default commit message for merge button

2023-01-31 Thread Joris Van den Bossche
I would personally prefer to use just "Pull request title" instead of "Pull request title and description". In my experience, including the description in the commit message (as we already do) more often gives noise to the output of `git log`, and you can always go from the commit to the PR to see

Re: [ANNOUNCE] New Arrow committer: Jacob Wujciak

2022-12-15 Thread Joris Van den Bossche
Congrats! On Fri, 16 Dec 2022 at 03:22, Dewey Dunnington wrote: > > Congrats, Jacob! > > On Thu, Dec 15, 2022 at 9:26 PM Matt Topol wrote: > > > Congrats Jacob!! > > > > On Thu, Dec 15, 2022, 7:53 PM Neal Richardson > > > > wrote: > > > > > Congrats! > > > > > > On Thu, Dec 15, 2022 at 7:00 PM

Re: Current state of using GitHub issues for Arrow

2022-12-08 Thread Joris Van den Bossche
On Tue, 6 Dec 2022 at 08:41, Benson Muite wrote: > > > > For sure the exact workflows will still be further refined while starting > > to use this. And if there are things missing or unclear in the current > > practices around how to handle GitHub issues or any other feedback or > > ideas, this th

Re: [ANNOUNCE] New Arrow committer: Raúl Cumplido

2022-12-07 Thread Joris Van den Bossche
Congrats Raúl! On Wed, 7 Dec 2022 at 09:03, Raúl Cumplido wrote: > Thank you everyone! > > El mar, 6 dic 2022, 17:30, Weston Pace escribió: > > > Congratulations! > > > > On Tue, Dec 6, 2022 at 7:57 AM Nic wrote: > > > > > > Congratulations! > > > > > > On Tue, 6 Dec 2022 at 15:49, Ian Cook w

Current state of using GitHub issues for Arrow

2022-11-30 Thread Joris Van den Bossche
Hi all, There is a separate vote thread about already using GitHub issues for all new issues (and in practice users also are already doing that, since new JIRA signup has been disabled). And so to prepare for this, there has been some ongoing work to update our workflows to handle GitHub issues (b

Re: Arrow sync call November 23 at 12:00 US/Eastern, 17:00 UTC

2022-11-28 Thread Joris Van den Bossche
On Mon, 28 Nov 2022 at 12:09, Joris Van den Bossche wrote: > > FYI: Raúl also already opened a PR to update the merge script to work > with github issues: https://github.com/apache/arrow/pull/14731 (sorry, that PR is to update the github actions workflow (the bot that comments on PRs)

Re: Arrow sync call November 23 at 12:00 US/Eastern, 17:00 UTC

2022-11-28 Thread Joris Van den Bossche
FYI: Raúl also already opened a PR to update the merge script to work with github issues: https://github.com/apache/arrow/pull/14731 Personally I also think that we should consider using the merge button instead of our script (or at least re-evaluate what the script still does better, or might now

Re: [VOTE] Disable ASF Jira issue reporting

2022-11-24 Thread Joris Van den Bossche
+1 On Wed, 23 Nov 2022 at 22:37, Todd Farmer wrote: > > Hello, > > I would like to propose that issue reporting in ASF Jira for the Apache > Arrow project be disabled, and all users directed to use GitHub issues for > reporting going forward. GitHub issue reporting is now enabled [1] in > respons

Re: [VOTE] Disable ASF Jira issue reporting

2022-11-24 Thread Joris Van den Bossche
On Thu, 24 Nov 2022 at 11:31, Antoine Pitrou wrote: > > > Are all the required labels ready? I don't seem to see the components > in https://github.com/apache/arrow/labels. Also, we should curate the > existing labels and namespace all the remaining ones so that the > categories can be easily unde

Re: [DISCUSS]: Interim plan for new users reporting issues before GitHub migration

2022-11-18 Thread Joris Van den Bossche
+1 on pointing new users directly to GitHub issues. On Thu, 17 Nov 2022 at 21:15, MAURICIO ANDRES VARGAS SEPULVEDA wrote: > > Hi! > > +Inf to Nic's point > > Asking to write a Gh issue seems to be the easiest > > Get Outlook for Android > >

Re: Request for Patch release of 10.0.1

2022-11-08 Thread Joris Van den Bossche
Hey Matt, See also the "[DISCUSS] Pyarrow wheels for Python 3.11" thread, where I think the conclusion is that we need a 10.0.1 bugfix release anyway for Python as well (and in practice this means a bug fix release for the full repo). For that purpose, a 10.0.1 milestone is created, and normally a

Re: [RESULT][VOTE] Release Apache Arrow 10.0.0 - RC0

2022-10-27 Thread Joris Van den Bossche
- [?] Upload wheels/sdist to pypi Done: https://pypi.org/project/pyarrow/10.0.0 On Wed, 26 Oct 2022 at 13:56, Neal Richardson wrote: > > I will submit the R package to CRAN. > > Neal > > On Wed, Oct 26, 2022 at 4:40 AM Sutou Kouhei wrote: > > > Thanks!!! > > > > Current status: > > > > - [Done]

Re: [ANNOUNCE] New Arrow committer: Ben Baumgold

2022-10-27 Thread Joris Van den Bossche
Congratulations, and welcome Ben! On Thu, 27 Oct 2022 at 05:11, Weston Pace wrote: > > Congratulations Ben! > > On Wed, Oct 26, 2022 at 2:05 PM David Li wrote: > > > > Welcome Ben! > > > > On Wed, Oct 26, 2022, at 17:57, Ian Joiner wrote: > > > Congrats Ben! > > > > > > Ian > > > > > > On Wednes

Re: [ANNOUNCE] New Arrow committer: Bogumił Kamiński

2022-10-27 Thread Joris Van den Bossche
Congrats! On Wed, 26 Oct 2022 at 23:56, Ian Joiner wrote: > > Congrats Bogumił! > > Ian > > On Tuesday, October 25, 2022, Sutou Kouhei wrote: > > > Hi, > > > > On behalf of the Arrow PMC, I'm happy to announce that Bogumił Kamiński > > has accepted an invitation to become a committer on Apache >

Re: [ANNOUNCE] New Arrow committer: Eric Patrick Hanson

2022-10-27 Thread Joris Van den Bossche
Congratulations, and welcome Eric! On Thu, 27 Oct 2022 at 13:53, Eric Hanson wrote: > > thanks, I'm excited to join! > > On 2022/10/26 21:38:53 Sutou Kouhei wrote: > > On behalf of the Arrow PMC, I'm happy to announce that Eric Patrick Hanson > > has accepted an invitation to become a committer o

Re: [Discuss][Python] Stop publishing universal wheels?

2022-10-27 Thread Joris Van den Bossche
The cibuildwheel documentation has a note about this (https://cibuildwheel.readthedocs.io/en/stable/faq/#universal2), quoting: > The dual-architecture universal2 has a few benefits, but a key benefit to > a universal wheel is that a user can bundle these wheels into an application > and ship a sin

Re: [VOTE] Move issue tracking to GitHub Issues

2022-10-27 Thread Joris Van den Bossche
+1 On Thu, 27 Oct 2022 at 07:27, Jacob Quinn wrote: > > +1 > > On Wed, Oct 26, 2022 at 5:04 PM Neal Richardson > wrote: > > > I propose that we move issue tracking from the ASF's Jira to GitHub Issues. > > This has been discussed on [1] and [2] and there seems to be consensus. A > > number of Ar

Re: [ANNOUNCE] New Arrow PMC member: Jacob Quinn

2022-10-26 Thread Joris Van den Bossche
Congratulations! On Wed, 26 Oct 2022 at 19:25, Weston Pace wrote: > > Congrats Jacob! > > On Wed, Oct 26, 2022 at 6:10 AM Jacob Wujciak > wrote: > > > > Congrats! > > > > On Wed, Oct 26, 2022 at 8:31 AM Alenka Frim > > wrote: > > > > > Congratulations! > > > > > > On Wed, Oct 26, 2022 at 7:54 A

Re: [ANNOUNCE] New Arrow PMC member: Nicola Crane

2022-10-26 Thread Joris Van den Bossche
Congratulations Nic! ;) On Wed, 26 Oct 2022 at 19:26, Weston Pace wrote: > > Thanks Nic and congratulations! > > On Wed, Oct 26, 2022 at 8:28 AM Raúl Cumplido wrote: > > > > Thanks Nic for your contributions! > > > > El mié, 26 oct 2022 a las 17:17, Antoine Pitrou () > > escribió: > > > > > > >

Re: [DISCUSS] Move issue tracking to

2022-10-24 Thread Joris Van den Bossche
I would also support a migration of our issues to GitHub. It seems unlikely to me that another third-party tool would be good enough to make the whole experience better (given that we already use GitHub for PRs). And I agree with others that continuing to use JIRA is not a good option with this change. A

Re: Parser for expressions

2022-10-12 Thread Joris Van den Bossche
Another advantage of "add(x, y)" is that this matches our current string representation for expressions. Although that might give the impression that we support anything that we output as string, and so that raises the question if we want to make this explicit: if we add parsing capabilities, woul
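
To make the call-style syntax mentioned above concrete, here is a minimal sketch of parsing a string such as "add(x, y)" and printing it back in the same form. It is only an illustration, not Arrow's parser or API: the function names `parse_call_expression` and `format_expression` and the tuple-based tree are hypothetical, and the sketch borrows Python's own `ast` module instead of implementing a dedicated grammar.

```python
import ast


def parse_call_expression(text):
    """Parse a call-style expression such as "add(x, y)" into nested tuples.

    Hypothetical representation used only for this sketch:
      ("call", function_name, [arguments...])
      ("field", name)
      ("literal", value)
    """

    def convert(node):
        # A function call with a plain name, e.g. add(...)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.keywords:
                raise ValueError("keyword arguments are not supported in this sketch")
            return ("call", node.func.id, [convert(arg) for arg in node.args])
        # A bare name is treated as a field reference
        if isinstance(node, ast.Name):
            return ("field", node.id)
        # Numbers, strings, booleans and None are treated as literals
        if isinstance(node, ast.Constant):
            return ("literal", node.value)
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    tree = ast.parse(text, mode="eval")
    return convert(tree.body)


def format_expression(expr):
    """Render the nested tuples back into the call-style string form."""
    kind = expr[0]
    if kind == "call":
        _, name, args = expr
        return f"{name}({', '.join(format_expression(arg) for arg in args)})"
    if kind == "field":
        return expr[1]
    return repr(expr[1])


if __name__ == "__main__":
    parsed = parse_call_expression("add(x, multiply(y, 2))")
    print(parsed)
    print(format_expression(parsed))
```

With a syntax like this, the printed form and the accepted input are the same language, so a string produced by formatting can be fed straight back into the parser.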

Re: Usage of the name Feather?

2022-09-06 Thread Joris Van den Bossche
Personally, I like the "Feather" name (and actually think it could help disambiguate the file vs in-memory distinction), but I understand that we have chosen a certain path (eg ".arrow" is the official registered extension), and have to move on. However, I think we need to be very careful in how w

Re: [ANNOUNCE] New Arrow PMC member: Weston Pace

2022-09-06 Thread Joris Van den Bossche
Congrats Weston! It is great to have you on the team! On Tue, 6 Sept 2022 at 06:10, Weston Pace wrote: > Thank you everyone! I look forward to continuing to work with you all. > > On Mon, Sep 5, 2022 at 3:56 PM Kun Liu wrote: > > > > Congrats Weston!! > > > > > > Gavin Ray 于2022年9月6日周二 08:04写

Re: DISCUSS: [Format] Rules and procedures for Canonical extension types

2022-08-17 Thread Joris Van den Bossche
+1 on the overall proposal, documenting those in a central place sounds good to me. On Wed, 17 Aug 2022 at 18:10, Antoine Pitrou wrote: > > > > * The specification text to be added *must* follow these requirements > > 1) It *must* have a well-defined name starting with "ARROW:" > One remar

Re: Do we have nightly source tar ball

2022-07-07 Thread Joris Van den Bossche
We do upload nightly wheels to an alternative PyPI index (see https://arrow.apache.org/docs/python/install.html#installing-nightly-packages), at https://pypi.fury.io/arrow-nightlies/pyarrow, and it seems we actually also upload an sdist there. (It could still be more reliable to use HEAD, though,

Re: Merge a pull request with GitHub API

2022-05-25 Thread Joris Van den Bossche
+1 on the change, regardless of the discussion around merge button vs script, already starting to "merge" the PR through the script is certainly an improvement! On Tue, 24 May 2022 at 23:16, Sutou Kouhei wrote: > Hi, > > Do you have any objections to this? If nobody objects this, > I'll merge th

Re: [DISC] (Python) Dropping support for manylinux2010

2022-05-09 Thread Joris Van den Bossche
+1 as well Joris On Thu, 5 May 2022 at 22:29, Sutou Kouhei wrote: > +1 > > Our next major release will be in July or August. I think > that pypa will drop support for manylinux2010 officially > when release a next major version. > > Thanks, > -- > kou > > In > "[DISC] (Python) Dropping suppo

Re: [VOTE] Release Apache Arrow 8.0.0 - RC2

2022-05-03 Thread Joris Van den Bossche
While testing the previous RC against the dask test suite, I discovered a regression in the pyarrow dataset bindings: https://issues.apache.org/jira/browse/ARROW-16413 (and I have a PR open to fix it). I don't know if I can vote -1 for it (since it is strictly speaking not a release verification i

Re: Arrow sync call April 27 at 12:00 US/Eastern, 16:00 UTC

2022-04-27 Thread Joris Van den Bossche
As a small clarification: the Zoom meeting link itself should still work for anyone to join; it's only that there is no one from Voltron Data to lead the meeting / take notes (so I also won't be present today). Joris On Wed, 27 Apr 2022 at 13:05, Benson Muite wrote: > Hi, > > Can host if required, t

Re: [Docs] sphinx-tabs and build errors

2022-03-22 Thread Joris Van den Bossche
ve a fairly minimal build by default. > > Ian > > > > On Monday, March 21, 2022, Joris Van den Bossche < > jorisvandenboss...@gmail.com> wrote: > > > On Sun, 20 Mar 2022 at 07:50, Ian Joiner wrote: > > > > > Hi, > > > > > >

Re: [Docs] sphinx-tabs and build errors

2022-03-21 Thread Joris Van den Bossche
On Sun, 20 Mar 2022 at 07:50, Ian Joiner wrote: > Hi, > > I’d like to ask about how the documentation is built. I have followed the > instructions to build and install the C++ and Python libraries in my > virtual environment and then followed the instructions for building the > documentation. How

Re: [Discuss] Best practice for storing key-value metadata for Extension Types

2022-02-10 Thread Joris Van den Bossche
latively simple parameterization (to the point that > >>> protobuf/flatbuffers isn't really needed). For example, the substrate > >>> consumer PR has five extension types at the moment (e.g. uuid, > >>> varchar) and only two of them are parameterized and each

Re: [VOTE] Release Apache Arrow 7.0.0 - RC10

2022-02-09 Thread Joris Van den Bossche
It's not that simple though ;) For me, clicking on "7.0.0" works fine, and results in going to the stable docs. And the dropdown shows "7.0 (stable)" and "8.0 (dev)". You might need a hard refresh of the browser page (the list of versions might be cached). Joris On Wed, 9 Feb 2022 at 07:03, Ian

Re: [Discuss] Best practice for storing key-value metadata for Extension Types

2022-02-08 Thread Joris Van den Bossche
On Tue, 8 Feb 2022 at 17:37, Jorge Cardoso Leitão wrote: > ... > > Wrt to binary, imo the challenge is: > * we state that backward incompatible changes to the c data interface > require a new spec [1] > Note that this discussion wouldn't change anything about the C Data Interface spec itself. Th

[Discuss] Best practice for storing key-value metadata for Extension Types

2022-02-08 Thread Joris Van den Bossche
Hi all, There is currently some discussion regarding how we can formalize/document "well known" extension types (see the "[DISCUSS] New Types (Schema.fbs vs Extension Types)" thread). There is ongoing work on an extension type to store arrays / tensors by Rok ( https://issues.apache.org/jira/brows

Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2022-02-08 Thread Joris Van den Bossche
numbers (ARROW-638 <https://issues.apache.org/jira/browse/ARROW-638>, this has a PR) and 8-bit boolean values (ARROW-1674 <https://issues.apache.org/jira/browse/ARROW-1674>). But I think we should mainly look at demand / someone wanting to implement this, and (for you) this

Re: [DISCUSS] Binary Values in Key value pairs WAS: Re: [INFO_REQUEST][FLIGHT] - Dynamic schema changes in ArrowFlight streams

2022-02-04 Thread Joris Van den Bossche
Reviving this thread, I don't think anything happened in the meantime on this topic? From rereading the thread, it seems David mentioned two possible ideas: - A new [byte] binary_value field in the existing KeyValue type, next to the existing string value field. And if you have valid string metad

Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2022-01-25 Thread Joris Van den Bossche
On Sat, 22 Jan 2022 at 20:27, Rok Mihevc wrote: > > Thanks for the input Weston! > > How about arrow/experimental/format/ExtensionTypes.fbs or > arrow/format/ExtensionTypes.fbs for language independent schema and > loosely arrow//extensions for implementations? > > Having machine readable definiti

Re: [DISCUSS] Splitting the sphinx-based Arrow docs into separate sphinx projects

2021-12-16 Thread Joris Van den Bossche
erms-in-a-glossary-of-a-different-project also confirms this should work for glossary items as well) > (we don't have a glossary, but we may want to add one) > > Regards > > Antoine. > > > Le 08/12/2021 à 14:20, Joris Van den Bossche a écrit : > > Hi all, >

[DISCUSS] Splitting the sphinx-based Arrow docs into separate sphinx projects

2021-12-08 Thread Joris Van den Bossche
Hi all, We use the Sphinx documentation tool for the main Arrow documentation (the format specs, the developer docs) and several of the languages (C++, Python, part of the Java docs). This lives in the /docs subdirectory of the Arrow repo. Currently, this is a single large sphinx project. The adv

Re: About support ORC Filter pushdown

2021-11-29 Thread Joris Van den Bossche
For reference, the issue opened for this is https://issues.apache.org/jira/browse/ARROW-14890 As mentioned on the JIRA, I am not aware of direct plans to implement this, but it would certainly be nice to have this functionality for ORC and contributions are welcome. On Fri, 26 Nov 2021 at 21:37,

Re: [Doc] ORC-related documentation

2021-11-25 Thread Joris Van den Bossche
Hi Ian, Yes, more documentation regarding ORC would be very welcome! I think your list of missing docs is correct: - It's briefly mentioned in the Python API docs (https://arrow.apache.org/docs/python/api/formats.html#orc-files), but incomplete - The C++ reference docs list the OrcFileFormat for

Re: [Python] Shall all-null object type columns in Pandas be converted into Arrow columns with NullType type?

2021-11-22 Thread Joris Van den Bossche
People ran into similar issues with such all-NA columns with Parquet as well (with the difference that Parquet actually supports a null type, but if you have a partitioned dataset, this could lead to conflicting schemas). The typical workaround is for the user to provide the schema when writing / conv

Re: [ANNOUNCE] New Arrow PMC member: Joris Van den Bossche

2021-11-17 Thread Joris Van den Bossche
> > > > On Nov 17, 2021, at 5:55 PM, Wes McKinney wrote: > > > > > > The Project Management Committee (PMC) for Apache Arrow has invited > > > Joris Van den Bossche to become a PMC member and we are pleased to > > > announce that Joris has accepted. > > > > > > Congratulations and welcome! > > > > >

Re: [Parquet][C++][Python] Maximum Row Group Length Default

2021-11-17 Thread Joris Van den Bossche
In addition, would it be useful to be able to change this max_row_group_length from Python? Currently that writer property can't be changed from Python, you can only specify the row_group_size (chunk_size in C++) when writing a table, but that's currently only useful to set it to something that is

Re: [VOTE] Release Apache Arrow 6.0.1 - RC0

2021-11-08 Thread Joris Van den Bossche
Although causing more delay for the release, I would also vote for including Weston's PR. Otherwise it would be very unfortunate that users can't preserve the existing behaviour (of 5.0). Joris On Sun, 7 Nov 2021 at 22:17, Sutou Kouhei wrote: > Hi, > > Python developers, what do you think about

Re: [DISCUSS][Python] Public Cython API

2021-08-25 Thread Joris Van den Bossche
On Wed, 25 Aug 2021 at 17:21, Antoine Pitrou wrote: > > Le 25/08/2021 à 17:12, Joris Van den Bossche a écrit : > > One example of consumer of our Cython API is cudf ( > > https://github.com/rapidsai/cudf). > > I am not very familiar with the package itself, but browsing

Re: [DISCUSS][Python] Public Cython API

2021-08-25 Thread Joris Van den Bossche
One example of consumer of our Cython API is cudf ( https://github.com/rapidsai/cudf). I am not very familiar with the package itself, but browsing its code, I see that they do for example cimport RecordBatchReader ( https://github.com/rapidsai/cudf/blob/f6d31fa95d9b8d8658301438d0f9ba22a1c131aa/pyt

Re: [VOTE][Format] Add in a new interval type can combines Month, Days and Nanoseconds

2021-08-19 Thread Joris Van den Bossche
+1 On Wed, 18 Aug 2021 at 20:16, David Li wrote: > +1 > > On Wed, Aug 18, 2021, at 10:31, Neal Richardson wrote: > > +1 > > > > On Wed, Aug 18, 2021 at 6:06 AM Antoine Pitrou > wrote: > > > > > +1 (binding) > > > > > > > > > Le 17/08/2021 à 21:49, Micah Kornfield a écrit : > > > > Hello, > > >
