Re: [VOTE] Release Apache Arrow 4.0.0 - RC3

2021-04-26 Thread Sutou Kouhei
I'll also update MSYS2 packages:
1. [x] open a pull request to bump the version numbers in the source code
2. [x] upload source
3. [kou] upload binaries
4. [x] update website
5. [x] upload ruby gems
6. [ ] upload js packages
8. [x] upload C# packages
9. [x] upload rust crates
10. [ ]

Re: [VOTE] Release Apache Arrow 4.0.0 - RC3

2021-04-26 Thread Krisztián Szűcs
On Tue, Apr 27, 2021 at 1:05 AM Andy Grove wrote: > > The following Rust crates have been published: arrow, arrow-flight, parquet, > parquet_derive, datafusion Thanks Andy! The current status is: 1. [x] open a pull request to bump the version numbers in the source code 2. [x] upload source 3.

Re: [VOTE] Release Apache Arrow 4.0.0 - RC3

2021-04-26 Thread Andy Grove
The following Rust crates have been published: arrow, arrow-flight, parquet, parquet_derive, datafusion On Mon, Apr 26, 2021 at 4:34 PM Andy Grove wrote: > Yes, I can handle the Rust release. > > On Mon, Apr 26, 2021, 4:17 PM Krisztián Szűcs > wrote: > >> @Andy Grove could you please handle

Re: [VOTE] Release Apache Arrow 4.0.0 - RC3

2021-04-26 Thread Andy Grove
Yes, I can handle the Rust release. On Mon, Apr 26, 2021, 4:17 PM Krisztián Szűcs wrote: > @Andy Grove could you please handle the rust release? > > On Mon, Apr 26, 2021 at 11:51 PM Krisztián Szűcs > wrote: > > > > 1. [x] open a pull request to bump the version numbers in the source > code >

Re: [VOTE] Release Apache Arrow 4.0.0 - RC3

2021-04-26 Thread Krisztián Szűcs
@Andy Grove could you please handle the rust release? On Mon, Apr 26, 2021 at 11:51 PM Krisztián Szűcs wrote: > > 1. [x] open a pull request to bump the version numbers in the source code > 2. [x] upload source > 3. [kou] upload binaries > 4. [x] update website > 5. [x] upload ruby gems >

Re: [VOTE] Release Apache Arrow 4.0.0 - RC3

2021-04-26 Thread Krisztián Szűcs
1. [x] open a pull request to bump the version numbers in the source code
2. [x] upload source
3. [kou] upload binaries
4. [x] update website
5. [x] upload ruby gems
6. [ ] upload js packages
8. [x] upload C# packages
9. [ ] upload rust crates
10. [ ] update conda recipes
11. [in-progress]

Re: compute::is_in rejects duplicates in value_set

2021-04-26 Thread Niranda Perera
Sure. PFA the JIRA https://issues.apache.org/jira/browse/ARROW-12554 On Mon, Apr 26, 2021 at 4:31 PM Wes McKinney wrote: > In principle I don't see an issue with having duplicates in the value set, > could you open a Jira issue? > > On Mon, Apr 26, 2021 at 3:27 PM Niranda Perera > wrote: > > >

Re: [VOTE] Release Apache Arrow 4.0.0 - RC3

2021-04-26 Thread Sutou Kouhei
Hi, It seems that we can use ASF's Artifactory. (I deleted centos-rc/ accidentally. :<) I'll update our script and upload to there. Thanks, -- kou In "Re: [VOTE] Release Apache Arrow 4.0.0 - RC3" on Mon, 26 Apr 2021 21:01:28 +0200, Krisztián Szűcs wrote: > The current status of the

Re: compute::is_in rejects duplicates in value_set

2021-04-26 Thread Wes McKinney
In principle I don't see an issue with having duplicates in the value set, could you open a Jira issue? On Mon, Apr 26, 2021 at 3:27 PM Niranda Perera wrote: > Hi all, > > In the arrow release-4.0.0 branch, the compute::is_in operation rejects > duplicate values in the value_set [1]. This was

compute::is_in rejects duplicates in value_set

2021-04-26 Thread Niranda Perera
Hi all, In the arrow release-4.0.0 branch, the compute::is_in operation rejects duplicate values in the value_set [1]. This was not the case in arrow 2.0 >=. I was wondering if this strict restriction is required? Because ultimately, a hash set would be created from the value_set values, and
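The argument in this message can be illustrated with a minimal sketch in plain Python. This is not Arrow's C++ kernel; the `is_in` function below is a hypothetical stand-in that only mirrors the idea that the value set is turned into a hash set before lookup, so duplicates in it collapse and cannot change the result.

```python
def is_in(values, value_set):
    """Return one boolean per element of `values`.

    Hypothetical stand-in for a set-lookup kernel, not Arrow's implementation.
    """
    # Building a hash set removes duplicates from value_set automatically.
    lookup = set(value_set)
    return [v in lookup for v in values]


values = [1, 2, 3, 4]
with_duplicates = [2, 2, 4, 4, 4]
without_duplicates = [2, 4]

# Both value sets describe the same membership test.
result_dup = is_in(values, with_duplicates)
result_unique = is_in(values, without_duplicates)
print(result_dup == result_unique)
```

Under this assumption, rejecting duplicates up front is a validation choice rather than something the lookup itself needs.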

Re: [VOTE] Release Apache Arrow 4.0.0 - RC3

2021-04-26 Thread Krisztián Szűcs
The current status of the post-release tasks:
1. [x] open a pull request to bump the version numbers in the source code
2. [x] upload source
3. [can't do] upload binaries
4. [x] update website
5. [x] upload ruby gems
6. [ ] upload js packages
8. [ ] upload C# packages
9. [ ] upload rust

Re: [VOTE] Release Apache Arrow 4.0.0 - RC3

2021-04-26 Thread Neal Richardson
I've started a draft blog post announcing 4.0: https://github.com/apache/arrow-site/pull/104 I added notes on committers/PMC additions and the R changes; please others fill in the rest--github "suggestions" or just push to my branch. Thanks! Neal On Mon, Apr 26, 2021 at 11:19 AM Krisztián Szűcs

Re: [VOTE] Release Apache Arrow 4.0.0 - RC3

2021-04-26 Thread Krisztián Szűcs
The VOTE carries with 4 binding +1 and 5 non-binding +1 votes. Thanks everyone! I'm starting the post-release tasks and will keep you posted about the current status. On Mon, Apr 26, 2021 at 6:06 PM Neal Richardson wrote: > > +1 (binding) > > GitHub Actions verifications are green and R artifact

Re: [DISCUSS] [Rust] Python-datafusion

2021-04-26 Thread Jorge Cardoso Leitão
Hi Micah, All testing is actually done from Python: create a record batch in pyarrow, push it to datafusion, consume it back in Python, and compare the result using pyarrow's equality. Sometimes parquet is used instead. The library is tested against pyarrow==1 from pypi: we can bump that, but if

Re: [DISCUSS] [Rust] Python-datafusion

2021-04-26 Thread Alessandro Molina
Would "incorporate" mean that the codebase is moved into the arrow repository or is the plan to keep a separate repository for datafusion-python but under the apache org? On Sun, Apr 25, 2021 at 10:40 PM Daniël Heres wrote: > Hi Jorge, > > Awesome, I think this is a super valuable addition and

Re: [VOTE] Release Apache Arrow 4.0.0 - RC3

2021-04-26 Thread Neal Richardson
+1 (binding) GitHub Actions verifications are green and R artifact builds are successful. Neal On Mon, Apr 26, 2021 at 6:02 AM Krisztián Szűcs wrote: > On Sun, Apr 25, 2021 at 10:59 PM Sutou Kouhei wrote: > > > > Here: https://github.com/apache/arrow/pull/10126 > I've incorporated the

Re: [DISCUSS] experimental repos

2021-04-26 Thread Jorge Cardoso Leitão
Hi Micah, For code that is mergeable, I would say that a branch is superior, as it keeps lineage and thus enables rebasing. IMO there are two use-cases for this mechanism: * create a new component from scratch (e.g. Ballista, bindings to language Z, python-datafusion). * re-write an existing

Re: [Python] Custom Metadata in PyArrow

2021-04-26 Thread Joris Van den Bossche
On Fri, 23 Apr 2021 at 14:50, Michael Lavina wrote: > Hello Team, > > The docs for Custom Metadata in PyArrow say TODO > https://arrow.apache.org/docs/python/data.html#custom-schema-and-field-metadata > So I am wondering if someone has any example of adding some custom > metadata to PyArrow. >

Re: [Announce][Rust] JIRA Issues migrated to github issues

2021-04-26 Thread Wes McKinney
Thanks for doing this Andrew. On Mon, Apr 26, 2021 at 8:38 AM Andrew Lamb wrote: > I have migrated over all JIRA issues that were marked as "Rust" or > "Rust-DataFusion" to new issues in the https://github.com/apache/arrow-rs > and https://github.com/apache/arrow-datafusion repos respectively.

[Announce][Rust] JIRA Issues migrated to github issues

2021-04-26 Thread Andrew Lamb
I have migrated over all JIRA issues that were marked as "Rust" or "Rust-DataFusion" to new issues in the https://github.com/apache/arrow-rs and https://github.com/apache/arrow-datafusion repos respectively. There are now no open JIRA issues [1] for the Rust implementation. My script moved the

Re: [VOTE] Release Apache Arrow 4.0.0 - RC3

2021-04-26 Thread Krisztián Szűcs
On Sun, Apr 25, 2021 at 10:59 PM Sutou Kouhei wrote: > > Here: https://github.com/apache/arrow/pull/10126 I've incorporated the automatic verification step to the release procedure so we can start the VOTE after having positive feedback from the verification tasks. > > In > "Re: [VOTE] Release

[NIGHTLY] Arrow Build Report for Job nightly-2021-04-26-0

2021-04-26 Thread Crossbow
Arrow Build Report for Job nightly-2021-04-26-0 All tasks: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-04-26-0 Failed Tasks: - conda-linux-gcc-py36-arm64: URL: