Re: [Discuss] Single offset per array has a non-trivial performance implication

2021-10-27 Thread Jorge Cardoso Leitão
Hi, > A big +1 to this, covering all the edge cases with slices is pretty complicated (there was at least one long-standing bug related to this in the 6.0 release). I imagine there are potentially more lurking in the codebase. Thanks for this observation, arrow-rs faces a similar issue: it is

Re: [VOTE][RESULT] Release Apache Arrow 6.0.0 - RC3

2021-10-27 Thread Dominik Moritz
I think it would be good to make RC releases [1] for the JavaScript package as well so that we can test them out easily. Even better, if we could make regular releases (tagged with e.g. `canary`) when a JS feature is merged, people could find issues even before we start the release process. [1]

Re: [VOTE][RESULT] Release Apache Arrow 6.0.0 - RC3

2021-10-27 Thread Neal Richardson
Looking ahead, is there anything we can add to CI and/or release verification to catch things like this before the release? There already is a 6.0.1 fix version in Jira so please tag any issues accordingly. Neal On Wed, Oct 27, 2021 at 7:07 PM Matthew Topol wrote: > There was also an issue

Re: [VOTE] Remove compute from Arrow JS

2021-10-27 Thread Brian Hulette
+1 I don't think there's much reason to keep the compute code around when there's a more performant, easier to use alternative. I think the only unique feature of the arrow compute code was the ability to optimize queries on dictionary-encoded columns, but Jeff added this to Arquero almost a year

[VOTE] Remove compute from Arrow JS

2021-10-27 Thread Dominik Moritz
Dear Arrow community, We are proposing to remove the compute code from Arrow JS. Right now, the compute code is encapsulated in a DataFrame class that extends Table. The DataFrame implements a few functions such as filtering and counting with expressions. However, the predicate code is not very

RE: [VOTE][RESULT] Release Apache Arrow 6.0.0 - RC3

2021-10-27 Thread Matthew Topol
There was also an issue found with the Go package deployment which prevents actually using `go get -u github.com/apache/arrow/go/arrow@v6.0.0`. The fix is in https://github.com/apache/arrow/pull/11566. I second the request for a 6.0.1 release with a patch before the release post goes out to

Re: [VOTE][RESULT] Release Apache Arrow 6.0.0 - RC3

2021-10-27 Thread Dominik Moritz
We have found an issue with the JavaScript packages that prevents people from using Arrow in browser bundles. The fix is in https://github.com/apache/arrow/pull/11565. Could we make a 6.0.1 release of the NPM package with the patch before the release post goes out? On Oct 27, 2021 at 02:55:20,

Re: Arrow in HPC

2021-10-27 Thread David Li
Thanks for the clarification, Yibo; looking forward to the results. Even if it is a very hacky PoC, it will be interesting to see how it affects performance, though as Keith points out there are benefits in general to UCX (or a similar library), and we can work out the implementation plan from

Re: [DISCUSS][Rust] Biweekly sync call for arrow/datafusion again?

2021-10-27 Thread Benson Muite
Cannot host at 4:00 UTC on 28 October, but can host at 4:00 UTC on 11 November. Benson On 10/27/21 1:02 PM, Andrew Lamb wrote: We have some proposed agenda items[1] for the Rust sync[2] this week so I will plan to see anyone who is interested tomorrow. Andrew [1]

Re: [Discuss] Single offset per array has a non-trivial performance implication

2021-10-27 Thread Antoine Pitrou
On 26/10/2021 at 21:30, Jorge Cardoso Leitão wrote: Hi, One aspect of the design of "arrow2" is that it deals with array slices differently from the rest of the implementations. Essentially, the offset is not stored in ArrayData, but on each individual Buffer. Some important consequence

Re: Arrow sync call October 27 at 12:00 US/Eastern, 16:00 UTC

2021-10-27 Thread Nic
Meeting notes below. Attendees: Nic Crane, Jonathan Keane, Eduardo Ponce, Niranda Perera, Benson Muite, Micah Kornfield, Joris Van den Bossche. Discussion: - Update on release - the vote passed and post-release tasks are in progress. The R package has been submitted to CRAN and will be resubmitted shortly as

Re: Arrow in HPC

2021-10-27 Thread Benson Muite
UCX is interesting, relatively new and seems like it may be easier to integrate. MPI is the most commonly used backend for HPC. Influencing the development of UCX is more difficult than influencing the development of MPI, but both have a slower pace of development than Arrow. One may want to

Re: [DISCUSS][Rust] Biweekly sync call for arrow/datafusion again?

2021-10-27 Thread Andrew Lamb
We have some proposed agenda items[1] for the Rust sync[2] this week so I will plan to see anyone who is interested tomorrow. Andrew [1] https://docs.google.com/document/d/1atCVnoff5SR4eM4Lwf2M1BBJTY6g3_HUNR6qswYJW_U/edit# [2]

Re: [VOTE][RESULT] Release Apache Arrow 6.0.0 - RC3

2021-10-27 Thread Sutou Kouhei
1. [in-pr] bump version numbers 2. [done] upload source 3. [done] upload binaries 4. [in-pr] update website 5. [depends-on-brew] upload ruby gems 6. [done] upload js packages 8. [done] upload C# packages 10. [ ] update conda recipes 11. [done] upload wheels/sdist to pypi 12. [ ] update homebrew