Re: [C++] Big-endian support

2020-04-21 Thread Kazuaki Ishizaki
Thank you for your comments. I see that the developers would assist with other parts, too. For developing OSS on big-endian, here are resources for an environment and CI. They would be helpful for code review, too. A trial zLinux VM for OSS development is available. Once we create a VM with RHEL o

Re: [C++] Revamping approach to Arrow compute kernel development

2020-04-21 Thread Micah Kornfield
Hi Wes, I haven't had time to read the doc, but wanted to ask some questions on points raised on the thread. * For efficiency, kernels used for array-expr evaluation should write > into preallocated memory as their default mode. This enables the > interpreter to avoid temporary memory allocations

Re: [DISCUSS] Reducing scope of work for Arrow 1.0.0 release

2020-04-21 Thread Micah Kornfield
Hi Wes, I think we might be closer than we think on the Java side to having the functionality listed (I've added comments inline at the end with the features you listed in the original e-mail). My biggest concern is I don't think there is a clear path forward for Sparse Unions. Getting compatibil

[jira] [Created] (ARROW-8552) [Rust] support column iteration for parquet row

2020-04-21 Thread QP Hou (Jira)
QP Hou created ARROW-8552: - Summary: [Rust] support column iteration for parquet row Key: ARROW-8552 URL: https://issues.apache.org/jira/browse/ARROW-8552 Project: Apache Arrow Issue Type: Improvemen

Re: [C++] Big-endian support

2020-04-21 Thread Micah Kornfield
> > That said, if big-endian developers would assist with > other parts of the C++ project as a sort of "quid-pro-quo" to balance > the time spent on code review relating to big-endian that would be > helpful. I think setting up (and resetting) CI would need to be included in this, otherwise

[jira] [Created] (ARROW-8551) [CI][Gandiva] Use docker image with LLVM 8 to build gandiva linux jar

2020-04-21 Thread Prudhvi Porandla (Jira)
Prudhvi Porandla created ARROW-8551: --- Summary: [CI][Gandiva] Use docker image with LLVM 8 to build gandiva linux jar Key: ARROW-8551 URL: https://issues.apache.org/jira/browse/ARROW-8551 Project: Ap

Re: Gandiva projector for dictionary array

2020-04-21 Thread Yue Ni
Thanks a lot Wes. I will give the arrow::compute::Cast API a try. BTW, although I don't have any working proposal yet, I wonder what format/process we typically follow for such a proposal? I assume I need to do some experiments locally, draft an email describing the proposal, and send it to the de

[jira] [Created] (ARROW-8550) [CI] Don't run cron GHA jobs on forks

2020-04-21 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8550: -- Summary: [CI] Don't run cron GHA jobs on forks Key: ARROW-8550 URL: https://issues.apache.org/jira/browse/ARROW-8550 Project: Apache Arrow Issue Type: Im

Re: 0.17 release blog post: help needed

2020-04-21 Thread Neal Richardson
Hope you weren't editing the Google Doc while I was moving it to https://github.com/apache/arrow-site/pull/55. If so, would you mind copying over any relevant changes? Neal On Tue, Apr 21, 2020 at 1:53 PM Wes McKinney wrote: > I did a few tweaks and cleanups. There are still a number of TODO >

[jira] [Created] (ARROW-8549) [R] Assorted post-0.17 release cleanups

2020-04-21 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8549: -- Summary: [R] Assorted post-0.17 release cleanups Key: ARROW-8549 URL: https://issues.apache.org/jira/browse/ARROW-8549 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-8548) [Website] 0.17 release post

2020-04-21 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-8548: -- Summary: [Website] 0.17 release post Key: ARROW-8548 URL: https://issues.apache.org/jira/browse/ARROW-8548 Project: Apache Arrow Issue Type: Improvement

Re: 0.17 release blog post: help needed

2020-04-21 Thread Wes McKinney
I did a few tweaks and cleanups. There are still a number of TODO items in this document. It would be good to finish (or remove) these so this can be published tomorrow or Thursday. On Mon, Apr 20, 2020 at 7:47 AM Fan Liya wrote: > > I have added some Java items. > > Best, > Liya Fan > > On Mon, A

[jira] [Created] (ARROW-8547) [Rust] Implement JsonEqual for UnionArray

2020-04-21 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8547: -- Summary: [Rust] Implement JsonEqual for UnionArray Key: ARROW-8547 URL: https://issues.apache.org/jira/browse/ARROW-8547 Project: Apache Arrow Issue Type: New Fe

[jira] [Created] (ARROW-8546) [Rust] Handle UnionArray in get_fb_field_type

2020-04-21 Thread Paddy Horan (Jira)
Paddy Horan created ARROW-8546: -- Summary: [Rust] Handle UnionArray in get_fb_field_type Key: ARROW-8546 URL: https://issues.apache.org/jira/browse/ARROW-8546 Project: Apache Arrow Issue Type: Bu

[jira] [Created] (ARROW-8545) Allow fast writing of Decimal column to parquet

2020-04-21 Thread Fons de Leeuw (Jira)
Fons de Leeuw created ARROW-8545: Summary: Allow fast writing of Decimal column to parquet Key: ARROW-8545 URL: https://issues.apache.org/jira/browse/ARROW-8545 Project: Apache Arrow Issue Ty

Re: [DISCUSS] Reducing scope of work for Arrow 1.0.0 release

2020-04-21 Thread Neal Richardson
I'm all for making our next release be 1.0. Everything is about tradeoffs, and while I too would like to see a complete Java implementation, I think the costs of further delaying 1.0 outweigh the benefits of holding it indefinitely in hopes that there will be enough availability of Java developers

[jira] [Created] (ARROW-8544) [CI][Crossbow] Add a status.json to the gh-pages summary of nightly builds to get around rate limiting

2020-04-21 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8544: -- Summary: [CI][Crossbow] Add a status.json to the gh-pages summary of nightly builds to get around rate limiting Key: ARROW-8544 URL: https://issues.apache.org/jira/browse/ARRO

Re: [DISCUSS] Reducing scope of work for Arrow 1.0.0 release

2020-04-21 Thread Wes McKinney
hi Bryan -- with the way that things are going, if we were to block the 1.0.0 release on completing the Java work, it could be a very long time to wait (long time = more than 6 months from now). I don't think that's acceptable. The Versioning document was formally adopted last August and so a year

[jira] [Created] (ARROW-8543) [C++] IO: single pass coalescing algorithm

2020-04-21 Thread Mayur Srivastava (Jira)
Mayur Srivastava created ARROW-8543: --- Summary: [C++] IO: single pass coalescing algorithm Key: ARROW-8543 URL: https://issues.apache.org/jira/browse/ARROW-8543 Project: Apache Arrow Issue T

Re: [DISCUSS] Reducing scope of work for Arrow 1.0.0 release

2020-04-21 Thread Bryan Cutler
I really would like to see a 1.0.0 release with complete implementations for C++ and Java. From my experience, that interoperability has been a major selling point for the project. That being said, my time for contributions has been pretty limited lately and I know that Java has been lagging, so if

Re: Gandiva projector for dictionary array

2020-04-21 Thread Wes McKinney
On Tue, Apr 21, 2020 at 6:34 AM Yue Ni wrote: > > Hi there, > > I am currently using gandiva C++ library doing projection/selection for > Arrow record batch, in my record batch, I have some fields encoded with > dictionary encoding, I wonder how I can apply gandiva functions for these > dictionary

Re: [DISCUSS] Reducing scope of work for Arrow 1.0.0 release

2020-04-21 Thread Wes McKinney
Hi all -- are there some opinions about this? Thanks On Thu, Apr 16, 2020 at 5:30 PM Wes McKinney wrote: > > hi folks, > > Previously we had discussed a plan for making a 1.0.0 release based on > completeness of columnar format integration tests and making > forward/backward compatibility guaran

Re: [VOTE] Release Apache Arrow 0.17.0 - RC0

2020-04-21 Thread Wes McKinney
It looks like the rebase-PR step didn't work correctly per Micah's comment (didn't work on my PR for ARROW-2714 either). Might want to look into why not. On Tue, Apr 21, 2020 at 6:23 AM Krisztián Szűcs wrote: > > On Tue, Apr 21, 2020 at 4:28 AM Andy Grove wrote: > > > > Well, I got the crates pu

Re: [C++] Revamping approach to Arrow compute kernel development

2020-04-21 Thread Wes McKinney
On Tue, Apr 21, 2020 at 7:32 AM Antoine Pitrou wrote: > > > Le 21/04/2020 à 13:53, Wes McKinney a écrit : > >> > >> That said, in the SortToIndices case, this wouldn't be a problem, since > >> only the second pass writes to the output. > > > > This kernel is not valid for normal array-exprs (see t

Re: [C++] Revamping approach to Arrow compute kernel development

2020-04-21 Thread Antoine Pitrou
Le 21/04/2020 à 13:53, Wes McKinney a écrit : >> >> That said, in the SortToIndices case, this wouldn't be a problem, since >> only the second pass writes to the output. > > This kernel is not valid for normal array-exprs (see the spreadsheet I > linked), such as what you can write in SQL > > K

[ANNOUNCE] Apache Arrow 0.17.0 released

2020-04-21 Thread Krisztián Szűcs
The Apache Arrow community is pleased to announce the 0.17.0 release. The release includes 582 resolved issues ([1]) since the 0.16.0 release. The release is available now from our website, [2] and [3]: https://arrow.apache.org/install/ Release notes are available at: https://arrow.apache

[jira] [Created] (ARROW-8542) [Release] Fix checksum url in the website post release script

2020-04-21 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8542: -- Summary: [Release] Fix checksum url in the website post release script Key: ARROW-8542 URL: https://issues.apache.org/jira/browse/ARROW-8542 Project: Apache Arrow

Re: [C++] Revamping approach to Arrow compute kernel development

2020-04-21 Thread Wes McKinney
hi Antoine, On Tue, Apr 21, 2020 at 4:54 AM Antoine Pitrou wrote: > > > Le 21/04/2020 à 11:13, Antoine Pitrou a écrit : > > > It would be interesting to know how costly repeated > allocation/deallocation is. Modern allocators like jemalloc do their > own caching instead of always returning memo

Re: [C++] Big-endian support

2020-04-21 Thread Wes McKinney
I will add that I think big-endian support would be valuable so that the library can be used everywhere, including more exotic mainframe type systems like IBM Z. That said, the code review burden to other C++ developers is likely to become significant, so a solo developer with access to big-endian

Re: [C++] Revamping approach to Arrow compute kernel development

2020-04-21 Thread Wes McKinney
Hi Sven, On Mon, Apr 20, 2020 at 11:49 PM Sven Wagner-Boysen wrote: > > Hi Wes, > > I think reducing temporary memory allocation is a great effort and will > show great benefit in compute intensive scenarios. > As we are mainly working with the Rust and Datafusion part of the Arrow > project I wa

[jira] [Created] (ARROW-8541) [Release] Don't remove previous source releases automatically

2020-04-21 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8541: -- Summary: [Release] Don't remove previous source releases automatically Key: ARROW-8541 URL: https://issues.apache.org/jira/browse/ARROW-8541 Project: Apache Arrow

Gandiva projector for dictionary array

2020-04-21 Thread Yue Ni
Hi there, I am currently using the Gandiva C++ library to do projection/selection on Arrow record batches. In my record batch, I have some fields encoded with dictionary encoding, and I wonder how I can apply Gandiva functions to these dictionary encoded fields. Currently, there is no gandiva function ha

Re: [VOTE] Release Apache Arrow 0.17.0 - RC0

2020-04-21 Thread Krisztián Szűcs
On Tue, Apr 21, 2020 at 4:28 AM Andy Grove wrote: > > Well, I got the crates published, but there's a nasty workaround for users > that want to use these crates as a dependency and it means there is no real > dependency management on the Flight protocol version. I think the answer is > that we ne

[NIGHTLY] Arrow Build Report for Job nightly-2020-04-21-0

2020-04-21 Thread Crossbow
Arrow Build Report for Job nightly-2020-04-21-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-21-0 Failed Tasks: - centos-8-amd64: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-04-21-0-github-centos-8-amd64 - debian-buster-a

[jira] [Created] (ARROW-8540) [C++] Create memory allocation benchmark

2020-04-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8540: - Summary: [C++] Create memory allocation benchmark Key: ARROW-8540 URL: https://issues.apache.org/jira/browse/ARROW-8540 Project: Apache Arrow Issue Type: W

Re: [C++] Revamping approach to Arrow compute kernel development

2020-04-21 Thread Antoine Pitrou
Le 21/04/2020 à 11:13, Antoine Pitrou a écrit : > > This assumes that all these kernels can safely write into one of their > inputs. This should be true for trivial ones, but not if e.g. a kernel > makes two passes over its input. For example, the SortToIndices kernel > first scans the input f

[jira] [Created] (ARROW-8539) [CI] "AMD64 MacOS 10.15 GLib & Ruby" fails

2020-04-21 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8539: - Summary: [CI] "AMD64 MacOS 10.15 GLib & Ruby" fails Key: ARROW-8539 URL: https://issues.apache.org/jira/browse/ARROW-8539 Project: Apache Arrow Issue Type:

Re: [C++] Revamping approach to Arrow compute kernel development

2020-04-21 Thread Antoine Pitrou
Hi Wes, Le 18/04/2020 à 23:41, Wes McKinney a écrit : > > There are some problems with our current collection of kernels in the > context of array-expr evaluation in query processing: > > * For efficiency, kernels used for array-expr evaluation should write > into preallocated memory as their