Re: [DISCUSS] Format changes: process and requirements

2019-03-17 Thread Paul Taylor
Hi Jacques, > I think we should have two complete implementations. I don't think having one feature in C# and Go and another in JavaScript and Rust does justice to the project goals. Agree 100%. We may already be in this situation with the DictionaryBatch "isDelta" flag. I haven't checked the C

[jira] [Created] (ARROW-4941) [Rust] Enhance documentation for parquet

2019-03-17 Thread Owen Nelson (JIRA)
Owen Nelson created ARROW-4941: -- Summary: [Rust] Enhance documentation for parquet Key: ARROW-4941 URL: https://issues.apache.org/jira/browse/ARROW-4941 Project: Apache Arrow Issue Type: Improve

[jira] [Created] (ARROW-4940) [Rust] Enhance documentation for datafusion

2019-03-17 Thread Owen Nelson (JIRA)
Owen Nelson created ARROW-4940: -- Summary: [Rust] Enhance documentation for datafusion Key: ARROW-4940 URL: https://issues.apache.org/jira/browse/ARROW-4940 Project: Apache Arrow Issue Type: Impr

[jira] [Created] (ARROW-4939) [Python] Add wrapper for "sum" kernel

2019-03-17 Thread Philipp Moritz (JIRA)
Philipp Moritz created ARROW-4939: - Summary: [Python] Add wrapper for "sum" kernel Key: ARROW-4939 URL: https://issues.apache.org/jira/browse/ARROW-4939 Project: Apache Arrow Issue Type: Impr

Re: [DISCUSS] Format changes: process and requirements

2019-03-17 Thread Micah Kornfield
I think this makes sense, and it is a good clarification on process. It might be a good idea to also give a preliminary vote on an existing backlog of JIRAs so people don't waste time starting PRs if they won't be supported (or supported via a custom metadata type). I've included a list below doi

Re: [DISCUSS] Format changes: process and requirements

2019-03-17 Thread Jacques Nadeau
> > How about "at least two native implementations" instead of > "Java and C++"? Now, we have multiple native > implementations: > I think we should have two complete implementations. I don't think having one feature in C# and Go and another in JavaScript and Rust does justice to the project goals

Re: [DISCUSS] Format changes: process and requirements

2019-03-17 Thread Kouhei Sutou
Hi, Basically, I agree with the proposal. > reference implementation/support in Java and C++ How about "at least two native implementations" instead of "Java and C++"? Now, we have multiple native implementations: * C++ * C# * Go * Java * JavaScript * Rust So, we can choose at leas

[jira] [Created] (ARROW-4938) [Glib] Undefined symbols error occurred when GIR file is being generated.

2019-03-17 Thread Kenta Murata (JIRA)
Kenta Murata created ARROW-4938: --- Summary: [Glib] Undefined symbols error occurred when GIR file is being generated. Key: ARROW-4938 URL: https://issues.apache.org/jira/browse/ARROW-4938 Project: Apache

Re: [DISCUSS] Format changes: process and requirements

2019-03-17 Thread Wes McKinney
hi Jacques, I agree with your reasoning. I think it's important to go through the implementation exercise to make sure we have the details right and everyone is on board with adding something new (or changing or deprecating something existing). I also agree with formalizing the voting process with

[jira] [Created] (ARROW-4937) [R] Clean pkg-config related logic

2019-03-17 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-4937: --- Summary: [R] Clean pkg-config related logic Key: ARROW-4937 URL: https://issues.apache.org/jira/browse/ARROW-4937 Project: Apache Arrow Issue Type: Improvement

[DISCUSS] Format changes: process and requirements

2019-03-17 Thread Jacques Nadeau
I want to bring up a concern I have with the recent changes to the format. To me, part of the strength of the project is that you have multiple bindings for Arrow that are cross-compatible and consistent. As such, I'd like to propose the following process when dealing with format changes: - Form

Re: [Discuss][Java, Non-C++ generally] Support for 64-bit int array lengths?

2019-03-17 Thread Jacques Nadeau
I definitely think it makes sense to introduce a second list vector and enhance complexwriter/field reader to support longs in Java. I wouldn't replace the existing list vector or associated apis. I'm up for ArrowBuf changes to use long indexes after we get it pointing at arbitrary memory as well

[jira] [Created] (ARROW-4936) [Rust] Add parquet test file for all supported types in 2.5.0 format

2019-03-17 Thread Andy Grove (JIRA)
Andy Grove created ARROW-4936: - Summary: [Rust] Add parquet test file for all supported types in 2.5.0 format Key: ARROW-4936 URL: https://issues.apache.org/jira/browse/ARROW-4936 Project: Apache Arrow

[jira] [Created] (ARROW-4935) Errors from jemalloc when building pyarrow from source on OSX and Debian

2019-03-17 Thread Gregory Hayes (JIRA)
Gregory Hayes created ARROW-4935: Summary: Errors from jemalloc when building pyarrow from source on OSX and Debian Key: ARROW-4935 URL: https://issues.apache.org/jira/browse/ARROW-4935 Project: Apach

[jira] [Created] (ARROW-4934) [Python] Address deprecation notice that will be a bug in Python 3.8

2019-03-17 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-4934: --- Summary: [Python] Address deprecation notice that will be a bug in Python 3.8 Key: ARROW-4934 URL: https://issues.apache.org/jira/browse/ARROW-4934 Project: Apache Arr

Re: Parquet test files with all data types

2019-03-17 Thread Wes McKinney
As an aside (and probably a discussion for the Parquet community) it would be useful to develop an integration testing framework similar to what we've done for Arrow (with JSON "point of truth") but for Parquet files. The limited integration testing across Parquet implementations is definitely conc

[jira] [Created] (ARROW-4933) [R] Autodetect Parquet support using pkg-config

2019-03-17 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4933: -- Summary: [R] Autodetect Parquet support using pkg-config Key: ARROW-4933 URL: https://issues.apache.org/jira/browse/ARROW-4933 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-4932) [GLib] Use G_DECLARE_DERIVABLE_TYPE macro

2019-03-17 Thread Kenta Murata (JIRA)
Kenta Murata created ARROW-4932: --- Summary: [GLib] Use G_DECLARE_DERIVABLE_TYPE macro Key: ARROW-4932 URL: https://issues.apache.org/jira/browse/ARROW-4932 Project: Apache Arrow Issue Type: Task

[jira] [Created] (ARROW-4931) [C++] CMake fails on gRPC ExternalProject

2019-03-17 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-4931: -- Summary: [C++] CMake fails on gRPC ExternalProject Key: ARROW-4931 URL: https://issues.apache.org/jira/browse/ARROW-4931 Project: Apache Arrow Issue Type: Bug

Re: CMake refactor Heads-up

2019-03-17 Thread Suvayu Ali
On Sat, Mar 16, 2019 at 04:31:32PM +0100, Uwe L. Korn wrote: > > > 4. AFAIU, the pyarrow build expects the libraries in > > $CMAKE_INSTALL_PREFIX/lib. This will never be accepted by a distro. I do > > realise this one is probably hard to resolve, given how the builds are > > setup at the mome

[jira] [Created] (ARROW-4930) Remove LIBDIR assumptions in Python build

2019-03-17 Thread Suvayu Ali (JIRA)
Suvayu Ali created ARROW-4930: - Summary: Remove LIBDIR assumptions in Python build Key: ARROW-4930 URL: https://issues.apache.org/jira/browse/ARROW-4930 Project: Apache Arrow Issue Type: Improvem

Re: Parquet test files with all data types

2019-03-17 Thread Uwe L. Korn
Hello Andy, I guess these files stem from the beginning of the Parquet format, when only INT96 timestamps were available. Feel free to add more of them. Using the Java implementation is best; it is definitely the reference, given its age and wide usage. Uwe > Am 17.03.2019 um 00:55 schrieb And