[jira] [Created] (ARROW-7625) Parquet GLib and Red Parquet (Ruby) do not allow specifying compression type

2020-01-20 Thread Keith Gable (Jira)
Keith Gable created ARROW-7625: Summary: Parquet GLib and Red Parquet (Ruby) do not allow specifying compression type Key: ARROW-7625 URL: https://issues.apache.org/jira/browse/ARROW-7625 Project: Apach

[jira] [Created] (ARROW-7624) [Rust] Soundness issues via `Buffer` methods

2020-01-20 Thread Jim Turner (Jira)
Jim Turner created ARROW-7624: Summary: [Rust] Soundness issues via `Buffer` methods Key: ARROW-7624 URL: https://issues.apache.org/jira/browse/ARROW-7624 Project: Apache Arrow Issue Type: Bug

Re: [Format] Make fields required?

2020-01-20 Thread Wes McKinney
To help with the discussion, here is a patch with 9 "definitely required" fields made required, and the associated generated C++ changes https://github.com/apache/arrow/compare/master...wesm:flatbuffers-required (I am not 100% sure about Field.children always being non-null, if there were some do

[jira] [Created] (ARROW-7623) [C++] Update generated flatbuffers files

2020-01-20 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7623: Summary: [C++] Update generated flatbuffers files Key: ARROW-7623 URL: https://issues.apache.org/jira/browse/ARROW-7623 Project: Apache Arrow Issue Type: T

[jira] [Created] (ARROW-7622) [Format] Mark Tensor and SparseTensor fields required

2020-01-20 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7622: Summary: [Format] Mark Tensor and SparseTensor fields required Key: ARROW-7622 URL: https://issues.apache.org/jira/browse/ARROW-7622 Project: Apache Arrow

Re: [DISCUSS] C Data Interface, take 2

2020-01-20 Thread Wes McKinney
Independent of the particulars of the discussion, the C++ project needs to be free to create a C API for itself. If you want to try to block the C++ contributors from doing this we may be barreling toward a governance crisis in the project. I'm stepping back from this discussion for a time now to a

Re: [DISCUSS] C Data Interface, take 2

2020-01-20 Thread Jacques Nadeau
I don't see this as an endogenous concern of the C++ project. I appreciate your goal with saying so but I think this has broader ramifications around fragmentation of the project. The core challenge that we're dealing with is we introduced foundational concepts in some implementations that go beyo

Re: [Format] Make fields required?

2020-01-20 Thread Wes McKinney
On Mon, Jan 20, 2020 at 12:20 PM Jacques Nadeau wrote:
> > I think what we have determined is that the changes that are being
> > discussed in this thread would not render any existing serialized
> > Flatbuffers unreadable, unless they are malformed / unable to be
> > read with the current l

[jira] [Created] (ARROW-7621) [Doc] Doc build fails

2020-01-20 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7621: Summary: [Doc] Doc build fails Key: ARROW-7621 URL: https://issues.apache.org/jira/browse/ARROW-7621 Project: Apache Arrow Issue Type: Bug Compon

Re: [Format] Make fields required?

2020-01-20 Thread Jacques Nadeau
> I think what we have determined is that the changes that are being
> discussed in this thread would not render any existing serialized
> Flatbuffers unreadable, unless they are malformed / unable to be
> read with the current libraries.
I think we need to separate two different things: Poin

Re: [DISCUSS] C Data Interface, take 2

2020-01-20 Thread Wes McKinney
hi Jacques, Taking a step back from the discussion, the original problem statement was to enable third party projects to produce the data structure used by C++ Array classes in C without depending on the C++ code. That's the ArrayData class here https://github.com/apache/arrow/blob/master/cpp/src

Re: [DISCUSS] C Data Interface, take 2

2020-01-20 Thread Jacques Nadeau
As I noted on the pull request, I think fundamentally this work is at odds with the Arrow specification and being used to introduce a shadow specification. I don't think our intentions about how people should use something really influence how people will actually use or perceive it. They'll just

Re: [DISCUSS] C Data Interface, take 2

2020-01-20 Thread Wes McKinney
hi folks, I just made a comment in https://github.com/apache/arrow/pull/6026 that I wanted to surface here on the mailing list. It seems that to reach consensus for a C interface that is intended to be broadly used by multiple programming languages, we may make some compromises that harm or outri

Re: [Format] Make fields required?

2020-01-20 Thread Wes McKinney
> Unless I'm misunderstanding your proposal, that doesn't deal with the data
> that has already been produced that may have been written in a way that
> this change finds non-consumable but works today.
I think what we have determined is that the changes that are being discussed in this thread wou

Re: [Format] Make fields required?

2020-01-20 Thread Antoine Pitrou
On 20/01/2020 at 17:17, Jacques Nadeau wrote:
>> To be clear, I agree that we need to check that our various validation
>> and integration suites pass properly. But once that is done and
>> assuming all the metadata variations are properly tested, data
>> variations should not pose any prob

Re: [Format] Make fields required?

2020-01-20 Thread Jacques Nadeau
> To be clear, I agree that we need to check that our various validation
> and integration suites pass properly. But once that is done and
> assuming all the metadata variations are properly tested, data
> variations should not pose any problem.
Unless I'm misunderstanding your proposal, that

[jira] [Created] (ARROW-7620) [Rust] Windows builds failing due to flatbuffer compile error

2020-01-20 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-7620: Summary: [Rust] Windows builds failing due to flatbuffer compile error Key: ARROW-7620 URL: https://issues.apache.org/jira/browse/ARROW-7620 Project: Apache Arrow

Re: [Format] Make fields required?

2020-01-20 Thread Antoine Pitrou
On 20/01/2020 at 16:26, Jacques Nadeau wrote:
> I think it is too late in the game to make this fundamental change. It
> would be very hard to assess whether it is no op or has massive
> implications to existing datasets. Just among Dremio customers in the 30
> days we stored more than 100mm da

[jira] [Created] (ARROW-7619) [Crossbow] Consider removing artifact patterns

2020-01-20 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7619: Summary: [Crossbow] Consider removing artifact patterns Key: ARROW-7619 URL: https://issues.apache.org/jira/browse/ARROW-7619 Project: Apache Arrow Issue

Re: [Format] Make fields required?

2020-01-20 Thread Jacques Nadeau
I think it is too late in the game to make this fundamental change. It would be very hard to assess whether it is no op or has massive implications to existing datasets. Just among Dremio customers in the 30 days we stored more than 100mm datasets that leveraged the current format. I'm supportive

[jira] [Created] (ARROW-7618) [C++] Fix crashes or undefined behaviour on corrupt IPC input

2020-01-20 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-7618: Summary: [C++] Fix crashes or undefined behaviour on corrupt IPC input Key: ARROW-7618 URL: https://issues.apache.org/jira/browse/ARROW-7618 Project: Apache Arrow

[jira] [Created] (ARROW-7617) [Python] Slices of Dataframes with Categorical columns are not respected in write_to_dataset

2020-01-20 Thread Vladimir (Jira)
Vladimir created ARROW-7617: Summary: [Python] Slices of Dataframes with Categorical columns are not respected in write_to_dataset Key: ARROW-7617 URL: https://issues.apache.org/jira/browse/ARROW-7617 Proj

[jira] [Created] (ARROW-7616) [Java] Support comparing value ranges for dense union vector

2020-01-20 Thread Liya Fan (Jira)
Liya Fan created ARROW-7616: Summary: [Java] Support comparing value ranges for dense union vector Key: ARROW-7616 URL: https://issues.apache.org/jira/browse/ARROW-7616 Project: Apache Arrow Issu

[jira] [Created] (ARROW-7615) [CI][Gandiva] Ensure that the gandiva jar has only a whitelisted set of shared dependencies as part of Travis CI job

2020-01-20 Thread Projjal Chanda (Jira)
Projjal Chanda created ARROW-7615: Summary: [CI][Gandiva] Ensure that the gandiva jar has only a whitelisted set of shared dependencies as part of Travis CI job Key: ARROW-7615 URL: https://issues.apache.org/jira