Re: [DISCUSS] Proposal about integration test of arrow parquet reader

2019-10-08 Thread Renjie Liu
On Wed, Oct 9, 2019 at 12:11 PM Andy Grove wrote: > I'm very interested in helping to find a solution to this because we really > do need integration tests for Rust to make sure we're compatible with other > implementations... there is also the ongoing CI dockerization work that I > feel is

Re: [DISCUSS] Proposal about integration test of arrow parquet reader

2019-10-08 Thread Andy Grove
I'm very interested in helping to find a solution to this because we really do need integration tests for Rust to make sure we're compatible with other implementations... there is also the ongoing CI dockerization work that I feel is related. I haven't looked at the current integration tests yet

[DISCUSS] Proposal about integration test of arrow parquet reader

2019-10-08 Thread Renjie Liu
Hi: I'm developing a Rust version of a reader that reads Parquet into Arrow arrays. To verify the correctness of this reader, I use the following approach: 1. Define a schema with protobuf. 2. Generate JSON data for this schema using another language with a more sophisticated implementation (e.g.

[jira] [Created] (ARROW-6822) [Website] merge_pr.py is published

2019-10-08 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-6822: --- Summary: [Website] merge_pr.py is published Key: ARROW-6822 URL: https://issues.apache.org/jira/browse/ARROW-6822 Project: Apache Arrow Issue Type:

Table.cast throws ArrowNotImplementedError (pyarrow==0.15.0)

2019-10-08 Thread Lucas Pickup
So it seems in 'pyarrow==0.15.0' `Table.columns` now returns ChunkedArray instead of Column. This has broken `Table.cast()` as it just calls `Table.itercolumns` and expects the yielded values to have a `.cast()` method, which ChunkedArray doesn't. Was `Table.cast()` missed in cleaning up after

Re: PSA: TensorFlow Extended (TFX) is proposing to use Arrow for in-memory representations

2019-10-08 Thread Wes McKinney
Thanks Micah, I've been following the discussion. I encourage others to participate as well On Mon, Oct 7, 2019 at 3:11 PM Micah Kornfield wrote: > > The proposal is in PR form at: [1]. I thought I'd mention it here in case > people are interested but haven't seen it yet. > > [1]

[jira] [Created] (ARROW-6821) [C++][Parquet] Do not require Thrift compiler when building (but still require library)

2019-10-08 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6821: --- Summary: [C++][Parquet] Do not require Thrift compiler when building (but still require library) Key: ARROW-6821 URL: https://issues.apache.org/jira/browse/ARROW-6821

Re: [DISCUSS] C-level in-process array protocol

2019-10-08 Thread Wes McKinney
On Tue, Oct 8, 2019 at 3:34 PM Wes McKinney wrote: > > hi Jacques, > > On Tue, Oct 8, 2019 at 1:54 PM Jacques Nadeau wrote: > > > > I'm removing all my objections to this work. > > > > I wish there was more feedback from additional community members. I > > continue to be concerned about

Re: [DISCUSS] C-level in-process array protocol

2019-10-08 Thread Wes McKinney
hi Jacques, On Tue, Oct 8, 2019 at 1:54 PM Jacques Nadeau wrote: > > I'm removing all my objections to this work. > > I wish there was more feedback from additional community members. I continue > to be concerned about fragmentation. I don't agree with the arguments here > that we need to add a

Re: [DISCUSS] C-level in-process array protocol

2019-10-08 Thread Uwe L. Korn
I'm not sure whether flatbuffers is actually an issue in the end, but keeping it out of the C-API definitely simplifies it a bit adoption-wise. I don't think, though, that using protobuf would make a difference here. In general, I really like the C-interface work as sadly C-APIs are still the

Re: [DISCUSS] C-level in-process array protocol

2019-10-08 Thread Jacques Nadeau
I'm removing all my objections to this work. I wish there was more feedback from additional community members. I continue to be concerned about fragmentation. I don't agree with the arguments here that we need to add a new API to make it easy for people to *not* use the Arrow codebase. It seems like a

[jira] [Created] (ARROW-6820) [C++] [Doc] Map specification and implementation inconsistent

2019-10-08 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6820: - Summary: [C++] [Doc] Map specification and implementation inconsistent Key: ARROW-6820 URL: https://issues.apache.org/jira/browse/ARROW-6820 Project: Apache Arrow

[jira] [Created] (ARROW-6819) arrow::read_parquet ignores as_data_frame when sparklyr package is attached

2019-10-08 Thread Ryan Patrick Kyle (Jira)
Ryan Patrick Kyle created ARROW-6819: Summary: arrow::read_parquet ignores as_data_frame when sparklyr package is attached Key: ARROW-6819 URL: https://issues.apache.org/jira/browse/ARROW-6819

[jira] [Created] (ARROW-6818) [Doc] Format docs confusing

2019-10-08 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6818: - Summary: [Doc] Format docs confusing Key: ARROW-6818 URL: https://issues.apache.org/jira/browse/ARROW-6818 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-6817) dynamic_cast fails on Mac C++

2019-10-08 Thread Henri Gough (Jira)
Henri Gough created ARROW-6817: -- Summary: dynamic_cast fails on Mac C++ Key: ARROW-6817 URL: https://issues.apache.org/jira/browse/ARROW-6817 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-6816) [Archery] Cleanup integration module to use companion classes

2019-10-08 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6816: - Summary: [Archery] Cleanup integration module to use companion classes Key: ARROW-6816 URL: https://issues.apache.org/jira/browse/ARROW-6816

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-08 Thread Wes McKinney
New draft ## Description: The mission of Apache Arrow is the creation and maintenance of software related to columnar in-memory processing and data interchange ## Issues: * We are struggling with Continuous Integration scalability as the project has definitely outgrown what Travis CI and

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-08 Thread Wes McKinney
Yes, I agree with raising the issue to the board. On Tue, Oct 8, 2019 at 8:31 AM Antoine Pitrou wrote: > > > I agree. Especially given that the constraints imposed by Infra don't > help solve the problem. > > Regards > > Antoine. > > > On 08/10/2019 at 15:02, Uwe L. Korn wrote: > > I'm not

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-08 Thread Antoine Pitrou
I agree. Especially given that the constraints imposed by Infra don't help solve the problem. Regards Antoine. On 08/10/2019 at 15:02, Uwe L. Korn wrote: > I'm not sure what qualifies for "board attention" but it seems that CI is a > critical problem in Apache projects, not just Arrow.

Re: [DRAFT] Apache Arrow Board Report - October 2019

2019-10-08 Thread Uwe L. Korn
I'm not sure what qualifies for "board attention" but it seems that CI is a critical problem in Apache projects, not just Arrow. Should we raise that? Uwe On Tue, Oct 8, 2019, at 12:00 AM, Wes McKinney wrote: > Here is a start for our Q3 board report > > ## Description: > The mission of Apache

[jira] [Created] (ARROW-6815) Timestamps saved via Pandas and PyArrow unreadable in Hive and Presto

2019-10-08 Thread Mark Litwintschik (Jira)
Mark Litwintschik created ARROW-6815: Summary: Timestamps saved via Pandas and PyArrow unreadable in Hive and Presto Key: ARROW-6815 URL: https://issues.apache.org/jira/browse/ARROW-6815 Project: