[jira] [Created] (ARROW-6911) [Java] Provide composite comparator

2019-10-16 Thread Liya Fan (Jira)
Liya Fan created ARROW-6911: Summary: [Java] Provide composite comparator Key: ARROW-6911 URL: https://issues.apache.org/jira/browse/ARROW-6911 Project: Apache Arrow Issue Type: New Feature

Re: [Discuss] Streaming: Differentiate between length of RecordBatch and utilized portion-- common use-case?

2019-10-16 Thread Micah Kornfield
Hi John and Wes, A few thoughts: One of the issues that we didn't get into in prior discussions is that the proposal essentially changes the unit of exchange from RecordBatches to a segment of a RecordBatch. I think I brought this up earlier in discussions, an interesting idea that Trill [1], a

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-10-16 Thread David Li
I was definitely considering having control messages without data, and I thought that could be encoded by a FlightData with only app_metadata set. I think I understand your position now: FlightData should always carry (some) data (with optional metadata)? That makes sense to me, and is consistent

[jira] [Created] (ARROW-6910) pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits

2019-10-16 Thread V Luong (Jira)
V Luong created ARROW-6910: Summary: pyarrow.parquet.read_table(...) takes up lots of memory which is not released until program exits Key: ARROW-6910 URL: https://issues.apache.org/jira/browse/ARROW-6910

Re: [Discuss] Streaming: Differentiate between length of RecordBatch and utilized portion-- common use-case?

2019-10-16 Thread John Muehlhausen
"that's where the danger lies" What danger? I have no idea what the specific danger is, assuming that all reference implementations have test cases that hedge around this. I contend that it can only be useful and will never be harmful. What are the counter-examples of concrete harm?

[jira] [Created] (ARROW-6909) [Python] Define PyObjectBuffer with Py_XDECREF logic in destructor for object array memory

2019-10-16 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6909: Summary: [Python] Define PyObjectBuffer with Py_XDECREF logic in destructor for object array memory Key: ARROW-6909 URL: https://issues.apache.org/jira/browse/ARROW-6909

Re: [Discuss] Streaming: Differentiate between length of RecordBatch and utilized portion-- common use-case?

2019-10-16 Thread Wes McKinney
On Wed, Oct 16, 2019 at 12:32 PM John Muehlhausen wrote: > > I really need to "get into the zone" on some other development today, but I > want to remind us of something earlier in the thread that gave me the > impression I wasn't stomping on too many paradigms with this proposal: > > Wes: ``So

Re: [Discuss] Streaming: Differentiate between length of RecordBatch and utilized portion-- common use-case?

2019-10-16 Thread Wes McKinney
hi John, On Wed, Oct 16, 2019 at 11:59 AM John Muehlhausen wrote: > > I'm in Python, I'm a user, and I'm not allowed to import pyarrow because it > isn't for me. I think you're misrepresenting what I'm saying. It's our expectation that users will largely consume pyarrow indirectly as a

[jira] [Created] (ARROW-6908) Add support for Bazel

2019-10-16 Thread Aryan Naraghi (Jira)
Aryan Naraghi created ARROW-6908: Summary: Add support for Bazel Key: ARROW-6908 URL: https://issues.apache.org/jira/browse/ARROW-6908 Project: Apache Arrow Issue Type: New Feature

[jira] [Created] (ARROW-6907) Allow Plasma store to batch notifications to clients

2019-10-16 Thread Danyang (Jira)
Danyang created ARROW-6907: Summary: Allow Plasma store to batch notifications to clients Key: ARROW-6907 URL: https://issues.apache.org/jira/browse/ARROW-6907 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-6906) Use re2 instead of std::regex in Dataset partitionschemes implementation

2019-10-16 Thread Prudhvi Porandla (Jira)
Prudhvi Porandla created ARROW-6906: Summary: Use re2 instead of std::regex in Dataset partitionschemes implementation Key: ARROW-6906 URL: https://issues.apache.org/jira/browse/ARROW-6906

Re: [Discuss] Streaming: Differentiate between length of RecordBatch and utilized portion-- common use-case?

2019-10-16 Thread John Muehlhausen
I'm in Python, I'm a user, and I'm not allowed to import pyarrow because it isn't for me. There exist some Arrow record batches in plasma. I need to get one slice of one batch as a pandas dataframe. What do I do? There exist some Arrow record batches in a file. I need to get one slice of

Re: [NIGHTLY] Arrow Build Report for Job nightly-2019-10-16-0

2019-10-16 Thread Krisztián Szűcs
The OSX builds are failing because Homebrew tries to compile the dependencies instead of installing the precompiled binaries. It might be because of the outdated Xcode version we use; perhaps brew has stopped providing binaries for older Xcode. I've created a tracking Jira

[jira] [Created] (ARROW-6905) [Packaging][OSX] Nightly builds on MacOS are failing because of brew compile timeouts

2019-10-16 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-6905: Summary: [Packaging][OSX] Nightly builds on MacOS are failing because of brew compile timeouts Key: ARROW-6905 URL: https://issues.apache.org/jira/browse/ARROW-6905

Re: Arrow sync call October 16 at 12:00 US/Eastern, 16:00 UTC

2019-10-16 Thread Neal Richardson
Attendees: Micah Kornfield, Uwe Korn, Bryan Cutler, Rok Mihevc, Prudhvi Porandla, Ursa Labs (Antoine, Ben, François, Joris, Krisztián, Neal, Wes, in the same room!) Discussion: * Cython in conda: Uwe to update * When to do 0.15.1? There are only 2 open issues left tagged with 0.15.1. Only bug fixes.

[jira] [Created] (ARROW-6904) [Python] Implement MapArray and MapType

2019-10-16 Thread Bryan Cutler (Jira)
Bryan Cutler created ARROW-6904: Summary: [Python] Implement MapArray and MapType Key: ARROW-6904 URL: https://issues.apache.org/jira/browse/ARROW-6904 Project: Apache Arrow Issue Type:

Re: [Discuss] Streaming: Differentiate between length of RecordBatch and utilized portion-- common use-case?

2019-10-16 Thread Wes McKinney
On Wed, Oct 16, 2019 at 10:17 AM John Muehlhausen wrote: > > "pyarrow is intended as a developer-facing library, not a user-facing one" > > Is that really the core issue? I doubt you would want to add this proposed > logic to pandas even though it is user-facing, because then pandas will >

Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-10-16 Thread Jacques Nadeau
Hey David, RE: Async: I was trying to match the pattern we use for doget/doput for async. Yes, I was thinking more of Java, given Java gRPC's always-async pattern. On the comment around the FlightData, I think it is overloading the message to use metadata for this. If I want to send a control message

[jira] [Created] (ARROW-6903) [Python] Wheels broken after ARROW-6860 changes

2019-10-16 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6903: Summary: [Python] Wheels broken after ARROW-6860 changes Key: ARROW-6903 URL: https://issues.apache.org/jira/browse/ARROW-6903 Project: Apache Arrow Issue

Re: [Discuss] Streaming: Differentiate between length of RecordBatch and utilized portion-- common use-case?

2019-10-16 Thread Wes McKinney
hi John, > As a practical matter, the reason metadata is not a good solution for me is > that it requires awareness on the part of the reader. I want (e.g.) a > researcher in Python to be able to map a file of batches in IPC format > without needing to worry about the fact that the file was

[jira] [Created] (ARROW-6902) [C++] Add String*/Binary* support for Compare kernels

2019-10-16 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6902: Summary: [C++] Add String*/Binary* support for Compare kernels Key: ARROW-6902 URL: https://issues.apache.org/jira/browse/ARROW-6902 Project: Apache

Re: [C++] The quest for zero-dependency builds

2019-10-16 Thread Antoine Pitrou
Perhaps meson is also worth exploring? On 15/10/2019 at 23:06, Micah Kornfield wrote: Hi Wes, I agree on both counts: it won't be done in the short term, and it makes sense to tackle it incrementally. Like I said I don't have much bandwidth at the moment but might be able to

Arrow sync call October 16 at 12:00 US/Eastern, 16:00 UTC

2019-10-16 Thread Neal Richardson
Hi all, our biweekly call is coming up in a couple of hours at https://meet.google.com/vtm-teks-phx. All are welcome to join. Notes will be sent out to the mailing list afterwards. Neal

[NIGHTLY] Arrow Build Report for Job nightly-2019-10-16-0

2019-10-16 Thread Crossbow
Arrow Build Report for Job nightly-2019-10-16-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-10-16-0 Failed Tasks: - wheel-manylinux1-cp27mu: URL:

[jira] [Created] (ARROW-6901) [Rust][Parquet] Rust Parquet SerializedFileWriter writes total_num_rows as zero

2019-10-16 Thread Matthew Franglen (Jira)
Matthew Franglen created ARROW-6901: Summary: [Rust][Parquet] Rust Parquet SerializedFileWriter writes total_num_rows as zero Key: ARROW-6901 URL: https://issues.apache.org/jira/browse/ARROW-6901

[jira] [Created] (ARROW-6900) PyArrow cant serialize pandas IntegerArray

2019-10-16 Thread Sayed Mohammad Hossein Torabi (Jira)
Sayed Mohammad Hossein Torabi created ARROW-6900: Summary: PyArrow cant serialize pandas IntegerArray Key: ARROW-6900 URL: https://issues.apache.org/jira/browse/ARROW-6900 Project:

[jira] [Created] (ARROW-6899) to_pandas() not implemented on list

2019-10-16 Thread Razvan Chitu (Jira)
Razvan Chitu created ARROW-6899: Summary: to_pandas() not implemented on list Key: ARROW-6899 URL: https://issues.apache.org/jira/browse/ARROW-6899 Project: Apache Arrow Issue Type: Bug

Re: [DISCUSS] Dictionary Encoding Clarifications/Future Proofing

2019-10-16 Thread Micah Kornfield
I'll plan on starting a vote in the next day or two if there are no further objections/comments. On Sun, Oct 13, 2019 at 11:06 AM Micah Kornfield wrote: > I think the only point asked on the PR that I think is worth discussing is > assumptions about dictionaries at the beginning of streams. > >

Re: [Discuss] Streaming: Differentiate between length of RecordBatch and utilized portion-- common use-case?

2019-10-16 Thread Micah Kornfield
Still thinking through the implications here, but to save others from having to go search, [1] is the PR. [1] https://github.com/apache/arrow/pull/5663/files On Tue, Oct 15, 2019 at 1:42 PM John Muehlhausen wrote: > A proposal with linked PR now exists in ARROW-5916 and Wes commented that > we

[jira] [Created] (ARROW-6898) [Java] Fix potential memory leak in ArrowWriter and several test classes

2019-10-16 Thread Ji Liu (Jira)
Ji Liu created ARROW-6898: Summary: [Java] Fix potential memory leak in ArrowWriter and several test classes Key: ARROW-6898 URL: https://issues.apache.org/jira/browse/ARROW-6898 Project: Apache Arrow