[jira] [Created] (ARROW-5100) [JS] Writer swaps byte order if buffers share the same underlying ArrayBuffer

2019-04-02 Thread Paul Taylor (JIRA)
Paul Taylor created ARROW-5100: -- Summary: [JS] Writer swaps byte order if buffers share the same underlying ArrayBuffer Key: ARROW-5100 URL: https://issues.apache.org/jira/browse/ARROW-5100 Project: Apac

[jira] [Created] (ARROW-5099) Compiling Plasma TensorFlow op has Python 2 bug.

2019-04-02 Thread Robert Nishihara (JIRA)
Robert Nishihara created ARROW-5099: --- Summary: Compiling Plasma TensorFlow op has Python 2 bug. Key: ARROW-5099 URL: https://issues.apache.org/jira/browse/ARROW-5099 Project: Apache Arrow I

Re: How to understand this comment

2019-04-02 Thread Wes McKinney
On Tue, Apr 2, 2019 at 10:08 PM ming zhang wrote: > > in a case where there are multiple ways to retrieve this logical data set, > how to represent this in the response? > > for example, assume there is a data set that has > part 1 in endpoint 1 and part 2 in endpoint 2 with tcp as transport > bot

[jira] [Created] (ARROW-5098) [Website] Update APT install document for 0.13.0

2019-04-02 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-5098: --- Summary: [Website] Update APT install document for 0.13.0 Key: ARROW-5098 URL: https://issues.apache.org/jira/browse/ARROW-5098 Project: Apache Arrow Issue Typ

Re: [DISCUSS][Format] Time Interval Changes

2019-04-02 Thread Micah Kornfield
Based on the discussion so far, my attempt at concrete Schema proposals below. Jacques I think summarizes what we've discussed, apologies if I've misunderstood. Wes, would Option 1 work to support the Pandas Time Delta use-case? I'm leaning towards Option 1 if it satisfies everyone (but happy t

[jira] [Created] (ARROW-5097) [Packaging][CentOS6] arrow-lib has unresolvable dependencies

2019-04-02 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-5097: --- Summary: [Packaging][CentOS6] arrow-lib has unresolvable dependencies Key: ARROW-5097 URL: https://issues.apache.org/jira/browse/ARROW-5097 Project: Apache Arrow

Re: How to understand this comment

2019-04-02 Thread ming zhang
in a case where there are multiple ways to retrieve this logical data set, how to represent this in the response? for example, assume there is a data set that has part 1 in endpoint 1 and part 2 in endpoint 2 with tcp as transport both part 1 and part 2 in endpoint 3, with infiniband as transport

[jira] [Created] (ARROW-5096) [Packaging][deb] plasma-store-server packages are missing

2019-04-02 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-5096: --- Summary: [Packaging][deb] plasma-store-server packages are missing Key: ARROW-5096 URL: https://issues.apache.org/jira/browse/ARROW-5096 Project: Apache Arrow

Re: How to understand this comment

2019-04-02 Thread Wes McKinney
Hi, A FlightGetInfo plan corresponds to a single logical dataset. The dataset may be spread across multiple endpoints, so if you want the whole dataset you have to execute DoGet against them all. I'm not sure what you mean by "provide more than one execution plan". - Wes On Tue, Apr 2, 2019 at
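The point above (one plan describes one logical dataset, and reading all of it means fetching from every endpoint) can be sketched roughly as follows. This is only a toy in-memory model, not the Flight API: Endpoint, Plan, SERVERS, do_get, and read_whole_dataset are hypothetical names invented for illustration, and real Flight clients stream Arrow record batches over gRPC rather than returning Python lists.

```python
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical stand-ins for illustration only; NOT the Arrow Flight API.
@dataclass(frozen=True)
class Endpoint:
    location: str  # where this part of the dataset is served
    ticket: str    # opaque token identifying the part at that location


@dataclass(frozen=True)
class Plan:
    # One plan describes one logical dataset, possibly split across endpoints.
    endpoints: List[Endpoint]


# Fake servers (assumption): location -> ticket -> rows held there.
SERVERS: Dict[str, Dict[str, List[int]]] = {
    "endpoint-1": {"part-1": [1, 2, 3]},
    "endpoint-2": {"part-2": [4, 5]},
}


def do_get(endpoint: Endpoint) -> List[int]:
    """Fetch the single part of the dataset that this endpoint serves."""
    return SERVERS[endpoint.location][endpoint.ticket]


def read_whole_dataset(plan: Plan) -> List[int]:
    """Consume every endpoint in the plan; skipping one loses part of the data."""
    rows: List[int] = []
    for endpoint in plan.endpoints:
        rows.extend(do_get(endpoint))
    return rows


plan = Plan(endpoints=[Endpoint("endpoint-1", "part-1"),
                       Endpoint("endpoint-2", "part-2")])
whole = read_whole_dataset(plan)
```

In this sketch, reading only the first endpoint yields just part 1, which is why the whole plan has to be walked to obtain the complete dataset.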

How to understand this comment

2019-04-02 Thread ming zhang
Hi all, I wonder how to understand this comment. It seems to assume there is only one "itinerary", and that finishing it requires consuming all the flight endpoints. /* * A list of endpoints associated with the flight. To consume the whole * flight, all endpoints must be consumed. */ What if we

Re: 0.13.0 APT repository is broken

2019-04-02 Thread Kouhei Sutou
Hi Wes, Thanks for your response. I'll regenerate the APT metadata and mark the 0.13.0 APT repository as unofficial. Thanks, -- kou In "Re: 0.13.0 APT repository is broken" on Tue, 2 Apr 2019 19:29:48 -0500, Wes McKinney wrote: > hi Kou, > > Either solution you propose seems OK to me. Regenera

Re: [Discuss][Format] Arrow Flight URI scheme proposal

2019-04-02 Thread Wes McKinney
I started a vote for the other Flight discussion thread, which will close on Friday. Since I'm about to leave on vacation can Antoine or Jacques run the vote for this one? Thanks On Tue, Apr 2, 2019 at 7:07 AM David Li wrote: > > Agreed with Antoine on grpc+tcp as the default. A gRPC server > ge

Re: 0.13.0 APT repository is broken

2019-04-02 Thread Wes McKinney
hi Kou, Either solution you propose seems OK to me. Regenerating the APT metadata and marking the packages as non-official would be OK if it saves you and others the effort of doing a full release. We made enough process changes from 0.12.0 to 0.13.0 that it seemed inevitable for some things to br

C++ and Python size problems with Arrow 0.13.0

2019-04-02 Thread Wes McKinney
hi folks, I noticed that the arrow-cpp conda packages for Windows have ballooned in size to nearly 140 megabytes for RC4 https://bintray.com/apache/arrow/python-rc/0.13.0-rc4#files/python-rc/0.13.0-rc4 Looking at one of these packages it seems the Windows static libraries are huge -- I'm not sure why th

Re: [VOTE] Proposed changes to Arrow Flight protocol

2019-04-02 Thread Wes McKinney
+1 (binding) On Tue, Apr 2, 2019 at 7:05 PM Wes McKinney wrote: > > Hi, > > David Li has proposed to make the following additions or changes > to the Flight gRPC service definition [1] and general design, as explained in > greater detail in the linked Google Docs document [2]. Arrow > Flight is a

[VOTE] Proposed changes to Arrow Flight protocol

2019-04-02 Thread Wes McKinney
Hi, David Li has proposed to make the following additions or changes to the Flight gRPC service definition [1] and general design, as explained in greater detail in the linked Google Docs document [2]. Arrow Flight is an in-development messaging framework for creating services that can, among othe

[jira] [Created] (ARROW-5095) [Flight][C++] Flight DoGet doesn't expose server error message

2019-04-02 Thread David Li (JIRA)
David Li created ARROW-5095: --- Summary: [Flight][C++] Flight DoGet doesn't expose server error message Key: ARROW-5095 URL: https://issues.apache.org/jira/browse/ARROW-5095 Project: Apache Arrow Is

0.13.0 APT repository is broken

2019-04-02 Thread Kouhei Sutou
Hi, Our APT repositories for Debian and Ubuntu are broken: https://issues.apache.org/jira/browse/ARROW-5087 It seems that only the APT metadata (package list) is broken, so we can fix this by regenerating the APT metadata. We can still use the voted binary artifacts. Can we re-generate APT metadata and upl

[jira] [Created] (ARROW-5094) [Packaging] Add APT/Yum verification scripts

2019-04-02 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-5094: --- Summary: [Packaging] Add APT/Yum verification scripts Key: ARROW-5094 URL: https://issues.apache.org/jira/browse/ARROW-5094 Project: Apache Arrow Issue Type: I

Re: Arrow Flight protocol/API questions

2019-04-02 Thread Jacques Nadeau
Yes, vote sounds good. List lgtm On Tue, Apr 2, 2019 at 1:13 PM David Li wrote: > The proposed changes (also in the document [1]): > > Proposal 1: In FlightData, add a bytes field for application-defined > metadata. > In DoPut, change the return type to be streaming, and add a bytes > field to P

[jira] [Created] (ARROW-5093) [Packaging] Add support for selective binary upload

2019-04-02 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-5093: --- Summary: [Packaging] Add support for selective binary upload Key: ARROW-5093 URL: https://issues.apache.org/jira/browse/ARROW-5093 Project: Apache Arrow Issue

[jira] [Created] (ARROW-5092) [C#] Source Link doesn't work with the C# release script

2019-04-02 Thread Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5092: --- Summary: [C#] Source Link doesn't work with the C# release script Key: ARROW-5092 URL: https://issues.apache.org/jira/browse/ARROW-5092 Project: Apache Arrow I

Re: Arrow Flight protocol/API questions

2019-04-02 Thread David Li
The proposed changes (also in the document [1]): Proposal 1: In FlightData, add a bytes field for application-defined metadata. In DoPut, change the return type to be streaming, and add a bytes field to PutResult for application-defined metadata. Proposal 2: In client/server APIs, add a call opti

Re: Arrow Flight protocol/API questions

2019-04-02 Thread Wes McKinney
I think we can have a vote. Can you write a summary bulleted list of the changes/additions in brief? Jacques, what do you think? On Tue, Apr 2, 2019 at 1:31 PM David Li wrote: > > Just wanted to circle back to this - I've gotten a lot of feedback on > the linked document, and I appreciate all th

Re: [DISCUSS] Format changes: process and requirements

2019-04-02 Thread Wes McKinney
I created https://cwiki.apache.org/confluence/display/ARROW/Columnar+Format+1.0+Milestone some time ago to try to track the status of different implementations and various in-flight discussions about columnar format evolution. Can some others take a look at that and perhaps update some sections?

Re: Arrow Flight protocol/API questions

2019-04-02 Thread David Li
Just wanted to circle back to this - I've gotten a lot of feedback on the linked document, and I appreciate all the suggestions. Discussion seems to have quieted down; is this ready for a vote (perhaps as individual format changes)? Thanks, David On 3/22/19, David Li wrote: > Sorry about that! I

Re: [RESULT][VOTE] Release Apache Arrow 0.13.0 - RC4

2019-04-02 Thread Wes McKinney
I'll update the main docs site today On Mon, Apr 1, 2019 at 1:11 PM Wes McKinney wrote: > > I have written a blog post, additions from other maintainers would be welcome > > https://github.com/apache/arrow/pull/4091 > > I'll plan to publish tomorrow morning. > > Would someone like to help with up

Re: Dask and Arrow Parquet Rewrite

2019-04-02 Thread Wes McKinney
hi Matt, Thanks for this summary. It's important for the Dask user base to be well supported dealing with Parquet files since this reflects a sizable fraction of Arrow users at the moment. I also appreciate your effort to establish cleaner boundaries between the projects to make ongoing maintenanc

Re: FlightGetInfo or FlightInfo

2019-04-02 Thread Wes McKinney
I opened https://issues.apache.org/jira/browse/ARROW-5091 On Tue, Apr 2, 2019 at 9:53 AM ming zhang wrote: > > it is not a big deal for sure. it is just help new comer to build a mental > model quicker. code is for reader anyway. > > i will submit a patch later > > thanks > ming > > > On Tue, Apr

[jira] [Created] (ARROW-5091) [Flight] Rename FlightGetInfo message to FlightInfo

2019-04-02 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5091: --- Summary: [Flight] Rename FlightGetInfo message to FlightInfo Key: ARROW-5091 URL: https://issues.apache.org/jira/browse/ARROW-5091 Project: Apache Arrow Issue

Re: [DISCUSS][Format] Time Interval Changes

2019-04-02 Thread Wes McKinney
Since there were some mentions of leap seconds: I think the intent of the timedelta/duration type should be to express the difference between UNIX timestamps (from second to nanosecond resolution), which don't include leap seconds. We use the timedelta64[ns] type in pandas for example, which is a
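To illustrate the leap-second point, here is a minimal sketch using only the Python standard library (not pandas or Arrow). The helper names unix_nanos and duration_nanos are hypothetical. A leap second (23:59:60 UTC) was inserted at the end of 2016-12-31, yet the difference between the UNIX timestamps on either side of it is exactly one second, because UNIX time does not count leap seconds.

```python
from datetime import datetime, timezone

NANOS_PER_SECOND = 1_000_000_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def unix_nanos(moment: datetime) -> int:
    """UNIX timestamp of a timezone-aware datetime, in whole nanoseconds.

    Integer arithmetic is used so no precision is lost to floats.
    """
    delta = moment - EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * NANOS_PER_SECOND + delta.microseconds * 1_000


def duration_nanos(start: datetime, end: datetime) -> int:
    """Duration as a plain difference of UNIX timestamps (no leap seconds)."""
    return unix_nanos(end) - unix_nanos(start)


# One second before and one second after the leap second at the end of 2016.
before = datetime(2016, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
after = datetime(2017, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
elapsed = duration_nanos(before, after)
```

Here elapsed is 1_000_000_000 nanoseconds, even though two SI seconds passed on the wall clock; a duration type defined as a difference of UNIX timestamps behaves this way by construction.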

[ANNOUNCE] Apache Arrow 0.13.0 released

2019-04-02 Thread Wes McKinney
The Apache Arrow community is pleased to announce the 0.13.0 release. It includes 550 resolved issues ([1]) since the 0.12.0 release. The release is available now from our website, [2] and [3]: http://arrow.apache.org/install/ Read about what's new in the release http://arrow.apache.org/blog/

Re: [DISCUSS][Format] Time Interval Changes

2019-04-02 Thread Jacques Nadeau
> > I could go either way, it has some benefits for forward compatibility I > suppose, but on the other hand YAGNI, if you feel strongly, I'm ok > including it. However, the more optional fields we have for a specific > enum value, makes me lean more towards a new type instead of just an enum. > I

Re: FlightGetInfo or FlightInfo

2019-04-02 Thread ming zhang
it is not a big deal for sure. it is just help new comer to build a mental model quicker. code is for reader anyway. i will submit a patch later thanks ming On Tue, Apr 2, 2019 at 10:14 AM Wes McKinney wrote: > I don't have a problem with the name personally but writing a patch > would be the

Re: FlightGetInfo or FlightInfo

2019-04-02 Thread Jacques Nadeau
FlightInfo sgtm On Tue, Apr 2, 2019 at 7:14 AM Wes McKinney wrote: > I don't have a problem with the name personally but writing a patch > would be the next step. > > On Tue, Apr 2, 2019 at 8:59 AM ming zhang > wrote: > > > > looks like we are ok with FlighInfo. what is the next step? should

[jira] [Created] (ARROW-5090) Linking failure on MacOS due to @rpath in dylib

2019-04-02 Thread Jeroen (JIRA)
Jeroen created ARROW-5090: - Summary: Linking failure on MacOS due to @rpath in dylib Key: ARROW-5090 URL: https://issues.apache.org/jira/browse/ARROW-5090 Project: Apache Arrow Issue Type: Bug

Re: FlightGetInfo or FlightInfo

2019-04-02 Thread Wes McKinney
I don't have a problem with the name personally but writing a patch would be the next step. On Tue, Apr 2, 2019 at 8:59 AM ming zhang wrote: > > looks like we are ok with FlighInfo. what is the next step? should I write > a proposal or submit a patch? i just start working around Arrow Flight and

Re: FlightGetInfo or FlightInfo

2019-04-02 Thread ming zhang
Looks like we are OK with FlightInfo. What is the next step? Should I write a proposal or submit a patch? I just started working with Arrow Flight and need to learn the procedure here. Thanks On Tue, Apr 2, 2019 at 9:48 AM Antoine Pitrou wrote: > > Oh, you're right. The corresponding method i

Re: FlightGetInfo or FlightInfo

2019-04-02 Thread Antoine Pitrou
Oh, you're right. The corresponding method is already named GetFlightInfo. Regards Antoine. Le 02/04/2019 à 15:37, Wes McKinney a écrit : > FlightGetInfo is a message so if we are going to change the name, we > should make it more noun-like, such as FlightInfo. > > On Tue, Apr 2, 2019 at 7:

Re: FlightGetInfo or FlightInfo

2019-04-02 Thread Wes McKinney
FlightGetInfo is a message so if we are going to change the name, we should make it more noun-like, such as FlightInfo. On Tue, Apr 2, 2019 at 7:08 AM Antoine Pitrou wrote: > > > If we change it, I vote for GetFlightInfo. > > Regards > > Antoine. > > > Le 02/04/2019 à 14:07, ming zhang a écrit :

Re: FlightGetInfo or FlightInfo

2019-04-02 Thread Antoine Pitrou
If we change it, I vote for GetFlightInfo. Regards Antoine. Le 02/04/2019 à 14:07, ming zhang a écrit : > Hi > > The name of FlightGetInfo is kind of strange. It is a noun+verb+noun, which > is not consistent with others like FlightData, FlightDescriptor, etc. The > 1st impression is that th

Re: [Discuss][Format] Arrow Flight URI scheme proposal

2019-04-02 Thread David Li
Agreed with Antoine on grpc+tcp as the default. A gRPC server generally won't offer both encrypted and unencrypted connections, so this won't establish an insecure session where a secure one is available. We could implement a TLS upgrade mechanism later as well. I've updated the document to match.

FlightGetInfo or FlightInfo

2019-04-02 Thread ming zhang
Hi The name of FlightGetInfo is kind of strange. It is a noun+verb+noun, which is not consistent with others like FlightData, FlightDescriptor, etc. The first impression is that this is a method, not a message, since other methods are verb+noun. Should FlightGetInfo be FlightInfo? Thanks Ming

Re: [Discuss][Format] Arrow Flight URI scheme proposal

2019-04-02 Thread Antoine Pitrou
Le 02/04/2019 à 01:28, Jacques Nadeau a écrit : > My thinking is ideally the protocol would be more opaque than engineer-y in > that an upgrade would happen as part of the negotiation process. For > example, when a connection is made, client says "hey, I also support these > things" and then serv

[jira] [Created] (ARROW-5089) [C++/Python] Writing dictionary encoded columns to parquet is extremely slow when using chunk size

2019-04-02 Thread Florian Jetter (JIRA)
Florian Jetter created ARROW-5089: - Summary: [C++/Python] Writing dictionary encoded columns to parquet is extremely slow when using chunk size Key: ARROW-5089 URL: https://issues.apache.org/jira/browse/ARROW-5089