Re: [RESULT][VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-05 Thread Micah Kornfield
> > * All pull requests need to rebase on master by > "Rebasing the master branch on local release branch" Since it doesn't look like it's been claimed, I'll do it. On Thu, Jul 4, 2019 at 12:46 AM Sutou Kouhei wrote: > Hi, > > I need your help! > Could Rust developers see "Failed:" section? >

Re: [RESULT][VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-05 Thread Micah Kornfield
Actually, can someone clarify whether the correct approach here is to clone @Kou's repo and use his RC0 branch to do the rebase? e.g. run: "./dev/release/post-00-rebase.sh apache-arrow-0.14.0-rc0"? Thanks, Micah On Fri, Jul 5, 2019 at 12:38 AM Micah Kornfield wrote: > * All pull requests need t

Re: linking 3rd party cython modules against pyarrow fails since 0.14.0

2019-07-05 Thread Weston Steimel
Hello, I wonder if perhaps that may be due to the work done for reducing the wheel size in https://issues.apache.org/jira/browse/ARROW-5082? On Thu, Jul 4, 2019 at 10:06 PM Stestagg wrote: > 1) pip install pyarrow==0.14.0 > 2) All the pyarrow files including, for example libarrow.so.14, but not

Re: [RESULT][VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-05 Thread Krisztián Szűcs
Hey Micah, Kou has already rebased the master branch of apache/arrow. So if you want to rebase PRs, then you should rebase on top of apache/arrow@master. On Fri, Jul 5, 2019 at 10:01 AM Micah Kornfield wrote: > Actually, can someone clarify is the correct approach here to clone the > @Kou's rep

Re: [RESULT][VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-05 Thread Sutou Kouhei
Hi Micah, Thanks for helping with this. Sorry for my bad description of the task. > e.g. run: > > "./dev/release/post-00-rebase.sh apache-arrow-0.14.0-rc0"? I've already done this: >>> Done: >>> >>> * Rebasing the master branch on local release branch >>> >>> https://cwiki.apache.org/conflue

Re: [RESULT][VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-05 Thread Micah Kornfield
Thanks. Is there a script to do this or is it typically just done by hand? On Fri, Jul 5, 2019 at 1:12 AM Sutou Kouhei wrote: > Hi Micah, > > Thanks for helping this. > > Sorry for my bad description of the task. > > > e.g. run: > > > > "./dev/release/post-00-rebase.sh apache-arrow-0.14.0-rc0"?

Re: [RESULT][VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-05 Thread Sutou Kouhei
We did this by hand in past releases. It may be better to have a script for this. In "Re: [RESULT][VOTE] Release Apache Arrow 0.14.0 - RC0" on Fri, 5 Jul 2019 01:16:42 -0700, Micah Kornfield wrote: > Thanks. Is there a script to do this or is it typically just done by hand? >

Re: [RESULT][VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-05 Thread Krisztián Szűcs
I prefer to use hub [1] to checkout a PR:
  hub pr checkout
  git rebase upstream/master
  git push -f
[1]: https://github.com/github/hub
On Fri, Jul 5, 2019 at 10:22 AM Sutou Kouhei wrote: > We did this by hand in the past releases. > > It may be better that we have a script to do this. > > In

Re: linking 3rd party cython modules against pyarrow fails since 0.14.0

2019-07-05 Thread Antoine Pitrou
That's quite likely indeed. What's a bit worrying is that this should have been caught by our unit tests. Regards Antoine. On 05/07/2019 at 10:02, Weston Steimel wrote: > Hello, > > I wonder if perhaps that may be due to the work done for reducing the wheel > size in https://issues.apache.org/j

Re: [RESULT][VOTE] Release Apache Arrow 0.14.0 - RC0

2019-07-05 Thread Micah Kornfield
OK, I wrote a quick script (I'll clean it up and send it out as a PR tomorrow) and rebased everything that could be rebased cleanly. What do we generally do about PRs that don't rebase cleanly? Thanks, Micah On Fri, Jul 5, 2019 at 1:29 AM Krisztián Szűcs wrote: > I prefer to use hub [1] to checko

[jira] [Created] (ARROW-5861) [Java] Initial implement to convert Avro record with primitive types

2019-07-05 Thread Ji Liu (JIRA)
Ji Liu created ARROW-5861: Summary: [Java] Initial implement to convert Avro record with primitive types Key: ARROW-5861 URL: https://issues.apache.org/jira/browse/ARROW-5861 Project: Apache Arrow

[jira] [Created] (ARROW-5862) [Java] Provide dictionary builder

2019-07-05 Thread Liya Fan (JIRA)
Liya Fan created ARROW-5862: Summary: [Java] Provide dictionary builder Key: ARROW-5862 URL: https://issues.apache.org/jira/browse/ARROW-5862 Project: Apache Arrow Issue Type: New Feature

[jira] [Created] (ARROW-5863) Segmentation Fault via pytest-runner

2019-07-05 Thread Josh Bode (JIRA)
Josh Bode created ARROW-5863: Summary: Segmentation Fault via pytest-runner Key: ARROW-5863 URL: https://issues.apache.org/jira/browse/ARROW-5863 Project: Apache Arrow Issue Type: Bug C

Re: [CI] Ursabot Java builders

2019-07-05 Thread Krisztián Szűcs
Thanks to Sebastien we now have two Go builders [1] and I've just added a Rust builder [2]. Go build takes ~15 seconds. Rust build takes ~3 minutes. [1]: https://github.com/ursa-labs/ursabot/pull/125 [2]: https://github.com/ursa-labs/ursabot/pull/126 On Thu, Jul 4, 2019 at 7:41 PM Krisztián Szűc

[jira] [Created] (ARROW-5864) [Python] simplify cython wrapping of Result

2019-07-05 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5864: Summary: [Python] simplify cython wrapping of Result Key: ARROW-5864 URL: https://issues.apache.org/jira/browse/ARROW-5864 Project: Apache Arrow

Re: [Discuss] Streaming: Differentiate between length of RecordBatch and utilized portion-- common use-case?

2019-07-05 Thread John Muehlhausen
So far it seems as if pyarrow is completely ignoring the RecordBatch.length field. More info to follow... On Tue, Jul 2, 2019 at 3:02 PM John Muehlhausen wrote: > Crikey! I'll do some testing around that and suggest some test cases to > ensure it continues to work, assuming that it does. > > -J

Re: [Discuss] Streaming: Differentiate between length of RecordBatch and utilized portion-- common use-case?

2019-07-05 Thread John Muehlhausen
This seems to help... still testing it though. Status GetFieldMetadata(int field_index, ArrayData* out) { auto nodes = metadata_->nodes(); // pop off a field if (field_index >= static_cast<int>(nodes->size())) { return Status::Invalid("Ran out of field metadata, likely malformed");
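The check quoted in this message guards against reading past the end of the field metadata. A minimal, self-contained sketch of that kind of bounds check follows. `FieldNode`, `LookupResult`, and `GetFieldNode` are hypothetical stand-ins invented for illustration; they are not Arrow's actual types or API, and the sketch does not attempt to reproduce the rest of the truncated patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-field metadata entry (not Arrow's type).
struct FieldNode {
  std::int64_t length;
  std::int64_t null_count;
};

// Hypothetical result type: ok == false means the lookup failed and
// message explains why; node is only meaningful when ok == true.
struct LookupResult {
  bool ok;
  std::string message;
  FieldNode node;
};

// Return the node at field_index, or an error if the index is out of range.
LookupResult GetFieldNode(const std::vector<FieldNode>& nodes, int field_index) {
  if (field_index < 0 || field_index >= static_cast<int>(nodes.size())) {
    return {false, "Ran out of field metadata, likely malformed", {0, 0}};
  }
  return {true, "", nodes[static_cast<std::size_t>(field_index)]};
}
```

The cast to `int` mirrors the comparison in the quoted snippet; the extra check for a negative index is an addition of this sketch.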

flatbuffers vectors and --gen-object-api

2019-07-05 Thread John Muehlhausen
It seems as if Arrow expects some vectors to be empty rather than null. (Examples: Footer.dictionaries, Field.children) Anyone using --gen-object-api with flatc will get code that writes null when (e.g.) _o->children.size() is zero in CreateField(). I may be missing something but I don't see

[jira] [Created] (ARROW-5865) [Release] Helper script for rebasing open pull requests on master

2019-07-05 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-5865: Summary: [Release] Helper script for rebasing open pull requests on master Key: ARROW-5865 URL: https://issues.apache.org/jira/browse/ARROW-5865 Project: Apache A

[Discuss] Format additions to Arrow for sparse data and data integrity

2019-07-05 Thread Micah Kornfield
Hi Arrow-dev, I’d like to make a straw-man proposal to cover some features that I think would be useful to Arrow, and that I would like to make a proof-of-concept implementation for in Java and C++. In particular, the proposal covers allowing for smaller data sizes via compression and encoding [1

Re: [Discuss] Format additions to Arrow for sparse data and data integrity

2019-07-05 Thread Jacques Nadeau
Hey Micah, your formatting seems to be messed up on this mail. Some kind of copy/paste error? On Fri, Jul 5, 2019 at 11:54 AM Micah Kornfield wrote: > Hi Arrow-dev, > > I’d like to make a straw-man proposal to cover some features that I think > would be useful to Arrow, and that I would like t

Re: [Discuss] Format additions to Arrow for sparse data and data integrity

2019-07-05 Thread Jacques Nadeau
Initial thought: I don't think most of this should be targeted for 1.0. It is a lot of change/enhancement and seems like it would likely substantially delay 1.0. The one piece that seems least disruptive would be basic on-the-wire compression. You suggested that this be done on the buffer level but

Re: [Discuss] Format additions to Arrow for sparse data and data integrity

2019-07-05 Thread Micah Kornfield
Strange, I've pasted the contents into a google document at [1] [1] https://docs.google.com/document/d/1uJzWh63Iqk7FRbElHPhHrsmlfe0NIJ6M8-0kejPmwIw/edit On Fri, Jul 5, 2019 at 12:32 PM Jacques Nadeau wrote: > Hey Micah, you're formatting seems to be messed up on this mail. Some kind > of copy/

Re: [Discuss] Format additions to Arrow for sparse data and data integrity

2019-07-05 Thread Micah Kornfield
Hi Jacques, Thanks for the quick response. > I don't think most of this should be targeted for 1.0. It is a lot of > change/enhancement and seems like it would likely substantially delay 1.0. I agree it shouldn't block 1.0. I think time-based releases are working well for the community. But if

Re: [Discuss] Format additions to Arrow for sparse data and data integrity

2019-07-05 Thread Jacques Nadeau
One question and a random thought: What is the driving force for transport compression? Are you seeing that as a major bottleneck in particular circumstances? (I'm not disagreeing, just want to clearly define the particular problem you're worried about.) Random thought: what do you think of defin

Re: [Discuss][Java] Make the semantics of lastSet consistent

2019-07-05 Thread Jacques Nadeau
Ravindra, Praveen and Prudhvi, can you confirm the ramifications of this change and what impact this inconsistency has had downstream? On Thu, Jul 4, 2019 at 7:32 PM Fan Liya wrote: > There are two lastSet member variables in the code. One is in > BaseVariableWidthVector and the other is in List

[jira] [Created] (ARROW-5866) [C++] Remove duplicate library in cpp/Brewfile

2019-07-05 Thread Yosuke Shiro (JIRA)
Yosuke Shiro created ARROW-5866: Summary: [C++] Remove duplicate library in cpp/Brewfile Key: ARROW-5866 URL: https://issues.apache.org/jira/browse/ARROW-5866 Project: Apache Arrow Issue Type:

Re: [Discuss] Format additions to Arrow for sparse data and data integrity

2019-07-05 Thread Micah Kornfield
Hi Jacques, I think our e-mails might have crossed, so I'm consolidating my responses from the previous e-mail as well. > I don't think most of this should be targeted for 1.0. It is a lot of > change/enhancement and seems like it would likely substantially delay 1.0. I agree it shouldn't block 1.