Re: [DISCUSS][Java] Should null values in VariableWidthVector/ListVector always takes 0 space?

2019-08-28 Thread Fan Liya
Hi Jacques and Ravindra, Thanks for your valuable feedback. Please let me say more about contiguous memory: For some operations (like memory segment comparison, hash code computation, etc.), if we choose option 1 or 2, we can get the result with a single call, without any reference to the
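To make the contiguous-memory argument concrete, here is a minimal Java sketch. It does not use the Arrow Java API: a vector is modeled with plain arrays (a validity array, an offsets array with valueCount + 1 entries, and a data buffer), and `RangeCompare`/`rangeEquals` are hypothetical names. The payload bytes of a slot range are compared with one `Arrays.equals` range call, which only yields a correct logical answer if null slots always occupy zero bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch, not the Arrow Java API: a variable-width vector is
 * modeled as a validity array, an offsets array (valueCount + 1 entries)
 * and a contiguous data buffer.
 */
public class RangeCompare {

    /**
     * Compares slots [from, to) of two vectors. Null positions and slot
     * lengths are checked per slot; the payload bytes are then compared with
     * a single call over the contiguous data range. This matches logical
     * equality only if null slots always occupy zero bytes.
     */
    static boolean rangeEquals(boolean[] validityA, int[] offsetsA, byte[] dataA,
                               boolean[] validityB, int[] offsetsB, byte[] dataB,
                               int from, int to) {
        for (int i = from; i < to; i++) {
            if (validityA[i] != validityB[i]) {
                return false;
            }
            int lenA = offsetsA[i + 1] - offsetsA[i];
            int lenB = offsetsB[i + 1] - offsetsB[i];
            if (lenA != lenB) {
                return false;
            }
        }
        // One range comparison over both data buffers (Arrays.equals overload, Java 9+).
        return Arrays.equals(dataA, offsetsA[from], offsetsA[to],
                             dataB, offsetsB[from], offsetsB[to]);
    }

    public static void main(String[] args) {
        boolean[] validity = {true, false, true};
        // "ab", null (0 bytes), "cde"
        int[] zeroSpaceOffsets = {0, 2, 2, 5};
        byte[] zeroSpaceData = "abcde".getBytes(StandardCharsets.US_ASCII);
        // Same logical values, but the null slot still holds one stale byte 'X'
        int[] staleOffsets = {0, 2, 3, 6};
        byte[] staleData = "abXcde".getBytes(StandardCharsets.US_ASCII);

        System.out.println(rangeEquals(validity, zeroSpaceOffsets, zeroSpaceData,
                validity, zeroSpaceOffsets, zeroSpaceData, 0, 3));
        System.out.println(rangeEquals(validity, zeroSpaceOffsets, zeroSpaceData,
                validity, staleOffsets, staleData, 0, 3));
    }
}
```

The second comparison returns false although both vectors hold the same logical values ("ab", null, "cde"), because the null slot in the second vector still spans one byte. Treating such vectors as equal would require skipping null slots one by one, which is what the single-call approach tries to avoid.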

Re: [Discuss][Java] Refactor code for time related vectors

2019-08-28 Thread Fan Liya
Hi Jacques and Micah, Thanks a lot for your valuable feedback. The purpose of this change is to: 1. reduce the amount of code (by referencing implementations in the super class); 2. remove duplicated code by centralizing the code for get/set integers. I agree that it violates the principle of accessing
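A minimal sketch of the kind of refactoring described, assuming 32-bit storage for the second and millisecond units. The class names (`BaseIntVector`, `TimeSec`, `TimeMilli`) are hypothetical and do not mirror Arrow's real vector hierarchy; the point is only that the integer get/set code exists once, in the superclass.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical sketch, not Arrow's actual class hierarchy: the 4-byte
 * get/set logic lives once in a base class, and each time vector only
 * adds its unit-specific accessors.
 */
public class TimeVectorRefactorSketch {

    /** Shared storage and integer accessors for fixed-width 32-bit values. */
    abstract static class BaseIntVector {
        private final ByteBuffer buffer;

        BaseIntVector(int valueCount) {
            // Little-endian, as Arrow buffers typically are
            this.buffer = ByteBuffer.allocate(valueCount * Integer.BYTES)
                    .order(ByteOrder.LITTLE_ENDIAN);
        }

        protected final int getInt(int index) {
            return buffer.getInt(index * Integer.BYTES);
        }

        protected final void setInt(int index, int value) {
            buffer.putInt(index * Integer.BYTES, value);
        }
    }

    /** Time of day in seconds; storage is delegated to the base class. */
    static final class TimeSec extends BaseIntVector {
        TimeSec(int valueCount) { super(valueCount); }
        int getSeconds(int index) { return getInt(index); }
        void setSeconds(int index, int seconds) { setInt(index, seconds); }
    }

    /** Time of day in milliseconds; same storage, different unit. */
    static final class TimeMilli extends BaseIntVector {
        TimeMilli(int valueCount) { super(valueCount); }
        int getMillis(int index) { return getInt(index); }
        void setMillis(int index, int millis) { setInt(index, millis); }
    }

    public static void main(String[] args) {
        TimeSec sec = new TimeSec(2);
        sec.setSeconds(1, 3600);
        TimeMilli milli = new TimeMilli(2);
        milli.setMillis(0, 1500);
        System.out.println(sec.getSeconds(1) + " " + milli.getMillis(0));
    }
}
```

The duplicated buffer arithmetic moves into one place; whether that gain outweighs the extra indirection is the question the thread goes on to debate.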

Re: [DISCUSS][Java] Should null values in VariableWidthVector/ListVector always takes 0 space?

2019-08-28 Thread Ravindra Pindikura
On Wed, Aug 28, 2019 at 12:32 PM Fan Liya wrote: > Dear all, > > In the discussion of this PR (https://github.com/apache/arrow/pull/5073), > we are faced with a problem: > > Normally, in a VariableWidthVector (e.g. VarCharVector), a null value is > supposed to take no space in the data buffer.

Re: [Discuss][Java] Refactor code for time related vectors

2019-08-28 Thread Jacques Nadeau
On Wed, Aug 28, 2019, 7:42 PM Micah Kornfield wrote: > Hi Jacques, > > > > What problem are you trying to solve? It seems like you're proposing > > refactoring for refactoring's sake. > > I think Liya Fan is trying to eliminate duplicate code, which as long as it > doesn't violate design

Re: [Discuss][Java] Refactor code for time related vectors

2019-08-28 Thread Micah Kornfield
Hi Jacques, > What problem are you trying to solve? It seems like you're proposing > refactoring for refactoring's sake. I think Liya Fan is trying to eliminate duplicate code, which as long as it doesn't violate design principles seems like a good thing. From the PR [1]: > One of the

[jira] [Created] (ARROW-6381) [C++] BufferOutputStream::Write is slow for many small writes

2019-08-28 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6381: Summary: [C++] BufferOutputStream::Write is slow for many small writes Key: ARROW-6381 URL: https://issues.apache.org/jira/browse/ARROW-6381 Project: Apache Arrow

Re: [Discuss][Java] Refactor code for time related vectors

2019-08-28 Thread Jacques Nadeau
What problem are you trying to solve? It seems like you're proposing refactoring for refactoring's sake. On Mon, Aug 26, 2019, 7:13 PM Fan Liya wrote: > Hi Jacques, > > Thanks for the valuable feedback. > > I agree that it is a good idea to explicitly make field vectors final. > > However, I

Re: [DISCUSS][Java] Should null values in VariableWidthVector/ListVector always takes 0 space?

2019-08-28 Thread Jacques Nadeau
#3 is the correct behavior and how the code was meant to be written. I don't see any problems with that pattern. This allows someone (if they so decide) to null a value without having to rewrite the data. #3 is also consistent with the behavior of all other vectors. Null values can use up space but
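The pattern described here, nulling a value without rewriting the data, can be sketched as follows. This is a hypothetical plain-array model, not the Arrow Java API: only the validity flag changes, so the null slot keeps its offsets and therefore still spans bytes in the data buffer.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical simplified variable-width vector (not the Arrow Java API),
 * illustrating nulling a value by clearing its validity flag only.
 */
public class NullInPlace {
    final boolean[] validity;
    final int[] offsets;   // valueCount + 1 entries
    final byte[] data;

    NullInPlace(boolean[] validity, int[] offsets, byte[] data) {
        this.validity = validity;
        this.offsets = offsets;
        this.data = data;
    }

    /** Marks slot i as null; offsets and data are left untouched. */
    void setNull(int i) {
        validity[i] = false;
    }

    /** Bytes spanned by slot i, whether or not it is null. */
    int slotLength(int i) {
        return offsets[i + 1] - offsets[i];
    }

    public static void main(String[] args) {
        // Values "ab", "x", "cde" laid out contiguously
        NullInPlace v = new NullInPlace(
                new boolean[] {true, true, true},
                new int[] {0, 2, 3, 6},
                "abxcde".getBytes(StandardCharsets.US_ASCII));
        v.setNull(1);
        // Slot 1 is now null but still spans one byte of data
        System.out.println(v.validity[1] + " " + v.slotLength(1));
    }
}
```

No offsets shift and no bytes move, which is the advantage claimed for this behavior; the cost is that a null slot is no longer guaranteed to have start index == end index.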

[jira] [Created] (ARROW-6380) Method pyarrow.parquet.read_table has memory spikes from version 0.14

2019-08-28 Thread Renan Alves Fonseca (Jira)
Renan Alves Fonseca created ARROW-6380: Summary: Method pyarrow.parquet.read_table has memory spikes from version 0.14 Key: ARROW-6380 URL: https://issues.apache.org/jira/browse/ARROW-6380

[jira] [Created] (ARROW-6378) [C++][Dataset] Implement TreeDataSource

2019-08-28 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-6378: Summary: [C++][Dataset] Implement TreeDataSource Key: ARROW-6378 URL: https://issues.apache.org/jira/browse/ARROW-6378 Project: Apache Arrow

[jira] [Created] (ARROW-6377) [C++] Extending STL API to support row-wise conversion

2019-08-28 Thread Omer Ozarslan (Jira)
Omer Ozarslan created ARROW-6377: Summary: [C++] Extending STL API to support row-wise conversion Key: ARROW-6377 URL: https://issues.apache.org/jira/browse/ARROW-6377 Project: Apache Arrow

[jira] [Created] (ARROW-6376) [Developer] PR merge script has "master" target ref hard-coded

2019-08-28 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6376: Summary: [Developer] PR merge script has "master" target ref hard-coded Key: ARROW-6376 URL: https://issues.apache.org/jira/browse/ARROW-6376 Project: Apache Arrow

[jira] [Created] (ARROW-6375) [C++] Extend ConversionTraits to allow efficiently appending list values in STL API

2019-08-28 Thread Omer Ozarslan (Jira)
Omer Ozarslan created ARROW-6375: Summary: [C++] Extend ConversionTraits to allow efficiently appending list values in STL API Key: ARROW-6375 URL: https://issues.apache.org/jira/browse/ARROW-6375

Re: [Format] Semantics for dictionary batches in streams

2019-08-28 Thread Wes McKinney
On Tue, Aug 27, 2019 at 6:05 PM Micah Kornfield wrote: > > I was thinking the file format must satisfy one of two conditions: > 1. Exactly one dictionarybatch per encoded column > 2. DictionaryBatches are interleaved correctly. Could you clarify? In the first case, there is no issue with

Re: [DISCUSSION] Automatically adding a the URL of the corresponding JIRA ticket as a comment in GitHub pull-request

2019-08-28 Thread Krisztián Szűcs
On Sat, Aug 24, 2019 at 6:12 PM Wes McKinney wrote: > Seems like a nice idea to me. We could also prompt contributors to > open a JIRA and prefix to the PR title if they have not already. > Sounds good to me, we could even create a ticket and ask the user to fill it. > > The ursabot process is
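A hypothetical sketch of the lookup such automation might perform, assuming PR titles are prefixed with the JIRA key (for example `ARROW-6374: ...`). The URL pattern is the one used by the JIRA notifications in this archive; `JiraLink` and `jiraUrl` are illustrative names, not part of ursabot or any existing tool.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: derive the JIRA URL from a PR title that starts
 * with an "ARROW-<number>" key.
 */
public class JiraLink {
    // Assumes the key is the title's prefix, e.g. "ARROW-6374: [Java] ..."
    private static final Pattern KEY = Pattern.compile("^(ARROW-\\d+)\\b");

    static Optional<String> jiraUrl(String prTitle) {
        Matcher m = KEY.matcher(prTitle);
        if (!m.find()) {
            // No key found: the bot could prompt the contributor to open a JIRA
            return Optional.empty();
        }
        return Optional.of("https://issues.apache.org/jira/browse/" + m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(jiraUrl("ARROW-6374: [Java] Refactor the code for TimeXXVectors"));
        System.out.println(jiraUrl("[Java] Refactor the code for TimeXXVectors"));
    }
}
```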

[jira] [Created] (ARROW-6374) [Java] Refactor the code for TimeXXVectors

2019-08-28 Thread Liya Fan (Jira)
Liya Fan created ARROW-6374: Summary: [Java] Refactor the code for TimeXXVectors Key: ARROW-6374 URL: https://issues.apache.org/jira/browse/ARROW-6374 Project: Apache Arrow Issue Type:

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

2019-08-28 Thread Ji Liu
I could take the Java implementation and will keep a close watch on this issue in the next few days. Thanks, Ji Liu -- From: Micah Kornfield Sent: Wednesday, August 28, 2019 17:14 To: dev Cc: Paul Taylor Subject: Re: [RESULT] [VOTE] Alter

Re: [RESULT] [VOTE] Alter Arrow binary protocol to address 8-byte Flatbuffer alignment requirements (2nd vote)

2019-08-28 Thread Micah Kornfield
I should have integration tests with 0.14.1 generated binaries in the next few days. I think there is one remaining unassigned piece of work in the Java implementation; I can take that up next if no one else gets to it. On Tue, Aug 27, 2019 at 7:19 PM Wes McKinney wrote: > Here's the C++ changes >

[jira] [Created] (ARROW-6373) [C++] Make FixedWidthBinaryBuilder consistent with other primitive fixed width builders

2019-08-28 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-6373: Summary: [C++] Make FixedWidthBinaryBuilder consistent with other primitive fixed width builders Key: ARROW-6373 URL: https://issues.apache.org/jira/browse/ARROW-6373

Re: AppVeyor re-build by committers is enabled

2019-08-28 Thread Antoine Pitrou
Ahhh... Thank you very much. That's gonna be convenient :-) Regards Antoine. On 28/08/2019 at 03:21, Sutou Kouhei wrote: > Hi, > > I asked INFRA to add the "arrow committers" GitHub team > to teams of the "ApacheSoftwareFoundation" AppVeyor account: > >

[DISCUSS][Java] Should null values in VariableWidthVector/ListVector always takes 0 space?

2019-08-28 Thread Fan Liya
Dear all, In the discussion of this PR (https://github.com/apache/arrow/pull/5073), we are faced with a problem: Normally, in a VariableWidthVector (e.g. VarCharVector), a null value is supposed to take no space in the data buffer. In particular, for a null value, we have start index == end
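The invariant described here (a null slot has start index == end index) can be checked in a few lines of Java. This is a hypothetical helper over plain arrays, not an Arrow API: `offsets` holds valueCount + 1 entries and slot i spans `offsets[i]` to `offsets[i + 1]`.

```java
/**
 * Hypothetical helper (not part of Arrow): checks whether every null slot
 * in a variable-width layout satisfies start index == end index.
 */
public class NullSpaceCheck {
    static boolean nullsTakeNoSpace(boolean[] validity, int[] offsets) {
        for (int i = 0; i < validity.length; i++) {
            // A null slot must not span any bytes in the data buffer
            if (!validity[i] && offsets[i] != offsets[i + 1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        boolean[] validity = {true, false, true};
        // "ab", null, "cde" with the null slot taking no space
        System.out.println(nullsTakeNoSpace(validity, new int[] {0, 2, 2, 5}));
        // Same values, but the null slot still spans one byte
        System.out.println(nullsTakeNoSpace(validity, new int[] {0, 2, 3, 6}));
    }
}
```

The second layout is the kind a vector ends up with when a value is nulled in place, the behavior Jacques describes as #3 in his reply.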