[jira] [Created] (ARROW-6420) [Java] Improve the performance of UnionVector when getting underlying vectors

2019-09-02 Thread Liya Fan (Jira)
Liya Fan created ARROW-6420: --- Summary: [Java] Improve the performance of UnionVector when getting underlying vectors Key: ARROW-6420 URL: https://issues.apache.org/jira/browse/ARROW-6420 Project: Apache

[jira] [Created] (ARROW-6419) [Website] Blog post about Parquet dictionary performance work coming in 0.15.x release

2019-09-02 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6419: --- Summary: [Website] Blog post about Parquet dictionary performance work coming in 0.15.x release Key: ARROW-6419 URL: https://issues.apache.org/jira/browse/ARROW-6419

[jira] [Created] (ARROW-6418) Plasma cmake targets are not exported

2019-09-02 Thread Tobias Mayer (Jira)
Tobias Mayer created ARROW-6418: --- Summary: Plasma cmake targets are not exported Key: ARROW-6418 URL: https://issues.apache.org/jira/browse/ARROW-6418 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-6417) [C++][Parquet] Non-dictionary BinaryArray reads from Parquet format have slowed down since 0.11.x

2019-09-02 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-6417: --- Summary: [C++][Parquet] Non-dictionary BinaryArray reads from Parquet format have slowed down since 0.11.x Key: ARROW-6417 URL: https://issues.apache.org/jira/browse/ARROW-6417

[jira] [Created] (ARROW-6416) [Python] Confusing API & documentation regarding chunksizes

2019-09-02 Thread Arik Funke (Jira)
Arik Funke created ARROW-6416: - Summary: [Python] Confusing API & documentation regarding chunksizes Key: ARROW-6416 URL: https://issues.apache.org/jira/browse/ARROW-6416 Project: Apache Arrow

AW: KeyValue metadata for column

2019-09-02 Thread roman.karlstetter
Thanks for the feedback, I'll try to work on this if I find some time. Roman -Original Message- From: Wes McKinney Sent: Monday, 2 September 2019 16:25 To: dev@arrow.apache.org Subject: Re: KeyValue metadata for column hi Roman, It's just not implemented. See issue related

[jira] [Created] (ARROW-6415) [R] Remove usage of R CMD config CXXCPP

2019-09-02 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6415: -- Summary: [R] Remove usage of R CMD config CXXCPP Key: ARROW-6415 URL: https://issues.apache.org/jira/browse/ARROW-6415 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-6414) pyarrow cannot (de)serialise an empty MultiIndex-ed column DataFrame

2019-09-02 Thread Stpehen Gowdy (Jira)
Stpehen Gowdy created ARROW-6414: Summary: pyarrow cannot (de)serialise an empty MultiIndex-ed column DataFrame Key: ARROW-6414 URL: https://issues.apache.org/jira/browse/ARROW-6414 Project: Apache

[jira] [Created] (ARROW-6413) [R] Support autogenerating column names

2019-09-02 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-6413: -- Summary: [R] Support autogenerating column names Key: ARROW-6413 URL: https://issues.apache.org/jira/browse/ARROW-6413 Project: Apache Arrow Issue Type:

Re: [DISCUSS][FORMAT] Concerning about character encoding of binary string data

2019-09-02 Thread Antoine Pitrou
Hello, On 02/09/2019 at 10:39, Kenta Murata wrote: > > There are two options to manage a character encoding in a BinaryArray. > The first way is introducing an optional character_encoding field in > BinaryType. The second way is using custom_metadata field to supply > the character encoding

Re: [DISCUSS][FORMAT] Concerning about character encoding of binary string data

2019-09-02 Thread Wes McKinney
hi Kenta, It seems like using ExtensionType would be a simple way to handle this for the immediate purpose of implementing user-facing Array types. If we wanted to change the metadata representation to something more "built-in" then we can keep discussing this. It seems like having a distinct

Re: KeyValue metadata for column

2019-09-02 Thread Wes McKinney
hi Roman, It's just not implemented. See issue related to preserving Field-level Arrow metadata in Parquet https://issues.apache.org/jira/browse/ARROW-4359 I think implementing this should be pretty straightforward. You can follow the code that handles KeyValue metadata at the Schema level -

[jira] [Created] (ARROW-6412) [C++] arrow-flight-test can crash because of port allocation

2019-09-02 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6412: - Summary: [C++] arrow-flight-test can crash because of port allocation Key: ARROW-6412 URL: https://issues.apache.org/jira/browse/ARROW-6412 Project: Apache Arrow

KeyValue metadata for column

2019-09-02 Thread roman.karlstetter
Hi everyone, reading the descriptions here https://parquet.apache.org/documentation/latest/#metadata, I think it should be possible in general to add arbitrary key-value metadata to any parquet column. Is that correct? If yes, in

[DISCUSS][FORMAT] Concerning about character encoding of binary string data

2019-09-02 Thread Kenta Murata
[Abstract] When we have string data encoded in a character encoding other than UTF-8, we must use a BinaryArray for the data. But Apache Arrow doesn’t provide a way to specify which character encoding is used in a BinaryArray. In this mail, I’d like to discuss how Apache Arrow provides the way

Re: [DISCUSS][Format][C++] Improvement of sparse tensor format and implementation

2019-09-02 Thread Kenta Murata
On Wed, Aug 28, 2019 at 8:57, Rok Mihevc wrote: > > On Wed, Aug 28, 2019 at 1:18 AM Wes McKinney wrote: > > > null/NA. But, as far as I'm aware, this component of pandas is > > relatively unique and was never intended as an alternative to sparse > > matrix libraries. > > > > Another example is >

Re: [DISCUSS][Format][C++] Improvement of sparse tensor format and implementation

2019-09-02 Thread Kenta Murata
On Wed, Aug 28, 2019 at 6:05, Wes McKinney wrote: > I'm also OK with these changes. Since we have not established a > versioning or compatibility policy with regards to "Other" data > structures like Tensor and SparseTensor, I don't know that a vote is > needed, just a pull request. I didn't understand that