Re: [DISCUSS] Formalizing "extension type" metadata in the Arrow binary protocol

2019-05-17 Thread Wes McKinney
As Micah brought up, as part of this we would like to formalize the use of "ARROW:" as a reserved metadata key prefix. This is similar to Apache Avro, which uses "avro." as a reserved prefix [1]. If someone has a different idea about what the prefix should be, I'm open to other ideas. [1]: https://a
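To show what reserving a key prefix implies for an implementation, here is a minimal C++ sketch that classifies custom-metadata keys. The helper name `IsReservedMetadataKey` is hypothetical and not part of the Arrow C++ API. It assumes only that reserved keys are those beginning with the literal, case-sensitive prefix "ARROW:", as proposed in the message above.

```cpp
#include <string_view>

// Hypothetical helper (not an Arrow API): returns true if a custom-metadata
// key falls under the proposed reserved "ARROW:" namespace.
// The match is a plain, case-sensitive prefix comparison.
bool IsReservedMetadataKey(std::string_view key) {
  constexpr std::string_view kReservedPrefix = "ARROW:";
  // substr(0, n) clamps to the key's length, so short keys simply fail to match.
  return key.substr(0, kReservedPrefix.size()) == kReservedPrefix;
}
```

An application could run this check before attaching its own key-value pairs and rename any key that would collide with the reserved namespace, much as Avro reserves properties starting with "avro.".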

[jira] [Created] (ARROW-5360) [Rust] Builds are broken by rustyline on nightly 2019-05-16+

2019-05-17 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5360: Summary: [Rust] Builds are broken by rustyline on nightly 2019-05-16+ Key: ARROW-5360 URL: https://issues.apache.org/jira/browse/ARROW-5360 Project: Apache Arrow

Confluence edit access

2019-05-17 Thread Brian Hulette
Can I get edit access on Confluence? I wanted to answer some of the questions about JS here: https://cwiki.apache.org/confluence/display/ARROW/Columnar+Format+1.0+Milestone My username is bhulette. Thanks! Brian

[jira] [Created] (ARROW-5361) [R] Follow DictionaryType/DictionaryArray changes from ARROW-3144

2019-05-17 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5361: Summary: [R] Follow DictionaryType/DictionaryArray changes from ARROW-3144 Key: ARROW-5361 URL: https://issues.apache.org/jira/browse/ARROW-5361 Project: Apache Arrow

Re: Confluence edit access

2019-05-17 Thread Wes McKinney
Added. You have another Confluence id (hulettbh -- perhaps before the LDAP migration?) that had already been permissioned. On Fri, May 17, 2019 at 9:42 AM Brian Hulette wrote: > > Can I get edit access on confluence? I wanted to answer some of the > questions about JS here: > https://cwiki.apache

[jira] [Created] (ARROW-5362) [C++] Compression round trip test can cause some sanitizers to fail

2019-05-17 Thread Micah Kornfield (JIRA)

Micah Kornfield created ARROW-5362: Summary: [C++] Compression round trip test can cause some sanitizers to fail Key: ARROW-5362 URL: https://issues.apache.org/jira/browse/ARROW-5362 Project: Ap

A couple of questions about pyarrow.parquet

2019-05-17 Thread Ted Gooch
Hi, I've been doing some work trying to get the Parquet read path going for the Python Iceberg library. I have two questions that I couldn't figure out, and was hoping to get some guidance from the list. First, I'd like to create a Par

[DISCUSS][C++] Unaligned memory accesses (undefined behavior)

2019-05-17 Thread Micah Kornfield
I recently ran UBSan over parquet and arrow code bases and there are quite a few unaligned pointer warnings (we do reinterpret casts on integer types without checking alignment). Most of them are in Arrow itself, which parquet calls into. Is this something the community would like to fix? I imag
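Background on the warning: in C++, dereferencing a `reinterpret_cast`-ed pointer that is not suitably aligned for its type is undefined behavior, and UBSan's alignment check reports exactly that. A common remedy, sketched here as an illustration rather than as the fix this thread settled on, is to copy the bytes through `std::memcpy` into a properly aligned local object. Mainstream compilers typically lower a fixed-size `memcpy` like this to a single load or store on x86-64. The names `SafeLoad` and `SafeStore` are hypothetical, not Arrow APIs.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper (not an Arrow API). Instead of
//   std::int32_t v = *reinterpret_cast<const std::int32_t*>(p);  // UB if p is misaligned
// copy the bytes at a possibly misaligned address into an aligned local.
template <typename T>
T SafeLoad(const void* ptr) {
  static_assert(std::is_trivially_copyable<T>::value,
                "SafeLoad requires a trivially copyable type");
  T value;
  std::memcpy(&value, ptr, sizeof(T));
  return value;
}

// Store counterpart: write the bytes of `value` to a possibly misaligned address.
template <typename T>
void SafeStore(void* ptr, T value) {
  static_assert(std::is_trivially_copyable<T>::value,
                "SafeStore requires a trivially copyable type");
  std::memcpy(ptr, &value, sizeof(T));
}
```

For example, `SafeLoad<std::int32_t>(data + 1)` reads four bytes starting at an odd offset without triggering UBSan's misaligned-load check, while keeping the call site as short as the original cast.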

Re: A couple of questions about pyarrow.parquet

2019-05-17 Thread Micah Kornfield
I can't help on the first question. Regarding push-down predicates, there is an open JIRA [1] to do just that. [1] https://issues.apache.org/jira/browse/PARQUET-473

Re: A couple of questions about pyarrow.parquet

2019-05-17 Thread Wes McKinney
Please see also https://docs.google.com/document/d/1bVhzifD38qDypnSjtf8exvpP3sSB5x_Kw9m-n66FB2c/edit?usp=drivesdk and prior mailing list discussion. I will comment in more detail on the other items later. On Fri, May 17, 2019, 2:44 PM Micah Kornfield wrote: > I can't help on the first question.

Re: [DISCUSS][C++] Unaligned memory accesses (undefined behavior)

2019-05-17 Thread Tim Armstrong
I don't know the Arrow and parquet-cpp codebases but this is exactly what we did in Impala to solve similar issues and we haven't had any performance problems with it - it should get compiled to a single load/store on x86-64. On Fri, May 17, 2019 at 12:22 PM Micah Kornfield wrote: > I recently r

Re: [DISCUSS][C++] Unaligned memory accesses (undefined behavior)

2019-05-17 Thread Antoine Pitrou
On 17/05/2019 at 21:22, Micah Kornfield wrote: > I recently ran UBSan over parquet and arrow code bases and there are quite > a few unaligned pointer warnings (we do reinterpret casts on integer types > without checking alignment). Most of them are in Arrow itself, which > parquet calls into.

[jira] [Created] (ARROW-5363) [GLib] Fix coding styles

2019-05-17 Thread Yosuke Shiro (JIRA)
Yosuke Shiro created ARROW-5363: Summary: [GLib] Fix coding styles Key: ARROW-5363 URL: https://issues.apache.org/jira/browse/ARROW-5363 Project: Apache Arrow Issue Type: Improvement

[jira] [Created] (ARROW-5364) [C++] Fix UTF-8 comments to ASCII in BuildUtils.cmake

2019-05-17 Thread Yosuke Shiro (JIRA)
Yosuke Shiro created ARROW-5364: Summary: [C++] Fix UTF-8 comments to ASCII in BuildUtils.cmake Key: ARROW-5364 URL: https://issues.apache.org/jira/browse/ARROW-5364 Project: Apache Arrow Issu

Re: A couple of questions about pyarrow.parquet

2019-05-17 Thread Ted Gooch
Thanks Micah and Wes. Definitely interested in the *Predicate Pushdown* and *Schema inference, schema-on-read, and schema normalization* sections. On Fri, May 17, 2019 at 12:47 PM Wes McKinney wrote: > Please see also > > > https://docs.google.com/document/d/1bVhzifD38qDypnSjtf8exvpP3sSB5x_Kw9m

[jira] [Created] (ARROW-5365) [C++][CI] Add UBSan and ASAN into CI

2019-05-17 Thread Micah Kornfield (JIRA)
Micah Kornfield created ARROW-5365: Summary: [C++][CI] Add UBSan and ASAN into CI Key: ARROW-5365 URL: https://issues.apache.org/jira/browse/ARROW-5365 Project: Apache Arrow Issue Type: Imp

[jira] [Created] (ARROW-5366) [Rust] Implement Duration and Interval Types

2019-05-17 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5366: Summary: [Rust] Implement Duration and Interval Types Key: ARROW-5366 URL: https://issues.apache.org/jira/browse/ARROW-5366 Project: Apache Arrow Issue Typ

[jira] [Created] (ARROW-5367) [Rust] Add temporal kernels

2019-05-17 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5367: Summary: [Rust] Add temporal kernels Key: ARROW-5367 URL: https://issues.apache.org/jira/browse/ARROW-5367 Project: Apache Arrow Issue Type: New Feature