RecordBatch.length vs. Buffer.length?

2019-05-03 Thread Jeffrey Green
Hello! I'm using the Java API for Arrow and am finding some ambiguity between the length field in a RecordBatch and the "byte-width-adjusted" length field in a Buffer. As per https://arrow.apache.org/docs/format/Metadata.html under the "Record data headers" section: "A record batch is a collectio
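
A minimal sketch of the distinction this question turns on, assuming the Arrow format's convention that a record batch's length counts rows while a buffer's length counts bytes. The class and method names are hypothetical and are not part of the Arrow Java API; the alignment value is a parameter because the padding used in practice depends on the writer.

```java
// Hypothetical helper, not part of the Arrow Java API.
final class LengthSketch {
    // Bytes occupied by the data buffer of a fixed-width column:
    // one value of byteWidth bytes per row.
    static long dataBufferBytes(long rowCount, int byteWidth) {
        return rowCount * byteWidth;
    }

    // Bytes occupied by a validity bitmap: one bit per row,
    // rounded up to a whole number of bytes.
    static long validityBufferBytes(long rowCount) {
        return (rowCount + 7) / 8;
    }

    // Round a byte count up to the next multiple of alignment.
    // The alignment itself is an assumption supplied by the caller.
    static long padTo(long bytes, int alignment) {
        return ((bytes + alignment - 1) / alignment) * alignment;
    }

    public static void main(String[] args) {
        long rows = 10;     // what a record batch's length field counts
        int int32Width = 4; // bytes per 32-bit integer value

        long data = dataBufferBytes(rows, int32Width);
        long validity = validityBufferBytes(rows);

        System.out.println("rows = " + rows);
        System.out.println("data buffer bytes = " + data
                + ", padded to 8 = " + padTo(data, 8));
        System.out.println("validity buffer bytes = " + validity
                + ", padded to 8 = " + padTo(validity, 8));
    }
}
```

The point of the sketch is only that the two numbers answer different questions: the row count is the same for every column in the batch, while each buffer's byte length depends on the column's value width and on padding.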

Re: [VOTE] Add new DurationInterval Type to Arrow Format

2019-05-03 Thread Wes McKinney
I've just reviewed the format and C++ changes in https://github.com/apache/arrow/pull/3644 which look good to me modulo minor comments. Can someone take a look at the Java changes soon so we move this toward completion? One question came up of whether "DurationInterval" is the name we want. It mi

[jira] [Created] (ARROW-5257) [Website] Update site to use "official" Apache Arrow logo, add clearly marked links to logo

2019-05-03 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5257: --- Summary: [Website] Update site to use "official" Apache Arrow logo, add clearly marked links to logo Key: ARROW-5257 URL: https://issues.apache.org/jira/browse/ARROW-5257

[jira] [Created] (ARROW-5256) [Packaging][deb] Failed to build with LLVM 7.1.0

2019-05-03 Thread Kouhei Sutou (JIRA)
Kouhei Sutou created ARROW-5256: --- Summary: [Packaging][deb] Failed to build with LLVM 7.1.0 Key: ARROW-5256 URL: https://issues.apache.org/jira/browse/ARROW-5256 Project: Apache Arrow Issue Typ

Re: How about inet4/inet6/macaddr data types?

2019-05-03 Thread David Li
Sure, I've created https://issues.apache.org/jira/browse/ARROW-5255. PR: https://github.com/apache/arrow/pull/4251 I'm not sure if what I'm doing with my Vector subclass is quite right, but we'd especially like this in Java, so happy to work through any feedback. Also, as part of this discussion

[jira] [Created] (ARROW-5255) [Java] Implement user-defined data types API

2019-05-03 Thread David Li (JIRA)
David Li created ARROW-5255: --- Summary: [Java] Implement user-defined data types API Key: ARROW-5255 URL: https://issues.apache.org/jira/browse/ARROW-5255 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-5254) [Flight][Java] DoAction does not support result streams

2019-05-03 Thread David Li (JIRA)
David Li created ARROW-5254: --- Summary: [Flight][Java] DoAction does not support result streams Key: ARROW-5254 URL: https://issues.apache.org/jira/browse/ARROW-5254 Project: Apache Arrow Issue Type

Re: How about inet4/inet6/macaddr data types?

2019-05-03 Thread Wes McKinney
hi David -- would you like to open a PR and corresponding JIRA issue for discussion? We might want to hold a vote to formalize the extension type mechanism (and to fix the metadata names -- I agree that having an ARROW namespace would be better than what we are doing now) On Thu, May 2, 2019 at 7:

Re: PARQUET-1411 / PR 4185

2019-05-03 Thread TP Boudreau
No need for apologies, Wes, I appreciate you keeping this on your radar. I've made the changes and have pushed them to the PR branch. You can begin your review when you get the chance. --TPB On Thu, May 2, 2019 at 3:32 PM Wes McKinney wrote: > + Parquet dev list > > Thanks Tim for working on

RE: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-05-03 Thread Jed Brown
"Malakhov, Anton" writes: >> > the library creates threads internally. It's a disaster for managing >> > oversubscription and affinity issues among groups of threads and/or >> > multiple processes (e.g., MPI). > > This is exactly what I'm talking about referring as issues with threading > compo

Re: ARROW-3191: Making ArrowBuf work with arbitrary memory and setting io.netty.tryReflectionSetAccessible to true for java builds

2019-05-03 Thread Bryan Cutler
Hi Sidd, Does setting the system property io.netty.tryReflectionSetAccessible to true have any other adverse effect other than those warnings during build? Bryan On Thu, May 2, 2019 at 8:43 PM Jacques Nadeau wrote: > I'm onboard with this change. > > On Fri, Apr 26, 2019 at 2:14 AM Siddharth T
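
For reference, a minimal sketch of the two usual ways to set the property named in this thread. The class name is hypothetical. The sketch assumes the property is read when Netty's platform classes first initialize, so it would need to be set before any Netty or Arrow memory class is loaded; that timing is an assumption here, not something stated in the thread.

```java
// Hypothetical helper; only the property name comes from the thread.
final class NettyReflectionFlag {
    static final String KEY = "io.netty.tryReflectionSetAccessible";

    // Programmatic route: set the system property for this JVM.
    // Assumption: this runs before any Netty class is initialized.
    static void enable() {
        System.setProperty(KEY, "true");
    }

    public static void main(String[] args) {
        enable();
        System.out.println(KEY + "=" + System.getProperty(KEY));
    }
}
```

The command-line equivalent is the standard JVM flag form, `-Dio.netty.tryReflectionSetAccessible=true`, which avoids any dependence on class-loading order.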

Re: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-05-03 Thread Antoine Pitrou
On 03/05/2019 at 17:57, Jed Brown wrote: > >>> The library is then free to use constructs like omp taskgroup/taskloop >>> as granularity warrants; it will never utilize threads that the >>> application didn't explicitly give it. >> >> I don't think we're planning to use OpenMP in Arrow, though

RE: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-05-03 Thread Malakhov, Anton
Thanks for your answers, > -----Original Message----- > From: Antoine Pitrou [mailto:anto...@python.org] > Sent: Friday, May 3, 2019 03:54 > On 03/05/2019 at 05:47, Jed Brown wrote: > > I would caution to please not commit to the MKL/BLAS model in which I'm actually talking about threading laye

Re: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-05-03 Thread Jed Brown
Antoine Pitrou writes: > Hi Jed, > > On 03/05/2019 at 05:47, Jed Brown wrote: >> I would caution to please not commit to the MKL/BLAS model in which the >> library creates threads internally. It's a disaster for managing >> oversubscription and affinity issues among groups of threads and/or >>

[jira] [Created] (ARROW-5253) [C++] external Snappy fails on Alpine

2019-05-03 Thread Francois Saint-Jacques (JIRA)
Francois Saint-Jacques created ARROW-5253: - Summary: [C++] external Snappy fails on Alpine Key: ARROW-5253 URL: https://issues.apache.org/jira/browse/ARROW-5253 Project: Apache Arrow

[jira] [Created] (ARROW-5252) [C++] Change variant implementation

2019-05-03 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5252: - Summary: [C++] Change variant implementation Key: ARROW-5252 URL: https://issues.apache.org/jira/browse/ARROW-5252 Project: Apache Arrow Issue Type: Improv

Re: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-05-03 Thread Antoine Pitrou
Hi Jed, On 03/05/2019 at 05:47, Jed Brown wrote: > I would caution to please not commit to the MKL/BLAS model in which the > library creates threads internally. It's a disaster for managing > oversubscription and affinity issues among groups of threads and/or > multiple processes (e.g., MPI).

Re: [DISCUSS][C++][Proposal] Threading engine for Arrow

2019-05-03 Thread Antoine Pitrou
Hi Anton, Another possibility is to look at our C++ CSV reader and parser (in src/arrow/csv). It's the only piece of Arrow that uses non-trivial multi-threading right now (with tasks spawning new tasks dynamically, see InferringColumnBuilder). It's based on the ThreadPool and TaskGroup APIs (i