[jira] [Created] (ARROW-5411) Build error building on Mac OS Mojave

2019-05-23 Thread Miguel Cabrera (JIRA)
Miguel Cabrera created ARROW-5411: Summary: Build error building on Mac OS Mojave Key: ARROW-5411 URL: https://issues.apache.org/jira/browse/ARROW-5411 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-5410) Crash at arrow::internal::FileWrite

2019-05-23 Thread Tham (JIRA)
Tham created ARROW-5410: Summary: Crash at arrow::internal::FileWrite Key: ARROW-5410 URL: https://issues.apache.org/jira/browse/ARROW-5410 Project: Apache Arrow Issue Type: Bug Environment:

Re: [Discuss][Format][Java] Finalizing Union Types

2019-05-23 Thread Micah Kornfield
I'd like to bump this thread, to see if anyone has any comments. If nobody objects I will try to start implementing the changes next week. Thanks, Micah On Mon, May 20, 2019 at 9:37 PM Micah Kornfield wrote: > In the past [1] there hasn't been agreement on the final requirements for > union ty

[jira] [Created] (ARROW-5409) [C++] Improvement for IsIn Kernel when right array is small

2019-05-23 Thread Preeti Suman (JIRA)
Preeti Suman created ARROW-5409: Summary: [C++] Improvement for IsIn Kernel when right array is small Key: ARROW-5409 URL: https://issues.apache.org/jira/browse/ARROW-5409 Project: Apache Arrow

[jira] [Created] (ARROW-5408) [Rust] Create struct array builder that creates null buffers

2019-05-23 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5408: Summary: [Rust] Create struct array builder that creates null buffers Key: ARROW-5408 URL: https://issues.apache.org/jira/browse/ARROW-5408 Project: Apache Arrow

Re: Python development setup and LLVM 7 / Gandiva

2019-05-23 Thread John Muehlhausen
Not sure why cmake isn't happy (as in original post). Environment is set up as per instructions: (pyarrow-dev) JGM-KTG-Mac-Mini:python jmuehlhausen$ conda list llvmdev # packages in environment at /Users/jmuehlhausen/miniconda3/envs/pyarrow-dev: # # Name Version

Re: Python development setup and LLVM 7 / Gandiva

2019-05-23 Thread Wes McKinney
llvmdev=7 is in the conda_env_cpp.yml requirements file. Are you using something else? https://github.com/apache/arrow/blob/master/ci/conda_env_cpp.yml#L31 On Thu, May 23, 2019 at 12:53 PM John Muehlhausen wrote: > > The pyarrow-dev conda environment does not include llvm 7, which appears to > b

Re: Python development setup and LLVM 7 / Gandiva

2019-05-23 Thread John Muehlhausen
The pyarrow-dev conda environment does not include llvm 7, which appears to be a requirement for Gandiva. So I'm just trying to figure out a pain-free way to add llvm 7 in a way that cmake can find it, for Mac. I had already solved the other Mac problem with export CONDA_BUILD_SYSROOT=/Users/jmue

Re: Java/Scala: efficient reading of Parquet into Arrow?

2019-05-23 Thread Wes McKinney
Cool. At some point we are interested in having simple compressed (e.g. with LZ4 or ZSTD) record batches natively in the Arrow protocol, see https://issues.apache.org/jira/browse/ARROW-300 On Thu, May 23, 2019 at 10:21 AM Joris Peeters wrote: > > Cool, thanks. I think we'll just go with reading

Re: [Python] Is there a way to specify a column as non-nullable with parquet.write_table?

2019-05-23 Thread Wes McKinney
Yes, but you will need to resolve ARROW-5169 (https://issues.apache.org/jira/browse/ARROW-5169). write_table should respect the field-level nullability in the schema of the Table you pass. On Thu, May 23, 2019 at 10:34 AM Tim Swast wrote: > > I'm currently using parquet as the intermediate format when uploadi

[Python] Is there a way to specify a column as non-nullable with parquet.write_table?

2019-05-23 Thread Tim Swast
I'm currently using parquet as the intermediate format when uploading a pandas DataFrame to Google BigQuery. We encounter a problem when trying to append a parquet file to a table with required fields (issue: https://github.com/googleapis/google-cloud-python/issues/8093). Is there a way to mark fi

[jira] [Created] (ARROW-5407) [C++] Integration test Travis CI entry builds many unnecessary targets

2019-05-23 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5407: Summary: [C++] Integration test Travis CI entry builds many unnecessary targets Key: ARROW-5407 URL: https://issues.apache.org/jira/browse/ARROW-5407 Project: Apache Ar

Re: Java/Scala: efficient reading of Parquet into Arrow?

2019-05-23 Thread Joris Peeters
Cool, thanks. I think we'll just go with reading LZ4-compressed Arrow directly from disk then, and bypass Parquet altogether. The compressed Arrow files are about 20% larger than the PQ files, but getting it into some useful form in memory is almost on par with pandas. At the moment, I don't need

[jira] [Created] (ARROW-5406) enable Subscribe and GetNotification from Java

2019-05-23 Thread Tim Emerick (JIRA)
Tim Emerick created ARROW-5406: Summary: enable Subscribe and GetNotification from Java Key: ARROW-5406 URL: https://issues.apache.org/jira/browse/ARROW-5406 Project: Apache Arrow Issue Type: N

[jira] [Created] (ARROW-5405) [Documentation] Move integration testing documentation to Sphinx docs, add instructions for JavaScript

2019-05-23 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5405: Summary: [Documentation] Move integration testing documentation to Sphinx docs, add instructions for JavaScript Key: ARROW-5405 URL: https://issues.apache.org/jira/browse/ARROW-5405

Re: Java/Scala: efficient reading of Parquet into Arrow?

2019-05-23 Thread Wes McKinney
hi Joris, The Apache Parquet mailing list is d...@parquet.apache.org I'm copying the list here AFAIK parquet-mr doesn't feature vectorized reading (for Arrow or otherwise). There are some vectorized Java-based readers in the wild: in Dremio [1] and Apache Spark, at least. I'm interested to see

Java/Scala: efficient reading of Parquet into Arrow?

2019-05-23 Thread Joris Peeters
Hello, I'm trying to read a Parquet file from disk into Arrow in memory, in Scala. I'm wondering what the most efficient approach is, especially for the reading part. I'm aware that Parquet reading is perhaps beyond the scope of this mailing list but, - I believe Arrow and Parquet are closely int

Re: memory mapped IPC File of RecordBatches?

2019-05-23 Thread Wes McKinney
OK. Can you open a JIRA about fixing this? I don't recall the rationale for using MAP_PRIVATE to begin with, and since the behavior is unspecified on Linux it would be better to be consistent across platforms. On Wed, May 22, 2019 at 11:02 PM John Muehlhausen wrote: > > Well, it works fine on Linu
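
For readers unfamiliar with the two mapping modes mentioned in this message, the following is a minimal sketch of the general POSIX distinction, using Python's standard mmap module on a Unix system. It is not Arrow's code and says nothing about how Arrow configures its mappings; the helper name write_through_mapping and the 4-byte scratch file are assumptions made purely for illustration. It shows only the write side: a store through a MAP_SHARED mapping reaches the file, while a store through a MAP_PRIVATE mapping stays in a private copy-on-write page.

```python
import mmap
import os
import tempfile


def write_through_mapping(flags: int) -> bytes:
    """Hypothetical helper: store one byte through a mapping created with
    the given flags, then return what the underlying file contains."""
    fd, path = tempfile.mkstemp()  # scratch file, opened read/write
    try:
        os.write(fd, b"AAAA")
        prot = mmap.PROT_READ | mmap.PROT_WRITE
        with mmap.mmap(fd, 4, flags=flags, prot=prot) as mapping:
            mapping[0:1] = b"B"  # modify the first byte via the mapping
            if flags == mmap.MAP_SHARED:
                mapping.flush()  # push the shared page back to the file
        # Re-read the file through the descriptor, not the mapping.
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, 4)
    finally:
        os.close(fd)
        os.remove(path)


shared_result = write_through_mapping(mmap.MAP_SHARED)
private_result = write_through_mapping(mmap.MAP_PRIVATE)
print("MAP_SHARED  ->", shared_result)
print("MAP_PRIVATE ->", private_result)
```

The sketch covers only stores made through the mapping. Whether a MAP_PRIVATE mapping observes later changes made to the file by other means is a separate question, and that is the part POSIX leaves unspecified.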

[jira] [Created] (ARROW-5404) [C++] nonstd::string_view conflicts with std::string_view in c++17

2019-05-23 Thread Benjamin Kietzman (JIRA)
Benjamin Kietzman created ARROW-5404: Summary: [C++] nonstd::string_view conflicts with std::string_view in c++17 Key: ARROW-5404 URL: https://issues.apache.org/jira/browse/ARROW-5404 Project: Apa

Gandiva User Defined Functions

2019-05-23 Thread Praveen Kumar
Hi Sun, Starting a thread on the mailing list around the query you had: https://github.com/apache/arrow/issues/4375. Currently Gandiva does not support a user-defined repository. We want to implement it sometime in the future, but I am not sure when we will pick it up. In case you want to go ahea

[jira] [Created] (ARROW-5403) [C++] Test failures not propagated in Windows shared builds

2019-05-23 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5403: Summary: [C++] Test failures not propagated in Windows shared builds Key: ARROW-5403 URL: https://issues.apache.org/jira/browse/ARROW-5403 Project: Apache Arrow

Re: A couple of questions about pyarrow.parquet

2019-05-23 Thread Uwe L. Korn
Hello Ted, regarding predicate pushdown in Python, have a look at my unfinished PR at https://github.com/apache/arrow/pull/2623. This was stopped since we were missing a native filter in Arrow. The requirements for that have now been implemented and we could probably reactivate the PR. Uwe On S

[jira] [Created] (ARROW-5402) [Plasma] Pin objects in plasma store

2019-05-23 Thread Zhijun Fu (JIRA)
Zhijun Fu created ARROW-5402: Summary: [Plasma] Pin objects in plasma store Key: ARROW-5402 URL: https://issues.apache.org/jira/browse/ARROW-5402 Project: Apache Arrow Issue Type: Improvement

[jira] [Created] (ARROW-5401) [CI] [C++] Print ccache statistics on Travis-CI

2019-05-23 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5401: Summary: [CI] [C++] Print ccache statistics on Travis-CI Key: ARROW-5401 URL: https://issues.apache.org/jira/browse/ARROW-5401 Project: Apache Arrow Issue

[jira] [Created] (ARROW-5400) [Rust] Test/ensure that reader and writer support zero-length record batches

2019-05-23 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5400: Summary: [Rust] Test/ensure that reader and writer support zero-length record batches Key: ARROW-5400 URL: https://issues.apache.org/jira/browse/ARROW-5400 Project:

[jira] [Created] (ARROW-5399) [Rust] [Testing] Add IPC test files to arrow-testing

2019-05-23 Thread Neville Dipale (JIRA)
Neville Dipale created ARROW-5399: Summary: [Rust] [Testing] Add IPC test files to arrow-testing Key: ARROW-5399 URL: https://issues.apache.org/jira/browse/ARROW-5399 Project: Apache Arrow I