Summary of RLE and other compression efforts?

2020-03-09 Thread Evan Chan
Hi folks, I’m curious about the state of efforts for more compressed encodings in the Arrow columnar format. I saw discussions previously about RLE, but is there a place to summarize all of the different efforts that are ongoing to bring more compressed encodings? Is there an effort to

[Rust] Dictionary encoding for strings?

2020-03-09 Thread Evan Chan
Hi, Does the Rust implementation support dictionary encoded strings? It is not in the documentation anywhere, but there seem to be some variable-sized dictionary structs in the code base. If not, is there a plan to support it? Does DataFusion support reading from dictionary strings? It seems

[jira] [Created] (ARROW-8052) [Python] requirements-test.txt cannot be used with conda install --file

2020-03-09 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8052: --- Summary: [Python] requirements-test.txt cannot be used with conda install --file Key: ARROW-8052 URL: https://issues.apache.org/jira/browse/ARROW-8052 Project: Apache

[jira] [Created] (ARROW-8051) [Go] Dependency pmezard/go-difflib is unmaintained

2020-03-09 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8051: --- Summary: [Go] Dependency pmezard/go-difflib is unmaintained Key: ARROW-8051 URL: https://issues.apache.org/jira/browse/ARROW-8051 Project: Apache Arrow Issue

[NIGHTLY] Arrow Build Report for Job nightly-2020-03-09-0

2020-03-09 Thread Crossbow
Arrow Build Report for Job nightly-2020-03-09-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-09-0 Failed Tasks: - centos-7: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-03-09-0-github-centos-7 - gandiva-jar-osx: URL:

[jira] [Created] (ARROW-8050) [Python][Packaging] Do not include generated Cython source files in wheel packages

2020-03-09 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8050: --- Summary: [Python][Packaging] Do not include generated Cython source files in wheel packages Key: ARROW-8050 URL: https://issues.apache.org/jira/browse/ARROW-8050

[jira] [Created] (ARROW-8049) [C++] Upgrade bundled Thrift version to 0.13.0

2020-03-09 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8049: --- Summary: [C++] Upgrade bundled Thrift version to 0.13.0 Key: ARROW-8049 URL: https://issues.apache.org/jira/browse/ARROW-8049 Project: Apache Arrow Issue

[jira] [Created] (ARROW-8048) [Python] Run memory leak tests nightly as follow up to ARROW-4120

2020-03-09 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8048: --- Summary: [Python] Run memory leak tests nightly as follow up to ARROW-4120 Key: ARROW-8048 URL: https://issues.apache.org/jira/browse/ARROW-8048 Project: Apache Arrow

[jira] [Created] (ARROW-8047) [Python][Documentation] Document migration from ParquetDataset to pyarrow.datasets

2020-03-09 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-8047: --- Summary: [Python][Documentation] Document migration from ParquetDataset to pyarrow.datasets Key: ARROW-8047 URL: https://issues.apache.org/jira/browse/ARROW-8047

[jira] [Created] (ARROW-8046) [Developer][Integration] Makefile.docker's target names are broken

2020-03-09 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-8046: --- Summary: [Developer][Integration] Makefile.docker's target names are broken Key: ARROW-8046 URL: https://issues.apache.org/jira/browse/ARROW-8046 Project: Apache Arrow

Re: [jira] [Created] (ARROW-8042) [Python] pyarrow.ChunkedArray docstring is incorrect regarding zero-length ChunkedArray having no chunks

2020-03-09 Thread Duane Millar Barlow
unsubscribe On Mon, Mar 9, 2020 at 10:10 AM Wes McKinney (Jira) wrote: > Wes McKinney created ARROW-8042: > --- > > Summary: [Python] pyarrow.ChunkedArray docstring is incorrect > regarding zero-length ChunkedArray having no chunks >

[jira] [Created] (ARROW-8045) [Release] Ensure that the JIRAs belonging to the release's commits have the proper version number

2020-03-09 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-8045: -- Summary: [Release] Ensure that the JIRAs belonging to the release's commits have the proper version number Key: ARROW-8045 URL: https://issues.apache.org/jira/browse/ARROW-8045

[jira] [Created] (ARROW-8044) [NIGHTLY:gandiva-jar-osx] pygit2 needs libgit2 v1.0.x

2020-03-09 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-8044: --- Summary: [NIGHTLY:gandiva-jar-osx] pygit2 needs libgit2 v1.0.x Key: ARROW-8044 URL: https://issues.apache.org/jira/browse/ARROW-8044 Project: Apache Arrow

[jira] [Created] (ARROW-8043) [Developer] Provide better visibility for failed nightly builds

2020-03-09 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-8043: --- Summary: [Developer] Provide better visibility for failed nightly builds Key: ARROW-8043 URL: https://issues.apache.org/jira/browse/ARROW-8043 Project: Apache Arrow

[jira] [Created] (ARROW-8042) [Python] pyarrow.ChunkedArray docstring is incorrect regarding zero-length ChunkedArray having no chunks

2020-03-09 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8042: --- Summary: [Python] pyarrow.ChunkedArray docstring is incorrect regarding zero-length ChunkedArray having no chunks Key: ARROW-8042 URL:

[jira] [Created] (ARROW-8041) [C++] protobuf_ep fails to build on Raspbian due to linking issues relating to atomics

2020-03-09 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8041: --- Summary: [C++] protobuf_ep fails to build on Raspbian due to linking issues relating to atomics Key: ARROW-8041 URL: https://issues.apache.org/jira/browse/ARROW-8041

[jira] [Created] (ARROW-8040) [Python][Packaging] Add Parquet encryption / OpenSSL to Python wheels

2020-03-09 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8040: --- Summary: [Python][Packaging] Add Parquet encryption / OpenSSL to Python wheels Key: ARROW-8040 URL: https://issues.apache.org/jira/browse/ARROW-8040 Project: Apache

[jira] [Created] (ARROW-8039) [C++][Python][Dataset] Assemble a minimal ParquetDataset shim

2020-03-09 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-8039: --- Summary: [C++][Python][Dataset] Assemble a minimal ParquetDataset shim Key: ARROW-8039 URL: https://issues.apache.org/jira/browse/ARROW-8039 Project: Apache Arrow

[jira] [Created] (ARROW-8038) [C++][Packaging] Add OpenSSL / encryption support to C++ packages

2020-03-09 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-8038: --- Summary: [C++][Packaging] Add OpenSSL / encryption support to C++ packages Key: ARROW-8038 URL: https://issues.apache.org/jira/browse/ARROW-8038 Project: Apache Arrow

[jira] [Created] (ARROW-8037) [Developer][Integration] Consolidate example JSON and test/validate uniformly

2020-03-09 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-8037: --- Summary: [Developer][Integration] Consolidate example JSON and test/validate uniformly Key: ARROW-8037 URL: https://issues.apache.org/jira/browse/ARROW-8037 Project:

[jira] [Created] (ARROW-8036) [C++] Compilation failure with gtest 1.10.0

2020-03-09 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-8036: - Summary: [C++] Compilation failure with gtest 1.10.0 Key: ARROW-8036 URL: https://issues.apache.org/jira/browse/ARROW-8036 Project: Apache Arrow Issue

[jira] [Created] (ARROW-8035) [Developer][Integration] Add integration tests for extension types

2020-03-09 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-8035: --- Summary: [Developer][Integration] Add integration tests for extension types Key: ARROW-8035 URL: https://issues.apache.org/jira/browse/ARROW-8035 Project: Apache Arrow

[jira] [Created] (ARROW-8034) [JavaScript][Integration] Enable custom_metadata integration test

2020-03-09 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-8034: --- Summary: [JavaScript][Integration] Enable custom_metadata integration test Key: ARROW-8034 URL: https://issues.apache.org/jira/browse/ARROW-8034 Project: Apache Arrow

[jira] [Created] (ARROW-8032) CPP example parquet-arrow project includes broken FindParquet.cmake

2020-03-09 Thread Tomasz Cheda (Jira)
Tomasz Cheda created ARROW-8032: --- Summary: CPP example parquet-arrow project includes broken FindParquet.cmake Key: ARROW-8032 URL: https://issues.apache.org/jira/browse/ARROW-8032 Project: Apache