[GitHub] [arrow] emkornfield edited a comment on pull request #7143: ARROW-8504: [C++] [wip]Add BitRunReader and use it in parquet

2020-05-10 Thread GitBox
emkornfield edited a comment on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-626464623 @wesm interesting data point: I updated the performance benchmarks to generate random values/nullability (and kept the deterministic one). It seems like the bad regression is

[GitHub] [arrow] emkornfield commented on pull request #7143: ARROW-8504: [C++] [wip]Add BitRunReader and use it in parquet

2020-05-10 Thread GitBox
emkornfield commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-626464623 @wesm interesting data point: I updated the performance benchmarks to generate random values/nullability (and kept the deterministic one). It seems like the bad regression is really

[GitHub] [arrow] kiszk commented on a change in pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations

2020-05-10 Thread GitBox
kiszk commented on a change in pull request #7135: URL: https://github.com/apache/arrow/pull/7135#discussion_r422764992 ## File path: cpp/src/arrow/util/bit_util.cc ## @@ -273,28 +274,115 @@ void AlignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* ri }

[GitHub] [arrow] zhztheplayer commented on a change in pull request #7030: ARROW-7808: [Java][Dataset] Implement Datasets Java API by JNI to C++

2020-05-10 Thread GitBox
zhztheplayer commented on a change in pull request #7030: URL: https://github.com/apache/arrow/pull/7030#discussion_r422738964 ## File path: cpp/src/jni/dataset/jni_wrapper.cpp ## @@ -0,0 +1,577 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contr

[GitHub] [arrow] kiszk opened a new pull request #7148: ARROW-8759: [C++][Plasma] Fix TestPlasmaSerialization.DeleteReply failure on big-endian platforms

2020-05-10 Thread GitBox
kiszk opened a new pull request #7148: URL: https://github.com/apache/arrow/pull/7148 This PR reads element data through an endian-independent API in FlatBuffers instead of getting a pointer. This fixes a failure of TestPlasmaSerialization.DeleteReply in plasma-serialization-tests. Before

[GitHub] [arrow] github-actions[bot] commented on pull request #7148: ARROW-8759: [C++][Plasma] Fix TestPlasmaSerialization.DeleteReply failure on big-endian platforms

2020-05-10 Thread GitBox
github-actions[bot] commented on pull request #7148: URL: https://github.com/apache/arrow/pull/7148#issuecomment-626425675 https://issues.apache.org/jira/browse/ARROW-8759 This is an automated message from the Apache Git Serv

[GitHub] [arrow] emkornfield commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-05-10 Thread GitBox
emkornfield commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-626424926 Agreed. I think I'll have to add a special case for this. The code can also be simplified a bit further for cases with runs --

[GitHub] [arrow] kiszk commented on pull request #7146: ARROW-8757: [C++][Plasma] Write Plasma header in little-endian format

2020-05-10 Thread GitBox
kiszk commented on pull request #7146: URL: https://github.com/apache/arrow/pull/7146#issuecomment-626418016 Good catch. Updated the title.

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-05-10 Thread GitBox
wesm commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-626408844 A fair chunk of RLE-related code came out of Impala originally; it might not be a bad idea to peek at what's in apache/impala to see if it has gotten worked on perf-wise since the be

[GitHub] [arrow] kou commented on pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations

2020-05-10 Thread GitBox
kou commented on pull request #7135: URL: https://github.com/apache/arrow/pull/7135#issuecomment-626408103 The GLib on Windows failure isn't an Apache Arrow related problem. It's a problem with the glib2 gem and the latest GLib on MSYS2. I'll fix it in a few days.

[GitHub] [arrow] kou commented on a change in pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations

2020-05-10 Thread GitBox
kou commented on a change in pull request #7135: URL: https://github.com/apache/arrow/pull/7135#discussion_r422718116 ## File path: cpp/src/arrow/util/bit_util.cc ## @@ -273,28 +274,115 @@ void AlignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* ri } }

[GitHub] [arrow] lidavidm commented on a change in pull request #7130: ARROW-8742: [C++][Python] Add GRPC Mutual TLS for clients and server

2020-05-10 Thread GitBox
lidavidm commented on a change in pull request #7130: URL: https://github.com/apache/arrow/pull/7130#discussion_r422696732 ## File path: cpp/src/arrow/flight/client.cc ## @@ -531,14 +531,19 @@ class FlightClient::FlightClientImpl { std::stringstream grpc_uri; std::s

[GitHub] [arrow] wesm commented on pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations

2020-05-10 Thread GitBox
wesm commented on pull request #7135: URL: https://github.com/apache/arrow/pull/7135#issuecomment-626381392 FWIW the CI failure looks like a transient issue

[GitHub] [arrow] github-actions[bot] commented on pull request #7147: ARROW-8758: [R] Updates for compatibility with dplyr 1.0

2020-05-10 Thread GitBox
github-actions[bot] commented on pull request #7147: URL: https://github.com/apache/arrow/pull/7147#issuecomment-626369581 https://issues.apache.org/jira/browse/ARROW-8758

[GitHub] [arrow] nealrichardson opened a new pull request #7147: ARROW-8758: [R] Updates for compatibility with dplyr 1.0

2020-05-10 Thread GitBox
nealrichardson opened a new pull request #7147: URL: https://github.com/apache/arrow/pull/7147 I tested this locally with the current version of `dplyr` on CRAN and the dev version scheduled to be released to CRAN on May 15. Our tests now pass with both versions. Changes addressed:

[GitHub] [arrow] github-actions[bot] commented on pull request #7146: ARROW-8757: [C++][Plasma] Write Plasma header in big-endian format

2020-05-10 Thread GitBox
github-actions[bot] commented on pull request #7146: URL: https://github.com/apache/arrow/pull/7146#issuecomment-626358208 https://issues.apache.org/jira/browse/ARROW-8757

[GitHub] [arrow] kiszk opened a new pull request #7146: ARROW-8757: [C++][Plasma] Write Plasma header in big-endian format

2020-05-10 Thread GitBox
kiszk opened a new pull request #7146: URL: https://github.com/apache/arrow/pull/7146 This PR writes and reads the Plasma header (version, type, and length) in big-endian format. This makes it easy to interpret a Plasma data header across machines with different endianness. The c

[GitHub] [arrow] lidavidm commented on pull request #6656: ARROW-8297: [FlightRPC][C++] Implement Flight DoExchange for C++

2020-05-10 Thread GitBox
lidavidm commented on pull request #6656: URL: https://github.com/apache/arrow/pull/6656#issuecomment-626328588 I've rebased and fixed the integration test. I also managed to fix the compilation failure on GCC 4.8.

[GitHub] [arrow] kiszk edited a comment on pull request #7136: ARROW-8486: [C++] Fix BitArray failures on big-endian platforms

2020-05-10 Thread GitBox
kiszk edited a comment on pull request #7136: URL: https://github.com/apache/arrow/pull/7136#issuecomment-626291270 This PR can fix test failures in arrow-utility-tests as shown in https://travis-ci.org/github/apache/arrow/builds/685085866#L2071 https://travis-ci.org/github/apache/arro

[GitHub] [arrow] cyb70289 commented on a change in pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations

2020-05-10 Thread GitBox
cyb70289 commented on a change in pull request #7135: URL: https://github.com/apache/arrow/pull/7135#discussion_r422632059 ## File path: cpp/src/arrow/util/bit_util.cc ## @@ -273,28 +274,115 @@ void AlignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* ri

[GitHub] [arrow] guyuqi commented on a change in pull request #7121: ARROW-8633: [C++] Add ValidateAscii function

2020-05-10 Thread GitBox
guyuqi commented on a change in pull request #7121: URL: https://github.com/apache/arrow/pull/7121#discussion_r422628358 ## File path: cpp/src/arrow/util/utf8_util_benchmark.cc ## @@ -70,16 +70,44 @@ static void BenchmarkUTF8Validation( state.SetBytesProcessed(state.iteratio

[GitHub] [arrow] cyb70289 commented on a change in pull request #7135: ARROW-8553: [C++] Optimize unaligned bitmap operations

2020-05-10 Thread GitBox
cyb70289 commented on a change in pull request #7135: URL: https://github.com/apache/arrow/pull/7135#discussion_r422619489 ## File path: cpp/src/arrow/util/bit_util.cc ## @@ -273,28 +274,115 @@ void AlignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* ri

[GitHub] [arrow] kiszk edited a comment on pull request #7136: ARROW-8486: [C++] Fix BitArray failures on big-endian platforms

2020-05-10 Thread GitBox
kiszk edited a comment on pull request #7136: URL: https://github.com/apache/arrow/pull/7136#issuecomment-626275196 I will split this PR into multiple PRs (#7136(this), #7144, #7145). Each PR will focus on a different issue in this test suite. -

[GitHub] [arrow] github-actions[bot] commented on pull request #7145: ARROW-8756: [C++] Fix Bitmap Words tests' failures on big-endian platforms

2020-05-10 Thread GitBox
github-actions[bot] commented on pull request #7145: URL: https://github.com/apache/arrow/pull/7145#issuecomment-626295954 https://issues.apache.org/jira/browse/ARROW-8756

[GitHub] [arrow] kiszk opened a new pull request #7145: ARROW-8756: [C++] Fix Bitmap Words tests' failures on big-endian platforms

2020-05-10 Thread GitBox
kiszk opened a new pull request #7145: URL: https://github.com/apache/arrow/pull/7145 This PR adds support for multiple-word operations on big-endian platforms. There is optimized code that concatenates multiple words into one word with bit shift operations. The current code assumes a little-endian

[GitHub] [arrow] kiszk commented on pull request #7136: ARROW-8486: [C++] Fix BitArray failures on big-endian platforms

2020-05-10 Thread GitBox
kiszk commented on pull request #7136: URL: https://github.com/apache/arrow/pull/7136#issuecomment-626291270 This PR can fix test failures in arrow-utility-tests as shown in https://travis-ci.org/github/apache/arrow/builds/685085866#L2071 https://travis-ci.org/github/apache/arrow/build

[GitHub] [arrow] github-actions[bot] commented on pull request #7144: ARROW-4018: [C++] Fix RLE tests' failures on big-endian platforms

2020-05-10 Thread GitBox
github-actions[bot] commented on pull request #7144: URL: https://github.com/apache/arrow/pull/7144#issuecomment-626290966 https://issues.apache.org/jira/browse/ARROW-4018

[GitHub] [arrow] kiszk opened a new pull request #7144: ARROW-4018: [C++] Fix RLE tests' failures on big-endian platforms

2020-05-10 Thread GitBox
kiszk opened a new pull request #7144: URL: https://github.com/apache/arrow/pull/7144 This PR adds big-endian support in RLE related classes. The data for RLE is stored using a little-endian format. The current code assumes a little-endian CPU in two places. 1. `RleDecoder::NextCounts`
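
As a rough illustration of the issue described above (data stored little-endian, code that implicitly assumes a little-endian CPU), the sketch below decodes a 32-bit value from a byte buffer using shifts, so the result does not depend on host byte order. `LoadLittleEndian32` is a hypothetical helper written for this example; it is not taken from the PR or from Arrow's RLE classes, and the actual fix may well take a different approach.

```cpp
#include <cstdint>

// Hypothetical helper (not part of Arrow): decode a 32-bit unsigned value
// that is stored little-endian in `bytes`. Because the value is assembled
// with shifts rather than by reinterpreting memory, the result is the same
// on little-endian and big-endian hosts.
constexpr uint32_t LoadLittleEndian32(const uint8_t* bytes) {
  return static_cast<uint32_t>(bytes[0]) |
         (static_cast<uint32_t>(bytes[1]) << 8) |
         (static_cast<uint32_t>(bytes[2]) << 16) |
         (static_cast<uint32_t>(bytes[3]) << 24);
}
```

By contrast, copying the four bytes directly into a `uint32_t` would give the intended value only on a little-endian CPU, which is the kind of assumption the PR description refers to.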

[GitHub] [arrow] kou commented on pull request #7138: ARROW-8577: [Plasma][CUDA] Make CUDA initialization lazy

2020-05-10 Thread GitBox
kou commented on pull request #7138: URL: https://github.com/apache/arrow/pull/7138#issuecomment-626289056 +1 I've confirmed that the built binaries work on a machine without CUDA.

[GitHub] [arrow] github-actions[bot] commented on pull request #7136: ARROW-8755: [C++] Fix BitArray failures on big-endian platforms

2020-05-10 Thread GitBox
github-actions[bot] commented on pull request #7136: URL: https://github.com/apache/arrow/pull/7136#issuecomment-626287970 https://issues.apache.org/jira/browse/ARROW-8755