[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-06-03 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-638620462 @wesm I expect your PR to get merged first but if it doesn't consider using PopCount methods in this one. Thi

[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-06-03 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-638620176 @pitrou I think the C GLib build failure looks unrelated to me. I fixed the s390x build. This is an automat

[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-06-02 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-637334193 OK build is green (side note I don't think this will compile on s390x, not sure if that is a blocker). This i

[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-06-01 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-637187135 As long as @pitrou is happy with my workaround to avoid inflicting pain on AMD processor users. I think I need to fix one signed conversion bug to make CI happy. I shou

[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-21 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-632468275 @pitrou I updated this PR to require AVX512 for the use of BMI2 codepath. I think this should prevent issues for AMD architectures (I don't believe there are any that support

[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-18 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-630572790 > So the BMI2 instructions incur a large slowdown here (even catastrophic with gcc). I would expect bad numbers with GCC 7.5 since we lose autovectorization and use BMI

[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-16 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-629737181 @pitrou https://github.com/apache/arrow/pull/6985/commits/a36dc55efe3def27eef413b8e17567ec400a01ff adds a benchmark. Should be cherry-pickable to a clean branch for compariso

[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-14 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-628755223 > As a sidenote, it seems BMI2 may be problematic on AMD CPUs: https://twitter.com/trav_downs/status/1202793097962364928 Yea we need some benchmarks to see if the per

[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-13 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-628370216 @pitrou Doing some investigation it appears GCC 7.5 fails to auto-vectorize the comparison of levels. GCC 8.3 does autovectorize. Are we stuck with GCC 7.5 for releasing?

[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-13 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-628364641 OK it seems like clang and GCC perform very differently. Clang-8 new code vs master: ![image](https://user-images.githubusercontent.com/17869838/81888336-ff86bb00-955

[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-08 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-626100172 @pitrou did you want to take another look? @nealrichardson should I hold off until we an revert the change to see if the test still fails?

[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-03 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-623271321 @nealrichardson looks like a different error now? I think I probably want to use the scalar version for 32-bit builds. Is there an example in the code that provide the right

[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-03 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-623249922 The separate .cc files might work as, we should probably cleanup macro naming so we can distinguish between min and max SSE instruction se to use (I imagine some people won't

[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-05-01 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-622422074 @nealrichardson i'm not sure what the error is saying? also 32-bit R didn't realize that is still a thing :)

[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-30 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-622266880 > One problem with introducing more SIMD code is that we won't yet have a runtime dispatching strategy. We will need to go through all of our SIMD accelerations in this librar

[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-30 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-621974840 @wesm wanted to make sure this is still on your radar This is an automated message from the Apache Git Service

[GitHub] [arrow] emkornfield commented on pull request #6985: ARROW-8413: [C++][Parquet] Refactor Generating validity bitmap for values column

2020-04-25 Thread GitBox
emkornfield commented on pull request #6985: URL: https://github.com/apache/arrow/pull/6985#issuecomment-619488093 @pitrou I think I addressed your comments. One of them that went stale was the complexity for "AppendWord", I tried to remove parts that did not seem to affect performance on