[GitHub] [arrow] jorgecarleitao commented on pull request #8670: ARROW-10591: [Rust] Improve take (-20 to -40%)

2020-11-15 Thread GitBox
jorgecarleitao commented on pull request #8670: URL: https://github.com/apache/arrow/pull/8670#issuecomment-727724202 Performance: with the fix on the benchmarks for `take`, this is 2x slower than master. This is an automate

[GitHub] [arrow] cyb70289 commented on pull request #8671: ARROW-10598: [C++] Separate out bit-packing in internal::GenerateBitsUnrolled for better performance

2020-11-15 Thread GitBox
cyb70289 commented on pull request #8671: URL: https://github.com/apache/arrow/pull/8671#issuecomment-727702783 Tested on `Xeon(R) Gold 5218 CPU @ 2.30GHz` (skylake). gcc-7.5 has big improvement. ``` benchmarkbaseline contend

[GitHub] [arrow] cyb70289 commented on a change in pull request #8671: ARROW-10598: [C++] Separate out bit-packing in internal::GenerateBitsUnrolled for better performance

2020-11-15 Thread GitBox
cyb70289 commented on a change in pull request #8671: URL: https://github.com/apache/arrow/pull/8671#discussion_r523856680 ## File path: cpp/src/arrow/util/bitmap_generate.h ## @@ -83,17 +83,19 @@ void GenerateBitsUnrolled(uint8_t* bitmap, int64_t start_offset, int64_t length,

[GitHub] [arrow] drusso commented on pull request #8670: ARROW-10591: [Rust] Improve take (-20 to -40%)

2020-11-15 Thread GitBox
drusso commented on pull request #8670: URL: https://github.com/apache/arrow/pull/8670#issuecomment-727659428 This looks like a neat take on take (pardon the pun!). Is there anything in particular preventing adoption of this approach? -

[GitHub] [arrow] wesm commented on issue #8607: Deletion of existing file when write_table fails

2020-11-15 Thread GitBox
wesm commented on issue #8607: URL: https://github.com/apache/arrow/issues/8607#issuecomment-727649809 Can you please open some Jira issues if there's something to fix or improve in pyarrow? This is an automated message fro

[GitHub] [arrow] wesm closed issue #8607: Deletion of existing file when write_table fails

2020-11-15 Thread GitBox
wesm closed issue #8607: URL: https://github.com/apache/arrow/issues/8607 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

[GitHub] [arrow] wesm closed issue #8646: Predicate pushdown question

2020-11-15 Thread GitBox
wesm closed issue #8646: URL: https://github.com/apache/arrow/issues/8646 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

[GitHub] [arrow] wesm commented on a change in pull request #8660: ARROW-10173: [Rust][DataFusion] Implement support for direct comparison to scalar values

2020-11-15 Thread GitBox
wesm commented on a change in pull request #8660: URL: https://github.com/apache/arrow/pull/8660#discussion_r523824755 ## File path: rust/datafusion/src/physical_plan/mod.rs ## @@ -100,6 +100,30 @@ pub enum Distribution { SinglePartition, } +/// Represents the result fr

[GitHub] [arrow] yordan-pavlov commented on a change in pull request #8660: ARROW-10173: [Rust][DataFusion] Implement support for direct comparison to scalar values

2020-11-15 Thread GitBox
yordan-pavlov commented on a change in pull request #8660: URL: https://github.com/apache/arrow/pull/8660#discussion_r523823571 ## File path: rust/datafusion/src/physical_plan/mod.rs ## @@ -100,6 +100,30 @@ pub enum Distribution { SinglePartition, } +/// Represents the

[GitHub] [arrow] yordan-pavlov commented on pull request #8660: ARROW-10173: [Rust][DataFusion] Implement support for direct comparison to scalar values

2020-11-15 Thread GitBox
yordan-pavlov commented on pull request #8660: URL: https://github.com/apache/arrow/pull/8660#issuecomment-727644829 > I tried to run this code against our simple TPCH Q1 implementation (which is dominated by evaluating expressions against constants), and sadly it hit an error @alam

[GitHub] [arrow] jorgecarleitao commented on pull request #8630: ARROW-10540 [Rust] Improve filtering

2020-11-15 Thread GitBox
jorgecarleitao commented on pull request #8630: URL: https://github.com/apache/arrow/pull/8630#issuecomment-727639074 FYI I was unable to make this sufficiently efficient for the `take`: it has a large performance hit when compared to the current approach. As such, I am not sure if this ca

[GitHub] [arrow] jorgecarleitao commented on pull request #8670: ARROW-10591: [Rust] Improve take (-20 to -40%)

2020-11-15 Thread GitBox
jorgecarleitao commented on pull request #8670: URL: https://github.com/apache/arrow/pull/8670#issuecomment-727638541 I am still not comfortable with this, so I am closing it for now. This is an automated message from the Apa

[GitHub] [arrow] jorgecarleitao closed pull request #8670: ARROW-10591: [Rust] Improve take (-20 to -40%)

2020-11-15 Thread GitBox
jorgecarleitao closed pull request #8670: URL: https://github.com/apache/arrow/pull/8670 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

[GitHub] [arrow] rdettai commented on pull request #8658: ARROW-10577: [Rust][DataFusion] HashAggregator stream finishes unexpectedly after going to Pending state - tests

2020-11-15 Thread GitBox
rdettai commented on pull request #8658: URL: https://github.com/apache/arrow/pull/8658#issuecomment-727635074 @alamb except for Travis that is still stuck, tests are passing after the rebase! If you are okay with the way I structured the test fixture, I think this can be merged! ---

[GitHub] [arrow] Kopilov edited a comment on pull request #8605: ARROW-10508 [Java] Allow FixedSizeListVector to have empty children

2020-11-15 Thread GitBox
Kopilov edited a comment on pull request #8605: URL: https://github.com/apache/arrow/pull/8605#issuecomment-727626118 @liyafan82 seems working correctly in all cases if we add `writer.startList` and `writer.endList` together with `vector.setNull`. I have refactored `writeListVector` func

[GitHub] [arrow] Kopilov commented on pull request #8605: ARROW-10508 [Java] Allow FixedSizeListVector to have empty children

2020-11-15 Thread GitBox
Kopilov commented on pull request #8605: URL: https://github.com/apache/arrow/pull/8605#issuecomment-727626118 @liyafan82 seems working correctly in all cases if we add `writer.startList` and `writer.endList` together with `vector.setNull`. I have refactored `writeListVector` function in

[GitHub] [arrow] github-actions[bot] commented on pull request #8672: ARROW-10574: [Python][Parquet] Enhance hive partition filtering.

2020-11-15 Thread GitBox
github-actions[bot] commented on pull request #8672: URL: https://github.com/apache/arrow/pull/8672#issuecomment-727623661 https://issues.apache.org/jira/browse/ARROW-10574 This is an automated message from the Apache Git Ser

[GitHub] [arrow] wyzhao opened a new pull request #8672: ARROW-10574: [Python][Parquet] Enhance hive partition filtering.

2020-11-15 Thread GitBox
wyzhao opened a new pull request #8672: URL: https://github.com/apache/arrow/pull/8672 I would like to enhance partition filters in methods such as: pyarrow.parquet.ParquetDataset(path, filters) I am proposing the below enhancements: 1. for operator "in", "not in", the v

[GitHub] [arrow] jorgecarleitao closed pull request #8669: ARROW-10596: [Rust] Improve take benchmark

2020-11-15 Thread GitBox
jorgecarleitao closed pull request #8669: URL: https://github.com/apache/arrow/pull/8669 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

[GitHub] [arrow] github-actions[bot] commented on pull request #8640: ARROW-4193: [Rust] Add support for decimal data type

2020-11-15 Thread GitBox
github-actions[bot] commented on pull request #8640: URL: https://github.com/apache/arrow/pull/8640#issuecomment-727602423 https://issues.apache.org/jira/browse/ARROW-4193 This is an automated message from the Apache Git Serv

[GitHub] [arrow] sweb commented on pull request #8640: ARROW-4193: [Rust] Add support for decimal data type

2020-11-15 Thread GitBox
sweb commented on pull request #8640: URL: https://github.com/apache/arrow/pull/8640#issuecomment-727602232 @jorgecarleitao @nevi-me may I ask you for a review? From a functionality point of view I think this is more or less done, however I am very much at the beginning of writing stuff in

[GitHub] [arrow] wesm commented on a change in pull request #8671: ARROW-10598: [C++] Separate out bit-packing in internal::GenerateBitsUnrolled for better performance

2020-11-15 Thread GitBox
wesm commented on a change in pull request #8671: URL: https://github.com/apache/arrow/pull/8671#discussion_r523781685 ## File path: cpp/src/arrow/util/bitmap_generate.h ## @@ -83,17 +83,19 @@ void GenerateBitsUnrolled(uint8_t* bitmap, int64_t start_offset, int64_t length,

[GitHub] [arrow] github-actions[bot] commented on pull request #8671: ARROW-10598: [C++] Separate out bit-packing in internal::GenerateBitsUnrolled for better performance

2020-11-15 Thread GitBox
github-actions[bot] commented on pull request #8671: URL: https://github.com/apache/arrow/pull/8671#issuecomment-727597667 https://issues.apache.org/jira/browse/ARROW-10598 This is an automated message from the Apache Git Ser

[GitHub] [arrow] wesm commented on pull request #8671: ARROW-10598: [C++] Separate out bit-packing in internal::GenerateBitsUnrolled for better performance

2020-11-15 Thread GitBox
wesm commented on pull request #8671: URL: https://github.com/apache/arrow/pull/8671#issuecomment-727597367 @cyb70289 @pitrou @jianxind since you've all been doing some kernel performance grinding you might be interested in helping investigate this further to see if there's more performanc

[GitHub] [arrow] wesm opened a new pull request #8671: ARROW-10598: [C++] Separate out bit-packing in internal::GenerateBitsUnrolled for better performance

2020-11-15 Thread GitBox
wesm opened a new pull request #8671: URL: https://github.com/apache/arrow/pull/8671 This issue was pointed out to me by some folks at UWisc and just getting around to trying out a fix. This produces a substantial performance benefit in numerical comparisons with gcc8 ```

[GitHub] [arrow] nevi-me commented on pull request #8666: ARROW-10269: [Rust] Update to 2020-11-14 nightly

2020-11-15 Thread GitBox
nevi-me commented on pull request #8666: URL: https://github.com/apache/arrow/pull/8666#issuecomment-727588237 @alamb thanks. A quick solution for many of the lints, is to run clippy-fix with -Z unstable-options, so that one doesn't manually fix them.

[GitHub] [arrow] jorgecarleitao commented on pull request #8667: ARROW-10590: [Rust] Remove Date32(Millisecond) from casts

2020-11-15 Thread GitBox
jorgecarleitao commented on pull request #8667: URL: https://github.com/apache/arrow/pull/8667#issuecomment-727587052 ahaha, I only now realized that it was written `Int16`: I read `Float16` xD This is an automated message fr

[GitHub] [arrow] nevi-me commented on pull request #8667: ARROW-10590: [Rust] Remove Date32(Millisecond) from casts

2020-11-15 Thread GitBox
nevi-me commented on pull request #8667: URL: https://github.com/apache/arrow/pull/8667#issuecomment-727586356 I meant Float16 there, oops This is an automated message from the Apache Git Service. To respond to the message, p

[GitHub] [arrow] jorgecarleitao commented on pull request #8665: ARROW-10589: [Rust] Implement AVX-512 bit and operation

2020-11-15 Thread GitBox
jorgecarleitao commented on pull request #8665: URL: https://github.com/apache/arrow/pull/8665#issuecomment-727584271 Running the CI on separate feature fits exactly what I was thinking about. IMO we could then benefit from some clarity over how we merge PRs from here on: do we wait

[GitHub] [arrow] andygrove commented on pull request #8665: ARROW-10589: [Rust] Implement AVX-512 bit and operation

2020-11-15 Thread GitBox
andygrove commented on pull request #8665: URL: https://github.com/apache/arrow/pull/8665#issuecomment-727583378 I can see both sides of the argument here but I would be supportive of merging this PR as long as we have a JIRA filed to follow up on the CI support (which I agree is really im

[GitHub] [arrow] nevi-me commented on pull request #8665: ARROW-10589: [Rust] Implement AVX-512 bit and operation

2020-11-15 Thread GitBox
nevi-me commented on pull request #8665: URL: https://github.com/apache/arrow/pull/8665#issuecomment-727578474 Reran the tests, failure now confirmed: https://github.com/nevi-me/arrow/runs/1402675504#step:8:2084 The C++ implementation has avx512 support, so maybe @kszucs or someone e

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8401: ARROW-10109: [Rust] Add support to the C data interface for primitive types and utf8

2020-11-15 Thread GitBox
jorgecarleitao commented on a change in pull request #8401: URL: https://github.com/apache/arrow/pull/8401#discussion_r523763928 ## File path: rust/arrow/src/ffi.rs ## @@ -0,0 +1,657 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

[GitHub] [arrow] vertexclique commented on pull request #8665: ARROW-10589: [Rust] Implement AVX-512 bit and operation

2020-11-15 Thread GitBox
vertexclique commented on pull request #8665: URL: https://github.com/apache/arrow/pull/8665#issuecomment-727575454 > Okay, great. For tracking purposes, once we find a resolution to the CI question; we should open an umbrella JIRA to implement avx512 for other simd equivalents. I'm assumi

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8401: ARROW-10109: [Rust] Add support to the C data interface for primitive types and utf8

2020-11-15 Thread GitBox
jorgecarleitao commented on a change in pull request #8401: URL: https://github.com/apache/arrow/pull/8401#discussion_r523763431 ## File path: rust/arrow/src/bytes.rs ## @@ -0,0 +1,166 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

[GitHub] [arrow] nevi-me commented on pull request #8665: ARROW-10589: [Rust] Implement AVX-512 bit and operation

2020-11-15 Thread GitBox
nevi-me commented on pull request #8665: URL: https://github.com/apache/arrow/pull/8665#issuecomment-727574928 > So the person who enables `avx512` should use avx512 feature or fallback to autovec. Not need to bring packed_simd or something else to do fallback. Okay, great. For track

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8401: ARROW-10109: [Rust] Add support to the C data interface for primitive types and utf8

2020-11-15 Thread GitBox
jorgecarleitao commented on a change in pull request #8401: URL: https://github.com/apache/arrow/pull/8401#discussion_r523763314 ## File path: rust/arrow/src/buffer.rs ## @@ -151,70 +80,52 @@ impl Buffer { /// /// * `ptr` - Pointer to raw parts /// * `len` - Leng

[GitHub] [arrow] vertexclique commented on pull request #8665: ARROW-10589: [Rust] Implement AVX-512 bit and operation

2020-11-15 Thread GitBox
vertexclique commented on pull request #8665: URL: https://github.com/apache/arrow/pull/8665#issuecomment-727574114 > @vertexclique from my reviewing the code, it looked like simd and avx512 were able to co-exist, but I'm getting some errors on the macOS CI (https://github.com/nevi-me/arro

[GitHub] [arrow] nevi-me commented on pull request #8665: ARROW-10589: [Rust] Implement AVX-512 bit and operation

2020-11-15 Thread GitBox
nevi-me commented on pull request #8665: URL: https://github.com/apache/arrow/pull/8665#issuecomment-727573373 @vertexclique from my reviewing the code, it looked like `simd` and `avx512` were able to co-exist, but I'm getting some errors on the macOS CI (https://github.com/nevi-me/arrow/r

[GitHub] [arrow] nevi-me commented on pull request #8665: ARROW-10589: [Rust] Implement AVX-512 bit and operation

2020-11-15 Thread GitBox
nevi-me commented on pull request #8665: URL: https://github.com/apache/arrow/pull/8665#issuecomment-727572262 I'm running a CI job at https://github.com/nevi-me/arrow/runs/1402635214, which includes `"simd avx512"` features. We do test "simd" in one of the CI tasks. Perhaps I didn't

[GitHub] [arrow] jorgecarleitao commented on pull request #8665: ARROW-10589: [Rust] Implement AVX-512 bit and operation

2020-11-15 Thread GitBox
jorgecarleitao commented on pull request #8665: URL: https://github.com/apache/arrow/pull/8665#issuecomment-727571958 @vertexclique [here](https://github.com/apache/arrow/blob/master/ci/scripts/rust_test.sh#L34). The timings are important: e.g. we decided to not place coverage on every PR

[GitHub] [arrow] vertexclique commented on pull request #8665: ARROW-10589: [Rust] Implement AVX-512 bit and operation

2020-11-15 Thread GitBox
vertexclique commented on pull request #8665: URL: https://github.com/apache/arrow/pull/8665#issuecomment-727571404 feature `simd` is not tested on this CI too. Am I missing something? This is an automated message from the Ap

[GitHub] [arrow] alamb closed pull request #8654: ARROW-10572: [Rust][DataFusion] Use aHash instead of FnvHashMap

2020-11-15 Thread GitBox
alamb closed pull request #8654: URL: https://github.com/apache/arrow/pull/8654 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] alamb commented on pull request #8630: ARROW-10540 [Rust] Improve filtering

2020-11-15 Thread GitBox
alamb commented on pull request #8630: URL: https://github.com/apache/arrow/pull/8630#issuecomment-727569496 I will try and review this carefully tomorrow This is an automated message from the Apache Git Service. To respond t

[GitHub] [arrow] alamb commented on pull request #8654: ARROW-10572: [Rust][DataFusion] Use aHash instead of FnvHashMap

2020-11-15 Thread GitBox
alamb commented on pull request #8654: URL: https://github.com/apache/arrow/pull/8654#issuecomment-727569404 I think this is good to go -- what do you think @jorgecarleitao / @andygrove ? This is an automated message from t

[GitHub] [arrow] vertexclique commented on pull request #8665: ARROW-10589: [Rust] Implement AVX-512 bit and operation

2020-11-15 Thread GitBox
vertexclique commented on pull request #8665: URL: https://github.com/apache/arrow/pull/8665#issuecomment-727568453 > The main thing we have to be wary of, which I'm fine if we test locally and confirm; is whether anything here breaks the arrow crate building on stable. I haven't tested th

[GitHub] [arrow] jorgecarleitao commented on pull request #8667: ARROW-10590: [Rust] Remove Date32(Millisecond) from casts

2020-11-15 Thread GitBox
jorgecarleitao commented on pull request #8667: URL: https://github.com/apache/arrow/pull/8667#issuecomment-727568073 (I had to look it up to understand why this was crashing: I did not know this either until today xD) This

[GitHub] [arrow] github-actions[bot] commented on pull request #8670: ARROW-10591: [Rust] Improve take (-20 to -40%)

2020-11-15 Thread GitBox
github-actions[bot] commented on pull request #8670: URL: https://github.com/apache/arrow/pull/8670#issuecomment-727566672 https://issues.apache.org/jira/browse/ARROW-10591 This is an automated message from the Apache Git Ser

[GitHub] [arrow] jorgecarleitao closed pull request #8666: ARROW-10269: [Rust] Update to 2020-11-14 nightly

2020-11-15 Thread GitBox
jorgecarleitao closed pull request #8666: URL: https://github.com/apache/arrow/pull/8666 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

[GitHub] [arrow] jorgecarleitao opened a new pull request #8670: ARROW-10591: [Rust] Improve take (-20 to -40%)

2020-11-15 Thread GitBox
jorgecarleitao opened a new pull request #8670: URL: https://github.com/apache/arrow/pull/8670 TL;DR: * Add support to filter `StructArray` * fixes 2 bugs in `take` of `StructArray` and `ListArray` * speedup `take` by 1.2-1.9 ## Motivation Same motivation as #8630

[GitHub] [arrow] nevi-me closed pull request #8668: ARROW-10595: [Rust] Simplify inner loop of min/max kernels for non-null case

2020-11-15 Thread GitBox
nevi-me closed pull request #8668: URL: https://github.com/apache/arrow/pull/8668 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

[GitHub] [arrow] alamb closed pull request #8662: ARROW-9911: [Rust][DataFusion] SELECT with no FROM clause should produce a single row of output

2020-11-15 Thread GitBox
alamb closed pull request #8662: URL: https://github.com/apache/arrow/pull/8662 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] vertexclique commented on pull request #8669: ARROW-10596: [Rust] Improve take benchmark

2020-11-15 Thread GitBox
vertexclique commented on pull request #8669: URL: https://github.com/apache/arrow/pull/8669#issuecomment-727563881 > it is always taking the same element (indices is a constant vector), which is very easy to speculatively predict what the element is going to be. Yes, most benchmarks

[GitHub] [arrow] alamb commented on pull request #8662: ARROW-9911: [Rust][DataFusion] SELECT with no FROM clause should produce a single row of output

2020-11-15 Thread GitBox
alamb commented on pull request #8662: URL: https://github.com/apache/arrow/pull/8662#issuecomment-727563878 Again, something is gummed up with the travis CI jobs https://github.com/apache/arrow/pull/8662/checks?check_run_id=1400052607 -- but they don't test rust, so I am going to merge an

[GitHub] [arrow] nevi-me commented on pull request #8666: ARROW-10269: [Rust] Update to 2020-11-14 nightly

2020-11-15 Thread GitBox
nevi-me commented on pull request #8666: URL: https://github.com/apache/arrow/pull/8666#issuecomment-727563899 I don't see this being a big risk to open PRs (as some already have merge conflicts). We run 36 CI jobs for this, vs the normal 8; so maybe there's an efficiency argument in mergi

[GitHub] [arrow] nevi-me commented on pull request #8665: ARROW-10589: [Rust] Implement AVX-512 bit and operation

2020-11-15 Thread GitBox
nevi-me commented on pull request #8665: URL: https://github.com/apache/arrow/pull/8665#issuecomment-727563248 > Is this being tested on the CI? I did not see any changes to the CI to test with that feature. The main thing we have to be wary of, which I'm fine if we test locally and

[GitHub] [arrow] vertexclique edited a comment on pull request #8665: ARROW-10589: [Rust] Implement AVX-512 bit and operation

2020-11-15 Thread GitBox
vertexclique edited a comment on pull request #8665: URL: https://github.com/apache/arrow/pull/8665#issuecomment-727560444 I will do that in a min. edit: @nevi-me updated. This is an automated message from the Apache Git Serv

[GitHub] [arrow] Dandandan commented on a change in pull request #8654: ARROW-10572: [Rust][DataFusion] Use aHash instead of FnvHashMap

2020-11-15 Thread GitBox
Dandandan commented on a change in pull request #8654: URL: https://github.com/apache/arrow/pull/8654#discussion_r523752932 ## File path: rust/datafusion/src/physical_plan/hash_aggregate.rs ## @@ -882,25 +867,12 @@ mod tests { assert_eq!(batch.num_columns(), 2);

[GitHub] [arrow] alamb commented on a change in pull request #8654: ARROW-10572: [Rust][DataFusion] Use aHash instead of FnvHashMap

2020-11-15 Thread GitBox
alamb commented on a change in pull request #8654: URL: https://github.com/apache/arrow/pull/8654#discussion_r523752313 ## File path: rust/datafusion/src/physical_plan/hash_aggregate.rs ## @@ -882,25 +867,12 @@ mod tests { assert_eq!(batch.num_columns(), 2); a

[GitHub] [arrow] alamb commented on a change in pull request #8654: ARROW-10572: [Rust][DataFusion] Use aHash instead of FnvHashMap

2020-11-15 Thread GitBox
alamb commented on a change in pull request #8654: URL: https://github.com/apache/arrow/pull/8654#discussion_r523752313 ## File path: rust/datafusion/src/physical_plan/hash_aggregate.rs ## @@ -882,25 +867,12 @@ mod tests { assert_eq!(batch.num_columns(), 2); a

[GitHub] [arrow] alamb commented on pull request #8658: ARROW-10577: [Rust][DataFusion] HashAggregator stream finishes unexpectedly after going to Pending state - tests

2020-11-15 Thread GitBox
alamb commented on pull request #8658: URL: https://github.com/apache/arrow/pull/8658#issuecomment-727561962 FYI https://github.com/apache/arrow/pull/8553 has been merged This is an automated message from the Apache Git Servi

[GitHub] [arrow] github-actions[bot] commented on pull request #8669: ARROW-10596: [Rust] Improve take benchmark

2020-11-15 Thread GitBox
github-actions[bot] commented on pull request #8669: URL: https://github.com/apache/arrow/pull/8669#issuecomment-727561941 https://issues.apache.org/jira/browse/ARROW-10596 This is an automated message from the Apache Git Ser

[GitHub] [arrow] jorgecarleitao commented on pull request #8666: ARROW-10269: [Rust] Update to 2020-11-14 nightly

2020-11-15 Thread GitBox
jorgecarleitao commented on pull request #8666: URL: https://github.com/apache/arrow/pull/8666#issuecomment-727561098 Do we wait for some of the PRs to be flushed, or do we merge this? This is an automated message from the Ap

[GitHub] [arrow] vertexclique edited a comment on pull request #8665: ARROW-10589: [Rust] Implement AVX-512 bit and operation

2020-11-15 Thread GitBox
vertexclique edited a comment on pull request #8665: URL: https://github.com/apache/arrow/pull/8665#issuecomment-727560444 I will do that in a min. This is an automated message from the Apache Git Service. To respond to the m

[GitHub] [arrow] vertexclique commented on pull request #8665: ARROW-10589: [Rust] Implement AVX-512 bit and operation

2020-11-15 Thread GitBox
vertexclique commented on pull request #8665: URL: https://github.com/apache/arrow/pull/8665#issuecomment-727560444 I will do that. This is an automated message from the Apache Git Service. To respond to the message, please l

[GitHub] [arrow] jorgecarleitao opened a new pull request #8669: ARROW-10596: [Rust] Improve take benchmark

2020-11-15 Thread GitBox
jorgecarleitao opened a new pull request #8669: URL: https://github.com/apache/arrow/pull/8669 This fixes 3 issues on the take benchmark: 1. it is always taking the same element (indices is a constant vector), which is very easy to speculatively predict what the element is going to b

[GitHub] [arrow] vertexclique edited a comment on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

2020-11-15 Thread GitBox
vertexclique edited a comment on pull request #8664: URL: https://github.com/apache/arrow/pull/8664#issuecomment-727560018 This is using safe bitvec interface to manage bits. I worked on top of my existing PR. Since that didn't change all code to use safe operations, now this does. I squas

[GitHub] [arrow] vertexclique commented on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

2020-11-15 Thread GitBox
vertexclique commented on pull request #8664: URL: https://github.com/apache/arrow/pull/8664#issuecomment-727560018 This is using safe bitvec interface to manage bits. I worked on top of my existing PR. Since that didn't change all code to use safe operations, now this does. I squashed all

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8654: ARROW-10572: [Rust][DataFusion] Use aHash instead of FnvHashMap

2020-11-15 Thread GitBox
jorgecarleitao commented on a change in pull request #8654: URL: https://github.com/apache/arrow/pull/8654#discussion_r523749603 ## File path: rust/datafusion/src/physical_plan/hash_aggregate.rs ## @@ -882,25 +867,12 @@ mod tests { assert_eq!(batch.num_columns(), 2);

[GitHub] [arrow] github-actions[bot] commented on pull request #8668: ARROW-10595: [Rust] Simplify inner loop of min/max kernels for non-null case

2020-11-15 Thread GitBox
github-actions[bot] commented on pull request #8668: URL: https://github.com/apache/arrow/pull/8668#issuecomment-727555281 https://issues.apache.org/jira/browse/ARROW-10595 This is an automated message from the Apache Git Ser

[GitHub] [arrow] Dandandan opened a new pull request #8668: ARROW-10595: [Rust] Simplify inner loop of min/max kernels for non-null case

2020-11-15 Thread GitBox
Dandandan opened a new pull request #8668: URL: https://github.com/apache/arrow/pull/8668 Keeping track of `has_value` is not needed when there are no nulls in the array, we can just pick the first item. This is an automated

[GitHub] [arrow] codecov-io commented on pull request #8666: ARROW-10269: [Rust] Update to 2020-11-14 nightly

2020-11-15 Thread GitBox
codecov-io commented on pull request #8666: URL: https://github.com/apache/arrow/pull/8666#issuecomment-727552812 # [Codecov](https://codecov.io/gh/apache/arrow/pull/8666?src=pr&el=h1) Report > Merging [#8666](https://codecov.io/gh/apache/arrow/pull/8666?src=pr&el=desc) (c094456) into

[GitHub] [arrow] nevi-me commented on pull request #8666: ARROW-10269: [Rust] Update to 2020-11-14 nightly

2020-11-15 Thread GitBox
nevi-me commented on pull request #8666: URL: https://github.com/apache/arrow/pull/8666#issuecomment-727552341 > Segfault on the coverage, outch :/ Found and fixed the problem. We were using an older cached version of `cargo-tarpaulin`, but the latest version has a bug; so I've force

[GitHub] [arrow] nevi-me commented on a change in pull request #8634: ARROW-10535: [Rust] Implement bit & byte slice iterator for big endian platforms

2020-11-15 Thread GitBox
nevi-me commented on a change in pull request #8634: URL: https://github.com/apache/arrow/pull/8634#discussion_r523743095 ## File path: rust/arrow/src/util/bit_slice_iterator.rs ## @@ -0,0 +1,108 @@ +#[cfg(all(test, target_endian = "big"))] Review comment: File needs li

[GitHub] [arrow] nevi-me commented on pull request #8664: ARROW-10588: [Rust] Safe bit operations for Arrow

2020-11-15 Thread GitBox
nevi-me commented on pull request #8664: URL: https://github.com/apache/arrow/pull/8664#issuecomment-727551719 @vertexclique did you work on this on top of #8663 from @jhorstmann (or at least both PRs remove the popcnt table)? The changes here look reasonable to me, so I can review after

[GitHub] [arrow] nevi-me closed pull request #8663: ARROW-10079: [Rust] Benchmark and improve count bits

2020-11-15 Thread GitBox
nevi-me closed pull request #8663: URL: https://github.com/apache/arrow/pull/8663 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

[GitHub] [arrow] nevi-me closed pull request #8667: ARROW-10590: [Rust] Remove Date32(Millisecond) from casts

2020-11-15 Thread GitBox
nevi-me closed pull request #8667: URL: https://github.com/apache/arrow/pull/8667 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

[GitHub] [arrow] nevi-me commented on pull request #8667: ARROW-10590: [Rust] Remove Date32(Millisecond) from casts

2020-11-15 Thread GitBox
nevi-me commented on pull request #8667: URL: https://github.com/apache/arrow/pull/8667#issuecomment-727537758 > There is a broader problem to be solved, on which we should remove the possibility to create these variants, but I left it as a separate exercise (as it is a bit more disruptive

[GitHub] [arrow] nevi-me commented on pull request #8666: ARROW-10269: [Rust] Update to 2020-11-14 nightly

2020-11-15 Thread GitBox
nevi-me commented on pull request #8666: URL: https://github.com/apache/arrow/pull/8666#issuecomment-727536464 > Segfault on the coverage, outch :/ I want to look at whether we can use the new llvm-based coverage instead of tarpaulin

[GitHub] [arrow] github-actions[bot] commented on pull request #8667: ARROW-10590: [Rust] Remove Date32(Millisecond) from casts

2020-11-15 Thread GitBox
github-actions[bot] commented on pull request #8667: URL: https://github.com/apache/arrow/pull/8667#issuecomment-727536050 https://issues.apache.org/jira/browse/ARROW-10590 This is an automated message from the Apache Git Ser

[GitHub] [arrow] jorgecarleitao commented on pull request #8666: ARROW-10269: [Rust] Update to 2020-11-14 nightly

2020-11-15 Thread GitBox
jorgecarleitao commented on pull request #8666: URL: https://github.com/apache/arrow/pull/8666#issuecomment-727535047 Segfault on the coverage, outch :/ This is an automated message from the Apache Git Service. To respond to

[GitHub] [arrow] jorgecarleitao opened a new pull request #8667: ARROW-10590: [Rust] Remove Date32(Millisecond) from casts

2020-11-15 Thread GitBox
jorgecarleitao opened a new pull request #8667: URL: https://github.com/apache/arrow/pull/8667 In the arrow spec, Date32 and Date64 are always in days and milliseconds respectively. This removes them from a test about casting. There is a broader problem to be solved, on which we shou

[GitHub] [arrow] github-actions[bot] commented on pull request #8666: ARROW-10269: [Rust] Update to 2020-11-14 nightly

2020-11-15 Thread GitBox
github-actions[bot] commented on pull request #8666: URL: https://github.com/apache/arrow/pull/8666#issuecomment-727531762 https://issues.apache.org/jira/browse/ARROW-10269 This is an automated message from the Apache Git Ser