jorgecarleitao commented on pull request #8670:
URL: https://github.com/apache/arrow/pull/8670#issuecomment-727724202
Performance: with the fix on the benchmarks for `take`, this is 2x slower
than master.
This is an automate
cyb70289 commented on pull request #8671:
URL: https://github.com/apache/arrow/pull/8671#issuecomment-727702783
Tested on `Xeon(R) Gold 5218 CPU @ 2.30GHz` (skylake).
gcc-7.5 has big improvement.
```
benchmarkbaseline
contend
cyb70289 commented on a change in pull request #8671:
URL: https://github.com/apache/arrow/pull/8671#discussion_r523856680
##
File path: cpp/src/arrow/util/bitmap_generate.h
##
@@ -83,17 +83,19 @@ void GenerateBitsUnrolled(uint8_t* bitmap, int64_t
start_offset, int64_t length,
drusso commented on pull request #8670:
URL: https://github.com/apache/arrow/pull/8670#issuecomment-727659428
This looks like a neat take on take (pardon the pun!). Is there anything in
particular preventing adoption of this approach?
-
wesm commented on issue #8607:
URL: https://github.com/apache/arrow/issues/8607#issuecomment-727649809
Can you please open some Jira issues if there's something to fix or improve
in pyarrow?
This is an automated message fro
wesm closed issue #8607:
URL: https://github.com/apache/arrow/issues/8607
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specif
wesm closed issue #8646:
URL: https://github.com/apache/arrow/issues/8646
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specif
wesm commented on a change in pull request #8660:
URL: https://github.com/apache/arrow/pull/8660#discussion_r523824755
##
File path: rust/datafusion/src/physical_plan/mod.rs
##
@@ -100,6 +100,30 @@ pub enum Distribution {
SinglePartition,
}
+/// Represents the result fr
yordan-pavlov commented on a change in pull request #8660:
URL: https://github.com/apache/arrow/pull/8660#discussion_r523823571
##
File path: rust/datafusion/src/physical_plan/mod.rs
##
@@ -100,6 +100,30 @@ pub enum Distribution {
SinglePartition,
}
+/// Represents the
yordan-pavlov commented on pull request #8660:
URL: https://github.com/apache/arrow/pull/8660#issuecomment-727644829
> I tried to run this code against our simple TPCH Q1 implementation (which
is dominated by evaluating expressions against constants), and sadly it hit an
error
@alam
jorgecarleitao commented on pull request #8630:
URL: https://github.com/apache/arrow/pull/8630#issuecomment-727639074
FYI I was unable to make this sufficiently efficient for the `take`: it has
a large performance hit when compared to the current approach. As such, I am
not sure if this ca
jorgecarleitao commented on pull request #8670:
URL: https://github.com/apache/arrow/pull/8670#issuecomment-727638541
I am still not comfortable with this, so I am closing it for now.
This is an automated message from the Apa
jorgecarleitao closed pull request #8670:
URL: https://github.com/apache/arrow/pull/8670
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to g
rdettai commented on pull request #8658:
URL: https://github.com/apache/arrow/pull/8658#issuecomment-727635074
@alamb except for Travis that is still stuck, tests are passing after the
rebase! If you are okay with the way I structured the test fixture, I think
this can be merged!
---
Kopilov edited a comment on pull request #8605:
URL: https://github.com/apache/arrow/pull/8605#issuecomment-727626118
@liyafan82 seems working correctly in all cases if we add `writer.startList`
and `writer.endList` together with `vector.setNull`.
I have refactored `writeListVector` func
Kopilov commented on pull request #8605:
URL: https://github.com/apache/arrow/pull/8605#issuecomment-727626118
@liyafan82 seems working correctly in all cases if we add `writer.startList`
and `writer.endList` together with `vector.setNull`.
I have refactored `writeListVector` function in
github-actions[bot] commented on pull request #8672:
URL: https://github.com/apache/arrow/pull/8672#issuecomment-727623661
https://issues.apache.org/jira/browse/ARROW-10574
This is an automated message from the Apache Git Ser
wyzhao opened a new pull request #8672:
URL: https://github.com/apache/arrow/pull/8672
I would like to enhance partition filters in methods such as:
pyarrow.parquet.ParquetDataset(path, filters)
I am proposing the below enhancements:
1. for operator "in", "not in", the v
jorgecarleitao closed pull request #8669:
URL: https://github.com/apache/arrow/pull/8669
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to g
github-actions[bot] commented on pull request #8640:
URL: https://github.com/apache/arrow/pull/8640#issuecomment-727602423
https://issues.apache.org/jira/browse/ARROW-4193
This is an automated message from the Apache Git Serv
sweb commented on pull request #8640:
URL: https://github.com/apache/arrow/pull/8640#issuecomment-727602232
@jorgecarleitao @nevi-me may I ask you for a review? From a functionality
point of view I think this is more or less done, however I am very much at the
beginning of writing stuff in
wesm commented on a change in pull request #8671:
URL: https://github.com/apache/arrow/pull/8671#discussion_r523781685
##
File path: cpp/src/arrow/util/bitmap_generate.h
##
@@ -83,17 +83,19 @@ void GenerateBitsUnrolled(uint8_t* bitmap, int64_t
start_offset, int64_t length,
github-actions[bot] commented on pull request #8671:
URL: https://github.com/apache/arrow/pull/8671#issuecomment-727597667
https://issues.apache.org/jira/browse/ARROW-10598
This is an automated message from the Apache Git Ser
wesm commented on pull request #8671:
URL: https://github.com/apache/arrow/pull/8671#issuecomment-727597367
@cyb70289 @pitrou @jianxind since you've all been doing some kernel
performance grinding you might be interested in helping investigate this
further to see if there's more performanc
wesm opened a new pull request #8671:
URL: https://github.com/apache/arrow/pull/8671
This issue was pointed out to me by some folks at UWisc and just getting
around to trying out a fix. This produces a substantial performance benefit in
numerical comparisons with gcc8
```
nevi-me commented on pull request #8666:
URL: https://github.com/apache/arrow/pull/8666#issuecomment-727588237
@alamb thanks. A quick solution for many of the lints, is to run clippy-fix
with -Z unstable-options, so that one doesn't manually fix them.
jorgecarleitao commented on pull request #8667:
URL: https://github.com/apache/arrow/pull/8667#issuecomment-727587052
ahaha, I only now realized that it was written `Int16`: I read `Float16` xD
This is an automated message fr
nevi-me commented on pull request #8667:
URL: https://github.com/apache/arrow/pull/8667#issuecomment-727586356
I meant Float16 there, oops
This is an automated message from the Apache Git Service.
To respond to the message, p
jorgecarleitao commented on pull request #8665:
URL: https://github.com/apache/arrow/pull/8665#issuecomment-727584271
Running the CI on separate feature fits exactly what I was thinking about.
IMO we could then benefit from some clarity over how we merge PRs from here
on: do we wait
andygrove commented on pull request #8665:
URL: https://github.com/apache/arrow/pull/8665#issuecomment-727583378
I can see both sides of the argument here but I would be supportive of
merging this PR as long as we have a JIRA filed to follow up on the CI support
(which I agree is really im
nevi-me commented on pull request #8665:
URL: https://github.com/apache/arrow/pull/8665#issuecomment-727578474
Reran the tests, failure now confirmed:
https://github.com/nevi-me/arrow/runs/1402675504#step:8:2084
The C++ implementation has avx512 support, so maybe @kszucs or someone e
jorgecarleitao commented on a change in pull request #8401:
URL: https://github.com/apache/arrow/pull/8401#discussion_r523763928
##
File path: rust/arrow/src/ffi.rs
##
@@ -0,0 +1,657 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor licen
vertexclique commented on pull request #8665:
URL: https://github.com/apache/arrow/pull/8665#issuecomment-727575454
> Okay, great. For tracking purposes, once we find a resolution to the CI
question; we should open an umbrella JIRA to implement avx512 for other simd
equivalents. I'm assumi
jorgecarleitao commented on a change in pull request #8401:
URL: https://github.com/apache/arrow/pull/8401#discussion_r523763431
##
File path: rust/arrow/src/bytes.rs
##
@@ -0,0 +1,166 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor lic
nevi-me commented on pull request #8665:
URL: https://github.com/apache/arrow/pull/8665#issuecomment-727574928
> So the person who enables `avx512` should use avx512 feature or fallback
to autovec. Not need to bring packed_simd or something else to do fallback.
Okay, great. For track
jorgecarleitao commented on a change in pull request #8401:
URL: https://github.com/apache/arrow/pull/8401#discussion_r523763314
##
File path: rust/arrow/src/buffer.rs
##
@@ -151,70 +80,52 @@ impl Buffer {
///
/// * `ptr` - Pointer to raw parts
/// * `len` - Leng
vertexclique commented on pull request #8665:
URL: https://github.com/apache/arrow/pull/8665#issuecomment-727574114
> @vertexclique from my reviewing the code, it looked like simd and avx512
were able to co-exist, but I'm getting some errors on the macOS CI
(https://github.com/nevi-me/arro
nevi-me commented on pull request #8665:
URL: https://github.com/apache/arrow/pull/8665#issuecomment-727573373
@vertexclique from my reviewing the code, it looked like `simd` and `avx512`
were able to co-exist, but I'm getting some errors on the macOS CI
(https://github.com/nevi-me/arrow/r
nevi-me commented on pull request #8665:
URL: https://github.com/apache/arrow/pull/8665#issuecomment-727572262
I'm running a CI job at https://github.com/nevi-me/arrow/runs/1402635214,
which includes `"simd avx512"` features.
We do test "simd" in one of the CI tasks. Perhaps I didn't
jorgecarleitao commented on pull request #8665:
URL: https://github.com/apache/arrow/pull/8665#issuecomment-727571958
@vertexclique
[here](https://github.com/apache/arrow/blob/master/ci/scripts/rust_test.sh#L34).
The timings are important: e.g. we decided to not place coverage on every PR
vertexclique commented on pull request #8665:
URL: https://github.com/apache/arrow/pull/8665#issuecomment-727571404
feature `simd` is not tested on this CI too. Am I missing something?
This is an automated message from the Ap
alamb closed pull request #8654:
URL: https://github.com/apache/arrow/pull/8654
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the
alamb commented on pull request #8630:
URL: https://github.com/apache/arrow/pull/8630#issuecomment-727569496
I will try and review this carefully tomorrow
This is an automated message from the Apache Git Service.
To respond t
alamb commented on pull request #8654:
URL: https://github.com/apache/arrow/pull/8654#issuecomment-727569404
I think this is good to go -- what do you think @jorgecarleitao /
@andygrove ?
This is an automated message from t
vertexclique commented on pull request #8665:
URL: https://github.com/apache/arrow/pull/8665#issuecomment-727568453
> The main thing we have to be wary of, which I'm fine if we test locally
and confirm; is whether anything here breaks the arrow crate building on
stable. I haven't tested th
jorgecarleitao commented on pull request #8667:
URL: https://github.com/apache/arrow/pull/8667#issuecomment-727568073
(I had to look it up to understand why this was crashing: I did not know
this either until today xD)
This
github-actions[bot] commented on pull request #8670:
URL: https://github.com/apache/arrow/pull/8670#issuecomment-727566672
https://issues.apache.org/jira/browse/ARROW-10591
This is an automated message from the Apache Git Ser
jorgecarleitao closed pull request #8666:
URL: https://github.com/apache/arrow/pull/8666
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to g
jorgecarleitao opened a new pull request #8670:
URL: https://github.com/apache/arrow/pull/8670
TL;DR:
* Add support to filter `StructArray`
* fixes 2 bugs in `take` of `StructArray` and `ListArray`
* speedup `take` by 1.2-1.9
## Motivation
Same motivation as #8630
nevi-me closed pull request #8668:
URL: https://github.com/apache/arrow/pull/8668
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to th
alamb closed pull request #8662:
URL: https://github.com/apache/arrow/pull/8662
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the
vertexclique commented on pull request #8669:
URL: https://github.com/apache/arrow/pull/8669#issuecomment-727563881
> it is always taking the same element (indices is a constant vector), which
is very easy to speculatively predict what the element is going to be.
Yes, most benchmarks
alamb commented on pull request #8662:
URL: https://github.com/apache/arrow/pull/8662#issuecomment-727563878
Again, something is gummed up with the travis CI jobs
https://github.com/apache/arrow/pull/8662/checks?check_run_id=1400052607 -- but
they don't test rust, so I am going to merge an
nevi-me commented on pull request #8666:
URL: https://github.com/apache/arrow/pull/8666#issuecomment-727563899
I don't see this being a big risk to open PRs (as some already have merge
conflicts). We run 36 CI jobs for this, vs the normal 8; so maybe there's an
efficiency argument in mergi
nevi-me commented on pull request #8665:
URL: https://github.com/apache/arrow/pull/8665#issuecomment-727563248
> Is this being tested on the CI? I did not see any changes to the CI to
test with that feature.
The main thing we have to be wary of, which I'm fine if we test locally and
vertexclique edited a comment on pull request #8665:
URL: https://github.com/apache/arrow/pull/8665#issuecomment-727560444
I will do that in a min. edit: @nevi-me updated.
This is an automated message from the Apache Git Serv
Dandandan commented on a change in pull request #8654:
URL: https://github.com/apache/arrow/pull/8654#discussion_r523752932
##
File path: rust/datafusion/src/physical_plan/hash_aggregate.rs
##
@@ -882,25 +867,12 @@ mod tests {
assert_eq!(batch.num_columns(), 2);
alamb commented on a change in pull request #8654:
URL: https://github.com/apache/arrow/pull/8654#discussion_r523752313
##
File path: rust/datafusion/src/physical_plan/hash_aggregate.rs
##
@@ -882,25 +867,12 @@ mod tests {
assert_eq!(batch.num_columns(), 2);
a
alamb commented on a change in pull request #8654:
URL: https://github.com/apache/arrow/pull/8654#discussion_r523752313
##
File path: rust/datafusion/src/physical_plan/hash_aggregate.rs
##
@@ -882,25 +867,12 @@ mod tests {
assert_eq!(batch.num_columns(), 2);
a
alamb commented on pull request #8658:
URL: https://github.com/apache/arrow/pull/8658#issuecomment-727561962
FYI https://github.com/apache/arrow/pull/8553 has been merged
This is an automated message from the Apache Git Servi
github-actions[bot] commented on pull request #8669:
URL: https://github.com/apache/arrow/pull/8669#issuecomment-727561941
https://issues.apache.org/jira/browse/ARROW-10596
This is an automated message from the Apache Git Ser
jorgecarleitao commented on pull request #8666:
URL: https://github.com/apache/arrow/pull/8666#issuecomment-727561098
Do we wait for some of the PRs to be flushed, or do we merge this?
This is an automated message from the Ap
vertexclique edited a comment on pull request #8665:
URL: https://github.com/apache/arrow/pull/8665#issuecomment-727560444
I will do that in a min.
This is an automated message from the Apache Git Service.
To respond to the m
vertexclique commented on pull request #8665:
URL: https://github.com/apache/arrow/pull/8665#issuecomment-727560444
I will do that.
This is an automated message from the Apache Git Service.
To respond to the message, please l
jorgecarleitao opened a new pull request #8669:
URL: https://github.com/apache/arrow/pull/8669
This fixes 3 issues on the take benchmark:
1. it is always taking the same element (indices is a constant vector),
which is very easy to speculatively predict what the element is going to b
vertexclique edited a comment on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-727560018
This is using safe bitvec interface to manage bits. I worked on top of my
existing PR. Since that didn't change all code to use safe operations, now this
does. I squas
vertexclique commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-727560018
This is using safe bitvec interface to manage bits. I worked on top of my
existing PR. Since that didn't change all code to use safe operations, now this
does. I squashed all
jorgecarleitao commented on a change in pull request #8654:
URL: https://github.com/apache/arrow/pull/8654#discussion_r523749603
##
File path: rust/datafusion/src/physical_plan/hash_aggregate.rs
##
@@ -882,25 +867,12 @@ mod tests {
assert_eq!(batch.num_columns(), 2);
github-actions[bot] commented on pull request #8668:
URL: https://github.com/apache/arrow/pull/8668#issuecomment-727555281
https://issues.apache.org/jira/browse/ARROW-10595
This is an automated message from the Apache Git Ser
Dandandan opened a new pull request #8668:
URL: https://github.com/apache/arrow/pull/8668
Keeping track of `has_value` is not needed when there are no nulls in the
array, we can just pick the first item.
This is an automated
codecov-io commented on pull request #8666:
URL: https://github.com/apache/arrow/pull/8666#issuecomment-727552812
# [Codecov](https://codecov.io/gh/apache/arrow/pull/8666?src=pr&el=h1) Report
> Merging
[#8666](https://codecov.io/gh/apache/arrow/pull/8666?src=pr&el=desc) (c094456)
into
nevi-me commented on pull request #8666:
URL: https://github.com/apache/arrow/pull/8666#issuecomment-727552341
> Segfault on the coverage, outch :/
Found and fixed the problem. We were using an older cached version of
`cargo-tarpaulin`, but the latest version has a bug; so I've force
nevi-me commented on a change in pull request #8634:
URL: https://github.com/apache/arrow/pull/8634#discussion_r523743095
##
File path: rust/arrow/src/util/bit_slice_iterator.rs
##
@@ -0,0 +1,108 @@
+#[cfg(all(test, target_endian = "big"))]
Review comment:
File needs li
nevi-me commented on pull request #8664:
URL: https://github.com/apache/arrow/pull/8664#issuecomment-727551719
@vertexclique did you work on this on top of #8663 from @jhorstmann (or at
least both PRs remove the popcnt table)?
The changes here look reasonable to me, so I can review after
nevi-me closed pull request #8663:
URL: https://github.com/apache/arrow/pull/8663
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to th
nevi-me closed pull request #8667:
URL: https://github.com/apache/arrow/pull/8667
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to th
nevi-me commented on pull request #8667:
URL: https://github.com/apache/arrow/pull/8667#issuecomment-727537758
> There is a broader problem to be solved, on which we should remove the
possibility to create these variants, but I left it as a separate exercise (as
it is a bit more disruptive
nevi-me commented on pull request #8666:
URL: https://github.com/apache/arrow/pull/8666#issuecomment-727536464
> Segfault on the coverage, outch :/
I want to look at whether we can use the new llvm-based coverage instead of
tarpaulin
github-actions[bot] commented on pull request #8667:
URL: https://github.com/apache/arrow/pull/8667#issuecomment-727536050
https://issues.apache.org/jira/browse/ARROW-10590
This is an automated message from the Apache Git Ser
jorgecarleitao commented on pull request #8666:
URL: https://github.com/apache/arrow/pull/8666#issuecomment-727535047
Segfault on the coverage, outch :/
This is an automated message from the Apache Git Service.
To respond to
jorgecarleitao opened a new pull request #8667:
URL: https://github.com/apache/arrow/pull/8667
In the arrow spec, Date32 and Date64 are always in days and milliseconds
respectively. This removes them from a test about casting.
There is a broader problem to be solved, on which we shou
github-actions[bot] commented on pull request #8666:
URL: https://github.com/apache/arrow/pull/8666#issuecomment-727531762
https://issues.apache.org/jira/browse/ARROW-10269
This is an automated message from the Apache Git Ser
82 matches
Mail list logo