[GitHub] [arrow] ursabot edited a comment on pull request #12150: ARROW-15332: [C++] Add new cases and fix issues in IPC read/write benchmark

2022-01-15 Thread GitBox
ursabot edited a comment on pull request #12150: URL: https://github.com/apache/arrow/pull/12150#issuecomment-1013447936 Benchmark runs are scheduled for baseline = 093fdad19dc2c0dfa3d2ed999fd918826d918e96 and contender = f585a470539d61cbc237b66a1851149d28adc176. f585a470539d61cbc237b66a1

[GitHub] [arrow-datafusion] selvavm commented on issue #1536: Not able to get the table from register_listing_table

2022-01-15 Thread GitBox
selvavm commented on issue #1536: URL: https://github.com/apache/arrow-datafusion/issues/1536#issuecomment-1013641042 Hi @houqp. Sorry for the delay in responding. I have created a minimal example that reproduces the problem. It is [here](https://github.com/apache/arrow-datafusion/files/7

[GitHub] [arrow-datafusion] hntd187 commented on issue #1544: Streaming support for DataFusion

2022-01-15 Thread GitBox
hntd187 commented on issue #1544: URL: https://github.com/apache/arrow-datafusion/issues/1544#issuecomment-1013641369 > > I don't know if outside of a custom DataFrame impl this would be possible in a contrib module. > > If this is not clear from the get go, you can also complete th

[GitHub] [arrow-datafusion] xudong963 opened a new pull request #1566: fix: sql planner creates cross join instead of inner join from select predicates

2022-01-15 Thread GitBox
xudong963 opened a new pull request #1566: URL: https://github.com/apache/arrow-datafusion/pull/1566 # Which issue does this PR close? Closes #1293 # Rationale for this change # What changes are included in this PR? # Are there any user-facing ch

[GitHub] [arrow-datafusion] xudong963 commented on pull request #1314: fix: sql planner creates cross join instead of inner join from select predicates

2022-01-15 Thread GitBox
xudong963 commented on pull request #1314: URL: https://github.com/apache/arrow-datafusion/pull/1314#issuecomment-1013642190 closed, new pr: #1566 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

[GitHub] [arrow-datafusion] xudong963 closed pull request #1314: fix: sql planner creates cross join instead of inner join from select predicates

2022-01-15 Thread GitBox
xudong963 closed pull request #1314: URL: https://github.com/apache/arrow-datafusion/pull/1314

[GitHub] [arrow-datafusion] xudong963 commented on pull request #1566: fix: sql planner creates cross join instead of inner join from select predicates

2022-01-15 Thread GitBox
xudong963 commented on pull request #1566: URL: https://github.com/apache/arrow-datafusion/pull/1566#issuecomment-1013642299 ``` ❯ create table part as select 1 as p_partkey;

[GitHub] [arrow-datafusion] xudong963 commented on pull request #1566: fix: sql planner creates cross join instead of inner join from select predicates

2022-01-15 Thread GitBox
xudong963 commented on pull request #1566: URL: https://github.com/apache/arrow-datafusion/pull/1566#issuecomment-1013642370 cc @houqp

[GitHub] [arrow] ursabot edited a comment on pull request #11938: ARROW-15077: [Python] Move Expression class from _dataset to _compute cython module

2022-01-15 Thread GitBox
ursabot edited a comment on pull request #11938: URL: https://github.com/apache/arrow/pull/11938#issuecomment-1013111004 Benchmark runs are scheduled for baseline = 5632423fea91f5c2c69709c56eb64696bd9301ef and contender = 093fdad19dc2c0dfa3d2ed999fd918826d918e96. 093fdad19dc2c0dfa3d2ed999

[GitHub] [arrow] ursabot edited a comment on pull request #11616: ARROW-14577: [C++] Enable fine grained IO for async IPC reader

2022-01-15 Thread GitBox
ursabot edited a comment on pull request #11616: URL: https://github.com/apache/arrow/pull/11616#issuecomment-1013557306 Benchmark runs are scheduled for baseline = f585a470539d61cbc237b66a1851149d28adc176 and contender = 7029f90ea3b39e97f1a671227ca932cbcdbcee05. 7029f90ea3b39e97f1a671227

[GitHub] [arrow-datafusion] xudong963 edited a comment on pull request #1566: fix: sql planner creates cross join instead of inner join from select predicates

2022-01-15 Thread GitBox
xudong963 edited a comment on pull request #1566: URL: https://github.com/apache/arrow-datafusion/pull/1566#issuecomment-1013642370 cc @houqp @alamb

[GitHub] [arrow-datafusion] jon-chuang edited a comment on issue #1221: Task assignment between Scheduler and Executors

2022-01-15 Thread GitBox
jon-chuang edited a comment on issue #1221: URL: https://github.com/apache/arrow-datafusion/issues/1221#issuecomment-1013068782 @yjshen thanks for your questions. > task scheduling, keepalive monitoring, straggler detection, and speculative task execution - yes. - yes and

[GitHub] [arrow-rs] alamb commented on a change in pull request #1173: Update test output for Rust 1.58 release

2022-01-15 Thread GitBox
alamb commented on a change in pull request #1173: URL: https://github.com/apache/arrow-rs/pull/1173#discussion_r785299353 ## File path: parquet/src/record/api.rs ## @@ -1081,9 +1081,9 @@ mod tests { fn test_convert_float_to_string() { assert_eq!(format!("{}", Fie

[GitHub] [arrow-rs] alamb closed issue #1177: Parquet Record Tests Fail on Rust 1.58

2022-01-15 Thread GitBox
alamb closed issue #1177: URL: https://github.com/apache/arrow-rs/issues/1177

[GitHub] [arrow-rs] alamb merged pull request #1178: Fix record formatting in 1.58

2022-01-15 Thread GitBox
alamb merged pull request #1178: URL: https://github.com/apache/arrow-rs/pull/1178

[GitHub] [arrow-rs] alamb commented on pull request #1178: Fix record formatting in 1.58

2022-01-15 Thread GitBox
alamb commented on pull request #1178: URL: https://github.com/apache/arrow-rs/pull/1178#issuecomment-1013666588 Thanks @tustvold

[GitHub] [arrow-rs] alamb closed pull request #1173: Update test output for Rust 1.58 release

2022-01-15 Thread GitBox
alamb closed pull request #1173: URL: https://github.com/apache/arrow-rs/pull/1173

[GitHub] [arrow-rs] alamb opened a new pull request #1181: Add ticket reference for false positive in clippy

2022-01-15 Thread GitBox
alamb opened a new pull request #1181: URL: https://github.com/apache/arrow-rs/pull/1181 1. Add a reference to https://github.com/rust-lang/rust-clippy/issues/8045 explaining why that lint is disabled. Found by @jhorstmann

[GitHub] [arrow-rs] alamb commented on issue #1176: Discussion: relationship / unification of arrow-rs and arrow2 going forward

2022-01-15 Thread GitBox
alamb commented on issue #1176: URL: https://github.com/apache/arrow-rs/issues/1176#issuecomment-1013667451 For the record, I would be willing to help maintain `arrow2` if it were donated into the ASF doing things like coordinating release process / voting, bug fixing, answering questions,

[GitHub] [arrow-datafusion] xudong963 commented on issue #162: TPC-H Query 8

2022-01-15 Thread GitBox
xudong963 commented on issue #162: URL: https://github.com/apache/arrow-datafusion/issues/162#issuecomment-1013668231 When I ran with the latest code, I still got the error ``` Running benchmarks with the following options: DataFusionBenchmarkOpt { query: 8, debug: false, iterations

[GitHub] [arrow-datafusion] xudong963 edited a comment on issue #162: TPC-H Query 8

2022-01-15 Thread GitBox
xudong963 edited a comment on issue #162: URL: https://github.com/apache/arrow-datafusion/issues/162#issuecomment-1013668231 When I ran with the latest code, I still got the error ``` Running benchmarks with the following options: DataFusionBenchmarkOpt { query: 8, debug: false, ite

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1181: Add ticket reference for false positive in clippy

2022-01-15 Thread GitBox
codecov-commenter commented on pull request #1181: URL: https://github.com/apache/arrow-rs/pull/1181#issuecomment-1013668326 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1181?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=T

[GitHub] [arrow-rs] jhorstmann opened a new issue #1182: Evaluate performance of simd on simple arithmetic

2022-01-15 Thread GitBox
jhorstmann opened a new issue #1182: URL: https://github.com/apache/arrow-rs/issues/1182 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** For simple arithmetic kernels (+,-,*), the compiler should be able to automatically vect
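The issue's premise is that a plain loop over slices should be enough for the compiler to vectorize simple arithmetic. A minimal sketch of such a kernel, assuming two equal-length `i32` slices and ignoring null bitmaps; `add_kernel` is a hypothetical name, not an arrow-rs function, and whether LLVM actually emits SIMD instructions for it depends on the target flags used at build time.

```rust
/// Hypothetical element-wise addition kernel written without explicit SIMD.
/// It relies on the compiler's auto-vectorization; nothing here forces it.
/// Null handling (validity bitmaps) is intentionally left out of this sketch.
fn add_kernel(left: &[i32], right: &[i32]) -> Vec<i32> {
    assert_eq!(left.len(), right.len(), "inputs must have equal length");
    left.iter()
        .zip(right.iter())
        // wrapping_add keeps the loop branch-free and avoids overflow panics
        // in debug builds (overflow wraps around instead).
        .map(|(&a, &b)| a.wrapping_add(b))
        .collect()
}
```

The iterator form has no bounds checks inside the loop body, which is one reason this shape is commonly cited as auto-vectorization friendly.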

[GitHub] [arrow-datafusion] xudong963 opened a new pull request #1567: minor: improve the benchmark readme

2022-01-15 Thread GitBox
xudong963 opened a new pull request #1567: URL: https://github.com/apache/arrow-datafusion/pull/1567 # Which issue does this PR close? As the title. # Rationale for this change # What changes are included in this PR? # Are there any user-facing ch

[GitHub] [arrow-rs] jhorstmann commented on issue #1182: Evaluate performance of simd on simple arithmetic

2022-01-15 Thread GitBox
jhorstmann commented on issue #1182: URL: https://github.com/apache/arrow-rs/issues/1182#issuecomment-1013669825 Benchmarks with array size of 64k, run on an AMD Ryzen 3700U laptop. Compiled with `$ RUSTFLAGS="-C target-cpu=skylake"` (The Skylake code generator in llvm seems to have re

[GitHub] [arrow-datafusion] alamb commented on issue #587: Optionally Limit memory used by DataFusion plan

2022-01-15 Thread GitBox
alamb commented on issue #587: URL: https://github.com/apache/arrow-datafusion/issues/587#issuecomment-1013670632 I have started adding a "Progress Tracking" list to the description of this ticket. Please update it with additional items as you discover them.

[GitHub] [arrow-datafusion] alamb commented on issue #1568: Memory Limited Sort (Externalized / Spill)

2022-01-15 Thread GitBox
alamb commented on issue #1568: URL: https://github.com/apache/arrow-datafusion/issues/1568#issuecomment-1013671903 cc @yjshen here are some thoughts on the next steps for sorting -- what do you think? Are you planning to do any/all of this? Is there something in particular I could help t

[GitHub] [arrow-datafusion] xudong963 commented on pull request #1566: fix: sql planner creates cross join instead of inner join from select predicates

2022-01-15 Thread GitBox
xudong963 commented on pull request #1566: URL: https://github.com/apache/arrow-datafusion/pull/1566#issuecomment-1013673068 Currently, query 8 does not seem to pass due to `case ... when ... then ... else`, so I deleted it to run the benchmark ```sql select o_year, from (

[GitHub] [arrow-datafusion] xudong963 edited a comment on pull request #1566: fix: sql planner creates cross join instead of inner join from select predicates

2022-01-15 Thread GitBox
xudong963 edited a comment on pull request #1566: URL: https://github.com/apache/arrow-datafusion/pull/1566#issuecomment-1013673068 Currently, query 8 does not seem to pass due to `case ... when ... then ... else`, so I deleted it to run the benchmark ```sql select o_year, from

[GitHub] [arrow-datafusion] alamb commented on pull request #1556: Officially maintained Arrow2 branch

2022-01-15 Thread GitBox
alamb commented on pull request #1556: URL: https://github.com/apache/arrow-datafusion/pull/1556#issuecomment-1013674164 Discussion of arrow-rs / arrow2 is here in case anyone missed it: https://github.com/apache/arrow-rs/issues/1176

[GitHub] [arrow-datafusion] alamb opened a new issue #1574: SQL integration tests named `mod`

2022-01-15 Thread GitBox
alamb opened a new issue #1574: URL: https://github.com/apache/arrow-datafusion/issues/1574 **Describe the bug** DataFusion has a large library of SQL integration tests. They are currently run via ```shell cargo test -p datafusion --test mod ``` (Or `cargo test`)

[GitHub] [arrow-datafusion] alamb opened a new pull request #1575: Rename sql integration tests from `mod` to `sql_integration`

2022-01-15 Thread GitBox
alamb opened a new pull request #1575: URL: https://github.com/apache/arrow-datafusion/pull/1575 # Which issue does this PR close? Closes #1574 DataFusion has a large library of SQL integration tests. They are currently run via ```shell cargo test -p dat

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1556: Officially maintained Arrow2 branch

2022-01-15 Thread GitBox
alamb commented on a change in pull request #1556: URL: https://github.com/apache/arrow-datafusion/pull/1556#discussion_r785303506 ## File path: .github/workflows/rust.yml ## @@ -116,6 +116,7 @@ jobs: cargo test --no-default-features cargo run --example cs

[GitHub] [arrow-datafusion] yjshen commented on issue #1572: Consolidate the N-way merging code and `SortPreservingMergeStream` (which has quite good tests of what is often quite tricky code, and it w

2022-01-15 Thread GitBox
yjshen commented on issue #1572: URL: https://github.com/apache/arrow-datafusion/issues/1572#issuecomment-1013680326 Thanks @alamb for bringing it up. I propose using heap-sort for N-way merge, but consolidate all the codes we have now in `in_mem_sort` and `SortPreservingMergeStream
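For readers unfamiliar with the approach yjshen proposes, a minimal heap-based N-way merge sketch, assuming each input is an already-sorted `Vec<i64>`. `heap_merge` is a hypothetical helper, not DataFusion's `in_mem_sort` or `SortPreservingMergeStream` code, which operate on record batches and streams rather than plain vectors.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical helper: merge individually sorted inputs into one sorted
/// output using a min-heap keyed on (value, input index, position).
fn heap_merge(inputs: &[Vec<i64>]) -> Vec<i64> {
    let total: usize = inputs.iter().map(|v| v.len()).sum();
    let mut out = Vec::with_capacity(total);
    // BinaryHeap is a max-heap; wrapping entries in Reverse makes it a min-heap.
    let mut heap = BinaryHeap::new();
    // Seed the heap with the first element of every non-empty input.
    for (i, input) in inputs.iter().enumerate() {
        if let Some(&v) = input.first() {
            heap.push(Reverse((v, i, 0usize)));
        }
    }
    // Pop the smallest head, emit it, then push the next element from the
    // same input (if any). Ties are broken by input index.
    while let Some(Reverse((v, i, pos))) = heap.pop() {
        out.push(v);
        if let Some(&next) = inputs[i].get(pos + 1) {
            heap.push(Reverse((next, i, pos + 1)));
        }
    }
    out
}
```

Merging `[1, 4, 9]`, `[2, 3, 10]`, `[]` and `[0, 5]` yields `[0, 1, 2, 3, 4, 5, 9, 10]`. Each pop and push costs O(log N) for N inputs, so the heap only ever holds one candidate per input.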

[GitHub] [arrow-datafusion] yjshen commented on issue #1572: Consolidate the N-way merging code and `SortPreservingMergeStream` (which has quite good tests of what is often quite tricky code, and it w

2022-01-15 Thread GitBox
yjshen commented on issue #1572: URL: https://github.com/apache/arrow-datafusion/issues/1572#issuecomment-1013680446 Also cc @houqp for more minds.

[GitHub] [arrow-datafusion] xudong963 opened a new issue #1576: casting `Int64` to `Float64` unsuccessfully caused tpch8 to fail

2022-01-15 Thread GitBox
xudong963 opened a new issue #1576: URL: https://github.com/apache/arrow-datafusion/issues/1576 **Describe the bug** During running tpch8, I got the following error: ``` ➜ benchmarks git:(fix_cross_join) cargo run --release --bin tpch -- benchmark datafusion --iterations 3 --path

[GitHub] [arrow-datafusion] xudong963 commented on issue #1576: casting `Int64` to `Float64` unsuccessfully caused tpch8 to fail

2022-01-15 Thread GitBox
xudong963 commented on issue #1576: URL: https://github.com/apache/arrow-datafusion/issues/1576#issuecomment-1013681759 I've found out why it failed. Let's look at query 8, which contains `when ... then ... else ...`. ```sql select o_year, sum(case when natio

[GitHub] [arrow-datafusion] xudong963 edited a comment on issue #1576: casting `Int64` to `Float64` unsuccessfully caused tpch8 to fail

2022-01-15 Thread GitBox
xudong963 edited a comment on issue #1576: URL: https://github.com/apache/arrow-datafusion/issues/1576#issuecomment-1013681759 I've found out why it failed. Let's look at query 8, which contains `when ... then ... else ...`. ```sql select o_year, sum(case whe

[GitHub] [arrow-datafusion] tustvold commented on issue #1572: Consolidate the N-way merging code and `SortPreservingMergeStream` (which has quite good tests of what is often quite tricky code, and it

2022-01-15 Thread GitBox
tustvold commented on issue #1572: URL: https://github.com/apache/arrow-datafusion/issues/1572#issuecomment-1013683820 > I propose using heap-sort for N-way merge > what's your opinion on the merging algorithm to choose? SGTM :+1:. This was in fact mentioned on the original PR tha

[GitHub] [arrow-datafusion] tustvold edited a comment on issue #1572: Consolidate the N-way merging code and `SortPreservingMergeStream` (which has quite good tests of what is often quite tricky code,

2022-01-15 Thread GitBox
tustvold edited a comment on issue #1572: URL: https://github.com/apache/arrow-datafusion/issues/1572#issuecomment-1013683820 > I propose using heap-sort for N-way merge > what's your opinion on the merging algorithm to choose? SGTM :+1:. This was in fact mentioned on the original

[GitHub] [arrow-datafusion] tustvold commented on issue #1527: Error reading Parquet files after schema evolution

2022-01-15 Thread GitBox
tustvold commented on issue #1527: URL: https://github.com/apache/arrow-datafusion/issues/1527#issuecomment-1013687826 Not sure if related, but in IOx we handle this at the query layer with a thing we call [SchemaAdapterStream](https://github.com/influxdata/influxdb_iox/blob/f3f6f335a93d2

[GitHub] [arrow-datafusion] tustvold commented on issue #1441: Incorrect results in datafusion

2022-01-15 Thread GitBox
tustvold commented on issue #1441: URL: https://github.com/apache/arrow-datafusion/issues/1441#issuecomment-1013688365 I believe that with the update to arrow 7.0.0, which contains @yordan-pavlov's fix, this should now be fixed?

[GitHub] [arrow-datafusion] tustvold edited a comment on issue #1441: Incorrect results in datafusion

2022-01-15 Thread GitBox
tustvold edited a comment on issue #1441: URL: https://github.com/apache/arrow-datafusion/issues/1441#issuecomment-1013688365 I believe that with the update to arrow 7.0.0, which contains @yordan-pavlov's fix, this should now be fixed in DataFusion?

[GitHub] [arrow-datafusion] tustvold commented on issue #924: Add a separate configuration setting for parallelism of scanning parquet files

2022-01-15 Thread GitBox
tustvold commented on issue #924: URL: https://github.com/apache/arrow-datafusion/issues/924#issuecomment-1013688680 Related https://github.com/influxdata/influxdb_iox/issues/3288

[GitHub] [arrow-datafusion] xudong963 commented on issue #1293: sql planner creates cross join instead of inner join from select predicates

2022-01-15 Thread GitBox
xudong963 commented on issue #1293: URL: https://github.com/apache/arrow-datafusion/issues/1293#issuecomment-1013693867 > tpch query 8 is taking a long time Maybe it's because of #1576 rather than the cross join? But the cross join is indeed an optimization opportunity, good catch! @houqp

[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1457: try to use sqllogictest

2022-01-15 Thread GitBox
xudong963 commented on a change in pull request #1457: URL: https://github.com/apache/arrow-datafusion/pull/1457#discussion_r785315493 ## File path: datafusion/Cargo.toml ## @@ -77,6 +77,7 @@ rand = "0.8" avro-rs = { version = "0.13", features = ["snappy"], optional = true }

[GitHub] [arrow-datafusion] xudong963 commented on pull request #1457: try to use sqllogictest

2022-01-15 Thread GitBox
xudong963 commented on pull request #1457: URL: https://github.com/apache/arrow-datafusion/pull/1457#issuecomment-1013696074 Closing this PR for now. My final thought is to create a separate repo that runs independently to find bugs. If there is a bug, it'll automatically

[GitHub] [arrow-datafusion] xudong963 closed pull request #1457: try to use sqllogictest

2022-01-15 Thread GitBox
xudong963 closed pull request #1457: URL: https://github.com/apache/arrow-datafusion/pull/1457

[GitHub] [arrow-datafusion] liukun4515 commented on issue #587: Optionally Limit memory used by DataFusion plan

2022-01-15 Thread GitBox
liukun4515 commented on issue #587: URL: https://github.com/apache/arrow-datafusion/issues/587#issuecomment-1013697747 @alamb Maybe we should add the `join` operation to this tracking list.

[GitHub] [arrow-rs] tustvold opened a new pull request #1183: Truncate bitmask on split

2022-01-15 Thread GitBox
tustvold opened a new pull request #1183: URL: https://github.com/apache/arrow-rs/pull/1183 # Which issue does this PR close? Relates to #1037 . # Rationale for this change When splitting a null bitmask off, the code added in #1054 would return the entire bitmask that

[GitHub] [arrow-rs] tustvold commented on a change in pull request #1183: Truncate bitmask on split

2022-01-15 Thread GitBox
tustvold commented on a change in pull request #1183: URL: https://github.com/apache/arrow-rs/pull/1183#discussion_r785323724 ## File path: parquet/src/arrow/record_reader/definition_levels.rs ## @@ -462,4 +465,30 @@ mod tests { assert_eq!(actual, expected);

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1183: Truncate bitmask on split

2022-01-15 Thread GitBox
codecov-commenter commented on pull request #1183: URL: https://github.com/apache/arrow-rs/pull/1183#issuecomment-1013707795 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1183?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=T

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1165: Use tempfile for parquet tests

2022-01-15 Thread GitBox
codecov-commenter commented on pull request #1165: URL: https://github.com/apache/arrow-rs/pull/1165#issuecomment-1013708180 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1165?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=T

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1175: Serialize i128 as JSON string

2022-01-15 Thread GitBox
codecov-commenter commented on pull request #1175: URL: https://github.com/apache/arrow-rs/pull/1175#issuecomment-1013708374 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1175?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=T

[GitHub] [arrow-datafusion] andygrove opened a new issue #1577: Documentation for running benchmarks with simd support does not work for me

2022-01-15 Thread GitBox
andygrove opened a new issue #1577: URL: https://github.com/apache/arrow-datafusion/issues/1577 **Describe the bug** I tried running this command from https://github.com/apache/arrow-datafusion/tree/master/benchmarks ```bash cargo run --release --features "simd mimalloc" --bin

[GitHub] [arrow] edponce commented on a change in pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-15 Thread GitBox
edponce commented on a change in pull request #11882: URL: https://github.com/apache/arrow/pull/11882#discussion_r785333007 ## File path: cpp/src/arrow/util/bit_block_counter.h ## @@ -46,7 +46,7 @@ inline uint64_t ShiftWord(uint64_t current, uint64_t next, int64_t shift) { }

[GitHub] [arrow] edponce commented on pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-15 Thread GitBox
edponce commented on pull request #11882: URL: https://github.com/apache/arrow/pull/11882#issuecomment-1013718034 @bkmgit I am curious why there are many changes to Java, Go, R made alongside this Between PR?

[GitHub] [arrow] edponce edited a comment on pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-15 Thread GitBox
edponce edited a comment on pull request #11882: URL: https://github.com/apache/arrow/pull/11882#issuecomment-1013718034 @bkmgit I am curious why there are many changes to Java, Go, R, Makefile files made alongside this Between PR?

[GitHub] [arrow] edponce edited a comment on pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-15 Thread GitBox
edponce edited a comment on pull request #11882: URL: https://github.com/apache/arrow/pull/11882#issuecomment-1013718034 @bkmgit I am curious why there are many changes to Java, Go, R, Makefile files made alongside this Between PR? Preferably, this PR should have been kept scoped to the C++ B

[GitHub] [arrow-datafusion] andygrove commented on pull request #1556: Officially maintained Arrow2 branch

2022-01-15 Thread GitBox
andygrove commented on pull request #1556: URL: https://github.com/apache/arrow-datafusion/pull/1556#issuecomment-1013718590 Here are some quick benchmark results comparing the master branch and this PR, running on a threadripper desktop with 24 cores and an NVMe drive. I ran each query o

[GitHub] [arrow] bkmgit commented on a change in pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-15 Thread GitBox
bkmgit commented on a change in pull request #11882: URL: https://github.com/apache/arrow/pull/11882#discussion_r785334190 ## File path: ci/docker/java-jni-manylinux-201x.dockerfile ## @@ -30,8 +30,7 @@ RUN vcpkg install --clean-after-build \ boost-regex \ boo

[GitHub] [arrow] bkmgit commented on a change in pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-15 Thread GitBox
bkmgit commented on a change in pull request #11882: URL: https://github.com/apache/arrow/pull/11882#discussion_r785336284 ## File path: cpp/src/arrow/util/bit_block_counter.h ## @@ -46,7 +46,7 @@ inline uint64_t ShiftWord(uint64_t current, uint64_t next, int64_t shift) { }

[GitHub] [arrow] github-actions[bot] commented on pull request #12161: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-15 Thread GitBox
github-actions[bot] commented on pull request #12161: URL: https://github.com/apache/arrow/pull/12161#issuecomment-1013723707 https://issues.apache.org/jira/browse/ARROW-9843

[GitHub] [arrow] bkmgit commented on pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-15 Thread GitBox
bkmgit commented on pull request #11882: URL: https://github.com/apache/arrow/pull/11882#issuecomment-1013724118 Made some errors on rebase.

[GitHub] [arrow-datafusion] andygrove opened a new issue #1578: Write documentation explaining how to enable metrics

2022-01-15 Thread GitBox
andygrove opened a new issue #1578: URL: https://github.com/apache/arrow-datafusion/issues/1578 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** I would like to see metrics for a query and I don't remember how to do that so I look

[GitHub] [arrow] bkmgit closed pull request #12161: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-15 Thread GitBox
bkmgit closed pull request #12161: URL: https://github.com/apache/arrow/pull/12161

[GitHub] [arrow-datafusion] andygrove commented on pull request #1556: Officially maintained Arrow2 branch

2022-01-15 Thread GitBox
andygrove commented on pull request #1556: URL: https://github.com/apache/arrow-datafusion/pull/1556#issuecomment-1013730047 Metrics for query 1 from master & arrow2. ## Master ``` === Physical plan with metrics === SortExec: [l_returnflag@0 ASC NULLS LAST,l_linestatus@1

[GitHub] [arrow] edponce commented on pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-15 Thread GitBox
edponce commented on pull request #11882: URL: https://github.com/apache/arrow/pull/11882#issuecomment-1013732577 There are some linter errors; please run clang-format.

[GitHub] [arrow-datafusion] andygrove edited a comment on pull request #1556: Officially maintained Arrow2 branch

2022-01-15 Thread GitBox
andygrove edited a comment on pull request #1556: URL: https://github.com/apache/arrow-datafusion/pull/1556#issuecomment-1013730047 Metrics for query 1 from master & arrow2. ## Master ``` === Physical plan with metrics === SortExec: [l_returnflag@0 ASC NULLS LAST,l_lines

[GitHub] [arrow-datafusion] andygrove edited a comment on pull request #1556: Officially maintained Arrow2 branch

2022-01-15 Thread GitBox
andygrove edited a comment on pull request #1556: URL: https://github.com/apache/arrow-datafusion/pull/1556#issuecomment-1013730047 Metrics and CPU activity charts for query 1 from master & arrow2. ## Master ``` === Physical plan with metrics === SortExec: [l_returnflag@

[GitHub] [arrow-datafusion] alamb commented on issue #587: Optionally Limit memory used by DataFusion plan

2022-01-15 Thread GitBox
alamb commented on issue #587: URL: https://github.com/apache/arrow-datafusion/issues/587#issuecomment-1013733620 > @alamb Maybe we should take the join operation into this track. It is a good idea @liukun4515 -- I ran out of ambition while typing up Sort and Grouping. I'll try and

[GitHub] [arrow-rs] helgikrs opened a new issue #1184: Writing structs nested in lists produces an incorrect output

2022-01-15 Thread GitBox
helgikrs opened a new issue #1184: URL: https://github.com/apache/arrow-rs/issues/1184 **Describe the bug** Writing an arrow record batch with structs nested within lists using the parquet writer produces a parquet file with incorrect values when there are null or empty lists present.

[GitHub] [arrow-datafusion] alamb commented on issue #1441: Incorrect results in datafusion

2022-01-15 Thread GitBox
alamb commented on issue #1441: URL: https://github.com/apache/arrow-datafusion/issues/1441#issuecomment-1013733967 Yes, that is my understanding: someone just needs to rerun the (wonderful) reproducer from @franeklubi to confirm.

[GitHub] [arrow-datafusion] alamb commented on issue #1572: Consolidate the N-way merging code and `SortPreservingMergeStream` (which has quite good tests of what is often quite tricky code, and it wi

2022-01-15 Thread GitBox
alamb commented on issue #1572: URL: https://github.com/apache/arrow-datafusion/issues/1572#issuecomment-1013734241 Great! @yjshen added a heap based implementation https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_plan/sorts/in_mem_sort.rs#L60 -- standardizi

[GitHub] [arrow-datafusion] alamb commented on pull request #1566: fix: sql planner creates cross join instead of inner join from select predicates

2022-01-15 Thread GitBox
alamb commented on pull request #1566: URL: https://github.com/apache/arrow-datafusion/pull/1566#issuecomment-1013734364 Thank you @xudong963 -- this looks very cool. I'll try and review it carefully tomorrow 👍

[GitHub] [arrow-rs] alamb merged pull request #1181: Add ticket reference for false positive in clippy

2022-01-15 Thread GitBox
alamb merged pull request #1181: URL: https://github.com/apache/arrow-rs/pull/1181

[GitHub] [arrow-rs] alamb merged pull request #1175: Serialize i128 as JSON string

2022-01-15 Thread GitBox
alamb merged pull request #1175: URL: https://github.com/apache/arrow-rs/pull/1175

[GitHub] [arrow-rs] alamb closed issue #1174: Disable serde_json `arbitrary_precision` feature flag

2022-01-15 Thread GitBox
alamb closed issue #1174: URL: https://github.com/apache/arrow-rs/issues/1174

[GitHub] [arrow-rs] alamb commented on pull request #1175: Serialize i128 as JSON string

2022-01-15 Thread GitBox
alamb commented on pull request #1175: URL: https://github.com/apache/arrow-rs/pull/1175#issuecomment-1013734554 Thanks @tustvold

[GitHub] [arrow-rs] helgikrs opened a new pull request #1185: fix: Fix a bug in how filter indices are calculated

2022-01-15 Thread GitBox
helgikrs opened a new pull request #1185: URL: https://github.com/apache/arrow-rs/pull/1185 # Which issue does this PR close? Closes #1184. # What changes are included in this PR? Using the definition level and the nullability of the column only produces the correct

[GitHub] [arrow] bkmgit commented on pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-15 Thread GitBox
bkmgit commented on pull request #11882: URL: https://github.com/apache/arrow/pull/11882#issuecomment-1013734860 @github-actions autotune

[GitHub] [arrow] bkmgit commented on pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-15 Thread GitBox
bkmgit commented on pull request #11882: URL: https://github.com/apache/arrow/pull/11882#issuecomment-1013735121 @github-actions crossbow submit autotune

[GitHub] [arrow] github-actions[bot] commented on pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-15 Thread GitBox
github-actions[bot] commented on pull request #11882: URL: https://github.com/apache/arrow/pull/11882#issuecomment-1013735250 ``` Unable to match any tasks for `autotune` The Archery job run can be found at: https://github.com/apache/arrow/actions/runs/1702298478```

[GitHub] [arrow] bkmgit commented on pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-15 Thread GitBox
bkmgit commented on pull request #11882: URL: https://github.com/apache/arrow/pull/11882#issuecomment-1013735426 @github-actions autotune

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1185: fix: Fix a bug in how filter indices are calculated

2022-01-15 Thread GitBox
codecov-commenter commented on pull request #1185: URL: https://github.com/apache/arrow-rs/pull/1185#issuecomment-1013736732 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1185?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=T

[GitHub] [arrow-rs] helgikrs opened a new issue #1186: Parquet reader should be able to read structs within list

2022-01-15 Thread GitBox
helgikrs opened a new issue #1186: URL: https://github.com/apache/arrow-rs/issues/1186 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Reading a parquet file containing a list of structs fails with an error like ```

[GitHub] [arrow-rs] helgikrs opened a new pull request #1187: feat(parquet): support for reading structs nested within lists

2022-01-15 Thread GitBox
helgikrs opened a new pull request #1187: URL: https://github.com/apache/arrow-rs/pull/1187 # Which issue does this PR close? Closes #1186. # What changes are included in this PR? Adds support for reading lists of structs in the parquet reader. Adds a tests roundtrip wri

[GitHub] [arrow-rs] helgikrs commented on a change in pull request #1187: feat(parquet): support for reading structs nested within lists

2022-01-15 Thread GitBox
helgikrs commented on a change in pull request #1187: URL: https://github.com/apache/arrow-rs/pull/1187#discussion_r785342933 ## File path: parquet/src/arrow/arrow_writer.rs ## @@ -1770,4 +1770,104 @@ mod tests { let stats = column.statistics().unwrap(); asser

[GitHub] [arrow] bkmgit commented on a change in pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-15 Thread GitBox
bkmgit commented on a change in pull request #11882: URL: https://github.com/apache/arrow/pull/11882#discussion_r785343003 ## File path: go/arrow/memory/_lib/arch.h ## @@ -22,8 +22,6 @@ #define FULL_NAME(x) x##_sse4 #elif __SSE3__ == 1 #define FULL_NAME(x) x##_sse3 -

[GitHub] [arrow] bkmgit commented on a change in pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-15 Thread GitBox
bkmgit commented on a change in pull request #11882: URL: https://github.com/apache/arrow/pull/11882#discussion_r785343038 ## File path: go/arrow/memory/_lib/arch.h ## @@ -22,8 +22,6 @@ #define FULL_NAME(x) x##_sse4 #elif __SSE3__ == 1 #define FULL_NAME(x) x##_sse3 -

[GitHub] [arrow] bkmgit commented on a change in pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-15 Thread GitBox
bkmgit commented on a change in pull request #11882: URL: https://github.com/apache/arrow/pull/11882#discussion_r785343075 ## File path: go/arrow/memory/Makefile ## @@ -56,11 +46,9 @@ _lib/memory_avx2.s: _lib/memory.c _lib/memory_sse4.s: _lib/memory.c $(CC) -S $(C_FLAG

[GitHub] [arrow] bkmgit commented on a change in pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-15 Thread GitBox
bkmgit commented on a change in pull request #11882: URL: https://github.com/apache/arrow/pull/11882#discussion_r785343098 ## File path: dev/tasks/java-jars/README.md ## @@ -16,7 +16,7 @@ See the License for the specific language governing permissions and limitations under th

[GitHub] [arrow] bkmgit commented on pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-15 Thread GitBox
bkmgit commented on pull request #11882: URL: https://github.com/apache/arrow/pull/11882#issuecomment-1013738181 @lidavidm @edponce @kou Thanks for all your feedback. Any further suggestions?

[GitHub] [arrow-datafusion] james727 opened a new pull request #1579: Implement ARRAY_AGG(DISTINCT ...)

2022-01-15 Thread GitBox
james727 opened a new pull request #1579: URL: https://github.com/apache/arrow-datafusion/pull/1579 # Which issue does this PR close? Closes https://github.com/apache/arrow-datafusion/issues/1323 # What changes are included in this PR? This includes the implementation of `array_

[GitHub] [arrow-rs] yordan-pavlov commented on a change in pull request #1054: Improve parquet reading performance for columns with nulls by preserving bitmask when possible (#1037)

2022-01-15 Thread GitBox
yordan-pavlov commented on a change in pull request #1054: URL: https://github.com/apache/arrow-rs/pull/1054#discussion_r785354757 ## File path: parquet/src/arrow/record_reader/definition_levels.rs ## @@ -91,10 +131,335 @@ impl DefinitionLevelBuffer { &self, r

[GitHub] [arrow-rs] tustvold commented on a change in pull request #1054: Improve parquet reading performance for columns with nulls by preserving bitmask when possible (#1037)

2022-01-15 Thread GitBox
tustvold commented on a change in pull request #1054: URL: https://github.com/apache/arrow-rs/pull/1054#discussion_r785355572 ## File path: parquet/src/arrow/record_reader/definition_levels.rs ## @@ -91,10 +131,335 @@ impl DefinitionLevelBuffer { &self, range:

[GitHub] [arrow-rs] tustvold commented on a change in pull request #1185: fix: Fix a bug in how filter indices are calculated

2022-01-15 Thread GitBox
tustvold commented on a change in pull request #1185: URL: https://github.com/apache/arrow-rs/pull/1185#discussion_r785352427 ## File path: parquet/src/arrow/levels.rs ## @@ -759,13 +759,6 @@ impl LevelInfo { /// Given a level's information, calculate the offsets required

[GitHub] [arrow-rs] tustvold commented on a change in pull request #1185: fix: Fix a bug in how filter indices are calculated

2022-01-15 Thread GitBox
tustvold commented on a change in pull request #1185: URL: https://github.com/apache/arrow-rs/pull/1185#discussion_r785357479 ## File path: parquet/src/arrow/levels.rs ## @@ -780,17 +773,25 @@ impl LevelInfo { }) .collect(); } +
