[GitHub] [arrow-rs] viirya commented on issue #3215: Off-by-one buffer size error triggers Panic when constructing RecordBatch from IPC bytes (should return an Error)

2022-11-29 Thread GitBox
viirya commented on issue #3215: URL: https://github.com/apache/arrow-rs/issues/3215#issuecomment-1331769380 If your IPC payload is generated by apache-arrow NPM package function `tableToIPC`, the size of buffers is produced by that, not from arrow-rs. IPC reader just reads provided buffer

[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #4365: reimplement `push_down_filter` to remove global-state

2022-11-29 Thread GitBox
jackwener commented on code in PR #4365: URL: https://github.com/apache/arrow-datafusion/pull/4365#discussion_r1035631504 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -403,94 +294,90 @@ fn extract_or_clause(expr: &Expr, schema_columns: &HashSet) -> Option, plan:

[GitHub] [arrow-datafusion] mingmwang commented on a diff in pull request #4365: reimplement `push_down_filter` to remove global-state

2022-11-29 Thread GitBox
mingmwang commented on code in PR #4365: URL: https://github.com/apache/arrow-datafusion/pull/4365#discussion_r1035626623 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -403,94 +294,90 @@ fn extract_or_clause(expr: &Expr, schema_columns: &HashSet) -> Option, plan:

[GitHub] [arrow-datafusion] jackwener commented on pull request #4365: reimplement `push_down_filter` to remove global-state

2022-11-29 Thread GitBox
jackwener commented on PR #4365: URL: https://github.com/apache/arrow-datafusion/pull/4365#issuecomment-1331756958 All followup enhancement in #4433 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [arrow-datafusion] jackwener opened a new issue, #4433: Follwup #4425

2022-11-29 Thread GitBox
jackwener opened a new issue, #4433: URL: https://github.com/apache/arrow-datafusion/issues/4433 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** - support push_down_filter when meet Window - support SEMI/ANTI JOIN push_down

[GitHub] [arrow] ursabot commented on pull request #14770: MINOR: [R] Fix URLs in vignettes

2022-11-29 Thread GitBox
ursabot commented on PR #14770: URL: https://github.com/apache/arrow/pull/14770#issuecomment-1331755566 Benchmark runs are scheduled for baseline = ccb68afedf00a064c280220f480f3a639cce28f6 and contender = 0f66b714860f25ef711c39ee9cb068a70b302c69. 0f66b714860f25ef711c39ee9cb068a70b302c69 is

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4400: Update to arrow 28

2022-11-29 Thread GitBox
tustvold commented on code in PR #4400: URL: https://github.com/apache/arrow-datafusion/pull/4400#discussion_r1035620730 ## datafusion/expr/src/type_coercion/binary.rs: ## @@ -287,8 +287,8 @@ fn get_wider_decimal_type( (DataType::Decimal128(p1, s1), DataType::Decimal128

[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #4365: reimplement `push_down_filter` to remove global-state

2022-11-29 Thread GitBox
jackwener commented on code in PR #4365: URL: https://github.com/apache/arrow-datafusion/pull/4365#discussion_r1035613833 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -500,302 +387,359 @@ fn optimize_join( // vector will contain only join keys (without additi

[GitHub] [arrow-datafusion] retikulum commented on issue #4386: Make Binary Dictionary Operations Optional

2022-11-29 Thread GitBox
retikulum commented on issue #4386: URL: https://github.com/apache/arrow-datafusion/issues/4386#issuecomment-1331748181 Hi. I added this on purpose (but without knowing it is extremely expensive) to pass `test_dictionary_type_to_array_coersion` test case. The following error was generated

[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #4365: reimplement `push_down_filter` to remove global-state

2022-11-29 Thread GitBox
jackwener commented on code in PR #4365: URL: https://github.com/apache/arrow-datafusion/pull/4365#discussion_r1035609444 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -403,94 +294,90 @@ fn extract_or_clause(expr: &Expr, schema_columns: &HashSet) -> Option, plan:

[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #4365: reimplement `push_down_filter` to remove global-state

2022-11-29 Thread GitBox
jackwener commented on code in PR #4365: URL: https://github.com/apache/arrow-datafusion/pull/4365#discussion_r1035611930 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -500,302 +387,359 @@ fn optimize_join( // vector will contain only join keys (without additi

[GitHub] [arrow-datafusion] mingmwang commented on a diff in pull request #4365: reimplement `push_down_filter` to remove global-state

2022-11-29 Thread GitBox
mingmwang commented on code in PR #4365: URL: https://github.com/apache/arrow-datafusion/pull/4365#discussion_r1035609575 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -500,302 +387,359 @@ fn optimize_join( // vector will contain only join keys (without additi

[GitHub] [arrow-datafusion] mingmwang commented on a diff in pull request #4365: reimplement `push_down_filter` to remove global-state

2022-11-29 Thread GitBox
mingmwang commented on code in PR #4365: URL: https://github.com/apache/arrow-datafusion/pull/4365#discussion_r1035603648 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -403,94 +294,90 @@ fn extract_or_clause(expr: &Expr, schema_columns: &HashSet) -> Option, plan:

[GitHub] [arrow-datafusion] mingmwang commented on pull request #4365: reimplement `push_down_filter` to remove global-state

2022-11-29 Thread GitBox
mingmwang commented on PR #4365: URL: https://github.com/apache/arrow-datafusion/pull/4365#issuecomment-1331709722 Except for the LogicalPlan::Window, the others LGTM.

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #4377: Refactor the Hash Join

2022-11-29 Thread GitBox
liukun4515 commented on code in PR #4377: URL: https://github.com/apache/arrow-datafusion/pull/4377#discussion_r1035549338 ## datafusion/core/src/physical_plan/joins/hash_join.rs: ## @@ -2306,23 +2162,36 @@ mod tests { Ok(()) } +fn build_semi_anti_left_table(

[GitHub] [arrow-datafusion] ygf11 commented on issue #4389: Proposal: Improve the join keys of logical plan

2022-11-29 Thread GitBox
ygf11 commented on issue #4389: URL: https://github.com/apache/arrow-datafusion/issues/4389#issuecomment-1331669666 > If we can change the pub on: Vec<(column,column)> to option, we don't need to do the https://github.com/apache/arrow-datafusion/pull/4353 specifically for the expr in the J

[GitHub] [arrow-rs] wjones127 opened a new pull request, #3236: fix(object_store,gcp): test copy_if_not_exist

2022-11-29 Thread GitBox
wjones127 opened a new pull request, #3236: URL: https://github.com/apache/arrow-rs/pull/3236 # Which issue does this PR close? Closes #3235. # Rationale for this change The `copy_if_not_exist` function was not tested, and didn't pass the test when enabled. It needed to

[GitHub] [arrow-rs] wjones127 opened a new issue, #3235: object_store(gcp): GCP complains about content-length for copy

2022-11-29 Thread GitBox
wjones127 opened a new issue, #3235: URL: https://github.com/apache/arrow-rs/issues/3235 **Describe the bug** An error in the GCP `copy_if_not_exist` was reported upstream in delta-rs: https://github.com/delta-io/delta-rs/issues/878#issue-1404449207 ``` PyDeltaTableError: Fa

[GitHub] [arrow-datafusion] HaoYang670 opened a new pull request, #4432: Remove the schema checking when creating `CrossJoinExec`

2022-11-29 Thread GitBox
HaoYang670 opened a new pull request, #4432: URL: https://github.com/apache/arrow-datafusion/pull/4432 Signed-off-by: remzi <1371656737...@gmail.com> # Which issue does this PR close? Closes #4431 . # Rationale for this change # What changes are inc

[GitHub] [arrow-datafusion] HaoYang670 opened a new issue, #4431: Remove the schema checking from `CrossJoinExec::try_new`

2022-11-29 Thread GitBox
HaoYang670 opened a new issue, #4431: URL: https://github.com/apache/arrow-datafusion/issues/4431 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** https://github.com/apache/arrow-datafusion/blob/49166ea55f317722ab7a37fbfc253bcd497c

[GitHub] [arrow] ursabot commented on pull request #14768: MINOR: Quick fix to the labeler for CPP files.

2022-11-29 Thread GitBox
ursabot commented on PR #14768: URL: https://github.com/apache/arrow/pull/14768#issuecomment-1331653701 Benchmark runs are scheduled for baseline = b1bcd6f3f17ceee958fae6905185a99e1307e6a7 and contender = ccb68afedf00a064c280220f480f3a639cce28f6. ccb68afedf00a064c280220f480f3a639cce28f6 is

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #4377: Refactor the Hash Join

2022-11-29 Thread GitBox
liukun4515 commented on code in PR #4377: URL: https://github.com/apache/arrow-datafusion/pull/4377#discussion_r1035533851 ## datafusion/core/src/physical_plan/joins/hash_join.rs: ## @@ -2306,23 +2162,36 @@ mod tests { Ok(()) } +fn build_semi_anti_left_table(

[GitHub] [arrow-rs] aarashy commented on issue #3215: Off-by-one buffer size error triggers Panic when constructing RecordBatch from IPC bytes (should return an Error)

2022-11-29 Thread GitBox
aarashy commented on issue #3215: URL: https://github.com/apache/arrow-rs/issues/3215#issuecomment-1331649397 I removed the unwraps here https://github.com/apache/arrow-rs/pull/3232 I have some bytes which reproduce this error, but the data is private. The bytes were the result of the

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #4377: Refactor the Hash Join

2022-11-29 Thread GitBox
liukun4515 commented on code in PR #4377: URL: https://github.com/apache/arrow-datafusion/pull/4377#discussion_r1035532240 ## datafusion/core/src/physical_plan/joins/hash_join.rs: ## @@ -2306,23 +2162,36 @@ mod tests { Ok(()) } +fn build_semi_anti_left_table(

[GitHub] [arrow-datafusion] jackwener commented on pull request #4365: reimplement `push_down_filter` to remove global-state

2022-11-29 Thread GitBox
jackwener commented on PR #4365: URL: https://github.com/apache/arrow-datafusion/pull/4365#issuecomment-1331646328 Has added it in UT.

[GitHub] [arrow-datafusion] jackwener commented on pull request #4429: The CLI panics when passing an invalid explain query

2022-11-29 Thread GitBox
jackwener commented on PR #4429: URL: https://github.com/apache/arrow-datafusion/pull/4429#issuecomment-1331644695 Agree with @HaoYang670, looks like it should be fixed in sqlparser-rs.

[GitHub] [arrow-datafusion] HaoYang670 commented on pull request #4429: The CLI panics when passing an invalid explain query

2022-11-29 Thread GitBox
HaoYang670 commented on PR #4429: URL: https://github.com/apache/arrow-datafusion/pull/4429#issuecomment-1331635067 > Hi @HaoYang670 please check the PR But tbh the optimizer doesn't respect errors now so the error message looks like > > ``` > DataFusion CLI v14.0.0 > ❯ explain

[GitHub] [arrow] wgtmac commented on pull request #14742: ARROW-18413: [C++][Parquet] Expose page index info from ColumnChunkMetaData

2022-11-29 Thread GitBox
wgtmac commented on PR #14742: URL: https://github.com/apache/arrow/pull/14742#issuecomment-1331634459 I have addressed your comment, and the unsuccessful CI checks are unrelated to my change. Can you please take a look again? @emkornfield

[GitHub] [arrow-datafusion] mingmwang commented on pull request #4365: reimplement `push_down_filter` to remove global-state

2022-11-29 Thread GitBox
mingmwang commented on PR #4365: URL: https://github.com/apache/arrow-datafusion/pull/4365#issuecomment-1331632285 Before this PR, there was a global state that helped avoid duplicate Filters being generated and pushed down. Now the global state is removed. Need to double confirm th

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #4377: Refactor the Hash Join

2022-11-29 Thread GitBox
liukun4515 commented on code in PR #4377: URL: https://github.com/apache/arrow-datafusion/pull/4377#discussion_r1035513158 ## datafusion/core/src/physical_plan/joins/hash_join.rs: ## @@ -1440,44 +1181,150 @@ fn equal_rows( err.unwrap_or(Ok(res)) } -// Produces a batch fo

[GitHub] [arrow-datafusion] liukun4515 commented on pull request #4411: optimize limit push for join case

2022-11-29 Thread GitBox
liukun4515 commented on PR #4411: URL: https://github.com/apache/arrow-datafusion/pull/4411#issuecomment-1331624320 cc @Dandandan if it looks good to you, I will merge this.

[GitHub] [arrow-rs] wjones127 opened a new pull request, #3234: fix: use better minimum part size

2022-11-29 Thread GitBox
wjones127 opened a new pull request, #3234: URL: https://github.com/apache/arrow-rs/pull/3234 # Which issue does this PR close? Closes #3233. # Rationale for this change Bumping up the size of the test data as well, so it's easier to catch this. However, I think the loc

[GitHub] [arrow-rs] wjones127 opened a new issue, #3233: object_store(aws): EntityTooSmall error on multi-part upload

2022-11-29 Thread GitBox
wjones127 opened a new issue, #3233: URL: https://github.com/apache/arrow-rs/issues/3233 **Describe the bug** Our multi-part upload pieces are too small for the AWS API's liking. Currently, it is using 5,000,000 byte parts, but minimum is either 5 MB or 5 MiB (not sure). Examp
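The MB versus MiB question in this report comes down to simple arithmetic. The following is a minimal sketch, assuming the service minimum for every part except the last is 5 MiB (5 × 1024 × 1024 bytes); the constant and function names are hypothetical and are not taken from `object_store`.

```rust
/// Part size quoted in the issue: 5,000,000 bytes (5 MB, decimal).
const DECIMAL_5_MB: usize = 5_000_000;

/// Assumed service minimum for non-final parts: 5 MiB (binary).
/// Hypothetical name, not an `object_store` constant.
const MIN_PART_SIZE: usize = 5 * 1024 * 1024;

/// Returns true if a part of `len` bytes meets the assumed minimum.
/// The final part of an upload is allowed to be smaller.
fn part_is_large_enough(len: usize, is_last: bool) -> bool {
    is_last || len >= MIN_PART_SIZE
}

/// Number of parts needed to upload `total` bytes in chunks of `part_size`.
/// `part_size` must be non-zero.
fn part_count(total: usize, part_size: usize) -> usize {
    (total + part_size - 1) / part_size
}
```

Under that assumption, a 5,000,000-byte part falls 242,880 bytes short of the minimum, which would explain an `EntityTooSmall` response for any part other than the last one.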

[GitHub] [arrow-datafusion] liukun4515 commented on issue #4389: Proposal: Improve the join keys of logical plan

2022-11-29 Thread GitBox
liukun4515 commented on issue #4389: URL: https://github.com/apache/arrow-datafusion/issues/4389#issuecomment-1331621205 😭, I am also confused about why we split the join into `join` and `crossjoin` in the logical phase. I think we can combine these two and just add `crossjoin` join_type f

[GitHub] [arrow-datafusion] mingmwang commented on pull request #4365: reimplement `push_down_filter` to remove global-state

2022-11-29 Thread GitBox
mingmwang commented on PR #4365: URL: https://github.com/apache/arrow-datafusion/pull/4365#issuecomment-1331618851 Could you please also modify the UT `optimize_plan()` method and let the rule run twice and see what will happen ? ``` fn optimize_plan(plan: &LogicalPlan) -> Log

[GitHub] [arrow-datafusion] liukun4515 commented on issue #4389: Proposal: Improve the join keys of logical plan

2022-11-29 Thread GitBox
liukun4515 commented on issue #4389: URL: https://github.com/apache/arrow-datafusion/issues/4389#issuecomment-1331617846 Can we change the logical plan of join to presto or doris? and extract the `on condition` to the `option` If we can change the `pub on: Vec<(column,column)>,` to o

[GitHub] [arrow-datafusion] liukun4515 commented on issue #4389: Proposal: Improve the join keys of logical plan

2022-11-29 Thread GitBox
liukun4515 commented on issue #4389: URL: https://github.com/apache/arrow-datafusion/issues/4389#issuecomment-1331614793 > equi_preds in the spark just the ``` case class Join( left: LogicalPlan, right: LogicalPlan, joinType: JoinType, condition:

[GitHub] [arrow-rs] aarashy opened a new pull request, #3232: Remove unwraps from 'create_primitive_array'

2022-11-29 Thread GitBox
aarashy opened a new pull request, #3232: URL: https://github.com/apache/arrow-rs/pull/3232 # Which issue does this PR close? Addresses part of https://github.com/apache/arrow-rs/issues/3215, but there is a separate mystery at play - what types of inputs were triggering panics

[GitHub] [arrow-datafusion] mingmwang commented on pull request #4365: reimplement `push_down_filter` to remove global-state

2022-11-29 Thread GitBox
mingmwang commented on PR #4365: URL: https://github.com/apache/arrow-datafusion/pull/4365#issuecomment-1331609471 > > You can try this: select (a + b) as c, count(*) from Table_A group by 1 > > ```rust > #[test] > fn push_down_filter_groupby_expr_contains_alias() { > let

[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #4365: reimplement `push_down_filter` to remove global-state

2022-11-29 Thread GitBox
jackwener commented on code in PR #4365: URL: https://github.com/apache/arrow-datafusion/pull/4365#discussion_r1035499731 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -500,302 +387,336 @@ fn optimize_join( // vector will contain only join keys (without additi

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #4400: Update to arrow 28

2022-11-29 Thread GitBox
liukun4515 commented on code in PR #4400: URL: https://github.com/apache/arrow-datafusion/pull/4400#discussion_r1035496939 ## datafusion/sql/src/planner.rs: ## @@ -3213,7 +3213,7 @@ mod tests { let sql = "SELECT CAST(10 AS DECIMAL(0))"; let err = logica

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #4400: Update to arrow 28

2022-11-29 Thread GitBox
liukun4515 commented on code in PR #4400: URL: https://github.com/apache/arrow-datafusion/pull/4400#discussion_r1035496657 ## datafusion/sql/src/utils.rs: ## @@ -522,9 +522,12 @@ pub(crate) fn make_decimal_type( }; // Arrow decimal is i128 meaning 38 maximum decimal

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #4400: Update to arrow 28

2022-11-29 Thread GitBox
liukun4515 commented on code in PR #4400: URL: https://github.com/apache/arrow-datafusion/pull/4400#discussion_r1035493777 ## datafusion/sql/src/utils.rs: ## @@ -522,9 +522,12 @@ pub(crate) fn make_decimal_type( }; // Arrow decimal is i128 meaning 38 maximum decimal

[GitHub] [arrow-datafusion] jackwener opened a new issue, #4430: `UnwrapCastInComparison` exist bug

2022-11-29 Thread GitBox
jackwener opened a new issue, #4430: URL: https://github.com/apache/arrow-datafusion/issues/4430 **Describe the bug** A clear and concise description of what the bug is. **To Reproduce** run in optimizer integration-test ```rust #[test] fn push_down_filter_groupby_ex

[GitHub] [arrow-datafusion] HaoYang670 commented on a diff in pull request #4400: Update to arrow 28

2022-11-29 Thread GitBox
HaoYang670 commented on code in PR #4400: URL: https://github.com/apache/arrow-datafusion/pull/4400#discussion_r1035494671 ## datafusion/sql/src/planner.rs: ## @@ -3213,7 +3213,7 @@ mod tests { let sql = "SELECT CAST(10 AS DECIMAL(0))"; let err = logica

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #4400: Update to arrow 28

2022-11-29 Thread GitBox
liukun4515 commented on code in PR #4400: URL: https://github.com/apache/arrow-datafusion/pull/4400#discussion_r1035493298 ## datafusion/sql/src/planner.rs: ## @@ -3213,7 +3213,7 @@ mod tests { let sql = "SELECT CAST(10 AS DECIMAL(0))"; let err = logica

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #4400: Update to arrow 28

2022-11-29 Thread GitBox
liukun4515 commented on code in PR #4400: URL: https://github.com/apache/arrow-datafusion/pull/4400#discussion_r1035492782 ## datafusion/sql/src/planner.rs: ## @@ -3213,7 +3213,7 @@ mod tests { let sql = "SELECT CAST(10 AS DECIMAL(0))"; let err = logica

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #4400: Update to arrow 28

2022-11-29 Thread GitBox
liukun4515 commented on code in PR #4400: URL: https://github.com/apache/arrow-datafusion/pull/4400#discussion_r1035491442 ## datafusion/optimizer/src/simplify_expressions/utils.rs: ## @@ -108,8 +106,12 @@ pub fn is_one(s: &Expr) -> bool { | Expr::Literal(ScalarValue::U

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #4400: Update to arrow 28

2022-11-29 Thread GitBox
liukun4515 commented on code in PR #4400: URL: https://github.com/apache/arrow-datafusion/pull/4400#discussion_r1035489865 ## datafusion/expr/src/type_coercion/binary.rs: ## @@ -287,8 +287,8 @@ fn get_wider_decimal_type( (DataType::Decimal128(p1, s1), DataType::Decimal1

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #4400: Update to arrow 28

2022-11-29 Thread GitBox
liukun4515 commented on code in PR #4400: URL: https://github.com/apache/arrow-datafusion/pull/4400#discussion_r103543 ## datafusion/expr/src/type_coercion/binary.rs: ## @@ -287,8 +287,8 @@ fn get_wider_decimal_type( (DataType::Decimal128(p1, s1), DataType::Decimal1

[GitHub] [arrow-datafusion] jackwener commented on pull request #4365: reimplement `push_down_filter` to remove global-state

2022-11-29 Thread GitBox
jackwener commented on PR #4365: URL: https://github.com/apache/arrow-datafusion/pull/4365#issuecomment-1331592683 > You can try this: select (a + b) as c, count(*) from Table_A group by 1 ```rust #[test] fn push_down_filter_groupby_expr_contains_alias() { let sql = "SEL

[GitHub] [arrow-datafusion] mingmwang commented on pull request #4365: reimplement `push_down_filter` to remove global-state

2022-11-29 Thread GitBox
mingmwang commented on PR #4365: URL: https://github.com/apache/arrow-datafusion/pull/4365#issuecomment-1331584666 > @mingmwang look like alias can't be in groupby. > > sql 1999 > > ``` > Function > Specify a grouped table derived by the application of the to the result

[GitHub] [arrow] ursabot commented on pull request #14731: ARROW-18380: [Dev] Update dev_pr GitHub workflows to accept both GitHub issues and JIRA

2022-11-29 Thread GitBox
ursabot commented on PR #14731: URL: https://github.com/apache/arrow/pull/14731#issuecomment-1331578942 Benchmark runs are scheduled for baseline = fde7b937c84eaad842ab0457d2490c6c8c244697 and contender = b1bcd6f3f17ceee958fae6905185a99e1307e6a7. b1bcd6f3f17ceee958fae6905185a99e1307e6a7 is

[GitHub] [arrow-rs] liukun4515 commented on issue #3223: precision is not considered when cast value to decimal

2022-11-29 Thread GitBox
liukun4515 commented on issue #3223: URL: https://github.com/apache/arrow-rs/issues/3223#issuecomment-1331566959 @viirya @tustvold thanks for your advice. In practice, some users want to get an error when the data overflows the precision, and some cases don't want to get the
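The two behaviours described in this comment (fail on overflow, or tolerate it) can be sketched as follows. This is only an illustration under stated assumptions: values are unscaled `i128` integers, the maximum precision is 38 digits, and every function name here is hypothetical rather than part of the arrow-rs API.

```rust
/// Maximum number of decimal digits assumed for a 128-bit decimal.
const MAX_PRECISION: u32 = 38;

/// Returns true if the unscaled `value` has at most `precision` digits.
/// Hypothetical helper, not an arrow-rs function.
fn fits_precision(value: i128, precision: u32) -> bool {
    if precision == 0 || precision > MAX_PRECISION {
        return false;
    }
    // 10^38 still fits in an i128, so this cannot overflow.
    let bound = 10_i128.pow(precision);
    value > -bound && value < bound
}

/// Strict variant: report an error when the value overflows the precision.
fn cast_strict(value: i128, precision: u32) -> Result<i128, String> {
    if fits_precision(value, precision) {
        Ok(value)
    } else {
        Err(format!("{value} does not fit in precision {precision}"))
    }
}

/// Lenient variant: map an overflowing value to `None` (a null) instead.
fn cast_lenient(value: i128, precision: u32) -> Option<i128> {
    cast_strict(value, precision).ok()
}
```

Keeping the range check in one helper means the strict and lenient variants cannot disagree about what counts as an overflow.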

[GitHub] [arrow] wgtmac commented on a diff in pull request #14742: ARROW-18413: [C++][Parquet] Expose page index info from ColumnChunkMetaData

2022-11-29 Thread GitBox
wgtmac commented on code in PR #14742: URL: https://github.com/apache/arrow/pull/14742#discussion_r1035455360 ## cpp/src/parquet/metadata.h: ## @@ -171,6 +171,13 @@ class PARQUET_EXPORT ColumnChunkMetaData { int64_t total_uncompressed_size() const; std::unique_ptr crypto_m

[GitHub] [arrow-datafusion] HaoYang670 commented on pull request #4429: The CLI panics when passing an invalid explain query

2022-11-29 Thread GitBox
HaoYang670 commented on PR #4429: URL: https://github.com/apache/arrow-datafusion/pull/4429#issuecomment-1331528826 Could we add a test for this?

[GitHub] [arrow-datafusion] xudong963 commented on pull request #4395: Add sqllogictests (v0)

2022-11-29 Thread GitBox
xudong963 commented on PR #4395: URL: https://github.com/apache/arrow-datafusion/pull/4395#issuecomment-1331510714 Thanks for reviewing @alamb . I'll review it in the evening. (GMT+8)

[GitHub] [arrow-datafusion-python] Jimexist closed pull request #65: version update of python and maturin

2022-11-29 Thread GitBox
Jimexist closed pull request #65: version update of python and maturin URL: https://github.com/apache/arrow-datafusion-python/pull/65

[GitHub] [arrow-datafusion] mvanschellebeeck commented on a diff in pull request #4395: Add sqllogictests (v0)

2022-11-29 Thread GitBox
mvanschellebeeck commented on code in PR #4395: URL: https://github.com/apache/arrow-datafusion/pull/4395#discussion_r1035414419 ## Cargo.toml: ## @@ -31,6 +31,7 @@ members = [ "test-utils", "parquet-test-utils", "benchmarks", +"tests/sqllogictests", Review C

[GitHub] [arrow-datafusion] mvanschellebeeck commented on a diff in pull request #4395: Add sqllogictests (v0)

2022-11-29 Thread GitBox
mvanschellebeeck commented on code in PR #4395: URL: https://github.com/apache/arrow-datafusion/pull/4395#discussion_r1035414041 ## tests/sqllogictests/README.md: ## @@ -0,0 +1,45 @@ + Overview + +This is the Datafusion implementation of [sqllogictest](https://www.sqlite.or

[GitHub] [arrow] ursabot commented on pull request #14744: GH-14745: [R] {rlang} dependency must be at least version 1.0.0 because of check_dots_empty

2022-11-29 Thread GitBox
ursabot commented on PR #14744: URL: https://github.com/apache/arrow/pull/14744#issuecomment-1331492200 Benchmark runs are scheduled for baseline = a594e38fad126a63c952e0fd84e773f80fc3b3f0 and contender = fde7b937c84eaad842ab0457d2490c6c8c244697. fde7b937c84eaad842ab0457d2490c6c8c244697 is

[GitHub] [arrow] vibhatha commented on a diff in pull request #14646: ARROW-18269: [C++] Handle slash character in Hive-style partition values

2022-11-29 Thread GitBox
vibhatha commented on code in PR #14646: URL: https://github.com/apache/arrow/pull/14646#discussion_r1035405839 ## cpp/src/arrow/dataset/partition_test.cc: ## @@ -1048,5 +1051,60 @@ TEST(TestStripPrefixAndFilename, Basic) { "year=2019/m

[GitHub] [arrow] wjones127 commented on a diff in pull request #14679: ARROW-15470: [R] Set null value in CSV writer

2022-11-29 Thread GitBox
wjones127 commented on code in PR #14679: URL: https://github.com/apache/arrow/pull/14679#discussion_r1035397111 ## r/R/csv.R: ## @@ -722,9 +731,10 @@ write_csv_arrow <- function(x, if (is.null(write_options)) { write_options <- readr_to_csv_write_options( - inclu

[GitHub] [arrow-datafusion] mvanschellebeeck commented on a diff in pull request #4395: Add sqllogictests (v0)

2022-11-29 Thread GitBox
mvanschellebeeck commented on code in PR #4395: URL: https://github.com/apache/arrow-datafusion/pull/4395#discussion_r1035395990 ## tests/sqllogictests/src/main.rs: ## @@ -0,0 +1,121 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

[GitHub] [arrow-datafusion] comphead commented on pull request #4429: The CLI panics when passing an invalid explain query

2022-11-29 Thread GitBox
comphead commented on PR #4429: URL: https://github.com/apache/arrow-datafusion/pull/4429#issuecomment-1331454937 Hi @HaoYang670 please check the PR But tbh the optimizer doesn't respect errors now so the error message looks like ``` DataFusion CLI v14.0.0 ❯ explain explain sel

[GitHub] [arrow-datafusion] comphead opened a new pull request, #4429: The CLI panics when passing an invalid explain query

2022-11-29 Thread GitBox
comphead opened a new pull request, #4429: URL: https://github.com/apache/arrow-datafusion/pull/4429 # Which issue does this PR close? Closes #4378 . # Rationale for this change # What changes are included in this PR? Replace panics in favor of Error

[GitHub] [arrow] fatemehp commented on a diff in pull request #14603: PARQUET-2210: [C++][Parquet] Skip pages based on header metadata using a callback

2022-11-29 Thread GitBox
fatemehp commented on code in PR #14603: URL: https://github.com/apache/arrow/pull/14603#discussion_r1035378763 ## cpp/src/parquet/column_reader.cc: ## @@ -263,6 +269,11 @@ class SerializedPageReader : public PageReader { int compres

[GitHub] [arrow] fatemehp commented on a diff in pull request #14603: PARQUET-2210: [C++][Parquet] Skip pages based on header metadata using a callback

2022-11-29 Thread GitBox
fatemehp commented on code in PR #14603: URL: https://github.com/apache/arrow/pull/14603#discussion_r1035378900 ## cpp/src/parquet/metadata.h: ## @@ -182,6 +184,28 @@ class PARQUET_EXPORT ColumnChunkMetaData { std::unique_ptr impl_; }; +// \brief DataPageStats stores stati

[GitHub] [arrow] fatemehp commented on a diff in pull request #14603: PARQUET-2210: [C++][Parquet] Skip pages based on header metadata using a callback

2022-11-29 Thread GitBox
fatemehp commented on code in PR #14603: URL: https://github.com/apache/arrow/pull/14603#discussion_r1035363944 ## cpp/src/parquet/column_reader.cc: ## @@ -337,6 +348,50 @@ void SerializedPageReader::UpdateDecryption(const std::shared_ptr& de } } +bool SerializedPageReade

[GitHub] [arrow-datafusion] dmitrijoseph opened a new issue, #4428: with_column_renamed is always a NO OP when table name is ?table?

2022-11-29 Thread GitBox
dmitrijoseph opened a new issue, #4428: URL: https://github.com/apache/arrow-datafusion/issues/4428 ``` let test_df = ctx.read_csv("test.csv", CsvReadOptions::new()).await?; let test_df = test_df.with_column_renamed("id", "renamedID")?; println!("{:#?}", test_df.explain(true, true)?

[GitHub] [arrow] github-actions[bot] commented on pull request #14777: ARROW-18112: [Go] Remaining Scalar Arithmetic

2022-11-29 Thread GitBox
github-actions[bot] commented on PR #14777: URL: https://github.com/apache/arrow/pull/14777#issuecomment-1331406762 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

[GitHub] [arrow] github-actions[bot] commented on pull request #14777: ARROW-18112: [Go] Remaining Scalar Arithmetic

2022-11-29 Thread GitBox
github-actions[bot] commented on PR #14777: URL: https://github.com/apache/arrow/pull/14777#issuecomment-1331406735 https://issues.apache.org/jira/browse/ARROW-18112

[GitHub] [arrow-datafusion] alamb commented on issue #4349: DataFusion Configuration Consolidation

2022-11-29 Thread GitBox
alamb commented on issue #4349: URL: https://github.com/apache/arrow-datafusion/issues/4349#issuecomment-1331402197 Here is my next contribution to clean up configuration: https://github.com/apache/arrow-datafusion/pull/4427 (slowly consolidating the configurations)

[GitHub] [arrow-datafusion] alamb closed pull request #3885: Consolidate remaining parquet config options into ConfigOptions

2022-11-29 Thread GitBox
alamb closed pull request #3885: Consolidate remaining parquet config options into ConfigOptions URL: https://github.com/apache/arrow-datafusion/pull/3885

[GitHub] [arrow-datafusion] alamb commented on pull request #3885: Consolidate remaining parquet config options into ConfigOptions

2022-11-29 Thread GitBox
alamb commented on PR #3885: URL: https://github.com/apache/arrow-datafusion/pull/3885#issuecomment-1331401524 Updated version in https://github.com/apache/arrow-datafusion/pull/4427

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4427: Expose remaining parquet config options into ConfigOptions (try 2)

2022-11-29 Thread GitBox
alamb commented on code in PR #4427: URL: https://github.com/apache/arrow-datafusion/pull/4427#discussion_r1035340983 ## benchmarks/src/bin/tpch.rs: ## @@ -396,7 +396,8 @@ async fn get_table( } "parquet" => { let path = format!("{}/{}",

[GitHub] [arrow-datafusion] alamb opened a new pull request, #4427: Expose remaining parquet config options into ConfigOptions (try 2)

2022-11-29 Thread GitBox
alamb opened a new pull request, #4427: URL: https://github.com/apache/arrow-datafusion/pull/4427 this is a reworked version of https://github.com/apache/arrow-datafusion/pull/3885 # Which issue does this PR close? Closes https://github.com/apache/arrow-datafusion/issues/3821

[GitHub] [arrow] ursabot commented on pull request #14762: GH-14761: [Dev] Update labels on PR labeler to use new Component ones

2022-11-29 Thread GitBox
ursabot commented on PR #14762: URL: https://github.com/apache/arrow/pull/14762#issuecomment-1331370822 ['Python', 'R'] benchmarks have high level of regressions. [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/1f50bc0aff244c6db1a8dd358b21f256...8a1000f38f5849a9aaaf1dcc92024de7/)

[GitHub] [arrow] ursabot commented on pull request #14762: GH-14761: [Dev] Update labels on PR labeler to use new Component ones

2022-11-29 Thread GitBox
ursabot commented on PR #14762: URL: https://github.com/apache/arrow/pull/14762#issuecomment-1331370501 Benchmark runs are scheduled for baseline = d77ced27a008ef0cb32093e62f890ba38a16febd and contender = a594e38fad126a63c952e0fd84e773f80fc3b3f0. a594e38fad126a63c952e0fd84e773f80fc3b3f0 is

[GitHub] [arrow] kou commented on a diff in pull request #14585: ARROW-17884: [C++] Add Intel®-IAA/QPL-based Parquet RLE Decode

2022-11-29 Thread GitBox
kou commented on code in PR #14585: URL: https://github.com/apache/arrow/pull/14585#discussion_r1035301666 ## cpp/cmake_modules/ThirdpartyToolchain.cmake: ## @@ -183,7 +184,9 @@ macro(build_dependency DEPENDENCY_NAME) build_orc() elseif("${DEPENDENCY_NAME}" STREQUAL "Pro

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3885: Consolidate remaining parquet config options into ConfigOptions

2022-11-29 Thread GitBox
alamb commented on code in PR #3885: URL: https://github.com/apache/arrow-datafusion/pull/3885#discussion_r1035276922 ## datafusion/core/src/config.rs: ## @@ -237,6 +247,29 @@ impl BuiltInConfigs { to reduce the number of rows decoded.", false,

[GitHub] [arrow-rs] ursabot commented on pull request #3231: Fix CI build by upgrading tonic-build to 0.8.4

2022-11-29 Thread GitBox
ursabot commented on PR #3231: URL: https://github.com/apache/arrow-rs/pull/3231#issuecomment-1331288540 Benchmark runs are scheduled for baseline = bdfe0fdeb127c99ef918af779a3b8404e91e41b1 and contender = 1a8e6ed957e483ec27b88fce54a48b8176be3179. 1a8e6ed957e483ec27b88fce54a48b8176be3179 i

[GitHub] [arrow-rs] viirya merged pull request #3231: Fix CI build by upgrading tonic-build to 0.8.4

2022-11-29 Thread GitBox
viirya merged PR #3231: URL: https://github.com/apache/arrow-rs/pull/3231

[GitHub] [arrow-nanoarrow] codecov-commenter commented on pull request #78: [C][R] Port release verification script

2022-11-29 Thread GitBox
codecov-commenter commented on PR #78: URL: https://github.com/apache/arrow-nanoarrow/pull/78#issuecomment-1331273561 # [Codecov](https://codecov.io/gh/apache/arrow-nanoarrow/pull/78?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+

[GitHub] [arrow-nanoarrow] paleolimbot opened a new pull request, #78: [C][R] Port release verification script

2022-11-29 Thread GitBox
paleolimbot opened a new pull request, #78: URL: https://github.com/apache/arrow-nanoarrow/pull/78 Work in progress!

[GitHub] [arrow-datafusion] alamb commented on issue #4426: Support prepared statement

2022-11-29 Thread GitBox
alamb commented on issue #4426: URL: https://github.com/apache/arrow-datafusion/issues/4426#issuecomment-1331261721 I think the feature described in this proposal is needed to properly handle prepared statements in FlightSQL For example, in ballista parameter handling appears to stil

[GitHub] [arrow-rs] viirya opened a new pull request, #3231: Fix CI build by upgrading tonic-build to 0.8.4

2022-11-29 Thread GitBox
viirya opened a new pull request, #3231: URL: https://github.com/apache/arrow-rs/pull/3231 # Which issue does this PR close? Closes #. # Rationale for this change CI now failed by ``` error: failed to select a version for the requirement `tonic-

[GitHub] [arrow-datafusion] NGA-TRAN commented on issue #4426: Support prepared statement

2022-11-29 Thread GitBox
NGA-TRAN commented on issue #4426: URL: https://github.com/apache/arrow-datafusion/issues/4426#issuecomment-1331260826 Thanks @alamb. Let me see what the Logical Plan looks like and propose a clearer one

[GitHub] [arrow] lidavidm merged pull request #14573: ARROW-18237: [Java] Extend Table code

2022-11-29 Thread GitBox
lidavidm merged PR #14573: URL: https://github.com/apache/arrow/pull/14573

[GitHub] [arrow-datafusion] alamb commented on issue #4426: Support prepared statement

2022-11-29 Thread GitBox
alamb commented on issue #4426: URL: https://github.com/apache/arrow-datafusion/issues/4426#issuecomment-1331258449 This is great @NGA-TRAN -- thank you for writing it up. My only feedback is that for option 2 it might be easier if the output was a new LogicalPlan that had the parameter v

[GitHub] [arrow-rs] viirya opened a new pull request, #3230: Remove negative scale check

2022-11-29 Thread GitBox
viirya opened a new pull request, #3230: URL: https://github.com/apache/arrow-rs/pull/3230 # Which issue does this PR close? Closes #. # Rationale for this change I re-checked how Spark handles negative scale. Negative scale is not limited to the max sca

[GitHub] [arrow-datafusion] NGA-TRAN commented on issue #4426: Support prepared statement

2022-11-29 Thread GitBox
NGA-TRAN commented on issue #4426: URL: https://github.com/apache/arrow-datafusion/issues/4426#issuecomment-1331255758 @alamb What do you think?

[GitHub] [arrow-datafusion] NGA-TRAN opened a new issue, #4426: Support prepared statement

2022-11-29 Thread GitBox
NGA-TRAN opened a new issue, #4426: URL: https://github.com/apache/arrow-datafusion/issues/4426 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** In order to support [Prepare statement](https://en.wikipedia.org/wiki/Prepared_stateme

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3222: Move some tests from `arrow-cast` to `arrow`

2022-11-29 Thread GitBox
viirya commented on code in PR #3222: URL: https://github.com/apache/arrow-rs/pull/3222#discussion_r1035233528 ## arrow-cast/src/cast.rs: ## @@ -3614,7 +3616,6 @@ mod tests { } #[test] -#[cfg(not(feature = "force_validate"))] Review Comment: For the tests whi

[GitHub] [arrow-datafusion] ursabot commented on pull request #4406: Add integration test for erroring when memory limits are hit

2022-11-29 Thread GitBox
ursabot commented on PR #4406: URL: https://github.com/apache/arrow-datafusion/pull/4406#issuecomment-1331242517 Benchmark runs are scheduled for baseline = 66c95e70ae2ff9f3f89b91898ede875d316e731f and contender = 49166ea55f317722ab7a37fbfc253bcd497c1672. 49166ea55f317722ab7a37fbfc253bcd4
