[GitHub] [arrow-datafusion] ygf11 commented on pull request #4711: Move the extract_join_keys to optimizer

2022-12-23 Thread GitBox
ygf11 commented on PR #4711: URL: https://github.com/apache/arrow-datafusion/pull/4711#issuecomment-1364480944 @jackwener Thanks for the review. I think it is ok. > But now this rule is previous. Can this rule ignore B.key = C.key? or we need reorder rule? No, this rule will ignor

[GitHub] [arrow-datafusion] jackwener commented on pull request #4711: Move the extract_join_keys to optimizer

2022-12-23 Thread GitBox
jackwener commented on PR #4711: URL: https://github.com/apache/arrow-datafusion/pull/4711#issuecomment-1364480554 > I'm not sure whether some special cases will cause problems, such as: > > `A join (B join C on B.id = C.id) on A.id = B.id and B.key = C.key`. > > `B.key = C.key` will be p

[GitHub] [arrow-datafusion] jackwener commented on pull request #4711: Move the extract_join_keys to optimizer

2022-12-23 Thread GitBox
jackwener commented on PR #4711: URL: https://github.com/apache/arrow-datafusion/pull/4711#issuecomment-1364476553 But I'm not sure whether some special cases will cause problems, such as: `A join (B join C on B.id = C.id) on A.id = B.id and B.key = C.key`. `B.key = C.key` will be pushdo

[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #4711: Move the extract_join_keys to optimizer

2022-12-23 Thread GitBox
jackwener commented on code in PR #4711: URL: https://github.com/apache/arrow-datafusion/pull/4711#discussion_r1056753570 ## datafusion/sql/src/planner.rs: ## @@ -820,45 +818,20 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { // normalize all columns in expr

[GitHub] [arrow-datafusion] ygf11 commented on a diff in pull request #4711: Move the extract_join_keys to optimizer

2022-12-23 Thread GitBox
ygf11 commented on code in PR #4711: URL: https://github.com/apache/arrow-datafusion/pull/4711#discussion_r1056751676 ## datafusion/optimizer/src/extract_equijoin_predicate.rs: ## @@ -0,0 +1,438 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

[GitHub] [arrow-datafusion] ygf11 commented on a diff in pull request #4711: Move the extract_join_keys to optimizer

2022-12-23 Thread GitBox
ygf11 commented on code in PR #4711: URL: https://github.com/apache/arrow-datafusion/pull/4711#discussion_r1056751207 ## datafusion/optimizer/src/extract_equijoin_predicate.rs: ## @@ -0,0 +1,438 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

[GitHub] [arrow-datafusion] ygf11 commented on a diff in pull request #4711: Move the extract_join_keys to optimizer

2022-12-23 Thread GitBox
ygf11 commented on code in PR #4711: URL: https://github.com/apache/arrow-datafusion/pull/4711#discussion_r105674 ## datafusion/sql/src/planner.rs: ## @@ -806,7 +805,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { ) -> Result { match constraint {

[GitHub] [arrow] ursabot commented on pull request #15049: GH-14968: [Python] Fix segfault for dataset ORC write

2022-12-23 Thread GitBox
ursabot commented on PR #15049: URL: https://github.com/apache/arrow/pull/15049#issuecomment-1364466853 ['Python', 'R'] benchmarks have high level of regressions. [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/a4e46214056848d09b08182a8bef21d8...24e29913218842b78d6f594c6efbe545/)

[GitHub] [arrow] ursabot commented on pull request #15049: GH-14968: [Python] Fix segfault for dataset ORC write

2022-12-23 Thread GitBox
ursabot commented on PR #15049: URL: https://github.com/apache/arrow/pull/15049#issuecomment-1364466834 Benchmark runs are scheduled for baseline = c88fe74d04bad57e03fc791b1f03ee8c8f039fd3 and contender = 5feabc57bd75a54b5fe003988a14394aa621df05. 5feabc57bd75a54b5fe003988a14394aa621df05 is

[GitHub] [arrow] westonpace commented on pull request #14867: GH-14866: [C++] Remove internal GroupBy implementation

2022-12-23 Thread GitBox
westonpace commented on PR #14867: URL: https://github.com/apache/arrow/pull/14867#issuecomment-1364454353 @jorisvandenbossche Thanks for pointing out that problem. I think I've addressed your concerns (and I've added your example as a test case). -- This is an automated message from the

[GitHub] [arrow-adbc] lidavidm merged pull request #264: chore(dev/release): fix unbound variable in 02-source.sh

2022-12-23 Thread GitBox
lidavidm merged PR #264: URL: https://github.com/apache/arrow-adbc/pull/264

[GitHub] [arrow-adbc] lidavidm opened a new pull request, #264: chore(dev/release): fix unbound variable in 02-source.sh

2022-12-23 Thread GitBox
lidavidm opened a new pull request, #264: URL: https://github.com/apache/arrow-adbc/pull/264 Fixes #263.

[GitHub] [arrow-adbc] lidavidm merged pull request #261: chore: add support for releasing Linux packages

2022-12-23 Thread GitBox
lidavidm merged PR #261: URL: https://github.com/apache/arrow-adbc/pull/261

[GitHub] [arrow-adbc] lidavidm merged pull request #262: chore(dev/release): make sure Maven updates all versions

2022-12-23 Thread GitBox
lidavidm merged PR #262: URL: https://github.com/apache/arrow-adbc/pull/262

[GitHub] [arrow-adbc] lidavidm opened a new pull request, #262: chore(dev/release): make sure Maven updates all versions

2022-12-23 Thread GitBox
lidavidm opened a new pull request, #262: URL: https://github.com/apache/arrow-adbc/pull/262 For some reason Maven didn't update all of the package versions just now.

[GitHub] [arrow] ursabot commented on pull request #15082: GH-15081: [Release] Add support for using custom artifacts directory in dev/release/05-binary-upload.sh

2022-12-23 Thread GitBox
ursabot commented on PR #15082: URL: https://github.com/apache/arrow/pull/15082#issuecomment-1364448175 ['Python', 'R'] benchmarks have high level of regressions. [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/f365934bedb040c9b598437fd3f6099e...a4e46214056848d09b08182a8bef21d8/)

[GitHub] [arrow] ursabot commented on pull request #15082: GH-15081: [Release] Add support for using custom artifacts directory in dev/release/05-binary-upload.sh

2022-12-23 Thread GitBox
ursabot commented on PR #15082: URL: https://github.com/apache/arrow/pull/15082#issuecomment-1364448117 Benchmark runs are scheduled for baseline = 5a9805807456fa1b50671afded557044ab6cc8e6 and contender = c88fe74d04bad57e03fc791b1f03ee8c8f039fd3. c88fe74d04bad57e03fc791b1f03ee8c8f039fd3 is

[GitHub] [arrow-adbc] lidavidm merged pull request #260: ci: disable Linux packaging builds for RCs for now

2022-12-23 Thread GitBox
lidavidm merged PR #260: URL: https://github.com/apache/arrow-adbc/pull/260

[GitHub] [arrow-ballista] mpurins-coralogix opened a new issue, #578: starts_with function is serialised as UDF

2022-12-23 Thread GitBox
mpurins-coralogix opened a new issue, #578: URL: https://github.com/apache/arrow-ballista/issues/578 **Describe the bug** When a physical plan includes the DataFusion built-in function `starts_with`, it is serialized as a UDF. **To Reproduce** The following test (can be added in some

[GitHub] [arrow] vibhatha commented on pull request #15083: ARROW-18403: [C++] Add support for nullary and n-ary aggregate functions

2022-12-23 Thread GitBox
vibhatha commented on PR #15083: URL: https://github.com/apache/arrow/pull/15083#issuecomment-1364438934 @westonpace @wjones127 could you please help with running the CIs and reviews?

[GitHub] [arrow] lpiep commented on issue #15055: Aggregate Functions in R API

2022-12-23 Thread GitBox
lpiep commented on issue #15055: URL: https://github.com/apache/arrow/issues/15055#issuecomment-1364434006 Great! I'll mark this closed then. Thanks for everyone's help.

[GitHub] [arrow] djnavarro commented on issue #15055: Aggregate Functions in R API

2022-12-23 Thread GitBox
djnavarro commented on issue #15055: URL: https://github.com/apache/arrow/issues/15055#issuecomment-1364433281 Hi @lpiep I'm out of the loop nowadays but it's my understanding that those changes will migrate to the front page with the 11.0.0 release. Until you can find them in the dev docs

[GitHub] [arrow] lpiep commented on issue #15055: Aggregate Functions in R API

2022-12-23 Thread GitBox
lpiep commented on issue #15055: URL: https://github.com/apache/arrow/issues/15055#issuecomment-1364428506 Actually it appears @djnavarro has made some changes recently in the package vignette that make this clear. However, their changes aren't reflected in the documentation site (https://

[GitHub] [arrow-adbc] kou commented on pull request #261: chore: add support for releasing Linux packages

2022-12-23 Thread GitBox
kou commented on PR #261: URL: https://github.com/apache/arrow-adbc/pull/261#issuecomment-1364422145 I confirmed that Artifactory upload works for AlmaLinux and Debian packages. (I didn't confirm Ubuntu packages because they use the same logic as Debian packages.) I used `dev/release/ver

[GitHub] [arrow] github-actions[bot] commented on pull request #15083: ARROW-18403: [C++] Add support for nullary and n-ary aggregate functions

2022-12-23 Thread GitBox
github-actions[bot] commented on PR #15083: URL: https://github.com/apache/arrow/pull/15083#issuecomment-1364421211 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

[GitHub] [arrow] github-actions[bot] commented on pull request #15083: ARROW-18403: [C++] Add support for nullary and n-ary aggregate functions

2022-12-23 Thread GitBox
github-actions[bot] commented on PR #15083: URL: https://github.com/apache/arrow/pull/15083#issuecomment-1364421210 https://issues.apache.org/jira/browse/ARROW-18403

[GitHub] [arrow] felipecrv opened a new pull request, #15083: ARROW-18403: [C++] Add support for nullary and n-ary aggregate functions

2022-12-23 Thread GitBox
felipecrv opened a new pull request, #15083: URL: https://github.com/apache/arrow/pull/15083 - [x] Add ability to pass 0 or more than 1 target fields via the Aggregate API - [ ] Add support for nullary `count` -- `count(*)`

[GitHub] [arrow-adbc] kou opened a new pull request, #261: chore: add support for releasing Linux packages

2022-12-23 Thread GitBox
kou opened a new pull request, #261: URL: https://github.com/apache/arrow-adbc/pull/261 Fixes #259. With this change, we don't use the source archive to build Linux packages. We always use the current source because our release workflow creates the source archive and binary artifacts in parall

[GitHub] [arrow] kou merged pull request #15049: GH-14968: [Python] Fix segfault for dataset ORC write

2022-12-23 Thread GitBox
kou merged PR #15049: URL: https://github.com/apache/arrow/pull/15049

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4721: Non-deprecated support for planning SQL without DDL, deprecate some more SessionContext methods

2022-12-23 Thread GitBox
tustvold commented on code in PR #4721: URL: https://github.com/apache/arrow-datafusion/pull/4721#discussion_r1056695809 ## datafusion/core/src/execution/context.rs: ## @@ -559,11 +573,9 @@ impl SessionContext { } Ok(false) } -/// Creates a logical pla

[GitHub] [arrow] ursabot commented on pull request #14052: ARROW-16728: [Python] Switch default and deprecate use_legacy_dataset=True in ParquetDataset

2022-12-23 Thread GitBox
ursabot commented on PR #14052: URL: https://github.com/apache/arrow/pull/14052#issuecomment-1364417929 Benchmark runs are scheduled for baseline = 305026f62c172acfe0ed6549288e209358247dda and contender = 5a9805807456fa1b50671afded557044ab6cc8e6. 5a9805807456fa1b50671afded557044ab6cc8e6 is

[GitHub] [arrow] kou merged pull request #15082: GH-15081: [Release] Add support for using custom artifacts directory in dev/release/05-binary-upload.sh

2022-12-23 Thread GitBox
kou merged PR #15082: URL: https://github.com/apache/arrow/pull/15082

[GitHub] [arrow] kou commented on pull request #15082: GH-15081: [Release] Add support for using custom artifacts directory in dev/release/05-binary-upload.sh

2022-12-23 Thread GitBox
kou commented on PR #15082: URL: https://github.com/apache/arrow/pull/15082#issuecomment-1364417820 +1

[GitHub] [arrow] kou commented on pull request #15075: GH-15040: [C++] Improve pkg-config support for ARROW_BUILD_SHARED=OFF

2022-12-23 Thread GitBox
kou commented on PR #15075: URL: https://github.com/apache/arrow/pull/15075#issuecomment-1364417755 This is ready to review/merge.

[GitHub] [arrow-datafusion] comphead commented on issue #4723: Default generated column name is confusing on casts

2022-12-23 Thread GitBox
comphead commented on issue #4723: URL: https://github.com/apache/arrow-datafusion/issues/4723#issuecomment-1364414066 @alamb I already worked on this https://github.com/apache/arrow-datafusion/issues/3722 and would like to continue if we can decide the column name convention

[GitHub] [arrow-datafusion] comphead opened a new issue, #4723: Default generated column name is confusing on casts

2022-12-23 Thread GitBox
comphead opened a new issue, #4723: URL: https://github.com/apache/arrow-datafusion/issues/4723 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** There were a bunch of discussions noting that generated column names are confusing, hard to read an

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4721: Non-deprecated support for planning SQL without DDL, deprecate some more SessionContext methods

2022-12-23 Thread GitBox
tustvold commented on code in PR #4721: URL: https://github.com/apache/arrow-datafusion/pull/4721#discussion_r1056685904 ## datafusion/core/src/physical_plan/planner.rs: ## @@ -1097,15 +1097,15 @@ impl DefaultPhysicalPlanner { // TABLE" -- it must be handled

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4721: Non-deprecated support for planning SQL without DDL, deprecate some more SessionContext methods

2022-12-23 Thread GitBox
tustvold commented on code in PR #4721: URL: https://github.com/apache/arrow-datafusion/pull/4721#discussion_r1056684089 ## datafusion/core/src/execution/context.rs: ## @@ -559,11 +573,9 @@ impl SessionContext { } Ok(false) } -/// Creates a logical pla

[GitHub] [arrow] github-actions[bot] commented on pull request #15075: GH-15040: [C++] Improve pkg-config support for ARROW_BUILD_SHARED=OFF

2022-12-23 Thread GitBox
github-actions[bot] commented on PR #15075: URL: https://github.com/apache/arrow/pull/15075#issuecomment-1364377916 Revision: 62e7a27a08a14e69b120ee3794de872c36debe8e Submitted crossbow builds: [ursacomputing/crossbow @ actions-67a83d](https://github.com/ursacomputing/crossbow/bra

[GitHub] [arrow] kou commented on issue #15080: [CI][R] Binary package job for R 4.1 on Windows is failed with purrr 1.0.0

2022-12-23 Thread GitBox
kou commented on issue #15080: URL: https://github.com/apache/arrow/issues/15080#issuecomment-1364377465 > Purrr already has a patch merged: https://github.com/tidyverse/purrr/pull/1017 Thanks for the information! > We could switch to purrr dev version until it is on CRAN?

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #4721: Non-deprecated support for planning SQL without DDL, deprecate some more SessionContext methods

2022-12-23 Thread GitBox
tustvold commented on code in PR #4721: URL: https://github.com/apache/arrow-datafusion/pull/4721#discussion_r1056683926 ## datafusion/core/src/execution/context.rs: ## @@ -492,6 +497,15 @@ impl SessionContext { } } +/// Creates a [`DataFrame`] that will exec

[GitHub] [arrow] kou commented on pull request #15075: GH-15040: [C++] Improve pkg-config support for ARROW_BUILD_SHARED=OFF

2022-12-23 Thread GitBox
kou commented on PR #15075: URL: https://github.com/apache/arrow/pull/15075#issuecomment-1364377163 @github-actions crossbow submit r-binary-packages

[GitHub] [arrow] github-actions[bot] commented on pull request #15082: GH-15081: [Release] Add support for using custom artifacts directory in dev/release/05-binary-upload.sh

2022-12-23 Thread GitBox
github-actions[bot] commented on PR #15082: URL: https://github.com/apache/arrow/pull/15082#issuecomment-1364375971 :warning: GitHub issue #15081 **has been automatically assigned in GitHub** to PR creator.

[GitHub] [arrow] github-actions[bot] commented on pull request #15082: GH-15081: [Release] Add support for using custom artifacts directory in dev/release/05-binary-upload.sh

2022-12-23 Thread GitBox
github-actions[bot] commented on PR #15082: URL: https://github.com/apache/arrow/pull/15082#issuecomment-1364375951 * Closes: #15081

[GitHub] [arrow] kou opened a new pull request, #15082: GH-15081: [Release] Add support for using custom artifacts directory in dev/release/05-binary-upload.sh

2022-12-23 Thread GitBox
kou opened a new pull request, #15082: URL: https://github.com/apache/arrow/pull/15082 It's for reusing the script from apache/arrow-adbc.

[GitHub] [arrow-datafusion] alamb commented on pull request #4722: Dynamic information_schema configuration and port more tests

2022-12-23 Thread GitBox
alamb commented on PR #4722: URL: https://github.com/apache/arrow-datafusion/pull/4722#issuecomment-1364374144 FYI @tustvold

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4722: Dynamic information_schema configuration and port more tests

2022-12-23 Thread GitBox
alamb commented on code in PR #4722: URL: https://github.com/apache/arrow-datafusion/pull/4722#discussion_r1056681646 ## datafusion/core/tests/sql/information_schema.rs: ## @@ -30,91 +30,6 @@ use rstest::rstest; use super::*; -#[tokio::test] Review Comment: I ported a f

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4722: Dynamic information_schema configuration and port more tests

2022-12-23 Thread GitBox
alamb commented on code in PR #4722: URL: https://github.com/apache/arrow-datafusion/pull/4722#discussion_r105668 ## datafusion/core/src/execution/context.rs: ## @@ -1546,17 +1550,10 @@ impl SessionState { Self::register_default_schema(&config, &runtime, &defa

[GitHub] [arrow-datafusion] alamb opened a new pull request, #4722: Dynamic information_schema configuration and port more tests

2022-12-23 Thread GitBox
alamb opened a new pull request, #4722: URL: https://github.com/apache/arrow-datafusion/pull/4722 # Which issue does this PR close? Part of https://github.com/apache/arrow-datafusion/issues/4495 # Rationale for this change Currently the information schema can not be enabl

[GitHub] [arrow] assignUser commented on issue #15080: [CI][R] Binary package job for R 4.1 on Windows is failed with purrr 1.0.0

2022-12-23 Thread GitBox
assignUser commented on issue #15080: URL: https://github.com/apache/arrow/issues/15080#issuecomment-1364365941 Purrr already has a patch merged: https://github.com/tidyverse/purrr/pull/1017 We could switch to the purrr dev version until it is merged?

[GitHub] [arrow] assignUser commented on issue #15080: [CI][R] Binary package job for R 4.1 on Windows is failed with purrr 1.0.0

2022-12-23 Thread GitBox
assignUser commented on issue #15080: URL: https://github.com/apache/arrow/issues/15080#issuecomment-1364365658 Purrr already has a patch merged: https://github.com/tidyverse/purrr/pull/1017

[GitHub] [arrow] rtpsw commented on pull request #14352: ARROW-17642: [C++] Add ordered aggregation

2022-12-23 Thread GitBox
rtpsw commented on PR #14352: URL: https://github.com/apache/arrow/pull/14352#issuecomment-1364360685 > I don't think you're testing a group by with the exec plan using segment keys but I might be missing something. So I don't think the changes in aggregate_node.cc are tested. This l

[GitHub] [arrow] github-actions[bot] commented on pull request #15075: GH-15040: [C++] Improve pkg-config support for ARROW_BUILD_SHARED=OFF

2022-12-23 Thread GitBox
github-actions[bot] commented on PR #15075: URL: https://github.com/apache/arrow/pull/15075#issuecomment-1364359391 Revision: a9f43475ba0f909b01a6c03e529bfa655e93e609 Submitted crossbow builds: [ursacomputing/crossbow @ actions-f0dd31f2ff](https://github.com/ursacomputing/crossbow/bra

[GitHub] [arrow] kou commented on pull request #15075: GH-15040: [C++] Improve pkg-config support for ARROW_BUILD_SHARED=OFF

2022-12-23 Thread GitBox
kou commented on PR #15075: URL: https://github.com/apache/arrow/pull/15075#issuecomment-1364358319 @github-actions crossbow submit r-binary-packages

[GitHub] [arrow] kou commented on a diff in pull request #15049: GH-14968: [Python] Fix segfault for dataset ORC write

2022-12-23 Thread GitBox
kou commented on code in PR #15049: URL: https://github.com/apache/arrow/pull/15049#discussion_r1056668069 ## cpp/src/arrow/dataset/file_base.h: ## @@ -200,6 +200,9 @@ class ARROW_DS_EXPORT FileFormat : public std::enable_shared_from_this

[GitHub] [arrow] jorisvandenbossche merged pull request #14052: ARROW-16728: [Python] Switch default and deprecate use_legacy_dataset=True in ParquetDataset

2022-12-23 Thread GitBox
jorisvandenbossche merged PR #14052: URL: https://github.com/apache/arrow/pull/14052

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4711: Move the extract_join_keys to optimizer

2022-12-23 Thread GitBox
alamb commented on code in PR #4711: URL: https://github.com/apache/arrow-datafusion/pull/4711#discussion_r1056651204 ## datafusion/sql/src/planner.rs: ## @@ -806,7 +805,7 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { ) -> Result { match constraint {

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4691: Unnecessary SortExec removal rule from Physical Plan

2022-12-23 Thread GitBox
alamb commented on code in PR #4691: URL: https://github.com/apache/arrow-datafusion/pull/4691#discussion_r1056647558 ## datafusion/core/tests/sql/window.rs: ## @@ -1748,17 +1748,20 @@ async fn test_window_partition_by_order_by() -> Result<()> { let msg = format!("Creati

[GitHub] [arrow-datafusion] ursabot commented on pull request #4717: Minor: refactor streaming CSV inference code

2022-12-23 Thread GitBox
ursabot commented on PR #4717: URL: https://github.com/apache/arrow-datafusion/pull/4717#issuecomment-1364332276 Benchmark runs are scheduled for baseline = af9cd58751288f495b47f0585c6fa182d270e11d and contender = 2f5b25d8aa8089232f395aa9a7ac15c715eaaa83. 2f5b25d8aa8089232f395aa9a7ac15c71

[GitHub] [arrow-datafusion] alamb commented on pull request #4691: Unnecessary SortExec removal rule from Physical Plan

2022-12-23 Thread GitBox
alamb commented on PR #4691: URL: https://github.com/apache/arrow-datafusion/pull/4691#issuecomment-1364331895 > In the physical plan level, I think there is no way to be sure whether SortExec below ProjectionExec is enforced or not. I agree -- we would have to add some way to distin

[GitHub] [arrow-datafusion] alamb merged pull request #4717: Minor: refactor streaming CSV inference code

2022-12-23 Thread GitBox
alamb merged PR #4717: URL: https://github.com/apache/arrow-datafusion/pull/4717

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #4721: Non-deprecated support for planning SQL without DDL, deprecate some more SessionContext methods

2022-12-23 Thread GitBox
alamb commented on code in PR #4721: URL: https://github.com/apache/arrow-datafusion/pull/4721#discussion_r1056643369 ## datafusion/core/src/execution/context.rs: ## @@ -492,6 +497,15 @@ impl SessionContext { } } +/// Creates a [`DataFrame`] that will execute

[GitHub] [arrow] github-actions[bot] commented on pull request #15075: GH-15040: [C++] Improve pkg-config support for ARROW_BUILD_SHARED=OFF

2022-12-23 Thread GitBox
github-actions[bot] commented on PR #15075: URL: https://github.com/apache/arrow/pull/15075#issuecomment-1364324222 Revision: fd60bc7b5689ffa1df5af26bafc442662f67d4bf Submitted crossbow builds: [ursacomputing/crossbow @ actions-46b65e2722](https://github.com/ursacomputing/crossbow/bra

[GitHub] [arrow] kou commented on pull request #15075: GH-15040: [C++] Improve pkg-config support for ARROW_BUILD_SHARED=OFF

2022-12-23 Thread GitBox
kou commented on PR #15075: URL: https://github.com/apache/arrow/pull/15075#issuecomment-1364323436 @github-actions crossbow submit r-binary-packages

[GitHub] [arrow] rtpsw commented on a diff in pull request #14352: ARROW-17642: [C++] Add ordered aggregation

2022-12-23 Thread GitBox
rtpsw commented on code in PR #14352: URL: https://github.com/apache/arrow/pull/14352#discussion_r1056638346 ## cpp/src/arrow/compute/row/grouper.h: ## @@ -39,10 +77,19 @@ class ARROW_EXPORT Grouper { static Result> Make(const std::vector& key_types,

[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #4691: Unnecessary SortExec removal rule from Physical Plan

2022-12-23 Thread GitBox
ozankabak commented on code in PR #4691: URL: https://github.com/apache/arrow-datafusion/pull/4691#discussion_r1056439006 ## datafusion/core/src/physical_optimizer/remove_unnecessary_sorts.rs: ## @@ -0,0 +1,887 @@ +// Licensed to the Apache Software Foundation (ASF) under one +/

[GitHub] [arrow-datafusion] ursabot commented on pull request #4719: Add TPC-DS query planning regression tests

2022-12-23 Thread GitBox
ursabot commented on PR #4719: URL: https://github.com/apache/arrow-datafusion/pull/4719#issuecomment-1364287398 Benchmark runs are scheduled for baseline = 720bdb042af0bba4a0da2dc9e44551f934853011 and contender = af9cd58751288f495b47f0585c6fa182d270e11d. af9cd58751288f495b47f0585c6fa182d

[GitHub] [arrow-datafusion] Dandandan closed issue #4718: Add regression tests for planning TPC-DS queries

2022-12-23 Thread GitBox
Dandandan closed issue #4718: Add regression tests for planning TPC-DS queries URL: https://github.com/apache/arrow-datafusion/issues/4718

[GitHub] [arrow-datafusion] Dandandan merged pull request #4719: Add TPC-DS query planning regression tests

2022-12-23 Thread GitBox
Dandandan merged PR #4719: URL: https://github.com/apache/arrow-datafusion/pull/4719

[GitHub] [arrow-datafusion] Dandandan commented on pull request #4719: Add TPC-DS query planning regression tests

2022-12-23 Thread GitBox
Dandandan commented on PR #4719: URL: https://github.com/apache/arrow-datafusion/pull/4719#issuecomment-1364269053 Nice!

[GitHub] [arrow-datafusion] alamb opened a new pull request, #4721: Non-deprecated support for planning SQL without DDL

2022-12-23 Thread GitBox
alamb opened a new pull request, #4721: URL: https://github.com/apache/arrow-datafusion/pull/4721 # Which issue does this PR close? Closes https://github.com/apache/arrow-datafusion/issues/4720 # Rationale for this change More details on ticket https://github.com/apache/arrow-dat

[GitHub] [arrow-datafusion] alamb opened a new issue, #4720: Non-deprecated support for planning SQL without DDL

2022-12-23 Thread GitBox
alamb opened a new issue, #4720: URL: https://github.com/apache/arrow-datafusion/issues/4720 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** For IOx, it is very important to use DataFusion in read-only mode -- we don't allow users

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3388: Make FlightSQL Support HTTPs

2022-12-23 Thread GitBox
viirya commented on code in PR #3388: URL: https://github.com/apache/arrow-rs/pull/3388#discussion_r1056575478 ## arrow-flight/src/sql/client.rs: ## @@ -71,6 +74,43 @@ impl FlightSqlServiceClient { .http2_keep_alive_interval(Duration::from_secs(300)) .k

[GitHub] [arrow-datafusion] alamb commented on pull request #4694: Support for executing infinite files and boundedness-aware join reordering rule

2022-12-23 Thread GitBox
alamb commented on PR #4694: URL: https://github.com/apache/arrow-datafusion/pull/4694#issuecomment-1364173056 Thank you for this PR -- I hope to find time to review it in the next day or two

[GitHub] [arrow-datafusion] andygrove opened a new pull request, #4719: Add TPC-DS logical query planning regression tests

2022-12-23 Thread GitBox
andygrove opened a new pull request, #4719: URL: https://github.com/apache/arrow-datafusion/pull/4719 # Which issue does this PR close? Closes https://github.com/apache/arrow-datafusion/issues/4718 # Rationale for this change Prevent regressions in supporting

[GitHub] [arrow-datafusion] andygrove opened a new issue, #4718: Add regression tests for planning TPC-DS queries

2022-12-23 Thread GitBox
andygrove opened a new issue, #4718: URL: https://github.com/apache/arrow-datafusion/issues/4718 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** I would like DataFusion to support creating valid logical plans for TPC-DS queries an

[GitHub] [arrow] ursabot commented on pull request #14839: ARROW-18363: [Docs] Include warning when viewing old docs (redirecting to stable/dev docs)

2022-12-23 Thread GitBox
ursabot commented on PR #14839: URL: https://github.com/apache/arrow/pull/14839#issuecomment-1364106123 Benchmark runs are scheduled for baseline = 387e95ad575fd158bb2758e97800716d3976fce2 and contender = 305026f62c172acfe0ed6549288e209358247dda. 305026f62c172acfe0ed6549288e209358247dda is

[GitHub] [arrow-adbc] lidavidm commented on pull request #260: ci: disable Linux packaging builds for RCs for now

2022-12-23 Thread GitBox
lidavidm commented on PR #260: URL: https://github.com/apache/arrow-adbc/pull/260#issuecomment-1364097042 Kou: maybe the 'right' thing will be to set VERSION in packaging.yml if we are building for an RC, and switch up the instructions so that you don't push the tag until after 02-source.sh

[GitHub] [arrow-adbc] lidavidm opened a new pull request, #260: ci: disable Linux packaging builds for RCs for now

2022-12-23 Thread GitBox
lidavidm opened a new pull request, #260: URL: https://github.com/apache/arrow-adbc/pull/260 Workaround for #259.

[GitHub] [arrow-adbc] lidavidm commented on issue #259: [Release] Linux packaging does not work in RC builds

2022-12-23 Thread GitBox
lidavidm commented on issue #259: URL: https://github.com/apache/arrow-adbc/issues/259#issuecomment-1364093715 Possibly we should just set VERSION appropriately in the CI flow to get it to download the right sources, but for now, I'll disable the task when running for release.

[GitHub] [arrow] westonpace commented on pull request #14867: GH-14866: [C++] Remove internal GroupBy implementation

2022-12-23 Thread GitBox
westonpace commented on PR #14867: URL: https://github.com/apache/arrow/pull/14867#issuecomment-1364092779 I'm going to rebase this and address the problem Joris raised.

[GitHub] [arrow] westonpace commented on pull request #14867: GH-14866: [C++] Remove internal GroupBy implementation

2022-12-23 Thread GitBox
westonpace commented on PR #14867: URL: https://github.com/apache/arrow/pull/14867#issuecomment-1364092173 @rtpsw https://github.com/apache/arrow/commit/0f2b458ec45c851707bf8f6f93e0e703af359f26 is an example of layering this PR on top of your ordered groupby changes. I couldn't get the ag

[GitHub] [arrow] westonpace commented on a diff in pull request #14352: ARROW-17642: [C++] Add ordered aggregation

2022-12-23 Thread GitBox
westonpace commented on code in PR #14352: URL: https://github.com/apache/arrow/pull/14352#discussion_r1056079656 ## cpp/src/arrow/compute/exec.h: ## @@ -180,7 +180,7 @@ struct ARROW_EXPORT ExecBatch { explicit ExecBatch(const RecordBatch& batch); - static Result Make(st

[GitHub] [arrow-adbc] lidavidm merged pull request #258: fix(python): make package names consistent

2022-12-23 Thread GitBox
lidavidm merged PR #258: URL: https://github.com/apache/arrow-adbc/pull/258

[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #4691: Unnecessary SortExec removal rule from Physical Plan

2022-12-23 Thread GitBox
ozankabak commented on code in PR #4691: URL: https://github.com/apache/arrow-datafusion/pull/4691#discussion_r1056439006 ## datafusion/core/src/physical_optimizer/remove_unnecessary_sorts.rs: ## @@ -0,0 +1,887 @@ +// Licensed to the Apache Software Foundation (ASF) under one +/

[GitHub] [arrow-datafusion] andygrove commented on pull request #4620: Implement optimizer rule for reordering fact-dimension joins

2022-12-23 Thread GitBox
andygrove commented on PR #4620: URL: https://github.com/apache/arrow-datafusion/pull/4620#issuecomment-1364058815 > I have theories as to why it is not yet solved but they are too lengthy to type up here I would love to hear more about this someday.

[GitHub] [arrow] lidavidm commented on issue #15069: PyArrow Flight DoAction does not return results as available

2022-12-23 Thread GitBox
lidavidm commented on issue #15069: URL: https://github.com/apache/arrow/issues/15069#issuecomment-1364026698 I think it just needs some Cython code.

[GitHub] [arrow-julia] jrevels commented on issue #367: inappropriately applied `Arrow.NullVector` optimization to `Union{ZonedDateTime,Missing}` column

2022-12-23 Thread GitBox
jrevels commented on issue #367: URL: https://github.com/apache/arrow-julia/issues/367#issuecomment-1364004835 Hmm just judging from this line, it looks like timezone is part of the `Timestamp` type: ``` ArrowTypes.toarrow(x::ZonedDateTime) = convert(Timestamp{Meta.TimeUnits.MILLI

[GitHub] [arrow-julia] jrevels commented on issue #367: inappropriately applied `Arrow.NullVector` optimization to `Union{ZonedDateTime,Missing}` column

2022-12-23 Thread GitBox
jrevels commented on issue #367: URL: https://github.com/apache/arrow-julia/issues/367#issuecomment-1363992349 Jumping to https://github.com/apache/arrow-julia/blob/0c4793871d911e185cb6a9603e577a1f52f52a22/src/ArrowTypes/src/ArrowTypes.jl#L333-L349 we can evaluate this in the REPL pre

[GitHub] [arrow] rok commented on issue #14923: [C++][Parquet] DeltaBitPackDecoder expects all miniblock bitwidths to be present for the last block

2022-12-23 Thread GitBox
rok commented on issue #14923: URL: https://github.com/apache/arrow/issues/14923#issuecomment-1363989554 Hey @mapleFU. I'm on Ryzen 7 `Linux desko 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux`. If you just apply the change below and run

[GitHub] [arrow-datafusion] ursabot commented on pull request #4661: Stream CSV file during schema inference

2022-12-23 Thread GitBox
ursabot commented on PR #4661: URL: https://github.com/apache/arrow-datafusion/pull/4661#issuecomment-1363985499 Benchmark runs are scheduled for baseline = 6a4e0df8a616521e486f5374953940f1a7366efe and contender = 720bdb042af0bba4a0da2dc9e44551f934853011. 720bdb042af0bba4a0da2dc9e44551f93

[GitHub] [arrow-datafusion] mustafasrepo commented on a diff in pull request #4691: Unnecessary SortExec removal rule from Physical Plan

2022-12-23 Thread GitBox
mustafasrepo commented on code in PR #4691: URL: https://github.com/apache/arrow-datafusion/pull/4691#discussion_r1056369324 ## datafusion/core/src/physical_optimizer/remove_unnecessary_sorts.rs: ## @@ -0,0 +1,887 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow-datafusion] alamb commented on pull request #4661: Stream CSV file during schema inference

2022-12-23 Thread GitBox
alamb commented on PR #4661: URL: https://github.com/apache/arrow-datafusion/pull/4661#issuecomment-1363983534 Here is a follow on PR to reduce the nesting: https://github.com/apache/arrow-datafusion/pull/4717

[GitHub] [arrow-datafusion] alamb opened a new pull request, #4717: Minor: refactor streaming CSV inference code

2022-12-23 Thread GitBox
alamb opened a new pull request, #4717: URL: https://github.com/apache/arrow-datafusion/pull/4717 # Which issue does this PR close? re https://github.com/apache/arrow-datafusion/issues/3658 # Rationale for this change I spent some time messing with this code as part of re

[GitHub] [arrow-julia] jrevels commented on issue #367: inappropriately applied `Arrow.NullVector` optimization to `Union{ZonedDateTime,Missing}` column

2022-12-23 Thread GitBox
jrevels commented on issue #367: URL: https://github.com/apache/arrow-julia/issues/367#issuecomment-1363982436 narrowing down to the problem point: we can drill down `Arrow.Table(Arrow.tobuffer(t))` to a call to `toarrowvector(t.x)`, which then results in an invocation of `arrowvector

[GitHub] [arrow-datafusion] alamb closed issue #3658: CSV inference reads in the whole file to memory, regardless of row limit

2022-12-23 Thread GitBox
alamb closed issue #3658: CSV inference reads in the whole file to memory, regardless of row limit URL: https://github.com/apache/arrow-datafusion/issues/3658

[GitHub] [arrow-datafusion] alamb merged pull request #4661: Stream CSV file during schema inference

2022-12-23 Thread GitBox
alamb merged PR #4661: URL: https://github.com/apache/arrow-datafusion/pull/4661

[GitHub] [arrow] ursabot commented on pull request #14976: GH-14975: [Python] Dataset.sort_by

2022-12-23 Thread GitBox
ursabot commented on PR #14976: URL: https://github.com/apache/arrow/pull/14976#issuecomment-1363977634 Benchmark runs are scheduled for baseline = 0f5b8dda5fbc3b7e384a549075e402df93eb602e and contender = 387e95ad575fd158bb2758e97800716d3976fce2. 387e95ad575fd158bb2758e97800716d3976fce2 is

[GitHub] [arrow] github-actions[bot] commented on pull request #15077: GH-14907: [R] right_join() function does not produce the expected outcome

2022-12-23 Thread GitBox
github-actions[bot] commented on PR #15077: URL: https://github.com/apache/arrow/pull/15077#issuecomment-1363976730 * Closes: #14907
