[GitHub] [spark] viirya commented on a diff in pull request #37129: [SPARK-39710][SQL] Support push local topK through outer join

2022-07-11 Thread GitBox
viirya commented on code in PR #37129: URL: https://github.com/apache/spark/pull/37129#discussion_r918563482 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushLocalTopKThroughOuterJoin.scala: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] viirya commented on a diff in pull request #37129: [SPARK-39710][SQL] Support push local topK through outer join

2022-07-11 Thread GitBox
viirya commented on code in PR #37129: URL: https://github.com/apache/spark/pull/37129#discussion_r918561846 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushLocalTopKThroughOuterJoin.scala: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] viirya commented on a diff in pull request #37129: [SPARK-39710][SQL] Support push local topK through outer join

2022-07-11 Thread GitBox
viirya commented on code in PR #37129: URL: https://github.com/apache/spark/pull/37129#discussion_r918560968 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushLocalTopKThroughOuterJoin.scala: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] viirya commented on a diff in pull request #37129: [SPARK-39710][SQL] Support push local topK through outer join

2022-07-11 Thread GitBox
viirya commented on code in PR #37129: URL: https://github.com/apache/spark/pull/37129#discussion_r918559156 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -412,6 +412,14 @@ object SQLConf { .longConf

[GitHub] [spark] mridulm commented on pull request #37052: [SPARK-39647][CORE] Register the executor with ESS before registering the BlockManager

2022-07-11 Thread GitBox
mridulm commented on PR #37052: URL: https://github.com/apache/spark/pull/37052#issuecomment-1181327891 Merged to master and branch-3.3 Thanks for fixing this @otterc ! Thanks for reviewing @attilapiros, @zhouyejoe, @Ngone51 and @weixiuli :-) -- This is an automated message from the

[GitHub] [spark] mridulm closed pull request #37052: [SPARK-39647][CORE] Register the executor with ESS before registering the BlockManager

2022-07-11 Thread GitBox
mridulm closed pull request #37052: [SPARK-39647][CORE] Register the executor with ESS before registering the BlockManager URL: https://github.com/apache/spark/pull/37052

[GitHub] [spark] mridulm commented on pull request #37052: [SPARK-39647][CORE] Register the executor with ESS before registering the BlockManager

2022-07-11 Thread GitBox
mridulm commented on PR #37052: URL: https://github.com/apache/spark/pull/37052#issuecomment-1181326481 Merging to master and branch-3.3

[GitHub] [spark] cloud-fan commented on a diff in pull request #37040: [SPARK-39651][SQL] Prune filter condition if compare with rand is deterministic

2022-07-11 Thread GitBox
cloud-fan commented on code in PR #37040: URL: https://github.com/apache/spark/pull/37040#discussion_r918550920 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeRand.scala: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] cloud-fan commented on a diff in pull request #37116: [SPARK-39707][SQL][DOCS] Add SQL reference for aggregate functions

2022-07-11 Thread GitBox
cloud-fan commented on code in PR #37116: URL: https://github.com/apache/spark/pull/37116#discussion_r918550376 ## docs/sql-ref-syntax-qry-select-aggregate.md: ## @@ -0,0 +1,145 @@ +--- +layout: global +title: Aggregate Functions +displayTitle: Aggregate Functions +license: | +

[GitHub] [spark] cloud-fan commented on a diff in pull request #37116: [SPARK-39707][SQL][DOCS] Add SQL reference for aggregate functions

2022-07-11 Thread GitBox
cloud-fan commented on code in PR #37116: URL: https://github.com/apache/spark/pull/37116#discussion_r918549516 ## docs/sql-ref-syntax-qry-select-aggregate.md: ## @@ -0,0 +1,145 @@ +--- +layout: global +title: Aggregate Functions +displayTitle: Aggregate Functions +license: | +

[GitHub] [spark] singhpk234 commented on a diff in pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables

2022-07-11 Thread GitBox
singhpk234 commented on code in PR #37083: URL: https://github.com/apache/spark/pull/37083#discussion_r918534436 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/BasicStatsPlanVisitor.scala: ## @@ -17,16 +17,40 @@ package

[GitHub] [spark] singhpk234 commented on a diff in pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables

2022-07-11 Thread GitBox
singhpk234 commented on code in PR #37083: URL: https://github.com/apache/spark/pull/37083#discussion_r918544779 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/BasicStatsPlanVisitor.scala: ## @@ -17,16 +17,40 @@ package

[GitHub] [spark] AmplabJenkins commented on pull request #37147: [SPARK-39731][SQL] Fix issue in CSV data source when parsing dates in "yyyyMMdd" format with CORRECTED time parser policy

2022-07-11 Thread GitBox
AmplabJenkins commented on PR #37147: URL: https://github.com/apache/spark/pull/37147#issuecomment-1181307870 Can one of the admins verify this patch?

[GitHub] [spark] beliefer commented on a diff in pull request #37116: [SPARK-39707][SQL][DOCS] Add SQL reference for aggregate functions

2022-07-11 Thread GitBox
beliefer commented on code in PR #37116: URL: https://github.com/apache/spark/pull/37116#discussion_r918523599 ## docs/sql-ref-syntax-qry-select-aggregate.md: ## @@ -0,0 +1,125 @@ +--- +layout: global +title: Aggregate Functions +displayTitle: Aggregate Functions +license: | +

[GitHub] [spark] beliefer commented on a diff in pull request #37116: [SPARK-39707][SQL][DOCS] Add SQL reference for aggregate functions

2022-07-11 Thread GitBox
beliefer commented on code in PR #37116: URL: https://github.com/apache/spark/pull/37116#discussion_r918520601 ## docs/sql-ref-syntax-qry-select-aggregate.md: ## @@ -0,0 +1,125 @@ +--- +layout: global +title: Aggregate Functions +displayTitle: Aggregate Functions +license: | +

[GitHub] [spark] Yikun commented on a diff in pull request #37117: [WIP][SPARK-39714][PYTHON] Try to fix the mypy annotation tests

2022-07-11 Thread GitBox
Yikun commented on code in PR #37117: URL: https://github.com/apache/spark/pull/37117#discussion_r918485337 ## python/pyspark/ml/util.py: ## @@ -536,10 +536,8 @@ def __get_class(clazz: str) -> Type[RL]: """ parts = clazz.split(".") module =

[GitHub] [spark] beliefer commented on a diff in pull request #37040: [SPARK-39651][SQL] Prune filter condition if compare with rand is deterministic

2022-07-11 Thread GitBox
beliefer commented on code in PR #37040: URL: https://github.com/apache/spark/pull/37040#discussion_r918512433 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeRand.scala: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [spark] AngersZhuuuu commented on pull request #37162: [SPARK-38910][YARN][FOLLOWUP] Clean spark staging before unregister

2022-07-11 Thread GitBox
AngersZhuuuu commented on PR #37162: URL: https://github.com/apache/spark/pull/37162#issuecomment-1181266134 Waiting for @tgravescs to be back and review this

[GitHub] [spark] AngersZhuuuu opened a new pull request, #37162: [SPARK-38910][YARN][FOLLOWUP] Clean spark staging before unregister

2022-07-11 Thread GitBox
AngersZhuuuu opened a new pull request, #37162: URL: https://github.com/apache/spark/pull/37162 ### What changes were proposed in this pull request? After discussing https://github.com/apache/spark/pull/36207 and re-checking the whole logic, we should revert

[GitHub] [spark] cloud-fan commented on a diff in pull request #37116: [SPARK-39707][SQL][DOCS] Add SQL reference for aggregate functions

2022-07-11 Thread GitBox
cloud-fan commented on code in PR #37116: URL: https://github.com/apache/spark/pull/37116#discussion_r918509777 ## docs/sql-ref-syntax-qry-select-aggregate.md: ## @@ -0,0 +1,125 @@ +--- +layout: global +title: Aggregate Functions +displayTitle: Aggregate Functions +license: | +

[GitHub] [spark] cloud-fan commented on a diff in pull request #37116: [SPARK-39707][SQL][DOCS] Add SQL reference for aggregate functions

2022-07-11 Thread GitBox
cloud-fan commented on code in PR #37116: URL: https://github.com/apache/spark/pull/37116#discussion_r918508889 ## docs/sql-ref-syntax-qry-select-aggregate.md: ## @@ -0,0 +1,125 @@ +--- +layout: global +title: Aggregate Functions +displayTitle: Aggregate Functions +license: | +

[GitHub] [spark] cloud-fan commented on a diff in pull request #37116: [SPARK-39707][SQL][DOCS] Add SQL reference for aggregate functions

2022-07-11 Thread GitBox
cloud-fan commented on code in PR #37116: URL: https://github.com/apache/spark/pull/37116#discussion_r918508166 ## docs/sql-ref-syntax-qry-select-aggregate.md: ## @@ -0,0 +1,125 @@ +--- +layout: global +title: Aggregate Functions +displayTitle: Aggregate Functions +license: | +

[GitHub] [spark] zhengruifeng commented on pull request #37135: [SPARK-39723][R] Implement functionExists/getFunc in SparkR for 3L namespace

2022-07-11 Thread GitBox
zhengruifeng commented on PR #37135: URL: https://github.com/apache/spark/pull/37135#issuecomment-1181262398 Merged to master, thanks @HyukjinKwon

[GitHub] [spark] AmplabJenkins commented on pull request #37153: [SPARK-26052] Add type comments to exposed Prometheus metrics

2022-07-11 Thread GitBox
AmplabJenkins commented on PR #37153: URL: https://github.com/apache/spark/pull/37153#issuecomment-1181262208 Can one of the admins verify this patch?

[GitHub] [spark] zhengruifeng closed pull request #37135: [SPARK-39723][R] Implement functionExists/getFunc in SparkR for 3L namespace

2022-07-11 Thread GitBox
zhengruifeng closed pull request #37135: [SPARK-39723][R] Implement functionExists/getFunc in SparkR for 3L namespace URL: https://github.com/apache/spark/pull/37135

[GitHub] [spark] cloud-fan commented on a diff in pull request #37040: [SPARK-39651][SQL] Prune filter condition if compare with rand is deterministic

2022-07-11 Thread GitBox
cloud-fan commented on code in PR #37040: URL: https://github.com/apache/spark/pull/37040#discussion_r918506608 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PruneFiltersSuite.scala: ## @@ -129,9 +129,11 @@ class PruneFiltersSuite extends PlanTest {

[GitHub] [spark] cloud-fan commented on a diff in pull request #37040: [SPARK-39651][SQL] Prune filter condition if compare with rand is deterministic

2022-07-11 Thread GitBox
cloud-fan commented on code in PR #37040: URL: https://github.com/apache/spark/pull/37040#discussion_r918506272 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeRand.scala: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] cloud-fan commented on a diff in pull request #37040: [SPARK-39651][SQL] Prune filter condition if compare with rand is deterministic

2022-07-11 Thread GitBox
cloud-fan commented on code in PR #37040: URL: https://github.com/apache/spark/pull/37040#discussion_r918506022 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeRand.scala: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] HeartSaVioR commented on pull request #37161: [SPARK-39748][SQL][SS] Include the origin logical plan for LogicalRDD if it comes from DataFrame

2022-07-11 Thread GitBox
HeartSaVioR commented on PR #37161: URL: https://github.com/apache/spark/pull/37161#issuecomment-1181260007 cc. @cloud-fan @viirya Please take a look. Thanks!

[GitHub] [spark] beliefer commented on pull request #37116: [SPARK-39707][SQL][DOCS] Add SQL reference for aggregate functions

2022-07-11 Thread GitBox
beliefer commented on PR #37116: URL: https://github.com/apache/spark/pull/37116#issuecomment-1181259930 ping @cloud-fan https://github.com/apache/spark/pull/37150 merged, please review this PR again.

[GitHub] [spark] HeartSaVioR opened a new pull request, #37161: [SPARK-39748][SQL][SS] Include the origin logical plan for LogicalRDD if it comes from DataFrame

2022-07-11 Thread GitBox
HeartSaVioR opened a new pull request, #37161: URL: https://github.com/apache/spark/pull/37161 ### What changes were proposed in this pull request? This PR proposes to include the origin logical plan for LogicalRDD, if the LogicalRDD is built from DataFrame's RDD. Once the origin

[GitHub] [spark] AngersZhuuuu commented on pull request #36207: [SPARK-38910][YARN] Clean spark staging before `unregister`

2022-07-11 Thread GitBox
AngersZhuuuu commented on PR #36207: URL: https://github.com/apache/spark/pull/36207#issuecomment-1181258817 @tgravescs After re-checking the whole logic, I got your point. Although it is said here that we don't need to rerun, if the reregister failed and it's not the last attempt, YARN will rerun

[GitHub] [spark] beliefer commented on pull request #37150: [SPARK-39737][SQL] `PERCENTILE_CONT` and `PERCENTILE_DISC` should support aggregate filter

2022-07-11 Thread GitBox
beliefer commented on PR #37150: URL: https://github.com/apache/spark/pull/37150#issuecomment-1181258765 @cloud-fan Thank you !

[GitHub] [spark] gengliangwang commented on pull request #37160: [SPARK-39749][SQL] Use plain string representation on casting Decimal to String

2022-07-11 Thread GitBox
gengliangwang commented on PR #37160: URL: https://github.com/apache/spark/pull/37160#issuecomment-1181252406 cc @timarmstrong @entong @cloud-fan @srielau

[GitHub] [spark] gengliangwang opened a new pull request, #37160: [SPARK-39749][SQL] Use plain string representation on casting Decimal to String

2022-07-11 Thread GitBox
gengliangwang opened a new pull request, #37160: URL: https://github.com/apache/spark/pull/37160 ### What changes were proposed in this pull request? Currently, casting decimal as string type will result in Strings with exponential notations if the adjusted exponent is less

[GitHub] [spark] huaxingao commented on pull request #37123: [SPARK-39711][TESTS] Remove redundant trait: BeforeAndAfterAll & BeforeAndAfterEach & Logging

2022-07-11 Thread GitBox
huaxingao commented on PR #37123: URL: https://github.com/apache/spark/pull/37123#issuecomment-1181244052 Merged to master. Thanks @panbingkun

[GitHub] [spark] huaxingao closed pull request #37123: [SPARK-39711][TESTS] Remove redundant trait: BeforeAndAfterAll & BeforeAndAfterEach & Logging

2022-07-11 Thread GitBox
huaxingao closed pull request #37123: [SPARK-39711][TESTS] Remove redundant trait: BeforeAndAfterAll & BeforeAndAfterEach & Logging URL: https://github.com/apache/spark/pull/37123

[GitHub] [spark] cloud-fan closed pull request #37150: [SPARK-39737][SQL] `PERCENTILE_CONT` and `PERCENTILE_DISC` should support aggregate filter

2022-07-11 Thread GitBox
cloud-fan closed pull request #37150: [SPARK-39737][SQL] `PERCENTILE_CONT` and `PERCENTILE_DISC` should support aggregate filter URL: https://github.com/apache/spark/pull/37150

[GitHub] [spark] cloud-fan commented on pull request #37150: [SPARK-39737][SQL] `PERCENTILE_CONT` and `PERCENTILE_DISC` should support aggregate filter

2022-07-11 Thread GitBox
cloud-fan commented on PR #37150: URL: https://github.com/apache/spark/pull/37150#issuecomment-1181237260 thanks, merging to master!

[GitHub] [spark] cloud-fan commented on a diff in pull request #36871: [SPARK-39469][SQL] Infer date type for CSV schema inference

2022-07-11 Thread GitBox
cloud-fan commented on code in PR #36871: URL: https://github.com/apache/spark/pull/36871#discussion_r918484209 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala: ## @@ -148,7 +148,28 @@ class CSVOptions( // A language tag in IETF BCP 47

[GitHub] [spark] cloud-fan commented on pull request #37147: [SPARK-39731][SQL] Fix issue in CSV data source when parsing dates in "yyyyMMdd" format with CORRECTED time parser policy

2022-07-11 Thread GitBox
cloud-fan commented on PR #37147: URL: https://github.com/apache/spark/pull/37147#issuecomment-1181227634 If the legacy behavior is unreasonable, I think we don't have to keep it. If a datetime pattern is specified, we should not fall back to the legacy code path, even if it only supports 4

[GitHub] [spark] beliefer commented on a diff in pull request #37001: [SPARK-39148][SQL] DS V2 aggregate push down can work with OFFSET or LIMIT

2022-07-11 Thread GitBox
beliefer commented on code in PR #37001: URL: https://github.com/apache/spark/pull/37001#discussion_r918480531 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala: ## @@ -165,106 +169,63 @@ object V2ScanRelationPushDown extends

[GitHub] [spark] AmplabJenkins commented on pull request #37156: [SPARK-39742][core]Fix a problem with the result of adjusting resources is not exp…

2022-07-11 Thread GitBox
AmplabJenkins commented on PR #37156: URL: https://github.com/apache/spark/pull/37156#issuecomment-1181212204 Can one of the admins verify this patch?

[GitHub] [spark] beliefer commented on a diff in pull request #37150: [SPARK-39737][SQL] `PERCENTILE_CONT` and `PERCENTILE_DISC` should support aggregate filter

2022-07-11 Thread GitBox
beliefer commented on code in PR #37150: URL: https://github.com/apache/spark/pull/37150#discussion_r918466540 ## sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -849,7 +849,8 @@ primaryExpression | OVERLAY LEFT_PAREN

[GitHub] [spark] sadikovi commented on pull request #37147: [SPARK-39731][SQL] Fix issue in CSV data source when parsing dates in "yyyyMMdd" format with CORRECTED time parser policy

2022-07-11 Thread GitBox
sadikovi commented on PR #37147: URL: https://github.com/apache/spark/pull/37147#issuecomment-1181198109 Thanks for the reviews. I will address the comments and failing tests and update the PR. My question was whether there are any concerns with this change and whether users might

[GitHub] [spark] HyukjinKwon closed pull request #37158: [SPARK-39736][INFRA] Enable base image build in SparkR job

2022-07-11 Thread GitBox
HyukjinKwon closed pull request #37158: [SPARK-39736][INFRA] Enable base image build in SparkR job URL: https://github.com/apache/spark/pull/37158

[GitHub] [spark] HyukjinKwon commented on pull request #37158: [SPARK-39736][INFRA] Enable base image build in SparkR job

2022-07-11 Thread GitBox
HyukjinKwon commented on PR #37158: URL: https://github.com/apache/spark/pull/37158#issuecomment-1181176377 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37135: [SPARK-39723][R] Implement functionExists/getFunc in SparkR for 3L namespace

2022-07-11 Thread GitBox
HyukjinKwon commented on code in PR #37135: URL: https://github.com/apache/spark/pull/37135#discussion_r918448953 ## R/pkg/NAMESPACE: ## @@ -479,7 +479,9 @@ export("as.DataFrame", "databaseExists", "dropTempTable", "dropTempView", +

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37135: [SPARK-39723][R] Implement functionExists/getFunc in SparkR for 3L namespace

2022-07-11 Thread GitBox
HyukjinKwon commented on code in PR #37135: URL: https://github.com/apache/spark/pull/37135#discussion_r918449024 ## R/pkg/pkgdown/_pkgdown_template.yml: ## @@ -266,7 +266,9 @@ reference: - databaseExists - dropTempTable - dropTempView + - functionExists -

[GitHub] [spark] Yikun commented on pull request #37158: [SPARK-39736][INFRA] Enable base image build in SparkR job

2022-07-11 Thread GitBox
Yikun commented on PR #37158: URL: https://github.com/apache/spark/pull/37158#issuecomment-1181166080 Ready to go

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37135: [SPARK-39723][R] Implement functionExists/getFunction in SparkR for 3L namespace

2022-07-11 Thread GitBox
HyukjinKwon commented on code in PR #37135: URL: https://github.com/apache/spark/pull/37135#discussion_r918440696 ## R/pkg/tests/fulltests/test_context.R: ## @@ -21,10 +21,11 @@ test_that("Check masked functions", { # Check that we are not masking any new function from base,

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36871: [SPARK-39469][SQL] Infer date type for CSV schema inference

2022-07-11 Thread GitBox
HyukjinKwon commented on code in PR #36871: URL: https://github.com/apache/spark/pull/36871#discussion_r918440378 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala: ## @@ -148,7 +148,28 @@ class CSVOptions( // A language tag in IETF BCP 47

[GitHub] [spark] zhengruifeng commented on a diff in pull request #37135: [SPARK-39723][R] Implement functionExists/getFunction in SparkR for 3L namespace

2022-07-11 Thread GitBox
zhengruifeng commented on code in PR #37135: URL: https://github.com/apache/spark/pull/37135#discussion_r918440246 ## R/pkg/tests/fulltests/test_context.R: ## @@ -21,10 +21,11 @@ test_that("Check masked functions", { # Check that we are not masking any new function from

[GitHub] [spark] HyukjinKwon closed pull request #37157: [MINOR][FOLLOWUP] Remove redundant return

2022-07-11 Thread GitBox
HyukjinKwon closed pull request #37157: [MINOR][FOLLOWUP] Remove redundant return URL: https://github.com/apache/spark/pull/37157

[GitHub] [spark] HyukjinKwon closed pull request #37159: [SPARK-38796][SQL][DOC][FOLLOWUP] Remove try_to_char reference in the doc

2022-07-11 Thread GitBox
HyukjinKwon closed pull request #37159: [SPARK-38796][SQL][DOC][FOLLOWUP] Remove try_to_char reference in the doc URL: https://github.com/apache/spark/pull/37159

[GitHub] [spark] HyukjinKwon commented on pull request #37159: [SPARK-38796][SQL][DOC][FOLLOWUP] Remove try_to_char reference in the doc

2022-07-11 Thread GitBox
HyukjinKwon commented on PR #37159: URL: https://github.com/apache/spark/pull/37159#issuecomment-1181096060 Merged to master.

[GitHub] [spark] Jonathancui123 commented on a diff in pull request #37147: [SPARK-39731][SQL] Fix issue in CSV data source when parsing dates in "yyyyMMdd" format with CORRECTED time parser policy

2022-07-11 Thread GitBox
Jonathancui123 commented on code in PR #37147: URL: https://github.com/apache/spark/pull/37147#discussion_r918421341 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala: ## @@ -222,7 +226,11 @@ class UnivocityParser( } catch {

[GitHub] [spark] panbingkun commented on pull request #37157: [MINOR][FOLLOWUP] Remove redundant return

2022-07-11 Thread GitBox
panbingkun commented on PR #37157: URL: https://github.com/apache/spark/pull/37157#issuecomment-1181059198 Yes, I checked twice.

[GitHub] [spark] c21 commented on a diff in pull request #37083: [SPARK-39678][SQL] Improve stats estimation for v2 tables

2022-07-11 Thread GitBox
c21 commented on code in PR #37083: URL: https://github.com/apache/spark/pull/37083#discussion_r918415377 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AdvancedStatsPlanVisitor.scala: ## @@ -0,0 +1,90 @@ +/* + * Licensed to the

[GitHub] [spark] mridulm commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-07-11 Thread GitBox
mridulm commented on code in PR #35906: URL: https://github.com/apache/spark/pull/35906#discussion_r918226879 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -343,15 +397,44 @@ void

[GitHub] [spark] HyukjinKwon commented on pull request #37157: [MINOR][FOLLOWUP] Remove redundant return

2022-07-11 Thread GitBox
HyukjinKwon commented on PR #37157: URL: https://github.com/apache/spark/pull/37157#issuecomment-1181000672 @panbingkun are they all to fix?

[GitHub] [spark] huaxingao commented on a diff in pull request #37080: [SPARK-35208][SQL][DOCS] Add docs for LATERAL subqueries

2022-07-11 Thread GitBox
huaxingao commented on code in PR #37080: URL: https://github.com/apache/spark/pull/37080#discussion_r918358655 ## docs/sql-ref-syntax-qry-select-join.md: ## @@ -26,7 +26,7 @@ A SQL join is used to combine rows from two relations based on join criteria. Th ### Syntax

[GitHub] [spark] otterc commented on pull request #37052: [SPARK-39647][CORE] Register the executor with ESS before registering the BlockManager

2022-07-11 Thread GitBox
otterc commented on PR #37052: URL: https://github.com/apache/spark/pull/37052#issuecomment-1180723189 All the tests pass now @Ngone51 @mridulm

[GitHub] [spark] mridulm commented on pull request #36162: [SPARK-32170][CORE] Improve the speculation through the stage task metrics.

2022-07-11 Thread GitBox
mridulm commented on PR #36162: URL: https://github.com/apache/spark/pull/36162#issuecomment-1180706514 It helps in two cases @weixiuli - the example you gave (generated input (like range()), etc where there is no input metrics). It also helps when reading shuffle input where there is a

[GitHub] [spark] allisonwang-db commented on a diff in pull request #37080: [SPARK-35208][SQL][DOCS] Add docs for LATERAL subqueries

2022-07-11 Thread GitBox
allisonwang-db commented on code in PR #37080: URL: https://github.com/apache/spark/pull/37080#discussion_r918189602 ## docs/sql-ref-syntax-qry-select-join.md: ## @@ -26,7 +26,7 @@ A SQL join is used to combine rows from two relations based on join criteria. Th ### Syntax

[GitHub] [spark] zhouyejoe commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-07-11 Thread GitBox
zhouyejoe commented on code in PR #35906: URL: https://github.com/apache/spark/pull/35906#discussion_r918214646 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -317,22 +353,24 @@ public void applicationRemoved(String

[GitHub] [spark] mridulm commented on a diff in pull request #35906: [SPARK-33236][shuffle] Enable Push-based shuffle service to store state in NM level DB for work preserving restart

2022-07-11 Thread GitBox
mridulm commented on code in PR #35906: URL: https://github.com/apache/spark/pull/35906#discussion_r918184780 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -317,22 +353,24 @@ public void applicationRemoved(String

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36893: [SPARK-39494][PYTHON] Support `createDataFrame` from a list of scalars when schema is not provided

2022-07-11 Thread GitBox
xinrong-databricks commented on code in PR #36893: URL: https://github.com/apache/spark/pull/36893#discussion_r918171922 ## python/pyspark/sql/session.py: ## @@ -1023,6 +1023,20 @@ def prepare(obj: Any) -> Any: if isinstance(data, RDD): rdd, struct =

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36893: [SPARK-39494][PYTHON] Support `createDataFrame` from a list of scalars when schema is not provided

2022-07-11 Thread GitBox
xinrong-databricks commented on code in PR #36893: URL: https://github.com/apache/spark/pull/36893#discussion_r918165714 ## python/pyspark/sql/tests/test_types.py: ## @@ -374,12 +373,6 @@ def test_negative_decimal(self): finally: self.spark.sql("set

[GitHub] [spark] Jonathancui123 commented on a diff in pull request #36871: [SPARK-39469][SQL] Infer date type for CSV schema inference

2022-07-11 Thread GitBox
Jonathancui123 commented on code in PR #36871: URL: https://github.com/apache/spark/pull/36871#discussion_r918153782 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVOptions.scala: ## @@ -148,7 +148,28 @@ class CSVOptions( // A language tag in IETF BCP 47

[GitHub] [spark] Yikf commented on a diff in pull request #37129: [SPARK-39710][SQL] Support push local topK through outer join

2022-07-11 Thread GitBox
Yikf commented on code in PR #37129: URL: https://github.com/apache/spark/pull/37129#discussion_r918140392 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -230,6 +230,7 @@ abstract class Optimizer(catalogManager: CatalogManager)

[GitHub] [spark] Jonathancui123 commented on a diff in pull request #36871: [SPARK-39469][SQL] Infer date type for CSV schema inference

2022-07-11 Thread GitBox
Jonathancui123 commented on code in PR #36871: URL: https://github.com/apache/spark/pull/36871#discussion_r918142012 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala: ## @@ -117,7 +123,10 @@ class CSVInferSchema(val options: CSVOptions)

[GitHub] [spark] dtenedor commented on a diff in pull request #37159: [SPARK-38796][SQL][DOC][FOLLOWUP] Remove try_to_char reference in the doc

2022-07-11 Thread GitBox
dtenedor commented on code in PR #37159: URL: https://github.com/apache/spark/pull/37159#discussion_r918129229 ## docs/sql-ref-number-pattern.md: ## @@ -176,10 +176,10 @@ Note that the format string used in most of these examples expects: "$#.##" -- 'S' can be at the

[GitHub] [spark] Yikf commented on a diff in pull request #37113: [SPARK-39741][SQL] Support url encode/decode as built-in function and tidy up url-related functions

2022-07-11 Thread GitBox
Yikf commented on code in PR #37113: URL: https://github.com/apache/spark/pull/37113#discussion_r918128553 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/urlExpressions.scala: ## @@ -0,0 +1,290 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] cloud-fan commented on pull request #37159: [SPARK-38796][SQL][DOC][FOLLOWUP] Remove try_to_char reference in the doc

2022-07-11 Thread GitBox
cloud-fan commented on PR #37159: URL: https://github.com/apache/spark/pull/37159#issuecomment-1180612285 CC @HyukjinKwon

[GitHub] [spark] cloud-fan opened a new pull request, #37159: [SPARK-38796][SQL][DOC][FOLLOWUP] Remove try_to_char reference in the doc

2022-07-11 Thread GitBox
cloud-fan opened a new pull request, #37159: URL: https://github.com/apache/spark/pull/37159 ### What changes were proposed in this pull request? We have removed the `try_to_char` function and it shouldn't appear in the doc anymore. This PR also improves the doc a little bit.

[GitHub] [spark] cloud-fan commented on pull request #37074: [SPARK-39672][SQL][3.1] Fix de-duplicating conflicting attributes when rewriting subquery

2022-07-11 Thread GitBox
cloud-fan commented on PR #37074: URL: https://github.com/apache/spark/pull/37074#issuecomment-1180582242 OK I think `DeduplicateRelations` needs some fix. Ideally the outer and inner plan should not have conflicting output attributes after analysis, but this local relation + project case

[GitHub] [spark] cloud-fan commented on pull request #37074: [SPARK-39672][SQL][3.1] Fix de-duplicating conflicting attributes when rewriting subquery

2022-07-11 Thread GitBox
cloud-fan commented on PR #37074: URL: https://github.com/apache/spark/pull/37074#issuecomment-1180579396 > This check is not accurate when there's And expression in the Join condition as in this case. Hence, this PR proposes to add a check whether the intersected attributes exist in all

[GitHub] [spark] cloud-fan commented on a diff in pull request #37074: [SPARK-39672][SQL][3.1] Fix de-duplicating conflicting attributes when rewriting subquery

2022-07-11 Thread GitBox
cloud-fan commented on code in PR #37074: URL: https://github.com/apache/spark/pull/37074#discussion_r918091135 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala: ## @@ -72,12 +72,22 @@ object RewritePredicateSubquery extends

[GitHub] [spark] cloud-fan commented on a diff in pull request #37113: [SPARK-39741][SQL] Support url encode/decode as built-in function and tidy up url-related functions

2022-07-11 Thread GitBox
cloud-fan commented on code in PR #37113: URL: https://github.com/apache/spark/pull/37113#discussion_r918083814 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/urlExpressions.scala: ## @@ -0,0 +1,290 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] cloud-fan commented on a diff in pull request #37150: [SPARK-39737][SQL] `PERCENTILE_CONT` and `PERCENTILE_DISC` should support aggregate filter

2022-07-11 Thread GitBox
cloud-fan commented on code in PR #37150: URL: https://github.com/apache/spark/pull/37150#discussion_r918064239 ## sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4: ## @@ -849,7 +849,8 @@ primaryExpression | OVERLAY LEFT_PAREN

[GitHub] [spark] Yikun closed pull request #37006: [SPARK-39522][INFRA] Uses Docker image cache over a custom image in sparkr job

2022-07-11 Thread GitBox
Yikun closed pull request #37006: [SPARK-39522][INFRA] Uses Docker image cache over a custom image in sparkr job URL: https://github.com/apache/spark/pull/37006

[GitHub] [spark] Yikun opened a new pull request, #37158: [SPARK-39736][INFRA] Enable base image build in SparkR job

2022-07-11 Thread GitBox
Yikun opened a new pull request, #37158: URL: https://github.com/apache/spark/pull/37158 ### What changes were proposed in this pull request? Add base image build for SparkR job https://user-images.githubusercontent.com/1736354/178295594-6f057247-72ab-4ff1-bb69-48aba05dd06b.png

[GitHub] [spark] wangyum commented on pull request #37069: [SPARK-39667][SQL] Add another workaround when there is not enough memory to build and broadcast the table

2022-07-11 Thread GitBox
wangyum commented on PR #37069: URL: https://github.com/apache/spark/pull/37069#issuecomment-1180466553 Merged to master.

[GitHub] [spark] wangyum closed pull request #37069: [SPARK-39667][SQL] Add another workaround when there is not enough memory to build and broadcast the table

2022-07-11 Thread GitBox
wangyum closed pull request #37069: [SPARK-39667][SQL] Add another workaround when there is not enough memory to build and broadcast the table URL: https://github.com/apache/spark/pull/37069

[GitHub] [spark] wangyum commented on pull request #37048: [SPARK-39655][CORE] Add a config to limit the number of RDD partitions

2022-07-11 Thread GitBox
wangyum commented on PR #37048: URL: https://github.com/apache/spark/pull/37048#issuecomment-1180460255 > OK, but your approach doesn't fix all the issues. With AQE, a complicated query may have run for a while and the cartesian product fails in the last query stage. Yes. This is

[GitHub] [spark] panbingkun commented on pull request #37157: [MINOR][FOLLOWUP] Remove redundant return

2022-07-11 Thread GitBox
panbingkun commented on PR #37157: URL: https://github.com/apache/spark/pull/37157#issuecomment-1180449395 ping @HyukjinKwon

[GitHub] [spark] panbingkun opened a new pull request, #37157: [MINOR][FOLLOWUP] Remove redundant return

2022-07-11 Thread GitBox
panbingkun opened a new pull request, #37157: URL: https://github.com/apache/spark/pull/37157 ### What changes were proposed in this pull request? Remove redundant return in Scala code. This PR is a follow-up to https://github.com/apache/spark/pull/37148 ### Why are the changes

[GitHub] [spark] panbingkun commented on pull request #37148: [MINOR] Remove redundant return

2022-07-11 Thread GitBox
panbingkun commented on PR #37148: URL: https://github.com/apache/spark/pull/37148#issuecomment-1180445408 > Merged to master. I am very sorry, I missed some Scala code with the same problem. @HyukjinKwon Followup PR:

[GitHub] [spark] wangyum commented on a diff in pull request #37129: [SPARK-39710][SQL] Support push local topK through outer join

2022-07-11 Thread GitBox
wangyum commented on code in PR #37129: URL: https://github.com/apache/spark/pull/37129#discussion_r917966457 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/PushLocalTopKThroughOuterJoin.scala: ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] HyukjinKwon closed pull request #37155: [SPARK-39735][INFRA] Move image condition to jobs to make non-master schedule job work

2022-07-11 Thread GitBox
HyukjinKwon closed pull request #37155: [SPARK-39735][INFRA] Move image condition to jobs to make non-master schedule job work URL: https://github.com/apache/spark/pull/37155

[GitHub] [spark] HyukjinKwon commented on pull request #37155: [SPARK-39735][INFRA] Move image condition to jobs to make non-master schedule job work

2022-07-11 Thread GitBox
HyukjinKwon commented on PR #37155: URL: https://github.com/apache/spark/pull/37155#issuecomment-1180388648 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #37135: [SPARK-39723][R] Implement functionExists/getFunction in SparkR for 3L namespace

2022-07-11 Thread GitBox
HyukjinKwon commented on code in PR #37135: URL: https://github.com/apache/spark/pull/37135#discussion_r917898330 ## R/pkg/tests/fulltests/test_context.R: ## @@ -21,10 +21,11 @@ test_that("Check masked functions", { # Check that we are not masking any new function from base,
