[GitHub] [spark] Yikun opened a new pull request, #36464: [SPARK-38947][PYTHON] Supports groupby positional indexing

2022-05-06 Thread GitBox
Yikun opened a new pull request, #36464: URL: https://github.com/apache/spark/pull/36464 ### What changes were proposed in this pull request? Add groupby positional indexing support for Pandas on Spark. ### Why are the changes needed? Pandas supports Groupby positional inde

[GitHub] [spark] cloud-fan commented on a diff in pull request #36417: [SPARK-39057][SQL] Offset could work without Limit

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36417: URL: https://github.com/apache/spark/pull/36417#discussion_r866548301 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1871,14 +1871,12 @@ object EliminateLimits extends Rule[LogicalPlan] {

[GitHub] [spark] cloud-fan commented on a diff in pull request #36417: [SPARK-39057][SQL] Offset could work without Limit

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36417: URL: https://github.com/apache/spark/pull/36417#discussion_r866549645 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala: ## @@ -92,15 +92,17 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan]

[GitHub] [spark] cloud-fan commented on a diff in pull request #36417: [SPARK-39057][SQL] Offset could work without Limit

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36417: URL: https://github.com/apache/spark/pull/36417#discussion_r866553021 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala: ## @@ -818,8 +820,6 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan]

[GitHub] [spark] cloud-fan commented on a diff in pull request #36417: [SPARK-39057][SQL] Offset could work without Limit

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36417: URL: https://github.com/apache/spark/pull/36417#discussion_r866554687 ## sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala: ## @@ -70,9 +78,11 @@ case class CollectLimitExec( val singlePartitionRDD = if (childRDD.

[GitHub] [spark] cloud-fan commented on a diff in pull request #36417: [SPARK-39057][SQL] Offset could work without Limit

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36417: URL: https://github.com/apache/spark/pull/36417#discussion_r866557269 ## sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala: ## @@ -207,32 +222,62 @@ case class GlobalLimitExec(limit: Int, child: SparkPlan) extends BaseL

[GitHub] [spark] cloud-fan commented on a diff in pull request #36417: [SPARK-39057][SQL] Offset could work without Limit

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36417: URL: https://github.com/apache/spark/pull/36417#discussion_r866557980 ## sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala: ## @@ -250,17 +295,19 @@ case class TakeOrderedAndProjectExec( limit: Int, sortOrder:

[GitHub] [spark] cloud-fan commented on a diff in pull request #36417: [SPARK-39057][SQL] Offset could work without Limit

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36417: URL: https://github.com/apache/spark/pull/36417#discussion_r866559243 ## sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql: ## @@ -159,3 +159,54 @@ GROUP BY t1b ORDER BY t1b NULLS last LIMIT 1 OFFSET 1;

[GitHub] [spark] cloud-fan commented on a diff in pull request #36440: [SPARK-37259][SQL] Support CTE and temp table queries with MSSQL JDBC

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36440: URL: https://github.com/apache/spark/pull/36440#discussion_r866561860 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala: ## @@ -222,6 +222,8 @@ class JDBCOptions( // User specified JDBC con

[GitHub] [spark] cloud-fan commented on a diff in pull request #36330: [SPARK-38897][SQL] DS V2 supports push down string functions

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36330: URL: https://github.com/apache/spark/pull/36330#discussion_r866564250 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/expressions/GeneralScalarExpression.java: ## @@ -148,6 +148,54 @@ *Since version: 3.3.0 * *

[GitHub] [spark] bjornjorgensen opened a new pull request, #36465: [SPARK-39113] Rename `self` to `cls` in mllib/clustering

2022-05-06 Thread GitBox
bjornjorgensen opened a new pull request, #36465: URL: https://github.com/apache/spark/pull/36465 ### What changes were proposed in this pull request? Rename `self` to `cls` ### Why are the changes needed? The function `def train(self)` is decorated with `@classmethod`
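
For readers unfamiliar with the convention behind this rename, here is a minimal sketch of why the first parameter of a `@classmethod` is usually named `cls`. The class names `KMeansLike` and `BisectingLike` are hypothetical stand-ins made up for illustration; they are not the actual classes in mllib/clustering, and the PR itself may differ in detail.

```python
class KMeansLike:
    """Hypothetical stand-in for a trainer class; not the real pyspark.mllib API."""

    def __init__(self, k):
        self.k = k

    @classmethod
    def train(cls, k):
        # A classmethod receives the class object, not an instance,
        # so naming the first parameter `cls` describes what it really is.
        return cls(k)


class BisectingLike(KMeansLike):
    """Hypothetical subclass, used only to show that `cls` follows the caller."""


# Called on the subclass, `cls` is BisectingLike, so the factory builds that type.
model = BisectingLike.train(3)
print(type(model).__name__, model.k)
```

Naming the parameter `self` would still run, since Python does not enforce the name, but it suggests an instance where the method actually gets the class.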

[GitHub] [spark] cloud-fan commented on a diff in pull request #36330: [SPARK-38897][SQL] DS V2 supports push down string functions

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36330: URL: https://github.com/apache/spark/pull/36330#discussion_r866566246 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java: ## @@ -102,13 +102,23 @@ public String build(Expression expr) {

[GitHub] [spark] cloud-fan commented on pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

2022-05-06 Thread GitBox
cloud-fan commented on PR #36455: URL: https://github.com/apache/spark/pull/36455#issuecomment-1119344173 thanks, merging to master/3.3! (I'm backporting this small refactor as we will have a bug fix that relies on it) -- This is an automated message from the Apache Git Service. To respon

[GitHub] [spark] cloud-fan closed pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

2022-05-06 Thread GitBox
cloud-fan closed pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait URL: https://github.com/apache/spark/pull/36455

[GitHub] [spark] AmplabJenkins commented on pull request #36465: [SPARK-39113][CORE][MLLIB][PYTHON] Rename `self` to `cls` in mllib/clustering

2022-05-06 Thread GitBox
AmplabJenkins commented on PR #36465: URL: https://github.com/apache/spark/pull/36465#issuecomment-1119349929 Can one of the admins verify this patch?

[GitHub] [spark] LuciferYang opened a new pull request, #36466: [DON'T MERGE] Replace `finalize()` with custom `ReferenceQueue` for `LevelDB/RocksDBIterator` cleanup

2022-05-06 Thread GitBox
LuciferYang opened a new pull request, #36466: URL: https://github.com/apache/spark/pull/36466 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was thi

[GitHub] [spark] LuciferYang closed pull request #36466: [DON'T MERGE] Replace `finalize()` with custom `ReferenceQueue` for `LevelDB/RocksDBIterator` cleanup

2022-05-06 Thread GitBox
LuciferYang closed pull request #36466: [DON'T MERGE] Replace `finalize()` with custom `ReferenceQueue` for `LevelDB/RocksDBIterator` cleanup URL: https://github.com/apache/spark/pull/36466

[GitHub] [spark] pan3793 commented on pull request #36418: [SPARK-39079][SQL] Catalog name should not contain dot

2022-05-06 Thread GitBox
pan3793 commented on PR #36418: URL: https://github.com/apache/spark/pull/36418#issuecomment-1119358323 cc @rdblue

[GitHub] [spark] pan3793 commented on pull request #36418: [SPARK-39079][SQL] Catalog name should not contain dot

2022-05-06 Thread GitBox
pan3793 commented on PR #36418: URL: https://github.com/apache/spark/pull/36418#issuecomment-1119358325 cc @rdblue

[GitHub] [spark] LuciferYang opened a new pull request, #36467: [DON'T MERGE] Replace `finalize()` with custom ReferenceQueue for `RocksDBIterator` cleanup

2022-05-06 Thread GitBox
LuciferYang opened a new pull request, #36467: URL: https://github.com/apache/spark/pull/36467 ### What changes were proposed in this pull request? This PR tries to use a custom `ReferenceQueue + cleanupThread` instead of the `Finalization` mechanism to automatically clean up `RocksDBIterator`

[GitHub] [spark] LuciferYang commented on pull request #36467: [DON'T MERGE] Replace `finalize()` with custom ReferenceQueue for `RocksDBIterator` cleanup

2022-05-06 Thread GitBox
LuciferYang commented on PR #36467: URL: https://github.com/apache/spark/pull/36467#issuecomment-1119368808 Just for test now

[GitHub] [spark] beliefer commented on a diff in pull request #36417: [SPARK-39057][SQL] Offset could work without Limit

2022-05-06 Thread GitBox
beliefer commented on code in PR #36417: URL: https://github.com/apache/spark/pull/36417#discussion_r866602901 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1871,14 +1871,12 @@ object EliminateLimits extends Rule[LogicalPlan] { }

[GitHub] [spark] beliefer commented on a diff in pull request #36417: [SPARK-39057][SQL] Offset could work without Limit

2022-05-06 Thread GitBox
beliefer commented on code in PR #36417: URL: https://github.com/apache/spark/pull/36417#discussion_r866603469 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala: ## @@ -92,15 +92,17 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan]

[GitHub] [spark] Yikun commented on a diff in pull request #36452: [SPARK-39109][PYTHON] Adjust `GroupBy.mean/median` to match pandas 1.4

2022-05-06 Thread GitBox
Yikun commented on code in PR #36452: URL: https://github.com/apache/spark/pull/36452#discussion_r866587031 ## python/pyspark/pandas/groupby.py: ## @@ -2673,7 +2682,7 @@ def get_group(self, name: Union[Name, List[Name]]) -> FrameLike: return self._cleanup_and_return(

[GitHub] [spark] Yikun commented on pull request #36464: [SPARK-38947][PYTHON] Supports groupby positional indexing

2022-05-06 Thread GitBox
Yikun commented on PR #36464: URL: https://github.com/apache/spark/pull/36464#issuecomment-1119379454 cc @xinrong-databricks @itholic @HyukjinKwon @zhengruifeng

[GitHub] [spark] MaxGekk commented on pull request #36458: [SPARK-39060][SQL][3.2] Typo in error messages of decimal overflow

2022-05-06 Thread GitBox
MaxGekk commented on PR #36458: URL: https://github.com/apache/spark/pull/36458#issuecomment-1119393395 > There are two tests failing in TPCDSV1_4_PlanStabilitySuite, but I also see that in the test run for the most recent commit to branch-3.2: https://github.com/apache/spark/pull/36341

[GitHub] [spark] beliefer commented on a diff in pull request #36417: [SPARK-39057][SQL] Offset could work without Limit

2022-05-06 Thread GitBox
beliefer commented on code in PR #36417: URL: https://github.com/apache/spark/pull/36417#discussion_r866627978 ## sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/in-limit.sql: ## @@ -159,3 +159,54 @@ GROUP BY t1b ORDER BY t1b NULLS last LIMIT 1 OFFSET 1; +

[GitHub] [spark] MaxGekk commented on pull request #36459: [SPARK-39060][SQL][3.1] Typo in error messages of decimal overflow

2022-05-06 Thread GitBox
MaxGekk commented on PR #36459: URL: https://github.com/apache/spark/pull/36459#issuecomment-1119404876 +1, LGTM. Merging to 3.1. Thank you, @vli-databricks.

[GitHub] [spark] peter-toth commented on a diff in pull request #36440: [SPARK-37259][SQL] Support CTE and temp table queries with MSSQL JDBC

2022-05-06 Thread GitBox
peter-toth commented on code in PR #36440: URL: https://github.com/apache/spark/pull/36440#discussion_r866631665 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/MsSqlServerIntegrationSuite.scala: ## @@ -374,4 +375,58 @@ class MsSqlServerIntegration

[GitHub] [spark] MaxGekk closed pull request #36459: [SPARK-39060][SQL][3.1] Typo in error messages of decimal overflow

2022-05-06 Thread GitBox
MaxGekk closed pull request #36459: [SPARK-39060][SQL][3.1] Typo in error messages of decimal overflow URL: https://github.com/apache/spark/pull/36459

[GitHub] [spark] MaxGekk commented on pull request #36460: [SPARK-39060][SQL][3.0] Typo in error messages of decimal overflow

2022-05-06 Thread GitBox
MaxGekk commented on PR #36460: URL: https://github.com/apache/spark/pull/36460#issuecomment-1119407156 +1, LGTM. Merging to 3.0. Thank you, @vli-databricks.

[GitHub] [spark] MaxGekk closed pull request #36460: [SPARK-39060][SQL][3.0] Typo in error messages of decimal overflow

2022-05-06 Thread GitBox
MaxGekk closed pull request #36460: [SPARK-39060][SQL][3.0] Typo in error messages of decimal overflow URL: https://github.com/apache/spark/pull/36460

[GitHub] [spark] ulysses-you commented on pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

2022-05-06 Thread GitBox
ulysses-you commented on PR #36455: URL: https://github.com/apache/spark/pull/36455#issuecomment-1119412422 thank you @cloud-fan @viirya

[GitHub] [spark] AmplabJenkins commented on pull request #36463: [SPARK-38751][SQL][TESTS] Test the error class: UNRECOGNIZED_SQL_TYPE

2022-05-06 Thread GitBox
AmplabJenkins commented on PR #36463: URL: https://github.com/apache/spark/pull/36463#issuecomment-1119417155 Can one of the admins verify this patch?

[GitHub] [spark] cloud-fan commented on a diff in pull request #36417: [SPARK-39057][SQL] Offset could work without Limit

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36417: URL: https://github.com/apache/spark/pull/36417#discussion_r866649081 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -750,6 +749,11 @@ object LimitPushDown extends Rule[LogicalPlan] {

[GitHub] [spark] cloud-fan commented on a diff in pull request #36417: [SPARK-39057][SQL] Offset could work without Limit

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36417: URL: https://github.com/apache/spark/pull/36417#discussion_r866652724 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -750,6 +749,11 @@ object LimitPushDown extends Rule[LogicalPlan] {

[GitHub] [spark] bjornjorgensen commented on a diff in pull request #36461: [SPARK-39111][CORE][SQL] Mark overridden methods with `@Override` annotation

2022-05-06 Thread GitBox
bjornjorgensen commented on code in PR #36461: URL: https://github.com/apache/spark/pull/36461#discussion_r866649565 ## connector/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java: ## @@ -46,6 +46,7 @@ static class SparkRecordWriterFactory extends Recor

[GitHub] [spark] cloud-fan commented on a diff in pull request #36330: [SPARK-38897][SQL] DS V2 supports push down string functions

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36330: URL: https://github.com/apache/spark/pull/36330#discussion_r866655388 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala: ## @@ -200,6 +200,55 @@ class V2ExpressionBuilder( } else { No

[GitHub] [spark] cloud-fan commented on a diff in pull request #36330: [SPARK-38897][SQL] DS V2 supports push down string functions

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36330: URL: https://github.com/apache/spark/pull/36330#discussion_r866656231 ## sql/core/src/main/scala/org/apache/spark/sql/catalyst/util/V2ExpressionBuilder.scala: ## @@ -200,6 +200,55 @@ class V2ExpressionBuilder( } else { No

[GitHub] [spark] cloud-fan commented on a diff in pull request #36330: [SPARK-38897][SQL] DS V2 supports push down string functions

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36330: URL: https://github.com/apache/spark/pull/36330#discussion_r866657505 ## sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCV2Suite.scala: ## @@ -626,6 +626,54 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with Expla

[GitHub] [spark] cloud-fan commented on pull request #36377: [SPARK-39043][SQL] Spark SQL Hive client should not gather statistic by default.

2022-05-06 Thread GitBox
cloud-fan commented on PR #36377: URL: https://github.com/apache/spark/pull/36377#issuecomment-1119434850 thanks, merging to master!

[GitHub] [spark] cloud-fan closed pull request #36377: [SPARK-39043][SQL] Spark SQL Hive client should not gather statistic by default.

2022-05-06 Thread GitBox
cloud-fan closed pull request #36377: [SPARK-39043][SQL] Spark SQL Hive client should not gather statistic by default. URL: https://github.com/apache/spark/pull/36377

[GitHub] [spark] cloud-fan commented on pull request #36344: [SPARK-39012][SQL]SparkSQL cast partition value does not support all data types

2022-05-06 Thread GitBox
cloud-fan commented on PR #36344: URL: https://github.com/apache/spark/pull/36344#issuecomment-1119436016 thanks, merging to master/3.3!

[GitHub] [spark] cloud-fan closed pull request #36344: [SPARK-39012][SQL]SparkSQL cast partition value does not support all data types

2022-05-06 Thread GitBox
cloud-fan closed pull request #36344: [SPARK-39012][SQL]SparkSQL cast partition value does not support all data types URL: https://github.com/apache/spark/pull/36344

[GitHub] [spark] LuciferYang commented on pull request #36403: [SPARK-39063][CORE] Remove `finalize()` method and related codes from `LevelDB/RocksDBIterator`

2022-05-06 Thread GitBox
LuciferYang commented on PR #36403: URL: https://github.com/apache/spark/pull/36403#issuecomment-1119462201 I opened a new [draft](https://github.com/apache/spark/pull/36467/files) PR to replace `finalize()` with a `ReferenceQueue+Daemon Thread` for `RocksDBIterator` cleanup and try to avoid

[GitHub] [spark] ulysses-you opened a new pull request, #36468: [SPARK-39106][SQL] Correct conditional expression constant folding

2022-05-06 Thread GitBox
ulysses-you opened a new pull request, #36468: URL: https://github.com/apache/spark/pull/36468 ### What changes were proposed in this pull request? - skip folding children inside `ConditionalExpression` if it's not foldable - mark `CaseWhen` and `If` as foldable if its chil

[GitHub] [spark] ulysses-you commented on a diff in pull request #36468: [SPARK-39106][SQL] Correct conditional expression constant folding

2022-05-06 Thread GitBox
ulysses-you commented on code in PR #36468: URL: https://github.com/apache/spark/pull/36468#discussion_r866699096 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -52,22 +52,28 @@ object ConstantFolding extends Rule[LogicalPlan] {

[GitHub] [spark] AmplabJenkins commented on pull request #36461: [SPARK-39111][CORE][SQL] Mark overridden methods with `@Override` annotation

2022-05-06 Thread GitBox
AmplabJenkins commented on PR #36461: URL: https://github.com/apache/spark/pull/36461#issuecomment-1119486340 Can one of the admins verify this patch?

[GitHub] [spark] weixiuli commented on a diff in pull request #36162: [SPARK-32170][CORE] Improve the speculation through the stage task metrics.

2022-05-06 Thread GitBox
weixiuli commented on code in PR #36162: URL: https://github.com/apache/spark/pull/36162#discussion_r866718876 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -2051,6 +2051,39 @@ package object config { .doubleConf .createWithDefault(0.

[GitHub] [spark] LorenzoMartini commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

2022-05-06 Thread GitBox
LorenzoMartini commented on code in PR #36457: URL: https://github.com/apache/spark/pull/36457#discussion_r866721488 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala: ## @@ -642,7 +642,7 @@ case class RegExpReplace(subject: Express

[GitHub] [spark] LorenzoMartini commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

2022-05-06 Thread GitBox
LorenzoMartini commented on code in PR #36457: URL: https://github.com/apache/spark/pull/36457#discussion_r866722337 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala: ## @@ -642,7 +642,7 @@ case class RegExpReplace(subject: Express

[GitHub] [spark] LorenzoMartini commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

2022-05-06 Thread GitBox
LorenzoMartini commented on code in PR #36457: URL: https://github.com/apache/spark/pull/36457#discussion_r866723000 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/RegexpExpressionsSuite.scala: ## @@ -323,6 +323,16 @@ class RegexpExpressionsSuite extend

[GitHub] [spark] peter-toth commented on a diff in pull request #36440: [SPARK-37259][SQL] Support CTE and temp table queries with MSSQL JDBC

2022-05-06 Thread GitBox
peter-toth commented on code in PR #36440: URL: https://github.com/apache/spark/pull/36440#discussion_r866631780 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala: ## @@ -222,6 +222,8 @@ class JDBCOptions( // User specified JDBC co

[GitHub] [spark] srowen commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

2022-05-06 Thread GitBox
srowen commented on code in PR #36457: URL: https://github.com/apache/spark/pull/36457#discussion_r866766828 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala: ## @@ -642,7 +642,7 @@ case class RegExpReplace(subject: Expression, reg

[GitHub] [spark] LuciferYang commented on pull request #36403: [SPARK-39063][CORE] Remove `finalize()` method and related codes from `LevelDB/RocksDBIterator`

2022-05-06 Thread GitBox
LuciferYang commented on PR #36403: URL: https://github.com/apache/spark/pull/36403#issuecomment-1119559814 > I opened a new [draft](https://github.com/apache/spark/pull/36467/files) PR to replace `finalize()` with a `ReferenceQueue+Daemon Thread` for `RocksDBIterator` cleanup and try to avoi

[GitHub] [spark] LorenzoMartini commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

2022-05-06 Thread GitBox
LorenzoMartini commented on code in PR #36457: URL: https://github.com/apache/spark/pull/36457#discussion_r866773711 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala: ## @@ -642,7 +642,7 @@ case class RegExpReplace(subject: Express

[GitHub] [spark] srowen commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

2022-05-06 Thread GitBox
srowen commented on code in PR #36457: URL: https://github.com/apache/spark/pull/36457#discussion_r866781006 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala: ## @@ -642,7 +642,7 @@ case class RegExpReplace(subject: Expression, reg

[GitHub] [spark] srowen commented on pull request #36462: [SPARK-39110][WEBUI] Add metrics properties to environment tab

2022-05-06 Thread GitBox
srowen commented on PR #36462: URL: https://github.com/apache/spark/pull/36462#issuecomment-1119570468 It seems reasonable, though I hesitate to clutter the UI further. How frequently would people need to check this, and how would they currently check these settings? Is it that hard that

[GitHub] [spark] LorenzoMartini commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

2022-05-06 Thread GitBox
LorenzoMartini commented on code in PR #36457: URL: https://github.com/apache/spark/pull/36457#discussion_r866783088 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala: ## @@ -642,7 +642,7 @@ case class RegExpReplace(subject: Express

[GitHub] [spark] srowen commented on a diff in pull request #36424: [SPARK-39083][CORE] : Fix race condition between update and clean app data

2022-05-06 Thread GitBox
srowen commented on code in PR #36424: URL: https://github.com/apache/spark/pull/36424#discussion_r866785032 ## core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala: ## @@ -631,37 +631,38 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock:

[GitHub] [spark] LorenzoMartini commented on a diff in pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

2022-05-06 Thread GitBox
LorenzoMartini commented on code in PR #36457: URL: https://github.com/apache/spark/pull/36457#discussion_r866789627 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala: ## @@ -642,7 +642,7 @@ case class RegExpReplace(subject: Express

[GitHub] [spark] zhengruifeng opened a new pull request, #36469: [SPARK-39114][ML] ml.optim.aggregator avoid re-allocating buffers

2022-05-06 Thread GitBox
zhengruifeng opened a new pull request, #36469: URL: https://github.com/apache/spark/pull/36469 ### What changes were proposed in this pull request? ml.optim.aggregator avoid re-allocating buffers ### Why are the changes needed? in SPARK-30661 (KMeans blockify input v

[GitHub] [spark] tanvn commented on a diff in pull request #36424: [SPARK-39083][CORE] : Fix race condition between update and clean app data

2022-05-06 Thread GitBox
tanvn commented on code in PR #36424: URL: https://github.com/apache/spark/pull/36424#discussion_r866807615 ## core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala: ## @@ -631,37 +631,38 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock:

[GitHub] [spark] ulysses-you commented on a diff in pull request #36468: [SPARK-39106][SQL] Correct conditional expression constant folding

2022-05-06 Thread GitBox
ulysses-you commented on code in PR #36468: URL: https://github.com/apache/spark/pull/36468#discussion_r866809228 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -52,22 +52,28 @@ object ConstantFolding extends Rule[LogicalPlan] {

[GitHub] [spark] cloud-fan commented on a diff in pull request #36468: [SPARK-39106][SQL] Correct conditional expression constant folding

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36468: URL: https://github.com/apache/spark/pull/36468#discussion_r866823202 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala: ## @@ -52,22 +52,28 @@ object ConstantFolding extends Rule[LogicalPlan] {

[GitHub] [spark] cloud-fan commented on a diff in pull request #36417: [SPARK-39057][SQL] Offset could work without Limit

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36417: URL: https://github.com/apache/spark/pull/36417#discussion_r866824230 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala: ## @@ -1871,21 +1874,18 @@ object EliminateLimits extends Rule[LogicalPlan] {

[GitHub] [spark] cloud-fan commented on a diff in pull request #36417: [SPARK-39057][SQL] Offset could work without Limit

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36417: URL: https://github.com/apache/spark/pull/36417#discussion_r866827428 ## sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala: ## @@ -207,32 +230,47 @@ case class GlobalLimitExec(limit: Int, child: SparkPlan) extends BaseL

[GitHub] [spark] cloud-fan commented on a diff in pull request #36417: [SPARK-39057][SQL] Offset could work without Limit

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36417: URL: https://github.com/apache/spark/pull/36417#discussion_r866827813 ## sql/core/src/main/scala/org/apache/spark/sql/execution/limit.scala: ## @@ -250,17 +288,20 @@ case class TakeOrderedAndProjectExec( limit: Int, sortOrder:

[GitHub] [spark] cloud-fan commented on a diff in pull request #36438: [SPARK-39092][SQL] Propagate Empty Partitions

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36438: URL: https://github.com/apache/spark/pull/36438#discussion_r866837234 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveSparkPlanExec.scala: ## @@ -134,7 +134,8 @@ case class AdaptiveSparkPlanExec( CoalesceS

[GitHub] [spark] cloud-fan commented on a diff in pull request #36438: [SPARK-39092][SQL] Propagate Empty Partitions

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36438: URL: https://github.com/apache/spark/pull/36438#discussion_r866839375 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/PropagateEmptyPartitions.scala: ## @@ -0,0 +1,185 @@ +/* + * Licensed to the Apache Software Foundat

[GitHub] [spark] cloud-fan commented on a diff in pull request #36265: [SPARK-38951][SQL] Aggregate aliases override field names in ResolveAggregateFunctions

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36265: URL: https://github.com/apache/spark/pull/36265#discussion_r866842272 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala: ## @@ -1176,4 +1176,13 @@ class AnalysisSuite extends AnalysisTest with Matc

[GitHub] [spark] cloud-fan commented on a diff in pull request #36265: [SPARK-38951][SQL] Aggregate aliases override field names in ResolveAggregateFunctions

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36265: URL: https://github.com/apache/spark/pull/36265#discussion_r866856072 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2563,7 +2563,23 @@ class Analyzer(override val catalogManager: CatalogMan

[GitHub] [spark] cloud-fan commented on a diff in pull request #36265: [SPARK-38951][SQL] Aggregate aliases override field names in ResolveAggregateFunctions

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36265: URL: https://github.com/apache/spark/pull/36265#discussion_r866859656 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2563,7 +2563,23 @@ class Analyzer(override val catalogManager: CatalogMan

[GitHub] [spark] cloud-fan commented on a diff in pull request #36265: [SPARK-38951][SQL] Aggregate aliases override field names in ResolveAggregateFunctions

2022-05-06 Thread GitBox
cloud-fan commented on code in PR #36265: URL: https://github.com/apache/spark/pull/36265#discussion_r866863003 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -2563,7 +2563,23 @@ class Analyzer(override val catalogManager: CatalogMan

[GitHub] [spark] AngersZhuuuu commented on pull request #36462: [SPARK-39110][WEBUI] Add metrics properties to environment tab

2022-05-06 Thread GitBox
AngersZhuuuu commented on PR #36462: URL: https://github.com/apache/spark/pull/36462#issuecomment-1119664833 > It seems reasonable, though I hesitate to clutter the UI further. How frequently would people need to check this, and how would they currently check these settings? is it that hard

[GitHub] [spark] LuciferYang opened a new pull request, #36470: [SPARK-39116][SQL] Replcace double negation in `exists` with `forall`

2022-05-06 Thread GitBox
LuciferYang opened a new pull request, #36470: URL: https://github.com/apache/spark/pull/36470 ### What changes were proposed in this pull request? This is a minor code simplification: **Before** ```scala !Seq(1, 2).exists(x => !condition(x)) ``` **After** ``

[GitHub] [spark] LuciferYang commented on pull request #36470: [SPARK-39116][SQL] Replcace double negation in `exists` with `forall`

2022-05-06 Thread GitBox
LuciferYang commented on PR #36470: URL: https://github.com/apache/spark/pull/36470#issuecomment-1119678618 Only these cases were found -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] pan3793 opened a new pull request, #36471: [MINOR][INFRA][3.3] Add ANTLR generated files to .gitignore

2022-05-06 Thread GitBox
pan3793 opened a new pull request, #36471: URL: https://github.com/apache/spark/pull/36471 ### What changes were proposed in this pull request? Backport https://github.com/apache/spark/pull/35838 to 3.3 ### Why are the changes needed? ### Does this PR intr

[GitHub] [spark] pan3793 commented on pull request #36471: [MINOR][INFRA][3.3] Add ANTLR generated files to .gitignore

2022-05-06 Thread GitBox
pan3793 commented on PR #36471: URL: https://github.com/apache/spark/pull/36471#issuecomment-1119687805 cc @MaxGekk -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsu

[GitHub] [spark] LorenzoMartini commented on pull request #36457: [SPARK-39107][SQL] Account for empty string input in regex replace

2022-05-06 Thread GitBox
LorenzoMartini commented on PR #36457: URL: https://github.com/apache/spark/pull/36457#issuecomment-1119707274 Thanks @srowen! Sounds good to me, the constraint would just be a nit and I'm happy to keep it as is to avoid additional complications / excessive fixing. Can we merge this :)? -

[GitHub] [spark] AmplabJenkins commented on pull request #36471: [MINOR][INFRA][3.3] Add ANTLR generated files to .gitignore

2022-05-06 Thread GitBox
AmplabJenkins commented on PR #36471: URL: https://github.com/apache/spark/pull/36471#issuecomment-1119723863 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] cloud-fan opened a new pull request, #36472: [SPARK-39117][SQL][TEST] Do not include number of functions in sql-expression-schema.md

2022-05-06 Thread GitBox
cloud-fan opened a new pull request, #36472: URL: https://github.com/apache/spark/pull/36472 ### What changes were proposed in this pull request? `sql-expression-schema.md` is a golden file for tracking purposes: whenever we change a function or add a new function, this file m

[GitHub] [spark] cloud-fan commented on pull request #36472: [SPARK-39117][SQL][TEST] Do not include number of functions in sql-expression-schema.md

2022-05-06 Thread GitBox
cloud-fan commented on PR #36472: URL: https://github.com/apache/spark/pull/36472#issuecomment-1119738256 @MaxGekk @viirya @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

[GitHub] [spark] dtenedor commented on pull request #36122: [SPARK-38838][SQL] Support ALTER TABLE ALTER COLUMN commands with DEFAULT values

2022-05-06 Thread GitBox
dtenedor commented on PR #36122: URL: https://github.com/apache/spark/pull/36122#issuecomment-1119749302 > LGTM overall. @dtenedor Could you take a look at the test failures? @gengliangwang SG, this is done. ✔️ -- This is an automated message from the Apache Git Service. To res

[GitHub] [spark] gengliangwang commented on pull request #36122: [SPARK-38838][SQL] Support ALTER TABLE ALTER COLUMN commands with DEFAULT values

2022-05-06 Thread GitBox
gengliangwang commented on PR #36122: URL: https://github.com/apache/spark/pull/36122#issuecomment-1119755451 Thanks, merging to master -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

[GitHub] [spark] gengliangwang closed pull request #36122: [SPARK-38838][SQL] Support ALTER TABLE ALTER COLUMN commands with DEFAULT values

2022-05-06 Thread GitBox
gengliangwang closed pull request #36122: [SPARK-38838][SQL] Support ALTER TABLE ALTER COLUMN commands with DEFAULT values URL: https://github.com/apache/spark/pull/36122 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

[GitHub] [spark] dongjoon-hyun commented on pull request #36455: [SPARK-39105][SQL] Add ConditionalExpression trait

2022-05-06 Thread GitBox
dongjoon-hyun commented on PR #36455: URL: https://github.com/apache/spark/pull/36455#issuecomment-1119758119 +1 for @cloud-fan 's backporting decision. Also, cc @MaxGekk -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] dongjoon-hyun commented on pull request #36471: [MINOR][INFRA][3.3] Add ANTLR generated files to .gitignore

2022-05-06 Thread GitBox
dongjoon-hyun commented on PR #36471: URL: https://github.com/apache/spark/pull/36471#issuecomment-1119759764 Technically, this is not a release blocker. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [spark] dongjoon-hyun closed pull request #36471: [MINOR][INFRA][3.3] Add ANTLR generated files to .gitignore

2022-05-06 Thread GitBox
dongjoon-hyun closed pull request #36471: [MINOR][INFRA][3.3] Add ANTLR generated files to .gitignore URL: https://github.com/apache/spark/pull/36471 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] dongjoon-hyun commented on pull request #36471: [MINOR][INFRA][3.3] Add ANTLR generated files to .gitignore

2022-05-06 Thread GitBox
dongjoon-hyun commented on PR #36471: URL: https://github.com/apache/spark/pull/36471#issuecomment-1119761790 In addition, if we want to backport this, we had better backport the original one with the original authorship. Let me close this PR because of that issue. -- This is an automate

[GitHub] [spark] pan3793 commented on pull request #36471: [MINOR][INFRA][3.3] Add ANTLR generated files to .gitignore

2022-05-06 Thread GitBox
pan3793 commented on PR #36471: URL: https://github.com/apache/spark/pull/36471#issuecomment-1119772004 > In addition, if we want to backport this, we had better backport the original one with the original authorship. Let me close this PR because of that issue. Would you please backp

[GitHub] [spark] gengliangwang commented on a diff in pull request #36415: [SPARK-39078][SQL] Support UPDATE commands with DEFAULT values

2022-05-06 Thread GitBox
gengliangwang commented on code in PR #36415: URL: https://github.com/apache/spark/pull/36415#discussion_r866976275 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveDefaultColumns.scala: ## @@ -98,44 +101,74 @@ case class ResolveDefaultColumns(

[GitHub] [spark] gengliangwang commented on a diff in pull request #36415: [SPARK-39078][SQL] Support UPDATE commands with DEFAULT values

2022-05-06 Thread GitBox
gengliangwang commented on code in PR #36415: URL: https://github.com/apache/spark/pull/36415#discussion_r866977451 ## sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala: ## @@ -1035,6 +1060,41 @@ class PlanResolutionSuite extends AnalysisTe

[GitHub] [spark] gengliangwang commented on a diff in pull request #36415: [SPARK-39078][SQL] Support UPDATE commands with DEFAULT values

2022-05-06 Thread GitBox
gengliangwang commented on code in PR #36415: URL: https://github.com/apache/spark/pull/36415#discussion_r866977641 ## sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala: ## @@ -1035,6 +1060,41 @@ class PlanResolutionSuite extends AnalysisTe

[GitHub] [spark] mridulm opened a new pull request, #36473: [SPARK-37618][CORE][Followup] Fix test failure

2022-05-06 Thread GitBox
mridulm opened a new pull request, #36473: URL: https://github.com/apache/spark/pull/36473 ### What changes were proposed in this pull request? Fix test failure in build. Use jnr to change the umask of the process to be more restrictive - so that the test can validate the behavior cha

[GitHub] [spark] mridulm commented on pull request #36473: [SPARK-37618][CORE][Followup] Fix test failure

2022-05-06 Thread GitBox
mridulm commented on PR #36473: URL: https://github.com/apache/spark/pull/36473#issuecomment-1119826624 Tested locally, will wait for GA to also complete. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t
