[GitHub] [spark] CHENXCHEN commented on pull request #36070: [SPARK-31675][CORE] Fix rename and delete files with different filesystem

2022-04-05 Thread GitBox
CHENXCHEN commented on PR #36070: URL: https://github.com/apache/spark/pull/36070#issuecomment-1088339027 cc @cloud-fan could you help take a look when you have time? Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to G

[GitHub] [spark] Kimahriman commented on a diff in pull request #35959: [SPARK-38640][CORE] Fix NPE with memory-only cache blocks and RDD fetching

2022-04-05 Thread GitBox
Kimahriman commented on code in PR #35959: URL: https://github.com/apache/spark/pull/35959#discussion_r842433930 ## core/src/test/scala/org/apache/spark/ExternalShuffleServiceSuite.scala: ## @@ -255,4 +255,22 @@ class ExternalShuffleServiceSuite extends ShuffleSuite with Before

[GitHub] [spark] Kimahriman commented on a diff in pull request #35959: [SPARK-38640][CORE] Fix NPE with memory-only cache blocks and RDD fetching

2022-04-05 Thread GitBox
Kimahriman commented on code in PR #35959: URL: https://github.com/apache/spark/pull/35959#discussion_r842436352 ## core/src/test/scala/org/apache/spark/ExternalShuffleServiceSuite.scala: ## @@ -255,4 +255,22 @@ class ExternalShuffleServiceSuite extends ShuffleSuite with Before

[GitHub] [spark] lyssg commented on pull request #35667: [K8S] Avoid possible errors due to incorrect file size or type supplied in hadoop conf

2022-04-05 Thread GitBox
lyssg commented on PR #35667: URL: https://github.com/apache/spark/pull/35667#issuecomment-1088408000 > Could you address the previous comments, @lyssg ? In addition, please enable `GitHub Action` in your Apache Spark fork. Apache Spark community is using your GitHub resource quota and it's

[GitHub] [spark] AmplabJenkins commented on pull request #36070: [SPARK-31675][CORE] Fix rename and delete files with different filesystem

2022-04-05 Thread GitBox
AmplabJenkins commented on PR #36070: URL: https://github.com/apache/spark/pull/36070#issuecomment-1088457342 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #36069: [SPARK-38767][SQL] Support `ignoreCorruptFiles` and `ignoreMissingFiles` in Data Source options

2022-04-05 Thread GitBox
AmplabJenkins commented on PR #36069: URL: https://github.com/apache/spark/pull/36069#issuecomment-1088457399 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #36066: Implement the to_number and try_to_number SQL functions according to a new specification

2022-04-05 Thread GitBox
AmplabJenkins commented on PR #36066: URL: https://github.com/apache/spark/pull/36066#issuecomment-1088534252 Can one of the admins verify this patch?

[GitHub] [spark] ivoson commented on a diff in pull request #36064: [SPARK-38108][SQL] Use error classes in the compilation errors of UDF/UDAF

2022-04-05 Thread GitBox
ivoson commented on code in PR #36064: URL: https://github.com/apache/spark/pull/36064#discussion_r842671046 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -2264,17 +2258,20 @@ object QueryCompilationErrors { } def usingUnt

[GitHub] [spark] AmplabJenkins commented on pull request #36064: [SPARK-38108][SQL] Use error classes in the compilation errors of UDF/UDAF

2022-04-05 Thread GitBox
AmplabJenkins commented on PR #36064: URL: https://github.com/apache/spark/pull/36064#issuecomment-1088608091 Can one of the admins verify this patch?

[GitHub] [spark] MaxGekk commented on a diff in pull request #36064: [SPARK-38108][SQL] Use error classes in the compilation errors of UDF/UDAF

2022-04-05 Thread GitBox
MaxGekk commented on code in PR #36064: URL: https://github.com/apache/spark/pull/36064#discussion_r842764065 ## sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala: ## @@ -2264,17 +2258,20 @@ object QueryCompilationErrors { } def usingUn

[GitHub] [spark] LucaCanali commented on a diff in pull request #33559: [SPARK-34265][PYTHON][SQL] Instrument Python UDFs using SQL metrics

2022-04-05 Thread GitBox
LucaCanali commented on code in PR #33559: URL: https://github.com/apache/spark/pull/33559#discussion_r842774988 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala: ## @@ -89,7 +89,8 @@ case class FlatMapGroupsInPandasExec(

[GitHub] [spark] weand commented on pull request #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .

2022-04-05 Thread GitBox
weand commented on PR #25201: URL: https://github.com/apache/spark/pull/25201#issuecomment-1088694580 :+1: to get the PR reopened or get the reasons for why not ?

[GitHub] [spark] tgravescs commented on pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

2022-04-05 Thread GitBox
tgravescs commented on PR #34622: URL: https://github.com/apache/spark/pull/34622#issuecomment-1088731095 sorry, I missed your response. After looking some more the list of jobs can get very long as well. I'm fine with leaving the list of stages as well. Do you know how long t

[GitHub] [spark] hi-zir opened a new pull request, #36071: [SPARK-37398][PYTHON][ML] Inline type hints for pyspark.ml.classification

2022-04-05 Thread GitBox
hi-zir opened a new pull request, #36071: URL: https://github.com/apache/spark/pull/36071 ### What changes were proposed in this pull request Migration of type hints for pyspark.ml.evaluation from stub file to inline type hints. ### Why are the changes needed? Part of mi

[GitHub] [spark] zero323 closed pull request #35067: [SPARK-37423][PYTHON] Inline type hints for fpm.py in python/pyspark/mllib

2022-04-05 Thread GitBox
zero323 closed pull request #35067: [SPARK-37423][PYTHON] Inline type hints for fpm.py in python/pyspark/mllib URL: https://github.com/apache/spark/pull/35067

[GitHub] [spark] zero323 commented on pull request #35067: [SPARK-37423][PYTHON] Inline type hints for fpm.py in python/pyspark/mllib

2022-04-05 Thread GitBox
zero323 commented on PR #35067: URL: https://github.com/apache/spark/pull/35067#issuecomment-1088765965 Merged into master and branch-3.3.

[GitHub] [spark] tgravescs commented on a diff in pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

2022-04-05 Thread GitBox
tgravescs commented on code in PR #34622: URL: https://github.com/apache/spark/pull/34622#discussion_r842872337 ## sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala: ## @@ -202,6 +227,47 @@ class SQLAppStatusListener( } } + /* Connec

[GitHub] [spark] tgravescs commented on a diff in pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

2022-04-05 Thread GitBox
tgravescs commented on code in PR #34622: URL: https://github.com/apache/spark/pull/34622#discussion_r842875274 ## sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala: ## @@ -202,6 +227,47 @@ class SQLAppStatusListener( } } + /* Connec

[GitHub] [spark] tgravescs commented on a diff in pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

2022-04-05 Thread GitBox
tgravescs commented on code in PR #34622: URL: https://github.com/apache/spark/pull/34622#discussion_r842880025 ## sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala: ## @@ -202,6 +227,47 @@ class SQLAppStatusListener( } } + /* Connec

[GitHub] [spark] tgravescs commented on a diff in pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

2022-04-05 Thread GitBox
tgravescs commented on code in PR #34622: URL: https://github.com/apache/spark/pull/34622#discussion_r842897202 ## sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala: ## @@ -202,6 +227,47 @@ class SQLAppStatusListener( } } + /* Connec

[GitHub] [spark] CEikermann commented on pull request #33630: [SPARK-36408][BUILD] Upgrade json4s to 4.0.3

2022-04-05 Thread GitBox
CEikermann commented on PR #33630: URL: https://github.com/apache/spark/pull/33630#issuecomment-1088902166 @sarutak do you think we can update json4s to 4.x in spark ?

[GitHub] [spark] minyyy commented on a diff in pull request #35864: [SPARK-38531][SQL] Fix the condition of "Prune unrequired child index" branch of ColumnPruning

2022-04-05 Thread GitBox
minyyy commented on code in PR #35864: URL: https://github.com/apache/spark/pull/35864#discussion_r842964417 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala: ## @@ -312,6 +312,32 @@ object NestedColumnAliasing { } } +object

[GitHub] [spark] minyyy commented on pull request #35864: [SPARK-38531][SQL] Fix the condition of "Prune unrequired child index" branch of ColumnPruning

2022-04-05 Thread GitBox
minyyy commented on PR #35864: URL: https://github.com/apache/spark/pull/35864#issuecomment-1088907002 @cloud-fan Could you help reviewing this change please? Thanks.

[GitHub] [spark] hi-zir closed pull request #36071: [SPARK-37398][PYTHON][ML] Inline type hints for pyspark.ml.classification

2022-04-05 Thread GitBox
hi-zir closed pull request #36071: [SPARK-37398][PYTHON][ML] Inline type hints for pyspark.ml.classification URL: https://github.com/apache/spark/pull/36071

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36034: [SPARK-38755][PYTHON][TEST] Add file to address missing pandas general functions

2022-04-05 Thread GitBox
xinrong-databricks commented on code in PR #36034: URL: https://github.com/apache/spark/pull/36034#discussion_r843044086 ## python/pyspark/pandas/__init__.py: ## @@ -151,3 +154,13 @@ def _auto_patch_pandas() -> None: from pyspark.pandas.config import get_option, options, option

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36028: [SPARK-38726][PYTHON] Support `how` parameter of `MultiIndex.dropna`

2022-04-05 Thread GitBox
xinrong-databricks commented on code in PR #36028: URL: https://github.com/apache/spark/pull/36028#discussion_r843075944 ## python/pyspark/pandas/indexes/base.py: ## @@ -1141,10 +1141,21 @@ def is_type_compatible(self, kind: str) -> bool: """ return kind == sel

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36028: [SPARK-38726][PYTHON] Support `how` parameter of `MultiIndex.dropna`

2022-04-05 Thread GitBox
xinrong-databricks commented on code in PR #36028: URL: https://github.com/apache/spark/pull/36028#discussion_r843078410 ## python/pyspark/pandas/tests/indexes/test_base.py: ## @@ -374,12 +374,23 @@ def test_drop_duplicates(self): ) def test_dropna(self): -

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36041: [SPARK-38765][PYTHON] Implement `inplace` parameter of `Series.clip`

2022-04-05 Thread GitBox
xinrong-databricks commented on code in PR #36041: URL: https://github.com/apache/spark/pull/36041#discussion_r843080608 ## python/pyspark/pandas/series.py: ## @@ -2162,12 +2169,26 @@ def clip(self, lower: Union[float, int] = None, upper: Union[float, int] = None) Ex

[GitHub] [spark] bersprockets opened a new pull request, #36072: [SPARK-38666][SQL] Add missing aggregate filter checks

2022-04-05 Thread GitBox
bersprockets opened a new pull request, #36072: URL: https://github.com/apache/spark/pull/36072 ### What changes were proposed in this pull request? Add checks in `ResolveFunctions#validateFunction` to ensure the following about each aggregate filter: - has a datatype of boolea

[GitHub] [spark] xinrong-databricks commented on a diff in pull request #36041: [SPARK-38765][PYTHON] Implement `inplace` parameter of `Series.clip`

2022-04-05 Thread GitBox
xinrong-databricks commented on code in PR #36041: URL: https://github.com/apache/spark/pull/36041#discussion_r843088287 ## python/pyspark/pandas/series.py: ## @@ -2154,6 +2159,8 @@ def clip(self, lower: Union[float, int] = None, upper: Union[float, int] = None) Mi

[GitHub] [spark] minyyy commented on pull request #35866: [SPARK-38530][SQL] Fix a bug that GeneratorNestedColumnAliasing can be incorrectly applied to some expressions

2022-04-05 Thread GitBox
minyyy commented on PR #35866: URL: https://github.com/apache/spark/pull/35866#issuecomment-1089096018 @cloud-fan Could you help review this PR please? Thanks.

[GitHub] [spark] minyyy commented on pull request #35850: [SPARK-38529][SQL] Prevent GeneratorNestedColumnAliasing to be applied to non-Explode generators

2022-04-05 Thread GitBox
minyyy commented on PR #35850: URL: https://github.com/apache/spark/pull/35850#issuecomment-1089098919 @cloud-fan Could you help with reviewing this PR please? Thanks.

[GitHub] [spark] gatorsmile commented on pull request #36066: Implement the to_number and try_to_number SQL functions according to a new specification

2022-04-05 Thread GitBox
gatorsmile commented on PR #36066: URL: https://github.com/apache/spark/pull/36066#issuecomment-1089173352 @cloud-fan @gengliangwang

[GitHub] [spark] anishshri-db opened a new pull request, #36073: [SPARK-38787] [SS] Replace found value with non-null element in the remaining list for key and remove remaining null elements from valu

2022-04-05 Thread GitBox
anishshri-db opened a new pull request, #36073: URL: https://github.com/apache/spark/pull/36073 If we find null at end for found value at non-last index, we are currently not removing and swapping the found value. With this change, we will find the first non-null value from end and swap cur

[GitHub] [spark] anishshri-db commented on pull request #36073: [SPARK-38787] [SS] Replace found value with non-null element in the remaining list for key and remove remaining null elements from value

2022-04-05 Thread GitBox
anishshri-db commented on PR #36073: URL: https://github.com/apache/spark/pull/36073#issuecomment-1089226396 @HeartSaVioR - Could you please review ? Thx

[GitHub] [spark] MaxGekk opened a new pull request, #36074: [WIP][SPARK-38791][SQL] Output parameter values of error classes in the SQL style

2022-04-05 Thread GitBox
MaxGekk opened a new pull request, #36074: URL: https://github.com/apache/spark/pull/36074 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this pa

[GitHub] [spark] rdblue commented on a diff in pull request #35636: [SPARK-31357][SQL][WIP] Catalog API for view metadata

2022-04-05 Thread GitBox
rdblue commented on code in PR #35636: URL: https://github.com/apache/spark/pull/35636#discussion_r843217576 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -44,7 +44,7 @@ import org.apache.spark.sql.catalyst.trees.{AlwaysProcess, Cur

[GitHub] [spark] rdblue commented on a diff in pull request #35636: [SPARK-31357][SQL][WIP] Catalog API for view metadata

2022-04-05 Thread GitBox
rdblue commented on code in PR #35636: URL: https://github.com/apache/spark/pull/35636#discussion_r843217875 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NoSuchItemException.scala: ## @@ -37,8 +37,12 @@ case class NoSuchNamespaceException( override

[GitHub] [spark] rdblue commented on a diff in pull request #35636: [SPARK-31357][SQL][WIP] Catalog API for view metadata

2022-04-05 Thread GitBox
rdblue commented on code in PR #35636: URL: https://github.com/apache/spark/pull/35636#discussion_r843220351 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [spark] rdblue commented on a diff in pull request #35636: [SPARK-31357][SQL][WIP] Catalog API for view metadata

2022-04-05 Thread GitBox
rdblue commented on code in PR #35636: URL: https://github.com/apache/spark/pull/35636#discussion_r843223349 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [spark] rdblue commented on a diff in pull request #35636: [SPARK-31357][SQL][WIP] Catalog API for view metadata

2022-04-05 Thread GitBox
rdblue commented on code in PR #35636: URL: https://github.com/apache/spark/pull/35636#discussion_r843223822 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [spark] jzhuge commented on a diff in pull request #35636: [SPARK-31357][SQL][WIP] Catalog API for view metadata

2022-04-05 Thread GitBox
jzhuge commented on code in PR #35636: URL: https://github.com/apache/spark/pull/35636#discussion_r843227037 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -44,7 +44,7 @@ import org.apache.spark.sql.catalyst.trees.{AlwaysProcess, Cur

[GitHub] [spark] jzhuge commented on a diff in pull request #35636: [SPARK-31357][SQL][WIP] Catalog API for view metadata

2022-04-05 Thread GitBox
jzhuge commented on code in PR #35636: URL: https://github.com/apache/spark/pull/35636#discussion_r843231600 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NoSuchItemException.scala: ## @@ -37,8 +37,12 @@ case class NoSuchNamespaceException( override

[GitHub] [spark] rdblue commented on a diff in pull request #35636: [SPARK-31357][SQL][WIP] Catalog API for view metadata

2022-04-05 Thread GitBox
rdblue commented on code in PR #35636: URL: https://github.com/apache/spark/pull/35636#discussion_r843233957 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -44,7 +44,7 @@ import org.apache.spark.sql.catalyst.trees.{AlwaysProcess, Cur

[GitHub] [spark] jzhuge commented on a diff in pull request #35636: [SPARK-31357][SQL][WIP] Catalog API for view metadata

2022-04-05 Thread GitBox
jzhuge commented on code in PR #35636: URL: https://github.com/apache/spark/pull/35636#discussion_r843244976 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/NoSuchItemException.scala: ## @@ -37,8 +37,12 @@ case class NoSuchNamespaceException( override

[GitHub] [spark] minyyy commented on pull request #35850: [SPARK-38529][SQL] Prevent GeneratorNestedColumnAliasing to be applied to non-Explode generators

2022-04-05 Thread GitBox
minyyy commented on PR #35850: URL: https://github.com/apache/spark/pull/35850#issuecomment-1089337333 Changed back to my previous implementation - Now I still check `.isInstanceOf[ExplodeBase]` in the case match branch instead of modifying `canPruneGenerator` since we do have tests coverin

[GitHub] [spark] kazuyukitanimura opened a new pull request, #36075: [SPARK-38786][SQL][TEST] Bug in StatisticsSuite 'change stats after add/drop partition command'

2022-04-05 Thread GitBox
kazuyukitanimura opened a new pull request, #36075: URL: https://github.com/apache/spark/pull/36075 ### What changes were proposed in this pull request? https://github.com/apache/spark/blob/cbffc12f90e45d33e651e38cf886d7ab4bcf96da/sql/hive/src/test/scala/org/apache/spark/sql/hive/Statisti

[GitHub] [spark] kazuyukitanimura commented on pull request #36075: [SPARK-38786][SQL][TEST] Bug in StatisticsSuite 'change stats after add/drop partition command'

2022-04-05 Thread GitBox
kazuyukitanimura commented on PR #36075: URL: https://github.com/apache/spark/pull/36075#issuecomment-1089368288 cc @MaxGekk @HyukjinKwon @sunchao

[GitHub] [spark] jzhuge commented on a diff in pull request #35636: [SPARK-31357][SQL][WIP] Catalog API for view metadata

2022-04-05 Thread GitBox
jzhuge commented on code in PR #35636: URL: https://github.com/apache/spark/pull/35636#discussion_r843274284 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/ViewCatalog.java: ## @@ -0,0 +1,234 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] [spark] xinrong-databricks opened a new pull request, #36076: [SPARK-38793][PYTHON] Support `return_indexer` parameter of `Index/MultiIndex.sort_values`

2022-04-05 Thread GitBox
xinrong-databricks opened a new pull request, #36076: URL: https://github.com/apache/spark/pull/36076 ### What changes were proposed in this pull request? Support `return_indexer` parameter of `Index/MultiIndex.sort_values`. Note that this method returns indexer as a pandas-on-Spark

[GitHub] [spark] alex-balikov commented on a diff in pull request #36073: [SPARK-38787] [SS] Replace found value with non-null element in the remaining list for key and remove remaining null elements

2022-04-05 Thread GitBox
alex-balikov commented on code in PR #36073: URL: https://github.com/apache/spark/pull/36073#discussion_r843296470 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala: ## @@ -96,6 +96,15 @@ class SymmetricHashJoinStateMan

[GitHub] [spark] anishshri-db commented on a diff in pull request #36073: [SPARK-38787] [SS] Replace found value with non-null element in the remaining list for key and remove remaining null elements

2022-04-05 Thread GitBox
anishshri-db commented on code in PR #36073: URL: https://github.com/apache/spark/pull/36073#discussion_r843316289 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala: ## @@ -96,6 +96,15 @@ class SymmetricHashJoinStateMan

[GitHub] [spark] anishshri-db commented on a diff in pull request #36073: [SPARK-38787] [SS] Replace found value with non-null element in the remaining list for key and remove remaining null elements

2022-04-05 Thread GitBox
anishshri-db commented on code in PR #36073: URL: https://github.com/apache/spark/pull/36073#discussion_r843317644 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala: ## @@ -272,12 +289,36 @@ class SymmetricHashJoinState

[GitHub] [spark] anishshri-db commented on pull request #36073: [SPARK-38787] [SS] Replace found value with non-null element in the remaining list for key and remove remaining null elements from value

2022-04-05 Thread GitBox
anishshri-db commented on PR #36073: URL: https://github.com/apache/spark/pull/36073#issuecomment-1089459095 > Probably clarify in the PR comment that this relates to stream-stream joins. Done - Updated the PR title as well as description noting that the change relates to stream-stre

[GitHub] [spark] alex-balikov commented on a diff in pull request #36073: [SPARK-38787] [SS] Replace found value with non-null element in the remaining list for key and remove remaining null elements

2022-04-05 Thread GitBox
alex-balikov commented on code in PR #36073: URL: https://github.com/apache/spark/pull/36073#discussion_r843317972 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala: ## @@ -272,12 +289,36 @@ class SymmetricHashJoinState

[GitHub] [spark] anishshri-db commented on a diff in pull request #36073: [SPARK-38787] [SS] Replace found value with non-null element in the remaining list for key and remove remaining null elements

2022-04-05 Thread GitBox
anishshri-db commented on code in PR #36073: URL: https://github.com/apache/spark/pull/36073#discussion_r843327523 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala: ## @@ -272,12 +289,36 @@ class SymmetricHashJoinState

[GitHub] [spark] dtenedor opened a new pull request, #36077: [SPARK-38795][SQL] Support INSERT INTO user specified column lists with DEFAULT values

2022-04-05 Thread GitBox
dtenedor opened a new pull request, #36077: URL: https://github.com/apache/spark/pull/36077 ### What changes were proposed in this pull request? Support INSERT INTO commands with user specified column lists with DEFAULT values. For example: ``` CREATE TABLE t (x INT D

[GitHub] [spark] github-actions[bot] commented on pull request #34872: [SPARK-37617][SQL][HIVE] In CTAS, Replace Parquet name columns that have not alias

2022-04-05 Thread GitBox
github-actions[bot] commented on PR #34872: URL: https://github.com/apache/spark/pull/34872#issuecomment-1089562053 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] beliefer commented on pull request #36039: [SPARK-38761][SQL] DS V2 supports push down misc non-aggregate functions

2022-04-05 Thread GitBox
beliefer commented on PR #36039: URL: https://github.com/apache/spark/pull/36039#issuecomment-1089567136 > I have a general question: what are the criteria of the functions that can be pushed down to data source? According to discussion offline, we plan to support the ANSI functions

[GitHub] [spark] beliefer commented on a diff in pull request #36043: [SPARK-38768][SQL] Remove `Limit` from plan if complete push down limit to data source.

2022-04-05 Thread GitBox
beliefer commented on code in PR #36043: URL: https://github.com/apache/spark/pull/36043#discussion_r843371002 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2ScanRelationPushDown.scala: ## @@ -380,27 +380,32 @@ object V2ScanRelationPushDown extends Ru

[GitHub] [spark] sunchao closed pull request #36075: [SPARK-38786][SQL][TEST] Bug in StatisticsSuite 'change stats after add/drop partition command'

2022-04-05 Thread GitBox
sunchao closed pull request #36075: [SPARK-38786][SQL][TEST] Bug in StatisticsSuite 'change stats after add/drop partition command' URL: https://github.com/apache/spark/pull/36075

[GitHub] [spark] sunchao commented on pull request #36075: [SPARK-38786][SQL][TEST] Bug in StatisticsSuite 'change stats after add/drop partition command'

2022-04-05 Thread GitBox
sunchao commented on PR #36075: URL: https://github.com/apache/spark/pull/36075#issuecomment-1089591869 Committed to master, thanks.

[GitHub] [spark] beliefer commented on a diff in pull request #35041: [SPARK-37691][SQL] Support ANSI Aggregation Function: `percentile_disc`

2022-04-05 Thread GitBox
beliefer commented on code in PR #35041: URL: https://github.com/apache/spark/pull/35041#discussion_r843374143 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -226,10 +226,11 @@ trait CheckAnalysis extends PredicateHelper with Lo

[GitHub] [spark] beliefer commented on a diff in pull request #35041: [SPARK-37691][SQL] Support ANSI Aggregation Function: `percentile_disc`

2022-04-05 Thread GitBox
beliefer commented on code in PR #35041: URL: https://github.com/apache/spark/pull/35041#discussion_r843375329 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/percentiles.scala: ## @@ -325,3 +339,154 @@ case class Percentile( frequencyExpr

[GitHub] [spark] HyukjinKwon commented on pull request #36066: Implement the to_number and try_to_number SQL functions according to a new specification

2022-04-05 Thread GitBox
HyukjinKwon commented on PR #36066: URL: https://github.com/apache/spark/pull/36066#issuecomment-1089621925 @dtenedor shall we file a JIRA and link it to the PR title BTW?

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #36073: [SPARK-38787] [SS] Replace found value with non-null element in the remaining list for key and remove remaining null elements f

2022-04-05 Thread GitBox
HeartSaVioR commented on code in PR #36073: URL: https://github.com/apache/spark/pull/36073#discussion_r843385001 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala: ## @@ -272,12 +289,36 @@ class SymmetricHashJoinStateM

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #36073: [SPARK-38787] [SS] Replace found value with non-null element in the remaining list for key and remove remaining null elements f

2022-04-05 Thread GitBox
HeartSaVioR commented on code in PR #36073: URL: https://github.com/apache/spark/pull/36073#discussion_r843386109 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala: ## @@ -272,12 +289,36 @@ class SymmetricHashJoinStateM

[GitHub] [spark] maryannxue commented on a diff in pull request #36011: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE Optimizer

2022-04-05 Thread GitBox
maryannxue commented on code in PR #36011: URL: https://github.com/apache/spark/pull/36011#discussion_r843395848 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveRulesHolder.scala: ## @@ -0,0 +1,30 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] [spark] dtenedor commented on pull request #36066: [SPARK-38796][SQL] Implement the to_number and try_to_number SQL functions according to a new specification

2022-04-05 Thread GitBox
dtenedor commented on PR #36066: URL: https://github.com/apache/spark/pull/36066#issuecomment-1089671082 > @dtenedor shall we file a JIRA and link it to the PR title BTW? Sure! This is done.

[GitHub] [spark] alex-balikov commented on a diff in pull request #36073: [SPARK-38787] [SS] Replace found value with non-null element in the remaining list for key and remove remaining null elements

2022-04-05 Thread GitBox
alex-balikov commented on code in PR #36073: URL: https://github.com/apache/spark/pull/36073#discussion_r843405928 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala: ## @@ -272,12 +289,36 @@ class SymmetricHashJoinState

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #36073: [SPARK-38787] [SS] Replace found value with non-null element in the remaining list for key and remove remaining null elements f

2022-04-05 Thread GitBox
HeartSaVioR commented on code in PR #36073: URL: https://github.com/apache/spark/pull/36073#discussion_r843409388 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala: ## @@ -272,12 +289,36 @@ class SymmetricHashJoinStateM

[GitHub] [spark] LuciferYang opened a new pull request, #36078: [DON'T MERGE] Migrate Junit4 to Junit5

2022-04-05 Thread GitBox
LuciferYang opened a new pull request, #36078: URL: https://github.com/apache/spark/pull/36078 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] alex-balikov commented on a diff in pull request #36073: [SPARK-38787] [SS] Replace found value with non-null element in the remaining list for key and remove remaining null elements

2022-04-05 Thread GitBox
alex-balikov commented on code in PR #36073: URL: https://github.com/apache/spark/pull/36073#discussion_r843410794 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala: ## @@ -272,12 +289,36 @@ class SymmetricHashJoinState

[GitHub] [spark] HeartSaVioR commented on a diff in pull request #36073: [SPARK-38787] [SS] Replace found value with non-null element in the remaining list for key and remove remaining null elements f

2022-04-05 Thread GitBox
HeartSaVioR commented on code in PR #36073: URL: https://github.com/apache/spark/pull/36073#discussion_r843414788 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala: ## @@ -272,12 +289,36 @@ class SymmetricHashJoinStateM

[GitHub] [spark] AmplabJenkins commented on pull request #36073: [SPARK-38787] [SS] Replace found value with non-null element in the remaining list for key and remove remaining null elements from valu

2022-04-05 Thread GitBox
AmplabJenkins commented on PR #36073: URL: https://github.com/apache/spark/pull/36073#issuecomment-1089709388 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] LuciferYang opened a new pull request, #36079: [SPARK-38798][CORE] Make `spark.file.transferTo` as an `ConfigEntry`

2022-04-05 Thread GitBox
LuciferYang opened a new pull request, #36079: URL: https://github.com/apache/spark/pull/36079 ### What changes were proposed in this pull request? This PR makes `spark.file.transferTo` a `ConfigEntry` and moves it into `org.apache.spark.internal.config`. ### Why are the chan

[GitHub] [spark] wangyum opened a new pull request, #36080: [SPARK-38797][SQL] Runtime Filter supports pruning side has window

2022-04-05 Thread GitBox
wangyum opened a new pull request, #36080: URL: https://github.com/apache/spark/pull/36080 ### What changes were proposed in this pull request? This PR makes row-level runtime filtering support a pruning side that has a window. For example: ```sql SELECT *

[GitHub] [spark] wangyum commented on pull request #36080: [SPARK-38797][SQL] Runtime Filter supports pruning side has window

2022-04-05 Thread GitBox
wangyum commented on PR #36080: URL: https://github.com/apache/spark/pull/36080#issuecomment-1089750209 cc @sigmod -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsub

[GitHub] [spark] wangyum commented on a diff in pull request #36080: [SPARK-38797][SQL] Runtime Filter supports pruning side has window

2022-04-05 Thread GitBox
wangyum commented on code in PR #36080: URL: https://github.com/apache/spark/pull/36080#discussion_r843428746 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala: ## @@ -132,28 +131,21 @@ object InjectRuntimeFilter extends Rule[Logica

[GitHub] [spark] ulysses-you commented on a diff in pull request #36011: [SPARK-38697][SQL] Extend SparkSessionExtensions to inject rules into AQE Optimizer

2022-04-05 Thread GitBox
ulysses-you commented on code in PR #36011: URL: https://github.com/apache/spark/pull/36011#discussion_r843429311 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/AdaptiveRulesHolder.scala: ## @@ -0,0 +1,30 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] [spark] yaooqinn opened a new pull request, #36081: [SPARK-38799][INFRA] Replace BSD 3-clause with ASF License v2 for scala binaries

2022-04-05 Thread GitBox
yaooqinn opened a new pull request, #36081: URL: https://github.com/apache/spark/pull/36081 ### What changes were proposed in this pull request? Replace BSD 3-clause with ASF License v2 for scala binaries ### Why are the changes needed? Scala distr

[GitHub] [spark] yaooqinn commented on pull request #36081: [SPARK-38799][INFRA] Replace BSD 3-clause with ASF License v2 for scala binaries

2022-04-05 Thread GitBox
yaooqinn commented on PR #36081: URL: https://github.com/apache/spark/pull/36081#issuecomment-1089788184 cc @HyukjinKwon @dongjoon-hyun @srowen thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

[GitHub] [spark] AmplabJenkins commented on pull request #36071: [SPARK-37398][PYTHON][ML] Inline type hints for pyspark.ml.classification

2022-04-05 Thread GitBox
AmplabJenkins commented on PR #36071: URL: https://github.com/apache/spark/pull/36071#issuecomment-1089790818 Can one of the admins verify this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] dcoliversun commented on a diff in pull request #35886: [SPARK-38582][K8S] Introduce buildEnvVars and buildEnvVarsWithFieldRef for KubernetesUtils to eliminate duplicate code pattern

2022-04-05 Thread GitBox
dcoliversun commented on code in PR #35886: URL: https://github.com/apache/spark/pull/35886#discussion_r843447694 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicDriverFeatureStep.scala: ## @@ -74,16 +74,8 @@ private[spark] class Bas

[GitHub] [spark] dcoliversun commented on a diff in pull request #35886: [SPARK-38582][K8S] Introduce buildEnvVars and buildEnvVarsWithFieldRef for KubernetesUtils to eliminate duplicate code pattern

2022-04-05 Thread GitBox
dcoliversun commented on code in PR #35886: URL: https://github.com/apache/spark/pull/35886#discussion_r843449794 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala: ## @@ -381,4 +381,42 @@ object KubernetesUtils extends Logging

[GitHub] [spark] dcoliversun commented on pull request #35886: [SPARK-38582][K8S] Introduce buildEnvVars and buildEnvVarsWithFieldRef for KubernetesUtils to eliminate duplicate code pattern

2022-04-05 Thread GitBox
dcoliversun commented on PR #35886: URL: https://github.com/apache/spark/pull/35886#issuecomment-1089806916 > +1 for the intention, but the proposed PR looks a little intrusive and inconsistent to me. > > This PR may introduce lots of regression if we are not careful enough to avoid
