[GitHub] [spark] itholic commented on a diff in pull request #42388: [SPARK-43618][SPARK-43658][CONNECT][PS][TESTS] Enabling more tests

2023-08-07 Thread via GitHub
itholic commented on code in PR #42388: URL: https://github.com/apache/spark/pull/42388#discussion_r1286623971 ## python/pyspark/pandas/tests/computation/test_compute.py: ## @@ -101,16 +101,9 @@ def test_mode(self): with self.assertRaises(ValueError):

[GitHub] [spark] LuciferYang commented on a diff in pull request #42378: [SPARK-44703][CORE] Log eventLog rewrite duration when compact old event log files

2023-08-07 Thread via GitHub
LuciferYang commented on code in PR #42378: URL: https://github.com/apache/spark/pull/42378#discussion_r1286623320 ## core/src/main/scala/org/apache/spark/deploy/history/EventLogFileCompactor.scala: ## @@ -158,6 +159,8 @@ class EventLogFileCompactor( ) }

[GitHub] [spark] itholic commented on a diff in pull request #42369: [SPARK-44695][PYTHON] Improve error message for `DataFrame.toDF`

2023-08-07 Thread via GitHub
itholic commented on code in PR #42369: URL: https://github.com/apache/spark/pull/42369#discussion_r1286614995 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1732,6 +1732,23 @@ def to(self, schema: StructType) -> "DataFrame": to.__doc__ = PySparkDataFrame.to.__doc__

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42369: [SPARK-44695][PYTHON] Improve error message for `DataFrame.toDF`

2023-08-07 Thread via GitHub
HyukjinKwon commented on code in PR #42369: URL: https://github.com/apache/spark/pull/42369#discussion_r1286607344 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1732,6 +1732,23 @@ def to(self, schema: StructType) -> "DataFrame": to.__doc__ =

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42388: [SPARK-43618][SPARK-43658][CONNECT][PS][TESTS] Enabling more tests

2023-08-07 Thread via GitHub
zhengruifeng commented on code in PR #42388: URL: https://github.com/apache/spark/pull/42388#discussion_r1286598403 ## python/pyspark/pandas/tests/computation/test_compute.py: ## @@ -101,16 +101,9 @@ def test_mode(self): with self.assertRaises(ValueError):

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42388: [SPARK-43618][SPARK-43658][CONNECT][PS][TESTS] Enabling more tests

2023-08-07 Thread via GitHub
zhengruifeng commented on code in PR #42388: URL: https://github.com/apache/spark/pull/42388#discussion_r1286597113 ## python/pyspark/pandas/tests/computation/test_compute.py: ## @@ -101,16 +101,9 @@ def test_mode(self): with self.assertRaises(ValueError):

[GitHub] [spark] itholic commented on pull request #42388: [SPARK-43618][SPARK-43658][CONNECT][PS][TESTS] Enabling more tests

2023-08-07 Thread via GitHub
itholic commented on PR #42388: URL: https://github.com/apache/spark/pull/42388#issuecomment-1668907825 cc @zhengruifeng -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] itholic commented on a diff in pull request #42369: [SPARK-44695][PYTHON] Improve error message for `DataFrame.toDF`

2023-08-07 Thread via GitHub
itholic commented on code in PR #42369: URL: https://github.com/apache/spark/pull/42369#discussion_r1286592428 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1732,6 +1732,23 @@ def to(self, schema: StructType) -> "DataFrame": to.__doc__ = PySparkDataFrame.to.__doc__

[GitHub] [spark] lvyanquan commented on a diff in pull request #42380: [SPARK-44696][SQL] Support different timestamp precise for `from_json` function

2023-08-07 Thread via GitHub
lvyanquan commented on code in PR #42380: URL: https://github.com/apache/spark/pull/42380#discussion_r1286588941 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala: ## @@ -286,7 +286,14 @@ class JacksonParser( } case

[GitHub] [spark] itholic opened a new pull request, #42388: [SPARK-43618][SPARK-43658][CONNECT][PS][TESTS] Enabling more tests

2023-08-07 Thread via GitHub
itholic opened a new pull request, #42388: URL: https://github.com/apache/spark/pull/42388 ### What changes were proposed in this pull request? This PR proposes to enable tests for pandas API on Spark with Spark Connect ### Why are the changes needed? To increase

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42380: [SPARK-44696][SQL] Support different timestamp precise for `from_json` function

2023-08-07 Thread via GitHub
HyukjinKwon commented on code in PR #42380: URL: https://github.com/apache/spark/pull/42380#discussion_r1286568544 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala: ## @@ -286,7 +286,14 @@ class JacksonParser( } case

[GitHub] [spark] lvyanquan commented on pull request #42380: [SPARK-44696][SQL] Support different timestamp precise for `from_json` function

2023-08-07 Thread via GitHub
lvyanquan commented on PR #42380: URL: https://github.com/apache/spark/pull/42380#issuecomment-1668865559 Sure. testCase was added in comment of `How was this patch tested?`.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42369: [SPARK-44695][PYTHON] Improve error message for `DataFrame.toDF`

2023-08-07 Thread via GitHub
HyukjinKwon commented on code in PR #42369: URL: https://github.com/apache/spark/pull/42369#discussion_r1286567129 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1732,6 +1732,23 @@ def to(self, schema: StructType) -> "DataFrame": to.__doc__ =

[GitHub] [spark] yaooqinn commented on pull request #42295: [SPARK-44581][YARN] Fix the bug that ShutdownHookManager get wrong hadoop user group information

2023-08-07 Thread via GitHub
yaooqinn commented on PR #42295: URL: https://github.com/apache/spark/pull/42295#issuecomment-1668862618 Hi @liangyu-1, please also take care of the CI

[GitHub] [spark] yaooqinn commented on pull request #42295: [SPARK-44581][YARN] Fix the bug that ShutdownHookManager get wrong hadoop user group information

2023-08-07 Thread via GitHub
yaooqinn commented on PR #42295: URL: https://github.com/apache/spark/pull/42295#issuecomment-1668845632 LGTM, please update the PR description according to the code updates so far

[GitHub] [spark] HeartSaVioR closed pull request #42378: [SPARK-44703][CORE] Log eventLog rewrite duration when compact old event log files

2023-08-07 Thread via GitHub
HeartSaVioR closed pull request #42378: [SPARK-44703][CORE] Log eventLog rewrite duration when compact old event log files URL: https://github.com/apache/spark/pull/42378

[GitHub] [spark] liangyu-1 commented on pull request #42295: [SPARK-44581][YARN] Fix the bug that ShutdownHookManager get wrong hadoop user group information

2023-08-07 Thread via GitHub
liangyu-1 commented on PR #42295: URL: https://github.com/apache/spark/pull/42295#issuecomment-1668845332 I moved the ApplicationMaster instantiation and assignment inside the doAs block, then rebuilt the project and tested it on my cluster; the shutdown hook thread now has the correct

[GitHub] [spark] hvanhovell opened a new pull request, #42387: [SPARK-44715][CONNECT] Bring back callUdf and udf function.

2023-08-07 Thread via GitHub
hvanhovell opened a new pull request, #42387: URL: https://github.com/apache/spark/pull/42387 ### What changes were proposed in this pull request? This PR adds the `udf` (with a return type), and `callUDF` functions to `functions.scala` for the Spark Connect Scala Client. ### Why

[GitHub] [spark] HeartSaVioR commented on pull request #42378: [SPARK-44703][CORE] Log eventLog rewrite duration when compact old event log files

2023-08-07 Thread via GitHub
HeartSaVioR commented on PR #42378: URL: https://github.com/apache/spark/pull/42378#issuecomment-1668844958 Thanks, merging to master!

[GitHub] [spark] zhengruifeng commented on pull request #42353: [SPARK-44005][PYTHON] Improve error messages for regular Python UDTFs that return non-tuple values

2023-08-07 Thread via GitHub
zhengruifeng commented on PR #42353: URL: https://github.com/apache/spark/pull/42353#issuecomment-1668841085 merged to master and branch-3.5

[GitHub] [spark] zhengruifeng closed pull request #42353: [SPARK-44005][PYTHON] Improve error messages for regular Python UDTFs that return non-tuple values

2023-08-07 Thread via GitHub
zhengruifeng closed pull request #42353: [SPARK-44005][PYTHON] Improve error messages for regular Python UDTFs that return non-tuple values URL: https://github.com/apache/spark/pull/42353

[GitHub] [spark] zhengruifeng commented on pull request #42385: [SPARK-44705] Make PythonRunner single-threaded

2023-08-07 Thread via GitHub
zhengruifeng commented on PR #42385: URL: https://github.com/apache/spark/pull/42385#issuecomment-1668839667 cc @HyukjinKwon

[GitHub] [spark] LuciferYang commented on pull request #42360: [SPARK-44689][CONNECT] Make the exception handling of function `SparkConnectPlanner#unpackScalarScalaUDF` more universal

2023-08-07 Thread via GitHub
LuciferYang commented on PR #42360: URL: https://github.com/apache/spark/pull/42360#issuecomment-1668838807 Merged into master and branch-3.5

[GitHub] [spark] LuciferYang closed pull request #42360: [SPARK-44689][CONNECT] Make the exception handling of function `SparkConnectPlanner#unpackScalarScalaUDF` more universal

2023-08-07 Thread via GitHub
LuciferYang closed pull request #42360: [SPARK-44689][CONNECT] Make the exception handling of function `SparkConnectPlanner#unpackScalarScalaUDF` more universal URL: https://github.com/apache/spark/pull/42360

[GitHub] [spark] LuciferYang commented on pull request #42167: [SPARK-44554][INFRA] Make Python linter related checks pass of branch-3.3/3.4 daily testing

2023-08-07 Thread via GitHub
LuciferYang commented on PR #42167: URL: https://github.com/apache/spark/pull/42167#issuecomment-1668837855 Merged into master. Thanks @HyukjinKwon @zhengruifeng @wangyum

[GitHub] [spark] LuciferYang closed pull request #42167: [SPARK-44554][INFRA] Make Python linter related checks pass of branch-3.3/3.4 daily testing

2023-08-07 Thread via GitHub
LuciferYang closed pull request #42167: [SPARK-44554][INFRA] Make Python linter related checks pass of branch-3.3/3.4 daily testing URL: https://github.com/apache/spark/pull/42167

[GitHub] [spark] LuciferYang commented on pull request #42370: [SPARK-44697][CORE] Clean up the deprecated usage of `o.a.commons.lang3.RandomUtils`

2023-08-07 Thread via GitHub
LuciferYang commented on PR #42370: URL: https://github.com/apache/spark/pull/42370#issuecomment-1668836650 Thanks @dongjoon-hyun ~

[GitHub] [spark] pan3793 commented on a diff in pull request #42336: [SPARK-44669][SQL][HIVE] Parquet/ORC files written using Hive Serde should has file extension

2023-08-07 Thread via GitHub
pan3793 commented on code in PR #42336: URL: https://github.com/apache/spark/pull/42336#discussion_r1286544346 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveFileFormat.scala: ## @@ -122,6 +130,23 @@ class HiveFileFormat(fileSinkConf: FileSinkDesc)

[GitHub] [spark] bogao007 commented on pull request #42386: [SPARK-44713][CONNECT][SQL] Move shared classes to sql/api

2023-08-07 Thread via GitHub
bogao007 commented on PR #42386: URL: https://github.com/apache/spark/pull/42386#issuecomment-1668820419 > @bogao007 PTAL since this touches a couple of streaming classes. Nice, thanks for the change!

[GitHub] [spark] sunchao commented on pull request #42324: [SPARK-44641][SQL] Incorrect result in certain scenarios when SPJ is not triggered

2023-08-07 Thread via GitHub
sunchao commented on PR #42324: URL: https://github.com/apache/spark/pull/42324#issuecomment-1668811184 Thanks! merged to master/branch-3.4/branch-3.5

[GitHub] [spark] sunchao closed pull request #42324: [SPARK-44641][SQL] Incorrect result in certain scenarios when SPJ is not triggered

2023-08-07 Thread via GitHub
sunchao closed pull request #42324: [SPARK-44641][SQL] Incorrect result in certain scenarios when SPJ is not triggered URL: https://github.com/apache/spark/pull/42324

[GitHub] [spark] hvanhovell closed pull request #42367: [SPARK-43429][CONNECT] Add Default & Active SparkSession for Scala Client

2023-08-07 Thread via GitHub
hvanhovell closed pull request #42367: [SPARK-43429][CONNECT] Add Default & Active SparkSession for Scala Client URL: https://github.com/apache/spark/pull/42367

[GitHub] [spark] hvanhovell commented on pull request #42367: [SPARK-43429][CONNECT] Add Default & Active SparkSession for Scala Client

2023-08-07 Thread via GitHub
hvanhovell commented on PR #42367: URL: https://github.com/apache/spark/pull/42367#issuecomment-1668807299 Merging to master/3.5

[GitHub] [spark] HeartSaVioR closed pull request #42354: [SPARK-44683][SS] Logging level isn't passed to RocksDB state store provider correctly

2023-08-07 Thread via GitHub
HeartSaVioR closed pull request #42354: [SPARK-44683][SS] Logging level isn't passed to RocksDB state store provider correctly URL: https://github.com/apache/spark/pull/42354

[GitHub] [spark] HeartSaVioR commented on pull request #42354: [SPARK-44683][SS] Logging level isn't passed to RocksDB state store provider correctly

2023-08-07 Thread via GitHub
HeartSaVioR commented on PR #42354: URL: https://github.com/apache/spark/pull/42354#issuecomment-1668804815 Thanks! Merging to master/3.5!

[GitHub] [spark] HyukjinKwon closed pull request #42371: [SPARK-44694][PYTHON][CONNECT] Refactor active sessions and expose them as an API

2023-08-07 Thread via GitHub
HyukjinKwon closed pull request #42371: [SPARK-44694][PYTHON][CONNECT] Refactor active sessions and expose them as an API URL: https://github.com/apache/spark/pull/42371

[GitHub] [spark] HyukjinKwon commented on pull request #42371: [SPARK-44694][PYTHON][CONNECT] Refactor active sessions and expose them as an API

2023-08-07 Thread via GitHub
HyukjinKwon commented on PR #42371: URL: https://github.com/apache/spark/pull/42371#issuecomment-1668799750 Merged to master and branch-3.5.

[GitHub] [spark] hvanhovell commented on pull request #42386: [SPARK-44713][CONNECT][SQL] Move shared classes to sql/api

2023-08-07 Thread via GitHub
hvanhovell commented on PR #42386: URL: https://github.com/apache/spark/pull/42386#issuecomment-1668795001 @bogao007 PTAL since this touches a couple of streaming classes.

[GitHub] [spark] hvanhovell opened a new pull request, #42386: [SPARK-44713][CONNECT][SQL] Move shared classes to sql/api

2023-08-07 Thread via GitHub
hvanhovell opened a new pull request, #42386: URL: https://github.com/apache/spark/pull/42386 ### What changes were proposed in this pull request? This PR deduplicates the following classes: - `org.apache.spark.sql.SaveMode` -

[GitHub] [spark] anchovYu commented on a diff in pull request #42276: [SPARK-44714] Ease restriction of LCA resolution regarding queries with having

2023-08-07 Thread via GitHub
anchovYu commented on code in PR #42276: URL: https://github.com/apache/spark/pull/42276#discussion_r1286515285 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAliasReference.scala: ## @@ -131,95 +131,97 @@ object

[GitHub] [spark] itholic commented on a diff in pull request #42332: [SPARK-44665][PYTHON] Add support for pandas DataFrame assertDataFrameEqual

2023-08-07 Thread via GitHub
itholic commented on code in PR #42332: URL: https://github.com/apache/spark/pull/42332#discussion_r1286515194 ## python/pyspark/testing/utils.py: ## @@ -464,23 +467,42 @@ def assertDataFrameEqual( raise PySparkAssertionError(

[GitHub] [spark] anchovYu commented on a diff in pull request #42276: [SPARK-44714] Ease restriction of LCA resolution regarding queries with having

2023-08-07 Thread via GitHub
anchovYu commented on code in PR #42276: URL: https://github.com/apache/spark/pull/42276#discussion_r1286512577 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAliasReference.scala: ## @@ -131,95 +131,97 @@ object

[GitHub] [spark] anchovYu commented on a diff in pull request #42276: [SPARK-44714] Ease restriction of LCA resolution regarding queries with having

2023-08-07 Thread via GitHub
anchovYu commented on code in PR #42276: URL: https://github.com/apache/spark/pull/42276#discussion_r1286512367 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAliasReference.scala: ## @@ -131,95 +131,97 @@ object

[GitHub] [spark] anchovYu commented on a diff in pull request #42276: [SPARK-44714] Ease restriction of LCA resolution regarding queries with having

2023-08-07 Thread via GitHub
anchovYu commented on code in PR #42276: URL: https://github.com/apache/spark/pull/42276#discussion_r1285473731 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveLateralColumnAliasReference.scala: ## @@ -131,138 +131,166 @@ object

[GitHub] [spark] utkarsh39 opened a new pull request, #42385: [SPARK-44705] Make PythonRunner single-threaded

2023-08-07 Thread via GitHub
utkarsh39 opened a new pull request, #42385: URL: https://github.com/apache/spark/pull/42385 ### What changes were proposed in this pull request? PythonRunner, a utility that executes Python UDFs in Spark, uses two threads in a producer-consumer model today. This multi-threading

[GitHub] [spark] bogao007 commented on a diff in pull request #42384: [SPARK-44710][CONNECT] Add Dataset.dropDuplicatesWithinWatermark to Spark Connect Scala Client

2023-08-07 Thread via GitHub
bogao007 commented on code in PR #42384: URL: https://github.com/apache/spark/pull/42384#discussion_r1286500284 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/streaming/ClientStreamingQuerySuite.scala: ## @@ -352,6 +352,26 @@ class ClientStreamingQuerySuite

[GitHub] [spark] hvanhovell commented on pull request #42384: [SPARK-44710][CONNECT] Add Dataset.dropDuplicatesWithinWatermark to Spark Connect Scala Client

2023-08-07 Thread via GitHub
hvanhovell commented on PR #42384: URL: https://github.com/apache/spark/pull/42384#issuecomment-1668759250 cc @bogao007 PTAL

[GitHub] [spark] hvanhovell opened a new pull request, #42384: [SPARK-44710][CONNECT] Add Dataset.dropDuplicatesWithinWatermark to Spark Connect Scala Client

2023-08-07 Thread via GitHub
hvanhovell opened a new pull request, #42384: URL: https://github.com/apache/spark/pull/42384 ### What changes were proposed in this pull request? This PR adds `Dataset.dropDuplicatesWithinWatermark` to the Spark Connect Scala Client. ### Why are the changes needed? Increase

[GitHub] [spark] itholic commented on a diff in pull request #40370: [SPARK-42620][PS] Add `inclusive` parameter for (DataFrame|Series).between_time

2023-08-07 Thread via GitHub
itholic commented on code in PR #40370: URL: https://github.com/apache/spark/pull/40370#discussion_r1286496031 ## python/docs/source/migration_guide/pyspark_upgrade.rst: ## @@ -28,6 +28,7 @@ Upgrading from PySpark 3.5 to 4.0 * In Spark 4.0, ``Series.append`` has been removed

[GitHub] [spark] itholic commented on a diff in pull request #40370: [SPARK-42620][PS] Add `inclusive` parameter for (DataFrame|Series).between_time

2023-08-07 Thread via GitHub
itholic commented on code in PR #40370: URL: https://github.com/apache/spark/pull/40370#discussion_r1286495087 ## python/pyspark/pandas/tests/frame/test_reindexing.py: ## @@ -854,7 +879,8 @@ def test_sample(self): class FrameReidexingTests(FrameReindexingMixin,

[GitHub] [spark] itholic commented on a diff in pull request #42371: [SPARK-44694][PYTHON][CONNECT] Refactor active sessions and expose them as an API

2023-08-07 Thread via GitHub
itholic commented on code in PR #42371: URL: https://github.com/apache/spark/pull/42371#discussion_r1286494110 ## python/pyspark/errors/error_classes.py: ## @@ -622,6 +622,11 @@ "No active Spark session found. Please create a new Spark session before running the code."

[GitHub] [spark] github-actions[bot] closed pull request #28488: [SPARK-29083][CORE] Prefetch elements in rdd.toLocalIterator

2023-08-07 Thread via GitHub
github-actions[bot] closed pull request #28488: [SPARK-29083][CORE] Prefetch elements in rdd.toLocalIterator URL: https://github.com/apache/spark/pull/28488

[GitHub] [spark] github-actions[bot] closed pull request #40608: [SPARK-35198][CONNECT][CORE][PYTHON][SQL] Add support for calling debugCodegen from Python & Java

2023-08-07 Thread via GitHub
github-actions[bot] closed pull request #40608: [SPARK-35198][CONNECT][CORE][PYTHON][SQL] Add support for calling debugCodegen from Python & Java URL: https://github.com/apache/spark/pull/40608

[GitHub] [spark] github-actions[bot] commented on pull request #40949: [DRAFT][SPARK-23607][CORE] Use HDFS extended attributes to store application summary information in SHS

2023-08-07 Thread via GitHub
github-actions[bot] commented on PR #40949: URL: https://github.com/apache/spark/pull/40949#issuecomment-1668735341 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] HyukjinKwon commented on pull request #42378: [SPARK-44703][CORE] Log eventLog rewrite duration when compact old event log files

2023-08-07 Thread via GitHub
HyukjinKwon commented on PR #42378: URL: https://github.com/apache/spark/pull/42378#issuecomment-1668725457 cc @HeartSaVioR FYI

[GitHub] [spark] srielau commented on a diff in pull request #40474: [SPARK-42849] [SQL] Session Variables

2023-08-07 Thread via GitHub
srielau commented on code in PR #40474: URL: https://github.com/apache/spark/pull/40474#discussion_r1286477051 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala: ## @@ -567,6 +568,135 @@ class SparkSqlAstBuilder extends AstBuilder { } } +

[GitHub] [spark] srielau commented on a diff in pull request #40474: [SPARK-42849] [SQL] Session Variables

2023-08-07 Thread via GitHub
srielau commented on code in PR #40474: URL: https://github.com/apache/spark/pull/40474#discussion_r1286473766 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TableOutputResolver.scala: ## @@ -34,6 +35,36 @@ import

[GitHub] [spark] amaliujia commented on a diff in pull request #42363: [SPARK-44691][SQL][CONNECT] Move Subclasses of AnalysisException to sql/api

2023-08-07 Thread via GitHub
amaliujia commented on code in PR #42363: URL: https://github.com/apache/spark/pull/42363#discussion_r1286462131 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/QuotingUtils.scala: ## @@ -37,6 +49,18 @@ object QuotingUtils { } } + def quoted(namespace:

[GitHub] [spark] ueshin commented on a diff in pull request #42371: [SPARK-44694][PYTHON][CONNECT] Refactor active sessions and expose them as an API

2023-08-07 Thread via GitHub
ueshin commented on code in PR #42371: URL: https://github.com/apache/spark/pull/42371#discussion_r1286461234 ## python/pyspark/sql/connect/session.py: ## @@ -93,14 +94,13 @@ from pyspark.sql.connect.udtf import UDTFRegistration -# `_active_spark_session` stores the

[GitHub] [spark] HyukjinKwon commented on pull request #42373: [MINOR][UI] Increasing the number of significant digits for Fraction Cached of RDD

2023-08-07 Thread via GitHub
HyukjinKwon commented on PR #42373: URL: https://github.com/apache/spark/pull/42373#issuecomment-1668701756 Oh, I just cherry-picked this because it seems fairly minor, but I don't mind reverting it from branch-3.5.

[GitHub] [spark] dongjoon-hyun commented on pull request #42373: [MINOR][UI] Increasing the number of significant digits for Fraction Cached of RDD

2023-08-07 Thread via GitHub
dongjoon-hyun commented on PR #42373: URL: https://github.com/apache/spark/pull/42373#issuecomment-1668692419 Thank you. Is this an Apache Spark 3.5-only bug fix?

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42332: [SPARK-44665][PYTHON] Add support for pandas DataFrame assertDataFrameEqual

2023-08-07 Thread via GitHub
HyukjinKwon commented on code in PR #42332: URL: https://github.com/apache/spark/pull/42332#discussion_r1286455108 ## python/pyspark/testing/utils.py: ## @@ -464,23 +467,42 @@ def assertDataFrameEqual( raise PySparkAssertionError(

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42332: [SPARK-44665][PYTHON] Add support for pandas DataFrame assertDataFrameEqual

2023-08-07 Thread via GitHub
HyukjinKwon commented on code in PR #42332: URL: https://github.com/apache/spark/pull/42332#discussion_r1286452469 ## python/pyspark/testing/utils.py: ## @@ -464,23 +467,42 @@ def assertDataFrameEqual( raise PySparkAssertionError(

[GitHub] [spark] HyukjinKwon closed pull request #42373: [MINOR][UI] Increasing the number of significant digits for Fraction Cached of RDD

2023-08-07 Thread via GitHub
HyukjinKwon closed pull request #42373: [MINOR][UI] Increasing the number of significant digits for Fraction Cached of RDD URL: https://github.com/apache/spark/pull/42373

[GitHub] [spark] HyukjinKwon commented on pull request #42373: [MINOR][UI] Increasing the number of significant digits for Fraction Cached of RDD

2023-08-07 Thread via GitHub
HyukjinKwon commented on PR #42373: URL: https://github.com/apache/spark/pull/42373#issuecomment-1668687126 Merged to master and branch-3.5. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HyukjinKwon closed pull request #41712: [SPARK-44132][SQL] Materialize `Stream` of join column names to avoid codegen failure

2023-08-07 Thread via GitHub
HyukjinKwon closed pull request #41712: [SPARK-44132][SQL] Materialize `Stream` of join column names to avoid codegen failure URL: https://github.com/apache/spark/pull/41712 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [spark] HyukjinKwon commented on pull request #41712: [SPARK-44132][SQL] Materialize `Stream` of join column names to avoid codegen failure

2023-08-07 Thread via GitHub
HyukjinKwon commented on PR #41712: URL: https://github.com/apache/spark/pull/41712#issuecomment-1668686373 Merged to master and branch-3.5. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HyukjinKwon commented on pull request #42321: [SPARK-44657][CONNECT] Fix incorrect limit handling in ArrowBatchWithSchemaIterator and config parsing of CONNECT_GRPC_ARROW_MAX_BATCH_SI

2023-08-07 Thread via GitHub
HyukjinKwon commented on PR #42321: URL: https://github.com/apache/spark/pull/42321#issuecomment-1668686189 Let's fix up https://github.com/vicennial/spark/actions/runs/5789336753/job/15690218462 the linter. otherwise should be good to go. -- This is an automated message from the Apache

[GitHub] [spark] siying commented on pull request #42354: [SPARK-44683][SS] Logging level isn't passed to RocksDB state store provider correctly

2023-08-07 Thread via GitHub
siying commented on PR #42354: URL: https://github.com/apache/spark/pull/42354#issuecomment-1668686101 CC @HeartSaVioR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42371: [SPARK-44694][PYTHON][CONNECT] Refactor active sessions and expose them as an API

2023-08-07 Thread via GitHub
HyukjinKwon commented on code in PR #42371: URL: https://github.com/apache/spark/pull/42371#discussion_r1286447366 ## python/pyspark/sql/connect/session.py: ## @@ -93,14 +94,13 @@ from pyspark.sql.connect.udtf import UDTFRegistration -# `_active_spark_session` stores

[GitHub] [spark] heyihong commented on a diff in pull request #42363: [SPARK-44691][SQL][CONNECT] Move Subclasses of AnalysisException to sql/api

2023-08-07 Thread via GitHub
heyihong commented on code in PR #42363: URL: https://github.com/apache/spark/pull/42363#discussion_r1286443119 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/QuotingUtils.scala: ## @@ -37,6 +49,18 @@ object QuotingUtils { } } + def quoted(namespace:

[GitHub] [spark] juliuszsompolski commented on pull request #42355: [SPARK-44709][CONNECT] Run ExecuteGrpcResponseSender in reattachable execute in new thread to fix flow control

2023-08-07 Thread via GitHub
juliuszsompolski commented on PR #42355: URL: https://github.com/apache/spark/pull/42355#issuecomment-1668675937 @hvanhovell -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] juliuszsompolski commented on a diff in pull request #42355: [SPARK-44709][CONNECT] Run ExecuteGrpcResponseSender in reattachable execute in new thread to fix flow control

2023-08-07 Thread via GitHub
juliuszsompolski commented on code in PR #42355: URL: https://github.com/apache/spark/pull/42355#discussion_r1286442702 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/config/Connect.scala: ## @@ -82,7 +93,7 @@ object Connect { "Set to 0 for

[GitHub] [spark] juliuszsompolski commented on a diff in pull request #42355: [SPARK-44709][CONNECT] Run ExecuteGrpcResponseSender in reattachable execute in new thread to fix flow control

2023-08-07 Thread via GitHub
juliuszsompolski commented on code in PR #42355: URL: https://github.com/apache/spark/pull/42355#discussion_r1286441954 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/execution/ExecuteResponseObserver.scala: ## @@ -85,12 +85,18 @@ private[connect] class

[GitHub] [spark] heyihong commented on a diff in pull request #42363: [SPARK-44691][SQL][CONNECT] Move Subclasses of AnalysisException to sql/api

2023-08-07 Thread via GitHub
heyihong commented on code in PR #42363: URL: https://github.com/apache/spark/pull/42363#discussion_r1286441314 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/QuotingUtils.scala: ## @@ -37,6 +49,18 @@ object QuotingUtils { } } + def quoted(namespace:

[GitHub] [spark] juliuszsompolski commented on pull request #42355: [SPARK-44709][CONNECT] Run ExecuteGrpcResponseSender in reattachable execute in new thread to fix flow control

2023-08-07 Thread via GitHub
juliuszsompolski commented on PR #42355: URL: https://github.com/apache/spark/pull/42355#issuecomment-1668673016 @hvanhovell > Is the ExecuteGrpcResponseSender thread safe? For example is detach() safe? detach() is synchronized on the executeObserver it's attached to, like the

[GitHub] [spark] monkeyboy123 commented on pull request #42376: [SPARK-44700][SQL] Rule OptimizeCsvJsonExprs should not be applied to expression like from_json(regexp_replace)

2023-08-07 Thread via GitHub
monkeyboy123 commented on PR #42376: URL: https://github.com/apache/spark/pull/42376#issuecomment-1668672646 gently ping @viirya Could you help me to review it? also cc @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] hvanhovell closed pull request #42368: [SPARK-44692][CONNECT][SQL] Move Trigger(s) to sql/api

2023-08-07 Thread via GitHub
hvanhovell closed pull request #42368: [SPARK-44692][CONNECT][SQL] Move Trigger(s) to sql/api URL: https://github.com/apache/spark/pull/42368 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] hvanhovell commented on pull request #42368: [SPARK-44692][CONNECT][SQL] Move Trigger(s) to sql/api

2023-08-07 Thread via GitHub
hvanhovell commented on PR #42368: URL: https://github.com/apache/spark/pull/42368#issuecomment-1668666826 Merging to master/3.5 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] ueshin commented on a diff in pull request #42371: [SPARK-44694][PYTHON][CONNECT] Refactor active sessions and expose them as an API

2023-08-07 Thread via GitHub
ueshin commented on code in PR #42371: URL: https://github.com/apache/spark/pull/42371#discussion_r1286422331 ## python/pyspark/sql/connect/session.py: ## @@ -93,14 +94,13 @@ from pyspark.sql.connect.udtf import UDTFRegistration -# `_active_spark_session` stores the

[GitHub] [spark] ueshin commented on a diff in pull request #42371: [SPARK-44694][PYTHON][CONNECT] Refactor active sessions and expose them as an API

2023-08-07 Thread via GitHub
ueshin commented on code in PR #42371: URL: https://github.com/apache/spark/pull/42371#discussion_r1286413587 ## python/pyspark/sql/connect/session.py: ## @@ -628,20 +664,18 @@ def is_stopped(self) -> bool: """ return self.client.is_closed -@classmethod

[GitHub] [spark] vinodkc commented on pull request #42380: [SPARK-44696][SQL] Support different timestamp precise for `from_json` function

2023-08-07 Thread via GitHub
vinodkc commented on PR #42380: URL: https://github.com/apache/spark/pull/42380#issuecomment-1668628765 Can you please add a testcase for millisecond & microseconds precision checks? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] HyukjinKwon closed pull request #42266: [SPARK-44575][SQL][CONNECT] Implement basic error translation

2023-08-07 Thread via GitHub
HyukjinKwon closed pull request #42266: [SPARK-44575][SQL][CONNECT] Implement basic error translation URL: https://github.com/apache/spark/pull/42266 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [spark] HyukjinKwon commented on pull request #42266: [SPARK-44575][SQL][CONNECT] Implement basic error translation

2023-08-07 Thread via GitHub
HyukjinKwon commented on PR #42266: URL: https://github.com/apache/spark/pull/42266#issuecomment-1668604341 Merged to master and branch-3.5. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HyukjinKwon closed pull request #42267: [SPARK-43606][PS] Remove `Int64Index` & `Float64Index`

2023-08-07 Thread via GitHub
HyukjinKwon closed pull request #42267: [SPARK-43606][PS] Remove `Int64Index` & `Float64Index` URL: https://github.com/apache/spark/pull/42267 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] HyukjinKwon commented on pull request #42267: [SPARK-43606][PS] Remove `Int64Index` & `Float64Index`

2023-08-07 Thread via GitHub
HyukjinKwon commented on PR #42267: URL: https://github.com/apache/spark/pull/42267#issuecomment-1668602519 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [spark] allisonwang-db commented on pull request #42353: [SPARK-44005][PYTHON] Improve error messages for regular Python UDTFs that return non-tuple values

2023-08-07 Thread via GitHub
allisonwang-db commented on PR #42353: URL: https://github.com/apache/spark/pull/42353#issuecomment-1668587368 cc @ueshin @HyukjinKwon -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [spark] dongjoon-hyun commented on pull request #42381: [SPARK-44707][K8S] Use INFO log in `ExecutorPodsWatcher.onClose` if `SparkContext` is stopped

2023-08-07 Thread via GitHub
dongjoon-hyun commented on PR #42381: URL: https://github.com/apache/spark/pull/42381#issuecomment-1668567470 Merged to master for Apache Spark 4.0. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [spark] dongjoon-hyun closed pull request #42381: [SPARK-44707][K8S] Use INFO log in `ExecutorPodsWatcher.onClose` if `SparkContext` is stopped

2023-08-07 Thread via GitHub
dongjoon-hyun closed pull request #42381: [SPARK-44707][K8S] Use INFO log in `ExecutorPodsWatcher.onClose` if `SparkContext` is stopped URL: https://github.com/apache/spark/pull/42381 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #42381: [SPARK-44707][K8S] Use INFO log in `ExecutorPodsWatcher.onClose` if `SparkContext` is stopped

2023-08-07 Thread via GitHub
dongjoon-hyun commented on code in PR #42381: URL: https://github.com/apache/spark/pull/42381#discussion_r1286375435 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/ExecutorPodsWatchSnapshotSource.scala: ## @@ -86,12 +86,20 @@ class
