[GitHub] [spark] dengziming commented on a diff in pull request #42939: [SPARK-43254][SQL] Assign a name to the error _LEGACY_ERROR_TEMP_2018

2023-09-17 Thread via GitHub
dengziming commented on code in PR #42939: URL: https://github.com/apache/spark/pull/42939#discussion_r1328286795 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala: ## @@ -170,7 +170,7 @@ object ExpressionEncoder { * Function that

[GitHub] [spark] LuciferYang opened a new pull request, #42972: [SPARK-45196][PYTHON][DOCS] Refine docstring of `array/array_contains/arrays_overlap`

2023-09-17 Thread via GitHub
LuciferYang opened a new pull request, #42972: URL: https://github.com/apache/spark/pull/42972 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] amaliujia commented on a diff in pull request #42971: [SPARK-43979][SQL][FOLLOWUP] Handle non alias-only project case

2023-09-17 Thread via GitHub
amaliujia commented on code in PR #42971: URL: https://github.com/apache/spark/pull/42971#discussion_r1328287604 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1131,17 +1130,24 @@ trait CheckAnalysis extends PredicateHelper with

[GitHub] [spark] amaliujia commented on a diff in pull request #42971: [SPARK-43979][SQL][FOLLOWUP] Handle non alias-only project case

2023-09-17 Thread via GitHub
amaliujia commented on code in PR #42971: URL: https://github.com/apache/spark/pull/42971#discussion_r1328287335 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1131,17 +1130,24 @@ trait CheckAnalysis extends PredicateHelper with

[GitHub] [spark] panbingkun commented on pull request #42507: [SPARK-44823][PYTHON] Update black to 23.9.1 and fix erroneous check

2023-09-17 Thread via GitHub
panbingkun commented on PR #42507: URL: https://github.com/apache/spark/pull/42507#issuecomment-1722802547 > @panbingkun I think you can rebase now if you want to proceed Done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to G

[GitHub] [spark] panbingkun commented on a diff in pull request #42917: [SPARK-45163][SQL] Merge UNSUPPORTED_VIEW_OPERATION & UNSUPPORTED_TABLE_OPERATION & fix some issue

2023-09-17 Thread via GitHub
panbingkun commented on code in PR #42917: URL: https://github.com/apache/spark/pull/42917#discussion_r1328273864 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -860,6 +860,16 @@ "Exceeds char/varchar type length limitation: ." ] }, + "EXPECT_

[GitHub] [spark] MaxGekk commented on a diff in pull request #42931: [SPARK-45137][CONNECT] Support map/array parameters in parameterized `sql()`

2023-09-17 Thread via GitHub
MaxGekk commented on code in PR #42931: URL: https://github.com/apache/spark/pull/42931#discussion_r1328271078 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -65,6 +65,12 @@ message SqlCommand { // (Optional) A sequence of literal expressi

[GitHub] [spark] cloud-fan commented on a diff in pull request #42917: [SPARK-45163][SQL] Merge UNSUPPORTED_VIEW_OPERATION & UNSUPPORTED_TABLE_OPERATION & fix some issue

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42917: URL: https://github.com/apache/spark/pull/42917#discussion_r1328267365 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -3239,6 +3249,11 @@ "message" : [ "TRANSFORM with SERDE is only supported in hi

[GitHub] [spark] cloud-fan commented on a diff in pull request #42917: [SPARK-45163][SQL] Merge UNSUPPORTED_VIEW_OPERATION & UNSUPPORTED_TABLE_OPERATION & fix some issue

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42917: URL: https://github.com/apache/spark/pull/42917#discussion_r1328263710 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -860,6 +860,16 @@ "Exceeds char/varchar type length limitation: ." ] }, + "EXPECT_T

[GitHub] [spark] cloud-fan commented on a diff in pull request #42917: [SPARK-45163][SQL] Merge UNSUPPORTED_VIEW_OPERATION & UNSUPPORTED_TABLE_OPERATION & fix some issue

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42917: URL: https://github.com/apache/spark/pull/42917#discussion_r1328263476 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -860,6 +860,16 @@ "Exceeds char/varchar type length limitation: ." ] }, + "EXPECT_T

[GitHub] [spark] cloud-fan commented on a diff in pull request #42917: [SPARK-45163][SQL] Merge UNSUPPORTED_VIEW_OPERATION & UNSUPPORTED_TABLE_OPERATION & fix some issue

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42917: URL: https://github.com/apache/spark/pull/42917#discussion_r1328262847 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -860,6 +860,16 @@ "Exceeds char/varchar type length limitation: ." ] }, + "EXPECT_T

[GitHub] [spark] cloud-fan commented on a diff in pull request #42971: [SPARK-43979][SQL][FOLLOWUP] Handle non alias-only project case

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42971: URL: https://github.com/apache/spark/pull/42971#discussion_r1328261941 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala: ## @@ -1131,17 +1130,24 @@ trait CheckAnalysis extends PredicateHelper with

[GitHub] [spark] amaliujia opened a new pull request, #42971: [SPARK-43979][SQL][FOLLOWUP] Handle non alias-only project case

2023-09-17 Thread via GitHub
amaliujia opened a new pull request, #42971: URL: https://github.com/apache/spark/pull/42971 ### What changes were proposed in this pull request? `simplifyPlanForCollectedMetrics` could still need to handle the non alias-only project case where the project contains a mi

[GitHub] [spark] amaliujia commented on pull request #42971: [SPARK-43979][SQL][FOLLOWUP] Handle non alias-only project case

2023-09-17 Thread via GitHub
amaliujia commented on PR #42971: URL: https://github.com/apache/spark/pull/42971#issuecomment-1722776811 @cloud-fan

[GitHub] [spark] zhengruifeng commented on pull request #42887: [SPARK-45130][CONNECT][ML][PYTHON] Avoid Spark connect ML model to change input pandas dataframe

2023-09-17 Thread via GitHub
zhengruifeng commented on PR #42887: URL: https://github.com/apache/spark/pull/42887#issuecomment-1722755173 merged to master

[GitHub] [spark] zhengruifeng closed pull request #42887: [SPARK-45130][CONNECT][ML][PYTHON] Avoid Spark connect ML model to change input pandas dataframe

2023-09-17 Thread via GitHub
zhengruifeng closed pull request #42887: [SPARK-45130][CONNECT][ML][PYTHON] Avoid Spark connect ML model to change input pandas dataframe URL: https://github.com/apache/spark/pull/42887

[GitHub] [spark] HyukjinKwon closed pull request #42965: [SPARK-45167][CONNECT][PYTHON][FOLLOW-UP] Use lighter threading Rlock, and use the existing eventually util function

2023-09-17 Thread via GitHub
HyukjinKwon closed pull request #42965: [SPARK-45167][CONNECT][PYTHON][FOLLOW-UP] Use lighter threading Rlock, and use the existing eventually util function URL: https://github.com/apache/spark/pull/42965

[GitHub] [spark] HyukjinKwon commented on pull request #42965: [SPARK-45167][CONNECT][PYTHON][FOLLOW-UP] Use lighter threading Rlock, and use the existing eventually util function

2023-09-17 Thread via GitHub
HyukjinKwon commented on PR #42965: URL: https://github.com/apache/spark/pull/42965#issuecomment-1722749440 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42970: [SPARK-45193][PS][CONNECT][TESTS] Refactor `test_mode` to be compatible with Spark Connect

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42970: URL: https://github.com/apache/spark/pull/42970#discussion_r1328242546 ## python/pyspark/pandas/tests/computation/test_compute.py: ## @@ -15,11 +15,12 @@ # limitations under the License. # import unittest -from distutils.version imp

[GitHub] [spark] zhengruifeng commented on pull request #42970: [SPARK-45193][PS][CONNECT][TESTS] Refactor `test_mode` to be compatible with Spark Connect

2023-09-17 Thread via GitHub
zhengruifeng commented on PR #42970: URL: https://github.com/apache/spark/pull/42970#issuecomment-1722738480 cc @itholic @HyukjinKwon

[GitHub] [spark] zhengruifeng opened a new pull request, #42970: [SPARK-45193][PS][CONNECT][TESTS] Refactor `test_mode` to be compatible with Spark Connect

2023-09-17 Thread via GitHub
zhengruifeng opened a new pull request, #42970: URL: https://github.com/apache/spark/pull/42970 ### What changes were proposed in this pull request? Refactor `test_mode` to be compatible with Spark Connect ### Why are the changes needed? for test parity ### Does this P

[GitHub] [spark] zhengruifeng commented on pull request #42966: [SPARK-45179][DOCS][FOLLOWUP] Add migration guide for Numpy minimum version upgrade

2023-09-17 Thread via GitHub
zhengruifeng commented on PR #42966: URL: https://github.com/apache/spark/pull/42966#issuecomment-1722727476 thanks, merged to master

[GitHub] [spark] zhengruifeng closed pull request #42966: [SPARK-45179][DOCS][FOLLOWUP] Add migration guide for Numpy minimum version upgrade

2023-09-17 Thread via GitHub
zhengruifeng closed pull request #42966: [SPARK-45179][DOCS][FOLLOWUP] Add migration guide for Numpy minimum version upgrade URL: https://github.com/apache/spark/pull/42966

[GitHub] [spark] gengliangwang closed pull request #42964: [SPARK-45189][SQL] Creating UnresolvedRelation from TableIdentifier should include the catalog field

2023-09-17 Thread via GitHub
gengliangwang closed pull request #42964: [SPARK-45189][SQL] Creating UnresolvedRelation from TableIdentifier should include the catalog field URL: https://github.com/apache/spark/pull/42964

[GitHub] [spark] gengliangwang commented on pull request #42964: [SPARK-45189][SQL] Creating UnresolvedRelation from TableIdentifier should include the catalog field

2023-09-17 Thread via GitHub
gengliangwang commented on PR #42964: URL: https://github.com/apache/spark/pull/42964#issuecomment-1722719604 Thanks, merging to master

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42965: [SPARK-45167][CONNECT][PYTHON][FOLLOW-UP] Use lighter threading Rlock, and use the existing eventually util function

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42965: URL: https://github.com/apache/spark/pull/42965#discussion_r1328227343 ## python/pyspark/sql/connect/client/reattach.py: ## @@ -18,12 +18,11 @@ check_dependencies(__name__) +from threading import RLock import warnings import uui

[GitHub] [spark] sandip-db commented on a diff in pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-17 Thread via GitHub
sandip-db commented on code in PR #42938: URL: https://github.com/apache/spark/pull/42938#discussion_r1328227219 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -1821,6 +1821,111 @@ def test_json_functions(self): sdf.select(SF.to_json(SF.struct(S

[GitHub] [spark] grundprinzip commented on a diff in pull request #42965: [SPARK-45167][CONNECT][PYTHON][FOLLOW-UP] Use lighter threading Rlock, and use the existing eventually util function

2023-09-17 Thread via GitHub
grundprinzip commented on code in PR #42965: URL: https://github.com/apache/spark/pull/42965#discussion_r1328226669 ## python/pyspark/sql/connect/client/reattach.py: ## @@ -18,12 +18,11 @@ check_dependencies(__name__) +from threading import RLock import warnings import uu

[GitHub] [spark] zhengruifeng commented on pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-17 Thread via GitHub
zhengruifeng commented on PR #42864: URL: https://github.com/apache/spark/pull/42864#issuecomment-1722709137 also cc @beliefer @panbingkun

[GitHub] [spark] cloud-fan commented on a diff in pull request #42957: [SPARK-45188][SQL][DOCS] Update error messages related to parameterized `sql()`

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42957: URL: https://github.com/apache/spark/pull/42957#discussion_r1328222789 ## common/utils/src/main/resources/error/error-classes.json: ## @@ -1892,7 +1892,7 @@ }, "INVALID_SQL_ARG" : { "message" : [ - "The argument of `sql(

[GitHub] [spark] cloud-fan commented on a diff in pull request #42931: [SPARK-45137][CONNECT] Support map/array parameters in parameterized `sql()`

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42931: URL: https://github.com/apache/spark/pull/42931#discussion_r1328222454 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -65,6 +65,12 @@ message SqlCommand { // (Optional) A sequence of literal expres

[GitHub] [spark] cloud-fan commented on a diff in pull request #42931: [SPARK-45137][CONNECT] Support map/array parameters in parameterized `sql()`

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42931: URL: https://github.com/apache/spark/pull/42931#discussion_r132837 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -65,6 +65,12 @@ message SqlCommand { // (Optional) A sequence of literal expres

[GitHub] [spark] cloud-fan commented on a diff in pull request #42931: [SPARK-45137][CONNECT] Support map/array parameters in parameterized `sql()`

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42931: URL: https://github.com/apache/spark/pull/42931#discussion_r1328222151 ## connector/connect/common/src/main/protobuf/spark/connect/commands.proto: ## @@ -65,6 +65,12 @@ message SqlCommand { // (Optional) A sequence of literal expres

[GitHub] [spark] itholic commented on a diff in pull request #42956: [SPARK-43654][CONNECT][PS][TESTS] Enable `InternalFrameParityTests.test_from_pandas`

2023-09-17 Thread via GitHub
itholic commented on code in PR #42956: URL: https://github.com/apache/spark/pull/42956#discussion_r1328221746 ## python/pyspark/pandas/tests/connect/test_parity_internal.py: ## @@ -15,18 +15,86 @@ # limitations under the License. # import unittest +import pandas as pd fro

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1328221713 ## sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala: ## @@ -708,7 +708,7 @@ private[sql] object RelationalGroupedDataset { case expr:

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1328220027 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -6843,9 +6562,8 @@ object functions { * @since 3.0.0 */ // scalastyle:on line.size.l

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42956: [SPARK-43654][CONNECT][PS][TESTS] Enable `InternalFrameParityTests.test_from_pandas`

2023-09-17 Thread via GitHub
zhengruifeng commented on code in PR #42956: URL: https://github.com/apache/spark/pull/42956#discussion_r1328219551 ## python/pyspark/pandas/tests/connect/test_parity_internal.py: ## @@ -15,18 +15,86 @@ # limitations under the License. # import unittest +import pandas as pd

[GitHub] [spark] LuciferYang commented on pull request #42908: [SPARK-44872][CONNECT][FOLLOWUP] Deflake ReattachableExecuteSuite and increase retry buffer

2023-09-17 Thread via GitHub
LuciferYang commented on PR #42908: URL: https://github.com/apache/spark/pull/42908#issuecomment-1722702977 ``` dev/change-scala-version.sh 2.13 build/sbt "connect/test" -Pscala-2.13 ``` @juliuszsompolski When I run the above command during local test, it is easier to reproduce `

[GitHub] [spark] itholic commented on a diff in pull request #42956: [SPARK-43654][CONNECT][PS][TESTS] Enable `InternalFrameParityTests.test_from_pandas`

2023-09-17 Thread via GitHub
itholic commented on code in PR #42956: URL: https://github.com/apache/spark/pull/42956#discussion_r1328218762 ## python/pyspark/pandas/tests/connect/test_parity_internal.py: ## @@ -15,18 +15,86 @@ # limitations under the License. # import unittest +import pandas as pd fro

[GitHub] [spark] LuciferYang commented on a diff in pull request #42968: [SPARK-45113][FOLLOWUP] Fix test failed in Scala 2.13

2023-09-17 Thread via GitHub
LuciferYang commented on code in PR #42968: URL: https://github.com/apache/spark/pull/42968#discussion_r1328218686 ## python/pyspark/sql/functions.py: ## @@ -3765,12 +3765,12 @@ def collect_set(col: "ColumnOrName") -> Column: Example 1: Collect values from a single column D

[GitHub] [spark] itholic commented on a diff in pull request #42956: [SPARK-43654][CONNECT][PS][TESTS] Enable `InternalFrameParityTests.test_from_pandas`

2023-09-17 Thread via GitHub
itholic commented on code in PR #42956: URL: https://github.com/apache/spark/pull/42956#discussion_r1328217955 ## python/pyspark/pandas/tests/connect/test_parity_internal.py: ## @@ -15,18 +15,86 @@ # limitations under the License. # import unittest +import pandas as pd fro

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1328218485 ## sql/core/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -414,12 +407,13 @@ object functions { * @group agg_funcs * @since 1.3.0 */ - def c

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1328217592 ## sql/core/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -234,7 +260,7 @@ class Column(val expr: Expression) extends Logging { * @group expr_ops *

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42968: [SPARK-45113][FOLLOWUP] Fix test failed in Scala 2.13

2023-09-17 Thread via GitHub
zhengruifeng commented on code in PR #42968: URL: https://github.com/apache/spark/pull/42968#discussion_r1328217212 ## python/pyspark/sql/functions.py: ## @@ -3765,12 +3765,12 @@ def collect_set(col: "ColumnOrName") -> Column: Example 1: Collect values from a single column

[GitHub] [spark] LuciferYang commented on pull request #42968: [SPARK-45113][FOLLOWUP] Fix test failed in Scala 2.13

2023-09-17 Thread via GitHub
LuciferYang commented on PR #42968: URL: https://github.com/apache/spark/pull/42968#issuecomment-1722697313 will update the PR description after testing Scala 2.13

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1328213801 ## sql/core/src/main/scala/org/apache/spark/sql/Column.scala: ## @@ -234,7 +260,7 @@ class Column(val expr: Expression) extends Logging { * @group expr_ops *

[GitHub] [spark] LuciferYang commented on a diff in pull request #42968: [SPARK-45113][FOLLOWUP] Fix test failed in Scala 2.13

2023-09-17 Thread via GitHub
LuciferYang commented on code in PR #42968: URL: https://github.com/apache/spark/pull/42968#discussion_r1328213649 ## python/pyspark/sql/functions.py: ## @@ -3765,12 +3765,12 @@ def collect_set(col: "ColumnOrName") -> Column: Example 1: Collect values from a single column D

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42938: URL: https://github.com/apache/spark/pull/42938#discussion_r1328213652 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -1821,6 +1821,111 @@ def test_json_functions(self): sdf.select(SF.to_json(SF.struct

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42956: [SPARK-43654][CONNECT][PS][TESTS] Enable `InternalFrameParityTests.test_from_pandas`

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42956: URL: https://github.com/apache/spark/pull/42956#discussion_r1328213080 ## python/pyspark/pandas/tests/connect/test_parity_internal.py: ## @@ -15,18 +15,86 @@ # limitations under the License. # import unittest +import pandas as pd

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42956: [SPARK-43654][CONNECT][PS][TESTS] Enable `InternalFrameParityTests.test_from_pandas`

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42956: URL: https://github.com/apache/spark/pull/42956#discussion_r1328212842 ## python/pyspark/pandas/tests/connect/test_parity_internal.py: ## @@ -15,18 +15,86 @@ # limitations under the License. # import unittest +import pandas as pd

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42956: [SPARK-43654][CONNECT][PS][TESTS] Enable `InternalFrameParityTests.test_from_pandas`

2023-09-17 Thread via GitHub
zhengruifeng commented on code in PR #42956: URL: https://github.com/apache/spark/pull/42956#discussion_r1328212481 ## python/pyspark/pandas/tests/connect/test_parity_internal.py: ## @@ -15,18 +15,86 @@ # limitations under the License. # import unittest +import pandas as pd

[GitHub] [spark] LuciferYang commented on pull request #42908: [SPARK-44872][CONNECT][FOLLOWUP] Deflake ReattachableExecuteSuite and increase retry buffer

2023-09-17 Thread via GitHub
LuciferYang commented on PR #42908: URL: https://github.com/apache/spark/pull/42908#issuecomment-1722694131 > @LuciferYang I tried looking at [#42560 (comment)](https://github.com/apache/spark/pull/42560#issuecomment-1718968002) but did not reproduce it yet. If you have more instances of CI

[GitHub] [spark] yaooqinn opened a new pull request, #42969: [SPARK-45192][UI] Fix overdue lineInterpolate parameter for graphviz edge

2023-09-17 Thread via GitHub
yaooqinn opened a new pull request, #42969: URL: https://github.com/apache/spark/pull/42969 ### What changes were proposed in this pull request? The `edge.lineInterpolate` no longer takes effect for drawing edges. It shall be replaced by d3.curve ### Why ar

[GitHub] [spark] itholic commented on a diff in pull request #42956: [SPARK-43654][CONNECT][PS][TESTS] Enable `InternalFrameParityTests.test_from_pandas`

2023-09-17 Thread via GitHub
itholic commented on code in PR #42956: URL: https://github.com/apache/spark/pull/42956#discussion_r1328210599 ## python/pyspark/pandas/tests/connect/test_parity_internal.py: ## @@ -15,18 +15,86 @@ # limitations under the License. # import unittest +import pandas as pd fro

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1328209728 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala: ## @@ -442,6 +442,10 @@ case class InSubquery(values: Seq[Expression], quer

[GitHub] [spark] cloud-fan commented on a diff in pull request #42864: [WIP][SPARK-45112][SQL] Use UnresolvedFunction based resolution in SQL Dataset functions

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42864: URL: https://github.com/apache/spark/pull/42864#discussion_r1328209431 ## python/pyspark/sql/column.py: ## @@ -712,11 +712,11 @@ def __getitem__(self, k: Any) -> "Column": >>> df = spark.createDataFrame([('abce

[GitHub] [spark] LuciferYang commented on a diff in pull request #42968: [SPARK-45113][FOLLOWUP] Fix test failed in Scala 2.13

2023-09-17 Thread via GitHub
LuciferYang commented on code in PR #42968: URL: https://github.com/apache/spark/pull/42968#discussion_r1328205481 ## .github/workflows/build_and_test.yml: ## @@ -383,6 +383,7 @@ jobs: SKIP_PACKAGING: true METASPACE_SIZE: 1g BRANCH: ${{ inputs.branch }} +

[GitHub] [spark] LuciferYang opened a new pull request, #42968: [SPARK-45113][FOLLOWUP] Fix test failed in Scala 2.13

2023-09-17 Thread via GitHub
LuciferYang opened a new pull request, #42968: URL: https://github.com/apache/spark/pull/42968 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] ulysses-you opened a new pull request, #42967: [SPARK-45191][SQL] InMemoryTableScanExec simpleStringWithNodeId adds columnar info

2023-09-17 Thread via GitHub
ulysses-you opened a new pull request, #42967: URL: https://github.com/apache/spark/pull/42967 ### What changes were proposed in this pull request? InMemoryTableScanExec supports both row-based and columnar input and output which is based on the cache serializer. It would

[GitHub] [spark] cloud-fan commented on a diff in pull request #42951: [SPARK-45078][SQL] Fix `array_insert` ImplicitCastInputTypes not work

2023-09-17 Thread via GitHub
cloud-fan commented on code in PR #42951: URL: https://github.com/apache/spark/pull/42951#discussion_r1328202310 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -4749,7 +4749,6 @@ case class ArrayInsert( }

[GitHub] [spark] cloud-fan closed pull request #42952: [SPARK-45184][SQL][DOCS][TESTS] Remove orphaned error class documents

2023-09-17 Thread via GitHub
cloud-fan closed pull request #42952: [SPARK-45184][SQL][DOCS][TESTS] Remove orphaned error class documents URL: https://github.com/apache/spark/pull/42952

[GitHub] [spark] cloud-fan commented on pull request #42952: [SPARK-45184][SQL][DOCS][TESTS] Remove orphaned error class documents

2023-09-17 Thread via GitHub
cloud-fan commented on PR #42952: URL: https://github.com/apache/spark/pull/42952#issuecomment-1722674085 thanks, merging to master!

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-17 Thread via GitHub
zhengruifeng commented on code in PR #42938: URL: https://github.com/apache/spark/pull/42938#discussion_r1328198653 ## python/pyspark/sql/tests/connect/test_connect_function.py: ## @@ -1821,6 +1821,111 @@ def test_json_functions(self): sdf.select(SF.to_json(SF.struc

[GitHub] [spark] sandip-db commented on a diff in pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-17 Thread via GitHub
sandip-db commented on code in PR #42938: URL: https://github.com/apache/spark/pull/42938#discussion_r1328196005 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/FunctionTestSuite.scala: ## @@ -229,6 +229,18 @@ class FunctionTestSuite extends ConnectFunSuite {

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42938: [SPARK-44788][CONNECT][PYTHON][SQL] Add from_xml and schema_of_xml to pyspark, spark connect and sql function

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42938: URL: https://github.com/apache/spark/pull/42938#discussion_r1328194879 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/FunctionTestSuite.scala: ## @@ -229,6 +229,18 @@ class FunctionTestSuite extends ConnectFunSuite

[GitHub] [spark] itholic commented on pull request #42793: [SPARK-45065][PYTHON][PS] Support Pandas 2.1.0

2023-09-17 Thread via GitHub
itholic commented on PR #42793: URL: https://github.com/apache/spark/pull/42793#issuecomment-1722660266 CI link: https://github.com/itholic/spark/actions/runs/6216894150

[GitHub] [spark] dcoliversun commented on pull request #42943: [SPARK-45175][K8S] download krb5.conf from remote storage in spark-submit on k8s

2023-09-17 Thread via GitHub
dcoliversun commented on PR #42943: URL: https://github.com/apache/spark/pull/42943#issuecomment-1722653759 @dongjoon-hyun @yaooqinn Thanks for your review. And this is a good question. The specific scenario of this PR is to support users to use krb5.conf on cloud storage, in which authenti

[GitHub] [spark] panbingkun closed pull request #41824: [SPARK-43570][SPARK-43571][PYTHON][TESTS] Enable DateOpsTests.[test_rsub|test_sub] for pandas 2.0.0.

2023-09-17 Thread via GitHub
panbingkun closed pull request #41824: [SPARK-43570][SPARK-43571][PYTHON][TESTS] Enable DateOpsTests.[test_rsub|test_sub] for pandas 2.0.0. URL: https://github.com/apache/spark/pull/41824 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] panbingkun commented on pull request #41824: [SPARK-43570][SPARK-43571][PYTHON][TESTS] Enable DateOpsTests.[test_rsub|test_sub] for pandas 2.0.0.

2023-09-17 Thread via GitHub
panbingkun commented on PR #41824: URL: https://github.com/apache/spark/pull/41824#issuecomment-1722652907 I'm good with fixing it in the current way, let me close it now. > Oh, I just realized that this is already fixed from #42533. > > But seems like the approach is a bit diff

[GitHub] [spark] chenyu-opensource commented on pull request #42919: [SPARK-45160][DOCS]Update the default value of 'spark.executor.logs.rolling.strategy'

2023-09-17 Thread via GitHub
chenyu-opensource commented on PR #42919: URL: https://github.com/apache/spark/pull/42919#issuecomment-1722644892 @srowen I have used a new issue: https://issues.apache.org/jira/browse/SPARK-45160 -- This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] zhengruifeng commented on pull request #42966: [SPARK-45179][DOCS][FOLLOWUP] Add migration guide for Numpy minimum version upgrade

2023-09-17 Thread via GitHub
zhengruifeng commented on PR #42966: URL: https://github.com/apache/spark/pull/42966#issuecomment-1722644456 CI link: https://github.com/zhengruifeng/spark/actions/runs/6216985551 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42966: [SPARK-45179][DOCS][FOLLOWUP] Add migration guide for Numpy minimum version upgrade

2023-09-17 Thread via GitHub
zhengruifeng commented on code in PR #42966: URL: https://github.com/apache/spark/pull/42966#discussion_r1328185671 ## python/docs/source/migration_guide/pyspark_upgrade.rst: ## @@ -22,6 +22,8 @@ Upgrading PySpark Upgrading from PySpark 3.5 to 4.0 -

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42966: [SPARK-45179][DOCS][FOLLOWUP] Add migration guide for Numpy minimum version upgrade

2023-09-17 Thread via GitHub
zhengruifeng commented on code in PR #42966: URL: https://github.com/apache/spark/pull/42966#discussion_r1328185625 ## python/docs/source/migration_guide/pyspark_upgrade.rst: ## @@ -22,6 +22,8 @@ Upgrading PySpark Upgrading from PySpark 3.5 to 4.0 -

[GitHub] [spark] zhengruifeng opened a new pull request, #42966: [SPARK-45179][DOCS][FOLLOWUP] Add migration guide for Numpy minimum version upgrade

2023-09-17 Thread via GitHub
zhengruifeng opened a new pull request, #42966: URL: https://github.com/apache/spark/pull/42966 ### What changes were proposed in this pull request? Add migration guide for Numpy minimum version upgrade ### Why are the changes needed? to inform users about this important change

[GitHub] [spark] itholic commented on a diff in pull request #42955: [SPARK-43628][SPARK-43629][CONNECT][PS][TESTS] Clear message for JVM dependent tests.

2023-09-17 Thread via GitHub
itholic commented on code in PR #42955: URL: https://github.com/apache/spark/pull/42955#discussion_r1328183416 ## python/pyspark/pandas/tests/computation/test_compute.py: ## @@ -101,16 +101,10 @@ def test_mode(self): with self.assertRaises(ValueError): psdf

[GitHub] [spark] itholic commented on pull request #42962: [SPARK-44033][PYTHON] Added support for binary ops for list like objects

2023-09-17 Thread via GitHub
itholic commented on PR #42962: URL: https://github.com/apache/spark/pull/42962#issuecomment-1722634015 Basically, could you: - [ ] Complete the PR description - [ ] Add basic unit tests - [ ] Reformat the code by running `./dev/reformat-python` -- This is an automated messa

[GitHub] [spark] zhengruifeng commented on a diff in pull request #42955: [SPARK-43628][SPARK-43629][CONNECT][PS][TESTS] Clear message for JVM dependent tests.

2023-09-17 Thread via GitHub
zhengruifeng commented on code in PR #42955: URL: https://github.com/apache/spark/pull/42955#discussion_r1328181845 ## python/pyspark/pandas/tests/computation/test_compute.py: ## @@ -101,16 +101,10 @@ def test_mode(self): with self.assertRaises(ValueError):

[GitHub] [spark] itholic commented on pull request #42962: [SPARK-44033][PYTHON] Added support for binary ops for list like objects

2023-09-17 Thread via GitHub
itholic commented on PR #42962: URL: https://github.com/apache/spark/pull/42962#issuecomment-1722632813 Also, let's reformat the Python code. We can easily reformat it by running `./dev/reformat-python` from the project root. -- This is an automated message from the Apache Git Service. To

[GitHub] [spark] itholic commented on pull request #42962: [SPARK-44033][PYTHON] Added support for binary ops for list like objects

2023-09-17 Thread via GitHub
itholic commented on PR #42962: URL: https://github.com/apache/spark/pull/42962#issuecomment-1722632383 Let's fill in the PR description and add unit tests first to verify that the function is working as expected. We can start by adding a very basic test to `python/pyspark/panda

[GitHub] [spark] HyukjinKwon commented on pull request #42962: [SPARK-44033][PYTHON] Added support for binary ops for list like objects

2023-09-17 Thread via GitHub
HyukjinKwon commented on PR #42962: URL: https://github.com/apache/spark/pull/42962#issuecomment-1722616184 cc @itholic -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1328174896 ## python/pyspark/sql/connect/client/artifact.py: ## @@ -271,6 +276,7 @@ def add_artifacts(self, *path: str, pyfile: bool, archive: bool, file: bool) -> r

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42949: [SPARK-45093][CONNECT][PYTHON] Error reporting for addArtifacts query

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42949: URL: https://github.com/apache/spark/pull/42949#discussion_r1328174821 ## python/pyspark/sql/connect/client/logging.py: ## @@ -0,0 +1,43 @@ +import logging +import os +from typing import Optional + +__all__ = [ +"logger", Review C

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42939: SPARK-43254: Assign a name to the error _LEGACY_ERROR_TEMP_2018

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42939: URL: https://github.com/apache/spark/pull/42939#discussion_r1328174627 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala: ## @@ -170,7 +170,7 @@ object ExpressionEncoder { * Function that

[GitHub] [spark] github-actions[bot] closed pull request #39691: [SPARK-31561][SQL] Add QUALIFY clause

2023-09-17 Thread via GitHub
github-actions[bot] closed pull request #39691: [SPARK-31561][SQL] Add QUALIFY clause URL: https://github.com/apache/spark/pull/39691 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

[GitHub] [spark] github-actions[bot] closed pull request #40990: [SPARK-43317][SQL] Support combine adjacent aggregation

2023-09-17 Thread via GitHub
github-actions[bot] closed pull request #40990: [SPARK-43317][SQL] Support combine adjacent aggregation URL: https://github.com/apache/spark/pull/40990 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

[GitHub] [spark] github-actions[bot] closed pull request #41108: [SPARK-43427][Protobuf] spark protobuf: modify serde behavior of unsigned integer types

2023-09-17 Thread via GitHub
github-actions[bot] closed pull request #41108: [SPARK-43427][Protobuf] spark protobuf: modify serde behavior of unsigned integer types URL: https://github.com/apache/spark/pull/41108 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

[GitHub] [spark] github-actions[bot] closed pull request #41417: [SPARK-43908][SQL] Choose the bigger rowCount to initialize BloomFilterAggregate in InjectRuntimeFilter

2023-09-17 Thread via GitHub
github-actions[bot] closed pull request #41417: [SPARK-43908][SQL] Choose the bigger rowCount to initialize BloomFilterAggregate in InjectRuntimeFilter URL: https://github.com/apache/spark/pull/41417 -- This is an automated message from the Apache Git Service. To respond to the message, pleas

[GitHub] [spark] github-actions[bot] commented on pull request #41498: [SPARK-44001][Protobuf] spark protobuf: handle well known wrapper types

2023-09-17 Thread via GitHub
github-actions[bot] commented on PR #41498: URL: https://github.com/apache/spark/pull/41498#issuecomment-1722611922 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] HyukjinKwon commented on pull request #42916: [MiNOR][DOCS] Fix a typo in HashAggregateExec.scala

2023-09-17 Thread via GitHub
HyukjinKwon commented on PR #42916: URL: https://github.com/apache/spark/pull/42916#issuecomment-1722611273 @neshkeev once you set up GitHub Actions in your fork, please rebase this so the test is retriggered. -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [spark] zhengruifeng commented on pull request #42958: [SPARK-45168][PYTHON][FOLLOWUP] Add migration guide for Pandas minimum version upgrade

2023-09-17 Thread via GitHub
zhengruifeng commented on PR #42958: URL: https://github.com/apache/spark/pull/42958#issuecomment-1722609739 late LGTM, thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42929: [SPARK-45167][CONNECT][PYTHON] Python client must call `release_all`

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42929: URL: https://github.com/apache/spark/pull/42929#discussion_r1328171606 ## python/pyspark/sql/tests/connect/client/test_client.py: ## @@ -147,15 +150,33 @@ def _stub_with(self, execute=None, attach=None): attach_ops=Respons

[GitHub] [spark] HyukjinKwon opened a new pull request, #42965: [SPARK-45167][CONNECT][PYTHON][FOLLOW-UP] Use lighter threading Rlock, and use the existing eventually util function

2023-09-17 Thread via GitHub
HyukjinKwon opened a new pull request, #42965: URL: https://github.com/apache/spark/pull/42965 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/42929 that: - Use lighter threading `Rlock` instead of multithreading `Rl

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #42956: [SPARK-43654][CONNECT][PS][TESTS] Enable `InternalFrameParityTests.test_from_pandas`

2023-09-17 Thread via GitHub
HyukjinKwon commented on code in PR #42956: URL: https://github.com/apache/spark/pull/42956#discussion_r1328167074 ## python/pyspark/pandas/tests/connect/test_parity_internal.py: ## @@ -15,18 +15,86 @@ # limitations under the License. # import unittest +import pandas as pd

[GitHub] [spark] gengliangwang opened a new pull request, #42964: [SPARK-45189][SQL] Creating UnresolvedRelation from TableIdentifier should include the catalog field

2023-09-17 Thread via GitHub
gengliangwang opened a new pull request, #42964: URL: https://github.com/apache/spark/pull/42964 ### What changes were proposed in this pull request? Creating UnresolvedRelation from TableIdentifier should include the catalog field ### Why are the changes needed?

[GitHub] [spark] agubichev opened a new pull request, #42963: WIP: refactor Window operator

2023-09-17 Thread via GitHub
agubichev opened a new pull request, #42963: URL: https://github.com/apache/spark/pull/42963 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No ### How

[GitHub] [spark] shuwang21 commented on a diff in pull request #42357: [SPARK-44306][YARN] Group FileStatus with few RPC calls within Yarn Client

2023-09-17 Thread via GitHub
shuwang21 commented on code in PR #42357: URL: https://github.com/apache/spark/pull/42357#discussion_r1328148388 ## resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/config.scala: ## @@ -462,6 +462,30 @@ package object config extends Logging { .stringConf

[GitHub] [spark] gdhuper opened a new pull request, #42962: [SPARK-44033][PYTHON] Added support for binary ops for list like objects

2023-09-17 Thread via GitHub
gdhuper opened a new pull request, #42962: URL: https://github.com/apache/spark/pull/42962 ### What changes were proposed in this pull request? ### Why are the changes needed? Fix for [Spark-44033](https://issues.apache.org/jira/browse/SPARK-44033) ###
