[GitHub] [spark] LuciferYang commented on a diff in pull request #38779: [WIP][SPARK-41244][UI] Introducing a Protobuf serializer for UI data on KV store

2022-11-23 Thread GitBox
LuciferYang commented on code in PR #38779: URL: https://github.com/apache/spark/pull/38779#discussion_r1031155887 ## project/SparkBuild.scala: ## @@ -607,9 +607,25 @@ object SparkParallelTestGrouping { object Core { import scala.sys.process.Process + import BuildCommons.

[GitHub] [spark] EnricoMi commented on a diff in pull request #38223: [SPARK-40770][PYTHON] Improved error messages for applyInPandas for schema mismatch

2022-11-23 Thread GitBox
EnricoMi commented on code in PR #38223: URL: https://github.com/apache/spark/pull/38223#discussion_r1031150953 ## python/pyspark/sql/tests/pandas/test_pandas_cogrouped_map.py: ## @@ -165,100 +148,191 @@ def merge_pandas(lft, _): ) def test_apply_in_panda

[GitHub] [spark] EnricoMi commented on a diff in pull request #38223: [SPARK-40770][PYTHON] Improved error messages for applyInPandas for schema mismatch

2022-11-23 Thread GitBox
EnricoMi commented on code in PR #38223: URL: https://github.com/apache/spark/pull/38223#discussion_r1031150574 ## python/pyspark/worker.py: ## @@ -146,7 +146,74 @@ def verify_result_type(result): ) -def wrap_cogrouped_map_pandas_udf(f, return_type, argspec): +def verif

[GitHub] [spark] HyukjinKwon commented on pull request #38787: [SPARK-41251][PS][INFRA] Upgrade pandas from 1.5.1 to 1.5.2

2022-11-23 Thread GitBox
HyukjinKwon commented on PR #38787: URL: https://github.com/apache/spark/pull/38787#issuecomment-1326072793 cc @Yikun @itholic @xinrong-meng FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

[GitHub] [spark] LuciferYang commented on a diff in pull request #38779: [WIP][SPARK-41244][UI] Introducing a Protobuf serializer for UI data on KV store

2022-11-23 Thread GitBox
LuciferYang commented on code in PR #38779: URL: https://github.com/apache/spark/pull/38779#discussion_r1031148805 ## core/pom.xml: ## @@ -616,6 +621,50 @@ + +org.apache.maven.plugins +maven-shade-plugin + + fa

[GitHub] [spark] LuciferYang commented on a diff in pull request #38779: [WIP][SPARK-41244][UI] Introducing a Protobuf serializer for UI data on KV store

2022-11-23 Thread GitBox
LuciferYang commented on code in PR #38779: URL: https://github.com/apache/spark/pull/38779#discussion_r1031147004 ## core/pom.xml: ## @@ -616,6 +621,50 @@ + +org.apache.maven.plugins Review Comment: SPARK-40593 and SPARK-41215 do

[GitHub] [spark] zhengruifeng commented on pull request #38774: [SPARK-41240][CONNECT][BUILD][INFRA] Upgrade `Protobuf` to 3.19.5

2022-11-23 Thread GitBox
zhengruifeng commented on PR #38774: URL: https://github.com/apache/spark/pull/38774#issuecomment-1326069519 we may also need to change this place https://github.com/apache/spark/blob/master/connector/connect/src/main/buf.gen.yaml#L30 cc @grundprinzip

[GitHub] [spark] panbingkun opened a new pull request, #38787: [SPARK-41251][PS][INFRA] Upgrade pandas from 1.5.1 to 1.5.2

2022-11-23 Thread GitBox
panbingkun opened a new pull request, #38787: URL: https://github.com/apache/spark/pull/38787 ### What changes were proposed in this pull request? This PR proposes upgrading pandas to 1.5.2, for pandas API on Spark. A new version of pandas (1.5.2) was released on Nov 22, 2022. Release

[GitHub] [spark] zhengruifeng commented on pull request #38770: [SPARK-41238][CONNECT][PYTHON] Support more built-in datatypes

2022-11-23 Thread GitBox
zhengruifeng commented on PR #38770: URL: https://github.com/apache/spark/pull/38770#issuecomment-1326058481 > Can you test both the nullable=true and nullable=false cases? The tests added in `test_schema` cover those cases.

[GitHub] [spark] wilfred-s commented on pull request #38780: [SPARK-41185][K8S][DOCS] Remove ARM limitation for YuniKorn from docs

2022-11-23 Thread GitBox
wilfred-s commented on PR #38780: URL: https://github.com/apache/spark/pull/38780#issuecomment-1326057909 already did that directly after it was logged

[GitHub] [spark] LuciferYang commented on a diff in pull request #38782: [SPARK-38728][SQL] Test the error class: FAILED_RENAME_PATH

2022-11-23 Thread GitBox
LuciferYang commented on code in PR #38782: URL: https://github.com/apache/spark/pull/38782#discussion_r1031134476 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala: ## @@ -637,6 +637,33 @@ class QueryExecutionErrorsSuite sqlState = "0

[GitHub] [spark] LuciferYang commented on pull request #38598: [SPARK-41097][CORE][SQL][SS][PROTOBUF] Remove redundant collection conversion base on Scala 2.13 code

2022-11-23 Thread GitBox
LuciferYang commented on PR #38598: URL: https://github.com/apache/spark/pull/38598#issuecomment-1326049324 ready to merge

[GitHub] [spark] itholic commented on pull request #38664: [SPARK-41147][SQL] Assign a name to the legacy error class `_LEGACY_ERROR_TEMP_1042`

2022-11-23 Thread GitBox
itholic commented on PR #38664: URL: https://github.com/apache/spark/pull/38664#issuecomment-1326044896 Thanks for the review, @MaxGekk. Just addressed the comments!

[GitHub] [spark] gengliangwang commented on pull request #38783: [SPARK-41247][BUILD] Unify the Protobuf versions in Spark connect and Protobuf connector

2022-11-23 Thread GitBox
gengliangwang commented on PR #38783: URL: https://github.com/apache/spark/pull/38783#issuecomment-1326027832 > shall we also shade protobuf in Spark Core in this PR? Then the proto version is unified in all the places. In this PR there is no actual usage. I would prefer to shade it i

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38768: [SPARK-41230][CONNECT][PYTHON] Remove `str` from Aggregate expression type

2022-11-23 Thread GitBox
zhengruifeng commented on code in PR #38768: URL: https://github.com/apache/spark/pull/38768#discussion_r1031102422 ## python/pyspark/sql/connect/dataframe.py: ## @@ -51,24 +52,11 @@ class GroupingFrame(object): Review Comment: not related to this PR, but shall we renam

[GitHub] [spark] cloud-fan commented on pull request #38783: [SPARK-41247][BUILD] Unify the Protobuf versions in Spark connect and Protobuf connector

2022-11-23 Thread GitBox
cloud-fan commented on PR #38783: URL: https://github.com/apache/spark/pull/38783#issuecomment-1326008658 shall we also shade protobuf in Spark Core in this PR?

[GitHub] [spark] cloud-fan commented on a diff in pull request #38783: [SPARK-41247][BUILD] Unify the Protobuf versions in Spark connect and Protobuf connector

2022-11-23 Thread GitBox
cloud-fan commented on code in PR #38783: URL: https://github.com/apache/spark/pull/38783#discussion_r1031094917 ## pom.xml: ## @@ -118,7 +118,10 @@ 2.19.0 3.3.4 -2.5.0 + + 2.5.0 + Review Comment: ```suggestion ```

[GitHub] [spark] MaxGekk closed pull request #38710: [SPARK-41179][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1092

2022-11-23 Thread GitBox
MaxGekk closed pull request #38710: [SPARK-41179][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1092 URL: https://github.com/apache/spark/pull/38710

[GitHub] [spark] MaxGekk commented on pull request #38710: [SPARK-41179][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1092

2022-11-23 Thread GitBox
MaxGekk commented on PR #38710: URL: https://github.com/apache/spark/pull/38710#issuecomment-1326004383 +1, LGTM. Merging to master. Thank you, @panbingkun.

[GitHub] [spark] amaliujia commented on pull request #38786: [SPARK-41250][CONNECT][PYTHON] DataFrame.to_pandas should not return optional pandas dataframe

2022-11-23 Thread GitBox
amaliujia commented on PR #38786: URL: https://github.com/apache/spark/pull/38786#issuecomment-1326002119 @zhengruifeng @HyukjinKwon cc @xinrong-meng @grundprinzip

[GitHub] [spark] MaxGekk commented on a diff in pull request #38769: [SPARK-41228][SQL] Rename & Improve error message for `COLUMN_NOT_IN_GROUP_BY_CLAUSE`.

2022-11-23 Thread GitBox
MaxGekk commented on code in PR #38769: URL: https://github.com/apache/spark/pull/38769#discussion_r1031089197 ## core/src/main/resources/error/error-classes.json: ## @@ -785,6 +779,13 @@ "Malformed Protobuf messages are detected in message deserialization. Parse Mode: .

[GitHub] [spark] amaliujia opened a new pull request, #38786: [SPARK-41250][CONNECT][PYTHON] DataFrame.to_pandas should not return optional pandas dataframe

2022-11-23 Thread GitBox
amaliujia opened a new pull request, #38786: URL: https://github.com/apache/spark/pull/38786 ### What changes were proposed in this pull request? The server guarantees to send at least one Arrow batch with the schema even when the result is empty. In this case, `DataFrame.to_pandas

[GitHub] [spark] itholic commented on a diff in pull request #38769: [SPARK-41228][SQL] Rename & Improve error message for `COLUMN_NOT_IN_GROUP_BY_CLAUSE`.

2022-11-23 Thread GitBox
itholic commented on code in PR #38769: URL: https://github.com/apache/spark/pull/38769#discussion_r1031083666 ## core/src/main/resources/error/error-classes.json: ## @@ -785,6 +779,13 @@ "Malformed Protobuf messages are detected in message deserialization. Parse Mode: .

[GitHub] [spark] itholic commented on pull request #38769: [SPARK-41228][SQL] Rename & Improve error message for `COLUMN_NOT_IN_GROUP_BY_CLAUSE`.

2022-11-23 Thread GitBox
itholic commented on PR #38769: URL: https://github.com/apache/spark/pull/38769#issuecomment-1325994942 Fixed the Python test first.

[GitHub] [spark] itholic commented on a diff in pull request #38769: [SPARK-41228][SQL] Rename & Improve error message for `COLUMN_NOT_IN_GROUP_BY_CLAUSE`.

2022-11-23 Thread GitBox
itholic commented on code in PR #38769: URL: https://github.com/apache/spark/pull/38769#discussion_r1031085119 ## core/src/main/resources/error/error-classes.json: ## @@ -785,6 +779,13 @@ "Malformed Protobuf messages are detected in message deserialization. Parse Mode: .

[GitHub] [spark] MaxGekk closed pull request #38707: [SPARK-41176][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1042

2022-11-23 Thread GitBox
MaxGekk closed pull request #38707: [SPARK-41176][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1042 URL: https://github.com/apache/spark/pull/38707

[GitHub] [spark] MaxGekk commented on pull request #38707: [SPARK-41176][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1042

2022-11-23 Thread GitBox
MaxGekk commented on PR #38707: URL: https://github.com/apache/spark/pull/38707#issuecomment-1325992375 +1, LGTM. Merging to master. Thank you, @panbingkun, and @srielau for the review.

[GitHub] [spark] MaxGekk commented on pull request #38782: [SPARK-38728][SQL] Test the error class: FAILED_RENAME_PATH

2022-11-23 Thread GitBox
MaxGekk commented on PR #38782: URL: https://github.com/apache/spark/pull/38782#issuecomment-1325989625 cc @panbingkun @LuciferYang @itholic

[GitHub] [spark] MaxGekk commented on pull request #38766: [MINOR][SQL] Fix error message for `UNEXPECTED_INPUT_TYPE`

2022-11-23 Thread GitBox
MaxGekk commented on PR #38766: URL: https://github.com/apache/spark/pull/38766#issuecomment-1325988492 +1, LGTM. Merged to master. Thank you, @itholic.

[GitHub] [spark] MaxGekk closed pull request #38766: [MINOR][SQL] Fix error message for `UNEXPECTED_INPUT_TYPE`

2022-11-23 Thread GitBox
MaxGekk closed pull request #38766: [MINOR][SQL] Fix error message for `UNEXPECTED_INPUT_TYPE` URL: https://github.com/apache/spark/pull/38766

[GitHub] [spark] MaxGekk commented on a diff in pull request #38784: [SPARK-41248][SQL] Add "spark.sql.json.enablePartialResults" to enable/disable JSON partial results parsing added in SPARK-40646

2022-11-23 Thread GitBox
MaxGekk commented on code in PR #38784: URL: https://github.com/apache/spark/pull/38784#discussion_r1031078969 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3629,6 +3629,15 @@ object SQLConf { .booleanConf .createWithDefault(true)

[GitHub] [spark] MaxGekk commented on pull request #38772: [SPARK-41237][SQL] Assign a name to the error class `_LEGACY_ERROR_TEMP_0030`

2022-11-23 Thread GitBox
MaxGekk commented on PR #38772: URL: https://github.com/apache/spark/pull/38772#issuecomment-1325981880 > Should we consolidate them under one rule? I agree. Let's consolidate. I would prefer `DATATYPE` since it is shorter ;-)

[GitHub] [spark] itholic commented on a diff in pull request #38769: [SPARK-41228][SQL] Rename & Improve error message for `COLUMN_NOT_IN_GROUP_BY_CLAUSE`.

2022-11-23 Thread GitBox
itholic commented on code in PR #38769: URL: https://github.com/apache/spark/pull/38769#discussion_r1031072613 ## core/src/main/resources/error/error-classes.json: ## @@ -785,6 +779,13 @@ "Malformed Protobuf messages are detected in message deserialization. Parse Mode: .

[GitHub] [spark] itholic commented on pull request #38772: [SPARK-41237][SQL] Assign a name to the error class `_LEGACY_ERROR_TEMP_0030`

2022-11-23 Thread GitBox
itholic commented on PR #38772: URL: https://github.com/apache/spark/pull/38772#issuecomment-1325973768 Btw, we use both `DATA_TYPE` and `DATATYPE` in error class names. Should we consolidate them under one rule?

[GitHub] [spark] HyukjinKwon closed pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-23 Thread GitBox
HyukjinKwon closed pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation URL: https://github.com/apache/spark/pull/38659

[GitHub] [spark] HyukjinKwon commented on pull request #38659: [SPARK-41114][CONNECT] Support local data for LocalRelation

2022-11-23 Thread GitBox
HyukjinKwon commented on PR #38659: URL: https://github.com/apache/spark/pull/38659#issuecomment-1325972726 Merged to master.

[GitHub] [spark] itholic commented on pull request #38772: [SPARK-41237][SQL] Assign a name to the error class `_LEGACY_ERROR_TEMP_0030`

2022-11-23 Thread GitBox
itholic commented on PR #38772: URL: https://github.com/apache/spark/pull/38772#issuecomment-1325972686 > Can't you re-use the existing error class `UNSUPPORTED_DATATYPE`? Oh... I missed that one. Sounds good!

[GitHub] [spark] HeartSaVioR commented on pull request #38785: [SPARK-41249][SS] Add acceptance test for self-union on streaming query

2022-11-23 Thread GitBox
HeartSaVioR commented on PR #38785: URL: https://github.com/apache/spark/pull/38785#issuecomment-1325972411 cc. @zsxwing @viirya Please take a look. Thanks!

[GitHub] [spark] HeartSaVioR opened a new pull request, #38785: [SPARK-41249][SS] Add acceptance test for self-union on streaming query

2022-11-23 Thread GitBox
HeartSaVioR opened a new pull request, #38785: URL: https://github.com/apache/spark/pull/38785 ### What changes were proposed in this pull request? This PR proposes to add a new test suite specifically for self-union tests on streaming query. The test cases are acceptance tests for 4

[GitHub] [spark] ulysses-you commented on a diff in pull request #38739: [SPARK-41207][SQL] Fix BinaryArithmetic with negative scale

2022-11-23 Thread GitBox
ulysses-you commented on code in PR #38739: URL: https://github.com/apache/spark/pull/38739#discussion_r1031054191 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecisionSuite.scala: ## @@ -276,9 +276,9 @@ class DecimalPrecisionSuite extends Analys

[GitHub] [spark] sadikovi commented on a diff in pull request #38784: [SPARK-41248] Add "spark.sql.json.enablePartialResults" to enable/disable JSON partial results parsing added in SPARK-40646

2022-11-23 Thread GitBox
sadikovi commented on code in PR #38784: URL: https://github.com/apache/spark/pull/38784#discussion_r1031065321 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -3629,6 +3629,15 @@ object SQLConf { .booleanConf .createWithDefault(true)

[GitHub] [spark] sadikovi opened a new pull request, #38784: [SPARK-41248] Add "spark.sql.json.enablePartialResults" to enable/disable JSON partial results parsing added in SPARK-40646

2022-11-23 Thread GitBox
sadikovi opened a new pull request, #38784: URL: https://github.com/apache/spark/pull/38784 ### What changes were proposed in this pull request? This PR adds a SQL config `spark.sql.json.enablePartialResults` to control SPARK-40646 change. This allows us to fall back to th

[GitHub] [spark] gengliangwang commented on pull request #38779: [WIP][SPARK-41244][UI] Introducing a Protobuf serializer for UI data on KV store

2022-11-23 Thread GitBox
gengliangwang commented on PR #38779: URL: https://github.com/apache/spark/pull/38779#issuecomment-1325963483 Pending on https://github.com/apache/spark/pull/38783

[GitHub] [spark] gengliangwang commented on a diff in pull request #38779: [SPARK-41244][UI] Introducing a Protobuf serializer for UI data on KV store

2022-11-23 Thread GitBox
gengliangwang commented on code in PR #38779: URL: https://github.com/apache/spark/pull/38779#discussion_r1031060603 ## core/pom.xml: ## @@ -532,7 +533,12 @@ org.apache.commons commons-crypto - + + com.google.protobuf Review Comment: Created http

[GitHub] [spark] gengliangwang opened a new pull request, #38783: [SPARK-41247][BUILD] Unify the Protobuf versions in Spark connect and Protobuf connector

2022-11-23 Thread GitBox
gengliangwang opened a new pull request, #38783: URL: https://github.com/apache/spark/pull/38783 ### What changes were proposed in this pull request? Unify the Protobuf versions in Spark connect and Protobuf connector. ### Why are the changes needed? The Prot

[GitHub] [spark] gengliangwang commented on a diff in pull request #38779: [SPARK-41244][UI] Introducing a Protobuf serializer for UI data on KV store

2022-11-23 Thread GitBox
gengliangwang commented on code in PR #38779: URL: https://github.com/apache/spark/pull/38779#discussion_r1031058129 ## core/pom.xml: ## @@ -616,6 +621,50 @@ + +org.apache.maven.plugins +maven-shade-plugin + +

[GitHub] [spark] gengliangwang commented on a diff in pull request #38779: [SPARK-41244][UI] Introducing a Protobuf serializer for UI data on KV store

2022-11-23 Thread GitBox
gengliangwang commented on code in PR #38779: URL: https://github.com/apache/spark/pull/38779#discussion_r1031056472 ## core/pom.xml: ## @@ -532,7 +533,12 @@ org.apache.commons commons-crypto - + + com.google.protobuf Review Comment: This is for

[GitHub] [spark] cloud-fan commented on pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

2022-11-23 Thread GitBox
cloud-fan commented on PR #38750: URL: https://github.com/apache/spark/pull/38750#issuecomment-1325955535 cc @MaxGekk @gengliangwang

[GitHub] [spark] cloud-fan commented on pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

2022-11-23 Thread GitBox
cloud-fan commented on PR #38750: URL: https://github.com/apache/spark/pull/38750#issuecomment-1325955422 Nice refactor! We should have done this earlier, before adding ANSI interval types and timestamp NTZ. Now we should have more confidence in these new data types.

[GitHub] [spark] cloud-fan commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

2022-11-23 Thread GitBox
cloud-fan commented on code in PR #38750: URL: https://github.com/apache/spark/pull/38750#discussion_r1031053828 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala: ## @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF
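For context on the physical-types refactor in #38750: several logical SQL types share one internal representation in Spark (for example, a date is stored as an int count of days and a timestamp as a long count of microseconds). The Python sketch below only illustrates that many-to-one relationship; `Physical`, `LOGICAL_TO_PHYSICAL`, and `types_sharing` are hypothetical names, not the `PhysicalDataType` API added by the PR.

```python
from enum import Enum


class Physical(Enum):
    """Hypothetical stand-in for a physical (in-memory) representation."""
    INT = "int"
    LONG = "long"


# Hypothetical illustration: several logical SQL types map onto the same
# physical representation, so code keyed on the physical type can handle
# all of them with one branch.
LOGICAL_TO_PHYSICAL = {
    "IntegerType": Physical.INT,
    "DateType": Physical.INT,               # days since 1970-01-01
    "YearMonthIntervalType": Physical.INT,  # total months
    "LongType": Physical.LONG,
    "TimestampType": Physical.LONG,         # microseconds since the epoch
    "TimestampNTZType": Physical.LONG,      # microseconds, no time zone
    "DayTimeIntervalType": Physical.LONG,   # total microseconds
}


def types_sharing(physical: Physical) -> list:
    """Return the logical type names that use the given physical type, sorted."""
    return sorted(
        name for name, p in LOGICAL_TO_PHYSICAL.items() if p is physical
    )


print(types_sharing(Physical.INT))
```

Grouping by physical type is what lets a single accessor or codegen path serve a whole family of logical types instead of one case per logical type.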

[GitHub] [spark] cloud-fan commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

2022-11-23 Thread GitBox
cloud-fan commented on code in PR #38750: URL: https://github.com/apache/spark/pull/38750#discussion_r1031054150 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala: ## @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] [spark] cloud-fan commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

2022-11-23 Thread GitBox
cloud-fan commented on code in PR #38750: URL: https://github.com/apache/spark/pull/38750#discussion_r1031053475 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/PhysicalDataType.scala: ## @@ -0,0 +1,58 @@ +/* + * Licensed to the Apache Software Foundation (ASF

[GitHub] [spark] cloud-fan commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

2022-11-23 Thread GitBox
cloud-fan commented on code in PR #38750: URL: https://github.com/apache/spark/pull/38750#discussion_r1031052882 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala: ## @@ -1901,24 +1904,27 @@ object CodeGenerator extends Logging

[GitHub] [spark] cloud-fan commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

2022-11-23 Thread GitBox
cloud-fan commented on code in PR #38750: URL: https://github.com/apache/spark/pull/38750#discussion_r1031051704 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala: ## @@ -253,13 +255,16 @@ object RowEncoder { } case _: DayTimeInte

[GitHub] [spark] cloud-fan commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

2022-11-23 Thread GitBox
cloud-fan commented on code in PR #38750: URL: https://github.com/apache/spark/pull/38750#discussion_r1031051302 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/RowEncoder.scala: ## @@ -214,6 +215,7 @@ object RowEncoder { } else { nonNullOut

[GitHub] [spark] cloud-fan commented on a diff in pull request #38750: [SPARK-41226][SQL] Refactor Spark types by introducing physical types

2022-11-23 Thread GitBox
cloud-fan commented on code in PR #38750: URL: https://github.com/apache/spark/pull/38750#discussion_r1031050991 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala: ## @@ -129,24 +130,25 @@ object InternalRow { */ def getAccessor(dt: DataType,

[GitHub] [spark] ibuder opened a new pull request, #38782: [SPARK-38728][SQL] Test the error class: FAILED_RENAME_PATH

2022-11-23 Thread GitBox
ibuder opened a new pull request, #38782: URL: https://github.com/apache/spark/pull/38782 ### What changes were proposed in this pull request? This adds a test for error class FAILED_RENAME_PATH in QueryExecutionErrorsSuite. ### Why are the changes needed? @MaxGekk t

[GitHub] [spark] cloud-fan commented on a diff in pull request #38760: [SPARK-41219][SQL] Decimal changePrecision should work with decimal(0, 0)

2022-11-23 Thread GitBox
cloud-fan commented on code in PR #38760: URL: https://github.com/apache/spark/pull/38760#discussion_r1031049024 ## sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala: ## @@ -420,7 +420,11 @@ final class Decimal extends Ordered[Decimal] with Serializable {

[GitHub] [spark] Yikun commented on pull request #38780: [SPARK-41185][K8S][DOCS] Remove ARM limitation for YuniKorn from docs

2022-11-23 Thread GitBox
Yikun commented on PR #38780: URL: https://github.com/apache/spark/pull/38780#issuecomment-1325947902 @wilfred-s Would you mind setting up your GitHub Actions according to the note at https://github.com/apache/spark/pull/38780/checks?check_run_id=9680278988. cc @dongjoon-hyun @yangwwei

[GitHub] [spark] yorksity opened a new pull request, #38781: [SPARK-41246][core] Solve the problem of RddId negative

2022-11-23 Thread GitBox
yorksity opened a new pull request, #38781: URL: https://github.com/apache/spark/pull/38781 ### What changes were proposed in this pull request? ### Why are the changes needed? Solve the problem that occurs in long-running tasks, such as streaming tasks. ### Does t
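The PR description does not say where the negative RDD ID comes from. Assuming the ID is drawn from a signed 32-bit counter such as a JVM `AtomicInteger` (an assumption, not taken from the PR), a streaming job that keeps creating RDDs would eventually step past `Int.MaxValue` and wrap to a negative value. The Python sketch below only emulates that wraparound; `Int32Counter` and `to_int32` are hypothetical helpers, not Spark APIs.

```python
INT_MAX = 2**31 - 1


def to_int32(value: int) -> int:
    """Reinterpret an unbounded Python int as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT_MAX else value


class Int32Counter:
    """Hypothetical, single-threaded mimic of AtomicInteger.getAndIncrement()."""

    def __init__(self, start: int = 0) -> None:
        self._value = to_int32(start)

    def get_and_increment(self) -> int:
        current = self._value
        # Like JVM int arithmetic, the increment silently wraps on overflow.
        self._value = to_int32(current + 1)
        return current


# Start just below the limit to show the wraparound without 2^31 iterations.
counter = Int32Counter(start=INT_MAX - 1)
ids = [counter.get_and_increment() for _ in range(3)]
print(ids)  # [2147483646, 2147483647, -2147483648]
```

Starting the counter near the limit is only a shortcut for the demo; in a real long-running job the same wrap would be reached after about 2^31 allocations.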

[GitHub] [spark] cloud-fan commented on pull request #38640: [WIP][SPARK-41124][SQL][TEST] Add DSv2 PlanStabilitySuites

2022-11-23 Thread GitBox
cloud-fan commented on PR #38640: URL: https://github.com/apache/spark/pull/38640#issuecomment-1325947062 > Actually, I'm happy to work on making parquet v2 tables available in a separate ticket/PR if you can give me some guidance. I tried to do it a long time ago but failed as there ar

[GitHub] [spark] ulysses-you commented on a diff in pull request #38760: [SPARK-41219][SQL] Decimal changePrecision should work with decimal(0, 0)

2022-11-23 Thread GitBox
ulysses-you commented on code in PR #38760: URL: https://github.com/apache/spark/pull/38760#discussion_r1031044398 ## sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala: ## @@ -420,7 +420,11 @@ final class Decimal extends Ordered[Decimal] with Serializable {

[GitHub] [spark] amaliujia commented on pull request #38768: [SPARK-41230][CONNECT][PYTHON] Remove `str` from Aggregate expression type

2022-11-23 Thread GitBox
amaliujia commented on PR #38768: URL: https://github.com/apache/spark/pull/38768#issuecomment-1325935616 @zhengruifeng @grundprinzip can you take another look?

[GitHub] [spark] wilfred-s opened a new pull request, #38780: [SPARK-41185][K8S][DOCS] Remove ARM limitation for YuniKorn from docs

2022-11-23 Thread GitBox
wilfred-s opened a new pull request, #38780: URL: https://github.com/apache/spark/pull/38780 ### What changes were proposed in this pull request? Remove the limitations section from the K8s documentation for YuniKorn. ### Why are the changes needed? The limitation section is not

[GitHub] [spark] cloud-fan commented on a diff in pull request #38779: [SPARK-41244][UI] Introducing a Protobuf serializer for UI data on KV store

2022-11-23 Thread GitBox
cloud-fan commented on code in PR #38779: URL: https://github.com/apache/spark/pull/38779#discussion_r1031036283 ## core/pom.xml: ## @@ -616,6 +621,50 @@ + +org.apache.maven.plugins +maven-shade-plugin + + fals

[GitHub] [spark] cloud-fan commented on a diff in pull request #38760: [SPARK-41219][SQL] Decimal changePrecision should work with decimal(0, 0)

2022-11-23 Thread GitBox
cloud-fan commented on code in PR #38760: URL: https://github.com/apache/spark/pull/38760#discussion_r1031035938 ## sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala: ## @@ -420,7 +420,11 @@ final class Decimal extends Ordered[Decimal] with Serializable {

[GitHub] [spark] cloud-fan commented on a diff in pull request #38760: [SPARK-41219][SQL] Decimal changePrecision should work with decimal(0, 0)

2022-11-23 Thread GitBox
cloud-fan commented on code in PR #38760: URL: https://github.com/apache/spark/pull/38760#discussion_r1031035738 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala: ## @@ -3537,6 +3537,12 @@ class DataFrameSuite extends QueryTest }.isEmpty) } } +

[GitHub] [spark] amaliujia commented on a diff in pull request #38768: [SPARK-41230][CONNECT][PYTHON] Remove `str` from Aggregate expression type

2022-11-23 Thread GitBox
amaliujia commented on code in PR #38768: URL: https://github.com/apache/spark/pull/38768#discussion_r1031035663 ## python/pyspark/sql/connect/plan.py: ## @@ -558,29 +557,19 @@ def _repr_html_(self) -> str: class Aggregate(LogicalPlan): -MeasureType = Tuple["ExpressionO

[GitHub] [spark] LuciferYang commented on a diff in pull request #38779: [SPARK-41244][UI] Introducing a Protobuf serializer for UI data on KV store

2022-11-23 Thread GitBox
LuciferYang commented on code in PR #38779: URL: https://github.com/apache/spark/pull/38779#discussion_r1031033790 ## project/SparkBuild.scala: ## @@ -607,9 +607,25 @@ object SparkParallelTestGrouping { object Core { import scala.sys.process.Process + import BuildCommons.

[GitHub] [spark] ulysses-you commented on pull request #38761: [SPARK-40988][SQL][TEST] Test case for insert partition should verify value

2022-11-23 Thread GitBox
ulysses-you commented on PR #38761: URL: https://github.com/apache/spark/pull/38761#issuecomment-1325911082 thank you @rangareddy , it seems some wrong with github action. Can you rebase your branch to retry ?

[GitHub] [spark] LuciferYang commented on a diff in pull request #38779: [SPARK-41244][UI] Introducing a Protobuf serializer for UI data on KV store

2022-11-23 Thread GitBox
LuciferYang commented on code in PR #38779: URL: https://github.com/apache/spark/pull/38779#discussion_r1031027267 ## core/pom.xml: ## @@ -616,6 +621,50 @@ + +org.apache.maven.plugins +maven-shade-plugin + + fa

[GitHub] [spark] cloud-fan commented on a diff in pull request #38779: [SPARK-41244][UI] Introducing a Protobuf serializer for UI data on KV store

2022-11-23 Thread GitBox
cloud-fan commented on code in PR #38779: URL: https://github.com/apache/spark/pull/38779#discussion_r1031025029 ## core/pom.xml: ## @@ -616,6 +621,50 @@ + +org.apache.maven.plugins +maven-shade-plugin + + fals

[GitHub] [spark] cloud-fan commented on a diff in pull request #38779: [SPARK-41244][UI] Introducing a Protobuf serializer for UI data on KV store

2022-11-23 Thread GitBox
cloud-fan commented on code in PR #38779: URL: https://github.com/apache/spark/pull/38779#discussion_r1031025176 ## core/pom.xml: ## @@ -532,7 +533,12 @@ org.apache.commons commons-crypto - + + com.google.protobuf Review Comment: Can we have indi

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38742: [SPARK-41216][CONNECT][PYTHON] Make AnalyzePlan support multiple analysis tasks And implement isLocal/isStreaming/printSchema/

2022-11-23 Thread GitBox
zhengruifeng commented on code in PR #38742: URL: https://github.com/apache/spark/pull/38742#discussion_r1031023071 ## connector/connect/src/main/protobuf/spark/connect/base.proto: ## @@ -100,18 +70,138 @@ message AnalyzePlanRequest { // logging purposes and will not be inter

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38742: [SPARK-41216][CONNECT][PYTHON] Make AnalyzePlan support multiple analysis tasks And implement isLocal/isStreaming/printSchema/

2022-11-23 Thread GitBox
zhengruifeng commented on code in PR #38742: URL: https://github.com/apache/spark/pull/38742#discussion_r1031022749 ## connector/connect/src/main/protobuf/spark/connect/base.proto: ## @@ -100,18 +70,138 @@ message AnalyzePlanRequest { // logging purposes and will not be inter

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38742: [SPARK-41216][CONNECT][PYTHON] Make AnalyzePlan support multiple analysis tasks And implement isLocal/isStreaming/printSchema/

2022-11-23 Thread GitBox
zhengruifeng commented on code in PR #38742: URL: https://github.com/apache/spark/pull/38742#discussion_r1031021916 ## python/pyspark/sql/connect/dataframe.py: ## @@ -797,6 +796,137 @@ def schema(self) -> StructType: else: return self._schema +@proper

[GitHub] [spark] LuciferYang commented on pull request #38743: [SPARK-41215][BUILD][PROTOBUF] Support user configurable protoc executables when building Spark Protobuf.

2022-11-23 Thread GitBox
LuciferYang commented on PR #38743: URL: https://github.com/apache/spark/pull/38743#issuecomment-1325892309 Congratulations @WolverineJiang

[GitHub] [spark] WolverineJiang commented on pull request #38743: [SPARK-41215][BUILD][PROTOBUF] Support user configurable protoc executables when building Spark Protobuf.

2022-11-23 Thread GitBox
WolverineJiang commented on PR #38743: URL: https://github.com/apache/spark/pull/38743#issuecomment-1325891951 Thanks @HyukjinKwon @LuciferYang @AmplabJenkins~

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38778: [SPARK-41227][CONNECT][PYTHON] Implement `DataFrame.crossJoin`

2022-11-23 Thread GitBox
zhengruifeng commented on code in PR #38778: URL: https://github.com/apache/spark/pull/38778#discussion_r1031016264 ## python/pyspark/sql/tests/connect/test_connect_plan_only.py: ## @@ -58,6 +58,12 @@ def test_join_condition(self): )._plan.to_proto(self.connect)

[GitHub] [spark] HyukjinKwon closed pull request #38743: [SPARK-41215][BUILD][PROTOBUF] Support user configurable protoc executables when building Spark Protobuf.

2022-11-23 Thread GitBox
HyukjinKwon closed pull request #38743: [SPARK-41215][BUILD][PROTOBUF] Support user configurable protoc executables when building Spark Protobuf. URL: https://github.com/apache/spark/pull/38743

[GitHub] [spark] HyukjinKwon commented on pull request #38743: [SPARK-41215][BUILD][PROTOBUF] Support user configurable protoc executables when building Spark Protobuf.

2022-11-23 Thread GitBox
HyukjinKwon commented on PR #38743: URL: https://github.com/apache/spark/pull/38743#issuecomment-1325885697 Merged to master.

[GitHub] [spark] cloud-fan commented on a diff in pull request #38742: [SPARK-41216][CONNECT][PYTHON] Make AnalyzePlan support multiple analysis tasks And implement isLocal/isStreaming/printSchema/sem

2022-11-23 Thread GitBox
cloud-fan commented on code in PR #38742: URL: https://github.com/apache/spark/pull/38742#discussion_r1031014545 ## python/pyspark/sql/connect/dataframe.py: ## @@ -797,6 +796,137 @@ def schema(self) -> StructType: else: return self._schema +@property

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38567: [SPARK-41054][UI][CORE] Support RocksDB as KVStore in live UI

2022-11-23 Thread GitBox
dongjoon-hyun commented on code in PR #38567: URL: https://github.com/apache/spark/pull/38567#discussion_r1031012684 ## core/src/main/scala/org/apache/spark/status/AppStatusStore.scala: ## @@ -769,7 +772,12 @@ private[spark] object AppStatusStore { def createLiveStore(

[GitHub] [spark] zhengruifeng commented on pull request #38757: [SPARK-41222][CONNECT][PYTHON] Unify the typing definitions

2022-11-23 Thread GitBox
zhengruifeng commented on PR #38757: URL: https://github.com/apache/spark/pull/38757#issuecomment-1325877024 merged into master

[GitHub] [spark] zhengruifeng closed pull request #38757: [SPARK-41222][CONNECT][PYTHON] Unify the typing definitions

2022-11-23 Thread GitBox
zhengruifeng closed pull request #38757: [SPARK-41222][CONNECT][PYTHON] Unify the typing definitions URL: https://github.com/apache/spark/pull/38757

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38742: [SPARK-41216][CONNECT][PYTHON] Make AnalyzePlan support multiple analysis tasks And implement isLocal/isStreaming/printSchema/

2022-11-23 Thread GitBox
zhengruifeng commented on code in PR #38742: URL: https://github.com/apache/spark/pull/38742#discussion_r1031003410 ## connector/connect/src/main/protobuf/spark/connect/base.proto: ## @@ -100,18 +70,138 @@ message AnalyzePlanRequest { // logging purposes and will not be inter

[GitHub] [spark] yabola commented on a diff in pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-23 Thread GitBox
yabola commented on code in PR #38560: URL: https://github.com/apache/spark/pull/38560#discussion_r1030998613 ## core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala: ## @@ -327,43 +327,52 @@ class BlockManagerMasterEndpoint( } }.toSeq -

[GitHub] [spark] yabola commented on a diff in pull request #38560: [WIP][SPARK-38005][core] Support cleaning up merged shuffle files and state from external shuffle service

2022-11-23 Thread GitBox
yabola commented on code in PR #38560: URL: https://github.com/apache/spark/pull/38560#discussion_r1023816561 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -654,8 +731,7 @@ public MergeStatuses finalizeShuffleMerge(

[GitHub] [spark] panbingkun commented on pull request #38710: [SPARK-41179][SQL] Assign a name to the error class _LEGACY_ERROR_TEMP_1092

2022-11-23 Thread GitBox
panbingkun commented on PR #38710: URL: https://github.com/apache/spark/pull/38710#issuecomment-1325855794 > @panbingkun Please, resolve conflicts. Done.

[GitHub] [spark] amaliujia commented on a diff in pull request #38742: [SPARK-41216][CONNECT][PYTHON] Make AnalyzePlan support multiple analysis tasks And implement isLocal/isStreaming/printSchema/sem

2022-11-23 Thread GitBox
amaliujia commented on code in PR #38742: URL: https://github.com/apache/spark/pull/38742#discussion_r1030992346 ## connector/connect/src/main/protobuf/spark/connect/base.proto: ## @@ -100,18 +70,138 @@ message AnalyzePlanRequest { // logging purposes and will not be interpre

[GitHub] [spark] zhengruifeng commented on a diff in pull request #38742: [SPARK-41216][CONNECT][PYTHON] Make AnalyzePlan support multiple analysis tasks And implement isLocal/isStreaming/printSchema/

2022-11-23 Thread GitBox
zhengruifeng commented on code in PR #38742: URL: https://github.com/apache/spark/pull/38742#discussion_r1030992308 ## connector/connect/src/main/protobuf/spark/connect/base.proto: ## @@ -100,18 +70,138 @@ message AnalyzePlanRequest { // logging purposes and will not be inter

[GitHub] [spark] HeartSaVioR commented on pull request #38777: [SPARK-41151][FOLLOW-UP][SQL] Keep built-in file _metadata fields nullable value consistent

2022-11-23 Thread GitBox
HeartSaVioR commented on PR #38777: URL: https://github.com/apache/spark/pull/38777#issuecomment-1325844004 Ah OK, let's wait for feedback from @ala and ensure we make clear before merging it.

[GitHub] [spark] Yaohua628 commented on pull request #38777: [SPARK-41151][FOLLOW-UP][SQL] Keep built-in file _metadata fields nullable value consistent

2022-11-23 Thread GitBox
Yaohua628 commented on PR #38777: URL: https://github.com/apache/spark/pull/38777#issuecomment-1325843243 Thank you, Jungtaek! Also wanna confirm with @ala on nullability of `row_index`

[GitHub] [spark] amaliujia commented on pull request #38770: [SPARK-41238][CONNECT][PYTHON] Support more datatypes

2022-11-23 Thread GitBox
amaliujia commented on PR #38770: URL: https://github.com/apache/spark/pull/38770#issuecomment-1325840723 Thanks. Can you test both `nullable=true` and `nullable=false` case?

[GitHub] [spark] HyukjinKwon closed pull request #38767: [SPARK-41183][SQL][FOLLOWUP] Change the name from injectPlanNormalizationRules to injectPlanNormalizationRule

2022-11-23 Thread GitBox
HyukjinKwon closed pull request #38767: [SPARK-41183][SQL][FOLLOWUP] Change the name from injectPlanNormalizationRules to injectPlanNormalizationRule URL: https://github.com/apache/spark/pull/38767

[GitHub] [spark] HyukjinKwon commented on pull request #38767: [SPARK-41183][SQL][FOLLOWUP] Change the name from injectPlanNormalizationRules to injectPlanNormalizationRule

2022-11-23 Thread GitBox
HyukjinKwon commented on PR #38767: URL: https://github.com/apache/spark/pull/38767#issuecomment-1325839678 Merged to master.
