[GitHub] [spark] rmcyang commented on a diff in pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2022-10-17 Thread GitBox
rmcyang commented on code in PR #37638: URL: https://github.com/apache/spark/pull/37638#discussion_r997761687 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -1904,4 +1941,42 @@ long getPos() { return pos;

[GitHub] [spark] cloud-fan closed pull request #37887: [SPARK-40360] ALREADY_EXISTS and NOT_FOUND exceptions

2022-10-17 Thread GitBox
cloud-fan closed pull request #37887: [SPARK-40360] ALREADY_EXISTS and NOT_FOUND exceptions URL: https://github.com/apache/spark/pull/37887 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

[GitHub] [spark] cloud-fan commented on pull request #37887: [SPARK-40360] ALREADY_EXISTS and NOT_FOUND exceptions

2022-10-17 Thread GitBox
cloud-fan commented on PR #37887: URL: https://github.com/apache/spark/pull/37887#issuecomment-1281856578 thanks, merging to master!

[GitHub] [spark] rmcyang commented on a diff in pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2022-10-17 Thread GitBox
rmcyang commented on code in PR #37638: URL: https://github.com/apache/spark/pull/37638#discussion_r997733472 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -1124,9 +1143,23 @@ private boolean isDuplicateBlock() {

[GitHub] [spark] rmcyang commented on a diff in pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2022-10-17 Thread GitBox
rmcyang commented on code in PR #37638: URL: https://github.com/apache/spark/pull/37638#discussion_r997751290 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -593,6 +607,9 @@ public void onData(String streamId, ByteBuf

[GitHub] [spark] rangadi commented on a diff in pull request #38286: [SPARK-40657] Add support for Java classes in Protobuf functions

2022-10-17 Thread GitBox
rangadi commented on code in PR #38286: URL: https://github.com/apache/spark/pull/38286#discussion_r997748022 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala: ## @@ -66,6 +66,10 @@ object SchemaConverters { Some(DayTimeInt

[GitHub] [spark] LuciferYang commented on pull request #38294: [SPARK-40369][SQL] Migrate the type check failures of calls via reflection onto error classes

2022-10-17 Thread GitBox
LuciferYang commented on PR #38294: URL: https://github.com/apache/spark/pull/38294#issuecomment-1281844501 Test first

[GitHub] [spark] LuciferYang opened a new pull request, #38294: [SPARK-40369][SQL] Migrate the type check failures of calls via reflection onto error classes

2022-10-17 Thread GitBox
LuciferYang opened a new pull request, #38294: URL: https://github.com/apache/spark/pull/38294 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] rmcyang commented on a diff in pull request #37638: [SPARK-33573][SHUFFLE][YARN] Shuffle server side metrics for Push-based shuffle

2022-10-17 Thread GitBox
rmcyang commented on code in PR #37638: URL: https://github.com/apache/spark/pull/37638#discussion_r997736011 ## common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java: ## @@ -1124,9 +1143,23 @@ private boolean isDuplicateBlock() {

[GitHub] [spark] gengliangwang commented on pull request #38257: [SPARK-40798][SQL] Alter partition should verify value follow storeAssignmentPolicy

2022-10-17 Thread GitBox
gengliangwang commented on PR #38257: URL: https://github.com/apache/spark/pull/38257#issuecomment-1281834266 @ulysses-you Thanks for the ping. Could you add a legacy flag and migration guide?

[GitHub] [spark] amaliujia commented on a diff in pull request #38293: [SPARK-40828][CONNECT][PYTHON][TESTING] Drop Python test tables before and after unit tests

2022-10-17 Thread GitBox
amaliujia commented on code in PR #38293: URL: https://github.com/apache/spark/pull/38293#discussion_r997724002 ## python/pyspark/testing/connectutils.py: ## @@ -74,7 +73,7 @@ def _udf_mock(cls, *args, **kwargs) -> str: @classmethod def setUpClass(cls: Any) -> None:

[GitHub] [spark] amaliujia commented on pull request #38293: [SPARK-40828][CONNECT][PYTHON][TESTING] Drop Python test tables before and after unit tests

2022-10-17 Thread GitBox
amaliujia commented on PR #38293: URL: https://github.com/apache/spark/pull/38293#issuecomment-1281822734 R: @HyukjinKwon You mentioned we should drop test tables.

[GitHub] [spark] amaliujia opened a new pull request, #38293: [SPARK-40828][CONNECT][PYTHON][TESTING] Drop Python test tables before and after unit tests

2022-10-17 Thread GitBox
amaliujia opened a new pull request, #38293: URL: https://github.com/apache/spark/pull/38293 ### What changes were proposed in this pull request? 1. Instead of random names, use fixed names for test tables. 2. Try to drop test tables before and after unit tests. ### W
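The two steps listed in this PR (fixed table names, and dropping the tables both before and after the tests) map naturally onto `unittest` class fixtures. This is an illustrative sketch under stated assumptions, not the code in `connectutils.py`: an in-memory `sqlite3` database stands in for the Spark session, and `TEST_TABLES`, `drop_test_tables`, and `ExampleConnectTest` are hypothetical names.

```python
import sqlite3
import unittest

# Hypothetical fixed table names. Fixed names (instead of random ones) let the
# fixtures find and drop tables that an earlier run may have left behind.
TEST_TABLES = ("test_table_1", "test_table_2")


def drop_test_tables(conn: sqlite3.Connection) -> None:
    """Drop every test table if it exists, so calling this twice is harmless."""
    for name in TEST_TABLES:
        conn.execute(f"DROP TABLE IF EXISTS {name}")
    conn.commit()


class ExampleConnectTest(unittest.TestCase):
    conn: sqlite3.Connection

    @classmethod
    def setUpClass(cls) -> None:
        # sqlite3 stands in for the Spark session used by the real tests.
        cls.conn = sqlite3.connect(":memory:")
        # Drop before: with a persistent catalog this removes leftovers from a
        # crashed run; with ":memory:" it is simply a no-op.
        drop_test_tables(cls.conn)
        for name in TEST_TABLES:
            cls.conn.execute(f"CREATE TABLE {name} (id INTEGER)")

    @classmethod
    def tearDownClass(cls) -> None:
        # Drop after: leave no test tables behind for the next suite.
        drop_test_tables(cls.conn)
        cls.conn.close()

    def test_tables_exist(self) -> None:
        rows = self.conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
        self.assertEqual(sorted(r[0] for r in rows), sorted(TEST_TABLES))
```

Using `DROP TABLE IF EXISTS` keeps the cleanup idempotent, so the same helper is safe to call from both fixtures.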

[GitHub] [spark] cloud-fan closed pull request #38227: [SPARK-40774][CONNECT] Add Sample to proto and Connect DSL

2022-10-17 Thread GitBox
cloud-fan closed pull request #38227: [SPARK-40774][CONNECT] Add Sample to proto and Connect DSL URL: https://github.com/apache/spark/pull/38227

[GitHub] [spark] itholic commented on pull request #38292: [SPARK-40779][PS][TEST] Fix `corrwith` to work properly with different anchor.

2022-10-17 Thread GitBox
itholic commented on PR #38292: URL: https://github.com/apache/spark/pull/38292#issuecomment-1281816024 FYI: created JIRA for re-enabling the skipped test in the future: https://issues.apache.org/jira/browse/SPARK-40827

[GitHub] [spark] cloud-fan commented on pull request #38227: [SPARK-40774][CONNECT] Add Sample to proto and Connect DSL

2022-10-17 Thread GitBox
cloud-fan commented on PR #38227: URL: https://github.com/apache/spark/pull/38227#issuecomment-1281815840 thanks, merging to master!

[GitHub] [spark] itholic opened a new pull request, #38292: [SPARK-40779][PS][TEST] Fix `corrwith` to work properly with different anchor.

2022-10-17 Thread GitBox
itholic opened a new pull request, #38292: URL: https://github.com/apache/spark/pull/38292 ### What changes were proposed in this pull request? This PR proposes to disable some tests for DataFrame.corrwith since there is regression in pandas 1.5.0. We should re-enab
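Disabling a test only on the affected pandas versions can be expressed with `unittest.skipIf`. This is a hedged sketch, not the code in the PR: the version bound (every pandas release from 1.5.0 on) is an assumption until an upstream fix is known, and `version_tuple`, `PANDAS_CORRWITH_REGRESSION`, and `CorrwithTest` are hypothetical names. The skip reason points at SPARK-40827, the JIRA filed for re-enabling the test.

```python
import unittest

try:
    import pandas as pd

    PANDAS_VERSION = pd.__version__
except ImportError:  # lets the sketch run without pandas installed
    PANDAS_VERSION = "0.0.0"


def version_tuple(version: str) -> tuple:
    """Parse up to three leading numeric parts, e.g. '1.5.0rc1' -> (1, 5, 0)."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


# Assumption: all releases from 1.5.0 on are affected; tighten this bound
# once the pandas version containing the fix is known.
PANDAS_CORRWITH_REGRESSION = version_tuple(PANDAS_VERSION) >= (1, 5, 0)


class CorrwithTest(unittest.TestCase):
    @unittest.skipIf(
        PANDAS_CORRWITH_REGRESSION,
        "Skipped due to a DataFrame.corrwith regression in pandas 1.5.0; "
        "see SPARK-40827 for re-enabling.",
    )
    def test_corrwith(self) -> None:
        # Placeholder for the real pandas-on-Spark vs. pandas comparison.
        pass
```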

[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38286: [SPARK-40657] Add support for Java classes in Protobuf functions

2022-10-17 Thread GitBox
SandishKumarHN commented on code in PR #38286: URL: https://github.com/apache/spark/pull/38286#discussion_r997714243 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala: ## @@ -66,6 +66,10 @@ object SchemaConverters { Some(Day

[GitHub] [spark] amaliujia closed pull request #38289: [DO NOT MERGE] Test Python Lint

2022-10-17 Thread GitBox
amaliujia closed pull request #38289: [DO NOT MERGE] Test Python Lint URL: https://github.com/apache/spark/pull/38289

[GitHub] [spark] viirya commented on pull request #38291: [SPARK-40826][SS] Add additional checkpoint rename file check

2022-10-17 Thread GitBox
viirya commented on PR #38291: URL: https://github.com/apache/spark/pull/38291#issuecomment-1281760137 cc @dongjoon-hyun

[GitHub] [spark] viirya opened a new pull request, #38291: [SPARK-40826][SS] Add additional checkpoint rename file check

2022-10-17 Thread GitBox
viirya opened a new pull request, #38291: URL: https://github.com/apache/spark/pull/38291 ### What changes were proposed in this pull request? This adds additional checkpoint rename file check. ### Why are the changes needed? We encountered an issue recent

[GitHub] [spark] HyukjinKwon closed pull request #38278: [SPARK-40809][SPARK-40780][FOLLOW-UP] Improve filter and alias testing coverage in python client

2022-10-17 Thread GitBox
HyukjinKwon closed pull request #38278: [SPARK-40809][SPARK-40780][FOLLOW-UP] Improve filter and alias testing coverage in python client URL: https://github.com/apache/spark/pull/38278

[GitHub] [spark] HyukjinKwon commented on pull request #38278: [SPARK-40809][SPARK-40780][FOLLOW-UP] Improve filter and alias testing coverage in python client

2022-10-17 Thread GitBox
HyukjinKwon commented on PR #38278: URL: https://github.com/apache/spark/pull/38278#issuecomment-1281753706 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #38287: [SPARK-40796][BUILD][FOLLOW-UP] Fix Mypy check on unused "type: ignore"

2022-10-17 Thread GitBox
HyukjinKwon closed pull request #38287: [SPARK-40796][BUILD][FOLLOW-UP] Fix Mypy check on unused "type: ignore" URL: https://github.com/apache/spark/pull/38287

[GitHub] [spark] HyukjinKwon commented on pull request #38287: [SPARK-40796][BUILD][FOLLOW-UP] Fix Mypy check on unused "type: ignore"

2022-10-17 Thread GitBox
HyukjinKwon commented on PR #38287: URL: https://github.com/apache/spark/pull/38287#issuecomment-1281752931 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #38290: [SPARK-40799][BUILD][CONNECT][FOLLOW-UP] Enforce scalafmt for Spark Connect module

2022-10-17 Thread GitBox
HyukjinKwon closed pull request #38290: [SPARK-40799][BUILD][CONNECT][FOLLOW-UP] Enforce scalafmt for Spark Connect module URL: https://github.com/apache/spark/pull/38290

[GitHub] [spark] HyukjinKwon commented on pull request #38290: [SPARK-40799][BUILD][CONNECT][FOLLOW-UP] Enforce scalafmt for Spark Connect module

2022-10-17 Thread GitBox
HyukjinKwon commented on PR #38290: URL: https://github.com/apache/spark/pull/38290#issuecomment-1281743944 Merged to master.

[GitHub] [spark] amaliujia opened a new pull request, #38290: [SPARK-40799][BUILD][CONNECT][FOLLOW-UP] Enforce scalafmt for Spark Connect module

2022-10-17 Thread GitBox
amaliujia opened a new pull request, #38290: URL: https://github.com/apache/spark/pull/38290 ### What changes were proposed in this pull request? This PR run `scalafmt` in master branch to format scala files in Connect module which will fix Scala lint. ### Why are the c

[GitHub] [spark] cloud-fan closed pull request #38200: [SPARK-40743][CONNECT] StructType should contain a list of StructField and each field should have a name

2022-10-17 Thread GitBox
cloud-fan closed pull request #38200: [SPARK-40743][CONNECT] StructType should contain a list of StructField and each field should have a name URL: https://github.com/apache/spark/pull/38200

[GitHub] [spark] cloud-fan commented on pull request #38200: [SPARK-40743][CONNECT] StructType should contain a list of StructField and each field should have a name

2022-10-17 Thread GitBox
cloud-fan commented on PR #38200: URL: https://github.com/apache/spark/pull/38200#issuecomment-1281720501 thanks, merging to master!

[GitHub] [spark] cloud-fan commented on a diff in pull request #38200: [SPARK-40743][CONNECT] StructType should contain a list of StructField and each field should have a name

2022-10-17 Thread GitBox
cloud-fan commented on code in PR #38200: URL: https://github.com/apache/spark/pull/38200#discussion_r997649989 ## connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectProtoSuite.scala: ## @@ -114,7 +115,26 @@ class SparkConnectProtoSuite extends Pla

[GitHub] [spark] HyukjinKwon commented on pull request #38212: [SPARK-40655][PYTHON][PROTOBUF] PySpark support for from_protobuf and to_protobuf

2022-10-17 Thread GitBox
HyukjinKwon commented on PR #38212: URL: https://github.com/apache/spark/pull/38212#issuecomment-1281688677 Will merge once the tests pass.

[GitHub] [spark] ulysses-you commented on pull request #38257: [SPARK-40798][SQL] Alter partition should verify value follow storeAssignmentPolicy

2022-10-17 Thread GitBox
ulysses-you commented on PR #38257: URL: https://github.com/apache/spark/pull/38257#issuecomment-1281688522 also cc @MaxGekk @cloud-fan

[GitHub] [spark] jackylee-ch commented on pull request #35669: [SPARK-38041][SQL] DataFilter pushed down dynamically

2022-10-17 Thread GitBox
jackylee-ch commented on PR #35669: URL: https://github.com/apache/spark/pull/35669#issuecomment-1281688191 @huaxingao Maybe remove the `Stale` tag? Also cc @cloud-fan @sunchao

[GitHub] [spark] amaliujia opened a new pull request, #38289: [DO NOT MERGE] Test Python Lint

2022-10-17 Thread GitBox
amaliujia opened a new pull request, #38289: URL: https://github.com/apache/spark/pull/38289 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How w

[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38212: [SPARK-40655][PYTHON][PROTOBUF] PySpark support for from_protobuf and to_protobuf

2022-10-17 Thread GitBox
SandishKumarHN commented on code in PR #38212: URL: https://github.com/apache/spark/pull/38212#discussion_r997623711 ## python/docs/source/reference/pyspark.sql/protobuf.rst: ## @@ -0,0 +1,28 @@ +.. Licensed to the Apache Software Foundation (ASF) under one Review Comment:

[GitHub] [spark] HyukjinKwon commented on pull request #38212: [SPARK-40655][PYTHON][PROTOBUF] PySpark support for from_protobuf and to_protobuf

2022-10-17 Thread GitBox
HyukjinKwon commented on PR #38212: URL: https://github.com/apache/spark/pull/38212#issuecomment-1281669757 LGTM. Thanks for working on this, this is a nice feature.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38212: [SPARK-40655][PYTHON][PROTOBUF] PySpark support for from_protobuf and to_protobuf

2022-10-17 Thread GitBox
HyukjinKwon commented on code in PR #38212: URL: https://github.com/apache/spark/pull/38212#discussion_r997618683 ## connector/protobuf/src/test/resources/protobuf/pyspark_test.proto: ## @@ -0,0 +1,32 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38212: [SPARK-40655][PYTHON][PROTOBUF] PySpark support for from_protobuf and to_protobuf

2022-10-17 Thread GitBox
HyukjinKwon commented on code in PR #38212: URL: https://github.com/apache/spark/pull/38212#discussion_r997618344 ## python/pyspark/sql/protobuf/functions.py: ## @@ -0,0 +1,215 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agr

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38212: [SPARK-40655][PYTHON][PROTOBUF] PySpark support for from_protobuf and to_protobuf

2022-10-17 Thread GitBox
HyukjinKwon commented on code in PR #38212: URL: https://github.com/apache/spark/pull/38212#discussion_r997618270 ## python/pyspark/sql/protobuf/functions.py: ## @@ -0,0 +1,215 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agr

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38212: [SPARK-40655][PYTHON][PROTOBUF] PySpark support for from_protobuf and to_protobuf

2022-10-17 Thread GitBox
HyukjinKwon commented on code in PR #38212: URL: https://github.com/apache/spark/pull/38212#discussion_r997618165 ## python/pyspark/sql/protobuf/functions.py: ## @@ -0,0 +1,215 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agr

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38212: [SPARK-40655][PYTHON][PROTOBUF] PySpark support for from_protobuf and to_protobuf

2022-10-17 Thread GitBox
HyukjinKwon commented on code in PR #38212: URL: https://github.com/apache/spark/pull/38212#discussion_r997617692 ## python/pyspark/sql/protobuf/functions.py: ## @@ -0,0 +1,215 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agr

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38212: [SPARK-40655][PYTHON][PROTOBUF] PySpark support for from_protobuf and to_protobuf

2022-10-17 Thread GitBox
HyukjinKwon commented on code in PR #38212: URL: https://github.com/apache/spark/pull/38212#discussion_r997617587 ## python/pyspark/sql/protobuf/functions.py: ## @@ -0,0 +1,215 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agr

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38212: [SPARK-40655][PYTHON][PROTOBUF] PySpark support for from_protobuf and to_protobuf

2022-10-17 Thread GitBox
HyukjinKwon commented on code in PR #38212: URL: https://github.com/apache/spark/pull/38212#discussion_r997617458 ## python/pyspark/sql/protobuf/functions.py: ## @@ -0,0 +1,215 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agr

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38212: [SPARK-40655][PYTHON][PROTOBUF] PySpark support for from_protobuf and to_protobuf

2022-10-17 Thread GitBox
HyukjinKwon commented on code in PR #38212: URL: https://github.com/apache/spark/pull/38212#discussion_r997617233 ## python/docs/source/reference/pyspark.sql/protobuf.rst: ## @@ -0,0 +1,28 @@ +.. Licensed to the Apache Software Foundation (ASF) under one Review Comment: Let

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #38279: [SPARK-40816][CONNECT][PYTHON] Rename LogicalPlan.collect to LogicalPlan.to_proto

2022-10-17 Thread GitBox
HyukjinKwon commented on code in PR #38279: URL: https://github.com/apache/spark/pull/38279#discussion_r997615233 ## python/pyspark/sql/connect/plan.py: ## @@ -80,9 +80,19 @@ def _verify(self, session: "RemoteSparkSession") -> bool: return test_plan == plan -def

[GitHub] [spark] amaliujia commented on pull request #38287: [SPARK-40796][BUILD][FOLLOW-UP] Fix Mypy check on unused "type: ignore"

2022-10-17 Thread GitBox
amaliujia commented on PR #38287: URL: https://github.com/apache/spark/pull/38287#issuecomment-1281662879 The python code on Aggregate is stale anyway: they are calling non-existing fields already.

[GitHub] [spark] github-actions[bot] closed pull request #36806: [SPARK-39398][GRAPHX]message checkpointer support storage level

2022-10-17 Thread GitBox
github-actions[bot] closed pull request #36806: [SPARK-39398][GRAPHX]message checkpointer support storage level URL: https://github.com/apache/spark/pull/36806

[GitHub] [spark] github-actions[bot] closed pull request #36768: [SPARK-39380][SQL] Ignore comment syntax in dfs command

2022-10-17 Thread GitBox
github-actions[bot] closed pull request #36768: [SPARK-39380][SQL] Ignore comment syntax in dfs command URL: https://github.com/apache/spark/pull/36768

[GitHub] [spark] github-actions[bot] closed pull request #36505: [SPARK-39131][SQL] Rewrite exists as LeftSemi earlier to allow filters to be inferred

2022-10-17 Thread GitBox
github-actions[bot] closed pull request #36505: [SPARK-39131][SQL] Rewrite exists as LeftSemi earlier to allow filters to be inferred URL: https://github.com/apache/spark/pull/36505

[GitHub] [spark] github-actions[bot] commented on pull request #35669: [SPARK-38041][SQL] DataFilter pushed down dynamically

2022-10-17 Thread GitBox
github-actions[bot] commented on PR #35669: URL: https://github.com/apache/spark/pull/35669#issuecomment-1281661695 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.

[GitHub] [spark] github-actions[bot] closed pull request #37098: [SPARK-39690][SQL] Fixes Reuse exchange across subqueries with AQE if subquery side exchange materialized first

2022-10-17 Thread GitBox
github-actions[bot] closed pull request #37098: [SPARK-39690][SQL] Fixes Reuse exchange across subqueries with AQE if subquery side exchange materialized first URL: https://github.com/apache/spark/pull/37098

[GitHub] [spark] amaliujia commented on pull request #38287: [SPARK-40796][BUILD][FOLLOW-UP] Fix Mypy check on unused "type: ignore"

2022-10-17 Thread GitBox
amaliujia commented on PR #38287: URL: https://github.com/apache/spark/pull/38287#issuecomment-1281661118 @zhengruifeng I don't know how this is triggered. If you check this example PR: https://github.com/apache/spark/pull/38275. After rebasing it started to fail on this issue. That

[GitHub] [spark] HyukjinKwon closed pull request #38258: [SPARK-40799] [BUILD] [CONNECT] Enforce scalafmt for Spark Connect module.

2022-10-17 Thread GitBox
HyukjinKwon closed pull request #38258: [SPARK-40799] [BUILD] [CONNECT] Enforce scalafmt for Spark Connect module. URL: https://github.com/apache/spark/pull/38258

[GitHub] [spark] HyukjinKwon commented on pull request #38258: [SPARK-40799] [BUILD] [CONNECT] Enforce scalafmt for Spark Connect module.

2022-10-17 Thread GitBox
HyukjinKwon commented on PR #38258: URL: https://github.com/apache/spark/pull/38258#issuecomment-1281660155 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #38283: [SPARK-40818][CONNECT] Add Intersect to Connect proto and DSL

2022-10-17 Thread GitBox
HyukjinKwon closed pull request #38283: [SPARK-40818][CONNECT] Add Intersect to Connect proto and DSL URL: https://github.com/apache/spark/pull/38283

[GitHub] [spark] HyukjinKwon commented on pull request #38283: [SPARK-40818][CONNECT] Add Intersect to Connect proto and DSL

2022-10-17 Thread GitBox
HyukjinKwon commented on PR #38283: URL: https://github.com/apache/spark/pull/38283#issuecomment-1281659504 Oops, sorry guys. Closing

[GitHub] [spark] zhengruifeng commented on pull request #38287: [SPARK-40796][BUILD][FOLLOW-UP] Fix Mypy check on unused "type: ignore"

2022-10-17 Thread GitBox
zhengruifeng commented on PR #38287: URL: https://github.com/apache/spark/pull/38287#issuecomment-1281653230 it seems that the CI in master works well? https://github.com/apache/spark/actions/runs/3264890886/jobs/5368154706

[GitHub] [spark] amaliujia commented on pull request #38287: [SPARK-40796][BUILD][FOLLOW-UP] Fix Mypy check on unused "type: ignore"

2022-10-17 Thread GitBox
amaliujia commented on PR #38287: URL: https://github.com/apache/spark/pull/38287#issuecomment-1281651358 Yes. This issue is fixed. Waiting for tests passing.

[GitHub] [spark] amaliujia commented on pull request #38287: [SPARK-40796][BUILD][FOLLOW-UP] Fix unused "type: ignore" comment

2022-10-17 Thread GitBox
amaliujia commented on PR #38287: URL: https://github.com/apache/spark/pull/38287#issuecomment-1281626234 ah I see it is now: The python code needs to be updated because of the proto change.

[GitHub] [spark] amaliujia commented on a diff in pull request #38275: [SPARK-40813][CONNECT] Add limit and offset to Connect DSL

2022-10-17 Thread GitBox
amaliujia commented on code in PR #38275: URL: https://github.com/apache/spark/pull/38275#discussion_r997460595 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -37,10 +37,11 @@ message Relation { Join join = 5; Union union = 6; Sort sor

[GitHub] [spark] rangadi commented on a diff in pull request #38286: [SPARK-40657] Add support for Java classes in Protobuf functions

2022-10-17 Thread GitBox
rangadi commented on code in PR #38286: URL: https://github.com/apache/spark/pull/38286#discussion_r997356317 ## connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala: ## @@ -56,44 +91,45 @@ class ProtobufFunctionsSuite extends QueryTest wi

[GitHub] [spark] amaliujia commented on pull request #38287: [SPARK-40796][BUILD][FOLLOW-UP] Fix unused "type: ignore" comment

2022-10-17 Thread GitBox
amaliujia commented on PR #38287: URL: https://github.com/apache/spark/pull/38287#issuecomment-1281578578 https://user-images.githubusercontent.com/1938382/196296144-5b699f77-0c69-4efe-af43-a507c3ca123f.png Interesting. The ignore comment was actually useful. I am confused now on h

[GitHub] [spark] mposdev21 commented on a diff in pull request #38286: [SPARK-40657] Add support for Java classes in Protobuf functions

2022-10-17 Thread GitBox
mposdev21 commented on code in PR #38286: URL: https://github.com/apache/spark/pull/38286#discussion_r997544880 ## connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala: ## @@ -99,26 +115,32 @@ class ProtobufCatalystDataConvers

[GitHub] [spark] amaliujia commented on pull request #38264: [SPARK-40823][CONNECT] Connect Proto should carry unparsed identifiers

2022-10-17 Thread GitBox
amaliujia commented on PR #38264: URL: https://github.com/apache/spark/pull/38264#issuecomment-1281542461 R: @cloud-fan @HyukjinKwon

[GitHub] [spark] rangadi commented on a diff in pull request #38286: [SPARK-40657] Add support for Java classes in Protobuf functions

2022-10-17 Thread GitBox
rangadi commented on code in PR #38286: URL: https://github.com/apache/spark/pull/38286#discussion_r997540684 ## connector/protobuf/pom.xml: ## @@ -110,6 +109,47 @@ + +com.github.os72 +protoc-jar-maven-plugin Review Comment:

[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38286: [SPARK-40657] Add support for Java classes in Protobuf functions

2022-10-17 Thread GitBox
SandishKumarHN commented on code in PR #38286: URL: https://github.com/apache/spark/pull/38286#discussion_r997533289 ## connector/protobuf/pom.xml: ## @@ -110,6 +109,47 @@ + +com.github.os72 +protoc-jar-maven-plugin Review Com

[GitHub] [spark] alex-balikov opened a new pull request, #38288: [SPARK-40821][SQL][CORE][PYTHON][SS] Introduce window_time function to extract event time from the window column

2022-10-17 Thread GitBox
alex-balikov opened a new pull request, #38288: URL: https://github.com/apache/spark/pull/38288 ### What changes were proposed in this pull request? This PR introduces a window_time function to extract streaming event time from a window column produced by the window aggreg
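The Spark documentation for `window_time` describes the window column as `STRUCT<start: TIMESTAMP, end: TIMESTAMP>` (start inclusive, end exclusive) and defines the event time as `window.end` minus one microsecond, microseconds being the smallest supported event-time precision. The pure-Python sketch below mimics that for tumbling windows without a Spark session. `tumbling_window_bounds` and `event_time_of_window` are hypothetical stand-ins, not PySpark APIs, and windows are assumed to be aligned to the Unix epoch with no start offset.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tumbling_window_bounds(
    ts: datetime, duration: timedelta
) -> Tuple[datetime, datetime]:
    """Return (start, end) of the epoch-aligned tumbling window containing ts.

    start is inclusive and end is exclusive, like the struct produced by
    Spark's window() aggregation. ts must be timezone-aware.
    """
    offset = (ts - EPOCH) % duration
    start = ts - offset
    return start, start + duration


def event_time_of_window(window: Tuple[datetime, datetime]) -> datetime:
    """Mimic window_time(): the last microsecond that still lies in the window."""
    _, end = window
    return end - timedelta(microseconds=1)


if __name__ == "__main__":
    ts = datetime(2022, 10, 17, 12, 7, 30, tzinfo=timezone.utc)
    window = tumbling_window_bounds(ts, timedelta(minutes=5))
    print(window[0].isoformat())  # 2022-10-17T12:05:00+00:00
    print(event_time_of_window(window).isoformat())  # 2022-10-17T12:09:59.999999+00:00
```

Subtracting one microsecond instead of returning `end` keeps the derived event time inside its own window, so a second window aggregation of the same size over that value would, presumably, assign the row to the same window rather than the next one.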

[GitHub] [spark] rangadi commented on a diff in pull request #38286: [SPARK-40657] Add support for Java classes in Protobuf functions

2022-10-17 Thread GitBox
rangadi commented on code in PR #38286: URL: https://github.com/apache/spark/pull/38286#discussion_r997526883 ## connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala: ## @@ -99,26 +115,32 @@ class ProtobufCatalystDataConversio

[GitHub] [spark] rangadi commented on a diff in pull request #38286: [SPARK-40657] Add support for Java classes in Protobuf functions

2022-10-17 Thread GitBox
rangadi commented on code in PR #38286: URL: https://github.com/apache/spark/pull/38286#discussion_r997524177 ## connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala: ## @@ -99,26 +115,32 @@ class ProtobufCatalystDataConversio

[GitHub] [spark] amaliujia commented on a diff in pull request #38287: [SPARK-40796][BUILD][FOLLOW-UP] Fix unused "type: ignore" comment

2022-10-17 Thread GitBox
amaliujia commented on code in PR #38287: URL: https://github.com/apache/spark/pull/38287#discussion_r997523189 ## python/pyspark/sql/connect/plan.py: ## @@ -322,15 +322,11 @@ def _convert_measure( ) -> proto.Aggregate.AggregateFunction: exp, fun = m measu

[GitHub] [spark] amaliujia commented on a diff in pull request #38276: [SPARK-40812][CONNECT] Add Deduplicate to Connect proto and DSL

2022-10-17 Thread GitBox
amaliujia commented on code in PR #38276: URL: https://github.com/apache/spark/pull/38276#discussion_r997522129 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -71,6 +73,42 @@ class SparkConnectPlanner(plan: proto.Relatio

[GitHub] [spark] sadikovi commented on pull request #38090: [SPARK-40646][SQL] Fix returning partial results in JSON data source and JSON functions

2022-10-17 Thread GitBox
sadikovi commented on PR #38090: URL: https://github.com/apache/spark/pull/38090#issuecomment-1281509174 Thanks for merging!

[GitHub] [spark] mposdev21 commented on a diff in pull request #38286: [SPARK-40657] Add support for Java classes in Protobuf functions

2022-10-17 Thread GitBox
mposdev21 commented on code in PR #38286: URL: https://github.com/apache/spark/pull/38286#discussion_r997366952 ## connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala: ## @@ -99,26 +115,32 @@ class ProtobufCatalystDataConvers

[GitHub] [spark] amaliujia commented on a diff in pull request #38276: [SPARK-40812][CONNECT] Add Deduplicate to Connect proto and DSL

2022-10-17 Thread GitBox
amaliujia commented on code in PR #38276: URL: https://github.com/apache/spark/pull/38276#discussion_r997483148 ## connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectPlannerSuite.scala: ## @@ -31,8 +31,11 @@ import org.apache.spark.sql.catalyst.pl

[GitHub] [spark] rangadi commented on a diff in pull request #38212: [SPARK-40655][PYTHON][PROTOBUF] PySpark support for from_protobuf and to_protobuf

2022-10-17 Thread GitBox
rangadi commented on code in PR #38212: URL: https://github.com/apache/spark/pull/38212#discussion_r997488830 ## python/pyspark/sql/protobuf/functions.py: ## @@ -0,0 +1,215 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreeme

[GitHub] [spark] rangadi commented on a diff in pull request #38212: [SPARK-40655][PYTHON][PROTOBUF] PySpark support for from_protobuf and to_protobuf

2022-10-17 Thread GitBox
rangadi commented on code in PR #38212: URL: https://github.com/apache/spark/pull/38212#discussion_r997488162 ## python/pyspark/sql/protobuf/functions.py: ## @@ -0,0 +1,215 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreeme

[GitHub] [spark] amaliujia commented on a diff in pull request #38276: [SPARK-40812][CONNECT] Add Deduplicate to Connect proto and DSL

2022-10-17 Thread GitBox
amaliujia commented on code in PR #38276: URL: https://github.com/apache/spark/pull/38276#discussion_r997479516 ## connector/connect/src/test/scala/org/apache/spark/sql/connect/planner/SparkConnectDeduplicateSuite.scala: ## @@ -0,0 +1,68 @@ +/* + * Licensed to the Apache Softwar

[GitHub] [spark] rangadi commented on a diff in pull request #38286: [SPARK-40657] Add support for Java classes in Protobuf functions

2022-10-17 Thread GitBox
rangadi commented on code in PR #38286: URL: https://github.com/apache/spark/pull/38286#discussion_r997477047 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDataToCatalyst.scala: ## @@ -108,18 +112,16 @@ private[protobuf] case class ProtobufDataToCata

[GitHub] [spark] amaliujia commented on a diff in pull request #38276: [SPARK-40812][CONNECT] Add Deduplicate to Connect proto and DSL

2022-10-17 Thread GitBox
amaliujia commented on code in PR #38276: URL: https://github.com/apache/spark/pull/38276#discussion_r997476071 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -41,6 +41,7 @@ message Relation { Aggregate aggregate = 9; SQL sql = 10; Loc

[GitHub] [spark] amaliujia commented on a diff in pull request #38276: [SPARK-40812][CONNECT] Add Deduplicate to Connect proto and DSL

2022-10-17 Thread GitBox
amaliujia commented on code in PR #38276: URL: https://github.com/apache/spark/pull/38276#discussion_r997473877 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -163,6 +164,14 @@ message Sort { } } +// Relation of type [[Deduplicate]] which have

[GitHub] [spark] amaliujia commented on a diff in pull request #38275: [SPARK-40813][CONNECT] Add limit and offset to Connect DSL

2022-10-17 Thread GitBox
amaliujia commented on code in PR #38275: URL: https://github.com/apache/spark/pull/38275#discussion_r997460595 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -37,10 +37,11 @@ message Relation { Join join = 5; Union union = 6; Sort sor

[GitHub] [spark] amaliujia commented on pull request #38287: [SPARK-40796][BUILD][FOLLOW-UP] Fix unused "type: ignore" comment

2022-10-17 Thread GitBox
amaliujia commented on PR #38287: URL: https://github.com/apache/spark/pull/38287#issuecomment-1281433117 @HyukjinKwon @zhengruifeng

[GitHub] [spark] amaliujia commented on pull request #38275: [SPARK-40813][CONNECT] Add limit and offset to Connect DSL

2022-10-17 Thread GitBox
amaliujia commented on PR #38275: URL: https://github.com/apache/spark/pull/38275#issuecomment-1281415549 > Thanks for doing this. Maybe add some python tests as well? Or do you want to do this in a follow up? I will follow up for python.

[GitHub] [spark] amaliujia opened a new pull request, #38287: [DO NOT MERGE] Test Python Lint

2022-10-17 Thread GitBox
amaliujia opened a new pull request, #38287: URL: https://github.com/apache/spark/pull/38287 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How w

[GitHub] [spark] AmplabJenkins commented on pull request #38285: [SPARK-40820][BUG] Creating StructType from Json

2022-10-17 Thread GitBox
AmplabJenkins commented on PR #38285: URL: https://github.com/apache/spark/pull/38285#issuecomment-1281389273 Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins commented on pull request #38286: [SPARK-40657] Add support for Java classes in Protobuf functions

2022-10-17 Thread GitBox
AmplabJenkins commented on PR #38286: URL: https://github.com/apache/spark/pull/38286#issuecomment-1281389211 Can one of the admins verify this patch?

[GitHub] [spark] grundprinzip commented on a diff in pull request #38270: [SPARK-40538] [CONNECT] Improve built-in function support for Python client.

2022-10-17 Thread GitBox
grundprinzip commented on code in PR #38270: URL: https://github.com/apache/spark/pull/38270#discussion_r997430729 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala: ## @@ -64,6 +64,36 @@ package object dsl { .build() } +/**

[GitHub] [spark] rangadi commented on a diff in pull request #38286: [SPARK-40657] Add support for Java classes in Protobuf functions

2022-10-17 Thread GitBox
rangadi commented on code in PR #38286: URL: https://github.com/apache/spark/pull/38286#discussion_r997414434 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDataToCatalyst.scala: ## @@ -108,18 +112,16 @@ private[protobuf] case class ProtobufDataToCata

[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38286: [SPARK-40657] Add support for Java classes in Protobuf functions

2022-10-17 Thread GitBox
SandishKumarHN commented on code in PR #38286: URL: https://github.com/apache/spark/pull/38286#discussion_r997402081 ## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDataToCatalyst.scala: ## @@ -108,18 +112,16 @@ private[protobuf] case class ProtobufDat

[GitHub] [spark] grundprinzip commented on a diff in pull request #38270: [SPARK-40538] [CONNECT] Improve built-in function support for Python client.

2022-10-17 Thread GitBox
grundprinzip commented on code in PR #38270: URL: https://github.com/apache/spark/pull/38270#discussion_r997387895 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/dsl/package.scala: ## @@ -64,6 +64,36 @@ package object dsl { .build() } +/**

[GitHub] [spark] grundprinzip commented on a diff in pull request #38275: [SPARK-40813][CONNECT] Add limit and offset to Connect DSL

2022-10-17 Thread GitBox
grundprinzip commented on code in PR #38275: URL: https://github.com/apache/spark/pull/38275#discussion_r997384334 ## connector/connect/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -173,10 +174,16 @@ class SparkConnectPlanner(plan: proto.R

[GitHub] [spark] grundprinzip commented on a diff in pull request #38276: [SPARK-40812][CONNECT] Add Deduplicate to Connect proto and DSL

2022-10-17 Thread GitBox
grundprinzip commented on code in PR #38276: URL: https://github.com/apache/spark/pull/38276#discussion_r997381115 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -163,6 +164,14 @@ message Sort { } } +// Relation of type [[Deduplicate]] which ha

[GitHub] [spark] grundprinzip commented on a diff in pull request #38283: [SPARK-40818][CONNECT] Add Intersect to Connect proto and DSL

2022-10-17 Thread GitBox
grundprinzip commented on code in PR #38283: URL: https://github.com/apache/spark/pull/38283#discussion_r997376731 ## connector/connect/src/main/protobuf/spark/connect/relations.proto: ## @@ -41,6 +41,7 @@ message Relation { Aggregate aggregate = 9; SQL sql = 10;

[GitHub] [spark] SandishKumarHN commented on pull request #38212: [SPARK-40655][PYTHON][PROTOBUF] PySpark support for from_protobuf and to_protobuf

2022-10-17 Thread GitBox
SandishKumarHN commented on PR #38212: URL: https://github.com/apache/spark/pull/38212#issuecomment-1281292352 @HyukjinKwon @gengliangwang can we merge the PR?
