[GitHub] [spark] MaxGekk commented on pull request #40264: [SPARK-42635][SQL][3.3] Fix the TimestampAdd expression

2023-03-03 Thread via GitHub
MaxGekk commented on PR #40264: URL: https://github.com/apache/spark/pull/40264#issuecomment-1454657486 Seems like the test failure is related to the changes: ``` [info] - SPARK-42635: timestampadd unit conversion overflow *** FAILED *** (12 milliseconds) [info] (non-codegen mode)

[GitHub] [spark] zhengruifeng commented on pull request #40276: [SPARK-42630][CONNECT][PYTHON] Implement data type string parser

2023-03-03 Thread via GitHub
zhengruifeng commented on PR #40276: URL: https://github.com/apache/spark/pull/40276#issuecomment-1454583496 I don't know the internals of the Parser well, but I guess if we want to reach 100% compatibility, we may need to reuse the `.g4` files and implement a subset of `AstBuilder` to support

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40277: [SPARK-42555][CONNECT][FOLLOWUP] Add the new proto msg to support the remaining jdbc API

2023-03-03 Thread via GitHub
zhengruifeng commented on code in PR #40277: URL: https://github.com/apache/spark/pull/40277#discussion_r1125397504 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameReader.scala: ## @@ -250,6 +250,47 @@ class DataFrameReader private[sql] (sparkSession

[GitHub] [spark] beliefer opened a new pull request, #40277: [SPARK-42555][CONNECT][FOLLOWUP] Add the new proto msg to support the remaining jdbc API

2023-03-03 Thread via GitHub
beliefer opened a new pull request, #40277: URL: https://github.com/apache/spark/pull/40277 ### What changes were proposed in this pull request? https://github.com/apache/spark/pull/40252 supported some jdbc APIs that reuse the proto msg `DataSource`. The `DataFrameReader` also has anothe

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40270: [SPARK-42662][CONNECT][PYTHON][PS] Support `withSequenceColumn` as PySpark DataFrame internal function.

2023-03-03 Thread via GitHub
zhengruifeng commented on code in PR #40270: URL: https://github.com/apache/spark/pull/40270#discussion_r1125388338 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -781,3 +782,10 @@ message FrameMap { CommonInlineUserDefinedFunction func = 2

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40270: [SPARK-42662][CONNECT][PYTHON][PS] Support `withSequenceColumn` as PySpark DataFrame internal function.

2023-03-03 Thread via GitHub
zhengruifeng commented on code in PR #40270: URL: https://github.com/apache/spark/pull/40270#discussion_r1125387144 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -509,6 +511,13 @@ class SparkConnectPlanner(val se

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40263: [SPARK-42659][ML] Reimplement `FPGrowthModel.transform` with dataframe operations

2023-03-03 Thread via GitHub
zhengruifeng commented on code in PR #40263: URL: https://github.com/apache/spark/pull/40263#discussion_r1125384107 ## mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala: ## @@ -275,29 +274,38 @@ class FPGrowthModel private[ml] ( @Since("2.2.0") override def trans

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40263: [SPARK-42659][ML] Reimplement `FPGrowthModel.transform` with dataframe operations

2023-03-03 Thread via GitHub
zhengruifeng commented on code in PR #40263: URL: https://github.com/apache/spark/pull/40263#discussion_r1125383324 ## mllib/src/main/scala/org/apache/spark/ml/fpm/FPGrowth.scala: ## @@ -275,29 +274,38 @@ class FPGrowthModel private[ml] ( @Since("2.2.0") override def trans

[GitHub] [spark] beliefer commented on pull request #40265: [SPARK-42556][CONNECT] Dataset.colregex should link a plan_id when it only matches a single column.

2023-03-03 Thread via GitHub
beliefer commented on PR #40265: URL: https://github.com/apache/spark/pull/40265#issuecomment-1454533003 @hvanhovell @zhengruifeng Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40255: [SPARK-42558][CONNECT] Implement `DataFrameStatFunctions` except `bloomFilter` functions

2023-03-03 Thread via GitHub
zhengruifeng commented on code in PR #40255: URL: https://github.com/apache/spark/pull/40255#discussion_r1125375617 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala: ## @@ -0,0 +1,605 @@ +/* + * Licensed to the Apache Software Foun

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40228: [SPARK-41874][CONNECT][PYTHON] Support SameSemantics in Spark Connect

2023-03-03 Thread via GitHub
zhengruifeng commented on code in PR #40228: URL: https://github.com/apache/spark/pull/40228#discussion_r1125371018 ## python/pyspark/sql/tests/connect/test_parity_dataframe.py: ## @@ -60,11 +60,6 @@ def test_repartitionByRange_dataframe(self): def test_repr_behaviors(self)

[GitHub] [spark] zhengruifeng commented on pull request #40265: [SPARK-42556][CONNECT] Dataset.colregex should link a plan_id when it only matches a single column.

2023-03-03 Thread via GitHub
zhengruifeng commented on PR #40265: URL: https://github.com/apache/spark/pull/40265#issuecomment-1454493588 merged into master/branch-3.4

[GitHub] [spark] zhengruifeng closed pull request #40265: [SPARK-42556][CONNECT] Dataset.colregex should link a plan_id when it only matches a single column.

2023-03-03 Thread via GitHub
zhengruifeng closed pull request #40265: [SPARK-42556][CONNECT] Dataset.colregex should link a plan_id when it only matches a single column. URL: https://github.com/apache/spark/pull/40265

[GitHub] [spark] zhengruifeng commented on pull request #40265: [SPARK-42556][CONNECT] Dataset.colregex should link a plan_id when it only matches a single column.

2023-03-03 Thread via GitHub
zhengruifeng commented on PR #40265: URL: https://github.com/apache/spark/pull/40265#issuecomment-1454489494 we'd better always add e2e tests; since it was added in `ClientE2ESuite`, I think we don't need to add one in `test_connect_basic`

[GitHub] [spark] LuciferYang commented on pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-03 Thread via GitHub
LuciferYang commented on PR #40218: URL: https://github.com/apache/spark/pull/40218#issuecomment-1454426268 Some other things to do, will update later

[GitHub] [spark] LuciferYang commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-03 Thread via GitHub
LuciferYang commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125340340 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralValueProtoConverter.scala: ## @@ -130,4 +135,117 @@ object LiteralValueProtoC

[GitHub] [spark] hvanhovell commented on a diff in pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #39091: URL: https://github.com/apache/spark/pull/39091#discussion_r1125288854 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -338,6 +340,22 @@ class SparkConnectPlanner(session:

[GitHub] [spark] hvanhovell commented on a diff in pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #39091: URL: https://github.com/apache/spark/pull/39091#discussion_r1125287159 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -598,3 +599,18 @@ message ToSchema { // The Sever side will update the datafram

[GitHub] [spark] hvanhovell commented on a diff in pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #39091: URL: https://github.com/apache/spark/pull/39091#discussion_r1125286902 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -181,6 +185,17 @@ message ExecutePlanResponse { string metric_type = 3; }

[GitHub] [spark] hvanhovell commented on a diff in pull request #39091: [SPARK-41527][CONNECT][PYTHON] Implement `DataFrame.observe`

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #39091: URL: https://github.com/apache/spark/pull/39091#discussion_r1125285556 ## connector/connect/common/src/main/protobuf/spark/connect/base.proto: ## @@ -158,6 +159,9 @@ message ExecutePlanResponse { // batch of results and then represen

[GitHub] [spark] LuciferYang commented on pull request #40254: [SPARK-42654][BUILD] Upgrade dropwizard metrics 4.2.17

2023-03-03 Thread via GitHub
LuciferYang commented on PR #40254: URL: https://github.com/apache/spark/pull/40254#issuecomment-1454360432 > friendly ping @dongjoon-hyun , I found the following error message in Java11&17 maven build log > > ``` > Error: [ERROR] An error occurred attempting to read POM > org.c

[GitHub] [spark] gengliangwang commented on pull request #40269: [WIP][DOC] Updating the Style for the Spark Docs based on the Webpage

2023-03-03 Thread via GitHub
gengliangwang commented on PR #40269: URL: https://github.com/apache/spark/pull/40269#issuecomment-1454353833 There are also differences on the top bar and left menu when scrolling down the page: Take https://spark.apache.org/docs/3.3.2/sql-ref-ansi-compliance.html as an example, of this

[GitHub] [spark] gengliangwang commented on pull request #40269: [WIP][DOC] Updating the Style for the Spark Docs based on the Webpage

2023-03-03 Thread via GitHub
gengliangwang commented on PR #40269: URL: https://github.com/apache/spark/pull/40269#issuecomment-1454352840 @grundprinzip Thanks for the work. +1 for the approach. Could you point out which parts are exactly the same as the PR https://github.com/apache/spark-website/pull/359 so that we can rev

[GitHub] [spark] beliefer commented on pull request #40252: [SPARK-42555][CONNECT] Add JDBC to DataFrameReader

2023-03-03 Thread via GitHub
beliefer commented on PR #40252: URL: https://github.com/apache/spark/pull/40252#issuecomment-1454351727 @hvanhovell @dongjoon-hyun @LuciferYang Thank you!

[GitHub] [spark] ueshin opened a new pull request, #40276: [SPARK-42630][CONNECT][PYTHON] Implement data type string parser

2023-03-03 Thread via GitHub
ueshin opened a new pull request, #40276: URL: https://github.com/apache/spark/pull/40276 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was

[GitHub] [spark] hvanhovell closed pull request #40252: [SPARK-42555][CONNECT] Add JDBC to DataFrameReader

2023-03-03 Thread via GitHub
hvanhovell closed pull request #40252: [SPARK-42555][CONNECT] Add JDBC to DataFrameReader URL: https://github.com/apache/spark/pull/40252

[GitHub] [spark] hvanhovell commented on pull request #40252: [SPARK-42555][CONNECT] Add JDBC to DataFrameReader

2023-03-03 Thread via GitHub
hvanhovell commented on PR #40252: URL: https://github.com/apache/spark/pull/40252#issuecomment-1454350262 merging to master/3.4

[GitHub] [spark] LuciferYang commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-03 Thread via GitHub
LuciferYang commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125246264 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralValueProtoConverter.scala: ## @@ -130,4 +135,117 @@ object LiteralValueProtoC

[GitHub] [spark] LuciferYang commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-03 Thread via GitHub
LuciferYang commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125245356 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralValueProtoConverter.scala: ## @@ -130,4 +135,117 @@ object LiteralValueProtoC

[GitHub] [spark] hvanhovell commented on a diff in pull request #40275: [SPARK-42557][CONNECT] Add Broadcast to functions

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #40275: URL: https://github.com/apache/spark/pull/40275#discussion_r1125245064 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala: ## @@ -495,6 +495,14 @@ class ClientE2ETestSuite extends RemoteSparkSes

[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL]:Incorrect ambiguous column reference error

2023-03-03 Thread via GitHub
shrprasa commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1454348175 @srowen @dongjoon-hyun Can you please review this PR?

[GitHub] [spark] hvanhovell commented on a diff in pull request #40275: [SPARK-42557][CONNECT] Add Broadcast to functions

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #40275: URL: https://github.com/apache/spark/pull/40275#discussion_r1125244539 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/functions.scala: ## @@ -1228,6 +1228,22 @@ object functions { def map_from_arrays(keys: Column

[GitHub] [spark] hvanhovell commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125244166 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralValueProtoConverter.scala: ## @@ -130,4 +135,117 @@ object LiteralValueProtoCo

[GitHub] [spark] LuciferYang commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-03 Thread via GitHub
LuciferYang commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125243878 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/LiteralProtoConverter.scala: ## @@ -0,0 +1,297 @@ +/* + * Licensed to the Apache Sof

[GitHub] [spark] hvanhovell commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125243219 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/LiteralValueProtoConverter.scala: ## @@ -130,4 +135,117 @@ object LiteralValueProtoCo

[GitHub] [spark] hvanhovell commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125241559 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -189,6 +190,11 @@ message Expression { int32 days = 2; int64 micro

[GitHub] [spark] hvanhovell commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125240275 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -189,6 +190,11 @@ message Expression { int32 days = 2; int64 micro

[GitHub] [spark] hvanhovell commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125231142 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/LiteralProtoConverter.scala: ## @@ -0,0 +1,297 @@ +/* + * Licensed to the Apache Soft

[GitHub] [spark] beliefer opened a new pull request, #40275: [SPARK-42557][CONNECT] Add Broadcast to functions

2023-03-03 Thread via GitHub
beliefer opened a new pull request, #40275: URL: https://github.com/apache/spark/pull/40275 ### What changes were proposed in this pull request? Currently, the connect functions are missing the broadcast API. This PR adds this API to connect's functions. ### Why are the changes

[GitHub] [spark] hvanhovell commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125231861 ## connector/connect/common/src/main/protobuf/spark/connect/expressions.proto: ## @@ -189,6 +190,11 @@ message Expression { int32 days = 2; int64 micro

[GitHub] [spark] hvanhovell commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125231142 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/LiteralProtoConverter.scala: ## @@ -0,0 +1,297 @@ +/* + * Licensed to the Apache Soft

[GitHub] [spark] hvanhovell commented on a diff in pull request #40218: [SPARK-42579][CONNECT] Part-1: `function.lit` support `Array[_]` dataType

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #40218: URL: https://github.com/apache/spark/pull/40218#discussion_r1125229080 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/expressions/LiteralProtoConverter.scala: ## @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Soft

[GitHub] [spark] LuciferYang commented on a diff in pull request #40255: [SPARK-42558][CONNECT] Partial implement `DataFrameStatFunctions`

2023-03-03 Thread via GitHub
LuciferYang commented on code in PR #40255: URL: https://github.com/apache/spark/pull/40255#discussion_r1125222488 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -133,6 +134,10 @@ object CheckCon

[GitHub] [spark] hvanhovell commented on a diff in pull request #40255: [SPARK-42558][CONNECT] Partial implement `DataFrameStatFunctions`

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #40255: URL: https://github.com/apache/spark/pull/40255#discussion_r1125221876 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -133,6 +134,10 @@ object CheckConn

[GitHub] [spark] LuciferYang commented on a diff in pull request #40255: [SPARK-42558][CONNECT] Partial implement `DataFrameStatFunctions`

2023-03-03 Thread via GitHub
LuciferYang commented on code in PR #40255: URL: https://github.com/apache/spark/pull/40255#discussion_r1125221046 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -133,6 +134,10 @@ object CheckCon

[GitHub] [spark] LuciferYang commented on pull request #40254: [SPARK-42654][BUILD] Upgrade dropwizard metrics 4.2.17

2023-03-03 Thread via GitHub
LuciferYang commented on PR #40254: URL: https://github.com/apache/spark/pull/40254#issuecomment-1454336997 Recently, I have often seen the Maven build in the Java 11&17 GA build task fail due to timeout ... a little strange

[GitHub] [spark] hvanhovell commented on a diff in pull request #40255: [SPARK-42558][CONNECT] Partial implement `DataFrameStatFunctions`

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #40255: URL: https://github.com/apache/spark/pull/40255#discussion_r1125217576 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/CheckConnectJvmClientCompatibility.scala: ## @@ -133,6 +134,10 @@ object CheckConn

[GitHub] [spark] amaliujia commented on a diff in pull request #40228: [SPARK-41874][CONNECT][PYTHON] Support SameSemantics in Spark Connect

2023-03-03 Thread via GitHub
amaliujia commented on code in PR #40228: URL: https://github.com/apache/spark/pull/40228#discussion_r1125216964 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2724,8 +2724,20 @@ class Dataset[T] private[sql] ( throw new Unsupporte

[GitHub] [spark] LuciferYang commented on pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

2023-03-03 Thread via GitHub
LuciferYang commented on PR #40274: URL: https://github.com/apache/spark/pull/40274#issuecomment-1454336041 Thanks for your work, @zhenlineo. If you don't mind, please give me more time to think about this PR :)

[GitHub] [spark] LuciferYang commented on a diff in pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

2023-03-03 Thread via GitHub
LuciferYang commented on code in PR #40274: URL: https://github.com/apache/spark/pull/40274#discussion_r1125214866 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala: ## @@ -76,7 +76,8 @@ class ClientE2ETestSuite extends RemoteSparkSessi

[GitHub] [spark] LuciferYang commented on a diff in pull request #40274: [SPARK-42215][CONNECT] Simplify Scala Client IT tests

2023-03-03 Thread via GitHub
LuciferYang commented on code in PR #40274: URL: https://github.com/apache/spark/pull/40274#discussion_r1125212429 ## connector/connect/client/jvm/src/test/scala/org/apache/spark/sql/ClientE2ETestSuite.scala: ## @@ -76,7 +76,8 @@ class ClientE2ETestSuite extends RemoteSparkSessi

[GitHub] [spark] hvanhovell commented on a diff in pull request #40228: [SPARK-41874][CONNECT][PYTHON] Support SameSemantics in Spark Connect

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #40228: URL: https://github.com/apache/spark/pull/40228#discussion_r1125208793 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -2724,8 +2724,20 @@ class Dataset[T] private[sql] ( throw new Unsupport

[GitHub] [spark] LuciferYang commented on pull request #40255: [SPARK-42558][CONNECT] Partial implement `DataFrameStatFunctions`

2023-03-03 Thread via GitHub
LuciferYang commented on PR #40255: URL: https://github.com/apache/spark/pull/40255#issuecomment-1454330973 Now all passed

[GitHub] [spark] hvanhovell closed pull request #40272: [SPARK-42667][CONNECT] Spark Connect: newSession API

2023-03-03 Thread via GitHub
hvanhovell closed pull request #40272: [SPARK-42667][CONNECT] Spark Connect: newSession API URL: https://github.com/apache/spark/pull/40272

[GitHub] [spark] beliefer commented on pull request #40252: [SPARK-42555][CONNECT] Add JDBC to DataFrameReader

2023-03-03 Thread via GitHub
beliefer commented on PR #40252: URL: https://github.com/apache/spark/pull/40252#issuecomment-1454320335 @dongjoon-hyun @hvanhovell It seems the Scala 2.13 build failure is unrelated to this PR.

[GitHub] [spark] zhenlineo commented on pull request #40274: [SPARK-42215][CONNECT] Single command to run Scala Client IT tests

2023-03-03 Thread via GitHub
zhenlineo commented on PR #40274: URL: https://github.com/apache/spark/pull/40274#issuecomment-1454314742 The full error (even with the clean master branch): ``` build/mvn clean build/mvn -Pscala-2.13 compile -pl connector/connect/client/jvm -am -DskipTests build/mvn -Pscala-2.13

[GitHub] [spark] amaliujia commented on pull request #40228: [SPARK-41874][CONNECT][PYTHON] Support SameSemantics in Spark Connect

2023-03-03 Thread via GitHub
amaliujia commented on PR #40228: URL: https://github.com/apache/spark/pull/40228#issuecomment-1454313336 @hvanhovell I just addressed actionable comments

[GitHub] [spark] zhenlineo commented on pull request #40274: [SPARK-42215][CONNECT] Single command to run Scala Client IT tests

2023-03-03 Thread via GitHub
zhenlineo commented on PR #40274: URL: https://github.com/apache/spark/pull/40274#issuecomment-1454313007 @hvanhovell cc @LuciferYang

[GitHub] [spark] zhenlineo opened a new pull request, #40274: [SPARK-42215][CONNECT] Single command to run Scala Client IT tests

2023-03-03 Thread via GitHub
zhenlineo opened a new pull request, #40274: URL: https://github.com/apache/spark/pull/40274 ### What changes were proposed in this pull request? Use the new spark-connect script so that the Scala client tests do not directly depend on any other modules. The dependency is still

[GitHub] [spark] anishshri-db commented on pull request #40273: [SPARK-42668][SS] Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort

2023-03-03 Thread via GitHub
anishshri-db commented on PR #40273: URL: https://github.com/apache/spark/pull/40273#issuecomment-1454301804 @HeartSaVioR - please take a look. Thx

[GitHub] [spark] anishshri-db opened a new pull request, #40273: [SPARK-42668][SS] Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort

2023-03-03 Thread via GitHub
anishshri-db opened a new pull request, #40273: URL: https://github.com/apache/spark/pull/40273 ### What changes were proposed in this pull request? We have seen some cases where the task exits as cancelled/failed which triggers the abort in the task completion listener for HDFSStateStore

[GitHub] [spark] amaliujia commented on a diff in pull request #40272: [SPARK-42667][CONNECT] Spark Connect: newSession API

2023-03-03 Thread via GitHub
amaliujia commented on code in PR #40272: URL: https://github.com/apache/spark/pull/40272#discussion_r1125136310 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -344,7 +344,9 @@ class SparkSession private[sql] ( // scalastyle:on

[GitHub] [spark] hvanhovell commented on pull request #40228: [SPARK-41874][CONNECT][PYTHON] Support SameSemantics in Spark Connect

2023-03-03 Thread via GitHub
hvanhovell commented on PR #40228: URL: https://github.com/apache/spark/pull/40228#issuecomment-1454255864 @amaliujia if you have time, let's also get this one over the line.

[GitHub] [spark] hvanhovell commented on a diff in pull request #40272: [SPARK-42667][CONNECT] Spark Connect: newSession API

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #40272: URL: https://github.com/apache/spark/pull/40272#discussion_r1125126775 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -344,7 +344,9 @@ class SparkSession private[sql] ( // scalastyle:on

[GitHub] [spark] amaliujia commented on pull request #40272: [SPARK-42667][CONNECT] Spark Connect: newSession API

2023-03-03 Thread via GitHub
amaliujia commented on PR #40272: URL: https://github.com/apache/spark/pull/40272#issuecomment-1454247387 @hvanhovell

[GitHub] [spark] amaliujia opened a new pull request, #40272: [SPARK-42667][CONNECT] Spark Connect: newSession API

2023-03-03 Thread via GitHub
amaliujia opened a new pull request, #40272: URL: https://github.com/apache/spark/pull/40272 ### What changes were proposed in this pull request? This PR proposes an implementation of newSession API. The idea is we reuse user context(e.g. user_id), gRPC channel, etc. But diffe

[GitHub] [spark] dongjoon-hyun commented on pull request #40064: [SPARK-42478] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-03-03 Thread via GitHub
dongjoon-hyun commented on PR #40064: URL: https://github.com/apache/spark/pull/40064#issuecomment-1454081754 Also, cc @sunchao

[GitHub] [spark] dongjoon-hyun commented on pull request #40064: [SPARK-42478] Make a serializable jobTrackerId instead of a non-serializable JobID in FileWriterFactory

2023-03-03 Thread via GitHub
dongjoon-hyun commented on PR #40064: URL: https://github.com/apache/spark/pull/40064#issuecomment-1454080985 Hi, @cloud-fan . SPARK-41448 landed in master/3.3/3.2, while this is merged to master/3.4 only. I'm wondering if we are planning to backport it to branch-3.3 and 3.2. - https://git

[GitHub] [spark] itholic commented on pull request #40236: [SPARK-38735][SQL][Tests] Add tests for the error class: INTERNAL_ERROR

2023-03-03 Thread via GitHub
itholic commented on PR #40236: URL: https://github.com/apache/spark/pull/40236#issuecomment-1454045008 Just FYI and really not a big deal, we typically use upper-cased "TESTS" or "TEST" in the PR title when the change only includes tests.

[GitHub] [spark] itholic commented on a diff in pull request #40236: [SPARK-38735][SQL][Tests] Add tests for the error class: INTERNAL_ERROR

2023-03-03 Thread via GitHub
itholic commented on code in PR #40236: URL: https://github.com/apache/spark/pull/40236#discussion_r1124931668 ## sql/core/src/test/scala/org/apache/spark/sql/errors/QueryExecutionErrorsSuite.scala: ## @@ -765,6 +770,58 @@ class QueryExecutionErrorsSuite ) } } + +

[GitHub] [spark] itholic commented on a diff in pull request #40271: [WIP][SPARK-42258][PYTHON] pyspark.sql.functions should not expose typing.cast

2023-03-03 Thread via GitHub
itholic commented on code in PR #40271: URL: https://github.com/apache/spark/pull/40271#discussion_r1124915899 ## python/pyspark/sql/functions.py: ## @@ -22,20 +22,10 @@ import sys import functools import warnings -from typing import ( -Any, -cast, Review Comment:

[GitHub] [spark] amaliujia commented on a diff in pull request #40255: [SPARK-42558][CONNECT] Partial implement `DataFrameStatFunctions`

2023-03-03 Thread via GitHub
amaliujia commented on code in PR #40255: URL: https://github.com/apache/spark/pull/40255#discussion_r1124913199 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala: ## @@ -0,0 +1,670 @@ +/* + * Licensed to the Apache Software Foundat

[GitHub] [spark] shrprasa commented on a diff in pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates

2023-03-03 Thread via GitHub
shrprasa commented on code in PR #40128: URL: https://github.com/apache/spark/pull/40128#discussion_r1124903174 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala: ## @@ -143,6 +144,9 @@ private[spark] class C

[GitHub] [spark] shrprasa commented on a diff in pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates

2023-03-03 Thread via GitHub
shrprasa commented on code in PR #40128: URL: https://github.com/apache/spark/pull/40128#discussion_r1124899927 ## core/src/main/scala/org/apache/spark/UploadDirManager.scala: ## @@ -0,0 +1,78 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * cont

[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-03 Thread via GitHub
ritikam2 commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1453976515 Any comments? Apparently, having all expr as unresolvedAlias is not working.

[GitHub] [spark] itholic commented on a diff in pull request #40270: [SPARK-42497][CONNECT][PYTHON][PS] Support `withSequenceColumn` as PySpark DataFrame internal function.

2023-03-03 Thread via GitHub
itholic commented on code in PR #40270: URL: https://github.com/apache/spark/pull/40270#discussion_r1124880464 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -781,3 +782,10 @@ message FrameMap { CommonInlineUserDefinedFunction func = 2; }

[GitHub] [spark] holdenk commented on a diff in pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates

2023-03-03 Thread via GitHub
holdenk commented on code in PR #40128: URL: https://github.com/apache/spark/pull/40128#discussion_r1124885459 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala: ## @@ -143,6 +144,9 @@ private[spark] class Cl

[GitHub] [spark] shrprasa commented on pull request #37880: [SPARK-39399] [CORE] [K8S]: Fix proxy-user authentication for Spark on k8s in cluster deploy mode

2023-03-03 Thread via GitHub
shrprasa commented on PR #37880: URL: https://github.com/apache/spark/pull/37880#issuecomment-1453939327 Gentle ping @holdenk @dongjoon-hyun @Ngone51 , @HyukjinKwon

[GitHub] [spark] shrprasa commented on pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates

2023-03-03 Thread via GitHub
shrprasa commented on PR #40128: URL: https://github.com/apache/spark/pull/40128#issuecomment-1453938288 Gentle ping @dongjoon-hyun @holdenk @srowen

[GitHub] [spark] shrprasa commented on pull request #40258: [WIP][SPARK-42655]:Incorrect ambiguous column reference error

2023-03-03 Thread via GitHub
shrprasa commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1453913043 @srowen Please ignore that change. It was work in progress to check a few things. The reason why we get the ambiguous error in the scenario below and why it's not correct is the result of att

[GitHub] [spark] LuciferYang commented on a diff in pull request #40255: [SPARK-42558][CONNECT] Partial implement `DataFrameStatFunctions`

2023-03-03 Thread via GitHub
LuciferYang commented on code in PR #40255: URL: https://github.com/apache/spark/pull/40255#discussion_r1124806858 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala: ## @@ -0,0 +1,605 @@ +/* + * Licensed to the Apache Software Found

[GitHub] [spark] LuciferYang commented on pull request #40255: [SPARK-42558][CONNECT] Partial implement `DataFrameStatFunctions`

2023-03-03 Thread via GitHub
LuciferYang commented on PR #40255: URL: https://github.com/apache/spark/pull/40255#issuecomment-1453872602 > @LuciferYang can you update the binary compatibility tests? done

[GitHub] [spark] LuciferYang commented on a diff in pull request #40255: [SPARK-42558][CONNECT] Partial implement `DataFrameStatFunctions`

2023-03-03 Thread via GitHub
LuciferYang commented on code in PR #40255: URL: https://github.com/apache/spark/pull/40255#discussion_r1124797844 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala: ## @@ -0,0 +1,605 @@ +/* + * Licensed to the Apache Software Found

[GitHub] [spark] hvanhovell commented on a diff in pull request #40255: [SPARK-42558][CONNECT] Partial implement `DataFrameStatFunctions`

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #40255: URL: https://github.com/apache/spark/pull/40255#discussion_r1124795511 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala: ## @@ -0,0 +1,605 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [spark] FurcyPin opened a new pull request, #40271: [WIP][SPARK-42258][PYTHON] pyspark.sql.functions should not expose typing.cast

2023-03-03 Thread via GitHub
FurcyPin opened a new pull request, #40271: URL: https://github.com/apache/spark/pull/40271 ### What changes were proposed in this pull request? In the `pyspark.sql.functions`, we replaced `from typing import foo, bar, etc` with `import typing` and all uses of `foo`
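As a rough illustration of why the import style described in this PR matters, the sketch below builds two throwaway modules from source strings and checks which names end up in each module's namespace. The module names `leaky` and `clean`, the function `f`, and the `load` helper are hypothetical and are not taken from `pyspark.sql.functions`; this is only a minimal sketch of the general Python behaviour, not the PR's actual change.

```python
import types

# Hypothetical stand-ins for a module written in each import style.
LEAKY_SRC = "from typing import cast\n\ndef f(x):\n    return cast(int, x)\n"
CLEAN_SRC = "import typing\n\ndef f(x):\n    return typing.cast(int, x)\n"


def load(name, src):
    """Create an in-memory module and execute the given source in its namespace."""
    mod = types.ModuleType(name)
    exec(src, mod.__dict__)
    return mod


leaky = load("leaky", LEAKY_SRC)
clean = load("clean", CLEAN_SRC)

# `from typing import cast` binds `cast` as an attribute of the module,
# so callers can reach it as `leaky.cast` by accident.
print(hasattr(leaky, "cast"))

# `import typing` binds only the name `typing`; `cast` is not exposed directly.
print(hasattr(clean, "cast"))
```

Both variants behave the same when `f` is called; only the set of names visible on the module differs.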

[GitHub] [spark] LuciferYang commented on a diff in pull request #40255: [SPARK-42558][CONNECT] Partial implement `DataFrameStatFunctions`

2023-03-03 Thread via GitHub
LuciferYang commented on code in PR #40255: URL: https://github.com/apache/spark/pull/40255#discussion_r1124788183 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala: ## @@ -0,0 +1,605 @@ +/* + * Licensed to the Apache Software Found

[GitHub] [spark] hvanhovell commented on a diff in pull request #40270: [SPARK-42497][CONNECT][PYTHON][PS] Support `withSequenceColumn` as PySpark DataFrame internal function.

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #40270: URL: https://github.com/apache/spark/pull/40270#discussion_r1124779267 ## connector/connect/common/src/main/protobuf/spark/connect/relations.proto: ## @@ -781,3 +782,10 @@ message FrameMap { CommonInlineUserDefinedFunction func = 2;

[GitHub] [spark] hvanhovell commented on pull request #40255: [SPARK-42558][CONNECT] Partial implement `DataFrameStatFunctions`

2023-03-03 Thread via GitHub
hvanhovell commented on PR #40255: URL: https://github.com/apache/spark/pull/40255#issuecomment-1453850392 @LuciferYang can you update the binary compatibility tests?

[GitHub] [spark] hvanhovell commented on a diff in pull request #40255: [SPARK-42558][CONNECT] Partial implement `DataFrameStatFunctions`

2023-03-03 Thread via GitHub
hvanhovell commented on code in PR #40255: URL: https://github.com/apache/spark/pull/40255#discussion_r1124775799 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala: ## @@ -0,0 +1,670 @@ +/* + * Licensed to the Apache Software Founda

[GitHub] [spark] chenhao-db commented on pull request #40264: [SPARK-42635][SQL][3.3] Fix the TimestampAdd expression

2023-03-03 Thread via GitHub
chenhao-db commented on PR #40264: URL: https://github.com/apache/spark/pull/40264#issuecomment-1453847644 @MaxGekk It seems that `checkErrorInExpression` doesn't exist in 3.3, so I still have to use the old `checkExceptionInExpression`. Is that okay?

[GitHub] [spark] LuciferYang commented on pull request #40254: [SPARK-42654][BUILD] Upgrade dropwizard metrics 4.2.17

2023-03-03 Thread via GitHub
LuciferYang commented on PR #40254: URL: https://github.com/apache/spark/pull/40254#issuecomment-1453831820 friendly ping @dongjoon-hyun , I found the following error message in Java11&17 maven build log ``` Error: [ERROR] An error occurred attempting to read POM org.codehaus.pl

[GitHub] [spark] LuciferYang commented on a diff in pull request #40255: [SPARK-42558][CONNECT] Partial implement `DataFrameStatFunctions`

2023-03-03 Thread via GitHub
LuciferYang commented on code in PR #40255: URL: https://github.com/apache/spark/pull/40255#discussion_r1124736465 ## connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala: ## @@ -0,0 +1,665 @@ +/* + * Licensed to the Apache Software Found
