[GitHub] [spark] LuciferYang commented on a diff in pull request #40552: [SPARK-42921][SQL][TESTS] Split `timestampNTZ/datetime-special.sql` into w/ and w/o `ansi` suffix to pass sql analyzer test in ansi mode

2023-03-24 Thread via GitHub
LuciferYang commented on code in PR #40552: URL: https://github.com/apache/spark/pull/40552#discussion_r1148309479 ## sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerQueryTestSuite.scala: ## @@ -88,6 +88,8 @@ class

[GitHub] [spark] yaooqinn commented on pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub
yaooqinn commented on PR #40533: URL: https://github.com/apache/spark/pull/40533#issuecomment-1483728354 Hi @dongjoon-hyun, thanks for pinging me, SGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [spark] mridulm commented on pull request #40548: [Minor][Core] Remove unused variables and method in Spark listeners

2023-03-24 Thread via GitHub
mridulm commented on PR #40548: URL: https://github.com/apache/spark/pull/40548#issuecomment-1483720147 There is a minor build failure (unused import) - but please feel free to merge after fixing it and CI passes.

[GitHub] [spark] dtenedor commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-24 Thread via GitHub
dtenedor commented on PR #40496: URL: https://github.com/apache/spark/pull/40496#issuecomment-1483703189 Looks like @LuciferYang fixed it with https://github.com/apache/spark/pull/40552. Thanks so much for the fix!

[GitHub] [spark] dtenedor commented on pull request #40552: [SPARK-42921][SQL][TESTS] Split `timestampNTZ/datetime-special.sql` into w/ and w/o `ansi` suffix to pass sql analyzer test in ansi mode

2023-03-24 Thread via GitHub
dtenedor commented on PR #40552: URL: https://github.com/apache/spark/pull/40552#issuecomment-1483702380 I am so sorry to break the build again :| thanks for fixing it! It looks like we need separate regular and ANSI test cases now!

[GitHub] [spark] LuciferYang commented on pull request #40552: [SPARK-42921][SQL][TESTS] Split `timestampNTZ/datetime-special.sql` into w/ and w/o `ansi` for fix sql analyzer test

2023-03-24 Thread via GitHub
LuciferYang commented on PR #40552: URL: https://github.com/apache/spark/pull/40552#issuecomment-1483701403 cc @dtenedor @HyukjinKwon FYI

[GitHub] [spark] LuciferYang opened a new pull request, #40552: [SPARK-42921][SQL][TESTS] Split `timestampNTZ/datetime-special.sql` into w/ and w/o `ansi` for test

2023-03-24 Thread via GitHub
LuciferYang opened a new pull request, #40552: URL: https://github.com/apache/spark/pull/40552 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] LuciferYang commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-24 Thread via GitHub
LuciferYang commented on PR #40496: URL: https://github.com/apache/spark/pull/40496#issuecomment-1483683148 ``` [info] - timestampNTZ/datetime-special.sql_analyzer_test *** FAILED *** (11 milliseconds) [info] timestampNTZ/datetime-special.sql_analyzer_test [info] Expected

[GitHub] [spark] pan3793 commented on pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub
pan3793 commented on PR #40533: URL: https://github.com/apache/spark/pull/40533#issuecomment-1483677257

> To @pan3793 and @yaooqinn . IMO, what we need is only one additional line at the end of the replacement. WDYT?
>
> ```scala
> .replaceAll("^[0-9]", "x")
> ```

Yes,
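The one-line fix quoted above can be illustrated outside Spark. Below is a minimal Python sketch of the same sanitization idea; the function name and the surrounding normalization steps are assumptions for illustration, and only the final `^[0-9]` replacement mirrors the quoted Scala line:

```python
import re

def sanitize_resource_name_prefix(name: str) -> str:
    """Sketch of a K8s-style resource name prefix sanitizer.

    Kubernetes resource names of this kind must start with an
    alphabetic character. The final re.sub mirrors the quoted Scala
    `.replaceAll("^[0-9]", "x")` by replacing a leading digit with
    the letter 'x'; the earlier steps are assumed normalization.
    """
    name = name.lower()
    # Drop characters that are not allowed at all (assumed step).
    name = re.sub(r"[^a-z0-9\-]", "", name)
    # Replace a leading digit with an alphabetic character.
    return re.sub(r"^[0-9]", "x", name)
```

For example, a prefix derived from an app name like `1spark-app` becomes `xspark-app`, while an already-valid prefix passes through unchanged.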

[GitHub] [spark] pan3793 commented on a diff in pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub
pan3793 commented on code in PR #40533: URL: https://github.com/apache/spark/pull/40533#discussion_r1148249740 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala: ## @@ -259,9 +259,9 @@ private[spark] object KubernetesConf {

[GitHub] [spark] itholic commented on pull request #40540: [SPARK-42914][PYTHON] Reuse `transformUnregisteredFunction` for `DistributedSequenceID`.

2023-03-24 Thread via GitHub
itholic commented on PR #40540: URL: https://github.com/apache/spark/pull/40540#issuecomment-1483670465 Updated!

[GitHub] [spark] LuciferYang commented on a diff in pull request #39124: [SPARK-42913][BUILD] Upgrade Hadoop to 3.3.5

2023-03-24 Thread via GitHub
LuciferYang commented on code in PR #39124: URL: https://github.com/apache/spark/pull/39124#discussion_r1148241568 ## dev/deps/spark-deps-hadoop-3-hive-2.3: ## @@ -116,7 +116,6 @@ janino/3.1.9//janino-3.1.9.jar javassist/3.25.0-GA//javassist-3.25.0-GA.jar

[GitHub] [spark] LuciferYang commented on a diff in pull request #39124: [SPARK-42913][BUILD] Upgrade Hadoop to 3.3.5

2023-03-24 Thread via GitHub
LuciferYang commented on code in PR #39124: URL: https://github.com/apache/spark/pull/39124#discussion_r1148203209 ## dev/deps/spark-deps-hadoop-3-hive-2.3: ## @@ -116,7 +116,6 @@ janino/3.1.9//janino-3.1.9.jar javassist/3.25.0-GA//javassist-3.25.0-GA.jar

[GitHub] [spark] LuciferYang commented on pull request #39124: [SPARK-42913][BUILD] Upgrade Hadoop to 3.3.5

2023-03-24 Thread via GitHub
LuciferYang commented on PR #39124: URL: https://github.com/apache/spark/pull/39124#issuecomment-1483639184

> Shall we exclude the following dependencies from our side and let the user add them if they need?
>
> ```
> cos_api-bundle/5.6.69//cos_api-bundle-5.6.69.jar
>

[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-24 Thread via GitHub
ritikam2 commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1483606678 Right. This is a simple one-file fix with the addition of a test case, versus the other one, which may involve a number of files.

[GitHub] [spark] ryan-johnson-databricks opened a new pull request, #40551: [SPARK] Project implements ExposesMetadataColumns

2023-03-24 Thread via GitHub
ryan-johnson-databricks opened a new pull request, #40551: URL: https://github.com/apache/spark/pull/40551 ### What changes were proposed in this pull request? NOTE: This is a stacked pull request. Ignore the bottom two commits. The work that `AddMetadataColumns`

[GitHub] [spark] ryan-johnson-databricks opened a new pull request, #40550: [SPARK] LogicalPlan.metadataOutput always contains AttributeReference

2023-03-24 Thread via GitHub
ryan-johnson-databricks opened a new pull request, #40550: URL: https://github.com/apache/spark/pull/40550 ### What changes were proposed in this pull request? Today, `LogicalPlan.metadataOutput` is a `Seq[Attribute]`. However, it always contains `AttributeReference`, because

[GitHub] [spark] ueshin opened a new pull request, #40549: [SPARK-42920][CONNECT][PYTHON] Enable tests for UDF with UDT

2023-03-24 Thread via GitHub
ueshin opened a new pull request, #40549: URL: https://github.com/apache/spark/pull/40549 ### What changes were proposed in this pull request? Enables tests for UDF with UDT. ### Why are the changes needed? Now that UDF with UDT should work, the related tests should be

[GitHub] [spark] chenhao-db commented on pull request #40429: [SPARK-42775][SQL] Throw exception when ApproximatePercentile result doesn't fit into output decimal type.

2023-03-24 Thread via GitHub
chenhao-db commented on PR #40429: URL: https://github.com/apache/spark/pull/40429#issuecomment-1483544934 Hi @LuciferYang, could you help me review this PR? Or do you know who would be more suitable to review it? Thanks a lot!

[GitHub] [spark] gengliangwang opened a new pull request, #40548: [Minor][Core] Remove unused variables and method in Spark listeners

2023-03-24 Thread via GitHub
gengliangwang opened a new pull request, #40548: URL: https://github.com/apache/spark/pull/40548 ### What changes were proposed in this pull request? Remove unused variables and method in Spark listeners ### Why are the changes needed? Code cleanup ###

[GitHub] [spark] revans2 commented on pull request #40524: [SPARK-42898][SQL] Mark that string/date casts do not need time zone id

2023-03-24 Thread via GitHub
revans2 commented on PR #40524: URL: https://github.com/apache/spark/pull/40524#issuecomment-1483436510 @MaxGekk This might take a little while. I don't want to delete the test, but I also cannot just switch the data types over to Timestamp because the partition string is not always stored
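The distinction the PR title draws (casts between strings and dates are time-zone independent, unlike timestamps) can be illustrated in plain Python. This is a sketch of the general principle only, not Spark's cast implementation:

```python
from datetime import date, datetime, timezone, timedelta

# Parsing a date string needs no time zone: a calendar date reads
# the same regardless of the session time zone.
d = date.fromisoformat("2023-03-24")
assert str(d) == "2023-03-24"

# A timestamp, by contrast, denotes an instant: rendering it as a
# string depends on which zone it is rendered in.
ts = datetime(2023, 3, 24, 0, 0, tzinfo=timezone.utc)
in_utc = ts.isoformat()
in_plus9 = ts.astimezone(timezone(timedelta(hours=9))).isoformat()
assert in_utc != in_plus9  # same instant, different string forms
```

This is why a string-to-date cast can be marked as not needing a time zone id, while string-to-timestamp casts cannot.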

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub
dongjoon-hyun commented on code in PR #40533: URL: https://github.com/apache/spark/pull/40533#discussion_r1147984980 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala: ## @@ -259,9 +259,9 @@ private[spark] object KubernetesConf

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub
dongjoon-hyun commented on code in PR #40533: URL: https://github.com/apache/spark/pull/40533#discussion_r1147983528 ## resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala: ## @@ -252,6 +252,14 @@ class KubernetesConfSuite

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub
dongjoon-hyun commented on code in PR #40533: URL: https://github.com/apache/spark/pull/40533#discussion_r1147982928 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala: ## @@ -259,9 +259,9 @@ private[spark] object KubernetesConf

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub
dongjoon-hyun commented on code in PR #40533: URL: https://github.com/apache/spark/pull/40533#discussion_r1147981474 ## resource-managers/kubernetes/core/src/test/scala/org/apache/spark/deploy/k8s/KubernetesConfSuite.scala: ## @@ -252,6 +252,14 @@ class KubernetesConfSuite

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub
dongjoon-hyun commented on code in PR #40533: URL: https://github.com/apache/spark/pull/40533#discussion_r1147981215 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala: ## @@ -259,9 +259,9 @@ private[spark] object KubernetesConf

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40543: [SPARK-42916][SQL] JDBCTableCatalog Keeps Char/Varchar meta on the read-side

2023-03-24 Thread via GitHub
dongjoon-hyun commented on code in PR #40543: URL: https://github.com/apache/spark/pull/40543#discussion_r1147975618 ## connector/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/OracleIntegrationSuite.scala: ## @@ -86,6 +86,8 @@ class OracleIntegrationSuite

[GitHub] [spark] dtenedor commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-24 Thread via GitHub
dtenedor commented on PR #40496: URL: https://github.com/apache/spark/pull/40496#issuecomment-1483263177 @HyukjinKwon I ran the test locally and it passes. Maybe it is fixed at head now?

[GitHub] [spark] ueshin commented on pull request #40538: [SPARK-42911][PYTHON] Introduce more basic exceptions

2023-03-24 Thread via GitHub
ueshin commented on PR #40538: URL: https://github.com/apache/spark/pull/40538#issuecomment-1483241667 @HyukjinKwon #40547

[GitHub] [spark] ueshin opened a new pull request, #40547: [SPARK-42911][PYTHON][3.4] Introduce more basic exceptions

2023-03-24 Thread via GitHub
ueshin opened a new pull request, #40547: URL: https://github.com/apache/spark/pull/40547 ### What changes were proposed in this pull request? Introduces more basic exceptions:
- ArithmeticException
- ArrayIndexOutOfBoundsException
- DateTimeException
-
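The idea of mirroring JVM exception types as Python classes can be sketched as follows. This is a hypothetical, self-contained hierarchy for illustration; the class layout, the `_JVM_TO_PY` mapping, and `convert_exception` are assumptions and do not reproduce PySpark's actual definitions:

```python
class SparkError(Exception):
    """Hypothetical base class for errors reported from the JVM side."""

class ArithmeticException(SparkError):
    """Mirrors java.lang.ArithmeticException (e.g. division by zero)."""

class ArrayIndexOutOfBoundsException(SparkError):
    """Mirrors java.lang.ArrayIndexOutOfBoundsException."""

class DateTimeException(SparkError):
    """Mirrors java.time.DateTimeException."""

# Hypothetical mapping from JVM class names to Python classes, used to
# convert a reported JVM error into the matching Python exception.
_JVM_TO_PY = {
    "java.lang.ArithmeticException": ArithmeticException,
    "java.lang.ArrayIndexOutOfBoundsException": ArrayIndexOutOfBoundsException,
    "java.time.DateTimeException": DateTimeException,
}

def convert_exception(jvm_class: str, message: str) -> SparkError:
    """Return the Python exception matching a JVM exception class name,
    falling back to the base class for unmapped types."""
    return _JVM_TO_PY.get(jvm_class, SparkError)(message)
```

The benefit of such specific classes is that client code can catch, say, `ArithmeticException` rather than a single catch-all error type.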

[GitHub] [spark] ueshin commented on a diff in pull request #40526: [SPARK-42899][SQL] Fix DataFrame.to(schema) to handle the case where there is a non-nullable nested field in a nullable field

2023-03-24 Thread via GitHub
ueshin commented on code in PR #40526: URL: https://github.com/apache/spark/pull/40526#discussion_r1147913599 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -119,7 +120,7 @@ object Project { case

[GitHub] [spark] ueshin opened a new pull request, #40546: [SPARK-42899][SQL][FOLLOWUP] Project.reconcileColumnType should use KnownNotNull instead of AssertNotNull

2023-03-24 Thread via GitHub
ueshin opened a new pull request, #40546: URL: https://github.com/apache/spark/pull/40546 ### What changes were proposed in this pull request? This is a follow-up of #40526. `Project.reconcileColumnType` should use `KnownNotNull` instead of `AssertNotNull`, also only when

[GitHub] [spark] pan3793 commented on pull request #38357: [SPARK-40887][K8S] Allow Spark on K8s to integrate w/ Log Service

2023-03-24 Thread via GitHub
pan3793 commented on PR #38357: URL: https://github.com/apache/spark/pull/38357#issuecomment-1483210805 Update PR state. Currently, the PR is stuck at "Is it good to let 3rd-party log service use POD NAME to access Driver log?" In

[GitHub] [spark] pan3793 commented on a diff in pull request #38357: [SPARK-40887][K8S] Allow Spark on K8s to integrate w/ Log Service

2023-03-24 Thread via GitHub
pan3793 commented on code in PR #38357: URL: https://github.com/apache/spark/pull/38357#discussion_r1054620568 ## core/src/main/scala/org/apache/spark/scheduler/SchedulerBackend.scala: ## @@ -74,14 +76,24 @@ private[spark] trait SchedulerBackend { * Executors tab for the

[GitHub] [spark] pan3793 commented on pull request #39160: [SPARK-41667][K8S] Expose env var SPARK_DRIVER_POD_NAME in Driver Pod

2023-03-24 Thread via GitHub
pan3793 commented on PR #39160: URL: https://github.com/apache/spark/pull/39160#issuecomment-1483184326 Another case: [GoogleCloudPlatform/spark-on-k8s-operator](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator) also uses the Pod Name to fetch driver and executor logs

[GitHub] [spark] amaliujia commented on pull request #40537: [SPARK-42202][CONNECT][TEST][FOLLOWUP] Loop around command entry in SimpleSparkConnectService

2023-03-24 Thread via GitHub
amaliujia commented on PR #40537: URL: https://github.com/apache/spark/pull/40537#issuecomment-1483184075 Late LGTM!

[GitHub] [spark] ueshin commented on a diff in pull request #40526: [SPARK-42899][SQL] Fix DataFrame.to(schema) to handle the case where there is a non-nullable nested field in a nullable field

2023-03-24 Thread via GitHub
ueshin commented on code in PR #40526: URL: https://github.com/apache/spark/pull/40526#discussion_r1147885076 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala: ## @@ -119,7 +120,7 @@ object Project { case

[GitHub] [spark] pan3793 commented on a diff in pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub
pan3793 commented on code in PR #40533: URL: https://github.com/apache/spark/pull/40533#discussion_r1145950352 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala: ## @@ -255,13 +255,17 @@ private[spark] object KubernetesConf {

[GitHub] [spark] hvanhovell closed pull request #40515: [SPARK-42884][CONNECT] Add Ammonite REPL integration

2023-03-24 Thread via GitHub
hvanhovell closed pull request #40515: [SPARK-42884][CONNECT] Add Ammonite REPL integration URL: https://github.com/apache/spark/pull/40515

[GitHub] [spark] hvanhovell commented on a diff in pull request #40515: [SPARK-42884][CONNECT] Add Ammonite REPL integration

2023-03-24 Thread via GitHub
hvanhovell commented on code in PR #40515: URL: https://github.com/apache/spark/pull/40515#discussion_r1147863450 ## connector/connect/bin/spark-connect-scala-client: ## @@ -46,6 +45,4 @@ build/sbt "${SCALA_ARG}" "sql/package;connect-client-jvm/assembly"

[GitHub] [spark] hvanhovell commented on pull request #40515: [SPARK-42884][CONNECT] Add Ammonite REPL integration

2023-03-24 Thread via GitHub
hvanhovell commented on PR #40515: URL: https://github.com/apache/spark/pull/40515#issuecomment-1483149265 Merging this one.

[GitHub] [spark] pan3793 commented on a diff in pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub
pan3793 commented on code in PR #40533: URL: https://github.com/apache/spark/pull/40533#discussion_r1147855519 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala: ## @@ -255,13 +255,17 @@ private[spark] object KubernetesConf {

[GitHub] [spark] pan3793 commented on a diff in pull request #40533: [SPARK-42906][K8S] Resource name prefix should start with an alphabetic character

2023-03-24 Thread via GitHub
pan3793 commented on code in PR #40533: URL: https://github.com/apache/spark/pull/40533#discussion_r1147855130 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala: ## @@ -255,13 +255,17 @@ private[spark] object KubernetesConf {

[GitHub] [spark] dtenedor commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-24 Thread via GitHub
dtenedor commented on PR #40496: URL: https://github.com/apache/spark/pull/40496#issuecomment-1483097427 Sure, I can take a look.

On Fri, Mar 24, 2023 at 3:12 AM Hyukjin Kwon ***@***.***> wrote:

> I think ANSI test fails after this PR:
>
> [info] -

[GitHub] [spark] sunchao commented on a diff in pull request #39124: [SPARK-42913][BUILD] Upgrade Hadoop to 3.3.5

2023-03-24 Thread via GitHub
sunchao commented on code in PR #39124: URL: https://github.com/apache/spark/pull/39124#discussion_r1147789204 ## dev/deps/spark-deps-hadoop-3-hive-2.3: ## @@ -64,17 +65,18 @@ gcs-connector/hadoop3-2.2.7/shaded/gcs-connector-hadoop3-2.2.7-shaded.jar

[GitHub] [spark] sunchao commented on a diff in pull request #39124: [SPARK-42913][BUILD] Upgrade Hadoop to 3.3.5

2023-03-24 Thread via GitHub
sunchao commented on code in PR #39124: URL: https://github.com/apache/spark/pull/39124#discussion_r1147783122 ## dev/deps/spark-deps-hadoop-3-hive-2.3: ## @@ -51,6 +51,7 @@ commons-math3/3.6.1//commons-math3-3.6.1.jar commons-pool/1.5.4//commons-pool-1.5.4.jar

[GitHub] [spark] yaooqinn commented on pull request #40544: [SPARK-42917][SQL] Correct getUpdateColumnNullabilityQuery for DerbyDialect

2023-03-24 Thread via GitHub
yaooqinn commented on PR #40544: URL: https://github.com/apache/spark/pull/40544#issuecomment-1482882715 thanks, merged to master

[GitHub] [spark] yaooqinn closed pull request #40544: [SPARK-42917][SQL] Correct getUpdateColumnNullabilityQuery for DerbyDialect

2023-03-24 Thread via GitHub
yaooqinn closed pull request #40544: [SPARK-42917][SQL] Correct getUpdateColumnNullabilityQuery for DerbyDialect URL: https://github.com/apache/spark/pull/40544

[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-24 Thread via GitHub
shrprasa commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1482819187 FWIW, both use cases were working fine in Spark 2.3

[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-24 Thread via GitHub
shrprasa commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1482792058

> I think case 1 works by accident. It's not an intentional design. I don't think it's a bug that case 2 doesn't work.

As I had said in a previous comment: Not sure about the

[GitHub] [spark] cloud-fan commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-24 Thread via GitHub
cloud-fan commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1482756439 I think case 1 works by accident. It's not an intentional design. I don't think it's a bug that case 2 doesn't work.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40535: [SPARK-42907][CONNECT][PYTHON] Implement Avro functions

2023-03-24 Thread via GitHub
HyukjinKwon commented on code in PR #40535: URL: https://github.com/apache/spark/pull/40535#discussion_r1147474920 ## python/pyspark/sql/connect/avro/functions.py: ## @@ -0,0 +1,108 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40535: [SPARK-42907][CONNECT][PYTHON] Implement Avro functions

2023-03-24 Thread via GitHub
HyukjinKwon commented on code in PR #40535: URL: https://github.com/apache/spark/pull/40535#discussion_r1147472671 ## dev/sparktestsupport/modules.py: ## @@ -747,6 +747,7 @@ def __hash__(self): "pyspark.sql.connect.readwriter", Review Comment: oh sorry. Yes, you're

[GitHub] [spark] johanl-db opened a new pull request, #40545: [WIP][SPARK-42918] Introduce abstractions to create constant and generated metadata fields

2023-03-24 Thread via GitHub
johanl-db opened a new pull request, #40545: URL: https://github.com/apache/spark/pull/40545 ### What changes were proposed in this pull request? This change refactors the metadata attribute introduced in https://github.com/apache/spark/pull/39314 to allow easier creation and

[GitHub] [spark] yaooqinn commented on pull request #40544: [SPARK-42917][SQL] Correct getUpdateColumnNullabilityQuery for DerbyDialect

2023-03-24 Thread via GitHub
yaooqinn commented on PR #40544: URL: https://github.com/apache/spark/pull/40544#issuecomment-1482580099 cc @cloud-fan @HyukjinKwon @dongjoon-hyun thanks

[GitHub] [spark] yaooqinn opened a new pull request, #40544: [SPARK-42917][SQL] Correct getUpdateColumnNullabilityQuery for DerbyDialect

2023-03-24 Thread via GitHub
yaooqinn opened a new pull request, #40544: URL: https://github.com/apache/spark/pull/40544 ### What changes were proposed in this pull request? Fix the nullability clause for the Derby dialect, according to the official Derby language reference guide. ### Why are the

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40520: [SPARK-42896][SQL][PYTHON] Make `mapInPandas` / `mapInArrow` support barrier mode execution

2023-03-24 Thread via GitHub
WeichenXu123 commented on code in PR #40520: URL: https://github.com/apache/spark/pull/40520#discussion_r1147384917 ## python/pyspark/sql/pandas/map_ops.py: ## @@ -60,6 +62,7 @@ def mapInPandas( schema : :class:`pyspark.sql.types.DataType` or str the

[GitHub] [spark] HyukjinKwon commented on pull request #40538: [SPARK-42911][PYTHON] Introduce more basic exceptions

2023-03-24 Thread via GitHub
HyukjinKwon commented on PR #40538: URL: https://github.com/apache/spark/pull/40538#issuecomment-1482560940 @ueshin it has a conflict w/ branch-3.4. would you mind creating a backport PR?

[GitHub] [spark] HyukjinKwon closed pull request #40538: [SPARK-42911][PYTHON] Introduce more basic exceptions

2023-03-24 Thread via GitHub
HyukjinKwon closed pull request #40538: [SPARK-42911][PYTHON] Introduce more basic exceptions URL: https://github.com/apache/spark/pull/40538

[GitHub] [spark] HyukjinKwon commented on pull request #40538: [SPARK-42911][PYTHON] Introduce more basic exceptions

2023-03-24 Thread via GitHub
HyukjinKwon commented on PR #40538: URL: https://github.com/apache/spark/pull/40538#issuecomment-1482558850 Merged to master and branch-3.4.

[GitHub] [spark] HyukjinKwon commented on pull request #40496: [SPARK-42874][SQL] Enable new golden file test framework for analysis for all input files

2023-03-24 Thread via GitHub
HyukjinKwon commented on PR #40496: URL: https://github.com/apache/spark/pull/40496#issuecomment-1482557118 I think ANSI test fails after this PR: ``` [info] - timestampNTZ/datetime-special.sql_analyzer_test *** FAILED *** (31 milliseconds) [info]

[GitHub] [spark] shrprasa commented on a diff in pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-24 Thread via GitHub
shrprasa commented on code in PR #40258: URL: https://github.com/apache/spark/pull/40258#discussion_r1147356838 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala: ## @@ -258,7 +258,7 @@ package object expressions { case (Seq(), _)

[GitHub] [spark] yaooqinn commented on pull request #40543: [SPARK-42916][SQL] JDBCTableCatalog Keeps Char/Varchar meta on the read-side

2023-03-24 Thread via GitHub
yaooqinn commented on PR #40543: URL: https://github.com/apache/spark/pull/40543#issuecomment-1482531153 cc @cloud-fan @HyukjinKwon @dongjoon-hyun thanks

[GitHub] [spark] yaooqinn opened a new pull request, #40543: [SPARK-42916][SQL] JDBCTableCatalog Keeps Char/Varchar meta on the read-side

2023-03-24 Thread via GitHub
yaooqinn opened a new pull request, #40543: URL: https://github.com/apache/spark/pull/40543 ### What changes were proposed in this pull request? In this PR, we make the JDBCTableCatalog map Char/Varchar to the raw implementation to avoid losing meta

[GitHub] [spark] xinrong-meng commented on pull request #40539: [SPARK-42891][CONNECT][PYTHON][3.4] Implement CoGrouped Map API

2023-03-24 Thread via GitHub
xinrong-meng commented on PR #40539: URL: https://github.com/apache/spark/pull/40539#issuecomment-1482442079 Merged to branch-3.4, thanks!

[GitHub] [spark] xinrong-meng closed pull request #40539: [SPARK-42891][CONNECT][PYTHON][3.4] Implement CoGrouped Map API

2023-03-24 Thread via GitHub
xinrong-meng closed pull request #40539: [SPARK-42891][CONNECT][PYTHON][3.4] Implement CoGrouped Map API URL: https://github.com/apache/spark/pull/40539

[GitHub] [spark] allisonwang-db commented on a diff in pull request #40536: [SPARK-42895][CONNECT] Improve error messages for stopped Spark sessions

2023-03-24 Thread via GitHub
allisonwang-db commented on code in PR #40536: URL: https://github.com/apache/spark/pull/40536#discussion_r1147266298 ## python/pyspark/sql/connect/client.py: ## @@ -513,8 +513,11 @@ class SparkConnectClient(object): """ @classmethod -def retry_exception(cls, e:

[GitHub] [spark] itholic commented on pull request #40540: [SPARK-42914][PYTHON] Reuse `transformUnregisteredFunction` for `DistributedSequenceID`.

2023-03-24 Thread via GitHub
itholic commented on PR #40540: URL: https://github.com/apache/spark/pull/40540#issuecomment-1482441552 Thanks for the review @zhengruifeng! Will address the comments soon

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40535: [SPARK-42907][CONNECT][PYTHON] Implement Avro functions

2023-03-24 Thread via GitHub
zhengruifeng commented on code in PR #40535: URL: https://github.com/apache/spark/pull/40535#discussion_r1147254339 ## dev/sparktestsupport/modules.py: ## @@ -747,6 +747,7 @@ def __hash__(self): "pyspark.sql.connect.readwriter", Review Comment: thanks, let me have

[GitHub] [spark] cloud-fan commented on a diff in pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-24 Thread via GitHub
cloud-fan commented on code in PR #40258: URL: https://github.com/apache/spark/pull/40258#discussion_r1147252040 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala: ## @@ -258,7 +258,7 @@ package object expressions { case (Seq(), _)

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40540: [SPARK-42914][PYTHON] Reuse `transformUnregisteredFunction` for `DistributedSequenceID`.

2023-03-24 Thread via GitHub
zhengruifeng commented on code in PR #40540: URL: https://github.com/apache/spark/pull/40540#discussion_r1147243714 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1216,6 +1214,9 @@ class SparkConnectPlanner(val

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40540: [SPARK-42914][PYTHON] Reuse `transformUnregisteredFunction` for `DistributedSequenceID`.

2023-03-24 Thread via GitHub
zhengruifeng commented on code in PR #40540: URL: https://github.com/apache/spark/pull/40540#discussion_r1147237303 ## python/pyspark/sql/connect/expressions.py: ## @@ -974,13 +974,3 @@ def to_plan(self, session: "SparkConnectClient") -> proto.Expression: def

[GitHub] [spark] Yikf commented on a diff in pull request #40437: [SPARK-41259][SQL] SparkSQLDriver Output schema and result string should be consistent

2023-03-24 Thread via GitHub
Yikf commented on code in PR #40437: URL: https://github.com/apache/spark/pull/40437#discussion_r1147228669

## sql/core/src/main/scala/org/apache/spark/sql/execution/HiveResult.scala:
@@ -50,36 +51,44 @@ object HiveResult {
  }

  /**
   * Returns the result as a hive

[GitHub] [spark] panbingkun opened a new pull request, #40542: [SPARK-42915][SQL] Codegen Support for sentences

2023-03-24 Thread via GitHub
panbingkun opened a new pull request, #40542: URL: https://github.com/apache/spark/pull/40542

### What changes were proposed in this pull request?
The PR adds Codegen Support for sentences.

### Why are the changes needed?
Improve codegen coverage and performance.

###
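For readers unfamiliar with the expression being codegen'd: `sentences` tokenizes a string into sentences, each a list of words (Spark's built-in is locale-aware, built on `java.text.BreakIterator`). A rough pure-Python sketch of the semantics, not the Spark implementation:

```python
import re

def sentences(text):
    # Rough approximation of Spark SQL's `sentences`: split on
    # sentence-ending punctuation, then break each sentence into words.
    # The real expression uses locale-aware java.text.BreakIterator.
    parts = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [re.findall(r"[A-Za-z']+", s) for s in parts]

print(sentences("Hi there! Good morning."))
# [['Hi', 'there'], ['Good', 'morning']]
```

Codegen support means this evaluation is emitted as generated Java rather than going through the interpreted expression path.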

[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-24 Thread via GitHub
shrprasa commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1482368131

> > It works because the resolved column has just one match
>
> But there are two id columns. Does Spark already do deduplication somewhere?

Not sure about the

[GitHub] [spark] shrprasa commented on pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates

2023-03-24 Thread via GitHub
shrprasa commented on PR #40128: URL: https://github.com/apache/spark/pull/40128#issuecomment-1482363384

Gentle Ping @dongjoon-hyun @holdenk

[GitHub] [spark] yaooqinn commented on pull request #40541: [SPARK-42861][SQL] Use private[sql] instead of protected[sql] to avoid generating API doc

2023-03-24 Thread via GitHub
yaooqinn commented on PR #40541: URL: https://github.com/apache/spark/pull/40541#issuecomment-1482332207

thanks, merged to master and 3.4

[GitHub] [spark] yaooqinn closed pull request #40541: [SPARK-42861][SQL] Use private[sql] instead of protected[sql] to avoid generating API doc

2023-03-24 Thread via GitHub
yaooqinn closed pull request #40541: [SPARK-42861][SQL] Use private[sql] instead of protected[sql] to avoid generating API doc URL: https://github.com/apache/spark/pull/40541

[GitHub] [spark] cloud-fan commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-24 Thread via GitHub
cloud-fan commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1482331121

> any UnresolvedFunction should have UnresolvedAlias.

SGTM.

[GitHub] [spark] cloud-fan commented on a diff in pull request #40400: [SPARK-41359][SQL] Use `PhysicalDataType` instead of DataType in UnsafeRow

2023-03-24 Thread via GitHub
cloud-fan commented on code in PR #40400: URL: https://github.com/apache/spark/pull/40400#discussion_r1147177710

## sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java:
@@ -73,48 +74,44 @@ public static int calculateBitSetWidthInBytes(int

[GitHub] [spark] yaooqinn commented on pull request #40531: [SPARK-42904][SQL] Char/Varchar Support for JDBC Catalog

2023-03-24 Thread via GitHub
yaooqinn commented on PR #40531: URL: https://github.com/apache/spark/pull/40531#issuecomment-1482330150

thanks, merged to master and 3.4

[GitHub] [spark] yaooqinn closed pull request #40531: [SPARK-42904][SQL] Char/Varchar Support for JDBC Catalog

2023-03-24 Thread via GitHub
yaooqinn closed pull request #40531: [SPARK-42904][SQL] Char/Varchar Support for JDBC Catalog URL: https://github.com/apache/spark/pull/40531

[GitHub] [spark] cloud-fan commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error

2023-03-24 Thread via GitHub
cloud-fan commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1482326317

> It works because the resolved column has just one match

But there are two id columns. Does Spark already do deduplication somewhere?
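The behavior the thread is circling around — a reference is ambiguous only when it matches more than one *distinct* attribute — can be sketched in a few lines of Python (hypothetical names; Spark's real resolution works over Catalyst attributes keyed by expression ID):

```python
def resolve(name, attributes):
    # attributes: list of (column_name, expr_id) pairs. Keying the
    # matches by expr_id deduplicates them: two occurrences of the
    # *same* attribute (e.g. a column exposed twice after a self-join)
    # count as one candidate, so resolution still succeeds.
    matches = {eid: col for col, eid in attributes if col.lower() == name.lower()}
    if not matches:
        raise ValueError(f"cannot resolve `{name}`")
    if len(matches) > 1:
        raise ValueError(f"ambiguous reference `{name}`")
    return next(iter(matches))

# Two 'id' columns sharing one lineage resolve fine:
print(resolve("id", [("id", 1), ("id", 1), ("name", 2)]))  # 1
```

Under this toy model, `resolve("id", [("id", 1), ("id", 2)])` raises the ambiguity error, which is the case SPARK-42655 argues Spark currently triggers too eagerly.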

[GitHub] [spark] cloud-fan commented on pull request #40462: [SPARK-42832][SQL] Remove repartition if it is the child of LocalLimit

2023-03-24 Thread via GitHub
cloud-fan commented on PR #40462: URL: https://github.com/apache/spark/pull/40462#issuecomment-1482318942 I don't quite get the rationale. For `SELECT /*+ REBALANCE */ * FROM t WHERE id > 1 LIMIT 5;`, the user explicitly requires to do a rebalance before limit, why do we remove it? It's a
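The rule under debate rewrites a rebalance/repartition sitting directly under a `LocalLimit`. A toy version of that tree rewrite (hypothetical node classes, not Catalyst's API) makes the objection concrete: the pattern match alone cannot distinguish a planner-inserted shuffle from one the user explicitly requested with a `REBALANCE` hint.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical stand-ins for Catalyst logical-plan nodes.
@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Rebalance:
    child: Any

@dataclass(frozen=True)
class LocalLimit:
    n: int
    child: Any

def drop_rebalance_under_limit(plan):
    # The proposed rewrite: LocalLimit(Rebalance(c)) -> LocalLimit(c),
    # skipping a shuffle whose output is mostly discarded by the limit.
    if isinstance(plan, LocalLimit) and isinstance(plan.child, Rebalance):
        return LocalLimit(plan.n, plan.child.child)
    return plan

plan = LocalLimit(5, Rebalance(Scan("t")))
print(drop_rebalance_under_limit(plan))
# LocalLimit(n=5, child=Scan(table='t'))
```

The sketch applies unconditionally, which is exactly the reviewer's point: for a user-written hint the shuffle carries intent (e.g. output file sizing) that a blanket removal throws away.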