Re: [PR] [SPARK-44685][SQL] Remove deprecated Catalog#createExternalTable [spark]

2023-11-29 Thread via GitHub
LuciferYang commented on PR #42356: URL: https://github.com/apache/spark/pull/42356#issuecomment-1831418085 > > We should only remove an API if there is clear evidence showing that no one is using it. > > Thanks, I will open a discussion in the mailing list after 3.5.0 released.

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2023-11-29 Thread via GitHub
beliefer commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1408964711 ## core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala: ## @@ -1671,37 +1671,6 @@ class TaskSchedulerImplSuite extends SparkFunSuite with Loca

[PR] [DO-NOT-REVIEW][DRAFT] Spark 45637 multiple state test [spark]

2023-11-29 Thread via GitHub
WweiL opened a new pull request, #44076: URL: https://github.com/apache/spark/pull/44076 ### What changes were proposed in this pull request? logs: ``` -=-=-=-=-=-=-= currentBatchId = 0, lastExecutionRequiresAnotherBatch = false, isNewDataAvailable = true, should

Re: [PR] [DO-NOT-REVIEW][DRAFT] Spark 45637 multiple state test [spark]

2023-11-29 Thread via GitHub
WweiL commented on code in PR #44076: URL: https://github.com/apache/spark/pull/44076#discussion_r1408970548 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/MultiStatefulOperatorsSuite.scala: ## @@ -878,6 +878,147 @@ class MultiStatefulOperatorsSuite testOutputWat

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2023-11-29 Thread via GitHub
beliefer commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1408972767 ## core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala: ## @@ -296,18 +296,31 @@ private[spark] class TaskSchedulerImpl( new TaskSetManager(thi

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2023-11-29 Thread via GitHub
Ngone51 commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1408984350 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -1894,24 +1894,8 @@ private[spark] class DAGScheduler( job.numFinished +=

Re: [PR] [SPARK-42492][SQL] Add new function filter_value [spark]

2023-11-29 Thread via GitHub
beliefer commented on PR #40085: URL: https://github.com/apache/spark/pull/40085#issuecomment-1831515735 I got it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubs

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2023-11-29 Thread via GitHub
Ngone51 commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1408994168 ## core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala: ## @@ -296,18 +296,31 @@ private[spark] class TaskSchedulerImpl( new TaskSetManager(this

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2023-11-29 Thread via GitHub
beliefer commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1409052716 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -1894,24 +1894,8 @@ private[spark] class DAGScheduler( job.numFinished +

Re: [PR] [SPARK-46145][SQL] spark.catalog.listTables does not throw exception when the table or view is not found [spark]

2023-11-29 Thread via GitHub
beliefer commented on code in PR #44061: URL: https://github.com/apache/spark/pull/44061#discussion_r1409060815 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -146,38 +147,44 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {
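The PR title describes `spark.catalog.listTables` no longer throwing when a table or view is not found. The sketch below illustrates that idea, under one assumption: the failure comes from a table being dropped between listing its name and looking up its metadata. Each lookup is attempted, and entries that have vanished are skipped instead of aborting the whole listing. `ToyCatalog`, `TableInfo` and `NoSuchTableError` are hypothetical stand-ins. They are not Spark's `CatalogImpl` API, and this is not the PR's actual code.

```scala
import scala.collection.concurrent.TrieMap

object TolerantListing {
  // Hypothetical stand-ins for catalog types; these are not Spark classes.
  final case class TableInfo(name: String, isView: Boolean)
  final class NoSuchTableError(name: String) extends RuntimeException(s"Table not found: $name")

  final class ToyCatalog {
    private val tables = TrieMap.empty[String, TableInfo]
    def create(info: TableInfo): Unit = { tables.put(info.name, info); () }
    def drop(name: String): Unit = { tables.remove(name); () }
    def listNames(): Seq[String] = tables.keys.toSeq.sorted
    def getTable(name: String): TableInfo =
      tables.getOrElse(name, throw new NoSuchTableError(name))
  }

  /** Looks up metadata for each listed name, skipping names that vanished in between. */
  def listTables(catalog: ToyCatalog, names: Seq[String]): Seq[TableInfo] =
    names.flatMap { name =>
      try Some(catalog.getTable(name))
      catch { case _: NoSuchTableError => None } // dropped concurrently: skip instead of failing
    }

  def main(args: Array[String]): Unit = {
    val catalog = new ToyCatalog
    Seq("a", "b", "c").foreach(n => catalog.create(TableInfo(n, isView = false)))
    val names = catalog.listNames() // snapshot of the table names
    catalog.drop("b")               // simulate a concurrent DROP TABLE
    println(listTables(catalog, names).map(_.name).mkString(", ")) // a, c
  }
}
```

The design choice here is to treat "not found" during the per-table lookup as a benign race rather than an error; any other exception still propagates.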

Re: [PR] [SPARK-46139][SQL][TESTS] Fix `QueryExecutionErrorsSuite` with Java 21 [spark]

2023-11-29 Thread via GitHub
nija-at commented on PR #44056: URL: https://github.com/apache/spark/pull/44056#issuecomment-1831623447 Amazing. Thank you!

Re: [PR] [SPARK-46092][SQL] Don't push down Parquet row group filters that overflow [spark]

2023-11-29 Thread via GitHub
johanl-db commented on PR #44006: URL: https://github.com/apache/spark/pull/44006#issuecomment-1831624859 > It's unfortunate that the check for Spark type versus Parquet type happens in `ParquetVectorUpdaterFactory` which is after predicate pushdown for row groups. Will similar issue happen

[PR] [SPARK-46171][SQL][PYSPARK][R] Remove `HiveContext` from Apache Spark 4.0 [spark]

2023-11-29 Thread via GitHub
LuciferYang opened a new pull request, #44077: URL: https://github.com/apache/spark/pull/44077 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[PR] [WIP][DOCS] Describe arguments of `encode()` [spark]

2023-11-29 Thread via GitHub
MaxGekk opened a new pull request, #44078: URL: https://github.com/apache/spark/pull/44078 ### What changes were proposed in this pull request? In the PR, I propose to update the description of the `Encode` expression and apparently the `encode()` function by describing the arguments `str

Re: [PR] [SPARK-46171][SQL][PYTHON][R][DOCS] Remove `HiveContext` from Apache Spark 4.0 [spark]

2023-11-29 Thread via GitHub
LuciferYang commented on PR #44077: URL: https://github.com/apache/spark/pull/44077#issuecomment-1831679067 Already sent an email to the dev mailing list for discussion: https://lists.apache.org/thread/qfznmh1dvjf9r3qn2qc8zkryk3x1t05w

Re: [PR] [SPARK-23015][WINDOWS] Mitigate bug in Windows where starting multiple Spark instances within the same second causes a failure [spark]

2023-11-29 Thread via GitHub
LuciferYang commented on PR #43706: URL: https://github.com/apache/spark/pull/43706#issuecomment-1831708308 friendly ping @panbingkun , if convenient, such as through offline communication, please help to verify this patch on Windows. Thanks ~
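The failure named in SPARK-23015's title (instances started within the same second colliding) is consistent with a temp-file name derived from a "random" value seeded by the current time in whole seconds. This sketch assumes that cause; it is not taken from the PR, whose actual fix lives in the Windows launcher scripts. It shows why two such names collide within one second, and how a per-invocation UUID avoids that. The `launcher-output-` prefix and both helper names are hypothetical.

```scala
import java.util.UUID
import scala.util.Random

object TempNameDemo {
  // Hypothetical: a name built from a generator seeded with the current epoch second,
  // so every process started in the same second computes the same "random" suffix.
  def secondSeededName(epochSecond: Long): String =
    s"launcher-output-${new Random(epochSecond).nextInt(32768)}.txt"

  // Collision-resistant alternative: a fresh random UUID per invocation.
  def uuidName(): String = s"launcher-output-${UUID.randomUUID()}.txt"

  def main(args: Array[String]): Unit = {
    val now = System.currentTimeMillis() / 1000
    println(secondSeededName(now) == secondSeededName(now)) // true: same second, same name
    println(uuidName() == uuidName())                       // false with overwhelming probability
  }
}
```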

Re: [PR] [SPARK-46036][SQL] Removing error-class from raise_error function [spark]

2023-11-29 Thread via GitHub
cloud-fan commented on PR #44004: URL: https://github.com/apache/spark/pull/44004#issuecomment-1831748410 thanks, merging to master!

Re: [PR] [SPARK-46036][SQL] Removing error-class from raise_error function [spark]

2023-11-29 Thread via GitHub
cloud-fan closed pull request #44004: [SPARK-46036][SQL] Removing error-class from raise_error function URL: https://github.com/apache/spark/pull/44004

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2023-11-29 Thread via GitHub
cloud-fan commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1409183926 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -2603,4 +2603,13 @@ package object config { .stringConf .toSequence

Re: [PR] [SPARK-44685][SQL] Remove deprecated Catalog#createExternalTable [spark]

2023-11-29 Thread via GitHub
Hisoka-X commented on PR #42356: URL: https://github.com/apache/spark/pull/42356#issuecomment-1831790628 > > > We should only remove an API if there is clear evidence showing that no one is using it. > > > > > > Thanks, I will open a discussion in the mailing list after 3.5.0 rel

Re: [PR] [SPARK-46172][SQL][DOCS] Describe arguments of `encode()` [spark]

2023-11-29 Thread via GitHub
beliefer commented on code in PR #44078: URL: https://github.com/apache/spark/pull/44078#discussion_r1409260683 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringExpressions.scala: ## @@ -2670,13 +2670,16 @@ case class StringDecode(bin: Expression, c

Re: [PR] [SPARK-46055][SQL][FOLLOWUP] Respect code style for scala and reduce the stack depth. [spark]

2023-11-29 Thread via GitHub
beliefer commented on PR #44079: URL: https://github.com/apache/spark/pull/44079#issuecomment-1831872346 ping @heyihong cc @cloud-fan

Re: [PR] [SPARK-44685][SQL] Remove deprecated Catalog#createExternalTable [spark]

2023-11-29 Thread via GitHub
LuciferYang commented on PR #42356: URL: https://github.com/apache/spark/pull/42356#issuecomment-1831874765 > > > > We should only remove an API if there is clear evidence showing that no one is using it. > > > > > > > > > Thanks, I will open a discussion in the mailing list after

Re: [PR] [SPARK-46043][SQL] Support create table using DSv2 sources [spark]

2023-11-29 Thread via GitHub
beliefer commented on code in PR #43949: URL: https://github.com/apache/spark/pull/43949#discussion_r1409265752 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/SimpleTableProvider.scala: ## @@ -31,6 +31,7 @@ trait SimpleTableProvider extends TableProvider

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2023-11-29 Thread via GitHub
cloud-fan commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1409276985 ## core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala: ## @@ -1004,6 +1004,17 @@ private[spark] class TaskSetManager( maybeFinishTaskSet() }

Re: [PR] [SPARK-46055][SQL][FOLLOWUP] Respect code style for scala and reduce the stack depth. [spark]

2023-11-29 Thread via GitHub
cloud-fan commented on code in PR #44079: URL: https://github.com/apache/spark/pull/44079#discussion_r1409292351 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -503,7 +503,7 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2023-11-29 Thread via GitHub
Ngone51 commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1409325407 ## core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala: ## @@ -1004,6 +1004,17 @@ private[spark] class TaskSetManager( maybeFinishTaskSet() }

Re: [PR] [SPARK-46137] update janino to version 3.1.11 [spark]

2023-11-29 Thread via GitHub
igreenfield commented on PR #44053: URL: https://github.com/apache/spark/pull/44053#issuecomment-1832046214 @LuciferYang What part of performance results of tpcds is needed?

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2023-11-29 Thread via GitHub
mridulm commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1409479495 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -1894,24 +1894,8 @@ private[spark] class DAGScheduler( job.numFinished +=

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2023-11-29 Thread via GitHub
mridulm commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1409488053 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -1871,21 +1871,6 @@ private[spark] class DAGScheduler( markStageAsFinis

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2023-11-29 Thread via GitHub
mridulm commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1409491574 ## core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala: ## @@ -2860,6 +2844,11 @@ private[spark] class DAGScheduler( if (runningStages.contains

Re: [PR] [SPARK-46040][SQL][Python] Update UDTF API for 'analyze' partitioning/ordering columns to support general expressions [spark]

2023-11-29 Thread via GitHub
dtenedor commented on code in PR #43946: URL: https://github.com/apache/spark/pull/43946#discussion_r1409512037 ## sql/core/src/main/scala/org/apache/spark/sql/UDTFRegistration.scala: ## @@ -44,6 +45,7 @@ class UDTFRegistration private[sql] (tableFunctionRegistry: TableFunction

Re: [PR] [SPARK-45746][Python] Return specific error messages if UDTF 'analyze' or 'eval' method accepts or returns wrong values [spark]

2023-11-29 Thread via GitHub
dtenedor commented on code in PR #43611: URL: https://github.com/apache/spark/pull/43611#discussion_r1409526093 ## python/pyspark/sql/worker/analyze_udtf.py: ## @@ -116,12 +118,89 @@ def main(infile: IO, outfile: IO) -> None: handler = read_udtf(infile) args, k

Re: [PR] [SPARK-46141][SQL] Change default for spark.sql.legacy.ctePrecedencePolicy to CORRECTED [spark]

2023-11-29 Thread via GitHub
cloud-fan commented on PR #44058: URL: https://github.com/apache/spark/pull/44058#issuecomment-1832223039 thanks, merging to master!

Re: [PR] [WIP][Spark 38473] use error classes in org.apache.spark.scheduler [spark]

2023-11-29 Thread via GitHub
asmitalim closed pull request #43941: [WIP][Spark 38473] use error classes in org.apache.spark.scheduler URL: https://github.com/apache/spark/pull/43941

Re: [PR] [SPARK-46141][SQL] Change default for spark.sql.legacy.ctePrecedencePolicy to CORRECTED [spark]

2023-11-29 Thread via GitHub
cloud-fan closed pull request #44058: [SPARK-46141][SQL] Change default for spark.sql.legacy.ctePrecedencePolicy to CORRECTED URL: https://github.com/apache/spark/pull/44058

Re: [PR] [SPARK-46171][SQL][PYTHON][R][DOCS] Remove `HiveContext` from Apache Spark 4.0 [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun commented on code in PR #44077: URL: https://github.com/apache/spark/pull/44077#discussion_r1409558060 ## python/pyspark/sql/context.py: ## @@ -311,6 +312,24 @@ def registerJavaFunction( ) return self.sparkSession.udf.registerJavaFunction(name, ja

Re: [PR] [SPARK-46171][SQL][PYTHON][R][DOCS] Remove `HiveContext` from Apache Spark 4.0 [spark]

2023-11-29 Thread via GitHub
LuciferYang commented on code in PR #44077: URL: https://github.com/apache/spark/pull/44077#discussion_r1409563409 ## python/pyspark/sql/context.py: ## @@ -311,6 +312,24 @@ def registerJavaFunction( ) return self.sparkSession.udf.registerJavaFunction(name, java

Re: [PR] [SPARK-46171][SQL][PYTHON][R][DOCS] Remove `HiveContext` from Apache Spark 4.0 [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun commented on PR #44077: URL: https://github.com/apache/spark/pull/44077#issuecomment-1832316270 I agree with you, @LuciferYang . Thank you for the heads-up email. Since we need an official vote for this specific API, I replied to your email thread. Let's see the community at

Re: [PR] [SPARK-46171][SQL][PYTHON][R][DOCS] Remove `HiveContext` from Apache Spark 4.0 [spark]

2023-11-29 Thread via GitHub
LuciferYang commented on code in PR #44077: URL: https://github.com/apache/spark/pull/44077#discussion_r1409591501 ## python/pyspark/sql/context.py: ## @@ -311,6 +312,24 @@ def registerJavaFunction( ) return self.sparkSession.udf.registerJavaFunction(name, java

Re: [PR] [SPARK-46170][SQL] Support inject adaptive query post planner strategy rules in SparkSessionExtensions [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun commented on code in PR #44074: URL: https://github.com/apache/spark/pull/44074#discussion_r1409595327 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala: ## @@ -114,12 +115,15 @@ class SparkSessionExtensions { type ColumnarRuleBuilder

Re: [PR] [SPARK-46170][SQL] Support inject adaptive query post planner strategy rules in SparkSessionExtensions [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun commented on code in PR #44074: URL: https://github.com/apache/spark/pull/44074#discussion_r1409597388 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala: ## @@ -114,12 +115,15 @@ class SparkSessionExtensions { type ColumnarRuleBuilder

Re: [PR] [SPARK-46170][SQL] Support inject adaptive query post planner strategy rules in SparkSessionExtensions [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun commented on code in PR #44074: URL: https://github.com/apache/spark/pull/44074#discussion_r1409598151 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala: ## @@ -149,6 +153,14 @@ class SparkSessionExtensions { queryStageOptimizerRuleB

Re: [PR] [SPARK-46170][SQL] Support inject adaptive query post planner strategy rules in SparkSessionExtensions [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun commented on code in PR #44074: URL: https://github.com/apache/spark/pull/44074#discussion_r1409598886 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala: ## @@ -185,6 +197,15 @@ class SparkSessionExtensions { queryStageOptimizerRuleB

Re: [PR] [SPARK-46155][INFRA] Upgrade github-script action to v7 [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun closed pull request #44071: [SPARK-46155][INFRA] Upgrade github-script action to v7 URL: https://github.com/apache/spark/pull/44071

Re: [PR] [SPARK-46155][INFRA] Upgrade github-script action to v7 [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun commented on PR #44071: URL: https://github.com/apache/spark/pull/44071#issuecomment-1832340675 Merged to master. Thank you, @panbingkun and @LuciferYang .

Re: [PR] [SPARK-46152][SQL] XML: Add DecimalType support in XML schema inference [spark]

2023-11-29 Thread via GitHub
shujingyang-db commented on code in PR #44069: URL: https://github.com/apache/spark/pull/44069#discussion_r1409666292 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/xml/XmlInferSchema.scala: ## @@ -39,6 +40,8 @@ class XmlInferSchema(options: XmlOptions, caseSensiti

[PR] [SPARK-46174][BUILD] Upgrade `gcs-connector` to 2.2.18 [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun opened a new pull request, #44081: URL: https://github.com/apache/spark/pull/44081 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-46171][SQL][PYTHON][R][DOCS] Remove `HiveContext` from Apache Spark 4.0 [spark]

2023-11-29 Thread via GitHub
LuciferYang closed pull request #44077: [SPARK-46171][SQL][PYTHON][R][DOCS] Remove `HiveContext` from Apache Spark 4.0 URL: https://github.com/apache/spark/pull/44077

Re: [PR] [SPARK-46171][SQL][PYTHON][R][DOCS] Remove `HiveContext` from Apache Spark 4.0 [spark]

2023-11-29 Thread via GitHub
LuciferYang commented on PR #44077: URL: https://github.com/apache/spark/pull/44077#issuecomment-1832467193 After carefully reading https://lists.apache.org/thread/mrx0y078cf3ozs7czykvv864y6dr55xq, I have decided to abandon the deletion of HiveContext. As @gatorsmile said, its maintenance

Re: [PR] [SPARK-46040][SQL][Python] Update UDTF API for 'analyze' partitioning/ordering columns to support general expressions [spark]

2023-11-29 Thread via GitHub
ueshin commented on code in PR #43946: URL: https://github.com/apache/spark/pull/43946#discussion_r1409698731 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/UserDefinedPythonFunction.scala: ## @@ -106,7 +106,7 @@ case class UserDefinedPythonTableFunction(

Re: [PR] [SPARK-46124][CORE][SQL][SS][CONNECT][DSTREAM][MLLIB][ML][PYTHON][R][AVRO][K8S][YARN][UI] Replace explicit `ArrayOps#toSeq` with `s.c.immutable.ArraySeq.unsafeWrapArray` [spark]

2023-11-29 Thread via GitHub
LuciferYang commented on PR #44041: URL: https://github.com/apache/spark/pull/44041#issuecomment-1832475933 @srowen this PR is ready, should we push it forward?

Re: [PR] [SPARK-46171][SQL][PYTHON][R][DOCS] Remove `HiveContext` from Apache Spark 4.0 [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun commented on PR #44077: URL: https://github.com/apache/spark/pull/44077#issuecomment-1832478448 Thank you for your decision, @LuciferYang .

Re: [PR] [SPARK-45746][Python] Return specific error messages if UDTF 'analyze' or 'eval' method accepts or returns wrong values [spark]

2023-11-29 Thread via GitHub
ueshin closed pull request #43611: [SPARK-45746][Python] Return specific error messages if UDTF 'analyze' or 'eval' method accepts or returns wrong values URL: https://github.com/apache/spark/pull/43611

Re: [PR] [SPARK-45746][Python] Return specific error messages if UDTF 'analyze' or 'eval' method accepts or returns wrong values [spark]

2023-11-29 Thread via GitHub
ueshin commented on PR #43611: URL: https://github.com/apache/spark/pull/43611#issuecomment-1832510327 The failed tests seem not related to this PR.

Re: [PR] [SPARK-45746][Python] Return specific error messages if UDTF 'analyze' or 'eval' method accepts or returns wrong values [spark]

2023-11-29 Thread via GitHub
ueshin commented on PR #43611: URL: https://github.com/apache/spark/pull/43611#issuecomment-1832510620 Thanks! merging to master.

Re: [PR] [SPARK-45629][CORE][SQL][CONNECT][ML][STREAMING][BUILD][EXAMPLES]Fix `Implicit definition should have explicit type` [spark]

2023-11-29 Thread via GitHub
LuciferYang commented on PR #43526: URL: https://github.com/apache/spark/pull/43526#issuecomment-1832530609 @laglangyue Could you modify the description of this PR again? Need to focus on the description of the `Why are the changes needed?` part. In addition to fixing a compilation w
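The warning quoted in SPARK-45629's title, `Implicit definition should have explicit type`, is reported by newer Scala 2.13 compilers for implicit definitions whose type is left to inference. The sketch below shows the general fix pattern: state the implicit's type explicitly. `Point` and `sortPoints` are hypothetical and are not taken from the Spark code base.

```scala
object ImplicitTypeDemo {
  // Hypothetical example type, not from the Spark code base.
  final case class Point(x: Int, y: Int)

  // Before (triggers the warning): the implicit's type is inferred.
  //   implicit val pointOrdering = Ordering.by[Point, Int](_.x)

  // After: the implicit's type is declared explicitly.
  implicit val pointOrdering: Ordering[Point] = Ordering.by[Point, Int](_.x)

  // `sorted` resolves the implicit Ordering[Point] declared above.
  def sortPoints(points: Seq[Point]): Seq[Point] = points.sorted

  def main(args: Array[String]): Unit =
    println(sortPoints(Seq(Point(3, 0), Point(1, 5), Point(2, 2))).map(_.x).mkString(", ")) // 1, 2, 3
}
```

Declaring the type keeps the implicit's contract stable even if the right-hand side changes, which is the main motivation usually given for this lint.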

Re: [PR] [SPARK-46124][CORE][SQL][SS][CONNECT][DSTREAM][MLLIB][ML][PYTHON][R][AVRO][K8S][YARN][UI] Replace explicit `ArrayOps#toSeq` with `s.c.immutable.ArraySeq.unsafeWrapArray` [spark]

2023-11-29 Thread via GitHub
srowen closed pull request #44041: [SPARK-46124][CORE][SQL][SS][CONNECT][DSTREAM][MLLIB][ML][PYTHON][R][AVRO][K8S][YARN][UI] Replace explicit `ArrayOps#toSeq` with `s.c.immutable.ArraySeq.unsafeWrapArray` URL: https://github.com/apache/spark/pull/44041

Re: [PR] [SPARK-46124][CORE][SQL][SS][CONNECT][DSTREAM][MLLIB][ML][PYTHON][R][AVRO][K8S][YARN][UI] Replace explicit `ArrayOps#toSeq` with `s.c.immutable.ArraySeq.unsafeWrapArray` [spark]

2023-11-29 Thread via GitHub
srowen commented on PR #44041: URL: https://github.com/apache/spark/pull/44041#issuecomment-1832536908 Merged to master

Re: [PR] [SPARK-46124][CORE][SQL][SS][CONNECT][DSTREAM][MLLIB][ML][PYTHON][R][AVRO][K8S][YARN][UI] Replace explicit `ArrayOps#toSeq` with `s.c.immutable.ArraySeq.unsafeWrapArray` [spark]

2023-11-29 Thread via GitHub
LuciferYang commented on PR #44041: URL: https://github.com/apache/spark/pull/44041#issuecomment-1832539867 Thanks @srowen
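For context on SPARK-46124, this sketch shows the behavioral difference in Scala 2.13. `ArrayOps#toSeq` copies the array into a new immutable sequence, while `immutable.ArraySeq.unsafeWrapArray` wraps the existing array without copying, which avoids the copy but is only safe when nothing mutates the array afterwards. The object and method names are illustrative and are not from the PR.

```scala
import scala.collection.immutable.ArraySeq

object ArraySeqDemo {
  /** Returns the first element seen through each view after mutating the source array. */
  def firstAfterMutation(): (Int, Int) = {
    val arr = Array(1, 2, 3)
    val copied: Seq[Int] = arr.toSeq                      // Scala 2.13: copies the elements
    val wrapped: Seq[Int] = ArraySeq.unsafeWrapArray(arr) // shares the array, no copy
    arr(0) = 42                                           // mutate the original array
    (copied.head, wrapped.head)
  }

  def main(args: Array[String]): Unit = {
    val (c, w) = firstAfterMutation()
    println(s"copied=$c wrapped=$w") // copied=1 wrapped=42
  }
}
```

The wrapped view reflects the later write because it shares storage with `arr`, which is why the replacement is appropriate only where the array is not modified after wrapping.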

Re: [PR] [SPARK-46170][SQL] Support inject adaptive query post planner strategy rules in SparkSessionExtensions [spark]

2023-11-29 Thread via GitHub
LuciferYang commented on code in PR #44074: URL: https://github.com/apache/spark/pull/44074#discussion_r1409761044 ## sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala: ## @@ -516,6 +518,33 @@ class SparkSessionExtensionSuite extends SparkFunSuite wi

Re: [PR] [SPARK-46170][SQL] Support inject adaptive query post planner strategy rules in SparkSessionExtensions [spark]

2023-11-29 Thread via GitHub
LuciferYang commented on code in PR #44074: URL: https://github.com/apache/spark/pull/44074#discussion_r1409762593 ## sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala: ## @@ -516,6 +518,33 @@ class SparkSessionExtensionSuite extends SparkFunSuite wi

[PR] [SPARK-45962][SQL] Remove treatEmptyValuesAsNulls and use nullValue option instead in XML [spark]

2023-11-29 Thread via GitHub
shujingyang-db opened a new pull request, #44082: URL: https://github.com/apache/spark/pull/44082 ### What changes were proposed in this pull request? This is a follow-up PR on [43852](https://github.com/apache/spark/pull/43852). It resolves the issue when handling whitespace

Re: [PR] [SPARK-46170][SQL] Support inject adaptive query post planner strategy rules in SparkSessionExtensions [spark]

2023-11-29 Thread via GitHub
LuciferYang commented on code in PR #44074: URL: https://github.com/apache/spark/pull/44074#discussion_r1409772748 ## sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala: ## @@ -516,6 +518,33 @@ class SparkSessionExtensionSuite extends SparkFunSuite wi

Re: [PR] [SPARK-46170][SQL] Support inject adaptive query post planner strategy rules in SparkSessionExtensions [spark]

2023-11-29 Thread via GitHub
LuciferYang commented on code in PR #44074: URL: https://github.com/apache/spark/pull/44074#discussion_r1409781928 ## sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala: ## @@ -516,6 +518,33 @@ class SparkSessionExtensionSuite extends SparkFunSuite wi

Re: [PR] [SPARK-46170][SQL] Support inject adaptive query post planner strategy rules in SparkSessionExtensions [spark]

2023-11-29 Thread via GitHub
LuciferYang commented on code in PR #44074: URL: https://github.com/apache/spark/pull/44074#discussion_r1409794398 ## sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala: ## @@ -516,6 +518,33 @@ class SparkSessionExtensionSuite extends SparkFunSuite wi

Re: [PR] [SPARK-46075][CONNECT] Improvements to SparkConnectSessionManager [spark]

2023-11-29 Thread via GitHub
juliuszsompolski commented on code in PR #43985: URL: https://github.com/apache/spark/pull/43985#discussion_r1409797975 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala: ## @@ -186,7 +196,13 @@ case class SessionHolder(userId: S

Re: [PR] [SPARK-46075][CONNECT] Improvements to SparkConnectSessionManager [spark]

2023-11-29 Thread via GitHub
juliuszsompolski commented on code in PR #43985: URL: https://github.com/apache/spark/pull/43985#discussion_r1409807117 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala: ## @@ -205,14 +220,26 @@ case class SessionHolder(userId:

Re: [PR] [SPARK-46075][CONNECT] Improvements to SparkConnectSessionManager [spark]

2023-11-29 Thread via GitHub
juliuszsompolski commented on code in PR #43985: URL: https://github.com/apache/spark/pull/43985#discussion_r1409826101 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectSessionManager.scala: ## @@ -95,47 +92,134 @@ class SparkConnectSes

Re: [PR] [SPARK-46174][BUILD] Upgrade `gcs-connector` to 2.2.18 [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun commented on PR #44081: URL: https://github.com/apache/spark/pull/44081#issuecomment-1832661787 Thank you, @LuciferYang . Merged to master.

Re: [PR] [SPARK-46174][BUILD] Upgrade `gcs-connector` to 2.2.18 [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun closed pull request #44081: [SPARK-46174][BUILD] Upgrade `gcs-connector` to 2.2.18 URL: https://github.com/apache/spark/pull/44081

Re: [PR] [SPARK-46172][SQL][DOCS] Describe arguments of `encode()` [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun closed pull request #44078: [SPARK-46172][SQL][DOCS] Describe arguments of `encode()` URL: https://github.com/apache/spark/pull/44078

Re: [PR] [SPARK-46169][PS] Assign appropriate JIRA numbers for missing parameters from `DataFrame` API. [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun closed pull request #44073: [SPARK-46169][PS] Assign appropriate JIRA numbers for missing parameters from `DataFrame` API. URL: https://github.com/apache/spark/pull/44073

[PR] created rough service [spark]

2023-11-29 Thread via GitHub
thesauravshukla opened a new pull request, #44083: URL: https://github.com/apache/spark/pull/44083 All the Pod events are stored in a queue. The sslContext creation and error handling upon closing need to be modified. Tests also need to be looked at.

Re: [PR] [SPARK-45267][PS][FOLLOWUP] Remove duplicated test [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun commented on PR #44072: URL: https://github.com/apache/spark/pull/44072#issuecomment-1832685466 +1, LGTM because `FrameComputeMixin` has it. Merged to master.

Re: [PR] [SPARK-45267][PS][FOLLOWUP] Remove duplicated test [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun closed pull request #44072: [SPARK-45267][PS][FOLLOWUP] Remove duplicated test URL: https://github.com/apache/spark/pull/44072

Re: [PR] [SPARK-45861][PYTHON][DOCS] Add user guide for dataframe creation [spark]

2023-11-29 Thread via GitHub
allanf-db commented on code in PR #43897: URL: https://github.com/apache/spark/pull/43897#discussion_r1409861283 ## python/docs/source/user_guide/sql/index.rst: ## @@ -16,13 +16,14 @@ under the License. -= -Spark SQL -= += +DataFrame and

Re: [PR] [SPARK-46145][SQL] spark.catalog.listTables does not throw exception when the table or view is not found [spark]

2023-11-29 Thread via GitHub
amaliujia commented on code in PR #44061: URL: https://github.com/apache/spark/pull/44061#discussion_r1409863040 ## sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala: ## @@ -146,38 +147,44 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog {

Re: [PR] [SPARK-45861][PYTHON][DOCS] Add user guide for dataframe creation [spark]

2023-11-29 Thread via GitHub
allanf-db commented on code in PR #43897: URL: https://github.com/apache/spark/pull/43897#discussion_r1409864256 ## python/docs/source/user_guide/sql/dataframe_creation.rst: ## @@ -0,0 +1,239 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more contrib

Re: [PR] [SPARK-46075][CONNECT] Improvements to SparkConnectSessionManager [spark]

2023-11-29 Thread via GitHub
juliuszsompolski commented on code in PR #43985: URL: https://github.com/apache/spark/pull/43985#discussion_r1409864334 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectSessionManager.scala: ## @@ -95,47 +92,134 @@ class SparkConnectSes

Re: [PR] [SPARK-45861][PYTHON][DOCS] Add user guide for dataframe creation [spark]

2023-11-29 Thread via GitHub
allisonwang-db commented on code in PR #43897: URL: https://github.com/apache/spark/pull/43897#discussion_r1409849131 ## python/docs/source/user_guide/sql/dataframe_creation.rst: ## @@ -0,0 +1,239 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +or more co

Re: [PR] [SPARK-46075][CONNECT] Improvements to SparkConnectSessionManager [spark]

2023-11-29 Thread via GitHub
juliuszsompolski commented on code in PR #43985: URL: https://github.com/apache/spark/pull/43985#discussion_r1409865055 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SessionHolder.scala: ## @@ -205,14 +220,26 @@ case class SessionHolder(userId:

Re: [PR] [SPARK-46108][SQL] keepInnerXmlAsRaw option for Built-in XML Data Source [spark]

2023-11-29 Thread via GitHub
shujingyang-db commented on PR #44022: URL: https://github.com/apache/spark/pull/44022#issuecomment-1832720439 @ufuksungu Thanks for working on this! I'd like to discuss the proposed use case for this feature. We might have a more straight-forward alternative. For instance, in the `Person`

Re: [PR] [SPARK-46075][CONNECT] Improvements to SparkConnectSessionManager [spark]

2023-11-29 Thread via GitHub
rangadi commented on code in PR #43985: URL: https://github.com/apache/spark/pull/43985#discussion_r1409890157 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/service/SparkConnectSessionManager.scala: ## @@ -95,47 +92,134 @@ class SparkConnectSessionManag

Re: [PR] created rough service [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun commented on PR #44083: URL: https://github.com/apache/spark/pull/44083#issuecomment-1832742928 Please follow the community guideline by filing a proper JIRA and fill the PR template, @thesauravshukla .

Re: [PR] created rough service [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun closed pull request #44083: created rough service URL: https://github.com/apache/spark/pull/44083

Re: [PR] [SPARK-46133][CORE][DOCS] Make `ShuffleWriteProcessor.write` method description up-to-date [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun closed pull request #44054: [SPARK-46133][CORE][DOCS] Make `ShuffleWriteProcessor.write` method description up-to-date URL: https://github.com/apache/spark/pull/44054

Re: [PR] [SPARK-46133][CORE][DOCS] Make `ShuffleWriteProcessor.write` method description up-to-date [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun commented on PR #44054: URL: https://github.com/apache/spark/pull/44054#issuecomment-1832746332 Thank you, @zwangsheng and @yaooqinn . Merged to master.

Re: [PR] [SPARK-46060][BUILD][TESTS] Upgrade `MySQL/MariaDB/PostgreSQL/DB2` test dependencies [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun closed pull request #43963: [SPARK-46060][BUILD][TESTS] Upgrade `MySQL/MariaDB/PostgreSQL/DB2` test dependencies URL: https://github.com/apache/spark/pull/43963

[PR] Add CrossDbmsQueryTestSuites, which allows generating golden files with Postgres/other DBMS [spark]

2023-11-29 Thread via GitHub
andylam-db opened a new pull request, #44084: URL: https://github.com/apache/spark/pull/44084 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? No. ### H

Re: [PR] [SPARK-45959][SQL] Improving performance when addition of 1 column at a time causes increase in the LogicalPlan tree depth [spark]

2023-11-29 Thread via GitHub
ahshahid commented on PR #43854: URL: https://github.com/apache/spark/pull/43854#issuecomment-1832769219 @JoshRosen: Sorry, I did not notice you left comments. For some reason I do not receive any emails on any PR review comments. Yes I completely agree that these huge tree plans caus

Re: [PR] [SPARK-23607][CORE] Use HDFS extended attributes to store application summary information in SHS [spark]

2023-11-29 Thread via GitHub
dongjoon-hyun commented on PR #43939: URL: https://github.com/apache/spark/pull/43939#issuecomment-1832773612 cc @mridulm because he was the original reviewer from #40949

Re: [PR] [SPARK-45959][SQL] Improving performance when addition of 1 column at a time causes increase in the LogicalPlan tree depth [spark]

2023-11-29 Thread via GitHub
ahshahid commented on PR #43854: URL: https://github.com/apache/spark/pull/43854#issuecomment-1832781990 @JoshRosen So the behaviour of caching per se does not change. It is the retrieval and massaging of the cached plan before returning to the lookup api which changes.

Re: [PR] [SPARK-46108][SQL] keepInnerXmlAsRaw option for Built-in XML Data Source [spark]

2023-11-29 Thread via GitHub
ufuksungu commented on PR #44022: URL: https://github.com/apache/spark/pull/44022#issuecomment-1832801629 @shujingyang-db Hey! Thanks for the feedback. I might have misunderstood your scenario, so please correct me if I am wrong. keepInnerXmlAsRaw allows the schema of the 'person' field
