[GitHub] [spark] LuciferYang opened a new pull request, #40737: [SPARK-43093][SQL][TESTS] Refactor `Add a directory when spark.sql.legacy.addSingleFileInAddFile set to false` to use random directories

2023-04-10 Thread via GitHub
LuciferYang opened a new pull request, #40737: URL: https://github.com/apache/spark/pull/40737 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] amaliujia commented on a diff in pull request #40693: [SPARK-43058] Move Numeric and Fractional to PhysicalDataType

2023-04-10 Thread via GitHub
amaliujia commented on code in PR #40693: URL: https://github.com/apache/spark/pull/40693#discussion_r1162331907 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -820,11 +821,13 @@ case class Divide( } private lazy val

[GitHub] [spark] amaliujia commented on a diff in pull request #40693: [SPARK-43058] Move Numeric and Fractional to PhysicalDataType

2023-04-10 Thread via GitHub
amaliujia commented on code in PR #40693: URL: https://github.com/apache/spark/pull/40693#discussion_r1162330879 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -820,11 +821,13 @@ case class Divide( } private lazy val

[GitHub] [spark] amaliujia commented on a diff in pull request #40693: [SPARK-43058] Move Numeric and Fractional to PhysicalDataType

2023-04-10 Thread via GitHub
amaliujia commented on code in PR #40693: URL: https://github.com/apache/spark/pull/40693#discussion_r1162330656 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/decimalExpressions.scala: ## @@ -273,7 +274,8 @@ case class DecimalDivideWithOverflowCheck(

[GitHub] [spark] amaliujia commented on a diff in pull request #40693: [SPARK-43058] Move Numeric and Fractional to PhysicalDataType

2023-04-10 Thread via GitHub
amaliujia commented on code in PR #40693: URL: https://github.com/apache/spark/pull/40693#discussion_r1162330498 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -820,11 +821,13 @@ case class Divide( } private lazy val

[GitHub] [spark] WweiL commented on pull request #40691: [SPARK-43031] [SS] [Connect] Enable unit test and doctest for streaming

2023-04-10 Thread via GitHub
WweiL commented on PR #40691: URL: https://github.com/apache/spark/pull/40691#issuecomment-1502715640 Hi @HyukjinKwon could you please take another look? Thanks!

[GitHub] [spark] pengzhon-db opened a new pull request, #40736: [SPARK-43084] [SS] Add applyInPandasWithState support for spark connect

2023-04-10 Thread via GitHub
pengzhon-db opened a new pull request, #40736: URL: https://github.com/apache/spark/pull/40736 ### What changes were proposed in this pull request? This change adds applyInPandasWithState support for Spark connect. Example (try with local mode `./bin/pyspark --remote
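For readers unfamiliar with the operator, the sketch below shows the existing PySpark `applyInPandasWithState` API that this PR routes through Spark Connect; the rate source, column names, and per-key count state are illustrative assumptions, not code from the PR.

```python
import pandas as pd
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.streaming.state import GroupState, GroupStateTimeout

spark = SparkSession.builder.getOrCreate()  # or a `--remote` Connect session, as in the PR

# Hypothetical streaming input: key each row into one of two groups.
events = (spark.readStream.format("rate").load()
          .withColumn("key", (F.col("value") % 2).cast("string")))

def count_per_key(key, pdf_iter, state: GroupState):
    # Carry a running count per key across micro-batches in the group state.
    running = state.get[0] if state.exists else 0
    for pdf in pdf_iter:
        running += len(pdf)
    state.update((running,))
    yield pd.DataFrame({"key": [key[0]], "count": [running]})

counts = events.groupBy("key").applyInPandasWithState(
    count_per_key,
    outputStructType="key string, count long",
    stateStructType="count long",
    outputMode="Update",
    timeoutConf=GroupStateTimeout.NoTimeout,
)
query = counts.writeStream.format("console").outputMode("update").start()
```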

[GitHub] [spark] LuciferYang opened a new pull request, #40735: [SPARK-43092][CONNECT] Clean up unimplemented `dropDuplicatesWithinWatermark` series functions from `Dataset`

2023-04-10 Thread via GitHub
LuciferYang opened a new pull request, #40735: URL: https://github.com/apache/spark/pull/40735 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ###

[GitHub] [spark] cloud-fan commented on a diff in pull request #40693: [SPARK-43058] Move Numeric and Fractional to PhysicalDataType

2023-04-10 Thread via GitHub
cloud-fan commented on code in PR #40693: URL: https://github.com/apache/spark/pull/40693#discussion_r1162318834 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/decimalExpressions.scala: ## @@ -273,7 +274,8 @@ case class DecimalDivideWithOverflowCheck(

[GitHub] [spark] cloud-fan commented on a diff in pull request #40693: [SPARK-43058] Move Numeric and Fractional to PhysicalDataType

2023-04-10 Thread via GitHub
cloud-fan commented on code in PR #40693: URL: https://github.com/apache/spark/pull/40693#discussion_r1162318609 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -820,11 +821,13 @@ case class Divide( } private lazy val

[GitHub] [spark] LuciferYang commented on pull request #40721: [SPARK-43080][BUILD] Upgrade `zstd-jni` to 1.5.5-1

2023-04-10 Thread via GitHub
LuciferYang commented on PR #40721: URL: https://github.com/apache/spark/pull/40721#issuecomment-1502708533 > New results look reasonable. I have been in a team meeting this morning. It seems that the results of `ZStandardBenchmark` are somewhat related to the CPU model. --

[GitHub] [spark] cloud-fan commented on a diff in pull request #40693: [SPARK-43058] Move Numeric and Fractional to PhysicalDataType

2023-04-10 Thread via GitHub
cloud-fan commented on code in PR #40693: URL: https://github.com/apache/spark/pull/40693#discussion_r1162310377 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -821,10 +822,11 @@ case class Divide( private lazy val div:

[GitHub] [spark] cloud-fan commented on a diff in pull request #40693: [SPARK-43058] Move Numeric and Fractional to PhysicalDataType

2023-04-10 Thread via GitHub
cloud-fan commented on code in PR #40693: URL: https://github.com/apache/spark/pull/40693#discussion_r1162309743 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -821,10 +822,11 @@ case class Divide( private lazy val div:

[GitHub] [spark] cloud-fan commented on a diff in pull request #40693: [SPARK-43058] Move Numeric and Fractional to PhysicalDataType

2023-04-10 Thread via GitHub
cloud-fan commented on code in PR #40693: URL: https://github.com/apache/spark/pull/40693#discussion_r1162309432 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala: ## @@ -821,10 +822,11 @@ case class Divide( private lazy val div:

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40677: [SPARK-43039][SQL] Support custom fields in the file source _metadata column.

2023-04-10 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40677: URL: https://github.com/apache/spark/pull/40677#discussion_r1162288743 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala: ## @@ -176,6 +186,23 @@ trait FileFormat { * By default all

[GitHub] [spark] aokolnychyi commented on pull request #40734: [SPARK-43088][SQL] Respect RequiresDistributionAndOrdering in CTAS/RTAS

2023-04-10 Thread via GitHub
aokolnychyi commented on PR #40734: URL: https://github.com/apache/spark/pull/40734#issuecomment-1502642472 @huaxingao @cloud-fan @dongjoon-hyun @sunchao @viirya @gengliangwang, could you take a look at the approach used in this PR and let me know what you think? If it seems reasonable,

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40734: [SPARK-43088][SQL] Respect RequiresDistributionAndOrdering in CTAS/RTAS

2023-04-10 Thread via GitHub
aokolnychyi commented on code in PR #40734: URL: https://github.com/apache/spark/pull/40734#discussion_r1162267054 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/V2StageTables.scala: ## @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software

[GitHub] [spark] cloud-fan commented on a diff in pull request #40707: [SPARK-43033][SQL] Avoid task retries due to AssertNotNull checks

2023-04-10 Thread via GitHub
cloud-fan commented on code in PR #40707: URL: https://github.com/apache/spark/pull/40707#discussion_r1162266420 ## core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala: ## @@ -929,6 +929,13 @@ private[spark] class TaskSetManager( info.id,

[GitHub] [spark] cloud-fan commented on a diff in pull request #40707: [SPARK-43033][SQL] Avoid task retries due to AssertNotNull checks

2023-04-10 Thread via GitHub
cloud-fan commented on code in PR #40707: URL: https://github.com/apache/spark/pull/40707#discussion_r1162266204 ## core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala: ## @@ -929,6 +929,13 @@ private[spark] class TaskSetManager( info.id,

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40734: [SPARK-43088][SQL] Respect RequiresDistributionAndOrdering in CTAS/RTAS

2023-04-10 Thread via GitHub
aokolnychyi commented on code in PR #40734: URL: https://github.com/apache/spark/pull/40734#discussion_r1162265053 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala: ## @@ -184,19 +175,23 @@ class DataSourceV2Strategy(session:

[GitHub] [spark] cloud-fan commented on a diff in pull request #40707: [SPARK-43033][SQL] Avoid task retries due to AssertNotNull checks

2023-04-10 Thread via GitHub
cloud-fan commented on code in PR #40707: URL: https://github.com/apache/spark/pull/40707#discussion_r1162264814 ## core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala: ## @@ -929,6 +929,13 @@ private[spark] class TaskSetManager( info.id,

[GitHub] [spark] aokolnychyi commented on a diff in pull request #40734: [SPARK-43088][SQL] Respect RequiresDistributionAndOrdering in CTAS/RTAS

2023-04-10 Thread via GitHub
aokolnychyi commented on code in PR #40734: URL: https://github.com/apache/spark/pull/40734#discussion_r1162264468 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala: ## @@ -99,16 +100,6 @@ class DataSourceV2Strategy(session:

[GitHub] [spark] aokolnychyi opened a new pull request, #40734: [SPARK-43088][SQL] Respect RequiresDistributionAndOrdering in CTAS/RTAS

2023-04-10 Thread via GitHub
aokolnychyi opened a new pull request, #40734: URL: https://github.com/apache/spark/pull/40734 ### What changes were proposed in this pull request? This PR moves table staging during CTAS/RTAS into the optimizer so that the `V2Writes` rule would distribute and order

[GitHub] [spark] cloud-fan commented on a diff in pull request #40707: [SPARK-43033][SQL] Avoid task retries due to AssertNotNull checks

2023-04-10 Thread via GitHub
cloud-fan commented on code in PR #40707: URL: https://github.com/apache/spark/pull/40707#discussion_r1162263526 ## core/src/main/scala/org/apache/spark/SparkException.scala: ## @@ -355,3 +355,24 @@ private[spark] class SparkSQLFeatureNotSupportedException( override def

[GitHub] [spark] HyukjinKwon closed pull request #40733: [SPARK-43089][CONNECT] Redact debug string in UI

2023-04-10 Thread via GitHub
HyukjinKwon closed pull request #40733: [SPARK-43089][CONNECT] Redact debug string in UI URL: https://github.com/apache/spark/pull/40733

[GitHub] [spark] HyukjinKwon commented on pull request #40733: [SPARK-43089][CONNECT] Redact debug string in UI

2023-04-10 Thread via GitHub
HyukjinKwon commented on PR #40733: URL: https://github.com/apache/spark/pull/40733#issuecomment-1502632324 Merged to master.

[GitHub] [spark] warrenzhu25 commented on pull request #40730: [SPARK-43086][CORE] Support bin pack task scheduling on executors

2023-04-10 Thread via GitHub
warrenzhu25 commented on PR #40730: URL: https://github.com/apache/spark/pull/40730#issuecomment-1502631790 > I understand the intention but there is a chance of instability due to `OutOfDisk` and sometimes `OutOfMemory`. In addition, bin-packed executors could work slower due to the

[GitHub] [spark] cloud-fan commented on a diff in pull request #40677: [SPARK-43039][SQL] Support custom fields in the file source _metadata column.

2023-04-10 Thread via GitHub
cloud-fan commented on code in PR #40677: URL: https://github.com/apache/spark/pull/40677#discussion_r1162257020 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala: ## @@ -176,6 +186,23 @@ trait FileFormat { * By default all field name is

[GitHub] [spark] wangyum commented on pull request #40731: [SPARK-43087][SQL] Support coalesce buckets in join in AQE

2023-04-10 Thread via GitHub
wangyum commented on PR #40731: URL: https://github.com/apache/spark/pull/40731#issuecomment-1502627560 cc @cloud-fan

[GitHub] [spark] cloud-fan commented on a diff in pull request #40677: [SPARK-43039][SQL] Support custom fields in the file source _metadata column.

2023-04-10 Thread via GitHub
cloud-fan commented on code in PR #40677: URL: https://github.com/apache/spark/pull/40677#discussion_r1162254984 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormat.scala: ## @@ -176,6 +186,23 @@ trait FileFormat { * By default all field name is

[GitHub] [spark] amaliujia commented on a diff in pull request #40693: [SPARK-43058] Move Numeric and Fractional to PhysicalDataType

2023-04-10 Thread via GitHub
amaliujia commented on code in PR #40693: URL: https://github.com/apache/spark/pull/40693#discussion_r1162254891 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -902,152 +903,191 @@ case class Cast( } // LongConverter -

[GitHub] [spark] cloud-fan commented on a diff in pull request #40677: [SPARK-43039][SQL] Support custom fields in the file source _metadata column.

2023-04-10 Thread via GitHub
cloud-fan commented on code in PR #40677: URL: https://github.com/apache/spark/pull/40677#discussion_r1162251053 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala: ## @@ -554,6 +555,31 @@ object FileSourceMetadataAttribute {

[GitHub] [spark] yaooqinn commented on pull request #40718: [SPARK-43077][SQL] Improve the error message of UNRECOGNIZED_SQL_TYPE

2023-04-10 Thread via GitHub
yaooqinn commented on PR #40718: URL: https://github.com/apache/spark/pull/40718#issuecomment-1502613838 thanks, merged to master

[GitHub] [spark] yaooqinn closed pull request #40718: [SPARK-43077][SQL] Improve the error message of UNRECOGNIZED_SQL_TYPE

2023-04-10 Thread via GitHub
yaooqinn closed pull request #40718: [SPARK-43077][SQL] Improve the error message of UNRECOGNIZED_SQL_TYPE URL: https://github.com/apache/spark/pull/40718

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40724: [SPARK-43081] [ML] [CONNECT] Add torch distributor data loader that loads data from spark partition data

2023-04-10 Thread via GitHub
WeichenXu123 commented on code in PR #40724: URL: https://github.com/apache/spark/pull/40724#discussion_r1162236929 ## python/pyspark/ml/torch/distributor.py: ## @@ -744,7 +814,99 @@ def run(self, train_object: Union[Callable, str], *args: Any) -> Optional[Any]:

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40724: [SPARK-43081] [ML] [CONNECT] Add torch distributor data loader that loads data from spark partition data

2023-04-10 Thread via GitHub
WeichenXu123 commented on code in PR #40724: URL: https://github.com/apache/spark/pull/40724#discussion_r1162236025 ## python/pyspark/ml/torch/distributor.py: ## @@ -744,7 +814,99 @@ def run(self, train_object: Union[Callable, str], *args: Any) -> Optional[Any]:

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40724: [SPARK-43081] [ML] [CONNECT] Add torch distributor data loader that loads data from spark partition data

2023-04-10 Thread via GitHub
WeichenXu123 commented on code in PR #40724: URL: https://github.com/apache/spark/pull/40724#discussion_r1162235246 ## python/pyspark/ml/tests/connect/test_parity_torch_data_loader.py: ## @@ -0,0 +1,52 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] [spark] rithwik-db commented on a diff in pull request #40724: [SPARK-43081] [ML] [CONNECT] Add torch distributor data loader that loads data from spark partition data

2023-04-10 Thread via GitHub
rithwik-db commented on code in PR #40724: URL: https://github.com/apache/spark/pull/40724#discussion_r1162233821 ## python/pyspark/ml/torch/distributor.py: ## @@ -744,7 +814,99 @@ def run(self, train_object: Union[Callable, str], *args: Any) -> Optional[Any]:

[GitHub] [spark] rithwik-db commented on a diff in pull request #40724: [SPARK-43081] [ML] [CONNECT] Add torch distributor data loader that loads data from spark partition data

2023-04-10 Thread via GitHub
rithwik-db commented on code in PR #40724: URL: https://github.com/apache/spark/pull/40724#discussion_r1162233394 ## python/pyspark/ml/torch/distributor.py: ## @@ -744,7 +814,99 @@ def run(self, train_object: Union[Callable, str], *args: Any) -> Optional[Any]:

[GitHub] [spark] rithwik-db commented on a diff in pull request #40724: [SPARK-43081] [ML] [CONNECT] Add torch distributor data loader that loads data from spark partition data

2023-04-10 Thread via GitHub
rithwik-db commented on code in PR #40724: URL: https://github.com/apache/spark/pull/40724#discussion_r1162233394 ## python/pyspark/ml/torch/distributor.py: ## @@ -744,7 +814,99 @@ def run(self, train_object: Union[Callable, str], *args: Any) -> Optional[Any]:

[GitHub] [spark] dongjoon-hyun closed pull request #40723: [SPARK-43090][CONNECT][TESTS] Move `withTable` from `RemoteSparkSession` to `SQLHelper`

2023-04-10 Thread via GitHub
dongjoon-hyun closed pull request #40723: [SPARK-43090][CONNECT][TESTS] Move `withTable` from `RemoteSparkSession` to `SQLHelper` URL: https://github.com/apache/spark/pull/40723

[GitHub] [spark] LuciferYang commented on pull request #40726: [SPARK-42382][BUILD] Upgrade `cyclonedx-maven-plugin` to 2.7.6

2023-04-10 Thread via GitHub
LuciferYang commented on PR #40726: URL: https://github.com/apache/spark/pull/40726#issuecomment-1502587494 late LGTM ~ Thanks @dongjoon-hyun and all ~

[GitHub] [spark] LuciferYang commented on pull request #40723: [SPARK-43090][CONNECT][TESTS] Move `withTable` from `RemoteSparkSession` to `SQLHelper`

2023-04-10 Thread via GitHub
LuciferYang commented on PR #40723: URL: https://github.com/apache/spark/pull/40723#issuecomment-1502586204 > Could you file a JIRA for this, @LuciferYang ? This contribution looks enough to have a JIRA issue. @dongjoon-hyun thanks for your suggestion ~ created SPARK-43090

[GitHub] [spark] rithwik-db commented on a diff in pull request #40724: [SPARK-43081] [ML] [CONNECT] Add torch distributor data loader that loads data from spark partition data

2023-04-10 Thread via GitHub
rithwik-db commented on code in PR #40724: URL: https://github.com/apache/spark/pull/40724#discussion_r1162227496 ## python/pyspark/ml/tests/connect/test_parity_torch_data_loader.py: ## @@ -0,0 +1,52 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +#

[GitHub] [spark] cloud-fan commented on a diff in pull request #40693: [SPARK-43058] Move Numeric and Fractional to PhysicalDataType

2023-04-10 Thread via GitHub
cloud-fan commented on code in PR #40693: URL: https://github.com/apache/spark/pull/40693#discussion_r1162225985 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -902,152 +903,191 @@ case class Cast( } // LongConverter -

[GitHub] [spark] cloud-fan commented on a diff in pull request #40693: [SPARK-43058] Move Numeric and Fractional to PhysicalDataType

2023-04-10 Thread via GitHub
cloud-fan commented on code in PR #40693: URL: https://github.com/apache/spark/pull/40693#discussion_r1162225985 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -902,152 +903,191 @@ case class Cast( } // LongConverter -

[GitHub] [spark] yaooqinn commented on a diff in pull request #40718: [SPARK-43077][SQL] Improve the error message of UNRECOGNIZED_SQL_TYPE

2023-04-10 Thread via GitHub
yaooqinn commented on code in PR #40718: URL: https://github.com/apache/spark/pull/40718#discussion_r1162219705 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala: ## @@ -177,68 +177,56 @@ object JdbcUtils extends Logging with

[GitHub] [spark] HyukjinKwon opened a new pull request, #40733: [SPARK-43089][CONNECT] Redact debug string in UI

2023-04-10 Thread via GitHub
HyukjinKwon opened a new pull request, #40733: URL: https://github.com/apache/spark/pull/40733 ### What changes were proposed in this pull request? This PR is a followup of https://github.com/apache/spark/pull/40603 which redacts the debug string shown in UI. ### Why are the

[GitHub] [spark] amaliujia commented on a diff in pull request #40693: [SPARK-43058] Move Numeric and Fractional to PhysicalDataType

2023-04-10 Thread via GitHub
amaliujia commented on code in PR #40693: URL: https://github.com/apache/spark/pull/40693#discussion_r1162212156 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -902,152 +903,191 @@ case class Cast( } // LongConverter -

[GitHub] [spark] amaliujia commented on a diff in pull request #40693: [SPARK-43058] Move Numeric and Fractional to PhysicalDataType

2023-04-10 Thread via GitHub
amaliujia commented on code in PR #40693: URL: https://github.com/apache/spark/pull/40693#discussion_r1162212156 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -902,152 +903,191 @@ case class Cast( } // LongConverter -

[GitHub] [spark] HyukjinKwon commented on pull request #40603: [MINOR][CONNECT] Adding Proto Debug String to Job Description.

2023-04-10 Thread via GitHub
HyukjinKwon commented on PR #40603: URL: https://github.com/apache/spark/pull/40603#issuecomment-1502562037 Let me make a PR to redact it for now at least.

[GitHub] [spark] HyukjinKwon commented on pull request #40603: [MINOR][CONNECT] Adding Proto Debug String to Job Description.

2023-04-10 Thread via GitHub
HyukjinKwon commented on PR #40603: URL: https://github.com/apache/spark/pull/40603#issuecomment-1502561858 Actually it would also have a security concern as it exposes the local data as is.

[GitHub] [spark] dongjoon-hyun commented on pull request #40685: [SPARK-43050][SQL] Fix construct aggregate expressions by replacing grouping functions

2023-04-10 Thread via GitHub
dongjoon-hyun commented on PR #40685: URL: https://github.com/apache/spark/pull/40685#issuecomment-1502557611 Ya, I think so too~

[GitHub] [spark] dongjoon-hyun commented on pull request #40685: [SPARK-43050][SQL] Fix construct aggregate expressions by replacing grouping functions

2023-04-10 Thread via GitHub
dongjoon-hyun commented on PR #40685: URL: https://github.com/apache/spark/pull/40685#issuecomment-1502558052 This patch can wait for Apache Spark 3.4.1 and 3.3.3.

[GitHub] [spark] amaliujia commented on a diff in pull request #40693: [SPARK-43058] Move Numeric and Fractional to PhysicalDataType

2023-04-10 Thread via GitHub
amaliujia commented on code in PR #40693: URL: https://github.com/apache/spark/pull/40693#discussion_r1162212156 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -902,152 +903,191 @@ case class Cast( } // LongConverter -

[GitHub] [spark] cloud-fan commented on pull request #40685: [SPARK-43050][SQL] Fix construct aggregate expressions by replacing grouping functions

2023-04-10 Thread via GitHub
cloud-fan commented on PR #40685: URL: https://github.com/apache/spark/pull/40685#issuecomment-1502553970 Since it's not a regression, we don't need to block 3.4 either.

[GitHub] [spark] cloud-fan commented on a diff in pull request #40693: [SPARK-43058] Move Numeric and Fractional to PhysicalDataType

2023-04-10 Thread via GitHub
cloud-fan commented on code in PR #40693: URL: https://github.com/apache/spark/pull/40693#discussion_r1162208044 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -902,152 +903,191 @@ case class Cast( } // LongConverter -

[GitHub] [spark] cloud-fan commented on a diff in pull request #40693: [SPARK-43058] Move Numeric and Fractional to PhysicalDataType

2023-04-10 Thread via GitHub
cloud-fan commented on code in PR #40693: URL: https://github.com/apache/spark/pull/40693#discussion_r1162208044 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala: ## @@ -902,152 +903,191 @@ case class Cast( } // LongConverter -

[GitHub] [spark] xinrong-meng commented on pull request #40725: [SPARK-43082][Connect][PYTHON] Arrow-optimized Python UDFs in Spark Connect

2023-04-10 Thread via GitHub
xinrong-meng commented on PR #40725: URL: https://github.com/apache/spark/pull/40725#issuecomment-1502530555 CI failed because of ``` Run echo "APACHE_SPARK_REF=$(git rev-parse HEAD)" >> $GITHUB_ENV fatal: detected dubious ownership in repository at '/__w/spark/spark' To add an

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40695: [SPARK-42994][ML][CONNECT] PyTorch Distributor support Local Mode with GPU

2023-04-10 Thread via GitHub
zhengruifeng commented on code in PR #40695: URL: https://github.com/apache/spark/pull/40695#discussion_r1162185540 ## python/pyspark/ml/torch/distributor.py: ## @@ -548,12 +560,23 @@ def set_torch_config(context: "BarrierTaskContext") -> None:

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40695: [SPARK-42994][ML][CONNECT] PyTorch Distributor support Local Mode with GPU

2023-04-10 Thread via GitHub
WeichenXu123 commented on code in PR #40695: URL: https://github.com/apache/spark/pull/40695#discussion_r1162186452 ## python/pyspark/ml/torch/tests/test_distributor.py: ## @@ -328,11 +328,11 @@ def test_get_num_tasks_locally(self) -> None: def

[GitHub] [spark] dtenedor opened a new pull request, #40732: [WIP][SPARK-43085][SQL] Support column DEFAULT assignment for multi-part table names

2023-04-10 Thread via GitHub
dtenedor opened a new pull request, #40732: URL: https://github.com/apache/spark/pull/40732 ### What changes were proposed in this pull request? This PR adds support for column DEFAULT assignment for multi-part table names. ### Why are the changes needed? Spark SQL
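As background for the preview above, here is a hedged SQL-level sketch (wrapped in PySpark) of the feature being extended: DEFAULT column values assigned through a multi-part table name. The `testcat.ns` catalog and schema are assumptions for illustration, not names from the PR.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Assumes a v2 catalog named `testcat` is configured; the point is only that the
# table is referenced by a multi-part name (catalog.schema.table).
spark.sql("CREATE TABLE testcat.ns.events (id INT, status STRING DEFAULT 'new') USING parquet")
spark.sql("INSERT INTO testcat.ns.events (id) VALUES (1)")      # omitted column picks up 'new'
spark.sql("INSERT INTO testcat.ns.events VALUES (2, DEFAULT)")  # explicit DEFAULT keyword
spark.sql("SELECT * FROM testcat.ns.events").show()
```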

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40695: [SPARK-42994][ML][CONNECT] PyTorch Distributor support Local Mode with GPU

2023-04-10 Thread via GitHub
WeichenXu123 commented on code in PR #40695: URL: https://github.com/apache/spark/pull/40695#discussion_r1162185759 ## python/pyspark/ml/torch/distributor.py: ## @@ -535,12 +555,23 @@ def set_torch_config(context: "BarrierTaskContext") -> None:

[GitHub] [spark] zhengruifeng commented on a diff in pull request #40695: [SPARK-42994][ML][CONNECT] PyTorch Distributor support Local Mode with GPU

2023-04-10 Thread via GitHub
zhengruifeng commented on code in PR #40695: URL: https://github.com/apache/spark/pull/40695#discussion_r1162185540 ## python/pyspark/ml/torch/distributor.py: ## @@ -548,12 +560,23 @@ def set_torch_config(context: "BarrierTaskContext") -> None:

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40695: [SPARK-42994][ML][CONNECT] PyTorch Distributor support Local Mode with GPU

2023-04-10 Thread via GitHub
WeichenXu123 commented on code in PR #40695: URL: https://github.com/apache/spark/pull/40695#discussion_r1162182017 ## python/pyspark/ml/torch/distributor.py: ## @@ -548,12 +560,23 @@ def set_torch_config(context: "BarrierTaskContext") -> None:

[GitHub] [spark] github-actions[bot] closed pull request #37348: [SPARK-39854][SQL] replaceWithAliases should keep the original children for Generate

2023-04-10 Thread via GitHub
github-actions[bot] closed pull request #37348: [SPARK-39854][SQL] replaceWithAliases should keep the original children for Generate URL: https://github.com/apache/spark/pull/37348

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40695: [SPARK-42994][ML][CONNECT] PyTorch Distributor support Local Mode with GPU

2023-04-10 Thread via GitHub
WeichenXu123 commented on code in PR #40695: URL: https://github.com/apache/spark/pull/40695#discussion_r1162184392 ## python/pyspark/ml/torch/distributor.py: ## @@ -150,8 +158,18 @@ def __init__( local_mode: bool = True, use_gpu: bool = True, ): -

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40695: [SPARK-42994][ML][CONNECT] PyTorch Distributor support Local Mode with GPU

2023-04-10 Thread via GitHub
WeichenXu123 commented on code in PR #40695: URL: https://github.com/apache/spark/pull/40695#discussion_r1162183831 ## python/pyspark/ml/torch/distributor.py: ## @@ -501,6 +517,10 @@ def _get_spark_task_function( input_params = self.input_params driver_address

[GitHub] [spark] zhengruifeng commented on pull request #40695: [SPARK-42994][ML][CONNECT] PyTorch Distributor support Local Mode with GPU

2023-04-10 Thread via GitHub
zhengruifeng commented on PR #40695: URL: https://github.com/apache/spark/pull/40695#issuecomment-1502498964 @grundprinzip would you mind taking another look at the changes in protos?

[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40695: [SPARK-42994][ML][CONNECT] PyTorch Distributor support Local Mode with GPU

2023-04-10 Thread via GitHub
WeichenXu123 commented on code in PR #40695: URL: https://github.com/apache/spark/pull/40695#discussion_r1162182017 ## python/pyspark/ml/torch/distributor.py: ## @@ -548,12 +560,23 @@ def set_torch_config(context: "BarrierTaskContext") -> None:

[GitHub] [spark] dongjoon-hyun commented on pull request #40726: [SPARK-42382][BUILD] Upgrade `cyclonedx-maven-plugin` to 2.7.6

2023-04-10 Thread via GitHub
dongjoon-hyun commented on PR #40726: URL: https://github.com/apache/spark/pull/40726#issuecomment-1502484690 Oh, it was intentional https://github.com/apache/spark/pull/40726#pullrequestreview-1378012264, but thank you! Thank you, @HyukjinKwon and @viirya !

[GitHub] [spark] HyukjinKwon commented on pull request #40689: [SPARK-42951][SS][Connect] DataStreamReader APIs

2023-04-10 Thread via GitHub
HyukjinKwon commented on PR #40689: URL: https://github.com/apache/spark/pull/40689#issuecomment-1502484037 Merged to master.

[GitHub] [spark] HyukjinKwon closed pull request #40689: [SPARK-42951][SS][Connect] DataStreamReader APIs

2023-04-10 Thread via GitHub
HyukjinKwon closed pull request #40689: [SPARK-42951][SS][Connect] DataStreamReader APIs URL: https://github.com/apache/spark/pull/40689

[GitHub] [spark] HyukjinKwon commented on pull request #40726: [SPARK-42382][BUILD] Upgrade `cyclonedx-maven-plugin` to 2.7.6

2023-04-10 Thread via GitHub
HyukjinKwon commented on PR #40726: URL: https://github.com/apache/spark/pull/40726#issuecomment-1502483678 Hm, for some reason, it shows @LuciferYang as a primary author. I manually changed it to @dongjoon-hyun.

[GitHub] [spark] HyukjinKwon closed pull request #40726: [SPARK-42382][BUILD] Upgrade `cyclonedx-maven-plugin` to 2.7.6

2023-04-10 Thread via GitHub
HyukjinKwon closed pull request #40726: [SPARK-42382][BUILD] Upgrade `cyclonedx-maven-plugin` to 2.7.6 URL: https://github.com/apache/spark/pull/40726

[GitHub] [spark] HyukjinKwon commented on pull request #40726: [SPARK-42382][BUILD] Upgrade `cyclonedx-maven-plugin` to 2.7.6

2023-04-10 Thread via GitHub
HyukjinKwon commented on PR #40726: URL: https://github.com/apache/spark/pull/40726#issuecomment-1502482929 Merged to master.

[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40691: [SPARK-43031] [SS] [Connect] Enable unit test and doctest for streaming

2023-04-10 Thread via GitHub
HyukjinKwon commented on code in PR #40691: URL: https://github.com/apache/spark/pull/40691#discussion_r1162175072 ## python/pyspark/sql/streaming/query.py: ## @@ -188,7 +192,7 @@ def awaitTermination(self, timeout: Optional[int] = None) -> Optional[bool]: Return

[GitHub] [spark] wangyum opened a new pull request, #40731: [SPARK-43087][SQL] Support coalesce buckets in join in AQE

2023-04-10 Thread via GitHub
wangyum opened a new pull request, #40731: URL: https://github.com/apache/spark/pull/40731 ### What changes were proposed in this pull request? This PR adds `CoalesceBucketsInJoin` to `AdaptiveSparkPlanExec.queryStagePreparationRules`. ### Why are the changes needed?
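For context, `CoalesceBucketsInJoin` itself already exists and is controlled by the configs below; the PR's change is to also run it when adaptive query execution plans the query. The table names and bucket counts are illustrative assumptions.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.conf.set("spark.sql.adaptive.enabled", "true")  # AQE, where the PR adds the rule
spark.conf.set("spark.sql.bucketing.coalesceBucketsInJoin.enabled", "true")
spark.conf.set("spark.sql.bucketing.coalesceBucketsInJoin.maxBucketRatio", "4")

# Assuming t8 is bucketed into 8 buckets and t4 into 4 on `id`, the rule can coalesce
# the 8-bucket side down to 4 so the sort-merge join avoids an extra shuffle.
spark.table("t8").join(spark.table("t4"), "id").explain()
```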

[GitHub] [spark] warrenzhu25 commented on pull request #40730: [SPARK-43086][CORE] Support bin pack task scheduling on executors

2023-04-10 Thread via GitHub
warrenzhu25 commented on PR #40730: URL: https://github.com/apache/spark/pull/40730#issuecomment-1502463722 @dongjoon-hyun @mridulm @Ngone51 Help take a look?

[GitHub] [spark] dongjoon-hyun closed pull request #40727: [SPARK-43083][SQL][TESTS] Mark `*StateStoreSuite` as `ExtendedSQLTest`

2023-04-10 Thread via GitHub
dongjoon-hyun closed pull request #40727: [SPARK-43083][SQL][TESTS] Mark `*StateStoreSuite` as `ExtendedSQLTest` URL: https://github.com/apache/spark/pull/40727

[GitHub] [spark] dongjoon-hyun commented on pull request #40727: [SPARK-43083][SQL][TESTS] Mark `*StateStoreSuite` as `ExtendedSQLTest`

2023-04-10 Thread via GitHub
dongjoon-hyun commented on PR #40727: URL: https://github.com/apache/spark/pull/40727#issuecomment-1502463385 I also confirmed the moved `*StateStoreSuite` output in the GitHub Action log on this PR. - https://github.com/dongjoon-hyun/spark/actions/runs/4661381120/jobs/8250624115

[GitHub] [spark] wangyum commented on pull request #40555: [SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.13.0

2023-04-10 Thread via GitHub
wangyum commented on PR #40555: URL: https://github.com/apache/spark/pull/40555#issuecomment-1502458598 > BTW, if you mind, please revise the PR description. > > 1. Removing `Maybe it can improve read performance.` from the PR description. > 2. Coping [[SPARK-42926][BUILD][SQL]

[GitHub] [spark] warrenzhu25 opened a new pull request, #40730: [SPARK-43086][CORE] Support bin pack task scheduling on executors

2023-04-10 Thread via GitHub
warrenzhu25 opened a new pull request, #40730: URL: https://github.com/apache/spark/pull/40730 ### What changes were proposed in this pull request? Support bin pack task scheduling on executors. This is controlled by `spark.scheduler.binPack.enabled` ### Why are the changes
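The flag named in the preview is introduced by this PR and is not in any released Spark; below is a minimal sketch of where it would be set, purely for illustration.

```python
from pyspark.sql import SparkSession

# `spark.scheduler.binPack.enabled` is the config proposed in PR #40730, not an
# existing Spark setting; shown here only to illustrate how it would be enabled.
spark = (SparkSession.builder
         .appName("bin-pack-scheduling-demo")
         .config("spark.scheduler.binPack.enabled", "true")
         .getOrCreate())
```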

[GitHub] [spark] dongjoon-hyun commented on pull request #40727: [SPARK-43083][SQL][TESTS] Mark `*StateStoreSuite` as `ExtendedSQLTest`

2023-04-10 Thread via GitHub
dongjoon-hyun commented on PR #40727: URL: https://github.com/apache/spark/pull/40727#issuecomment-1502456159 Thank you, @huaxingao !

[GitHub] [spark] gengliangwang closed pull request #40710: [SPARK-43071][SQL] Support SELECT DEFAULT with ORDER BY, LIMIT, OFFSET for INSERT source relation

2023-04-10 Thread via GitHub
gengliangwang closed pull request #40710: [SPARK-43071][SQL] Support SELECT DEFAULT with ORDER BY, LIMIT, OFFSET for INSERT source relation URL: https://github.com/apache/spark/pull/40710

[GitHub] [spark] gengliangwang commented on pull request #40710: [SPARK-43071][SQL] Support SELECT DEFAULT with ORDER BY, LIMIT, OFFSET for INSERT source relation

2023-04-10 Thread via GitHub
gengliangwang commented on PR #40710: URL: https://github.com/apache/spark/pull/40710#issuecomment-1502451369 Thanks, merging to master/3.4

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40677: [SPARK-43039][SQL] Support custom fields in the file source _metadata column.

2023-04-10 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40677: URL: https://github.com/apache/spark/pull/40677#discussion_r1162154097 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileIndex.scala: ## @@ -23,11 +23,30 @@ import

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40677: [SPARK-43039][SQL] Support custom fields in the file source _metadata column.

2023-04-10 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40677: URL: https://github.com/apache/spark/pull/40677#discussion_r1162154097 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileIndex.scala: ## @@ -23,11 +23,30 @@ import

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40677: [SPARK-43039][SQL] Support custom fields in the file source _metadata column.

2023-04-10 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40677: URL: https://github.com/apache/spark/pull/40677#discussion_r1162151522 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala: ## @@ -554,6 +554,28 @@ object

[GitHub] [spark] ryan-johnson-databricks commented on a diff in pull request #40677: [SPARK-43039][SQL] Support custom fields in the file source _metadata column.

2023-04-10 Thread via GitHub
ryan-johnson-databricks commented on code in PR #40677: URL: https://github.com/apache/spark/pull/40677#discussion_r1162151522 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/namedExpressions.scala: ## @@ -554,6 +554,28 @@ object

[GitHub] [spark] dongjoon-hyun commented on pull request #40727: [SPARK-43083][SQL][TESTS] Mark `*StateStoreSuite` as `ExtendedSQLTest`

2023-04-10 Thread via GitHub
dongjoon-hyun commented on PR #40727: URL: https://github.com/apache/spark/pull/40727#issuecomment-1502430516 Could you review this PR when you have some time, @huaxingao ?

[GitHub] [spark] dongjoon-hyun commented on pull request #40555: [SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.13.0

2023-04-10 Thread via GitHub
dongjoon-hyun commented on PR #40555: URL: https://github.com/apache/spark/pull/40555#issuecomment-1502429680 BTW, if you mind, please revise the PR description. 1. Removing `Maybe it can improve read performance.` from the PR description. 2. Coping

[GitHub] [spark] dongjoon-hyun commented on pull request #40555: [SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.13.0

2023-04-10 Thread via GitHub
dongjoon-hyun commented on PR #40555: URL: https://github.com/apache/spark/pull/40555#issuecomment-1502428223 Thank you for the confirmation.

[GitHub] [spark] wangyum commented on pull request #40555: [SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.13.0

2023-04-10 Thread via GitHub
wangyum commented on PR #40555: URL: https://github.com/apache/spark/pull/40555#issuecomment-1502427643 @dongjoon-hyun Yes. There's no noticeable perf difference.

[GitHub] [spark] dongjoon-hyun commented on pull request #40726: [SPARK-42382][BUILD] Upgrade `cyclonedx-maven-plugin` to 2.7.6

2023-04-10 Thread via GitHub
dongjoon-hyun commented on PR #40726: URL: https://github.com/apache/spark/pull/40726#issuecomment-1502427149 Could you review this PR, @viirya ? I verified manually. ``` $ ls -alt total 67688 -rw-r--r--@ 1 dongjoon staff 1955 Apr 10 15:27 maven-metadata-local.xml

[GitHub] [spark] zhenlineo opened a new pull request, #40729: [WIP][CONNECT] Adding groupByKey + mapGroup functions

2023-04-10 Thread via GitHub
zhenlineo opened a new pull request, #40729: URL: https://github.com/apache/spark/pull/40729 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

[GitHub] [spark] dongjoon-hyun commented on pull request #40687: [SPARK-43052][CORE] Handle stacktrace with null file name in event log

2023-04-10 Thread via GitHub
dongjoon-hyun commented on PR #40687: URL: https://github.com/apache/spark/pull/40687#issuecomment-1502424431 Thank you for your answers, @warrenzhu25 .

[GitHub] [spark] warrenzhu25 commented on pull request #40687: [SPARK-43052][CORE] Handle stacktrace with null file name in event log

2023-04-10 Thread via GitHub
warrenzhu25 commented on PR #40687: URL: https://github.com/apache/spark/pull/40687#issuecomment-1502423743 > Do you happen to know when this bug starts, @warrenzhu25 ? Sorry, I have no idea. It's the first time I have seen this.

[GitHub] [spark] dongjoon-hyun commented on pull request #40687: [SPARK-43052][CORE] Handle stacktrace with null file name in event log

2023-04-10 Thread via GitHub
dongjoon-hyun commented on PR #40687: URL: https://github.com/apache/spark/pull/40687#issuecomment-1502418739 Do you happen to know when this bug starts, @warrenzhu25 ?

[GitHub] [spark] warrenzhu25 commented on pull request #40687: [SPARK-43052][CORE] Handle stacktrace with null file name in event log

2023-04-10 Thread via GitHub
warrenzhu25 commented on PR #40687: URL: https://github.com/apache/spark/pull/40687#issuecomment-1502409840 > BTW, according to JIRA, is this a regression at Apache Spark 3.3.2, @warrenzhu25 ? I don't think so.
