[PR] [SPARK-45915][SQL] Treat decimal(x, 0) the same as IntegralType in `PromoteStrings` [spark]

2023-11-14 Thread via GitHub
wangyum opened a new pull request, #43812: URL: https://github.com/apache/spark/pull/43812 ### What changes were proposed in this pull request? The common type of decimal(x, 0) and string is double, but the common type of int/bigint and string is int/bigint. This PR updates `P
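
A minimal sketch, assuming a local Spark build with the default (non-ANSI) coercion rules, to observe the behavior the PR describes: comparing a string against a bigint versus against a decimal(20, 0). Per the PR description, the integral case keeps the integral type while the decimal(x, 0) case has historically been widened to double; the object name below is made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only (not part of the PR): inspect how a string literal is coerced
// against bigint vs. decimal(20, 0) by looking at the analyzed plans.
object PromoteStringsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()

    // String compared with an integral type: per the PR, the string side is
    // cast to bigint.
    spark.sql("SELECT CAST(1 AS BIGINT) = '1'").explain(extended = true)

    // String compared with decimal(20, 0): historically both sides are widened
    // to double, which this PR proposes to align with the integral behavior.
    spark.sql("SELECT CAST(1 AS DECIMAL(20, 0)) = '1'").explain(extended = true)

    spark.stop()
  }
}
```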

Re: [PR] [SPARK-45919][CORE][SQL] Use Java 16 `record` to simplify Java class definition [spark]

2023-11-14 Thread via GitHub
LuciferYang commented on PR #43796: URL: https://github.com/apache/spark/pull/43796#issuecomment-1811959949 @dongjoon-hyun I want to clarify the issue. We don't want to use `record` here because `field` in the original class doesn't provide an Accessor, but since `record` automatically gene

Re: [PR] [SPARK-45764][PYTHON][DOCS] Make code block copyable [spark]

2023-11-14 Thread via GitHub
HyukjinKwon closed pull request #43799: [SPARK-45764][PYTHON][DOCS] Make code block copyable URL: https://github.com/apache/spark/pull/43799 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] [SPARK-45764][PYTHON][DOCS] Make code block copyable [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on PR #43799: URL: https://github.com/apache/spark/pull/43799#issuecomment-1811958045 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-45926][SQL] Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans [spark]

2023-11-14 Thread via GitHub
dongjoon-hyun commented on PR #43808: URL: https://github.com/apache/spark/pull/43808#issuecomment-1811953814 Thank you so much for the details and an upcoming prototype test case. For that issue, you can file a new JIRA issue. > Though I ought to point out that while running my test

Re: [PR] [SPARK-45764][PYTHON][DOCS] Make code block copyable [spark]

2023-11-14 Thread via GitHub
panbingkun commented on PR #43799: URL: https://github.com/apache/spark/pull/43799#issuecomment-1811951239 > @panbingkun shall we update the already released docs? I think it should be possible, but I need to investigate how to add it more easily. 😄 -- This is an automated message

Re: [PR] [SPARK-45919][CORE][SQL] Use Java 16 `record` to simplify Java class definition [spark]

2023-11-14 Thread via GitHub
LuciferYang commented on PR #43796: URL: https://github.com/apache/spark/pull/43796#issuecomment-1811951525 > This is a nice syntax in general, @LuciferYang . > > However, we cannot use this when the class provides information hiding. > > ```java > jshell> private static clas

Re: [PR] [SPARK-45926][SQL] Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans [spark]

2023-11-14 Thread via GitHub
ahshahid commented on PR #43808: URL: https://github.com/apache/spark/pull/43808#issuecomment-1811946724 @dongjoon-hyun I think the reason for not catching the exchange-reuse issue is a mix of multiple things: 1) Spark is not testing with any concrete DataSourceV2 implementation. (

Re: [PR] [SPARK-45919][CORE][SQL] Use Java 16 `record` to simplify Java class definition [spark]

2023-11-14 Thread via GitHub
LuciferYang commented on code in PR #43796: URL: https://github.com/apache/spark/pull/43796#discussion_r1393752725 ## common/network-common/src/main/java/org/apache/spark/network/protocol/StreamChunkId.java: ## @@ -26,14 +26,7 @@ /** * Encapsulates a request for a particular c

Re: [PR] [WIP][SPARK-44098][INFRA] Introduce python breaking change detection [spark]

2023-11-14 Thread via GitHub
zhengruifeng commented on PR #42125: URL: https://github.com/apache/spark/pull/42125#issuecomment-1811928579 > > the python linter fails with > > ``` > > Python compilation failed with the following errors: > > *** Error compiling 'dev/aexpy/aexpy/diffing/evaluators/typing.py'...

Re: [PR] [SPARK-45919][CORE][SQL] Use Java 16 `record` to simplify Java class definition [spark]

2023-11-14 Thread via GitHub
dongjoon-hyun commented on code in PR #43796: URL: https://github.com/apache/spark/pull/43796#discussion_r1393751280 ## common/network-common/src/main/java/org/apache/spark/network/protocol/StreamChunkId.java: ## @@ -26,14 +26,7 @@ /** * Encapsulates a request for a particular

Re: [PR] [SPARK-43393][SQL] Address sequence expression overflow bug. [spark]

2023-11-14 Thread via GitHub
dongjoon-hyun commented on PR #41072: URL: https://github.com/apache/spark/pull/41072#issuecomment-1811916481 Thank you, @thepinetree and @cloud-fan . Given that this is a long-standing overflow bug, do you think we can have this fix in other live release branches, `branch-3.4` and `branch-

Re: [PR] [SPARK-45764][PYTHON][DOCS] Make code block copyable [spark]

2023-11-14 Thread via GitHub
itholic commented on code in PR #43799: URL: https://github.com/apache/spark/pull/43799#discussion_r1393735691 ## dev/requirements.txt: ## @@ -37,6 +37,7 @@ numpydoc jinja2<3.0.0 sphinx<3.1.0 sphinx-plotly-directive +sphinx-copybutton Review Comment: Sounds good. Thanks!

Re: [PR] [SPARK-45764][PYTHON][DOCS] Make code block copyable [spark]

2023-11-14 Thread via GitHub
zhengruifeng commented on PR #43799: URL: https://github.com/apache/spark/pull/43799#issuecomment-1811910417 @panbingkun shall we update the already released docs? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

Re: [PR] [SPARK-45904][SQL][CONNECT] Mode function should supports sort with order direction [spark]

2023-11-14 Thread via GitHub
beliefer commented on PR #43786: URL: https://github.com/apache/spark/pull/43786#issuecomment-1811910009 The GA failure is unrelated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-45908][Python] Add support for writing empty DataFrames to parquet with partitions [spark]

2023-11-14 Thread via GitHub
maintian commented on code in PR #43798: URL: https://github.com/apache/spark/pull/43798#discussion_r1393718772 ## python/pyspark/sql/readwriter.py: ## @@ -1936,7 +1936,23 @@ def parquet( if partitionBy is not None: self.partitionBy(partitionBy) se
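
The diff above is from the Python writer; as a hedged illustration of the scenario named in the PR title, here is a Scala sketch (the output path and column names are arbitrary, and whether a partition layout is produced for zero rows is exactly what the PR discusses).

```scala
import org.apache.spark.sql.SparkSession

// Sketch of the scenario in SPARK-45908: writing an empty DataFrame with
// partition columns to Parquet. The PR itself targets the Python writer;
// this only illustrates the situation, with an arbitrary local output path.
object EmptyPartitionedWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()
    import spark.implicits._

    val empty = Seq.empty[(Int, String)].toDF("id", "part")

    // How the partition directories are (or are not) materialized for zero
    // rows is the behavior under discussion.
    empty.write.mode("overwrite").partitionBy("part").parquet("/tmp/empty_partitioned")

    spark.stop()
  }
}
```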

Re: [PR] [SPARK-45905][SQL] Least common type between decimal types should retain integral digits first [spark]

2023-11-14 Thread via GitHub
dongjoon-hyun commented on code in PR #43781: URL: https://github.com/apache/spark/pull/43781#discussion_r1393707711 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4541,6 +4541,15 @@ object SQLConf { .booleanConf .createWithDefau

Re: [PR] [SPARK-45908][Python] Add support for writing empty DataFrames to parquet with partitions [spark]

2023-11-14 Thread via GitHub
maintian commented on code in PR #43798: URL: https://github.com/apache/spark/pull/43798#discussion_r1393707527 ## python/pyspark/sql/readwriter.py: ## @@ -1936,7 +1936,23 @@ def parquet( if partitionBy is not None: self.partitionBy(partitionBy) se

Re: [PR] [SPARK-45592][SPARK-45282][SQL] Correctness issue in AQE with InMemoryTableScanExec [spark]

2023-11-14 Thread via GitHub
dongjoon-hyun commented on PR #43760: URL: https://github.com/apache/spark/pull/43760#issuecomment-1811882080 Gentle ping~, @maryannxue . -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] [SPARK-45905][SQL] Least common type between decimal types should retain integral digits first [spark]

2023-11-14 Thread via GitHub
cloud-fan commented on PR #43781: URL: https://github.com/apache/spark/pull/43781#issuecomment-1811881698 > Is there reproducer that can be added as unit test to show the issue in e2e example? I think the updated tests show the problem. -- This is an automated message from the Apa
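
For readers following along, a hedged end-to-end probe of the behavior in question (values and names are made up; the chosen result type depends on whether the SPARK-45905 change is in place):

```scala
import org.apache.spark.sql.SparkSession

// Sketch: the result type of a UNION between two decimals whose combined
// precision would exceed 38 shows whether integral digits or fractional
// digits are retained when precision has to be reduced.
object DecimalLeastCommonTypeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()

    val df = spark.sql(
      """SELECT CAST(1 AS DECIMAL(36, 0)) AS c
        |UNION ALL
        |SELECT CAST(0.01 AS DECIMAL(10, 8)) AS c""".stripMargin)

    // The printed precision/scale is the least common type that was chosen.
    df.printSchema()
    df.show(truncate = false)

    spark.stop()
  }
}
```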

Re: [PR] [SPARK-45905][SQL] Least common type between decimal types should retain integral digits first [spark]

2023-11-14 Thread via GitHub
cloud-fan commented on code in PR #43781: URL: https://github.com/apache/spark/pull/43781#discussion_r1393704857 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4541,6 +4541,15 @@ object SQLConf { .booleanConf .createWithDefault(f

Re: [PR] [SPARK-45905][SQL] Least common type between decimal types should retain integral digits first [spark]

2023-11-14 Thread via GitHub
cloud-fan commented on code in PR #43781: URL: https://github.com/apache/spark/pull/43781#discussion_r1393704651 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4541,6 +4541,15 @@ object SQLConf { .booleanConf .createWithDefault(f

Re: [PR] [SPARK-45905][SQL] Least common type between decimal types should retain integral digits first [spark]

2023-11-14 Thread via GitHub
dongjoon-hyun commented on code in PR #43781: URL: https://github.com/apache/spark/pull/43781#discussion_r1393703751 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4541,6 +4541,15 @@ object SQLConf { .booleanConf .createWithDefau

Re: [PR] [SPARK-43393][SQL] Address sequence expression overflow bug. [spark]

2023-11-14 Thread via GitHub
cloud-fan closed pull request #41072: [SPARK-43393][SQL] Address sequence expression overflow bug. URL: https://github.com/apache/spark/pull/41072 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-43393][SQL] Address sequence expression overflow bug. [spark]

2023-11-14 Thread via GitHub
cloud-fan commented on PR #41072: URL: https://github.com/apache/spark/pull/41072#issuecomment-1811877484 thanks, merging to master/3.5! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [PR] [SPARK-45920][SQL] group by ordinal should be idempotent [spark]

2023-11-14 Thread via GitHub
dongjoon-hyun commented on PR #43797: URL: https://github.com/apache/spark/pull/43797#issuecomment-1811877168 BTW, the first commit failed with the following relevant failure. Could you double-check it, since the last commit seems unrelated to that failure, @cloud-fan? https:/

Re: [PR] [SPARK-45908][Python] Add support for writing empty DataFrames to parquet with partitions [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on code in PR #43798: URL: https://github.com/apache/spark/pull/43798#discussion_r1393699124 ## python/pyspark/sql/readwriter.py: ## @@ -1936,7 +1936,23 @@ def parquet( if partitionBy is not None: self.partitionBy(partitionBy)

Re: [PR] [SPARK-45927][PYTHON] Update path handling in Python data source [spark]

2023-11-14 Thread via GitHub
cloud-fan commented on code in PR #43809: URL: https://github.com/apache/spark/pull/43809#discussion_r1393654355 ## sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala: ## @@ -246,7 +246,15 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends

Re: [PR] [SPARK-45924][SQL] Fixing the canonicalization of SubqueryAdaptiveBroadcastExec and making it equivalent with SubqueryBroadcastExec [spark]

2023-11-14 Thread via GitHub
ahshahid commented on PR #43806: URL: https://github.com/apache/spark/pull/43806#issuecomment-1811817310 The other option is to make the canonicalized form of both SubqueryAdaptiveBroadcastExec and SubqueryBroadcastExec be of type SubqueryBroadcastExec. That way, equals and hashCode

Re: [PR] [SPARK-45925][SQL] Making SubqueryBroadcastExec equivalent to SubqueryAdaptiveBroadcastExec [spark]

2023-11-14 Thread via GitHub
ahshahid commented on PR #43807: URL: https://github.com/apache/spark/pull/43807#issuecomment-1811817204 The other option is to make the canonicalized form of both SubqueryAdaptiveBroadcastExec and SubqueryBroadcastExec be of type SubqueryBroadcastExec. That way, equals and hashCode

Re: [PR] [SPARK-45927][PYTHON] Update path handling in Python data source [spark]

2023-11-14 Thread via GitHub
cloud-fan commented on code in PR #43809: URL: https://github.com/apache/spark/pull/43809#discussion_r1393642111 ## python/pyspark/sql/datasource.py: ## @@ -45,30 +45,19 @@ class DataSource(ABC): """ @final -def __init__( -self, -paths: List[str],

Re: [PR] [SPARK-45926][SQL] Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans [spark]

2023-11-14 Thread via GitHub
ahshahid commented on PR #43808: URL: https://github.com/apache/spark/pull/43808#issuecomment-1811808992 @HyukjinKwon thanks for correcting the titles of the PRs. Will take care next time. -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] [MINOR][DOCS] Correct additional Conda documentation URL to fix 404 errors [spark]

2023-11-14 Thread via GitHub
dead-1ine commented on PR #43794: URL: https://github.com/apache/spark/pull/43794#issuecomment-1811789159 Thank you :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

[PR] [SPARK-45931][PYTHON][DOCS] Refine docstring of mapInPandas [spark]

2023-11-14 Thread via GitHub
allisonwang-db opened a new pull request, #43811: URL: https://github.com/apache/spark/pull/43811 ### What changes were proposed in this pull request? This PR improves the docstring of the dataframe function `mapInPandas`. ### Why are the changes needed? To improv

[PR] [SPARK-45930][SQL] Support non-deterministic UDFs in MapInPandas/MapInArrow [spark]

2023-11-14 Thread via GitHub
allisonwang-db opened a new pull request, #43810: URL: https://github.com/apache/spark/pull/43810 ### What changes were proposed in this pull request? This PR supports non-deterministic UDFs in MapInPandas and MapInArrow. ### Why are the changes needed? Currently,

Re: [PR] [SPARK-45908][Python] Add support for writing empty DataFrames to parquet with partitions [spark]

2023-11-14 Thread via GitHub
maintian commented on code in PR #43798: URL: https://github.com/apache/spark/pull/43798#discussion_r1393610399 ## python/pyspark/sql/readwriter.py: ## @@ -1936,7 +1936,23 @@ def parquet( if partitionBy is not None: self.partitionBy(partitionBy) se

Re: [PR] [SPARK-45731][SQL] Also update partition statistics with `ANALYZE TABLE` command [spark]

2023-11-14 Thread via GitHub
patsukp-db commented on code in PR #43629: URL: https://github.com/apache/spark/pull/43629#discussion_r1393596801 ## sql/core/src/main/scala/org/apache/spark/sql/execution/command/CommandUtils.scala: ## @@ -37,6 +39,7 @@ import org.apache.spark.sql.execution.QueryExecution impo
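
A hedged sketch of the command this PR extends, with made-up table and column names; before the change, the table-level form updates only table statistics, and per-partition statistics require the explicit `PARTITION` clause:

```scala
import org.apache.spark.sql.SparkSession

// Sketch of ANALYZE TABLE on a partitioned table. SPARK-45731 proposes that
// the table-level command also refresh per-partition statistics.
object AnalyzeTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()

    spark.sql(
      "CREATE TABLE IF NOT EXISTS sales (amount BIGINT, day STRING) " +
        "USING parquet PARTITIONED BY (day)")
    spark.sql("INSERT INTO sales VALUES (1, '2023-11-14'), (2, '2023-11-14')")

    // Table-level analyze; per-partition stats are otherwise gathered with
    // ANALYZE TABLE sales PARTITION (day = '2023-11-14') COMPUTE STATISTICS.
    spark.sql("ANALYZE TABLE sales COMPUTE STATISTICS")

    spark.stop()
  }
}
```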

Re: [PR] [SPARK-45924][SQL] Fixing the canonicalization of SubqueryAdaptiveBroadcastExec and making it equivalent with SubqueryBroadcastExec [spark]

2023-11-14 Thread via GitHub
ahshahid commented on code in PR #43806: URL: https://github.com/apache/spark/pull/43806#discussion_r1393590377 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SubqueryAdaptiveBroadcastExec.scala: ## @@ -44,9 +46,21 @@ case class SubqueryAdaptiveBroadcastExec( thr

Re: [PR] [SPARK-45810][Python] Create Python UDTF API to stop consuming rows from the input table [spark]

2023-11-14 Thread via GitHub
ueshin commented on code in PR #43682: URL: https://github.com/apache/spark/pull/43682#discussion_r1393589351 ## python/pyspark/sql/tests/test_udtf.py: ## @@ -2482,6 +2533,7 @@ def tearDownClass(cls): super(UDTFTests, cls).tearDownClass() +''' Review Comment:

Re: [PR] [SPARK-45597][PYTHON][SQL] Support creating table using a Python data source in SQL [spark]

2023-11-14 Thread via GitHub
allisonwang-db commented on code in PR #43784: URL: https://github.com/apache/spark/pull/43784#discussion_r1393582670 ## python/pyspark/sql/tests/test_python_datasource.py: ## @@ -118,25 +118,26 @@ def reader(self, schema) -> "DataSourceReader": self.spark.dataSource.

Re: [PR] [SPARK-45920][SQL] group by ordinal should be idempotent [spark]

2023-11-14 Thread via GitHub
cloud-fan commented on PR #43797: URL: https://github.com/apache/spark/pull/43797#issuecomment-1811735675 > Given the following warning, this sounds like this could cause a correctness issue. Did I understand correctly? Yes, but only when you manipulate logical plans directly. SQL/Dat
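
For context, a short sketch of what "group by ordinal" means; the bug itself concerns the substitution rule running more than once on a directly manipulated logical plan, which this example does not reproduce.

```scala
import org.apache.spark.sql.SparkSession

// "Group by ordinal": the 1 in GROUP BY refers to the first SELECT item and is
// substituted with that expression during analysis.
object GroupByOrdinalSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()

    spark.range(10).selectExpr("id % 3 AS k", "id AS v").createOrReplaceTempView("t")

    // Resolved as GROUP BY k.
    spark.sql("SELECT k, SUM(v) FROM t GROUP BY 1").show()

    spark.stop()
  }
}
```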

Re: [PR] [SPARK-45924][SQL] Fixing the canonicalization of SubqueryAdaptiveBroadcastExec and making it equivalent with SubqueryBroadcastExec [spark]

2023-11-14 Thread via GitHub
ulysses-you commented on code in PR #43806: URL: https://github.com/apache/spark/pull/43806#discussion_r1393580889 ## sql/core/src/main/scala/org/apache/spark/sql/execution/SubqueryAdaptiveBroadcastExec.scala: ## @@ -44,9 +46,21 @@ case class SubqueryAdaptiveBroadcastExec(

Re: [PR] [SPARK-44496][SQL][FOLLOW-UP] CalendarIntervalType is also orderable [spark]

2023-11-14 Thread via GitHub
yaooqinn commented on PR #43805: URL: https://github.com/apache/spark/pull/43805#issuecomment-1811732522 FYI, add orderable support. https://issues.apache.org/jira/browse/SPARK-29679 https://issues.apache.org/jira/browse/SPARK-29385 drop orderable support. https://i

Re: [PR] [SPARK-45905][SQL] Least common type between decimal types should retain integral digits first [spark]

2023-11-14 Thread via GitHub
cloud-fan commented on code in PR #43781: URL: https://github.com/apache/spark/pull/43781#discussion_r1393576370 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala: ## @@ -64,7 +65,11 @@ object DecimalPrecision extends TypeCoercionRule {

Re: [PR] [SPARK-45597][PYTHON][SQL] Support creating table using a Python data source in SQL [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on code in PR #43784: URL: https://github.com/apache/spark/pull/43784#discussion_r1393562595 ## python/pyspark/sql/tests/test_python_datasource.py: ## @@ -118,25 +118,26 @@ def reader(self, schema) -> "DataSourceReader": self.spark.dataSource.reg

Re: [PR] [SPARK-45912][SQL] Enhancement of XSDToSchema API: Change to HDFS API for cloud storage accessibility [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on code in PR #43789: URL: https://github.com/apache/spark/pull/43789#discussion_r1393558732 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XSDToSchema.scala: ## @@ -35,34 +38,32 @@ import org.apache.spark.sql.types._ object XSDT

Re: [PR] [SPARK-45912][SQL] Enhancement of XSDToSchema API: Change to HDFS API for cloud storage accessibility [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on code in PR #43789: URL: https://github.com/apache/spark/pull/43789#discussion_r1393556662 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XSDToSchema.scala: ## @@ -35,34 +38,32 @@ import org.apache.spark.sql.types._ object XSDT

Re: [PR] [SPARK-45912][SQL] Enhancement of XSDToSchema API: Change to HDFS API for cloud storage accessibility [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on code in PR #43789: URL: https://github.com/apache/spark/pull/43789#discussion_r1393556057 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XSDToSchema.scala: ## @@ -35,34 +38,32 @@ import org.apache.spark.sql.types._ object XSDT

Re: [PR] [SPARK-45912][SQL] Enhancement of XSDToSchema API: Change to HDFS API for cloud storage accessibility [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on code in PR #43789: URL: https://github.com/apache/spark/pull/43789#discussion_r1393554560 ## sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/xml/XSDToSchema.scala: ## @@ -35,34 +38,32 @@ import org.apache.spark.sql.types._ object XSDT

Re: [PR] [MINOR][DOCS] Correct additional Conda documentation URL to fix 404 errors [spark]

2023-11-14 Thread via GitHub
HyukjinKwon closed pull request #43794: [MINOR][DOCS] Correct additional Conda documentation URL to fix 404 errors URL: https://github.com/apache/spark/pull/43794 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

Re: [PR] [SPARK-45844][SQL][FOLLOWUP] Improve the caseSensitivityOrdering for XmlInferSchema [spark]

2023-11-14 Thread via GitHub
beliefer commented on PR #43802: URL: https://github.com/apache/spark/pull/43802#issuecomment-1811694498 @HyukjinKwon Thank you! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [PR] [MINOR][DOCS] Correct additional Conda documentation URL to fix 404 errors [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on PR #43794: URL: https://github.com/apache/spark/pull/43794#issuecomment-1811694428 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [MINOR][DOCS] Correct additional Conda documentation URL to fix 404 errors [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on PR #43794: URL: https://github.com/apache/spark/pull/43794#issuecomment-1811694361 I locally verified this change. Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

Re: [PR] [SPARK-45918][PS] Optimize `MultiIndex.symmetric_difference` [spark]

2023-11-14 Thread via GitHub
HyukjinKwon closed pull request #43795: [SPARK-45918][PS] Optimize `MultiIndex.symmetric_difference` URL: https://github.com/apache/spark/pull/43795 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [PR] [SPARK-45918][PS] Optimize `MultiIndex.symmetric_difference` [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on PR #43795: URL: https://github.com/apache/spark/pull/43795#issuecomment-1811693758 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-45920][SQL] group by ordinal should be idempotent [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on code in PR #43797: URL: https://github.com/apache/spark/pull/43797#discussion_r1393547242 ## sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/SubstituteUnresolvedOrdinalsSuite.scala: ## @@ -67,4 +68,22 @@ class SubstituteUnresolvedOrdin

Re: [PR] [SPARK-45908][Python] Add support for writing empty DataFrames to parquet with partitions [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on code in PR #43798: URL: https://github.com/apache/spark/pull/43798#discussion_r1393546849 ## python/pyspark/sql/readwriter.py: ## @@ -1936,7 +1936,23 @@ def parquet( if partitionBy is not None: self.partitionBy(partitionBy)

Re: [PR] [SPARK-45844][SQL][FOLLOWUP] Improve the caseSensitivityOrdering for XmlInferSchema [spark]

2023-11-14 Thread via GitHub
HyukjinKwon closed pull request #43802: [SPARK-45844][SQL][FOLLOWUP] Improve the caseSensitivityOrdering for XmlInferSchema URL: https://github.com/apache/spark/pull/43802 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

Re: [PR] [SPARK-45844][SQL][FOLLOWUP] Improve the caseSensitivityOrdering for XmlInferSchema [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on PR #43802: URL: https://github.com/apache/spark/pull/43802#issuecomment-1811690078 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-45562][SQL][FOLLOW-UP] XML: Fix SQLSTATE for missing rowTag error [spark]

2023-11-14 Thread via GitHub
HyukjinKwon closed pull request #43804: [SPARK-45562][SQL][FOLLOW-UP] XML: Fix SQLSTATE for missing rowTag error URL: https://github.com/apache/spark/pull/43804 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

Re: [PR] [SPARK-45562][SQL][FOLLOW-UP] XML: Fix SQLSTATE for missing rowTag error [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on PR #43804: URL: https://github.com/apache/spark/pull/43804#issuecomment-1811689374 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-44496][SQL][FOLLOW-UP] CalendarIntervalType is also orderable [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on code in PR #43805: URL: https://github.com/apache/spark/pull/43805#discussion_r1393545050 ## sql/api/src/main/scala/org/apache/spark/sql/catalyst/expressions/OrderUtils.scala: ## @@ -16,15 +16,16 @@ */ package org.apache.spark.sql.catalyst.expressions

Re: [PR] [SPARK-45924][SQL] Fixing the canonicalization of SubqueryAdaptiveBroadcastExec and making it equivalent with SubqueryBroadcastExec [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on PR #43806: URL: https://github.com/apache/spark/pull/43806#issuecomment-1811687115 cc @peter-toth and @ulysses-you FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-45925][SQL] Making SubqueryBroadcastExec equivalent to SubqueryAdaptiveBroadcastExec [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on PR #43807: URL: https://github.com/apache/spark/pull/43807#issuecomment-1811686321 cc @ulysses-you and @peter-toth FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-45926][SQL] Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on PR #43808: URL: https://github.com/apache/spark/pull/43808#issuecomment-1811684533 cc @ulysses-you and @peter-toth FYI -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] [SPARK-45873][CORE][YARN][K8S] Make ExecutorFailureTracker more tolerant when app remains sufficient resources [spark]

2023-11-14 Thread via GitHub
yaooqinn commented on code in PR #43746: URL: https://github.com/apache/spark/pull/43746#discussion_r1393541374 ## core/src/main/scala/org/apache/spark/internal/config/package.scala: ## @@ -2087,6 +2087,17 @@ package object config { .doubleConf .createOptional +

Re: [PR] [SPARK-45873][CORE][YARN][K8S] Make ExecutorFailureTracker more tolerant when app remains sufficient resources [spark]

2023-11-14 Thread via GitHub
yaooqinn commented on PR #43746: URL: https://github.com/apache/spark/pull/43746#issuecomment-1811680290 > Preemption on yarn shouldn't be going against the number of failed executors. If it is then something has changed and we should fix that. Yes, you are right > This is a co

Re: [PR] [SPARK-45913][PYTHON] Make the internal attributes private from PySpark errors. [spark]

2023-11-14 Thread via GitHub
HyukjinKwon closed pull request #43790: [SPARK-45913][PYTHON] Make the internal attributes private from PySpark errors. URL: https://github.com/apache/spark/pull/43790 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

Re: [PR] [SPARK-45913][PYTHON] Make the internal attributes private from PySpark errors. [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on PR #43790: URL: https://github.com/apache/spark/pull/43790#issuecomment-1811678634 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-40559][PYTHON] Add applyInArrow to groupBy and cogroup [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on PR #38624: URL: https://github.com/apache/spark/pull/38624#issuecomment-1811676307 cc @ueshin and @xinrong-meng for review if you find some time. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

Re: [PR] [SPARK-45764][PYTHON][DOCS] Make code block copyable [spark]

2023-11-14 Thread via GitHub
panbingkun commented on PR #43799: URL: https://github.com/apache/spark/pull/43799#issuecomment-1811673011 > You might need to revert [8375103](https://github.com/apache/spark/commit/83751035685c84c681e88ac6e55fbcc9d6d37ef5) Done. -- This is an automated message from the Apache Git

Re: [PR] [SPARK-45764][PYTHON][DOCS] Make code block copyable [spark]

2023-11-14 Thread via GitHub
panbingkun commented on code in PR #43799: URL: https://github.com/apache/spark/pull/43799#discussion_r1393532221 ## dev/requirements.txt: ## @@ -37,6 +37,7 @@ numpydoc jinja2<3.0.0 sphinx<3.1.0 sphinx-plotly-directive +sphinx-copybutton Review Comment: Currently `sphinx-

Re: [PR] [SPARK-45813][CONNECT][PYTHON] Return the observed metrics from commands [spark]

2023-11-14 Thread via GitHub
HyukjinKwon closed pull request #43690: [SPARK-45813][CONNECT][PYTHON] Return the observed metrics from commands URL: https://github.com/apache/spark/pull/43690 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

Re: [PR] [SPARK-45813][CONNECT][PYTHON] Return the observed metrics from commands [spark]

2023-11-14 Thread via GitHub
HyukjinKwon commented on PR #43690: URL: https://github.com/apache/spark/pull/43690#issuecomment-1811671653 Merged to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-45511][SS] State Data Source - Reader [spark]

2023-11-14 Thread via GitHub
HeartSaVioR closed pull request #43425: [SPARK-45511][SS] State Data Source - Reader URL: https://github.com/apache/spark/pull/43425 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

Re: [PR] [SPARK-45511][SS] State Data Source - Reader [spark]

2023-11-14 Thread via GitHub
HeartSaVioR commented on PR #43425: URL: https://github.com/apache/spark/pull/43425#issuecomment-1811667303 Thanks all for reviewing! Merging to master. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] [SPARK-45904][SQL][CONNECT] Mode function should supports sort with order direction [spark]

2023-11-14 Thread via GitHub
beliefer commented on PR #43786: URL: https://github.com/apache/spark/pull/43786#issuecomment-1811662016 ping @MaxGekk cc @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] [SPARK-45913][PYTHON] Make the internal attributes private from PySpark errors. [spark]

2023-11-14 Thread via GitHub
itholic commented on PR #43790: URL: https://github.com/apache/spark/pull/43790#issuecomment-1811642917 CI passed. cc @ueshin @HyukjinKwon could you review an additional fix when you find some time? -- This is an automated message from the Apache Git Service. To respond to the message, pl

Re: [PR] [SPARK-45810][Python] Create Python UDTF API to stop consuming rows from the input table [spark]

2023-11-14 Thread via GitHub
dtenedor commented on code in PR #43682: URL: https://github.com/apache/spark/pull/43682#discussion_r1393418903 ## python/pyspark/sql/tests/test_udtf.py: ## @@ -2467,6 +2468,53 @@ def terminate(self): [Row(count=20, buffer="abc")], ) +def test_udtf_wi

Re: [PR] [SPARK-45810][Python] Create Python UDTF API to stop consuming rows from the input table [spark]

2023-11-14 Thread via GitHub
dtenedor commented on code in PR #43682: URL: https://github.com/apache/spark/pull/43682#discussion_r1393418317 ## python/pyspark/sql/tests/test_udtf.py: ## @@ -2467,6 +2468,53 @@ def terminate(self): [Row(count=20, buffer="abc")], ) +def test_udtf_wi

Re: [PR] [SPARK-45597][PYTHON][SQL] Support creating table using a Python data source in SQL [spark]

2023-11-14 Thread via GitHub
allisonwang-db commented on code in PR #43784: URL: https://github.com/apache/spark/pull/43784#discussion_r1393417932 ## python/pyspark/sql/tests/test_python_datasource.py: ## @@ -118,25 +118,26 @@ def reader(self, schema) -> "DataSourceReader": self.spark.dataSource.

Re: [PR] [SPARK-45927][PYTHON] Update path handling in Python data source [spark]

2023-11-14 Thread via GitHub
allisonwang-db commented on PR #43809: URL: https://github.com/apache/spark/pull/43809#issuecomment-1811529611 cc @HyukjinKwon @cloud-fan -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

[PR] [SPARK-45927][PYTHON] Update path handling in Python data source [spark]

2023-11-14 Thread via GitHub
allisonwang-db opened a new pull request, #43809: URL: https://github.com/apache/spark/pull/43809 ### What changes were proposed in this pull request? This PR updates how to handle `path` values from the `load()` method. It changes the DataSource class constructor and add `p

[PR] SPARK-45926 : Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans [spark]

2023-11-14 Thread via GitHub
ahshahid opened a new pull request, #43808: URL: https://github.com/apache/spark/pull/43808 ### What changes were proposed in this pull request? Implementing equals and hashCode in the InMemoryBatchScan and InMemoryV2FilterBatchScan so that the pushed runtime filters are taken into acco
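
A minimal sketch of the idea only, with hypothetical class and field names rather than the actual InMemoryBatchScan / InMemoryV2FilterBatchScan code: make a scan's equality depend on the runtime filters pushed into it, so that two scans with different pushed filters are not treated as interchangeable by reuse logic.

```scala
// Hypothetical stand-in types, not the actual Spark classes.
final case class PushedFilter(column: String, values: Set[String])

final class InMemoryScanSketch(val tableName: String, val readColumns: Seq[String]) {
  // Runtime filters arrive after initial planning, hence the mutable field.
  private var runtimeFilters: Seq[PushedFilter] = Seq.empty

  def pushRuntimeFilters(filters: Seq[PushedFilter]): Unit = { runtimeFilters = filters }

  override def equals(other: Any): Boolean = other match {
    case that: InMemoryScanSketch =>
      tableName == that.tableName &&
        readColumns == that.readColumns &&
        runtimeFilters == that.runtimeFilters // the addition this PR argues for
    case _ => false
  }

  override def hashCode(): Int = (tableName, readColumns, runtimeFilters).hashCode()
}
```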

Re: [PR] [SPARK-45905][SQL] Least common type between decimal types should retain integral digits first [spark]

2023-11-14 Thread via GitHub
viirya commented on code in PR #43781: URL: https://github.com/apache/spark/pull/43781#discussion_r1393385818 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -4541,6 +4541,15 @@ object SQLConf { .booleanConf .createWithDefault(fals

Re: [PR] [SPARK-45905][SQL] Least common type between decimal types should retain integral digits first [spark]

2023-11-14 Thread via GitHub
viirya commented on code in PR #43781: URL: https://github.com/apache/spark/pull/43781#discussion_r1393383832 ## docs/sql-ref-ansi-compliance.md: ## @@ -240,6 +240,25 @@ The least common type resolution is used to: - Derive the result type for expressions such as the case expre

Re: [PR] [SPARK-45597][PYTHON][SQL] Support creating table using a Python data source in SQL [spark]

2023-11-14 Thread via GitHub
allisonwang-db commented on code in PR #43784: URL: https://github.com/apache/spark/pull/43784#discussion_r1393381350 ## python/pyspark/sql/tests/test_python_datasource.py: ## @@ -118,25 +118,26 @@ def reader(self, schema) -> "DataSourceReader": self.spark.dataSource.

Re: [PR] [SPARK-45905][SQL] Least common type between decimal types should retain integral digits first [spark]

2023-11-14 Thread via GitHub
viirya commented on code in PR #43781: URL: https://github.com/apache/spark/pull/43781#discussion_r1393380697 ## docs/sql-ref-ansi-compliance.md: ## @@ -240,6 +240,25 @@ The least common type resolution is used to: - Derive the result type for expressions such as the case expre

Re: [PR] [SPARK-45905][SQL] Least common type between decimal types should retain integral digits first [spark]

2023-11-14 Thread via GitHub
viirya commented on code in PR #43781: URL: https://github.com/apache/spark/pull/43781#discussion_r1393379712 ## docs/sql-ref-ansi-compliance.md: ## @@ -240,6 +240,25 @@ The least common type resolution is used to: - Derive the result type for expressions such as the case expre

[PR] Spark 45925: Making SubqueryBroadcastExec equivalent to SubqueryAdaptiveBroadcastExec [spark]

2023-11-14 Thread via GitHub
ahshahid opened a new pull request, #43807: URL: https://github.com/apache/spark/pull/43807 ### What changes were proposed in this pull request? Implementing equals and hashCode in SubqueryBroadcastExec so that it is made equivalent to SubqueryAdaptiveBroadcastExec. During the bug testin

Re: [PR] [SPARK-45756][CORE] Support `spark.master.useAppNameAsAppId.enabled` [spark]

2023-11-14 Thread via GitHub
mridulm commented on PR #43743: URL: https://github.com/apache/spark/pull/43743#issuecomment-1811435846 Sounds good to me, thanks @dongjoon-hyun ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] [SPARK-45511][SS] State Data Source - Reader [spark]

2023-11-14 Thread via GitHub
HeartSaVioR commented on PR #43425: URL: https://github.com/apache/spark/pull/43425#issuecomment-1811428536 I'll rebase to retrigger CI and merge if everything is good. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use t

[PR] Spark 45924: Fixing the canonicalization of SubqueryAdaptiveBroadcastExec and making it equivalent with SubqueryBroadcastExec [spark]

2023-11-14 Thread via GitHub
ahshahid opened a new pull request, #43806: URL: https://github.com/apache/spark/pull/43806 ### What changes were proposed in this pull request? The canonicalization of SubqueryAdaptiveBroadcastExec now canonicalizes the `buildPlan: LogicalPlan`. SubqueryAdaptiveBroadcastExec is now
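
A generic sketch of the "canonicalize to a common form" approach described above, with hypothetical stand-in types rather than the real SubqueryBroadcastExec / SubqueryAdaptiveBroadcastExec: two different node types compare as equivalent when their canonicalized forms coincide.

```scala
// Hypothetical stand-in nodes illustrating canonicalization-based equivalence.
sealed trait SubquerySketch { def canonicalized: SubquerySketch }

final case class BroadcastSubquerySketch(exprId: Long, buildPlanDigest: String)
    extends SubquerySketch {
  // The canonical form zeroes out ids that differ between otherwise equal plans.
  override def canonicalized: SubquerySketch = copy(exprId = 0L)
}

final case class AdaptiveBroadcastSubquerySketch(exprId: Long, buildPlanDigest: String)
    extends SubquerySketch {
  // Canonicalizes to the plain broadcast form so the two node types can be
  // treated as equivalent, e.g. for subquery/exchange reuse decisions.
  override def canonicalized: SubquerySketch = BroadcastSubquerySketch(0L, buildPlanDigest)
}

object CanonicalizationSketch extends App {
  val adaptive = AdaptiveBroadcastSubquerySketch(exprId = 1L, buildPlanDigest = "scan(t)")
  val plain = BroadcastSubquerySketch(exprId = 2L, buildPlanDigest = "scan(t)")
  // Structural equality of the canonical forms: prints true.
  println(adaptive.canonicalized == plain.canonicalized)
}
```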
