Re: [PR] [SPARK-46677][CONNECT][FOLLOWUPS] Fix `dataset.col("*")` in Scala Client [spark]

2024-01-16 Thread via GitHub
zhengruifeng commented on PR #44748: URL: https://github.com/apache/spark/pull/44748#issuecomment-1893240007 There is an issue on `count(df.col("*"))`, I am going to fix it in a separate PR first

Re: [PR] [WIP][SPARK-46677][CONNECT][FOLLOWUPS] Fix `dataset.col("*")` in Scala Client [spark]

2024-01-16 Thread via GitHub
LuciferYang commented on PR #44748: URL: https://github.com/apache/spark/pull/44748#issuecomment-1893248701 maybe we should use `[FOLLOWUP]` in pr title ? `[FOLLOWUPS]` does not exist before `[SPARK-43611][PS][CONNECT][TESTS][FOLLOWUPS] Enable more tests` ...

Re: [PR] [SPARK-46717][CORE] Simplify `ReloadingX509TrustManager` by the exit operation only depend on interrupt thread. [spark]

2024-01-16 Thread via GitHub
beliefer commented on PR #44720: URL: https://github.com/apache/spark/pull/44720#issuecomment-1893264789 @dongjoon-hyun @mridulm @srowen @LuciferYang Thank you!

[PR] [SPARK-46677][CONNECT][FOLLOWUP] Convert `count(df["*"])` to `count(1)` on client side [spark]

2024-01-16 Thread via GitHub
zhengruifeng opened a new pull request, #44752: URL: https://github.com/apache/spark/pull/44752 ### What changes were proposed in this pull request? Before https://github.com/apache/spark/pull/44689, `df["*"]` and `sf.col("*")` were both converted to `UnresolvedStar`, and then `Count(Unreso
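For illustration, a minimal sketch of the case being fixed, using the Scala client API and assuming an active `spark` session (names and data are illustrative, not the PR's actual test):

```scala
// Counting all rows via the star column from the Connect Scala client.
// Per the PR title, the client now converts count(df.col("*")) into
// count(1) before the plan is sent, instead of shipping Count(UnresolvedStar).
import org.apache.spark.sql.functions.count

val df = spark.range(3)
df.select(count(df.col("*"))).show()  // expected to report 3 rows counted
```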

[PR] [SPARK-46732][CONNECT]Make Subquery/Broadcast thread work with Connect's artifact management [spark]

2024-01-16 Thread via GitHub
xieshuaihu opened a new pull request, #44753: URL: https://github.com/apache/spark/pull/44753 ### What changes were proposed in this pull request? Similar to SPARK-44794, propagate JobArtifactState to the broadcast/subquery thread. This is an example: ```scala val add1
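The example in the description is truncated in this preview. A hypothetical sketch of the kind of query affected (a client-registered UDF evaluated inside a broadcast join, which Spark builds on a separate broadcast thread), assuming an active Connect `spark` session:

```scala
// Hypothetical illustration, not the snippet from the PR description.
// The UDF `add1` is an artifact uploaded by the Connect client; the broadcast
// exchange for `small` is built on a broadcast thread, so that thread must
// carry the client's JobArtifactState to be able to resolve the UDF.
import org.apache.spark.sql.functions.{broadcast, col, udf}

val add1 = udf((i: Long) => i + 1)
val small = spark.range(10).withColumn("v", add1(col("id")))
val large = spark.range(1000000L)

large.join(broadcast(small), "id").count()
```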

Re: [PR] [WIP][CORE] Simplify the ContextCleaner|BlockManager by the exit operation only depend on interrupt thread. [spark]

2024-01-16 Thread via GitHub
beliefer commented on PR #44732: URL: https://github.com/apache/spark/pull/44732#issuecomment-1893296284 > Unlike `ReloadingX509TrustManager`, here the `InterruptedException` is caught and ignored in some of the downstream methods: so we can't rely on that pattern here. `ContextClean
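For readers following the thread, a rough sketch of the interrupt-driven shutdown pattern being discussed (this is not Spark's actual `ReloadingX509TrustManager` or `ContextCleaner` code):

```scala
// The worker thread exits only when it is interrupted, so no separate
// "stopped" flag is needed. This only works if nothing downstream catches
// and swallows the InterruptedException (the caveat raised above).
class PeriodicWorker(intervalMs: Long)(task: () => Unit) {
  private val thread = new Thread(() => {
    try {
      while (true) {
        Thread.sleep(intervalMs)  // throws InterruptedException when stop() is called
        task()
      }
    } catch {
      case _: InterruptedException => // interrupted: fall through and let the thread die
    }
  }, "periodic-worker")
  thread.setDaemon(true)

  def start(): Unit = thread.start()
  def stop(): Unit = { thread.interrupt(); thread.join() }
}
```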

Re: [PR] [SPARK-46727][SQL] Port `classifyException()` in JDBC dialects on error classes [spark]

2024-01-16 Thread via GitHub
MaxGekk commented on PR #44739: URL: https://github.com/apache/spark/pull/44739#issuecomment-1893301272 cc @milastdbx

Re: [PR] [SPARK-46729][DOCS] Withdraw the recommendation of using Concurrent Mark Sweep (CMS) Garbage Collector [spark]

2024-01-16 Thread via GitHub
yaooqinn closed pull request #44746: [SPARK-46729][DOCS] Withdraw the recommendation of using Concurrent Mark Sweep (CMS) Garbage Collector URL: https://github.com/apache/spark/pull/44746

Re: [PR] [SPARK-46729][DOCS] Withdraw the recommendation of using Concurrent Mark Sweep (CMS) Garbage Collector [spark]

2024-01-16 Thread via GitHub
yaooqinn commented on PR #44746: URL: https://github.com/apache/spark/pull/44746#issuecomment-1893390977 Thank you @LuciferYang @beliefer, merged to master

Re: [PR] [SPARK-46727][SQL] Port `classifyException()` in JDBC dialects on error classes [spark]

2024-01-16 Thread via GitHub
beliefer commented on code in PR #44739: URL: https://github.com/apache/spark/pull/44739#discussion_r1453174209 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala: ## @@ -224,25 +228,21 @@ private[sql] object H2Dialect extends JdbcDialect { throw n

Re: [PR] [SPARK-46677][CONNECT][FOLLOWUP] Convert `count(df["*"])` to `count(1)` on client side [spark]

2024-01-16 Thread via GitHub
zhengruifeng closed pull request #44752: [SPARK-46677][CONNECT][FOLLOWUP] Convert `count(df["*"])` to `count(1)` on client side URL: https://github.com/apache/spark/pull/44752

Re: [PR] [SPARK-46677][CONNECT][FOLLOWUP] Convert `count(df["*"])` to `count(1)` on client side [spark]

2024-01-16 Thread via GitHub
zhengruifeng commented on PR #44752: URL: https://github.com/apache/spark/pull/44752#issuecomment-1893422052 thanks, merged to master

Re: [PR] [SPARK-46727][SQL] Port `classifyException()` in JDBC dialects on error classes [spark]

2024-01-16 Thread via GitHub
MaxGekk commented on code in PR #44739: URL: https://github.com/apache/spark/pull/44739#discussion_r145321 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala: ## @@ -224,25 +228,21 @@ private[sql] object H2Dialect extends JdbcDialect { throw ne

Re: [PR] [SPARK-45527][CORE] Use fraction to do the resource calculation [spark]

2024-01-16 Thread via GitHub
wbo4958 commented on PR #44690: URL: https://github.com/apache/spark/pull/44690#issuecomment-1893485651 Hi @tgravescs, Could you help to review it? Thx very much.

Re: [PR] [SPARK-46727][SQL] Port `classifyException()` in JDBC dialects on error classes [spark]

2024-01-16 Thread via GitHub
beliefer commented on code in PR #44739: URL: https://github.com/apache/spark/pull/44739#discussion_r1453267761 ## sql/core/src/main/scala/org/apache/spark/sql/jdbc/H2Dialect.scala: ## @@ -224,25 +228,21 @@ private[sql] object H2Dialect extends JdbcDialect { throw n

Re: [PR] [SPARK-46677][CONNECT][FOLLOWUP] Fix `dataset.col("*")` in Scala Client [spark]

2024-01-16 Thread via GitHub
zhengruifeng closed pull request #44748: [SPARK-46677][CONNECT][FOLLOWUP] Fix `dataset.col("*")` in Scala Client URL: https://github.com/apache/spark/pull/44748

Re: [PR] [SPARK-46677][CONNECT][FOLLOWUP] Fix `dataset.col("*")` in Scala Client [spark]

2024-01-16 Thread via GitHub
zhengruifeng commented on PR #44748: URL: https://github.com/apache/spark/pull/44748#issuecomment-1893617939 merged to master, thank you!

Re: [PR] [SPARK-46734][INFRA] Combine pip installations for lint and doc respectively [spark]

2024-01-16 Thread via GitHub
zhengruifeng commented on code in PR #44754: URL: https://github.com/apache/spark/pull/44754#discussion_r1453368381 ## .github/workflows/build_and_test.yml: ## @@ -751,13 +752,16 @@ jobs: Rscript -e "install.packages(c('devtools', 'testthat', 'knitr', 'rmarkdown', 'mar

Re: [PR] [SPARK-46733][CORE] Simplify the BlockManager by the exit operation only depend on interrupt thread. [spark]

2024-01-16 Thread via GitHub
beliefer commented on PR #44732: URL: https://github.com/apache/spark/pull/44732#issuecomment-1893757819 > Unlike `ReloadingX509TrustManager`, here the `InterruptedException` is caught and ignored in some of the downstream methods: so we can't rely on that pattern here. I reverted th

Re: [PR] [SPARK-46590][SQL] Fix coalesce failed with BroadcastJoin and Union [spark]

2024-01-16 Thread via GitHub
jackylee-ch commented on code in PR #44661: URL: https://github.com/apache/spark/pull/44661#discussion_r1453489448 ## sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/CoalesceShufflePartitions.scala: ## @@ -146,13 +147,15 @@ case class CoalesceShufflePartitions(se

Re: [PR] [SPARK-43919][SQL] Extract JSON functionality out of Row [spark]

2024-01-16 Thread via GitHub
tfinn-ias commented on PR #41425: URL: https://github.com/apache/spark/pull/41425#issuecomment-1893858562 Hi, your current public documentation for the Java API lists these methods as available with no disclaimers about them being private or unstable: https://spark.apache.org/docs/3.4

Re: [PR] [SPARK-43919][SQL] Extract JSON functionality out of Row [spark]

2024-01-16 Thread via GitHub
hvanhovell commented on PR #41425: URL: https://github.com/apache/spark/pull/41425#issuecomment-1893877733 @tfinn-ias what is the question? We put the toJson functionality back in a later PR.

Re: [PR] [SPARK-46727][SQL] Port `classifyException()` in JDBC dialects on error classes [spark]

2024-01-16 Thread via GitHub
MaxGekk commented on PR #44739: URL: https://github.com/apache/spark/pull/44739#issuecomment-1893894442 Merging to master. Thank you, @beliefer and @cloud-fan for review.

Re: [PR] [SPARK-46727][SQL] Port `classifyException()` in JDBC dialects on error classes [spark]

2024-01-16 Thread via GitHub
MaxGekk closed pull request #44739: [SPARK-46727][SQL] Port `classifyException()` in JDBC dialects on error classes URL: https://github.com/apache/spark/pull/44739

Re: [PR] [SPARK-43919][SQL] Extract JSON functionality out of Row [spark]

2024-01-16 Thread via GitHub
tfinn-ias commented on PR #41425: URL: https://github.com/apache/spark/pull/41425#issuecomment-1893907566 In 3.4.0 the jsonValue() was callable from Java, and the Javadoc did not indicate it was unstable or unavailable. In 3.4.1 it fails at runtime with a method not found exception, a

Re: [PR] [WIP][SPARK-24815] [CORE] Trigger Interval based DRA for Structured Streaming [spark]

2024-01-16 Thread via GitHub
pkotikalapudi commented on PR #42352: URL: https://github.com/apache/spark/pull/42352#issuecomment-1893936508 @mentasm , yeah we have an [email thread](https://lists.apache.org/thread/9yx0jnk9h1234joymwlzfx2gh2m8b9bo) going for a long time. Mich was gracious enough to do a review, but sugge

Re: [PR] [SPARK-46734][INFRA] Combine pip installations for lint and doc respectively [spark]

2024-01-16 Thread via GitHub
nchammas commented on code in PR #44754: URL: https://github.com/apache/spark/pull/44754#discussion_r1453598396 ## .github/workflows/build_and_test.yml: ## @@ -751,13 +752,16 @@ jobs: Rscript -e "install.packages(c('devtools', 'testthat', 'knitr', 'rmarkdown', 'markdow

Re: [PR] [SPARK-46638][Python] Create Python UDTF API to acquire execution memory for function evaluation [spark]

2024-01-16 Thread via GitHub
nickstanishadb commented on code in PR #44678: URL: https://github.com/apache/spark/pull/44678#discussion_r1453648020 ## python/pyspark/sql/udtf.py: ## @@ -133,12 +133,28 @@ class AnalyzeResult: If non-empty, this is a sequence of expressions that the UDTF is specifyin

[PR] [SPARK-46395][CORE] Assign Spark configs to groups for use in documentation [spark]

2024-01-16 Thread via GitHub
nchammas opened a new pull request, #44755: URL: https://github.com/apache/spark/pull/44755 ### What changes were proposed in this pull request? Enable Spark configs to be assigned to documentation groups. These groups will be used to automatically build config tables for display in o

[PR] [SPARK-46395][DOCS] Assign Spark configs to groups for use in documentation [spark]

2024-01-16 Thread via GitHub
nchammas opened a new pull request, #44756: URL: https://github.com/apache/spark/pull/44756 ### What changes were proposed in this pull request? Enable Spark configs to be assigned to documentation groups. These groups will be used to automatically build config tables for display in o

Re: [PR] [SPARK-46395][DOCS] Assign Spark configs to groups for use in documentation [spark]

2024-01-16 Thread via GitHub
nchammas closed pull request #44300: [SPARK-46395][DOCS] Assign Spark configs to groups for use in documentation URL: https://github.com/apache/spark/pull/44300

Re: [PR] [SPARK-46395][DOCS] Assign Spark configs to groups for use in documentation [spark]

2024-01-16 Thread via GitHub
nchammas commented on PR #44300: URL: https://github.com/apache/spark/pull/44300#issuecomment-1894096892 Silence on a PR usually means there is something wrong with it, so I am proactively trying to figure out how to move this idea forward. The possibilities I am considering are: 1

[PR] [WIP][SQL] Add the error class `UNSUPPORTED_CALL` [spark]

2024-01-16 Thread via GitHub
MaxGekk opened a new pull request, #44757: URL: https://github.com/apache/spark/pull/44757 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? Yes, it can if user's c

Re: [PR] [WIP][SPARK-24815] [CORE] Trigger Interval based DRA for Structured Streaming [spark]

2024-01-16 Thread via GitHub
krymitch commented on PR #42352: URL: https://github.com/apache/spark/pull/42352#issuecomment-1894134932 Hi @mridulm. We appreciate your support on this. DRA is essential to auto scaling up and back down. Can you please confirm if this proposal was ever dropped in the dev list for discussio

Re: [PR] [WIP][SQL] Add the error class `UNSUPPORTED_CALL` [spark]

2024-01-16 Thread via GitHub
MaxGekk commented on code in PR #44757: URL: https://github.com/apache/spark/pull/44757#discussion_r1453709810 ## common/utils/src/main/scala/org/apache/spark/SparkException.scala: ## @@ -228,6 +228,19 @@ private[spark] class SparkUnsupportedOperationException private( overr

Re: [PR] [SPARK-46717][CORE] Simplify `ReloadingX509TrustManager` by the exit operation only depend on interrupt thread. [spark]

2024-01-16 Thread via GitHub
hasnain-db commented on PR #44720: URL: https://github.com/apache/spark/pull/44720#issuecomment-1894154071 thanks for doing this @beliefer !

[PR] [SPARK-46725] Add DAYNAME function [spark]

2024-01-16 Thread via GitHub
PetarVasiljevic-DB opened a new pull request, #44758: URL: https://github.com/apache/spark/pull/44758 ### What changes were proposed in this pull request? Added DAYNAME function that returns the three-letter abbreviated day name for the specified date to: - Scala API - Python A
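A hedged usage sketch, assuming the function is also exposed in SQL under the name `dayname` (the exact API surface may differ in the final PR):

```scala
// 2024-01-16 is a Tuesday, so the expected three-letter result is "Tue".
spark.sql("SELECT dayname(DATE'2024-01-16')").show()
```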

Re: [PR] [SPARK-46094] Support Executor JVM Profiling [spark]

2024-01-16 Thread via GitHub
parthchandra commented on PR #44021: URL: https://github.com/apache/spark/pull/44021#issuecomment-1894323461 Thank you @dongjoon-hyun @mridulm

[PR] [SPARK-46735][PYTHON][TESTS] `pyspark.sql.tests.test_group` should skip Pandas tests if not available [spark]

2024-01-16 Thread via GitHub
dongjoon-hyun opened a new pull request, #44759: URL: https://github.com/apache/spark/pull/44759 ### What changes were proposed in this pull request? This PR aims to skip `Pandas`-related tests in `pyspark.sql.tests.test_group` if `Pandas` is not installed. ### Why are the chan

Re: [PR] [SPARK-46686][PYTHON][CONNECT] Basic support of SparkSession based Python UDF profiler [spark]

2024-01-16 Thread via GitHub
xinrong-meng commented on PR #44697: URL: https://github.com/apache/spark/pull/44697#issuecomment-1894330650 LGTM after the conflicts are resolved, thanks for the nice work!

Re: [PR] [SPARK-46735][PYTHON][TESTS] `pyspark.sql.tests.test_group` should skip Pandas tests if not available [spark]

2024-01-16 Thread via GitHub
dongjoon-hyun commented on PR #44759: URL: https://github.com/apache/spark/pull/44759#issuecomment-1894336202 cc @xinrong-meng, @zhengruifeng, @HyukjinKwon

Re: [PR] [SPARK-46686][PYTHON][CONNECT] Basic support of SparkSession based Python UDF profiler [spark]

2024-01-16 Thread via GitHub
xinrong-meng commented on code in PR #44697: URL: https://github.com/apache/spark/pull/44697#discussion_r1453876420 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2927,6 +2927,17 @@ object SQLConf { // show full stacktrace in tests but

Re: [PR] [SPARK-46686][PYTHON][CONNECT] Basic support of SparkSession based Python UDF profiler [spark]

2024-01-16 Thread via GitHub
xinrong-meng commented on code in PR #44697: URL: https://github.com/apache/spark/pull/44697#discussion_r1453896168 ## python/pyspark/sql/profiler.py: ## @@ -0,0 +1,176 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements.

Re: [PR] [SPARK-46663][PYTHON] Disable memory profiler for pandas UDFs with iterators [spark]

2024-01-16 Thread via GitHub
xinrong-meng closed pull request #44668: [SPARK-46663][PYTHON] Disable memory profiler for pandas UDFs with iterators URL: https://github.com/apache/spark/pull/44668

Re: [PR] [SPARK-46663][PYTHON] Disable memory profiler for pandas UDFs with iterators [spark]

2024-01-16 Thread via GitHub
xinrong-meng commented on PR #44668: URL: https://github.com/apache/spark/pull/44668#issuecomment-1894370380 Thanks all! Merged to master, will do manual cherry-pick for branch-3.5

Re: [PR] [SPARK-46686][PYTHON][CONNECT] Basic support of SparkSession based Python UDF profiler [spark]

2024-01-16 Thread via GitHub
ueshin commented on code in PR #44697: URL: https://github.com/apache/spark/pull/44697#discussion_r1453948329 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2927,6 +2927,17 @@ object SQLConf { // show full stacktrace in tests but hide i

Re: [PR] [SPARK-46735][PYTHON][TESTS] `pyspark.sql.tests.test_group` should skip Pandas/PyArrow tests if not available [spark]

2024-01-16 Thread via GitHub
dongjoon-hyun commented on PR #44759: URL: https://github.com/apache/spark/pull/44759#issuecomment-1894487512 Could you review this PR when you have some time, @xinrong-meng ?

Re: [PR] [SPARK-46395][DOCS] Assign Spark configs to groups for use in documentation [spark]

2024-01-16 Thread via GitHub
bjornjorgensen commented on PR #44300: URL: https://github.com/apache/spark/pull/44300#issuecomment-1894495683 @nchammas there seems to be a lot (some) of PRs that have been approved now, but have not been merged to master. So I think it has something to do with holidays and so on..

Re: [PR] [WIP][SPARK-24815] [CORE] Trigger Interval based DRA for Structured Streaming [spark]

2024-01-16 Thread via GitHub
mentasm commented on PR #42352: URL: https://github.com/apache/spark/pull/42352#issuecomment-1894518160 I will happily add my experiences to that email thread as a consumer of SSS if I can work out how. Our prod environment runs around 40 SSS apps and traffic is determined by banking t

Re: [PR] [SPARK-46707][SQL] Added throwable field to expressions to improve predicate pushdown [spark]

2024-01-16 Thread via GitHub
kelvinjian-db commented on code in PR #44716: URL: https://github.com/apache/spark/pull/44716#discussion_r1454048236 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala: ## @@ -2983,6 +2983,9 @@ case class Sequence( override d

Re: [PR] [SPARK-46629] Fix for STRUCT type DDL not picking up nullability and comment [spark]

2024-01-16 Thread via GitHub
vitaliili-db commented on PR #44644: URL: https://github.com/apache/spark/pull/44644#issuecomment-1894530476 @MaxGekk changed the title and added test. The #44644 seems to be similar to my initial PR, I leave it for you to decide which one we should merge.

Re: [PR] [SPARK-46735][PYTHON][TESTS] `pyspark.sql.tests.test_group` should skip Pandas/PyArrow tests if not available [spark]

2024-01-16 Thread via GitHub
dongjoon-hyun commented on PR #44759: URL: https://github.com/apache/spark/pull/44759#issuecomment-1894529741 Could you review this Python test PR, @viirya ?

Re: [PR] [SPARK-46629] Fix for STRUCT type DDL not picking up nullability and comment [spark]

2024-01-16 Thread via GitHub
vitaliili-db commented on code in PR #44644: URL: https://github.com/apache/spark/pull/44644#discussion_r1454047848 ## connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufSerdeSuite.scala: ## @@ -126,14 +122,6 @@ class ProtobufSerdeSuite extends SharedSparkSe

Re: [PR] [SPARK-46735][PYTHON][TESTS] `pyspark.sql.tests.test_group` should skip Pandas/PyArrow tests if not available [spark]

2024-01-16 Thread via GitHub
dongjoon-hyun commented on PR #44759: URL: https://github.com/apache/spark/pull/44759#issuecomment-1894584317 Thank you, @xinrong-meng and @viirya . Merged to master.

Re: [PR] [SPARK-46735][PYTHON][TESTS] `pyspark.sql.tests.test_group` should skip Pandas/PyArrow tests if not available [spark]

2024-01-16 Thread via GitHub
xinrong-meng commented on PR #44759: URL: https://github.com/apache/spark/pull/44759#issuecomment-1894584582 Thank you @dongjoon-hyun for catching that and for the fix! LGTM

Re: [PR] [SPARK-46735][PYTHON][TESTS] `pyspark.sql.tests.test_group` should skip Pandas/PyArrow tests if not available [spark]

2024-01-16 Thread via GitHub
dongjoon-hyun closed pull request #44759: [SPARK-46735][PYTHON][TESTS] `pyspark.sql.tests.test_group` should skip Pandas/PyArrow tests if not available URL: https://github.com/apache/spark/pull/44759

Re: [PR] [SPARK-46735][PYTHON][TESTS] `pyspark.sql.tests.test_group` should skip Pandas/PyArrow tests if not available [spark]

2024-01-16 Thread via GitHub
xinrong-meng commented on PR #44759: URL: https://github.com/apache/spark/pull/44759#issuecomment-1894603816 I wanted to add that those tests do not necessarily rely on Pandas/PyArrow, but the "assertDataFrameEqual" utility used in the tests does. I'll file a follow-up PR to adjust that. CC

Re: [PR] [SPARK-46686][PYTHON][CONNECT] Basic support of SparkSession based Python UDF profiler [spark]

2024-01-16 Thread via GitHub
xinrong-meng commented on code in PR #44697: URL: https://github.com/apache/spark/pull/44697#discussion_r1454127640 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2927,6 +2927,17 @@ object SQLConf { // show full stacktrace in tests but

Re: [PR] [SPARK-46686][PYTHON][CONNECT] Basic support of SparkSession based Python UDF profiler [spark]

2024-01-16 Thread via GitHub
ueshin commented on code in PR #44697: URL: https://github.com/apache/spark/pull/44697#discussion_r1454150862 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2927,6 +2927,17 @@ object SQLConf { // show full stacktrace in tests but hide i

Re: [PR] [SPARK-46730][PYTHON][DOCS] Refine docstring of `str_to_map/map_filter/map_zip_with` [spark]

2024-01-16 Thread via GitHub
HyukjinKwon commented on PR #44747: URL: https://github.com/apache/spark/pull/44747#issuecomment-1894677647 Merged to master.

Re: [PR] [SPARK-46730][PYTHON][DOCS] Refine docstring of `str_to_map/map_filter/map_zip_with` [spark]

2024-01-16 Thread via GitHub
HyukjinKwon closed pull request #44747: [SPARK-46730][PYTHON][DOCS] Refine docstring of `str_to_map/map_filter/map_zip_with` URL: https://github.com/apache/spark/pull/44747

Re: [PR] [SPARK-46732][CONNECT]Make Subquery/Broadcast thread work with Connect's artifact management [spark]

2024-01-16 Thread via GitHub
HyukjinKwon commented on PR #44753: URL: https://github.com/apache/spark/pull/44753#issuecomment-1894682878 Merged to master.

Re: [PR] [SPARK-46732][CONNECT]Make Subquery/Broadcast thread work with Connect's artifact management [spark]

2024-01-16 Thread via GitHub
HyukjinKwon closed pull request #44753: [SPARK-46732][CONNECT]Make Subquery/Broadcast thread work with Connect's artifact management URL: https://github.com/apache/spark/pull/44753

Re: [PR] [SPARK-46638][Python] Create Python UDTF API to acquire execution memory for function evaluation [spark]

2024-01-16 Thread via GitHub
dtenedor commented on code in PR #44678: URL: https://github.com/apache/spark/pull/44678#discussion_r1454199865 ## sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvalPythonUDTFExec.scala: ## @@ -137,4 +150,46 @@ trait EvalPythonUDTFExec extends UnaryExecNode {

Re: [PR] [SPARK-46732][CONNECT]Make Subquery/Broadcast thread work with Connect's artifact management [spark]

2024-01-16 Thread via GitHub
HyukjinKwon commented on PR #44753: URL: https://github.com/apache/spark/pull/44753#issuecomment-1894686193 It has a conflict in branch-3.5. Mind creating a backporting PR please?

Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

2024-01-16 Thread via GitHub
itholic commented on PR #44745: URL: https://github.com/apache/spark/pull/44745#issuecomment-1894686764 > Could you do the same things for the other packages like PyArrow, @itholic ? Sure. I just confirmed that other packages work as expected without any changes unlike Pandas (e.g. P

Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

2024-01-16 Thread via GitHub
HyukjinKwon commented on PR #44745: URL: https://github.com/apache/spark/pull/44745#issuecomment-1894687602 @itholic can you actually check why this happens only in pandas though?

Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

2024-01-16 Thread via GitHub
HyukjinKwon commented on PR #44745: URL: https://github.com/apache/spark/pull/44745#issuecomment-1894688648 My concern is that this is sort of a hacky band-aid fix. It is a bit weird that we do this only for pandas without knowing what's exactly going on.

Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

2024-01-16 Thread via GitHub
itholic commented on PR #44745: URL: https://github.com/apache/spark/pull/44745#issuecomment-1894691125 I roughly suspect that this happened due to the same package names in our project here and there (such as `pyspark.pandas`, `pyspark.sql.pandas`), so the namespace conflicts issue occur f

Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

2024-01-16 Thread via GitHub
HyukjinKwon commented on PR #44745: URL: https://github.com/apache/spark/pull/44745#issuecomment-1894705810 > I roughly suspect that this happened due to the same package names in our project here and there (such as pyspark.pandas, pyspark.sql.pandas), so the namespace conflicts issue occur

Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

2024-01-16 Thread via GitHub
itholic commented on PR #44745: URL: https://github.com/apache/spark/pull/44745#issuecomment-1894707004 > It'd be great if we can at least googling and it only happens in pandas before merging this. Yeah, I googled when submitting this PR, but unfortunately couldn't figure out any

Re: [PR] [SPARK-46735][PYTHON][TESTS] `pyspark.sql.tests.test_group` should skip Pandas/PyArrow tests if not available [spark]

2024-01-16 Thread via GitHub
zhengruifeng commented on PR #44759: URL: https://github.com/apache/spark/pull/44759#issuecomment-1894721157 Late LGTM

Re: [PR] [SPARK-46686][PYTHON][CONNECT] Basic support of SparkSession based Python UDF profiler [spark]

2024-01-16 Thread via GitHub
xinrong-meng commented on code in PR #44697: URL: https://github.com/apache/spark/pull/44697#discussion_r1454242920 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2927,6 +2927,17 @@ object SQLConf { // show full stacktrace in tests but

Re: [PR] [WIP][SPARK-43221][CORE] the BlockManager with the persisted block is preferred [spark]

2024-01-16 Thread via GitHub
github-actions[bot] closed pull request #40883: [WIP][SPARK-43221][CORE] the BlockManager with the persisted block is preferred URL: https://github.com/apache/spark/pull/40883

Re: [PR] [SPARK-45369][SQL] Push down limit through generate [spark]

2024-01-16 Thread via GitHub
github-actions[bot] closed pull request #43167: [SPARK-45369][SQL] Push down limit through generate URL: https://github.com/apache/spark/pull/43167

[PR] [SPARK-46737][SQL][TESTS] Use the default ORC compression in OrcReadBenchmark [spark]

2024-01-16 Thread via GitHub
dongjoon-hyun opened a new pull request, #44761: URL: https://github.com/apache/spark/pull/44761 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this

Re: [PR] [SPARK-46737][SQL][TESTS] Use the default ORC compression in OrcReadBenchmark [spark]

2024-01-16 Thread via GitHub
dongjoon-hyun commented on PR #44761: URL: https://github.com/apache/spark/pull/44761#issuecomment-1894742540 I'll convert this into a normal PR after adding the benchmark result. Thank you, @HyukjinKwon !

Re: [PR] [SPARK-46686][PYTHON][CONNECT] Basic support of SparkSession based Python UDF profiler [spark]

2024-01-16 Thread via GitHub
HyukjinKwon commented on code in PR #44697: URL: https://github.com/apache/spark/pull/44697#discussion_r1454293613 ## sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala: ## @@ -2927,6 +2927,17 @@ object SQLConf { // show full stacktrace in tests but h

[PR] [SPARK-46715][INFRA][3.4] Pin `sphinxcontrib-*` [spark]

2024-01-16 Thread via GitHub
zhengruifeng opened a new pull request, #44762: URL: https://github.com/apache/spark/pull/44762 ### What changes were proposed in this pull request? Pin `sphinxcontrib-*` and other deps for doc ### Why are the changes needed? to fix CI ### Does this PR introduce _a

Re: [PR] [SPARK-46638][Python] Create Python UDTF API to acquire execution memory for function evaluation [spark]

2024-01-16 Thread via GitHub
ueshin commented on code in PR #44678: URL: https://github.com/apache/spark/pull/44678#discussion_r1454267605 ## python/pyspark/sql/udtf.py: ## @@ -133,12 +133,28 @@ class AnalyzeResult: If non-empty, this is a sequence of expressions that the UDTF is specifying for Ca

Re: [PR] [SPARK-46474][INFRA] Upgrade upload-artifact action to v4 [spark]

2024-01-16 Thread via GitHub
panbingkun commented on PR #44728: URL: https://github.com/apache/spark/pull/44728#issuecomment-1894805910 cc @HyukjinKwon @dongjoon-hyun

[PR] [SPARK-46732][CONNECT][3.5]Make Subquery/Broadcast thread work with Connect's artifact management [spark]

2024-01-16 Thread via GitHub
xieshuaihu opened a new pull request, #44763: URL: https://github.com/apache/spark/pull/44763 ### What changes were proposed in this pull request? Similar to SPARK-44794, propagate JobArtifactState to the broadcast/subquery thread. This is an example: ```scala val add1

Re: [PR] [SPARK-46732][CONNECT][3.5]Make Subquery/Broadcast thread work with Connect's artifact management [spark]

2024-01-16 Thread via GitHub
xieshuaihu commented on PR #44763: URL: https://github.com/apache/spark/pull/44763#issuecomment-1894811373 @HyukjinKwon

Re: [PR] [SPARK-46732][CONNECT]Make Subquery/Broadcast thread work with Connect's artifact management [spark]

2024-01-16 Thread via GitHub
xieshuaihu commented on PR #44753: URL: https://github.com/apache/spark/pull/44753#issuecomment-1894811179 @HyukjinKwon Thanks. And a new backport PR has been created.

Re: [PR] [SPARK-45433][SQL] Fix CSV/JSON schema inference when timestamps do not match specified timestampFormat [spark]

2024-01-16 Thread via GitHub
cloud-fan commented on code in PR #43243: URL: https://github.com/apache/spark/pull/43243#discussion_r1454349106 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVInferSchema.scala: ## @@ -202,8 +202,11 @@ class CSVInferSchema(val options: CSVOptions) extends

Re: [PR] [SPARK-46734][INFRA] Combine pip installations for lint and doc respectively [spark]

2024-01-16 Thread via GitHub
zhengruifeng commented on code in PR #44754: URL: https://github.com/apache/spark/pull/44754#discussion_r1454370205 ## .github/workflows/build_and_test.yml: ## @@ -702,8 +702,9 @@ jobs: - name: Install Python linter dependencies if: inputs.branch != 'branch-3.4' && i

Re: [PR] [SPARK-46474][INFRA] Upgrade upload-artifact action to v4 [spark]

2024-01-16 Thread via GitHub
HyukjinKwon commented on PR #44728: URL: https://github.com/apache/spark/pull/44728#issuecomment-1894834993 Merged to master.

Re: [PR] [SPARK-46474][INFRA] Upgrade upload-artifact action to v4 [spark]

2024-01-16 Thread via GitHub
HyukjinKwon closed pull request #44728: [SPARK-46474][INFRA] Upgrade upload-artifact action to v4 URL: https://github.com/apache/spark/pull/44728

Re: [PR] [SPARK-46474][INFRA] Upgrade upload-artifact action to v4 [spark]

2024-01-16 Thread via GitHub
panbingkun commented on PR #44728: URL: https://github.com/apache/spark/pull/44728#issuecomment-1894844737 Let's continue to observe. Additionally, I am trying to fix the `pypy3 test`, which involves a lot of files.

Re: [PR] [SPARK-46737][SQL][TESTS] Use the default ORC compression in OrcReadBenchmark [spark]

2024-01-16 Thread via GitHub
dongjoon-hyun closed pull request #44761: [SPARK-46737][SQL][TESTS] Use the default ORC compression in OrcReadBenchmark URL: https://github.com/apache/spark/pull/44761

Re: [PR] [SPARK-46737][SQL][TESTS] Use the default ORC compression in OrcReadBenchmark [spark]

2024-01-16 Thread via GitHub
dongjoon-hyun commented on PR #44761: URL: https://github.com/apache/spark/pull/44761#issuecomment-1894844415 > @dongjoon-hyun Just in case, the benchmark belongs to hive (at least it is in `sql/hive`). Which codec does hive use, and has it switched to zstandard already? Ah, you're r

Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

2024-01-16 Thread via GitHub
itholic commented on PR #44745: URL: https://github.com/apache/spark/pull/44745#issuecomment-1894850179 It seems like if there are extension packages that use parts of the package we're trying to remove, `pip uninstall` adds those dependencies to the `Would not remove` list and doesn't actu

Re: [PR] [SPARK-46612][SQL] Do not convert array type string retrieved from jdbc driver [spark]

2024-01-16 Thread via GitHub
yaooqinn closed pull request #44459: [SPARK-46612][SQL] Do not convert array type string retrieved from jdbc driver URL: https://github.com/apache/spark/pull/44459

Re: [PR] [SPARK-46728][PYTHON] Check Pandas installation properly [spark]

2024-01-16 Thread via GitHub
itholic commented on PR #44745: URL: https://github.com/apache/spark/pull/44745#issuecomment-1894859579 Updated PR description and comment accordingly.
