Re: [PR] [SPARK-46547][SS] Swallow non-fatal exception in maintenance task to avoid deadlock between maintenance thread and streaming aggregation operator [spark]

2024-01-10 Thread via GitHub
HeartSaVioR commented on PR #44542: URL: https://github.com/apache/spark/pull/44542#issuecomment-1884371290 @rangadi Please read through my comments in above. Here are links for you: * https://github.com/apache/spark/pull/44542#issuecomment-1882621196 * https://github.com/ap

Re: [PR] [SPARK-46474][INFRA] Upgrade upload-artifact action to v4 [spark]

2024-01-10 Thread via GitHub
panbingkun commented on code in PR #44662: URL: https://github.com/apache/spark/pull/44662#discussion_r1446954812 ## .github/workflows/build_and_test.yml: ## @@ -468,15 +468,15 @@ jobs: name: PySpark - name: Upload test results to report if: always() -

Re: [PR] [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` [spark]

2024-01-10 Thread via GitHub
LuciferYang closed pull request #44639: [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` URL: https://github.com/apache/spark/pull/44639 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` [spark]

2024-01-10 Thread via GitHub
LuciferYang commented on PR #44639: URL: https://github.com/apache/spark/pull/44639#issuecomment-1884419945 Merged into master. Thanks @HyukjinKwon

Re: [PR] [SPARK-46650][CORE][SQL][YARN] Replace AtomicBoolean with volatile boolean [spark]

2024-01-10 Thread via GitHub
srowen commented on PR #44638: URL: https://github.com/apache/spark/pull/44638#issuecomment-1884463326 It's probably fine but does this improve anything? It's about the same

Re: [PR] [SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC [spark]

2024-01-10 Thread via GitHub
jatin5251 commented on PR #41518: URL: https://github.com/apache/spark/pull/41518#issuecomment-1884526594 @beliefer can you please approve the PR?

Re: [PR] [SPARK-46650][CORE][SQL][YARN] Replace AtomicBoolean with volatile boolean [spark]

2024-01-10 Thread via GitHub
beliefer commented on PR #44638: URL: https://github.com/apache/spark/pull/44638#issuecomment-1884556786 > It's probably fine but does this improve anything? It's about the same It reduces a little overhead, since `AtomicBoolean`'s `get` and `set` internally operate on a volatile field anyway.
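The trade-off discussed in this thread can be sketched in plain Java (a minimal illustration, not Spark code; the class and field names are hypothetical): a `volatile` field provides the same visibility guarantee as `AtomicBoolean.get`/`set` without the object indirection, while `AtomicBoolean` remains necessary when the update must be an atomic read-modify-write.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class Flags {
    // AtomicBoolean wraps a volatile int; its plain get/set add a method-call
    // and object indirection but no extra memory-ordering strength.
    private final AtomicBoolean atomicStopped = new AtomicBoolean(false);

    // A volatile boolean gives the same cross-thread visibility for plain
    // reads and writes, with no wrapper object.
    private volatile boolean stopped = false;

    void stop() { stopped = true; }          // volatile write
    boolean isStopped() { return stopped; }  // volatile read

    // AtomicBoolean is still the right tool for an atomic
    // read-modify-write, e.g. "perform the stop transition exactly once":
    boolean stopOnce() { return atomicStopped.compareAndSet(false, true); }
}

public class VolatileVsAtomic {
    public static void main(String[] args) {
        Flags f = new Flags();
        f.stop();
        System.out.println(f.isStopped());  // true
        System.out.println(f.stopOnce());   // true: first transition wins
        System.out.println(f.stopOnce());   // false: already stopped
    }
}
```

So replacing `AtomicBoolean` with `volatile boolean` is safe only for fields used as simple set-once/read-many flags, which is the pattern the PR targets.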

Re: [PR] [SPARK-46652][SQL][TESTS] Remove `Snappy` from `TPCDSQueryBenchmark` benchmark case name [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun closed pull request #44657: [SPARK-46652][SQL][TESTS] Remove `Snappy` from `TPCDSQueryBenchmark` benchmark case name URL: https://github.com/apache/spark/pull/44657

Re: [PR] [SPARK-46652][SQL][TESTS] Remove `Snappy` from `TPCDSQueryBenchmark` benchmark case name [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on PR #44657: URL: https://github.com/apache/spark/pull/44657#issuecomment-1884579868 Thank you all! Merged to master.

Re: [PR] [SPARK-45593][BUILD] Building a runnable distribution from master code running spark-sql raise error [spark]

2024-01-10 Thread via GitHub
Yikf commented on code in PR #43436: URL: https://github.com/apache/spark/pull/43436#discussion_r1447182488 ## connector/connect/client/jvm/pom.xml: ## @@ -137,6 +137,10 @@ io.grpc.** + Review Comment: you are right,

Re: [PR] [SPARK-45593][BUILD] Building a runnable distribution from master code running spark-sql raise error [spark]

2024-01-10 Thread via GitHub
Yikf commented on code in PR #43436: URL: https://github.com/apache/spark/pull/43436#discussion_r1447185917 ## connector/connect/client/jvm/pom.xml: ## @@ -137,6 +137,10 @@ io.grpc.** + Review Comment: BTW, I also te

Re: [PR] [SPARK-46525][DOCKER][TESTS][FOLLOWUP] Fix docker-integration-tests on Apple Silicon for db2 and oracle with third-party docker environments [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on PR #44612: URL: https://github.com/apache/spark/pull/44612#issuecomment-1884589451 > I've noticed that this alternative was less stable than the official one. It failed from time to time. Please give me more time to test. Oh, thank you for spotting that. Sur

[PR] [SPARK-46656][PS][TESTS] Split `GroupbyParitySplitApplyTests` [spark]

2024-01-10 Thread via GitHub
zhengruifeng opened a new pull request, #44664: URL: https://github.com/apache/spark/pull/44664 ### What changes were proposed in this pull request? Split `GroupbyParitySplitApplyTests` ### Why are the changes needed? to testing parallelism this test normally takes

Re: [PR] [SPARK-46655][SQL] Skip query context catching in `DataFrame` methods [spark]

2024-01-10 Thread via GitHub
cloud-fan commented on code in PR #44501: URL: https://github.com/apache/spark/pull/44501#discussion_r1447280384 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala: ## @@ -713,9 +713,7 @@ class StreamSuite extends StreamTest { "columnName" -> "`

Re: [PR] [SPARK-46655][SQL] Skip query context catching in `DataFrame` methods [spark]

2024-01-10 Thread via GitHub
cloud-fan commented on code in PR #44501: URL: https://github.com/apache/spark/pull/44501#discussion_r1447279164 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSetOperationsSuite.scala: ## @@ -374,11 +374,7 @@ class DataFrameSetOperationsSuite extends QueryTest

[PR] [SPARK-46654][SQL] df.show() of pyspark displayed different results between Regular Spark and Spark Connect [spark]

2024-01-10 Thread via GitHub
panbingkun opened a new pull request, #44665: URL: https://github.com/apache/spark/pull/44665 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this

Re: [PR] [SPARK-46617][SQL] Create-table-if-not-exists should not silently overwrite existing tables [spark]

2024-01-10 Thread via GitHub
adrians commented on PR #44622: URL: https://github.com/apache/spark/pull/44622#issuecomment-1884833930 Hi @nchammas, thanks for the feedback! I've just added unit-tests to validate this kind of behavior. -- This is an automated message from the Apache Git Service. To respond to the mes

Re: [PR] [SPARK-46655][SQL] Skip query context catching in `DataFrame` methods [spark]

2024-01-10 Thread via GitHub
MaxGekk commented on code in PR #44501: URL: https://github.com/apache/spark/pull/44501#discussion_r1447423055 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala: ## @@ -713,9 +713,7 @@ class StreamSuite extends StreamTest { "columnName" -> "`rn

Re: [PR] [SPARK-46655][SQL] Skip query context catching in `DataFrame` methods [spark]

2024-01-10 Thread via GitHub
MaxGekk commented on code in PR #44501: URL: https://github.com/apache/spark/pull/44501#discussion_r144730 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala: ## @@ -713,9 +713,7 @@ class StreamSuite extends StreamTest { "columnName" -> "`rn

Re: [PR] [SPARK-46655][SQL] Skip query context catching in `DataFrame` methods [spark]

2024-01-10 Thread via GitHub
MaxGekk commented on code in PR #44501: URL: https://github.com/apache/spark/pull/44501#discussion_r1447446426 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala: ## @@ -713,9 +713,7 @@ class StreamSuite extends StreamTest { "columnName" -> "`rn

Re: [PR] [SPARK-46547][SS] Swallow non-fatal exception in maintenance task to avoid deadlock between maintenance thread and streaming aggregation operator [spark]

2024-01-10 Thread via GitHub
HeartSaVioR commented on PR #44542: URL: https://github.com/apache/spark/pull/44542#issuecomment-1884933933 https://github.com/anishshri-db/spark/actions/runs/7471692378/job/20332452975 Failing module is unrelated.

Re: [PR] [SPARK-46547][SS] Swallow non-fatal exception in maintenance task to avoid deadlock between maintenance thread and streaming aggregation operator [spark]

2024-01-10 Thread via GitHub
HeartSaVioR commented on PR #44542: URL: https://github.com/apache/spark/pull/44542#issuecomment-1884934131 Thanks! Merging to master.

Re: [PR] [SPARK-46547][SS] Swallow non-fatal exception in maintenance task to avoid deadlock between maintenance thread and streaming aggregation operator [spark]

2024-01-10 Thread via GitHub
HeartSaVioR closed pull request #44542: [SPARK-46547][SS] Swallow non-fatal exception in maintenance task to avoid deadlock between maintenance thread and streaming aggregation operator URL: https://github.com/apache/spark/pull/44542

Re: [PR] [SPARK-46655][SQL] Skip query context catching in `DataFrame` methods [spark]

2024-01-10 Thread via GitHub
MaxGekk commented on code in PR #44501: URL: https://github.com/apache/spark/pull/44501#discussion_r1447477286 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala: ## @@ -713,9 +713,7 @@ class StreamSuite extends StreamTest { "columnName" -> "`rn

Re: [PR] [SPARK-46655][SQL] Skip query context catching in `DataFrame` methods [spark]

2024-01-10 Thread via GitHub
cloud-fan commented on code in PR #44501: URL: https://github.com/apache/spark/pull/44501#discussion_r1447481635 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala: ## @@ -713,9 +713,7 @@ class StreamSuite extends StreamTest { "columnName" -> "`

Re: [PR] [SPARK-46655][SQL] Skip query context catching in `DataFrame` methods [spark]

2024-01-10 Thread via GitHub
cloud-fan commented on PR #44501: URL: https://github.com/apache/spark/pull/44501#issuecomment-1885038313 @MaxGekk after a second thought, I feel that we don't need dataframe error context for analysis errors. The analysis is eager in dataframe, so people will know the dataframe call site a

Re: [PR] [SPARK-45022][SQL] Provide context for dataset API errors [spark]

2024-01-10 Thread via GitHub
ryan-johnson-databricks commented on code in PR #43334: URL: https://github.com/apache/spark/pull/43334#discussion_r1447541033 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -1572,7 +1589,9 @@ class Dataset[T] private[sql]( * @since 2.0.0 */ @sca

Re: [PR] [SPARK-45022][SQL] Provide context for dataset API errors [spark]

2024-01-10 Thread via GitHub
cloud-fan commented on code in PR #43334: URL: https://github.com/apache/spark/pull/43334#discussion_r1447601095 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -1572,7 +1589,9 @@ class Dataset[T] private[sql]( * @since 2.0.0 */ @scala.annotation.

Re: [PR] [SPARK-45022][SQL] Provide context for dataset API errors [spark]

2024-01-10 Thread via GitHub
ryan-johnson-databricks commented on code in PR #43334: URL: https://github.com/apache/spark/pull/43334#discussion_r1447667682 ## sql/core/src/main/scala/org/apache/spark/sql/package.scala: ## @@ -73,4 +76,41 @@ package object sql { * with rebasing. */ private[sql] va

Re: [PR] [SPARK-45022][SQL] Provide context for dataset API errors [spark]

2024-01-10 Thread via GitHub
MaxGekk commented on code in PR #43334: URL: https://github.com/apache/spark/pull/43334#discussion_r1447671645 ## sql/core/src/main/scala/org/apache/spark/sql/package.scala: ## @@ -73,4 +76,41 @@ package object sql { * with rebasing. */ private[sql] val SPARK_LEGACY_I

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
MaxNevermind commented on PR #44636: URL: https://github.com/apache/spark/pull/44636#issuecomment-1885273239 @viirya Fixed the issues, tests are passing now.

[PR] [SPARK-46657][INFRA] Install `lxml` in Python 3.12 [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun opened a new pull request, #44666: URL: https://github.com/apache/spark/pull/44666 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-42199][SQL] Fix issues around Dataset.groupByKey [spark]

2024-01-10 Thread via GitHub
EnricoMi commented on PR #39754: URL: https://github.com/apache/spark/pull/39754#issuecomment-1885446429 @cloud-fan @dongjoon-hyun are you interested in fixing these inconsistencies?

[PR] [SPARK-46658][DOCS] Loosen Ruby dependency specification [spark]

2024-01-10 Thread via GitHub
nchammas opened a new pull request, #44667: URL: https://github.com/apache/spark/pull/44667 ### What changes were proposed in this pull request? As [promised here][1], this change loosens our Ruby dependency specification so that Bundler can update transitive dependencies more easily.
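"Loosening" a Ruby dependency specification generally means replacing exact version pins with pessimistic (`~>`) constraints, so Bundler is free to resolve newer transitive dependencies on `bundle update`. A hypothetical Gemfile sketch (the gem name and versions are illustrative assumptions, not the actual change in this PR):

```ruby
# Gemfile -- illustrative sketch only; gem and versions are assumptions.
source "https://rubygems.org"

# Exact pin: resolution is locked to precisely this release, and
# transitive dependencies stay frozen at what its lockfile allows.
# gem "jekyll", "4.2.1"

# Pessimistic constraint: accepts any 4.x release at or above 4.2
# (>= 4.2, < 5.0), so `bundle update` can pull newer releases and
# let transitive dependencies move with them.
gem "jekyll", "~> 4.2"
```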

[PR] Disable memory profiler for iterator UDFs [spark]

2024-01-10 Thread via GitHub
xinrong-meng opened a new pull request, #44668: URL: https://github.com/apache/spark/pull/44668 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### Ho

Re: [PR] [SPARK-46657][INFRA] Install `lxml` in Python 3.12 [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on PR #44666: URL: https://github.com/apache/spark/pull/44666#issuecomment-1885573439 All tests passed. Could you review this INFRA PR for Python 3.12 when you have some time, @viirya?

Re: [PR] [SPARK-46658][DOCS] Loosen Ruby dependency specification [spark]

2024-01-10 Thread via GitHub
nchammas commented on PR #44667: URL: https://github.com/apache/spark/pull/44667#issuecomment-1885577128 cc @HyukjinKwon @srowen - We should update the build images to include Ruby 3. https://github.com/apache/spark/blob/11ac856919815f7ef2e534e205d1ed83398de136/dev/create-release/spa

[PR] while talking with mohan [spark]

2024-01-10 Thread via GitHub
ramraghu474 opened a new pull request, #44669: URL: https://github.com/apache/spark/pull/44669 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] while talking with mohan [spark]

2024-01-10 Thread via GitHub
ramraghu474 closed pull request #44669: while talking with mohan URL: https://github.com/apache/spark/pull/44669

[PR] [SPARK-46660][CONNECT] ReattachExecute requests updates aliveness of SessionHolder [spark]

2024-01-10 Thread via GitHub
vicennial opened a new pull request, #44670: URL: https://github.com/apache/spark/pull/44670 ### What changes were proposed in this pull request? This PR makes `SparkConnectReattachExecuteHandler` fetch the `ExecuteHolder` via the `SessionHolder`, which in turn refreshes its al

Re: [PR] [SPARK-46657][INFRA] Install `lxml` in Python 3.12 [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on PR #44666: URL: https://github.com/apache/spark/pull/44666#issuecomment-1885665640 Thank you, @viirya. Merged to master.

Re: [PR] [SPARK-46657][INFRA] Install `lxml` in Python 3.12 [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun closed pull request #44666: [SPARK-46657][INFRA] Install `lxml` in Python 3.12 URL: https://github.com/apache/spark/pull/44666

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
viirya commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1447955436 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala: ## @@ -1153,174 +1153,213 @@ class FileStreamSourceSuite extends FileStreamSource

[PR] [SPARK-46382][SQL] XML: Update doc for `ignoreSurroundingSpaces` [spark]

2024-01-10 Thread via GitHub
shujingyang-db opened a new pull request, #44671: URL: https://github.com/apache/spark/pull/44671 ### What changes were proposed in this pull request? Update doc for `ignoreSurroundingSpaces` ### Why are the changes needed? Be aligned with the implementation

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
viirya commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1447960178 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala: ## @@ -1777,6 +1817,19 @@ class FileStreamSourceSuite extends FileStreamSourceTest

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
viirya commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1447960970 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala: ## @@ -1777,6 +1817,19 @@ class FileStreamSourceSuite extends FileStreamSourceTest

[PR] [SPARK-46662][K8S] Upgrade `kubernetes-client` to 6.10.0 [spark]

2024-01-10 Thread via GitHub
bjornjorgensen opened a new pull request, #44672: URL: https://github.com/apache/spark/pull/44672 ### What changes were proposed in this pull request? Upgrade `kubernetes-client` from 6.9.1 to 6.10.0 [Release notes 6.10.0](https://github.com/fabric8io/kubernetes-client/releases/tag/v6.

Re: [PR] [SPARK-46640][SQL] Fix RemoveRedundantAlias by excluding subquery attributes [spark]

2024-01-10 Thread via GitHub
nikhilsheoran-db commented on PR #44645: URL: https://github.com/apache/spark/pull/44645#issuecomment-1885740741 @jchen5 @agubichev Can you please take a look?

Re: [PR] [SPARK-46640][SQL] Fix RemoveRedundantAlias by excluding subquery attributes [spark]

2024-01-10 Thread via GitHub
nikhilsheoran-db commented on PR #44645: URL: https://github.com/apache/spark/pull/44645#issuecomment-1885801958 @cloud-fan Can you take a look at this?

Re: [PR] [SPIP-IN-PROGRESS][DO-NOT-MERGE][SS] Add base support for new arbitrary state management operator, single valueState type, multiple state variables and underlying support for column families

2024-01-10 Thread via GitHub
ericm-db commented on code in PR #43961: URL: https://github.com/apache/spark/pull/43961#discussion_r1448001913 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala: ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] [SPARK-46664][CORE] Improve `Master` to recover quickly in case of zero workers and apps [spark]

2024-01-10 Thread via GitHub
mridulm commented on code in PR #44673: URL: https://github.com/apache/spark/pull/44673#discussion_r1448062743 ## core/src/main/scala/org/apache/spark/deploy/master/Master.scala: ## @@ -619,6 +619,9 @@ private[deploy] class Master( case e: Exception => logInfo("Worker "

Re: [PR] [SPARK-46664][CORE] Improve `Master` to recover quickly in case of zero workers and apps [spark]

2024-01-10 Thread via GitHub
mridulm commented on code in PR #44673: URL: https://github.com/apache/spark/pull/44673#discussion_r1448067149 ## core/src/main/scala/org/apache/spark/deploy/master/Master.scala: ## @@ -619,6 +619,9 @@ private[deploy] class Master( case e: Exception => logInfo("Worker "

Re: [PR] [SPARK-46664][CORE] Improve `Master` to recover quickly in case of zero workers and apps [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44673: URL: https://github.com/apache/spark/pull/44673#discussion_r1448072611 ## core/src/main/scala/org/apache/spark/deploy/master/Master.scala: ## @@ -619,6 +619,9 @@ private[deploy] class Master( case e: Exception => logInfo("Wo

Re: [PR] [SPARK-46664][CORE] Improve `Master` to recover quickly in case of zero workers and apps [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on PR #44673: URL: https://github.com/apache/spark/pull/44673#issuecomment-1885870489 Thank you for the review, @mridulm. The PR is updated accordingly.

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448082584 ## docs/structured-streaming-programming-guide.md: ## @@ -561,6 +561,8 @@ Here are the details of all the sources in Spark. maxFilesPerTrigger:

Re: [PR] [SPARK-46383] Reduce Driver Heap Usage by Reducing the Lifespan of `TaskInfo.accumulables()` [spark]

2024-01-10 Thread via GitHub
mridulm commented on code in PR #44321: URL: https://github.com/apache/spark/pull/44321#discussion_r1448069167 ## core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala: ## @@ -643,6 +657,29 @@ class SparkListenerSuite extends SparkFunSuite with LocalSparkConte

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448084120 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ReadLimit.java: ## @@ -39,6 +39,8 @@ static ReadLimit minRows(long rows, long maxTrigg

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448084408 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ReadMaxBytes.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundat

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448086330 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala: ## @@ -113,16 +117,32 @@ class FileStreamSource( // Visible for tes

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448087582 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala: ## @@ -113,16 +117,32 @@ class FileStreamSource( // Visible for tes

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448095648 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ReadMaxBytes.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundat

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448099879 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala: ## @@ -379,6 +418,9 @@ object FileStreamSource { def sparkPath: S

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448101899 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala: ## @@ -1153,174 +1153,213 @@ class FileStreamSourceSuite extends FileStrea

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448101682 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala: ## @@ -1153,174 +1153,213 @@ class FileStreamSourceSuite extends FileStrea

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448102251 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala: ## @@ -1153,174 +1153,213 @@ class FileStreamSourceSuite extends FileStrea

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448102701 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala: ## @@ -1153,174 +1153,213 @@ class FileStreamSourceSuite extends FileStrea

Re: [PR] [SPARK-46664][CORE] Improve `Master` to recover quickly in case of zero workers and apps [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on PR #44673: URL: https://github.com/apache/spark/pull/44673#issuecomment-1885919098 Thank you, @mridulm.

Re: [PR] [SPARK-46383] Reduce Driver Heap Usage by Reducing the Lifespan of `TaskInfo.accumulables()` [spark]

2024-01-10 Thread via GitHub
utkarsh39 commented on code in PR #44321: URL: https://github.com/apache/spark/pull/44321#discussion_r1448110832 ## core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala: ## @@ -289,6 +290,17 @@ class SparkListenerSuite extends SparkFunSuite with LocalSparkCon

Re: [PR] [SPARK-46474][INFRA] Upgrade upload-artifact action to v4 [spark]

2024-01-10 Thread via GitHub
HyukjinKwon commented on PR #44662: URL: https://github.com/apache/spark/pull/44662#issuecomment-1885939496 Merged to master.

Re: [PR] [SPARK-46474][INFRA] Upgrade upload-artifact action to v4 [spark]

2024-01-10 Thread via GitHub
HyukjinKwon closed pull request #44662: [SPARK-46474][INFRA] Upgrade upload-artifact action to v4 URL: https://github.com/apache/spark/pull/44662

Re: [PR] [WIP][DO-NOT-MERGE] Upgrade rocksjni to 8.9.1 [spark]

2024-01-10 Thread via GitHub
neilramaswamy closed pull request #44674: [WIP][DO-NOT-MERGE] Upgrade rocksjni to 8.9.1 URL: https://github.com/apache/spark/pull/44674

Re: [PR] [WIP][DO-NOT-MERGE] Upgrade rocksjni to 8.9.1 [spark]

2024-01-10 Thread via GitHub
neilramaswamy commented on PR #44674: URL: https://github.com/apache/spark/pull/44674#issuecomment-1885941820 Oops, closing as there is existing work in SPARK-45110 to upgrade to 8.8.1, which is already doing plenty of validation.

[PR] [SPARK-46665][PYTHON] Remove Pandas dependency for `pyspark.testing` [spark]

2024-01-10 Thread via GitHub
itholic opened a new pull request, #44675: URL: https://github.com/apache/spark/pull/44675 ### What changes were proposed in this pull request? This PR proposes to remove Pandas dependency for `pyspark.testing`. ### Why are the changes needed? `pyspark.testing.assertDataF

Re: [PR] [SPARK-46665][PYTHON] Remove Pandas dependency for `pyspark.testing` [spark]

2024-01-10 Thread via GitHub
HyukjinKwon commented on code in PR #44675: URL: https://github.com/apache/spark/pull/44675#discussion_r1448120281 ## python/pyspark/testing/__init__.py: ## @@ -16,6 +16,11 @@ # from pyspark.testing.utils import assertDataFrameEqual, assertSchemaEqual -from pyspark.testing.p

[PR] [SPARK-46666][PYTHON][TESTS] Make lxml as an optional testing dependency in test_session [spark]

2024-01-10 Thread via GitHub
HyukjinKwon opened a new pull request, #44676: URL: https://github.com/apache/spark/pull/44676 ### What changes were proposed in this pull request? This PR proposes to make `lxml` as an optional testing dependency in the `test_session` test ### Why are the changes needed?

Re: [PR] [SPARK-46666][PYTHON][TESTS] Make lxml as an optional testing dependency in test_session [spark]

2024-01-10 Thread via GitHub
HyukjinKwon commented on PR #44676: URL: https://github.com/apache/spark/pull/44676#issuecomment-1885958435 build: https://github.com/HyukjinKwon/spark/actions/runs/7482270214

Re: [PR] [SPARK-46658][DOCS] Loosen Ruby dependency specification [spark]

2024-01-10 Thread via GitHub
HyukjinKwon commented on PR #44667: URL: https://github.com/apache/spark/pull/44667#issuecomment-1885961407 Merged to master.

Re: [PR] [SPARK-46658][DOCS] Loosen Ruby dependency specification [spark]

2024-01-10 Thread via GitHub
HyukjinKwon closed pull request #44667: [SPARK-46658][DOCS] Loosen Ruby dependency specification URL: https://github.com/apache/spark/pull/44667

Re: [PR] [SPARK-46382][SQL] XML: Update doc for `ignoreSurroundingSpaces` [spark]

2024-01-10 Thread via GitHub
HyukjinKwon commented on PR #44671: URL: https://github.com/apache/spark/pull/44671#issuecomment-1885962408 Merged to master.

Re: [PR] [SPARK-46382][SQL] XML: Update doc for `ignoreSurroundingSpaces` [spark]

2024-01-10 Thread via GitHub
HyukjinKwon closed pull request #44671: [SPARK-46382][SQL] XML: Update doc for `ignoreSurroundingSpaces` URL: https://github.com/apache/spark/pull/44671

Re: [PR] [SPARK-45134][SHUFFLE] Avoid repeated fallback when failed to fetch remote push-merged block meta [spark]

2024-01-10 Thread via GitHub
github-actions[bot] closed pull request #43004: [SPARK-45134][SHUFFLE] Avoid repeated fallback when failed to fetch remote push-merged block meta URL: https://github.com/apache/spark/pull/43004

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
viirya commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448132991 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ReadLimit.java: ## @@ -39,6 +39,8 @@ static ReadLimit minRows(long rows, long maxTriggerDelay
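The review above concerns a proposed `maxBytesPerTrigger` read limit for micro-batch admission. As a hedged illustration only (not Spark's implementation; file names and sizes below are invented), the semantics such a byte-based limit implies can be sketched as: admit pending files into a batch until their cumulative size reaches the threshold, always admitting at least one file so a single oversized file cannot stall the stream.

```python
def admit_files(files, max_bytes):
    """Pick files for one micro-batch until the cumulative size reaches
    max_bytes. At least one file is always admitted so a single file
    larger than the threshold cannot block progress forever."""
    batch, total = [], 0
    for name, size in files:
        if batch and total + size > max_bytes:
            break  # threshold reached; remaining files wait for the next batch
        batch.append(name)
        total += size
    return batch

pending = [("a.json", 400), ("b.json", 500), ("c.json", 300)]
print(admit_files(pending, max_bytes=1000))  # ['a.json', 'b.json']
```

The same shape applies to the existing `maxFilesPerTrigger` option, with a file count in place of a byte total.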

[PR] [WIP] Multiple input stream test [spark]

2024-01-10 Thread via GitHub
ericm-db opened a new pull request, #44677: URL: https://github.com/apache/spark/pull/44677 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested?

Re: [PR] [SPARK-46662][K8S][BUILD] Upgrade `kubernetes-client` to 6.10.0 [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun closed pull request #44672: [SPARK-46662][K8S][BUILD] Upgrade `kubernetes-client` to 6.10.0 URL: https://github.com/apache/spark/pull/44672

Re: [PR] [WIP] Multiple input stream test [spark]

2024-01-10 Thread via GitHub
ericm-db closed pull request #44677: [WIP] Multiple input stream test URL: https://github.com/apache/spark/pull/44677

Re: [PR] [SPARK-46664][CORE] Improve `Master` to recover quickly in case of zero workers and apps [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun closed pull request #44673: [SPARK-46664][CORE] Improve `Master` to recover quickly in case of zero workers and apps URL: https://github.com/apache/spark/pull/44673

[PR] [SPARK-46638][Python] Create Python UDTF API to acquire execution memory for 'eval' and 'terminate' methods [spark]

2024-01-10 Thread via GitHub
dtenedor opened a new pull request, #44678: URL: https://github.com/apache/spark/pull/44678 ### What changes were proposed in this pull request? This PR creates a Python UDTF API to acquire execution memory for 'eval' and 'terminate' methods. For example, this UDTF accepts an a
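The PR description above is truncated before its example. As a purely illustrative sketch of the `eval`/`terminate` protocol the PR title refers to (the `CountChars` class and its behavior are invented here, not the PR's API, and no Spark registration is shown):

```python
class CountChars:
    """Toy class following the Python UDTF protocol shape: eval() yields
    zero or more output rows per input row, and terminate() yields any
    final rows once all input has been consumed."""
    def __init__(self):
        self.total = 0

    def eval(self, word: str):
        self.total += len(word)
        yield (word, len(word))

    def terminate(self):
        yield ("TOTAL", self.total)

udtf = CountChars()
rows = [r for w in ("spark", "sql") for r in udtf.eval(w)]
rows += list(udtf.terminate())
print(rows)  # [('spark', 5), ('sql', 3), ('TOTAL', 8)]
```

In PySpark proper, such a class would be decorated with a return-type schema and registered before use; the memory-acquisition API the PR proposes would be called from inside `eval` and `terminate`.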

Re: [PR] [SPARK-46638][Python] Create Python UDTF API to acquire execution memory for 'eval' and 'terminate' methods [spark]

2024-01-10 Thread via GitHub
dtenedor commented on PR #44678: URL: https://github.com/apache/spark/pull/44678#issuecomment-1886024403 cc @ueshin @allisonwang-db

Re: [PR] [SPARK-46665][PYTHON] Remove Pandas dependency for `pyspark.testing` [spark]

2024-01-10 Thread via GitHub
itholic commented on code in PR #44675: URL: https://github.com/apache/spark/pull/44675#discussion_r1448175750 ## python/pyspark/testing/__init__.py: ## @@ -16,6 +16,11 @@ # from pyspark.testing.utils import assertDataFrameEqual, assertSchemaEqual -from pyspark.testing.panda

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2024-01-10 Thread via GitHub
Ngone51 commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1448181728 ## core/src/test/scala/org/apache/spark/TempLocalSparkContext.scala: ## @@ -51,7 +51,7 @@ trait TempLocalSparkContext extends BeforeAndAfterEach */ def sc: Spark

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2024-01-10 Thread via GitHub
Ngone51 commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1448192078 ## core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala: ## @@ -54,7 +54,7 @@ private[spark] trait TaskScheduler { // Submit a sequence of tasks to run

Re: [PR] [SPARK-46654][SQL] Make to_csv can correctly display complex types data [spark]

2024-01-10 Thread via GitHub
panbingkun commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-1886085064 cc @cloud-fan @LuciferYang

Re: [PR] [SPARK-46650][CORE][SQL][YARN] Replace AtomicBoolean with volatile boolean [spark]

2024-01-10 Thread via GitHub
beliefer closed pull request #44638: [SPARK-46650][CORE][SQL][YARN] Replace AtomicBoolean with volatile boolean URL: https://github.com/apache/spark/pull/44638
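The rationale behind a change like SPARK-46650 is that a flag which is only ever set and read, never compare-and-swapped, needs visibility but not atomic read-modify-write, so on the JVM a `volatile boolean` suffices where `AtomicBoolean` would be overkill. Python has no `volatile`; as a conceptual analog only (a plain attribute plus Python's memory model stands in for the JVM pattern, and the `Worker` class is invented for illustration):

```python
import threading
import time

class Worker:
    """A stop flag with one writer and one reader needs only visibility,
    not compare-and-set: the JVM equivalent of this plain flag is a
    volatile boolean; AtomicBoolean is for CAS-style updates."""
    def __init__(self):
        self.stopped = False   # plain flag: written once, polled in the loop
        self.iterations = 0

    def run(self):
        while not self.stopped:
            self.iterations += 1
            time.sleep(0.001)

w = Worker()
t = threading.Thread(target=w.run)
t.start()
time.sleep(0.05)
w.stopped = True               # single unconditional write; no CAS required
t.join(timeout=1)
print(t.is_alive())  # False
```

Where the code instead needs "set the flag only if it was not already set" (e.g. run-once guards), `AtomicBoolean.compareAndSet` remains the right tool.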

Re: [PR] [SPARK-46575][SQL][HIVE] Make HiveThriftServer2.startWithContext DevelopApi retriable and fix flakiness of ThriftServerWithSparkContextInHttpSuite [spark]

2024-01-10 Thread via GitHub
yaooqinn commented on code in PR #44575: URL: https://github.com/apache/spark/pull/44575#discussion_r1448201533 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala: ## @@ -201,6 +201,16 @@ private[spark] object HiveUtils extends Logging { .booleanConf

Re: [PR] try to fix upload [spark]

2024-01-10 Thread via GitHub
panbingkun closed pull request #44653: try to fix upload URL: https://github.com/apache/spark/pull/44653

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2024-01-10 Thread via GitHub
cloud-fan commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1448202496 ## core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala: ## @@ -296,18 +296,31 @@ private[spark] class TaskSchedulerImpl( new TaskSetManager(th

Re: [PR] [WIP][SQL][CONNECT] Resolve inappropriate use of AtomicInteger [spark]

2024-01-10 Thread via GitHub
beliefer closed pull request #44659: [WIP][SQL][CONNECT] Resolve inappropriate use of AtomicInteger URL: https://github.com/apache/spark/pull/44659

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2024-01-10 Thread via GitHub
cloud-fan commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1448202635 ## core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala: ## @@ -1002,6 +1002,17 @@ private[spark] class TaskSetManager( maybeFinishTaskSet() }

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2024-01-10 Thread via GitHub
cloud-fan commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1448203237 ## core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala: ## @@ -54,9 +54,9 @@ private[spark] trait TaskScheduler { // Submit a sequence of tasks to r

Re: [PR] [SPARK-46575][SQL][HIVE] Make HiveThriftServer2.startWithContext DevelopApi retriable and fix flakiness of ThriftServerWithSparkContextInHttpSuite [spark]

2024-01-10 Thread via GitHub
cloud-fan commented on code in PR #44575: URL: https://github.com/apache/spark/pull/44575#discussion_r1448204963 ## sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SharedThriftServer.scala: ## @@ -138,10 +138,15 @@ trait SharedThriftServer extends Sha
