Re: [PR] [SPARK-46547][SS] Swallow non-fatal exception in maintenance task to avoid deadlock between maintenance thread and streaming aggregation operator [spark]

2024-01-10 Thread via GitHub
HeartSaVioR commented on PR #44542: URL: https://github.com/apache/spark/pull/44542#issuecomment-1884371290 @rangadi Please read through my comments in above. Here are links for you: * https://github.com/apache/spark/pull/44542#issuecomment-1882621196 * https://github.com/ap

Re: [PR] [SPARK-46474][INFRA] Upgrade upload-artifact action to v4 [spark]

2024-01-10 Thread via GitHub
panbingkun commented on code in PR #44662: URL: https://github.com/apache/spark/pull/44662#discussion_r1446954812 ## .github/workflows/build_and_test.yml: ## @@ -468,15 +468,15 @@ jobs: name: PySpark - name: Upload test results to report if: always() -

Re: [PR] [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` [spark]

2024-01-10 Thread via GitHub
LuciferYang closed pull request #44639: [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` URL: https://github.com/apache/spark/pull/44639 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] [SPARK-46635][PYTHON][DOCS] Refine docstring of `from_csv/schema_of_csv/to_csv` [spark]

2024-01-10 Thread via GitHub
LuciferYang commented on PR #44639: URL: https://github.com/apache/spark/pull/44639#issuecomment-1884419945 Merged into master. Thanks @HyukjinKwon

Re: [PR] [SPARK-46650][CORE][SQL][YARN] Replace AtomicBoolean with volatile boolean [spark]

2024-01-10 Thread via GitHub
srowen commented on PR #44638: URL: https://github.com/apache/spark/pull/44638#issuecomment-1884463326 It's probably fine but does this improve anything? It's about the same

Re: [PR] [SPARK-19335][SPARK-38200][SQL] Add upserts for writing to JDBC [spark]

2024-01-10 Thread via GitHub
jatin5251 commented on PR #41518: URL: https://github.com/apache/spark/pull/41518#issuecomment-1884526594 @beliefer can you please approve the PR?

Re: [PR] [SPARK-46650][CORE][SQL][YARN] Replace AtomicBoolean with volatile boolean [spark]

2024-01-10 Thread via GitHub
beliefer commented on PR #44638: URL: https://github.com/apache/spark/pull/44638#issuecomment-1884556786 > It's probably fine but does this improve anything? It's about the same It reduces a little overhead, since `AtomicBoolean`'s `get` and `set` internally operate on a volatile field anyway.
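The trade-off discussed in this thread can be sketched in plain Java (a minimal illustration, not Spark code; the class and field names are hypothetical): a `volatile` field provides the same visibility guarantee as `AtomicBoolean.get`/`set` without the object indirection, while `AtomicBoolean` remains necessary when the update must be an atomic read-modify-write.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class Flags {
    // AtomicBoolean wraps a volatile int; its plain get/set add a method-call
    // and object indirection but no extra memory-ordering strength.
    private final AtomicBoolean atomicStopped = new AtomicBoolean(false);

    // A volatile boolean gives the same cross-thread visibility for plain
    // reads and writes, with no wrapper object.
    private volatile boolean stopped = false;

    void stop() { stopped = true; }          // volatile write
    boolean isStopped() { return stopped; }  // volatile read

    // AtomicBoolean is still the right tool for an atomic
    // read-modify-write, e.g. "perform the stop transition exactly once":
    boolean stopOnce() { return atomicStopped.compareAndSet(false, true); }
}

public class VolatileVsAtomic {
    public static void main(String[] args) {
        Flags f = new Flags();
        f.stop();
        System.out.println(f.isStopped());  // true
        System.out.println(f.stopOnce());   // true: first transition wins
        System.out.println(f.stopOnce());   // false: already stopped
    }
}
```

So replacing `AtomicBoolean` with `volatile boolean` is safe only for fields used as simple set-once/read-many flags, which is the pattern the PR targets.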

Re: [PR] [SPARK-46652][SQL][TESTS] Remove `Snappy` from `TPCDSQueryBenchmark` benchmark case name [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun closed pull request #44657: [SPARK-46652][SQL][TESTS] Remove `Snappy` from `TPCDSQueryBenchmark` benchmark case name URL: https://github.com/apache/spark/pull/44657

Re: [PR] [SPARK-46652][SQL][TESTS] Remove `Snappy` from `TPCDSQueryBenchmark` benchmark case name [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on PR #44657: URL: https://github.com/apache/spark/pull/44657#issuecomment-1884579868 Thank you all! Merged to master.

Re: [PR] [SPARK-45593][BUILD] Building a runnable distribution from master code running spark-sql raise error [spark]

2024-01-10 Thread via GitHub
Yikf commented on code in PR #43436: URL: https://github.com/apache/spark/pull/43436#discussion_r1447182488 ## connector/connect/client/jvm/pom.xml: ## @@ -137,6 +137,10 @@ io.grpc.** + Review Comment: you are right,

Re: [PR] [SPARK-45593][BUILD] Building a runnable distribution from master code running spark-sql raise error [spark]

2024-01-10 Thread via GitHub
Yikf commented on code in PR #43436: URL: https://github.com/apache/spark/pull/43436#discussion_r1447185917 ## connector/connect/client/jvm/pom.xml: ## @@ -137,6 +137,10 @@ io.grpc.** + Review Comment: BTW, I also te

Re: [PR] [SPARK-46525][DOCKER][TESTS][FOLLOWUP] Fix docker-integration-tests on Apple Silicon for db2 and oracle with third-party docker environments [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on PR #44612: URL: https://github.com/apache/spark/pull/44612#issuecomment-1884589451 > I've noticed that this alternative was less stable than the official one. It failed from time to time. Please give me more time to test. Oh, thank you for spotting that. Sur

[PR] [SPARK-46656][PS][TESTS] Split `GroupbyParitySplitApplyTests` [spark]

2024-01-10 Thread via GitHub
zhengruifeng opened a new pull request, #44664: URL: https://github.com/apache/spark/pull/44664 ### What changes were proposed in this pull request? Split `GroupbyParitySplitApplyTests` ### Why are the changes needed? to testing parallelism this test normally takes

Re: [PR] [SPARK-46655][SQL] Skip query context catching in `DataFrame` methods [spark]

2024-01-10 Thread via GitHub
cloud-fan commented on code in PR #44501: URL: https://github.com/apache/spark/pull/44501#discussion_r1447280384 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala: ## @@ -713,9 +713,7 @@ class StreamSuite extends StreamTest { "columnName" -> "`

Re: [PR] [SPARK-46655][SQL] Skip query context catching in `DataFrame` methods [spark]

2024-01-10 Thread via GitHub
cloud-fan commented on code in PR #44501: URL: https://github.com/apache/spark/pull/44501#discussion_r1447279164 ## sql/core/src/test/scala/org/apache/spark/sql/DataFrameSetOperationsSuite.scala: ## @@ -374,11 +374,7 @@ class DataFrameSetOperationsSuite extends QueryTest

[PR] [SPARK-46654][SQL] df.show() of pyspark displayed different results between Regular Spark and Spark Connect [spark]

2024-01-10 Thread via GitHub
panbingkun opened a new pull request, #44665: URL: https://github.com/apache/spark/pull/44665 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this

Re: [PR] [SPARK-46617][SQL] Create-table-if-not-exists should not silently overwrite existing tables [spark]

2024-01-10 Thread via GitHub
adrians commented on PR #44622: URL: https://github.com/apache/spark/pull/44622#issuecomment-1884833930 Hi @nchammas, thanks for the feedback! I've just added unit-tests to validate this kind of behavior. -- This is an automated message from the Apache Git Service. To respond to the mes

Re: [PR] [SPARK-46655][SQL] Skip query context catching in `DataFrame` methods [spark]

2024-01-10 Thread via GitHub
MaxGekk commented on code in PR #44501: URL: https://github.com/apache/spark/pull/44501#discussion_r1447423055 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala: ## @@ -713,9 +713,7 @@ class StreamSuite extends StreamTest { "columnName" -> "`rn

Re: [PR] [SPARK-46655][SQL] Skip query context catching in `DataFrame` methods [spark]

2024-01-10 Thread via GitHub
MaxGekk commented on code in PR #44501: URL: https://github.com/apache/spark/pull/44501#discussion_r144730 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala: ## @@ -713,9 +713,7 @@ class StreamSuite extends StreamTest { "columnName" -> "`rn

Re: [PR] [SPARK-46655][SQL] Skip query context catching in `DataFrame` methods [spark]

2024-01-10 Thread via GitHub
MaxGekk commented on code in PR #44501: URL: https://github.com/apache/spark/pull/44501#discussion_r1447446426 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala: ## @@ -713,9 +713,7 @@ class StreamSuite extends StreamTest { "columnName" -> "`rn

Re: [PR] [SPARK-46547][SS] Swallow non-fatal exception in maintenance task to avoid deadlock between maintenance thread and streaming aggregation operator [spark]

2024-01-10 Thread via GitHub
HeartSaVioR commented on PR #44542: URL: https://github.com/apache/spark/pull/44542#issuecomment-1884933933 https://github.com/anishshri-db/spark/actions/runs/7471692378/job/20332452975 Failing module is unrelated.

Re: [PR] [SPARK-46547][SS] Swallow non-fatal exception in maintenance task to avoid deadlock between maintenance thread and streaming aggregation operator [spark]

2024-01-10 Thread via GitHub
HeartSaVioR commented on PR #44542: URL: https://github.com/apache/spark/pull/44542#issuecomment-1884934131 Thanks! Merging to master.

Re: [PR] [SPARK-46547][SS] Swallow non-fatal exception in maintenance task to avoid deadlock between maintenance thread and streaming aggregation operator [spark]

2024-01-10 Thread via GitHub
HeartSaVioR closed pull request #44542: [SPARK-46547][SS] Swallow non-fatal exception in maintenance task to avoid deadlock between maintenance thread and streaming aggregation operator URL: https://github.com/apache/spark/pull/44542

Re: [PR] [SPARK-46655][SQL] Skip query context catching in `DataFrame` methods [spark]

2024-01-10 Thread via GitHub
MaxGekk commented on code in PR #44501: URL: https://github.com/apache/spark/pull/44501#discussion_r1447477286 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala: ## @@ -713,9 +713,7 @@ class StreamSuite extends StreamTest { "columnName" -> "`rn

Re: [PR] [SPARK-46655][SQL] Skip query context catching in `DataFrame` methods [spark]

2024-01-10 Thread via GitHub
cloud-fan commented on code in PR #44501: URL: https://github.com/apache/spark/pull/44501#discussion_r1447481635 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala: ## @@ -713,9 +713,7 @@ class StreamSuite extends StreamTest { "columnName" -> "`

Re: [PR] [SPARK-46655][SQL] Skip query context catching in `DataFrame` methods [spark]

2024-01-10 Thread via GitHub
cloud-fan commented on PR #44501: URL: https://github.com/apache/spark/pull/44501#issuecomment-1885038313 @MaxGekk after a second thought, I feel that we don't need dataframe error context for analysis errors. The analysis is eager in dataframe, so people will know the dataframe call site a

Re: [PR] [SPARK-45022][SQL] Provide context for dataset API errors [spark]

2024-01-10 Thread via GitHub
ryan-johnson-databricks commented on code in PR #43334: URL: https://github.com/apache/spark/pull/43334#discussion_r1447541033 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -1572,7 +1589,9 @@ class Dataset[T] private[sql]( * @since 2.0.0 */ @sca

Re: [PR] [SPARK-45022][SQL] Provide context for dataset API errors [spark]

2024-01-10 Thread via GitHub
cloud-fan commented on code in PR #43334: URL: https://github.com/apache/spark/pull/43334#discussion_r1447601095 ## sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: ## @@ -1572,7 +1589,9 @@ class Dataset[T] private[sql]( * @since 2.0.0 */ @scala.annotation.

Re: [PR] [SPARK-45022][SQL] Provide context for dataset API errors [spark]

2024-01-10 Thread via GitHub
ryan-johnson-databricks commented on code in PR #43334: URL: https://github.com/apache/spark/pull/43334#discussion_r1447667682 ## sql/core/src/main/scala/org/apache/spark/sql/package.scala: ## @@ -73,4 +76,41 @@ package object sql { * with rebasing. */ private[sql] va

Re: [PR] [SPARK-45022][SQL] Provide context for dataset API errors [spark]

2024-01-10 Thread via GitHub
MaxGekk commented on code in PR #43334: URL: https://github.com/apache/spark/pull/43334#discussion_r1447671645 ## sql/core/src/main/scala/org/apache/spark/sql/package.scala: ## @@ -73,4 +76,41 @@ package object sql { * with rebasing. */ private[sql] val SPARK_LEGACY_I

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
MaxNevermind commented on PR #44636: URL: https://github.com/apache/spark/pull/44636#issuecomment-1885273239 @viirya Fixed the issues, tests are passing now.

[PR] [SPARK-46657][INFRA] Install `lxml` in Python 3.12 [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun opened a new pull request, #44666: URL: https://github.com/apache/spark/pull/44666 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### H

Re: [PR] [SPARK-42199][SQL] Fix issues around Dataset.groupByKey [spark]

2024-01-10 Thread via GitHub
EnricoMi commented on PR #39754: URL: https://github.com/apache/spark/pull/39754#issuecomment-1885446429 @cloud-fan @dongjoon-hyun are you interested in fixing these inconsistencies?

[PR] [SPARK-46658][DOCS] Loosen Ruby dependency specification [spark]

2024-01-10 Thread via GitHub
nchammas opened a new pull request, #44667: URL: https://github.com/apache/spark/pull/44667 ### What changes were proposed in this pull request? As [promised here][1], this change loosens our Ruby dependency specification so that Bundler can update transitive dependencies more easily.
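"Loosening" a Ruby dependency specification generally means replacing exact version pins with pessimistic (`~>`) constraints, so Bundler is free to resolve newer transitive dependencies on `bundle update`. A hypothetical Gemfile sketch (the gem name and versions are illustrative assumptions, not the actual change in this PR):

```ruby
# Gemfile -- illustrative sketch only; gem and versions are assumptions.
source "https://rubygems.org"

# Exact pin: resolution is locked to precisely this release, and
# transitive dependencies stay frozen at what its lockfile allows.
# gem "jekyll", "4.2.1"

# Pessimistic constraint: accepts any 4.x release at or above 4.2
# (>= 4.2, < 5.0), so `bundle update` can pull newer releases and
# let transitive dependencies move with them.
gem "jekyll", "~> 4.2"
```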

[PR] Disable memory profiler for iterator UDFs [spark]

2024-01-10 Thread via GitHub
xinrong-meng opened a new pull request, #44668: URL: https://github.com/apache/spark/pull/44668 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### Ho

Re: [PR] [SPARK-46657][INFRA] Install `lxml` in Python 3.12 [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on PR #44666: URL: https://github.com/apache/spark/pull/44666#issuecomment-1885573439 All tests passed. Could you review this INFRA PR for Python 3.12 when you have some time, @viirya?

Re: [PR] [SPARK-46658][DOCS] Loosen Ruby dependency specification [spark]

2024-01-10 Thread via GitHub
nchammas commented on PR #44667: URL: https://github.com/apache/spark/pull/44667#issuecomment-1885577128 cc @HyukjinKwon @srowen - We should update the build images to include Ruby 3. https://github.com/apache/spark/blob/11ac856919815f7ef2e534e205d1ed83398de136/dev/create-release/spa

[PR] while talking with mohan [spark]

2024-01-10 Thread via GitHub
ramraghu474 opened a new pull request, #44669: URL: https://github.com/apache/spark/pull/44669 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How

Re: [PR] while talking with mohan [spark]

2024-01-10 Thread via GitHub
ramraghu474 closed pull request #44669: while talking with mohan URL: https://github.com/apache/spark/pull/44669

[PR] [SPARK-46660][CONNECT] ReattachExecute requests updates aliveness of SessionHolder [spark]

2024-01-10 Thread via GitHub
vicennial opened a new pull request, #44670: URL: https://github.com/apache/spark/pull/44670 ### What changes were proposed in this pull request? This PR makes `SparkConnectReattachExecuteHandler` fetch the `ExecuteHolder` via the `SessionHolder`, which in turn refreshes its al

Re: [PR] [SPARK-46657][INFRA] Install `lxml` in Python 3.12 [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on PR #44666: URL: https://github.com/apache/spark/pull/44666#issuecomment-1885665640 Thank you, @viirya. Merged to master.

Re: [PR] [SPARK-46657][INFRA] Install `lxml` in Python 3.12 [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun closed pull request #44666: [SPARK-46657][INFRA] Install `lxml` in Python 3.12 URL: https://github.com/apache/spark/pull/44666

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
viirya commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1447955436 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala: ## @@ -1153,174 +1153,213 @@ class FileStreamSourceSuite extends FileStreamSource

[PR] [SPARK-46382][SQL] XML: Update doc for `ignoreSurroundingSpaces` [spark]

2024-01-10 Thread via GitHub
shujingyang-db opened a new pull request, #44671: URL: https://github.com/apache/spark/pull/44671 ### What changes were proposed in this pull request? Update doc for `ignoreSurroundingSpaces` ### Why are the changes needed? Be aligned with the implementation

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
viirya commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1447960178 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala: ## @@ -1777,6 +1817,19 @@ class FileStreamSourceSuite extends FileStreamSourceTest

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
viirya commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1447960970 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala: ## @@ -1777,6 +1817,19 @@ class FileStreamSourceSuite extends FileStreamSourceTest

[PR] [SPARK-46662][K8S] Upgrade `kubernetes-client` to 6.10.0 [spark]

2024-01-10 Thread via GitHub
bjornjorgensen opened a new pull request, #44672: URL: https://github.com/apache/spark/pull/44672 ### What changes were proposed in this pull request? Upgrade `kubernetes-client` from 6.9.1 to 6.10.0 [Release notes 6.10.0](https://github.com/fabric8io/kubernetes-client/releases/tag/v6.

Re: [PR] [SPARK-46640][SQL] Fix RemoveRedundantAlias by excluding subquery attributes [spark]

2024-01-10 Thread via GitHub
nikhilsheoran-db commented on PR #44645: URL: https://github.com/apache/spark/pull/44645#issuecomment-1885740741 @jchen5 @agubichev Can you please take a look?

Re: [PR] [SPARK-46640][SQL] Fix RemoveRedundantAlias by excluding subquery attributes [spark]

2024-01-10 Thread via GitHub
nikhilsheoran-db commented on PR #44645: URL: https://github.com/apache/spark/pull/44645#issuecomment-1885801958 @cloud-fan Can you take a look at this?

Re: [PR] [SPIP-IN-PROGRESS][DO-NOT-MERGE][SS] Add base support for new arbitrary state management operator, single valueState type, multiple state variables and underlying support for column families

2024-01-10 Thread via GitHub
ericm-db commented on code in PR #43961: URL: https://github.com/apache/spark/pull/43961#discussion_r1448001913 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala: ## @@ -0,0 +1,127 @@ +/* + * Licensed to the Apache Software Fou

Re: [PR] [SPARK-46664][CORE] Improve `Master` to recover quickly in case of zero workers and apps [spark]

2024-01-10 Thread via GitHub
mridulm commented on code in PR #44673: URL: https://github.com/apache/spark/pull/44673#discussion_r1448062743 ## core/src/main/scala/org/apache/spark/deploy/master/Master.scala: ## @@ -619,6 +619,9 @@ private[deploy] class Master( case e: Exception => logInfo("Worker "

Re: [PR] [SPARK-46664][CORE] Improve `Master` to recover quickly in case of zero workers and apps [spark]

2024-01-10 Thread via GitHub
mridulm commented on code in PR #44673: URL: https://github.com/apache/spark/pull/44673#discussion_r1448067149 ## core/src/main/scala/org/apache/spark/deploy/master/Master.scala: ## @@ -619,6 +619,9 @@ private[deploy] class Master( case e: Exception => logInfo("Worker "

Re: [PR] [SPARK-46664][CORE] Improve `Master` to recover quickly in case of zero workers and apps [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44673: URL: https://github.com/apache/spark/pull/44673#discussion_r1448072611 ## core/src/main/scala/org/apache/spark/deploy/master/Master.scala: ## @@ -619,6 +619,9 @@ private[deploy] class Master( case e: Exception => logInfo("Wo

Re: [PR] [SPARK-46664][CORE] Improve `Master` to recover quickly in case of zero workers and apps [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on PR #44673: URL: https://github.com/apache/spark/pull/44673#issuecomment-1885870489 Thank you for the review, @mridulm. The PR is updated accordingly.

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448082584 ## docs/structured-streaming-programming-guide.md: ## @@ -561,6 +561,8 @@ Here are the details of all the sources in Spark. maxFilesPerTrigger:

Re: [PR] [SPARK-46383] Reduce Driver Heap Usage by Reducing the Lifespan of `TaskInfo.accumulables()` [spark]

2024-01-10 Thread via GitHub
mridulm commented on code in PR #44321: URL: https://github.com/apache/spark/pull/44321#discussion_r1448069167 ## core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala: ## @@ -643,6 +657,29 @@ class SparkListenerSuite extends SparkFunSuite with LocalSparkConte

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448084120 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ReadLimit.java: ## @@ -39,6 +39,8 @@ static ReadLimit minRows(long rows, long maxTrigg

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448084408 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ReadMaxBytes.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundat

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448086330 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala: ## @@ -113,16 +117,32 @@ class FileStreamSource( // Visible for tes

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448087582 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala: ## @@ -113,16 +117,32 @@ class FileStreamSource( // Visible for tes

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448095648 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ReadMaxBytes.java: ## @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundat

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448099879 ## sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala: ## @@ -379,6 +418,9 @@ object FileStreamSource { def sparkPath: S

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448101899 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala: ## @@ -1153,174 +1153,213 @@ class FileStreamSourceSuite extends FileStrea

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448101682 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala: ## @@ -1153,174 +1153,213 @@ class FileStreamSourceSuite extends FileStrea

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448102251 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala: ## @@ -1153,174 +1153,213 @@ class FileStreamSourceSuite extends FileStrea

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448102701 ## sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala: ## @@ -1153,174 +1153,213 @@ class FileStreamSourceSuite extends FileStrea

Re: [PR] [SPARK-46664][CORE] Improve `Master` to recover quickly in case of zero workers and apps [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun commented on PR #44673: URL: https://github.com/apache/spark/pull/44673#issuecomment-1885919098 Thank you, @mridulm.

Re: [PR] [SPARK-46383] Reduce Driver Heap Usage by Reducing the Lifespan of `TaskInfo.accumulables()` [spark]

2024-01-10 Thread via GitHub
utkarsh39 commented on code in PR #44321: URL: https://github.com/apache/spark/pull/44321#discussion_r1448110832 ## core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala: ## @@ -289,6 +290,17 @@ class SparkListenerSuite extends SparkFunSuite with LocalSparkCon

Re: [PR] [SPARK-46474][INFRA] Upgrade upload-artifact action to v4 [spark]

2024-01-10 Thread via GitHub
HyukjinKwon commented on PR #44662: URL: https://github.com/apache/spark/pull/44662#issuecomment-1885939496 Merged to master.

Re: [PR] [SPARK-46474][INFRA] Upgrade upload-artifact action to v4 [spark]

2024-01-10 Thread via GitHub
HyukjinKwon closed pull request #44662: [SPARK-46474][INFRA] Upgrade upload-artifact action to v4 URL: https://github.com/apache/spark/pull/44662

Re: [PR] [WIP][DO-NOT-MERGE] Upgrade rocksjni to 8.9.1 [spark]

2024-01-10 Thread via GitHub
neilramaswamy closed pull request #44674: [WIP][DO-NOT-MERGE] Upgrade rocksjni to 8.9.1 URL: https://github.com/apache/spark/pull/44674

Re: [PR] [WIP][DO-NOT-MERGE] Upgrade rocksjni to 8.9.1 [spark]

2024-01-10 Thread via GitHub
neilramaswamy commented on PR #44674: URL: https://github.com/apache/spark/pull/44674#issuecomment-1885941820 Oops, closing as there is existing work in SPARK-45110 to upgrade to 8.8.1, which is already doing plenty of validation.

[PR] [SPARK-46665][PYTHON] Remove Pandas dependency for `pyspark.testing` [spark]

2024-01-10 Thread via GitHub
itholic opened a new pull request, #44675: URL: https://github.com/apache/spark/pull/44675 ### What changes were proposed in this pull request? This PR proposes to remove Pandas dependency for `pyspark.testing`. ### Why are the changes needed? `pyspark.testing.assertDataF

Re: [PR] [SPARK-46665][PYTHON] Remove Pandas dependency for `pyspark.testing` [spark]

2024-01-10 Thread via GitHub
HyukjinKwon commented on code in PR #44675: URL: https://github.com/apache/spark/pull/44675#discussion_r1448120281 ## python/pyspark/testing/__init__.py: ## @@ -16,6 +16,11 @@ # from pyspark.testing.utils import assertDataFrameEqual, assertSchemaEqual -from pyspark.testing.p

[PR] [SPARK-46666][PYTHON][TESTS] Make lxml as an optional testing dependency in test_session [spark]

2024-01-10 Thread via GitHub
HyukjinKwon opened a new pull request, #44676: URL: https://github.com/apache/spark/pull/44676 ### What changes were proposed in this pull request? This PR proposes to make `lxml` as an optional testing dependency in the `test_session` test ### Why are the changes needed?

Re: [PR] [SPARK-46666][PYTHON][TESTS] Make lxml as an optional testing dependency in test_session [spark]

2024-01-10 Thread via GitHub
HyukjinKwon commented on PR #44676: URL: https://github.com/apache/spark/pull/44676#issuecomment-1885958435 build: https://github.com/HyukjinKwon/spark/actions/runs/7482270214

Re: [PR] [SPARK-46658][DOCS] Loosen Ruby dependency specification [spark]

2024-01-10 Thread via GitHub
HyukjinKwon commented on PR #44667: URL: https://github.com/apache/spark/pull/44667#issuecomment-1885961407 Merged to master.

Re: [PR] [SPARK-46658][DOCS] Loosen Ruby dependency specification [spark]

2024-01-10 Thread via GitHub
HyukjinKwon closed pull request #44667: [SPARK-46658][DOCS] Loosen Ruby dependency specification URL: https://github.com/apache/spark/pull/44667

Re: [PR] [SPARK-46382][SQL] XML: Update doc for `ignoreSurroundingSpaces` [spark]

2024-01-10 Thread via GitHub
HyukjinKwon commented on PR #44671: URL: https://github.com/apache/spark/pull/44671#issuecomment-1885962408 Merged to master.

Re: [PR] [SPARK-46382][SQL] XML: Update doc for `ignoreSurroundingSpaces` [spark]

2024-01-10 Thread via GitHub
HyukjinKwon closed pull request #44671: [SPARK-46382][SQL] XML: Update doc for `ignoreSurroundingSpaces` URL: https://github.com/apache/spark/pull/44671

Re: [PR] [SPARK-45134][SHUFFLE] Avoid repeated fallback when failed to fetch remote push-merged block meta [spark]

2024-01-10 Thread via GitHub
github-actions[bot] closed pull request #43004: [SPARK-45134][SHUFFLE] Avoid repeated fallback when failed to fetch remote push-merged block meta URL: https://github.com/apache/spark/pull/43004

Re: [PR] [SPARK-46641][SS] Add maxBytesPerTrigger threshold [spark]

2024-01-10 Thread via GitHub
viirya commented on code in PR #44636: URL: https://github.com/apache/spark/pull/44636#discussion_r1448132991 ## sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/streaming/ReadLimit.java: ## @@ -39,6 +39,8 @@ static ReadLimit minRows(long rows, long maxTriggerDelay
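The review above concerns a proposed `maxBytesPerTrigger` read limit for micro-batch admission. As a hedged illustration only (not Spark's implementation; file names and sizes below are invented), the semantics such a byte-based limit implies can be sketched as: admit pending files into a batch until their cumulative size reaches the threshold, always admitting at least one file so a single oversized file cannot stall the stream.

```python
def admit_files(files, max_bytes):
    """Pick files for one micro-batch until the cumulative size reaches
    max_bytes. At least one file is always admitted so a single file
    larger than the threshold cannot block progress forever."""
    batch, total = [], 0
    for name, size in files:
        if batch and total + size > max_bytes:
            break  # threshold reached; remaining files wait for the next batch
        batch.append(name)
        total += size
    return batch

pending = [("a.json", 400), ("b.json", 500), ("c.json", 300)]
print(admit_files(pending, max_bytes=1000))  # ['a.json', 'b.json']
```

The same shape applies to the existing `maxFilesPerTrigger` option, with a file count in place of a byte total.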

[PR] [WIP] Multiple input stream test [spark]

2024-01-10 Thread via GitHub
ericm-db opened a new pull request, #44677: URL: https://github.com/apache/spark/pull/44677 ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested?

Re: [PR] [SPARK-46662][K8S][BUILD] Upgrade `kubernetes-client` to 6.10.0 [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun closed pull request #44672: [SPARK-46662][K8S][BUILD] Upgrade `kubernetes-client` to 6.10.0 URL: https://github.com/apache/spark/pull/44672

Re: [PR] [WIP] Multiple input stream test [spark]

2024-01-10 Thread via GitHub
ericm-db closed pull request #44677: [WIP] Multiple input stream test URL: https://github.com/apache/spark/pull/44677

Re: [PR] [SPARK-46664][CORE] Improve `Master` to recover quickly in case of zero workers and apps [spark]

2024-01-10 Thread via GitHub
dongjoon-hyun closed pull request #44673: [SPARK-46664][CORE] Improve `Master` to recover quickly in case of zero workers and apps URL: https://github.com/apache/spark/pull/44673

[PR] [SPARK-46638][Python] Create Python UDTF API to acquire execution memory for 'eval' and 'terminate' methods [spark]

2024-01-10 Thread via GitHub
dtenedor opened a new pull request, #44678: URL: https://github.com/apache/spark/pull/44678 ### What changes were proposed in this pull request? This PR creates a Python UDTF API to acquire execution memory for 'eval' and 'terminate' methods. For example, this UDTF accepts an a
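The PR description above is truncated before its example. As a purely illustrative sketch of the `eval`/`terminate` protocol the PR title refers to (the `CountChars` class and its behavior are invented here, not the PR's API, and no Spark registration is shown):

```python
class CountChars:
    """Toy class following the Python UDTF protocol shape: eval() yields
    zero or more output rows per input row, and terminate() yields any
    final rows once all input has been consumed."""
    def __init__(self):
        self.total = 0

    def eval(self, word: str):
        self.total += len(word)
        yield (word, len(word))

    def terminate(self):
        yield ("TOTAL", self.total)

udtf = CountChars()
rows = [r for w in ("spark", "sql") for r in udtf.eval(w)]
rows += list(udtf.terminate())
print(rows)  # [('spark', 5), ('sql', 3), ('TOTAL', 8)]
```

In PySpark proper, such a class would be decorated with a return-type schema and registered before use; the memory-acquisition API the PR proposes would be called from inside `eval` and `terminate`.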

Re: [PR] [SPARK-46638][Python] Create Python UDTF API to acquire execution memory for 'eval' and 'terminate' methods [spark]

2024-01-10 Thread via GitHub
dtenedor commented on PR #44678: URL: https://github.com/apache/spark/pull/44678#issuecomment-1886024403 cc @ueshin @allisonwang-db

Re: [PR] [SPARK-46665][PYTHON] Remove Pandas dependency for `pyspark.testing` [spark]

2024-01-10 Thread via GitHub
itholic commented on code in PR #44675: URL: https://github.com/apache/spark/pull/44675#discussion_r1448175750 ## python/pyspark/testing/__init__.py: ## @@ -16,6 +16,11 @@ # from pyspark.testing.utils import assertDataFrameEqual, assertSchemaEqual -from pyspark.testing.panda

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2024-01-10 Thread via GitHub
Ngone51 commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1448181728 ## core/src/test/scala/org/apache/spark/TempLocalSparkContext.scala: ## @@ -51,7 +51,7 @@ trait TempLocalSparkContext extends BeforeAndAfterEach */ def sc: Spark

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2024-01-10 Thread via GitHub
Ngone51 commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1448192078 ## core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala: ## @@ -54,7 +54,7 @@ private[spark] trait TaskScheduler { // Submit a sequence of tasks to run

Re: [PR] [SPARK-46654][SQL] Make to_csv can correctly display complex types data [spark]

2024-01-10 Thread via GitHub
panbingkun commented on PR #44665: URL: https://github.com/apache/spark/pull/44665#issuecomment-1886085064 cc @cloud-fan @LuciferYang

Re: [PR] [SPARK-46650][CORE][SQL][YARN] Replace AtomicBoolean with volatile boolean [spark]

2024-01-10 Thread via GitHub
beliefer closed pull request #44638: [SPARK-46650][CORE][SQL][YARN] Replace AtomicBoolean with volatile boolean URL: https://github.com/apache/spark/pull/44638
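The rationale behind a change like SPARK-46650 is that a flag which is only ever set and read, never compare-and-swapped, needs visibility but not atomic read-modify-write, so on the JVM a `volatile boolean` suffices where `AtomicBoolean` would be overkill. Python has no `volatile`; as a conceptual analog only (a plain attribute plus Python's memory model stands in for the JVM pattern, and the `Worker` class is invented for illustration):

```python
import threading
import time

class Worker:
    """A stop flag with one writer and one reader needs only visibility,
    not compare-and-set: the JVM equivalent of this plain flag is a
    volatile boolean; AtomicBoolean is for CAS-style updates."""
    def __init__(self):
        self.stopped = False   # plain flag: written once, polled in the loop
        self.iterations = 0

    def run(self):
        while not self.stopped:
            self.iterations += 1
            time.sleep(0.001)

w = Worker()
t = threading.Thread(target=w.run)
t.start()
time.sleep(0.05)
w.stopped = True               # single unconditional write; no CAS required
t.join(timeout=1)
print(t.is_alive())  # False
```

Where the code instead needs "set the flag only if it was not already set" (e.g. run-once guards), `AtomicBoolean.compareAndSet` remains the right tool.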

Re: [PR] [SPARK-46575][SQL][HIVE] Make HiveThriftServer2.startWithContext DevelopApi retriable and fix flakiness of ThriftServerWithSparkContextInHttpSuite [spark]

2024-01-10 Thread via GitHub
yaooqinn commented on code in PR #44575: URL: https://github.com/apache/spark/pull/44575#discussion_r1448201533 ## sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveUtils.scala: ## @@ -201,6 +201,16 @@ private[spark] object HiveUtils extends Logging { .booleanConf

Re: [PR] try to fix upload [spark]

2024-01-10 Thread via GitHub
panbingkun closed pull request #44653: try to fix upload URL: https://github.com/apache/spark/pull/44653

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2024-01-10 Thread via GitHub
cloud-fan commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1448202496 ## core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala: ## @@ -296,18 +296,31 @@ private[spark] class TaskSchedulerImpl( new TaskSetManager(th

Re: [PR] [WIP][SQL][CONNECT] Resolve inappropriate use of AtomicInteger [spark]

2024-01-10 Thread via GitHub
beliefer closed pull request #44659: [WIP][SQL][CONNECT] Resolve inappropriate use of AtomicInteger URL: https://github.com/apache/spark/pull/44659

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2024-01-10 Thread via GitHub
cloud-fan commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1448202635 ## core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala: ## @@ -1002,6 +1002,17 @@ private[spark] class TaskSetManager( maybeFinishTaskSet() }

Re: [PR] [SPARK-46052][CORE] Remove function TaskScheduler.killAllTaskAttempts [spark]

2024-01-10 Thread via GitHub
cloud-fan commented on code in PR #43954: URL: https://github.com/apache/spark/pull/43954#discussion_r1448203237 ## core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala: ## @@ -54,9 +54,9 @@ private[spark] trait TaskScheduler { // Submit a sequence of tasks to r

Re: [PR] [SPARK-46575][SQL][HIVE] Make HiveThriftServer2.startWithContext DevelopApi retriable and fix flakiness of ThriftServerWithSparkContextInHttpSuite [spark]

2024-01-10 Thread via GitHub
cloud-fan commented on code in PR #44575: URL: https://github.com/apache/spark/pull/44575#discussion_r1448204963 ## sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/SharedThriftServer.scala: ## @@ -138,10 +138,15 @@ trait SharedThriftServer extends Sha
