Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18239
@cloud-fan
Thanks a lot for the reply.
Yes, I'm also hesitant to backport this to branch-1.6, but I think this bug is
too obvious -- with `spark.sql.adaptive.enabled=true`, any rerunni…
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18231#discussion_r122367121
--- Diff:
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalShuffleBlockHandler.java
---
@@ -209,4 +190,51 @@ private
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18231
Jenkins, retest this please
GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/18327
[SPARK-21047] Add test suites for complicated cases in ColumnarBatchSuite
## What changes were proposed in this pull request?
The current ColumnarBatchSuite has very simple test cases for `Array…
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18327
@kiszk
Would you mind if I take a try at this JIRA?
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18231
@cloud-fan
Thanks for merging!
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18343
Thanks for the ping.
If I understand correctly, `HighlyCompressedStatus` is initialized in two
situations:
1. Creating `MapStatus` at shuffle write, when the number of reduce partitions
is over 2000…
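For context, here is a minimal, self-contained sketch of that branching, with simplified stand-ins for Spark's `MapStatus` classes (real Spark also carries a `BlockManagerId` and compresses sizes into bytes); the hard-coded 2000 is the threshold discussed in this thread:

```scala
sealed trait MapStatusSketch
case class CompressedStatus(sizes: Array[Long]) extends MapStatusSketch
case class HighlyCompressedStatus(avgSize: Long) extends MapStatusSketch

object MapStatusSketch {
  // Mirrors the hard-coded 2000-partition threshold mentioned above.
  val PartitionThreshold = 2000

  def apply(uncompressedSizes: Array[Long]): MapStatusSketch =
    if (uncompressedSizes.length > PartitionThreshold) {
      // Per-block sizes collapse to a single average of the non-empty blocks.
      val nonEmpty = uncompressedSizes.filter(_ > 0)
      val avg = if (nonEmpty.isEmpty) 0L else nonEmpty.sum / nonEmpty.length
      HighlyCompressedStatus(avg)
    } else {
      // One size per block is kept (compressed to a byte in real Spark).
      CompressedStatus(uncompressedSizes)
    }
}
```
For example, `MapStatusSketch(Array.fill(3000)(10L))` collapses to a `HighlyCompressedStatus`, which is exactly where per-block accuracy is lost.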
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/14085
@zenglinxi0615
This PR is about adding all files in a directory recursively, so there is no need
to enumerate all the filenames. I think this can be pretty useful, especially in
a production environment.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18327
@kiszk
Thank you so much!
I will read your comments carefully and refine this PR :)
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18249
@vanzin
Would you mind giving more comments when you have time? Then I can continue
working on this :)
Github user jinxing64 closed the pull request at:
https://github.com/apache/spark/pull/18239
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18327#discussion_r123258919
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
---
@@ -739,6 +739,123 @@ class ColumnarBatchSuite
GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/18388
[SPARK-21175] Reject OpenBlocks when memory shortage on shuffle service.
## What changes were proposed in this pull request?
A shuffle service can serve blocks from multiple apps/tasks…
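The description is cut off here, but as a hedged illustration of the idea (not the PR's actual code), a tiny admission-control sketch -- `OpenBlocksGate`, `memoryBudget`, and `estimatedResponseSize` are invented names:

```scala
// Hypothetical sketch: track the bytes pinned by in-flight OpenBlocks
// responses and reject new requests once a budget would be exceeded,
// instead of letting the shuffle service run out of memory.
class OpenBlocksGate(memoryBudget: Long) {
  private var inFlightBytes = 0L

  def tryOpen(estimatedResponseSize: Long): Boolean = synchronized {
    if (inFlightBytes + estimatedResponseSize > memoryBudget) {
      false // reject: the client can retry once memory is released
    } else {
      inFlightBytes += estimatedResponseSize
      true
    }
  }

  def release(bytes: Long): Unit = synchronized {
    inFlightBytes = math.max(0L, inFlightBytes - bytes)
  }
}
```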
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18327
@kiszk
I tried to add a test, `Nest Array(containing null) in Array`. Please take
a look when you have time, and I will continue working on this :)
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18327#discussion_r123473623
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java
---
@@ -241,7 +241,40 @@ public MapData getMap(int ordinal
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18327#discussion_r123499668
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
---
@@ -739,6 +739,157 @@ class ColumnarBatchSuite
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18327
@cloud-fan
Thanks a lot for taking the time to review this :)
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18327#discussion_r123690164
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala
---
@@ -739,6 +739,157 @@ class ColumnarBatchSuite
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18327
@cloud-fan
Thanks for merging :)
GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/18405
[SPARK-21194][SQL][WIP] Fail the putNull method when containsNull=false.
## What changes were proposed in this pull request?
Currently there's no check for putting null into an `Arra…
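The description is truncated, but as a hedged sketch of the kind of check being proposed (`SimpleArrayColumn` is an invented stand-in, not Spark's `ColumnVector` API):

```scala
import scala.collection.mutable

// Hypothetical sketch: a column that knows its element type declared
// containsNull=false and fails fast when a null is written into it.
class SimpleArrayColumn(containsNull: Boolean) {
  private val nullRows = mutable.BitSet()

  def putNull(rowId: Int): Unit = {
    require(containsNull,
      s"Cannot put null at row $rowId: the array type declares containsNull=false")
    nullRows += rowId
  }
}
```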
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18405
cc @kiszk
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18405
Jenkins, retest this please.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18405
Jenkins, retest this please.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18405
Jenkins, retest this please.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18405
@kiszk
Thanks a lot for taking the time to review this.
I've no idea why the test failed :(
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/16989
@cloud-fan
Any more comments on this? :)
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r115219258
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -128,41 +130,52 @@ private[spark] class CompressedMapStatus
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r115220025
--- Diff:
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
---
@@ -163,6 +173,8 @@ final class ShuffleBlockFetcherIterator
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r115229968
--- Diff:
core/src/test/scala/org/apache/spark/scheduler/MapStatusSuite.scala ---
@@ -128,4 +130,22 @@ class MapStatusSuite extends SparkFunSuite
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r115229995
--- Diff:
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
---
@@ -175,33 +187,45 @@ final class
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r115230064
--- Diff:
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
---
@@ -163,6 +173,8 @@ final class ShuffleBlockFetcherIterator
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/16989
@cloud-fan
Thank you very much for reviewing this thus far :)
> How about we always fetch to disk if the block size is over
maxBytesInFlight?
I fully agree with this. It'…
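A minimal sketch of that suggestion, with invented names standing in for the logic in `ShuffleBlockFetcherIterator` (the file-name pattern matches the one quoted later in this thread):

```scala
object FetchPolicy {
  sealed trait FetchTarget
  case object ToMemory extends FetchTarget
  case class ToDisk(tempFileName: String) extends FetchTarget

  // Blocks larger than maxBytesInFlight stream to a local file instead of
  // being buffered on-heap, bounding the memory held by in-flight fetches.
  def chooseTarget(blockId: String, blockSize: Long,
      maxBytesInFlight: Long, taskAttemptId: Long): FetchTarget =
    if (blockSize > maxBytesInFlight) ToDisk(s"$taskAttemptId-remote-$blockId")
    else ToMemory
}
```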
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/16989
Jenkins, test this please
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r115518541
--- Diff:
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java
---
@@ -126,4 +151,39 @@ private void
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r115519964
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -128,41 +130,52 @@ private[spark] class CompressedMapStatus
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r115519863
--- Diff: core/src/main/scala/org/apache/spark/memory/MemoryManager.scala
---
@@ -54,7 +54,8 @@ private[spark] abstract class MemoryManager
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r115522781
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -193,8 +206,18 @@ private[spark] object HighlyCompressedMapStatus
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r115522902
--- Diff:
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
---
@@ -137,6 +146,7 @@ final class ShuffleBlockFetcherIterator
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r115522973
--- Diff:
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
---
@@ -154,15 +164,24 @@ final class
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r115523103
--- Diff:
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
---
@@ -154,15 +164,24 @@ final class
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r115523324
--- Diff:
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java
---
@@ -126,4 +149,38 @@ private void
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r115523971
--- Diff:
common/network-common/src/main/java/org/apache/spark/network/server/OneForOneStreamManager.java
---
@@ -95,6 +95,14 @@ public ManagedBuffer
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r115525399
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -128,41 +131,60 @@ private[spark] class CompressedMapStatus
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r115525769
--- Diff:
common/network-common/src/main/java/org/apache/spark/network/buffer/FileSegmentManagedBuffer.java
---
@@ -36,7 +36,7 @@
/**
* A
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/16989
@mridulm
Thanks a lot for taking the time to look into this PR; it was really helpful. I
refined it according to your comments. Please take another look when you have
time and give more comments…
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r115638943
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -128,41 +131,60 @@ private[spark] class CompressedMapStatus
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/16989
Yes, I will refine :)
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r115884613
--- Diff:
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/OneForOneBlockFetcher.java
---
@@ -126,4 +150,50 @@ private void
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/16989
As @mridulm mentioned, `HighlyCompressedMapStatus` can be configured
in two respects:
> 1. minimum size before we consider something a large block.
> 2. The fraction '2…
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/16989
@cloud-fan
Yes, I think it's a good idea to make `2000` configurable. I will refine.
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r115933206
--- Diff:
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
---
@@ -175,33 +193,54 @@ final class
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/16989
@cloud-fan @mridulm
I think it's a good idea to make 2000 configurable. But checking the code,
I'm a little hesitant to do that in this PR. I think it's a bigger change and
so…
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/16989
A very gentle ping to @cloud-fan and @mridulm:
what do you think about the current change? :)
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r116520507
--- Diff: core/src/main/scala/org/apache/spark/memory/MemoryManager.scala
---
@@ -54,7 +54,8 @@ private[spark] abstract class MemoryManager
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r116521474
--- Diff:
core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala ---
@@ -51,7 +59,10 @@ private[spark] class BlockStoreShuffleReader
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/16989
In the current code, `spark.memory.offHeap.enabled` is used when deciding
`tungstenMemoryMode`.
`spark.memory.offHeap.enabled` doesn't decide whether remote blocks are
shuffled to on-heap or off-heap memory.
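For reference, a self-contained sketch of that decision, with a plain `Map` standing in for `SparkConf` (the `require` mirrors Spark's check that an off-heap size must be set when the flag is on):

```scala
object MemoryModeSketch {
  object MemoryMode extends Enumeration { val ON_HEAP, OFF_HEAP = Value }

  // spark.memory.offHeap.enabled picks the Tungsten memory mode; it says
  // nothing about where fetched remote shuffle blocks are placed.
  def tungstenMemoryMode(conf: Map[String, String]): MemoryMode.Value =
    if (conf.getOrElse("spark.memory.offHeap.enabled", "false").toBoolean) {
      val size = conf.getOrElse("spark.memory.offHeap.size", "0").toLong
      require(size > 0,
        "spark.memory.offHeap.size must be > 0 when offHeap memory is enabled")
      MemoryMode.OFF_HEAP
    } else {
      MemoryMode.ON_HEAP
    }
}
```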
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r116674324
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -193,8 +216,21 @@ private[spark] object HighlyCompressedMapStatus
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/16989
@cloud-fan Thanks a lot. I will refine :)
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r116907199
--- Diff:
core/src/test/scala/org/apache/spark/scheduler/MapStatusSuite.scala ---
@@ -128,4 +138,27 @@ class MapStatusSuite extends SparkFunSuite
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r116907401
--- Diff: docs/configuration.md ---
@@ -954,16 +970,16 @@ Apart from these, the following properties are also
available, and may be useful
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r116907437
--- Diff:
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -278,4 +278,10 @@ package object config
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r116907421
--- Diff:
core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala
---
@@ -401,4 +429,139 @@ class
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r116911537
--- Diff:
core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala ---
@@ -51,7 +59,10 @@ private[spark] class BlockStoreShuffleReader
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r116911964
--- Diff:
core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala
---
@@ -401,4 +429,146 @@ class
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r116919856
--- Diff:
core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala
---
@@ -401,4 +429,146 @@ class
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r116929076
--- Diff:
core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala
---
@@ -401,4 +429,146 @@ class
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r116929126
--- Diff:
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -278,4 +278,39 @@ package object config
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/16989
@cloud-fan Thanks, I will refine the documents.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/16989
Jenkins, retest this please.
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r117152091
--- Diff:
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -278,4 +278,39 @@ package object config
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/16989
It seems like `SparkConfigProvider` is not checking the alternatives in
`SparkConf`. That's why `spark.memory.offHeap.enabled` is not set (it still has
the default value), though we've a…
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/16989
Checking the code:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/config/ConfigProvider.scala#L59
`SparkConfigProvider` just checks whether the key is…
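A minimal reproduction of that behavior, assuming (from Spark's config registry of that era) that `spark.unsafe.offHeap` was the deprecated alternative key for `spark.memory.offHeap.enabled`:

```scala
object ConfigProviderSketch {
  // A provider that only looks up the exact key, as described above.
  class NaiveConfigProvider(settings: Map[String, String]) {
    def get(key: String): Option[String] = settings.get(key) // no alternatives
  }

  // Consulting the deprecated aliases as well would fix the lookup.
  val alternatives = Map(
    "spark.memory.offHeap.enabled" -> Seq("spark.unsafe.offHeap"))

  def getWithAlternatives(p: NaiveConfigProvider, key: String): Option[String] =
    p.get(key).orElse(
      alternatives.getOrElse(key, Nil).flatMap(p.get).headOption)
}
```
With `settings = Map("spark.unsafe.offHeap" -> "true")`, a plain `get("spark.memory.offHeap.enabled")` returns `None`, while `getWithAlternatives` returns `Some("true")`.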
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/16989
@JoshRosen
Thanks a lot for taking the time to look into this PR. I'm reading your
comments carefully.
Yes, I think it's good to integrate this with the memory manager later.
I will…
GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/18031
Record accurate size of blocks in MapStatus when it's above threshold.
## What changes were proposed in this pull request?
Currently, when the number of reduces is above…
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18031
To resolve the comments in https://github.com/apache/spark/pull/16989 :
> minimum size before we consider something a large block: if the average is
10KB, and some blocks are > 20KB, spillin…
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18031
I'm trying to give the user a way to strictly control the memory so that no
blocks are underestimated (setting `spark.shuffle.accurateBlockThreshold=0` and
`spark.shuffle.accurateBlockThresholdByTimesAverage=1`…
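The line is cut off, but the strict setting being described would look like the following (both keys existed at this stage of the PR; as noted later in the thread, the final change kept only `spark.shuffle.accurateBlockThreshold`):

```scala
import org.apache.spark.SparkConf

// With a 0-byte threshold and 1x the average, every non-empty block counts
// as "huge" and its size is recorded exactly, so no block can be
// underestimated -- at the cost of a larger MapStatus.
val strictConf = new SparkConf()
  .set("spark.shuffle.accurateBlockThreshold", "0")
  .set("spark.shuffle.accurateBlockThresholdByTimesAverage", "1")
```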
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18031#discussion_r117293425
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18031
Jenkins, retest this please
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18031
Gentle ping to @JoshRosen @cloud-fan @mridulm
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18031
@HyukjinKwon
Thank you so much! Really helpful :)
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18031#discussion_r117460170
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18031#discussion_r117461089
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18031#discussion_r117510051
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18031#discussion_r117610285
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18031#discussion_r117610423
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -121,48 +126,69 @@ private[spark] class CompressedMapStatus
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/18031#discussion_r117620188
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -193,8 +219,27 @@ private[spark] object HighlyCompressedMapStatus
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18031
In the current change:
1. There's only one config: `spark.shuffle.accurateBlockThreshold`.
2. I removed the huge blocks from the numerator in the calculation of the
average size (a sketch of this follows below).
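A minimal sketch of that scheme (simplified; the real code works on the compressed sizes inside `HighlyCompressedMapStatus`): sizes at or above the threshold keep their exact value, and the average for the rest excludes the huge blocks so it no longer skews high.

```scala
object BlockSizeSummary {
  // Returns (exact sizes of huge blocks keyed by reduce id, average of the
  // remaining non-empty blocks). Excluding the huge blocks from the
  // numerator keeps outliers from inflating the reported average.
  def summarize(sizes: Array[Long], accurateBlockThreshold: Long)
      : (Map[Int, Long], Long) = {
    val huge = sizes.zipWithIndex.collect {
      case (s, i) if s >= accurateBlockThreshold => i -> s
    }.toMap
    val smallNonEmpty = sizes.filter(s => s > 0 && s < accurateBlockThreshold)
    val avg =
      if (smallNonEmpty.isEmpty) 0L else smallNonEmpty.sum / smallNonEmpty.length
    (huge, avg)
  }
}
```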
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/16989
@cloud-fan
Yes, thanks a lot for merging #18031.
I will update soon!
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/18031
@cloud-fan
Thanks for merging!
@mridulm @JoshRosen @viirya @HyukjinKwon @wzhfy Thanks a lot for taking the
time to review this PR!
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/16989
In the current change:
1. `ShuffleBlockFetcherIterator` is not a `MemoryConsumer`.
2. The name of the shuffle file becomes `${context.taskAttemptId()}-remote-$bId`.
3. Try to delete all shuffle files…
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/17634
@jiangxb1987
Thank you so much for taking the time to look into this.
Yes, it is not failing in the existing code, but I think it's quite
straightforward that the partitioner should be compatible…
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/16989
@cloud-fan
In the current change, the shuffle files are deleted twice:
1) after `ManagedBuffer.release`;
2) in `cleanup()`, which is already registered as a task completion callback.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/16989
In the current change:
1) remove the partially written file on failure;
2) remove all shuffle files in `cleanup()` (this is registered as a task
completion callback; see the sketch below).
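For reference, the registration pattern being described, using Spark's `TaskContext` listener API (a hedged sketch, not the PR's exact code):

```scala
import org.apache.spark.TaskContext

object FetchCleanup {
  // The listener fires when the task completes, whether it succeeded or
  // failed, so shuffle files fetched to disk are removed on error paths too.
  def registerCleanup(context: TaskContext, cleanup: () => Unit): Unit = {
    context.addTaskCompletionListener { _ => cleanup() }
  }
}
```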
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r118272532
--- Diff:
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -287,4 +287,10 @@ package object config {
.bytesConf
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r118272653
--- Diff:
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
---
@@ -175,33 +187,49 @@ final class
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r118272605
--- Diff:
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
---
@@ -163,6 +170,11 @@ final class
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r118272761
--- Diff:
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala
---
@@ -175,33 +187,49 @@ final class
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/17634
Yes, it doesn't fail for sure.
I just think it's fairly straightforward that the partitioner should be
compatible with the number of the child RDD's partitions. I see no reason the number of…
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r118424377
--- Diff: docs/configuration.md ---
@@ -520,6 +520,14 @@ Apart from these, the following properties are also
available, and may be useful
Github user jinxing64 commented on a diff in the pull request:
https://github.com/apache/spark/pull/16989#discussion_r118424375
--- Diff:
core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala
---
@@ -401,4 +411,61 @@ class