[GitHub] [spark] attilapiros commented on pull request #30763: [SPARK-31801][API][SHUFFLE] Register map output metadata

2021-03-16 Thread GitBox
attilapiros commented on pull request #30763: URL: https://github.com/apache/spark/pull/30763#issuecomment-800791087 We all agree that more abstraction here is really a good idea, and reading https://github.com/apache/spark/pull/30763#issuecomment-792865534 gives me the impression we both wo

[GitHub] [spark] attilapiros commented on pull request #30763: [SPARK-31801][API][SHUFFLE] Register map output metadata

2021-03-08 Thread GitBox
attilapiros commented on pull request #30763: URL: https://github.com/apache/spark/pull/30763#issuecomment-792833865 I still think the location abstraction is a good idea. I just have my doubts about the amount of effort it would require: > Also, this way wouldn't change the existing

[GitHub] [spark] attilapiros commented on pull request #30763: [SPARK-31801][API][SHUFFLE] Register map output metadata

2021-03-08 Thread GitBox
attilapiros commented on pull request #30763: URL: https://github.com/apache/spark/pull/30763#issuecomment-792821433 >. Then, `BlockManagerId` would be a native implementation for Spark and users could implement `Location` to support custom storage. To test the idea, I try to come up

[GitHub] [spark] attilapiros commented on pull request #30763: [SPARK-31801][API][SHUFFLE] Register map output metadata

2021-03-08 Thread GitBox
attilapiros commented on pull request #30763: URL: https://github.com/apache/spark/pull/30763#issuecomment-792588218 @Ngone51 Regarding cutting this into smaller pieces, I can identify two potential sub-PRs: - introduction of `MapTaskResult` - introduction of `ShuffleOutputTrack

[GitHub] [spark] attilapiros commented on pull request #30763: [SPARK-31801][API][SHUFFLE] Register map output metadata

2021-03-08 Thread GitBox
attilapiros commented on pull request #30763: URL: https://github.com/apache/spark/pull/30763#issuecomment-792575649 Let me move the mima excludes from 3.1.x to 3.2.x.

[GitHub] [spark] attilapiros commented on pull request #30763: [SPARK-31801][API][SHUFFLE] Register map output metadata

2021-03-08 Thread GitBox
attilapiros commented on pull request #30763: URL: https://github.com/apache/spark/pull/30763#issuecomment-792563765 The failure is totally unrelated: - org.apache.spark.sql.kafka010.KafkaMicroBatchV2SourceWithAdminSuite.subscribing topic by pattern from latest offsets (failOnDataLoss: fals

[GitHub] [spark] attilapiros commented on pull request #30763: [SPARK-31801][API][SHUFFLE] Register map output metadata

2020-12-15 Thread GitBox
attilapiros commented on pull request #30763: URL: https://github.com/apache/spark/pull/30763#issuecomment-745146977 cc @holdenk, @viirya, @mridulm, @shadowinlife, @tgravescs, @tianczha