azagrebin opened a new pull request #8789: [FLINK-12890] Add partition lifecycle related Shuffle API
URL: https://github.com/apache/flink/pull/8789

## What is the purpose of the change

At the moment we have `ShuffleEnvironment.releasePartitions`, which releases the resources a partition occupies locally. The JM can also trigger it by calling `TaskExecutorGateway.releasePartitions`.

To support lifecycle management of partitions ([FLINK-12069](https://issues.apache.org/jira/browse/FLINK-12069), relevant mostly for batch and blocking partitions), we need to extend the Shuffle API:

- `ShuffleDescriptor.hasLocalResources` indicates that the partition occupies local resources on the TM and that the TM must stay running for the produced data to be consumed (e.g. true for the default `NettyShuffleEnvironment` and false for externally stored partitions). If a partition needs external lifecycle management and is not released after its first consumption is done (`ResultPartitionDeploymentDescriptor.isReleasedOnConsumption`), then the RM/JM should keep the TMs that produce these partitions running for as long as the partition still needs to be consumed. The connection to these TMs should also be kept, so that the RPC call `TaskExecutorGateway.releasePartitions` can be issued once the partition is no longer needed.
- `ShuffleMaster.removePartitionExternally`: the JM should call this whenever the partition no longer needs to be consumed. This call releases partition resources that may be occupied externally, outside of the TM, and should not depend on `ShuffleDescriptor.hasLocalResources`.

## Brief change log

- Introduce `ShuffleDescriptor.hasLocalResources` and the default netty shuffle implementation
- Introduce `ShuffleMaster.removePartitionExternally` and the default netty shuffle implementation

## Verifying this change

Trivial shuffle interface extension.
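
For illustration only, the lifecycle described under "What is the purpose of the change" could be sketched roughly as below. This is a minimal sketch, not the code in this PR: the nested interfaces are simplified stand-ins for the real Flink types, the `String` partition ids and the exact method signatures are assumptions, and `SimpleDescriptor`, `releasePartition` and `mustKeepProducerAlive` are hypothetical helpers introduced only for this example.

```java
import java.util.Collections;
import java.util.List;

public class PartitionLifecycleSketch {

    /** Simplified stand-in for ShuffleDescriptor; only the new method is modeled. */
    public interface ShuffleDescriptor {
        String partitionId();          // assumption: the real API uses a dedicated id type
        boolean hasLocalResources();   // true if the producing TM must stay alive
    }

    /** Simplified stand-in for ShuffleMaster (signature is an assumption). */
    public interface ShuffleMaster {
        void removePartitionExternally(ShuffleDescriptor descriptor);
    }

    /** Simplified stand-in for TaskExecutorGateway (signature is an assumption). */
    public interface TaskExecutorGateway {
        void releasePartitions(List<String> partitionIds);
    }

    /** Hypothetical descriptor implementation used only in this sketch. */
    public static final class SimpleDescriptor implements ShuffleDescriptor {
        private final String id;
        private final boolean local;

        public SimpleDescriptor(String id, boolean local) {
            this.id = id;
            this.local = local;
        }

        @Override public String partitionId() { return id; }
        @Override public boolean hasLocalResources() { return local; }
    }

    /**
     * Hypothetical JM-side check: the producing TM is kept running only if the
     * partition holds local resources, is not released on first consumption,
     * and still has to be consumed.
     */
    public static boolean mustKeepProducerAlive(
            ShuffleDescriptor descriptor, boolean releasedOnConsumption, boolean stillNeeded) {
        return descriptor.hasLocalResources() && !releasedOnConsumption && stillNeeded;
    }

    /** Hypothetical JM-side release once the partition is no longer needed. */
    public static void releasePartition(
            ShuffleDescriptor descriptor, ShuffleMaster master, TaskExecutorGateway producer) {
        if (descriptor.hasLocalResources()) {
            // Local resources live on the producing TM, so release them via RPC.
            producer.releasePartitions(Collections.singletonList(descriptor.partitionId()));
        }
        // The external release does not depend on hasLocalResources.
        master.removePartitionExternally(descriptor);
    }

    public static void main(String[] args) {
        ShuffleMaster master = d -> System.out.println("external release: " + d.partitionId());
        TaskExecutorGateway tm = ids -> System.out.println("TM release: " + ids);
        releasePartition(new SimpleDescriptor("p1", true), master, tm);
    }
}
```

In this sketch the external release is always issued, while the TM-side RPC is issued only for partitions that report local resources, mirroring the rule that `removePartitionExternally` should not depend on `hasLocalResources`.
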

## Does this pull request potentially affect one of the following parts:

- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
- The S3 file system connector: (no)

## Documentation

- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)