[GitHub] flink issue #2571: [FLINK-4348] Simplify logic of SlotManager

2016-10-07 Thread mxm
Github user mxm commented on the issue: https://github.com/apache/flink/pull/2571 Merged --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] flink issue #2571: [FLINK-4348] Simplify logic of SlotManager

2016-10-06 Thread mxm
Github user mxm commented on the issue: https://github.com/apache/flink/pull/2571 Rebased to latest `flip-6`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes

[GitHub] flink issue #2571: [FLINK-4348] Simplify logic of SlotManager

2016-10-06 Thread mxm
Github user mxm commented on the issue: https://github.com/apache/flink/pull/2571 Thanks for the comments @KurtYoung. I've updated the PR. No worries, four eyes always see more than two and it is natural that ideas progress over time :) --- If your project is set up for it, you can

[GitHub] flink issue #2571: [FLINK-4348] Simplify logic of SlotManager

2016-10-05 Thread mxm
Github user mxm commented on the issue: https://github.com/apache/flink/pull/2571 As discussed in this PR, I've pushed the following changes: 1. Move the slot registration and allocation report to the registration of the TaskExecutor 2. Let the TaskExecutor

[GitHub] flink issue #2571: [FLINK-4348] Simplify logic of SlotManager

2016-10-04 Thread mxm
Github user mxm commented on the issue: https://github.com/apache/flink/pull/2571 @KurtYoung I see, that makes sense. When the TM looses connection to the RM, it will eventually be detected by the heartbeating and the list of pending slot allocation removal requests will be cleared.

[GitHub] flink issue #2571: [FLINK-4348] Simplify logic of SlotManager

2016-10-01 Thread KurtYoung
Github user KurtYoung commented on the issue: https://github.com/apache/flink/pull/2571 @mxm Regarding the last heartbeat thing, what i really trying to say is, the failing free request will not stuck in the unconfirmed list forever, since we have the heartbeat manager to monitoring

[GitHub] flink issue #2571: [FLINK-4348] Simplify logic of SlotManager

2016-09-30 Thread mxm
Github user mxm commented on the issue: https://github.com/apache/flink/pull/2571 Great to hear that we're on the same page :) >I think it's no need to stick to the failed slot when the allocation fails by rpc. Just put it back to the free pool, and give us another shot.

[GitHub] flink issue #2571: [FLINK-4348] Simplify logic of SlotManager

2016-09-30 Thread KurtYoung
Github user KurtYoung commented on the issue: https://github.com/apache/flink/pull/2571 @mxm Thanks for your thoughts, i really like the discussion.:smirk: You just pointed out another point which i missed in the previous reply. Actually, i noticed that right after i posted

[GitHub] flink issue #2571: [FLINK-4348] Simplify logic of SlotManager

2016-09-30 Thread mxm
Github user mxm commented on the issue: https://github.com/apache/flink/pull/2571 @KurtYoung It is guaranteed that the ResourceManager will receive an RPC response of some sort. Either a reply from the TaskExecutor, or a timeout/error which is returned by the future. If the request

[GitHub] flink issue #2571: [FLINK-4348] Simplify logic of SlotManager

2016-09-30 Thread KurtYoung
Github user KurtYoung commented on the issue: https://github.com/apache/flink/pull/2571 @mxm I think that situation will happen when RM request slot from TM, TM accept the request, but the response was somehow missed. Typically, when an rpc error occurred, the send could not know

[GitHub] flink issue #2571: [FLINK-4348] Simplify logic of SlotManager

2016-09-30 Thread mxm
Github user mxm commented on the issue: https://github.com/apache/flink/pull/2571 Thanks for the feedback, @beyond1920 and @KurtYoung. You're right, the changes don't allow slots to be released by a TaskExecutor. We can change that by explicitly reporting to the RM if a slot

[GitHub] flink issue #2571: [FLINK-4348] Simplify logic of SlotManager

2016-09-30 Thread KurtYoung
Github user KurtYoung commented on the issue: https://github.com/apache/flink/pull/2571 @mxm Thanks for the simplification, i like the idea. When i wrote the first version of the SlotManager, i have noticed maybe i made things too complicated, but i didn't figure out how to make

[GitHub] flink issue #2571: [FLINK-4348] Simplify logic of SlotManager

2016-09-29 Thread beyond1920
Github user beyond1920 commented on the issue: https://github.com/apache/flink/pull/2571 @mxm, What would happen under following cases: taskExecutor releases a registered slot, then taskExecutor reports its latest slotReport to ResourceManager, ResourceManager should remove the old

[GitHub] flink issue #2571: [FLINK-4348] Simplify logic of SlotManager

2016-09-29 Thread mxm
Github user mxm commented on the issue: https://github.com/apache/flink/pull/2571 CC @beyond1920 @KurtYoung --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes