Wilfred Spiegelenburg created YUNIKORN-229: ----------------------------------------------
Summary: shim sends the same remove request twice for a remove re Key: YUNIKORN-229 URL: https://issues.apache.org/jira/browse/YUNIKORN-229 Project: Apache YuniKorn Issue Type: Improvement Components: shim - kubernetes Reporter: Wilfred Spiegelenburg In the logs it looks like the shim asks to remove the same allocation using the same UUID: First release request from shim: {code} 2020-06-10T05:54:24.564Z DEBUG cache/cluster_info.go:136 enqueued event {"eventType": "*cacheevent.RMUpdateRequestEvent", "event": {"Request":{"releases":{"allocationsToRelease":[{"partitionName":"[mycluster]default","applicationID":"spark-3a34f5a12bc54c24b7d5f02957cff30c","UUID":"3bf0a159-89ee-4bdc-ada1-c577ac2097d1","message":"task completed"}]},"rmID":"mycluster"}}, "currentQueueSize": 0} 2020-06-10T05:54:24.565Z DEBUG scheduler/scheduler.go:191 enqueued event {"eventType": "*schedulerevent.SchedulerAllocationUpdatesEvent", "event": {"RejectedAllocations":null,"AcceptedAllocations":null,"NewAsks":null,"ToReleases":{"allocationsToRelease":[{"partitionName":"[mycluster]default","applicationID":"spark-3a34f5a12bc54c24b7d5f02957cff30c","UUID":"3bf0a159-89ee-4bdc-ada1-c577ac2097d1","message":"task completed"}]},"ExistingAllocations":null,"RMId":""}, "currentQueueSize": 0} 2020-06-10T05:54:24.565Z DEBUG cache/cluster_info.go:136 enqueued event {"eventType": "*cacheevent.ReleaseAllocationsEvent", "event": {"AllocationsToRelease":[{"UUID":"3bf0a159-89ee-4bdc-ada1-c577ac2097d1","ApplicationID":"spark-3a34f5a12bc54c24b7d5f02957cff30c","PartitionName":"[mycluster]default","Message":"task completed","ReleaseType":0}]}, "currentQueueSize": 0} 2020-06-10T05:54:24.565Z DEBUG cache/partition_info.go:429 removing allocations {"appID": "spark-3a34f5a12bc54c24b7d5f02957cff30c", "allocationId": "3bf0a159-89ee-4bdc-ada1-c577ac2097d1"} 2020-06-10T05:54:24.566Z INFO cache/partition_info.go:477 allocation removed {"numOfAllocationReleased": 1, "partitionName": "[mycluster]default"} 2020-06-10T05:54:24.566Z DEBUG rmproxy/rmproxy.go:65 enqueue event {"event": {"RmID":"mycluster","ReleasedAllocations":[{"UUID":"3bf0a159-89ee-4bdc-ada1-c577ac2097d1","message":"task completed"}]}, "currentQueueSize": 0} 2020-06-10T05:54:24.566Z DEBUG callback/scheduler_callback.go:44 callback received {"updateResponse": "releasedAllocations:<UUID:\"3bf0a159-89ee-4bdc-ada1-c577ac2097d1\" message:\"task completed\" > "} 2020-06-10T05:54:24.566Z DEBUG callback/scheduler_callback.go:119 callback: response to released allocations {"UUID": "3bf0a159-89ee-4bdc-ada1-c577ac2097d1"} {code} Second release request from shim 0.16 seconds after the first request: {code} 2020-06-10T05:54:40.423Z DEBUG cache/cluster_info.go:136 enqueued event {"eventType": "*cacheevent.RMUpdateRequestEvent", "event": {"Request":{"releases":{"allocationsToRelease":[{"partitionName":"[mycluster]default","applicationID":"spark-3a34f5a12bc54c24b7d5f02957cff30c","UUID":"3bf0a159-89ee-4bdc-ada1-c577ac2097d1","message":"task completed"}]},"rmID":"mycluster"}}, "currentQueueSize": 0} 2020-06-10T05:54:40.423Z DEBUG scheduler/scheduler.go:191 enqueued event {"eventType": "*schedulerevent.SchedulerAllocationUpdatesEvent", "event": {"RejectedAllocations":null,"AcceptedAllocations":null,"NewAsks":null,"ToReleases":{"allocationsToRelease":[{"partitionName":"[mycluster]default","applicationID":"spark-3a34f5a12bc54c24b7d5f02957cff30c","UUID":"3bf0a159-89ee-4bdc-ada1-c577ac2097d1","message":"task completed"}]},"ExistingAllocations":null,"RMId":""}, "currentQueueSize": 0} 2020-06-10T05:54:40.423Z DEBUG cache/cluster_info.go:136 enqueued event {"eventType": "*cacheevent.ReleaseAllocationsEvent", "event": {"AllocationsToRelease":[{"UUID":"3bf0a159-89ee-4bdc-ada1-c577ac2097d1","ApplicationID":"spark-3a34f5a12bc54c24b7d5f02957cff30c","PartitionName":"[mycluster]default","Message":"task completed","ReleaseType":0}]}, "currentQueueSize": 0} 2020-06-10T05:54:40.423Z DEBUG cache/partition_info.go:429 removing allocations {"appID": "spark-3a34f5a12bc54c24b7d5f02957cff30c", "allocationId": "3bf0a159-89ee-4bdc-ada1-c577ac2097d1"} 2020-06-10T05:54:40.423Z DEBUG cache/partition_info.go:442 no active allocations found to release {"appID": "spark-3a34f5a12bc54c24b7d5f02957cff30c"} {code} The core scheduler handles it correctly and just ignores the request but when the number of tasks in the shim grows this could have a big performance impact and we need to find out why it removes it twice. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org