Wilfred Spiegelenburg created YUNIKORN-229:
----------------------------------------------

             Summary: shim sends the same remove request twice for a remove re
                 Key: YUNIKORN-229
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-229
             Project: Apache YuniKorn
          Issue Type: Improvement
          Components: shim - kubernetes
            Reporter: Wilfred Spiegelenburg


In the logs it looks like the shim asks twice to remove the same allocation, 
using the same UUID:

First release request from shim:
{code}
2020-06-10T05:54:24.564Z        DEBUG   cache/cluster_info.go:136       
enqueued event  {"eventType": "*cacheevent.RMUpdateRequestEvent", "event": 
{"Request":{"releases":{"allocationsToRelease":[{"partitionName":"[mycluster]default","applicationID":"spark-3a34f5a12bc54c24b7d5f02957cff30c","UUID":"3bf0a159-89ee-4bdc-ada1-c577ac2097d1","message":"task
 completed"}]},"rmID":"mycluster"}}, "currentQueueSize": 0}
2020-06-10T05:54:24.565Z        DEBUG   scheduler/scheduler.go:191      
enqueued event  {"eventType": 
"*schedulerevent.SchedulerAllocationUpdatesEvent", "event": 
{"RejectedAllocations":null,"AcceptedAllocations":null,"NewAsks":null,"ToReleases":{"allocationsToRelease":[{"partitionName":"[mycluster]default","applicationID":"spark-3a34f5a12bc54c24b7d5f02957cff30c","UUID":"3bf0a159-89ee-4bdc-ada1-c577ac2097d1","message":"task
 completed"}]},"ExistingAllocations":null,"RMId":""}, "currentQueueSize": 0}
2020-06-10T05:54:24.565Z        DEBUG   cache/cluster_info.go:136       
enqueued event  {"eventType": "*cacheevent.ReleaseAllocationsEvent", "event": 
{"AllocationsToRelease":[{"UUID":"3bf0a159-89ee-4bdc-ada1-c577ac2097d1","ApplicationID":"spark-3a34f5a12bc54c24b7d5f02957cff30c","PartitionName":"[mycluster]default","Message":"task
 completed","ReleaseType":0}]}, "currentQueueSize": 0}
2020-06-10T05:54:24.565Z        DEBUG   cache/partition_info.go:429     
removing allocations    {"appID": "spark-3a34f5a12bc54c24b7d5f02957cff30c", 
"allocationId": "3bf0a159-89ee-4bdc-ada1-c577ac2097d1"}
2020-06-10T05:54:24.566Z        INFO    cache/partition_info.go:477     
allocation removed      {"numOfAllocationReleased": 1, "partitionName": 
"[mycluster]default"}
2020-06-10T05:54:24.566Z        DEBUG   rmproxy/rmproxy.go:65   enqueue event   
{"event": 
{"RmID":"mycluster","ReleasedAllocations":[{"UUID":"3bf0a159-89ee-4bdc-ada1-c577ac2097d1","message":"task
 completed"}]}, "currentQueueSize": 0}
2020-06-10T05:54:24.566Z        DEBUG   callback/scheduler_callback.go:44       
callback received       {"updateResponse": 
"releasedAllocations:<UUID:\"3bf0a159-89ee-4bdc-ada1-c577ac2097d1\" 
message:\"task completed\" > "}
2020-06-10T05:54:24.566Z        DEBUG   callback/scheduler_callback.go:119      
callback: response to released allocations      {"UUID": 
"3bf0a159-89ee-4bdc-ada1-c577ac2097d1"}
{code}

Second release request from the shim, roughly 16 seconds after the first request:
{code}
2020-06-10T05:54:40.423Z        DEBUG   cache/cluster_info.go:136       
enqueued event  {"eventType": "*cacheevent.RMUpdateRequestEvent", "event": 
{"Request":{"releases":{"allocationsToRelease":[{"partitionName":"[mycluster]default","applicationID":"spark-3a34f5a12bc54c24b7d5f02957cff30c","UUID":"3bf0a159-89ee-4bdc-ada1-c577ac2097d1","message":"task
 completed"}]},"rmID":"mycluster"}}, "currentQueueSize": 0}
2020-06-10T05:54:40.423Z        DEBUG   scheduler/scheduler.go:191      
enqueued event  {"eventType": 
"*schedulerevent.SchedulerAllocationUpdatesEvent", "event": 
{"RejectedAllocations":null,"AcceptedAllocations":null,"NewAsks":null,"ToReleases":{"allocationsToRelease":[{"partitionName":"[mycluster]default","applicationID":"spark-3a34f5a12bc54c24b7d5f02957cff30c","UUID":"3bf0a159-89ee-4bdc-ada1-c577ac2097d1","message":"task
 completed"}]},"ExistingAllocations":null,"RMId":""}, "currentQueueSize": 0}
2020-06-10T05:54:40.423Z        DEBUG   cache/cluster_info.go:136       
enqueued event  {"eventType": "*cacheevent.ReleaseAllocationsEvent", "event": 
{"AllocationsToRelease":[{"UUID":"3bf0a159-89ee-4bdc-ada1-c577ac2097d1","ApplicationID":"spark-3a34f5a12bc54c24b7d5f02957cff30c","PartitionName":"[mycluster]default","Message":"task
 completed","ReleaseType":0}]}, "currentQueueSize": 0}
2020-06-10T05:54:40.423Z        DEBUG   cache/partition_info.go:429     
removing allocations    {"appID": "spark-3a34f5a12bc54c24b7d5f02957cff30c", 
"allocationId": "3bf0a159-89ee-4bdc-ada1-c577ac2097d1"}
2020-06-10T05:54:40.423Z        DEBUG   cache/partition_info.go:442     no 
active allocations found to release  {"appID": 
"spark-3a34f5a12bc54c24b7d5f02957cff30c"}
{code}

The core scheduler handles this correctly and simply ignores the duplicate 
request. However, as the number of tasks in the shim grows this could have a 
significant performance impact, so we need to find out why the shim removes 
the same allocation twice.
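Until the root cause is found, one possible mitigation on the shim side would be to make the release path idempotent by remembering which allocation UUIDs have already been released. The sketch below is hypothetical (the `releaseTracker` type and `markReleased` method are not part of the current code base) and only illustrates the guard:

```go
package main

import (
	"fmt"
	"sync"
)

// releaseTracker is a hypothetical shim-side guard that records
// allocation UUIDs already sent to the core, so a second release
// request for the same UUID can be suppressed before it is enqueued.
type releaseTracker struct {
	mu       sync.Mutex
	released map[string]bool
}

func newReleaseTracker() *releaseTracker {
	return &releaseTracker{released: make(map[string]bool)}
}

// markReleased returns true the first time a UUID is seen and false
// on any later call, making the release request idempotent.
func (t *releaseTracker) markReleased(uuid string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.released[uuid] {
		return false
	}
	t.released[uuid] = true
	return true
}

func main() {
	tracker := newReleaseTracker()
	uuid := "3bf0a159-89ee-4bdc-ada1-c577ac2097d1"
	fmt.Println(tracker.markReleased(uuid)) // first release: send to core
	fmt.Println(tracker.markReleased(uuid)) // duplicate: suppress
}
```

This only papers over the symptom; the duplicate send itself still needs to be traced in the shim's task lifecycle handling.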


