Wilfred Spiegelenburg created YUNIKORN-936:
----------------------------------------------

             Summary: app and node recovery event ordering
                 Key: YUNIKORN-936
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-936
             Project: Apache YuniKorn
          Issue Type: Improvement
          Components: core - common
            Reporter: Wilfred Spiegelenburg


While working on YUNIKORN-905, a number of unit tests failed due to event 
ordering. Looking at the change, we might have had an issue in the RMProxy for 
a long time.

An update request could contain apps, asks and nodes, and processing followed 
that same order. During recovery the order was, and still is, important. 
However, there was never an ordering requirement on the events sent by a shim, 
nor did the shim use complex update events to support this ordering.

An event to recover a node could be in a separate UpdateRequest from the 
applications that should be recovered. That means we relied on goroutine 
scheduling and event ordering to hopefully do things correctly: i.e. events 
sent by the shim to create new apps would be processed before node recovery 
started. Even in the previous implementation there was no guarantee that all 
the applications were added before a node was recovered. The unit tests in the 
core relied on this processing order to make sure it worked.
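
To illustrate the dependency described above, here is a minimal sketch in Go. 
It is not the YuniKorn code: the types UpdateRequest, NodeRecovery and 
Scheduler, and the Process method, are hypothetical simplifications made up 
for this example. It only shows that when the application and the node arrive 
in separate requests, the result depends on which request is handled first.

```go
package main

import "fmt"

// NodeRecovery is a hypothetical record of a node and the applications
// that already have allocations on it.
type NodeRecovery struct {
	NodeID string
	AppIDs []string
}

// UpdateRequest is a simplified, hypothetical stand-in for a shim update.
type UpdateRequest struct {
	NewApps []string
	Nodes   []NodeRecovery
}

// Scheduler is a toy model of the core state: just the known applications.
type Scheduler struct {
	apps map[string]bool
}

// NewScheduler returns a scheduler with no registered applications.
func NewScheduler() *Scheduler {
	return &Scheduler{apps: make(map[string]bool)}
}

// Process handles applications before nodes within one request and returns
// the application IDs whose allocations could not be restored because the
// application was not yet known.
func (s *Scheduler) Process(req UpdateRequest) []string {
	for _, id := range req.NewApps {
		s.apps[id] = true
	}
	var missing []string
	for _, node := range req.Nodes {
		for _, id := range node.AppIDs {
			if !s.apps[id] {
				missing = append(missing, id)
			}
		}
	}
	return missing
}

func main() {
	appReq := UpdateRequest{NewApps: []string{"app-1"}}
	nodeReq := UpdateRequest{Nodes: []NodeRecovery{
		{NodeID: "node-1", AppIDs: []string{"app-1"}},
	}}

	// Order the unit tests relied on: application first, then node.
	s := NewScheduler()
	s.Process(appReq)
	fmt.Println("apps first, missing:", len(s.Process(nodeReq)))

	// Equally possible with separate requests: node first.
	s = NewScheduler()
	fmt.Println("node first, missing:", len(s.Process(nodeReq)))
}
```

In this toy model the first ordering restores everything, while the second 
leaves app-1 unresolved. Ordering inside a single request does not help once 
the shim splits the data over multiple requests.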

That is not the real-world scenario, and thus a dangerous assumption.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
