Weiwei Yang created YUNIKORN-1218:
-------------------------------------

             Summary: Scheduler crashed with concurrent map access error in health checker
                 Key: YUNIKORN-1218
                 URL: https://issues.apache.org/jira/browse/YUNIKORN-1218
             Project: Apache YuniKorn
          Issue Type: Bug
          Components: core - scheduler
            Reporter: Weiwei Yang
            Assignee: Weiwei Yang

After YUNIKORN-1107, the health checker runs as a background thread at a 30s interval. We observed a few scheduler restarts in the past week that seem to be caused by this thread, because it accesses the partition context without holding the proper read lock. I have uploaded a patch to reproduce this locally, and a file with the stack trace from the crash.