Peter Bacsko created YUNIKORN-2574: -------------------------------------- Summary: totalPartitionResource should not be mutated with AddTo/SubFrom Key: YUNIKORN-2574 URL: https://issues.apache.org/jira/browse/YUNIKORN-2574 Project: Apache YuniKorn Issue Type: Bug Components: core - scheduler Affects Versions: 1.5.0, 1.4.0 Reporter: Peter Bacsko Assignee: Peter Bacsko
There is a potential data race in PartitionContext: the field "totalPartitionResource" is mutated in place. The problem is that the method {{GetTotalPartitionResource()}} does not clone it. {noformat} func (pc *PartitionContext) GetTotalPartitionResource() *resources.Resource { pc.RLock() defer pc.RUnlock() return pc.totalPartitionResource } {noformat} In general, we should prefer the immutable approach for variables like this, just like in {{objects.Queue}}: {noformat} func (sq *Queue) IncAllocatedResource(alloc *resources.Resource, nodeReported bool) error { // check this queue: failure stops checks if the allocation is not part of a node addition newAllocated := resources.Add(sq.allocatedResource, alloc) [ ... removed ... ] sq.Lock() defer sq.Unlock() // all OK update this queue sq.allocatedResource = newAllocated sq.updateAllocatedResourceMetrics() return nil } // incPendingResource increments pending resource of this queue and its parents. func (sq *Queue) incPendingResource(delta *resources.Resource) { // update the parent if sq.parent != nil { sq.parent.incPendingResource(delta) } // update this queue sq.Lock() defer sq.Unlock() sq.pending = resources.Add(sq.pending, delta) sq.updatePendingResourceMetrics() } {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@yunikorn.apache.org For additional commands, e-mail: issues-h...@yunikorn.apache.org