[ https://issues.apache.org/jira/browse/MESOS-9172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16598998#comment-16598998 ]
James Peach edited comment on MESOS-9172 at 8/31/18 4:46 PM:
-------------------------------------------------------------

| [r/68587|https://reviews.apache.org/r/68587] | Fixed fetcher deadlock with duplicate URIs. |
| [r/68586|https://reviews.apache.org/r/68586] | Add the output file to the hash on CommandInfo::URI. |

> Fetcher deadlock with duplicated URIs.
> --------------------------------------
>
>                 Key: MESOS-9172
>                 URL: https://issues.apache.org/jira/browse/MESOS-9172
>             Project: Mesos
>          Issue Type: Bug
>          Components: fetcher
>            Reporter: James Peach
>            Assignee: James Peach
>            Priority: Major
>
> If the fetcher cache is empty and you launch a task that contains duplicate
> URIs, the fetcher deadlocks waiting for the futures in
> {{FetcherProcess::_fetch}}.
>
> What happens is that when the fetcher sets up the initial batch of cache
> lookup futures in {{FetcherProcess::fetch}}, the duplicate URIs cause cache
> hits on the placeholder cache entries. This code assumes that there is
> already an operation in flight that will populate the cache entry. However,
> the cache is actually empty: the placeholder entry was created by the earlier
> occurrence of the same URI in the task's own URI list.
>
> When we await the futures in {{FetcherProcess::_fetch}}, we end up waiting
> for the future that signals that the cache entry has been populated, but that
> will never happen, because the entry can only be populated once the current
> fetch batch makes progress. At this point we are deadlocked.
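The wait cycle described in the issue can be reproduced with a small model. The sketch below is a hypothetical Python/asyncio simulation, not Mesos code: {{FetcherCache}}, {{fetch_batch}}, the {{fixed}} flag and the timeout are all assumptions made for illustration. A batch creates a placeholder future for each cache miss, awaits the futures for every cache hit (mirroring what {{FetcherProcess::_fetch}} does), and only then resolves its own placeholders. With {{fixed=True}} it skips hits on placeholders created by the same batch; this is one possible way to break the cycle and is not necessarily the approach taken in r/68587.

```python
import asyncio


# Hypothetical model of the fetcher cache; names are illustrative, not the Mesos API.
class FetcherCache:
    def __init__(self):
        # uri -> (id of the batch that created the entry, future resolved once downloaded)
        self.entries = {}


async def fetch_batch(cache, batch_id, uris, fixed, timeout=0.1):
    """Plan and run one fetch batch; returns "ok" or "deadlock"."""
    loop = asyncio.get_running_loop()
    waits = []      # futures awaited before this batch makes progress (like _fetch)
    downloads = []  # placeholder futures that only this batch can resolve
    for uri in uris:
        entry = cache.entries.get(uri)
        if entry is None:
            # Cache miss: create a placeholder entry and schedule the download.
            placeholder = loop.create_future()
            cache.entries[uri] = (batch_id, placeholder)
            downloads.append((uri, placeholder))
            continue
        owner, future = entry
        if fixed and owner == batch_id:
            # Placeholder created earlier in this same batch: the download is
            # already scheduled here, so waiting on it would mean waiting on ourselves.
            continue
        # Cache hit: assume some in-flight operation will populate the entry.
        waits.append(future)
    try:
        # A real deadlock hangs forever; the timeout only makes it observable.
        await asyncio.wait_for(asyncio.gather(*waits), timeout)
    except asyncio.TimeoutError:
        return "deadlock"
    for uri, placeholder in downloads:
        placeholder.set_result(uri)  # pretend the download succeeded
    return "ok"


async def demo():
    uris = ["https://example.com/app.tgz", "https://example.com/app.tgz"]
    naive = await fetch_batch(FetcherCache(), 1, uris, fixed=False)
    fixed = await fetch_batch(FetcherCache(), 1, uris, fixed=True)
    return naive, fixed


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ('deadlock', 'ok')
```

Skipping the self-owned placeholder is safe in this model because the download for that URI is already part of the same batch; hits on placeholders owned by a different batch still wait exactly as before.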