[ https://issues.apache.org/jira/browse/SOLR-17049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Houston Putman resolved SOLR-17049.
-----------------------------------
    Fix Version/s: 9.7
                   9.6.1
         Assignee: Houston Putman
       Resolution: Fixed

> Marking replicas down at startup and waiting does not wait
> ----------------------------------------------------------
>
>                 Key: SOLR-17049
>                 URL: https://issues.apache.org/jira/browse/SOLR-17049
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 8.6
>            Reporter: Vincent Primault
>            Assignee: Houston Putman
>            Priority: Major
>             Fix For: 9.7, 9.6.1
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> We observed an unexpected behaviour where a node was taking traffic for a
> replica that was not ready to take it. It seems to happen when the node is
> marked as live and the replica is marked as active, while the corresponding
> core is not loaded yet on the node.
>
> I looked at the code and in theory it should not happen, since the following
> happens in {{ZkController#init}}: mark node as down, wait for replicas to
> be marked as down, and then register the node as live. However, after looking
> at the code of {{publishAndWaitForDownStates}}, I observed that we wait
> for down states for replicas associated with cores as returned by
> {{CoreContainer#getCoreDescriptors}}... which is empty at this point
> since {{ZkController#init}} is called before cores are discovered (which
> happens later in {{CoreContainer#load}}).
>
> It hence seems to me that we basically never wait for any replicas to be
> marked as down, and continue the startup sequence by marking the node as
> live, and hence _might_ take traffic for a short period of time for a replica
> that is not ready (e.g., if the node previously crashed and the replica
> stayed active).
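A small simulation can illustrate the ordering problem in the report. This is a hypothetical Python model, not Solr code. The dict-based cluster state and the names publish_down_and_wait and active_replicas_when_live are assumptions made for illustration. The model shows that waiting over an empty list of core descriptors does nothing. As a result, a crashed node's replica is still marked active when the node goes live. The model also shows one possible alternative: derive the replica list from the cluster state instead, so that the replica is marked down first. That alternative is not necessarily the committed fix.

```python
from typing import Dict, List, Tuple

# Hypothetical cluster state (not Solr's API): replica name -> (node name, state).
ClusterState = Dict[str, Tuple[str, str]]


def publish_down_and_wait(state: ClusterState, replicas: List[str]) -> None:
    """Mark the given replicas down. A real system would also block until
    ZooKeeper reflects the change; in this model the update is immediate."""
    for name in replicas:
        node, _ = state[name]
        state[name] = (node, "down")


def active_replicas_when_live(state: ClusterState, node: str,
                              wait_on_cluster_state: bool) -> List[str]:
    """Simulate the startup ordering and return the replicas on `node` that
    are still 'active' at the moment the node is registered as live."""
    # Cores have not been discovered yet, so this list is empty at init time.
    core_descriptors: List[str] = []
    if wait_on_cluster_state:
        # Hypothetical alternative: ask the cluster state which replicas
        # are assigned to this node.
        to_wait = [r for r, (n, _) in state.items() if n == node]
    else:
        # Behaviour described in the report: only discovered cores count.
        to_wait = core_descriptors
    publish_down_and_wait(state, to_wait)
    # The node is registered as live here; cores are loaded only afterwards.
    return sorted(r for r, (n, s) in state.items() if n == node and s == "active")


if __name__ == "__main__":
    # node1 crashed while its replica was active.
    before_crash = {"c1_r1": ("node1", "active"), "c1_r2": ("node2", "active")}
    print(active_replicas_when_live(dict(before_crash), "node1", False))  # ['c1_r1']
    print(active_replicas_when_live(dict(before_crash), "node1", True))   # []
```

In the first call, the wait loop iterates over nothing. The replica on node1 therefore stays active when the node goes live, which matches the window the reporter describes.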