[ 
https://issues.apache.org/jira/browse/SOLR-17049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Houston Putman resolved SOLR-17049.
-----------------------------------
    Fix Version/s: 9.7
                   9.6.1
         Assignee: Houston Putman
       Resolution: Fixed

> Marking replicas down at startup and waiting does not wait
> ----------------------------------------------------------
>
>                 Key: SOLR-17049
>                 URL: https://issues.apache.org/jira/browse/SOLR-17049
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 8.6
>            Reporter: Vincent Primault
>            Assignee: Houston Putman
>            Priority: Major
>             Fix For: 9.7, 9.6.1
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> We observed unexpected behaviour where a node was taking traffic for a 
> replica that was not ready to take it. It seems to happen when the node is 
> marked as live and the replica is marked as active, while the corresponding 
> core is not yet loaded on the node.
>  
> I looked at the code and in theory it should not happen, since the following 
> happens in {{ZkController#init}}: mark node as down, wait for replicas to 
> be marked as down, and then register the node as live. However, after looking 
> at the code of {{publishAndWaitForDownStates}}, I observed that we wait 
> for down states for replicas associated with cores as returned by 
> {{CoreContainer#getCoreDescriptors}}... which is empty at this point 
> since {{ZkController#init}} is called before cores are discovered (which 
> happens later in {{CoreContainer#load}}).
>  
> It hence seems to me that we basically never wait for any replicas to be 
> marked as down, and instead continue the startup sequence by marking the 
> node as live. The node hence _might_ take traffic for a short period of time 
> for a replica that is not ready (e.g., if the node previously crashed and the 
> replica stayed active).
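
The ordering problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class and method names below (StartupOrderSketch, ToyCoreContainer, replicasWaitedOn) are hypothetical and are not Solr's API, and the "wait" is reduced to counting the replicas it would cover. It shows that a wait driven by the list of known cores covers nothing if it runs before core discovery.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: every name here is hypothetical and NOT part of Solr's API.
final class StartupOrderSketch {

    /** Stands in for a container that only learns about cores during load(). */
    static final class ToyCoreContainer {
        private final List<String> coreDescriptors = new ArrayList<>();

        // Core discovery: before this runs, no cores are known.
        void load() {
            coreDescriptors.add("collection1_shard1_replica_n1");
            coreDescriptors.add("collection1_shard2_replica_n2");
        }

        List<String> getCoreDescriptors() {
            return new ArrayList<>(coreDescriptors);
        }
    }

    /**
     * Models the "publish down and wait" step. It covers only replicas whose
     * cores are known at call time and returns how many that is. A real
     * implementation would block on cluster state instead of counting.
     */
    static int publishAndWaitForDownStates(ToyCoreContainer container) {
        return container.getCoreDescriptors().size();
    }

    /** Runs the toy startup and returns how many replicas the wait covered. */
    static int replicasWaitedOn(boolean loadCoresFirst) {
        ToyCoreContainer container = new ToyCoreContainer();
        if (loadCoresFirst) {
            container.load();
        }
        int waited = publishAndWaitForDownStates(container);
        if (!loadCoresFirst) {
            container.load(); // discovery happens too late to affect the wait
        }
        return waited;
    }

    public static void main(String[] args) {
        System.out.println("wait before discovery covers: " + replicasWaitedOn(false));
        System.out.println("wait after discovery covers: " + replicasWaitedOn(true));
    }
}
```

In this model the wait covers 0 replicas when it runs before discovery and 2 when it runs after. The sketch says nothing about how the issue was actually fixed; it only restates the reported ordering.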



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
