Istvan Fajth created HDDS-2695:
----------------------------------

             Summary: SCM is not able to start under certain conditions
                 Key: HDDS-2695
                 URL: https://issues.apache.org/jira/browse/HDDS-2695
             Project: Hadoop Distributed Data Store
          Issue Type: Bug
          Components: SCM
            Reporter: Istvan Fajth


Given
- a cluster where RATIS-677 happened, and DataNodes are already failing to 
start properly due to the issue
When
- I restart the cluster and start to see the exceptions as described in 
RATIS-677
- I stop the 3 DNs that have the failing pipeline
- I remove the Ratis metadata for the pipeline
- I close the pipeline with scmcli
- I restart the 3 DNs
Then
- SCM is unable to come out of safe mode; the log shows the following possible 
reason:
{code}
2019-12-09 01:13:38,437 INFO org.apache.hadoop.hdds.scm.safemode.SCMSafeModeManager: SCM in safe mode. Pipelines with at least one datanode reported count is 0, required at least one datanode reported per pipeline count is 4
{code}
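
To illustrate why the counts in this log line would keep SCM in safe mode, here is a minimal sketch of a threshold-style exit rule. It assumes the rule simply compares the number of distinct pipelines reported by at least one datanode against a required count. The class and method names (SafeModeRuleSketch, onPipelineReport, canExitSafeMode) are hypothetical and are not taken from the SCMSafeModeManager code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of a threshold-based safe mode exit rule.
// This is NOT the actual Ozone implementation; names are illustrative only.
class SafeModeRuleSketch {
    // Number of pipelines that must be reported by at least one datanode.
    private final int requiredPipelines;
    // Distinct pipeline IDs reported so far; duplicate reports count once.
    private final Set<String> reportedPipelines = new HashSet<>();

    SafeModeRuleSketch(int requiredPipelines) {
        this.requiredPipelines = requiredPipelines;
    }

    // Record that some datanode reported the given pipeline.
    void onPipelineReport(String pipelineId) {
        reportedPipelines.add(pipelineId);
    }

    // The rule is satisfied only once enough distinct pipelines were reported.
    boolean canExitSafeMode() {
        return reportedPipelines.size() >= requiredPipelines;
    }

    public static void main(String[] args) {
        // Mirrors the log line: 0 pipelines reported, 4 required.
        SafeModeRuleSketch rule = new SafeModeRuleSketch(4);
        System.out.println("can exit safe mode: " + rule.canExitSafeMode());
    }
}
```

Under this assumption, a required count of 4 with a reported count that stays at 0 never satisfies the rule, which would match the behaviour described above.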

If I then restart the SCM, it fails without logging any exception, and 
the last message on standard error is the following:
{code}
PipelineID=<id_of_pipeline_that_has_been_closed> not found
{code}

Also, scmcli did not list the closed pipeline when I checked the active 
pipelines after closing it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

