[ https://issues.apache.org/jira/browse/HDDS-3314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HDDS-3314:
---------------------------------
Labels: pull-request-available (was: )

> scmcli container info command failing intermittently
> -----------------------------------------------------
>
> Key: HDDS-3314
> URL: https://issues.apache.org/jira/browse/HDDS-3314
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: SCM, SCM Client
> Reporter: Nilotpal Nandi
> Assignee: Sadanand Shenoy
> Priority: Major
> Labels: pull-request-available
>
> config set before running the command :
> "ozone.scm.stale.node.interval": "2m",
> "ozone.scm.dead.node.interval": "4m",
> "hdds.scm.replication.thread.interval": "12s",
> "ozone.scm.container.size": "1GB"
>
> steps taken :
> 1) write a key (less than a block size)
> 2) shutdown two container replica datanodes.
> 3) Tried to query container info
> Container info command failed .
>
>
> {noformat}
> ozone scmcli container info 33 | egrep 'Container|Datanodes'
> Failed to execute command cmdType: ReadContainer
> {noformat}
>
> scm log during that time range :
> {noformat}
> 2020-04-01 10:09:29,665 INFO SecurityLogger.org.apache.hadoop.ipc.Server:
> Auth successful for hrt...@root.hwx.site (auth:KERBEROS)
> 2020-04-01 10:09:29,706 INFO
> SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
> Authorization successful for hrt...@root.hwx.site (auth:KERBEROS) for
> protocol=interface
> org.apache.hadoop.hdds.scm.protocol.StorageContainerLocationProtocol
> 2020-04-01 10:09:55,283 INFO SecurityLogger.org.apache.hadoop.ipc.Server:
> Auth successful for
> dn/quasar-fjgcwr-2.quasar-fjgcwr.root.hwx.s...@root.hwx.site (auth:KERBEROS)
> 2020-04-01 10:09:55,287 INFO
> SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
> Authorization successful for
> dn/quasar-fjgcwr-2.quasar-fjgcwr.root.hwx.s...@root.hwx.site (auth:KERBEROS)
> for protocol=interface
> org.apache.hadoop.ozone.protocol.StorageContainerDatanodeProtocol
> 2020-04-01 10:09:55,474 INFO
> org.apache.hadoop.hdds.scm.container.ReplicationManager: Starting Replication
> Monitor Thread.
> 2020-04-01 10:09:55,486 INFO
> org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor
> Thread took 10 milliseconds for processing 33 containers.
> 2020-04-01 10:10:07,488 INFO
> org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor
> Thread took 2 milliseconds for processing 33 containers.
> 2020-04-01 10:10:17,996 INFO SecurityLogger.org.apache.hadoop.ipc.Server:
> Auth successful for
> dn/quasar-fjgcwr-7.quasar-fjgcwr.root.hwx.s...@root.hwx.site (auth:KERBEROS)
> 2020-04-01 10:10:18,001 INFO
> SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
> Authorization successful for
> dn/quasar-fjgcwr-7.quasar-fjgcwr.root.hwx.s...@root.hwx.site (auth:KERBEROS)
> for protocol=interface
> org.apache.hadoop.ozone.protocol.StorageContainerDatanodeProtocol
> 2020-04-01 10:10:19,491 INFO
> org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor
> Thread took 3 milliseconds for processing 33 containers.
> 2020-04-01 10:10:31,494 INFO
> org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor
> Thread took 2 milliseconds for processing 33 containers.
> 2020-04-01 10:10:43,495 INFO
> org.apache.hadoop.hdds.scm.container.ReplicationManager: Replication Monitor
> Thread took 1 milliseconds for processing 33 containers.
> 2020-04-01 10:10:47,987 ERROR
> org.apache.hadoop.hdds.scm.pipeline.PipelineActionHandler: Received pipeline
> action CLOSE for Pipeline[ Id: 763bd379-a703-4dc0-85c5-bf385cdc0b18, Nodes:
> 92f73ec3-9ed8-41c8-9103-c4c1b2b365e1{ip: 172.27.120.0, host:
> quasar-fjgcwr-1.quasar-fjgcwr.root.hwx.site, networkLocation: /default-rack,
> certSerialId: null}b60097cf-7dff-44dc-800f-3500dda636f6{ip: 172.27.123.128,
> host: quasar-fjgcwr-4.quasar-fjgcwr.root.hwx.site, networkLocation:
> /default-rack, certSerialId: null}ea2322d9-8ede-4f48-a72d-693e809d2b95{ip:
> 172.27.12.195, host: quasar-fjgcwr-7.quasar-fjgcwr.root.hwx.site,
> networkLocation: /default-rack, certSerialId: null}, Type:RATIS,
> Factor:THREE, State:OPEN, leaderId:b60097cf-7dff-44dc-800f-3500dda636f6,
> CreationTimestamp2020-04-01T10:04:47.723688Z] from datanode
> ea2322d9-8ede-4f48-a72d-693e809d2b95{ip: 172.27.12.195, host:
> quasar-fjgcwr-7.quasar-fjgcwr.root.hwx.site, networkLocation: /default-rack,
> certSerialId: 12651664310640168}. Reason :
> ea2322d9-8ede-4f48-a72d-693e809d2b95 is in candidate state for 61616ms
> 2020-04-01 10:10:47,988 INFO
> org.apache.hadoop.hdds.scm.pipeline.SCMPipelineManager: Destroying
> pipeline:Pipeline[ Id: 763bd379-a703-4dc0-85c5-bf385cdc0b18, Nodes:
> 92f73ec3-9ed8-41c8-9103-c4c1b2b365e1{ip: 172.27.120.0, host:
> quasar-fjgcwr-1.quasar-fjgcwr.root.hwx.site, networkLocation: /default-rack,
> certSerialId: null}b60097cf-7dff-44dc-800f-3500dda636f6{ip: 172.27.123.128,
> host: quasar-fjgcwr-4.quasar-fjgcwr.root.hwx.site, networkLocation:
> /default-rack, certSerialId: null}ea2322d9-8ede-4f48-a72d-693e809d2b95{ip:
> 172.27.12.195, host: quasar-fjgcwr-7.quasar-fjgcwr.root.hwx.site,
> networkLocation: /default-rack, certSerialId: null}, Type:RATIS,
> Factor:THREE, State:OPEN, leaderId:b60097cf-7dff-44dc-800f-3500dda636f6,
> CreationTimestamp2020-04-01T10:04:47.723688Z]{noformat}
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: ozone-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: ozone-issues-h...@hadoop.apache.org