Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]
sumitagrawl merged PR #9611: URL: https://github.com/apache/ozone/pull/9611 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]
sumitagrawl commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2821554658
##
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/SafeModeCheckSubcommand.java:
##
@@ -32,25 +44,203 @@
description = "Check if SCM is in safe mode",
mixinStandardHelpOptions = true,
versionProvider = HddsVersionProvider.class)
-public class SafeModeCheckSubcommand extends ScmSubcommand {
+public class SafeModeCheckSubcommand extends AbstractSubcommand implements
Callable {
+ @CommandLine.Mixin
+ private ScmOption scmOption;
+
+ @CommandLine.Option(names = {"--all", "-a"},
+ description = "Show safe mode status for all SCM nodes in the service. "
+
+ "When multiple SCM service IDs are configured, --service-id must be
specified.")
+ private boolean allNodes;
+
+ private String serviceId;
+ private List nodes;
@Override
- public void execute(ScmClient scmClient) throws IOException {
-boolean execReturn = scmClient.inSafeMode();
+ public Void call() throws Exception {
+OzoneConfiguration conf = getOzoneConf();
+serviceId = HddsUtils.getScmServiceId(conf);
+String scmAddress = scmOption.getScm();
+if (serviceId != null) {
+ nodes = SCMNodeInfo.buildNodeInfo(conf);
+}
+
+ScmNodeTarget targetScmNode = new ScmNodeTarget();
+try (ScmClient scmClient = scmOption.createScmClient(conf, targetScmNode))
{
Review Comment:
simplify the code, all node: iterate all, query, scm: get matching node,
query, default: get leader or only one if empty, query
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]
sreejasahithi commented on PR #9611: URL: https://github.com/apache/ozone/pull/9611#issuecomment-3882129655 @sumitagrawl could you please review this patch. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]
sreejasahithi commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2720755916
##
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/SafeModeCheckSubcommand.java:
##
@@ -45,12 +71,106 @@ public void execute(ScmClient scmClient) throws
IOException {
System.out.println("SCM is out of safe mode.");
}
if (isVerbose()) {
- for (Map.Entry> entry :
- scmClient.getSafeModeRuleStatuses().entrySet()) {
-Pair value = entry.getValue();
-System.out.printf("validated:%s, %s, %s%n",
-value.getLeft(), entry.getKey(), value.getRight());
+ printSafeModeRules(scmClient.getSafeModeRuleStatuses());
+}
+ }
+
+ private void executeForSpecificNodeInHA(ScmClient scmClient, String
serviceId) throws IOException {
+String scmAddress = getScmOption().getScm();
+
+System.out.println("Service ID: " + serviceId);
+
+final OzoneConfiguration conf = getOzoneConf();
+
+List nodes = SCMNodeInfo.buildNodeInfo(conf);
+
+// Find the node matching the --scm address
+List matchedNodes = nodes.stream()
+.filter(node -> matchesAddress(node, scmAddress))
+.collect(Collectors.toList());
+
+if (matchedNodes.isEmpty()) {
+ throw new IOException("Specified --scm address " + scmAddress +
+ " does not match any node in service " + serviceId +
+ ". Available nodes: " + nodes.stream()
+ .map(n -> n.getScmClientAddress() + " [" + n.getNodeId() + "]")
+ .collect(Collectors.joining(", ")));
+}
+
+queryNode(scmClient, matchedNodes.get(0));
+ }
+
+ private void executeForAllNodes(ScmClient scmClient) throws IOException {
+final OzoneConfiguration conf = getOzoneConf();
+String serviceId = HddsUtils.getScmServiceId(conf);
+
+if (serviceId == null) {
+ executeForSingleNode(scmClient);
+ return;
+}
+
+System.out.println("Service ID: " + serviceId);
+List nodes = SCMNodeInfo.buildNodeInfo(conf);
+
+for (SCMNodeInfo node : nodes) {
+ queryNode(scmClient, node);
+}
+ }
+
+ private void queryNode(ScmClient scmClient, SCMNodeInfo node) {
+String nodeId = node.getNodeId();
+
+try {
+ boolean inSafeMode = scmClient.inSafeModeForNode(nodeId);
+
+ System.out.printf("%s [%s]: %s%n",
+ node.getScmClientAddress(),
+ nodeId,
+ inSafeMode ? "IN SAFE MODE" : "OUT OF SAFE MODE");
+
+ if (isVerbose()) {
+Map> rules =
scmClient.getSafeModeRuleStatusesForNode(nodeId);
+if (rules != null && !rules.isEmpty()) {
+ printSafeModeRules(rules);
+}
}
+} catch (Exception e) {
+ System.out.printf("%s [%s]: ERROR: Failed to get safe mode status -
%s%n",
+ node.getScmClientAddress(), nodeId, e.getMessage());
+}
+ }
+
+ /**
+ * Check if the given SCMNodeInfo matches the target address.
+ * Tries to match by direct string comparison and by resolved address.
+ */
+ private boolean matchesAddress(SCMNodeInfo node, String targetAddress) {
+String nodeAddress = node.getScmClientAddress();
+
+// Direct match
+if (nodeAddress.equals(targetAddress)) {
+ return true;
+}
+
+// Try normalizing both addresses and comparing
+try {
+ InetSocketAddress target = NetUtils.createSocketAddr(targetAddress);
+ InetSocketAddress nodeAddr = NetUtils.createSocketAddr(nodeAddress);
+
+ // Match by resolved IP and port
+ return target.getPort() == nodeAddr.getPort() &&
+ target.getAddress().equals(nodeAddr.getAddress());
+} catch (Exception e) {
+ // If address resolution fails, no match
+ return false;
Review Comment:
I have removed the logging here because it creates unwanted noise in the CLI
output. Instead, I have ensured that actual errors are properly surfaced when
no leader can be determined or when the node specified in --scm option doesn't
match, clear error messages are thrown to the user.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]
aryangupta1998 commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2711408445
##
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java:
##
@@ -1033,6 +1033,42 @@ public Map>
getSafeModeRuleStatuses()
}
}
+ @Override
+ public boolean inSafeModeForNode(String nodeId) throws IOException {
+Map auditMap = Maps.newHashMap();
+auditMap.put("nodeId", nodeId);
+try {
+ boolean result = inSafeMode();
Review Comment:
```suggestion
boolean result = scm.isInSafeMode();
```
It’s better to use `scm.isInSafeMode()` instead of `inSafeMode()`, since
inSafeMode() is a generic API and may evolve in the future (for example, to
include leader-level logic). Using scm.isInSafeMode() makes it explicit that we
are returning the local SCM node’s safe mode state.
##
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/SafeModeCheckSubcommand.java:
##
@@ -33,24 +42,181 @@
mixinStandardHelpOptions = true,
versionProvider = HddsVersionProvider.class)
public class SafeModeCheckSubcommand extends ScmSubcommand {
+ @CommandLine.Option(names = {"--all", "-a"},
+ description = "Show safe mode status for all SCM nodes in the service. "
+
+ "When multiple SCM service IDs are configured, --service-id must be
specified.")
+ private boolean allNodes;
+
+ private String serviceId;
+ private List nodes;
@Override
public void execute(ScmClient scmClient) throws IOException {
-boolean execReturn = scmClient.inSafeMode();
+OzoneConfiguration conf = getOzoneConf();
+serviceId = HddsUtils.getScmServiceId(conf);
+String scmAddress = getScmOption().getScm();
+if (serviceId != null) {
+ nodes = SCMNodeInfo.buildNodeInfo(conf);
+}
+
+if (allNodes) {
+ executeForAllNodes(scmClient);
+} else if (StringUtils.isNotEmpty(scmAddress)) {
+ executeForSpecificNode(scmClient, scmAddress);
+} else {
+ executeForSingleNode(scmClient);
+}
+ }
+
+ private void executeForSingleNode(ScmClient scmClient) throws IOException {
+boolean inSafeMode;
+Map> rules = null;
+String leaderNodeId;
-// Output data list
-if (execReturn) {
+// If SCM HA mode, query the leader node.
+if (serviceId != null) {
+ leaderNodeId = findLeaderNodeId(scmClient);
Review Comment:
```suggestion
leaderNodeId = findLeaderNodeId(scmClient);
if (leaderNodeId == null) {
throw new IOException("Unable to determine SCM leader for the service");
}
```
##
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocol.java:
##
@@ -73,6 +73,14 @@ public interface StorageContainerLocationProtocol extends
Closeable {
Type.StopReplicationManager,
Type.ForceExitSafeMode));
+ /**
+ * Read-only commands that can execute on followers without leader check.
+ * These commands respect the --scm parameter and query the specified SCM.
+ */
+ Set FOLLOWER_READABLE_COMMAND_TYPE =
Collections.unmodifiableSet(EnumSet.of(
Review Comment:
nit: FOLLOWER_READABLE_COMMAND_TYPES
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]
sreejasahithi commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2697944257
##
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/SafeModeCheckSubcommand.java:
##
@@ -33,9 +42,26 @@
mixinStandardHelpOptions = true,
versionProvider = HddsVersionProvider.class)
public class SafeModeCheckSubcommand extends ScmSubcommand {
+ @CommandLine.Option(names = {"--all", "-a"},
+ description = "Show safe mode status for all SCM nodes in the service. "
+
+ "When multiple SCM service IDs are configured, --service-id must be
specified.")
+ private boolean allNodes;
@Override
public void execute(ScmClient scmClient) throws IOException {
+final OzoneConfiguration conf = getOzoneConf();
+String serviceId = HddsUtils.getScmServiceId(conf);
+
+if (allNodes) {
+ executeForAllNodes(scmClient);
+} else if (StringUtils.isNotEmpty(getScmOption().getScm()) && serviceId !=
null) {
+ executeForSpecificNodeInHA(scmClient, serviceId);
+} else {
+ executeForSingleNode(scmClient);
Review Comment:
Thanks @ashishkumar50 for finding this bug, you are right now that we are
allowing follower to also accept status command there can be a possibility
where when we run safemode status command with no additional option it can
return the status of the follower.
I have fixed this issue.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]
sreejasahithi commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2697944257
##
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/SafeModeCheckSubcommand.java:
##
@@ -33,9 +42,26 @@
mixinStandardHelpOptions = true,
versionProvider = HddsVersionProvider.class)
public class SafeModeCheckSubcommand extends ScmSubcommand {
+ @CommandLine.Option(names = {"--all", "-a"},
+ description = "Show safe mode status for all SCM nodes in the service. "
+
+ "When multiple SCM service IDs are configured, --service-id must be
specified.")
+ private boolean allNodes;
@Override
public void execute(ScmClient scmClient) throws IOException {
+final OzoneConfiguration conf = getOzoneConf();
+String serviceId = HddsUtils.getScmServiceId(conf);
+
+if (allNodes) {
+ executeForAllNodes(scmClient);
+} else if (StringUtils.isNotEmpty(getScmOption().getScm()) && serviceId !=
null) {
+ executeForSpecificNodeInHA(scmClient, serviceId);
+} else {
+ executeForSingleNode(scmClient);
Review Comment:
Thanks @ashishkumar50 for finding this bug, now that we are allowing
follower to also accept status command there can be a possibility where when we
run safemode status command with no additional option it can return the status
of the follower.
I have fixed this issue.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]
sadanand48 commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2689399814
##
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/SafeModeCheckSubcommand.java:
##
@@ -45,12 +71,106 @@ public void execute(ScmClient scmClient) throws
IOException {
System.out.println("SCM is out of safe mode.");
}
if (isVerbose()) {
- for (Map.Entry> entry :
- scmClient.getSafeModeRuleStatuses().entrySet()) {
-Pair value = entry.getValue();
-System.out.printf("validated:%s, %s, %s%n",
-value.getLeft(), entry.getKey(), value.getRight());
+ printSafeModeRules(scmClient.getSafeModeRuleStatuses());
+}
+ }
+
+ private void executeForSpecificNodeInHA(ScmClient scmClient, String
serviceId) throws IOException {
+String scmAddress = getScmOption().getScm();
+
+System.out.println("Service ID: " + serviceId);
+
+final OzoneConfiguration conf = getOzoneConf();
+
+List nodes = SCMNodeInfo.buildNodeInfo(conf);
+
+// Find the node matching the --scm address
+List matchedNodes = nodes.stream()
+.filter(node -> matchesAddress(node, scmAddress))
+.collect(Collectors.toList());
+
+if (matchedNodes.isEmpty()) {
+ throw new IOException("Specified --scm address " + scmAddress +
+ " does not match any node in service " + serviceId +
+ ". Available nodes: " + nodes.stream()
+ .map(n -> n.getScmClientAddress() + " [" + n.getNodeId() + "]")
+ .collect(Collectors.joining(", ")));
+}
+
+queryNode(scmClient, matchedNodes.get(0));
+ }
+
+ private void executeForAllNodes(ScmClient scmClient) throws IOException {
+final OzoneConfiguration conf = getOzoneConf();
+String serviceId = HddsUtils.getScmServiceId(conf);
+
+if (serviceId == null) {
+ executeForSingleNode(scmClient);
+ return;
+}
+
+System.out.println("Service ID: " + serviceId);
+List nodes = SCMNodeInfo.buildNodeInfo(conf);
+
+for (SCMNodeInfo node : nodes) {
+ queryNode(scmClient, node);
+}
+ }
+
+ private void queryNode(ScmClient scmClient, SCMNodeInfo node) {
+String nodeId = node.getNodeId();
+
+try {
+ boolean inSafeMode = scmClient.inSafeModeForNode(nodeId);
+
+ System.out.printf("%s [%s]: %s%n",
+ node.getScmClientAddress(),
+ nodeId,
+ inSafeMode ? "IN SAFE MODE" : "OUT OF SAFE MODE");
+
+ if (isVerbose()) {
+Map> rules =
scmClient.getSafeModeRuleStatusesForNode(nodeId);
+if (rules != null && !rules.isEmpty()) {
+ printSafeModeRules(rules);
+}
}
+} catch (Exception e) {
+ System.out.printf("%s [%s]: ERROR: Failed to get safe mode status -
%s%n",
+ node.getScmClientAddress(), nodeId, e.getMessage());
+}
+ }
+
+ /**
+ * Check if the given SCMNodeInfo matches the target address.
+ * Tries to match by direct string comparison and by resolved address.
+ */
+ private boolean matchesAddress(SCMNodeInfo node, String targetAddress) {
+String nodeAddress = node.getScmClientAddress();
+
+// Direct match
+if (nodeAddress.equals(targetAddress)) {
+ return true;
+}
+
+// Try normalizing both addresses and comparing
+try {
+ InetSocketAddress target = NetUtils.createSocketAddr(targetAddress);
+ InetSocketAddress nodeAddr = NetUtils.createSocketAddr(nodeAddress);
+
+ // Match by resolved IP and port
+ return target.getPort() == nodeAddr.getPort() &&
+ target.getAddress().equals(nodeAddr.getAddress());
+} catch (Exception e) {
+ // If address resolution fails, no match
+ return false;
Review Comment:
nit : Log the exception here
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]
sadanand48 commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2689399814
##
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/SafeModeCheckSubcommand.java:
##
@@ -45,12 +71,106 @@ public void execute(ScmClient scmClient) throws
IOException {
System.out.println("SCM is out of safe mode.");
}
if (isVerbose()) {
- for (Map.Entry> entry :
- scmClient.getSafeModeRuleStatuses().entrySet()) {
-Pair value = entry.getValue();
-System.out.printf("validated:%s, %s, %s%n",
-value.getLeft(), entry.getKey(), value.getRight());
+ printSafeModeRules(scmClient.getSafeModeRuleStatuses());
+}
+ }
+
+ private void executeForSpecificNodeInHA(ScmClient scmClient, String
serviceId) throws IOException {
+String scmAddress = getScmOption().getScm();
+
+System.out.println("Service ID: " + serviceId);
+
+final OzoneConfiguration conf = getOzoneConf();
+
+List nodes = SCMNodeInfo.buildNodeInfo(conf);
+
+// Find the node matching the --scm address
+List matchedNodes = nodes.stream()
+.filter(node -> matchesAddress(node, scmAddress))
+.collect(Collectors.toList());
+
+if (matchedNodes.isEmpty()) {
+ throw new IOException("Specified --scm address " + scmAddress +
+ " does not match any node in service " + serviceId +
+ ". Available nodes: " + nodes.stream()
+ .map(n -> n.getScmClientAddress() + " [" + n.getNodeId() + "]")
+ .collect(Collectors.joining(", ")));
+}
+
+queryNode(scmClient, matchedNodes.get(0));
+ }
+
+ private void executeForAllNodes(ScmClient scmClient) throws IOException {
+final OzoneConfiguration conf = getOzoneConf();
+String serviceId = HddsUtils.getScmServiceId(conf);
+
+if (serviceId == null) {
+ executeForSingleNode(scmClient);
+ return;
+}
+
+System.out.println("Service ID: " + serviceId);
+List nodes = SCMNodeInfo.buildNodeInfo(conf);
+
+for (SCMNodeInfo node : nodes) {
+ queryNode(scmClient, node);
+}
+ }
+
+ private void queryNode(ScmClient scmClient, SCMNodeInfo node) {
+String nodeId = node.getNodeId();
+
+try {
+ boolean inSafeMode = scmClient.inSafeModeForNode(nodeId);
+
+ System.out.printf("%s [%s]: %s%n",
+ node.getScmClientAddress(),
+ nodeId,
+ inSafeMode ? "IN SAFE MODE" : "OUT OF SAFE MODE");
+
+ if (isVerbose()) {
+Map> rules =
scmClient.getSafeModeRuleStatusesForNode(nodeId);
+if (rules != null && !rules.isEmpty()) {
+ printSafeModeRules(rules);
+}
}
+} catch (Exception e) {
+ System.out.printf("%s [%s]: ERROR: Failed to get safe mode status -
%s%n",
+ node.getScmClientAddress(), nodeId, e.getMessage());
+}
+ }
+
+ /**
+ * Check if the given SCMNodeInfo matches the target address.
+ * Tries to match by direct string comparison and by resolved address.
+ */
+ private boolean matchesAddress(SCMNodeInfo node, String targetAddress) {
+String nodeAddress = node.getScmClientAddress();
+
+// Direct match
+if (nodeAddress.equals(targetAddress)) {
+ return true;
+}
+
+// Try normalizing both addresses and comparing
+try {
+ InetSocketAddress target = NetUtils.createSocketAddr(targetAddress);
+ InetSocketAddress nodeAddr = NetUtils.createSocketAddr(nodeAddress);
+
+ // Match by resolved IP and port
+ return target.getPort() == nodeAddr.getPort() &&
+ target.getAddress().equals(nodeAddr.getAddress());
+} catch (Exception e) {
+ // If address resolution fails, no match
+ return false;
Review Comment:
nit : Log the exception here before returning false
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]
priyeshkaratha commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2688872727
##
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java:
##
@@ -1032,6 +1033,24 @@ public Map>
getSafeModeRuleStatuses()
}
}
+ @Override
+ public boolean inSafeModeForNode(String nodeId) throws IOException {
+boolean result = inSafeMode();
+AUDIT.logReadSuccess(
+buildAuditMessageForSuccess(SCMAction.IN_SAFE_MODE, null)
+);
+return result;
+ }
+
+ @Override
+ public Map>
getSafeModeRuleStatusesForNode(String nodeId) throws IOException {
+Map> result = getSafeModeRuleStatuses();
+AUDIT.logReadSuccess(
Review Comment:
Please use logReadSuccess with auditMap with nodeId added and also add
auditlog on failure since its called for nodeId
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]
ashishkumar50 commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2685945921
##
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/SafeModeCheckSubcommand.java:
##
@@ -45,12 +71,106 @@ public void execute(ScmClient scmClient) throws
IOException {
System.out.println("SCM is out of safe mode.");
}
if (isVerbose()) {
- for (Map.Entry> entry :
- scmClient.getSafeModeRuleStatuses().entrySet()) {
-Pair value = entry.getValue();
-System.out.printf("validated:%s, %s, %s%n",
-value.getLeft(), entry.getKey(), value.getRight());
+ printSafeModeRules(scmClient.getSafeModeRuleStatuses());
+}
+ }
+
+ private void executeForSpecificNodeInHA(ScmClient scmClient, String
serviceId) throws IOException {
+String scmAddress = getScmOption().getScm();
Review Comment:
scmAddress is not mandatory option.
##
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/SafeModeCheckSubcommand.java:
##
@@ -33,9 +42,26 @@
mixinStandardHelpOptions = true,
versionProvider = HddsVersionProvider.class)
public class SafeModeCheckSubcommand extends ScmSubcommand {
+ @CommandLine.Option(names = {"--all", "-a"},
+ description = "Show safe mode status for all SCM nodes in the service. "
+
+ "When multiple SCM service IDs are configured, --service-id must be
specified.")
+ private boolean allNodes;
@Override
public void execute(ScmClient scmClient) throws IOException {
+final OzoneConfiguration conf = getOzoneConf();
+String serviceId = HddsUtils.getScmServiceId(conf);
+
+if (allNodes) {
+ executeForAllNodes(scmClient);
+} else if (StringUtils.isNotEmpty(getScmOption().getScm()) && serviceId !=
null) {
+ executeForSpecificNodeInHA(scmClient, serviceId);
+} else {
+ executeForSingleNode(scmClient);
Review Comment:
In normal or existing behaviour we need safemode status from leader node
most of the time. When no scm address is passed, whether we are getting safe
mode status from leader node or not? Because now follower also can accept
safemode and can return the status.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]
octachoron commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2684404784
##
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java:
##
@@ -1033,6 +1033,24 @@ public Map>
getSafeModeRuleStatuses()
}
}
+ @Override
+ public boolean inSafeModeForNode(String nodeId) throws IOException {
+boolean result = inSafeMode();
+AUDIT.logReadSuccess(
+buildAuditMessageForSuccess(SCMAction.IN_SAFE_MODE, null)
+);
+return result;
+ }
+
+ @Override
+ public Map>
getSafeModeRuleStatusesForNode(String nodeId) throws IOException {
+Map> result = getSafeModeRuleStatuses();
+AUDIT.logReadSuccess(
+buildAuditMessageForSuccess(SCMAction.GET_SAFE_MODE_RULE_STATUSES,
null)
+);
+return result;
+ }
Review Comment:
Great, this makes it clear. Thank you for the confirmation and walkthrough.
This sounds like the right thing to do to me as well.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]
octachoron commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2684390281
##
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocolServerSideTranslatorPB.java:
##
@@ -212,7 +213,8 @@ public ScmContainerLocationResponse
submitRequest(RpcController controller,
ScmContainerLocationRequest request) throws ServiceException {
// not leader or not belong to admin command.
Review Comment:
Thank you for the update, looks good to me.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]
octachoron commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2684386988
##
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java:
##
@@ -870,6 +874,50 @@ public boolean forceExitSafeMode() throws IOException {
}
+ @Override
+ public boolean inSafeModeForNode(String nodeId) throws IOException {
+InSafeModeRequestProto request =
InSafeModeRequestProto.getDefaultInstance();
+
+try {
+ StorageContainerLocationProtocolPB proxy = fpp.getProxyForNode(nodeId);
+ ScmContainerLocationRequest wrapper =
ScmContainerLocationRequest.newBuilder()
+ .setCmdType(Type.InSafeMode)
+ .setVersion(ClientVersion.CURRENT_VERSION)
+ .setTraceID(TracingUtil.exportCurrentSpan())
+ .setInSafeModeRequest(request)
+ .build();
+ ScmContainerLocationResponse response =
proxy.submitRequest(NULL_RPC_CONTROLLER, wrapper);
+ return response.getInSafeModeResponse().getInSafeMode();
+} catch (Exception e) {
+ throw new IOException("Failed to get safe mode status from SCM node " +
nodeId, e);
+}
+ }
+
+ @Override
+ public Map>
getSafeModeRuleStatusesForNode(String nodeId) throws IOException {
+GetSafeModeRuleStatusesRequestProto request =
GetSafeModeRuleStatusesRequestProto.getDefaultInstance();
+
+try {
+ StorageContainerLocationProtocolPB proxy = fpp.getProxyForNode(nodeId);
+ ScmContainerLocationRequest wrapper =
ScmContainerLocationRequest.newBuilder()
+ .setCmdType(Type.GetSafeModeRuleStatuses)
+ .setVersion(ClientVersion.CURRENT_VERSION)
+ .setTraceID(TracingUtil.exportCurrentSpan())
+ .setGetSafeModeRuleStatusesRequest(request)
+ .build();
+ ScmContainerLocationResponse response =
proxy.submitRequest(NULL_RPC_CONTROLLER, wrapper);
Review Comment:
I think we have achieved exactly what we wanted to here. Thanks for the
refactor!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]
octachoron commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2684383575
##
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java:
##
@@ -870,6 +874,50 @@ public boolean forceExitSafeMode() throws IOException {
}
+ @Override
+ public boolean inSafeModeForNode(String nodeId) throws IOException {
+InSafeModeRequestProto request =
InSafeModeRequestProto.getDefaultInstance();
+
+try {
+ StorageContainerLocationProtocolPB proxy = fpp.getProxyForNode(nodeId);
+ ScmContainerLocationRequest wrapper =
ScmContainerLocationRequest.newBuilder()
+ .setCmdType(Type.InSafeMode)
+ .setVersion(ClientVersion.CURRENT_VERSION)
+ .setTraceID(TracingUtil.exportCurrentSpan())
+ .setInSafeModeRequest(request)
+ .build();
+ ScmContainerLocationResponse response =
proxy.submitRequest(NULL_RPC_CONTROLLER, wrapper);
Review Comment:
Looks like it did improve things. :slightly_smiling_face: Thank you for the
change.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]
sreejasahithi commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2681640796
##
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java:
##
@@ -1033,6 +1033,24 @@ public Map>
getSafeModeRuleStatuses()
}
}
+ @Override
+ public boolean inSafeModeForNode(String nodeId) throws IOException {
+boolean result = inSafeMode();
+AUDIT.logReadSuccess(
+buildAuditMessageForSuccess(SCMAction.IN_SAFE_MODE, null)
+);
+return result;
+ }
+
+ @Override
+ public Map>
getSafeModeRuleStatusesForNode(String nodeId) throws IOException {
+Map> result = getSafeModeRuleStatuses();
+AUDIT.logReadSuccess(
+buildAuditMessageForSuccess(SCMAction.GET_SAFE_MODE_RULE_STATUSES,
null)
+);
+return result;
+ }
Review Comment:
Your understanding is correct. These methods cannot be reached, but they
exist because it is required by the StorageContainerLocationProtocol interface
contract.
They are never called because client-side translator uses
submitRequestToNode() which gets a direct proxy to the target SCM and sends a
regular Type.InSafeMode request.
The server receives this as a standard InSafeMode request and routes it to
the existing inSafeMode() method
The nodeId parameter only exists in the client-side for proxy selection.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]
octachoron commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2678784429
##
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java:
##
@@ -870,6 +874,50 @@ public boolean forceExitSafeMode() throws IOException {
}
+ @Override
+ public boolean inSafeModeForNode(String nodeId) throws IOException {
+InSafeModeRequestProto request =
InSafeModeRequestProto.getDefaultInstance();
+
+try {
+ StorageContainerLocationProtocolPB proxy = fpp.getProxyForNode(nodeId);
+ ScmContainerLocationRequest wrapper =
ScmContainerLocationRequest.newBuilder()
+ .setCmdType(Type.InSafeMode)
+ .setVersion(ClientVersion.CURRENT_VERSION)
+ .setTraceID(TracingUtil.exportCurrentSpan())
+ .setInSafeModeRequest(request)
+ .build();
+ ScmContainerLocationResponse response =
proxy.submitRequest(NULL_RPC_CONTROLLER, wrapper);
Review Comment:
What happens here is functionally the same thing as `submitRequest()`, other
than the way the proxy is selected. Would it make sense to make that official,
and refactor this part into a general `submitRequest()` variant that also takes
a node ID? I believe that would have many benefits: separation of concerns,
improving tests through said separation of concerns, DRY, readability
(especially when comparing to `inSafeMode()`), reusability (e.g., in this PR a
few lines down), etc.
##
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java:
##
@@ -870,6 +874,50 @@ public boolean forceExitSafeMode() throws IOException {
}
+ @Override
+ public boolean inSafeModeForNode(String nodeId) throws IOException {
+InSafeModeRequestProto request =
InSafeModeRequestProto.getDefaultInstance();
+
+try {
+ StorageContainerLocationProtocolPB proxy = fpp.getProxyForNode(nodeId);
+ ScmContainerLocationRequest wrapper =
ScmContainerLocationRequest.newBuilder()
+ .setCmdType(Type.InSafeMode)
+ .setVersion(ClientVersion.CURRENT_VERSION)
+ .setTraceID(TracingUtil.exportCurrentSpan())
+ .setInSafeModeRequest(request)
+ .build();
+ ScmContainerLocationResponse response =
proxy.submitRequest(NULL_RPC_CONTROLLER, wrapper);
+ return response.getInSafeModeResponse().getInSafeMode();
+} catch (Exception e) {
+ throw new IOException("Failed to get safe mode status from SCM node " +
nodeId, e);
+}
+ }
+
+ @Override
+ public Map>
getSafeModeRuleStatusesForNode(String nodeId) throws IOException {
+GetSafeModeRuleStatusesRequestProto request =
GetSafeModeRuleStatusesRequestProto.getDefaultInstance();
+
+try {
+ StorageContainerLocationProtocolPB proxy = fpp.getProxyForNode(nodeId);
+ ScmContainerLocationRequest wrapper =
ScmContainerLocationRequest.newBuilder()
+ .setCmdType(Type.GetSafeModeRuleStatuses)
+ .setVersion(ClientVersion.CURRENT_VERSION)
+ .setTraceID(TracingUtil.exportCurrentSpan())
+ .setGetSafeModeRuleStatusesRequest(request)
+ .build();
+ ScmContainerLocationResponse response =
proxy.submitRequest(NULL_RPC_CONTROLLER, wrapper);
Review Comment:
In this case, a `submitRequest()` helper with a node ID parameter (details
in the other comment) would even allow reusing the part of the existing
`getSafeModeRuleStatuses()` method that builds the map (as it is functionally
identical).
If we go with that approach in the end though, I vote to keep this new
iteration of the map-building part, as it makes some readability improvements
over the existing one.
##
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java:
##
@@ -1033,6 +1033,24 @@ public Map>
getSafeModeRuleStatuses()
}
}
+ @Override
+ public boolean inSafeModeForNode(String nodeId) throws IOException {
+boolean result = inSafeMode();
+AUDIT.logReadSuccess(
+buildAuditMessageForSuccess(SCMAction.IN_SAFE_MODE, null)
+);
+return result;
+ }
+
+ @Override
+ public Map>
getSafeModeRuleStatusesForNode(String nodeId) throws IOException {
+Map> result = getSafeModeRuleStatuses();
+AUDIT.logReadSuccess(
+buildAuditMessageForSuccess(SCMAction.GET_SAFE_MODE_RULE_STATUSES,
null)
+);
+return result;
+ }
Review Comment:
Just to make sure my understanding is correct (or find out that it is not
:slightly_smiling_face:), these implementations cannot be reached through an
actual client, right?
As in, we need the methods in the StorageContainerLocationProtocol interface
so that the calls can get all the way to the client-side translator, and so we
definitely need to implement the new methods here too. On the other hand, after
a target SCM is chosen i
Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]
octachoron commented on PR #9611: URL: https://github.com/apache/ozone/pull/9611#issuecomment-3733929192 @dombizita, absolutely, thank you! I don't think my vote is enough to merge though. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]
dombizita commented on PR #9611: URL: https://github.com/apache/ozone/pull/9611#issuecomment-3732366825 @octachoron would you like to take a look at it if you have time? It's related to what we discussed recently :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
