Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]

2026-02-25 Thread via GitHub


sumitagrawl merged PR #9611:
URL: https://github.com/apache/ozone/pull/9611


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]

2026-02-18 Thread via GitHub


sumitagrawl commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2821554658


##
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/SafeModeCheckSubcommand.java:
##
@@ -32,25 +44,203 @@
 description = "Check if SCM is in safe mode",
 mixinStandardHelpOptions = true,
 versionProvider = HddsVersionProvider.class)
-public class SafeModeCheckSubcommand extends ScmSubcommand {
+public class SafeModeCheckSubcommand extends AbstractSubcommand implements 
Callable {
+  @CommandLine.Mixin
+  private ScmOption scmOption;
+
+  @CommandLine.Option(names = {"--all", "-a"},
+  description = "Show safe mode status for all SCM nodes in the service. " 
+
+  "When multiple SCM service IDs are configured, --service-id must be 
specified.")
+  private boolean allNodes;
+
+  private String serviceId;
+  private List nodes;
 
   @Override
-  public void execute(ScmClient scmClient) throws IOException {
-boolean execReturn = scmClient.inSafeMode();
+  public Void call() throws Exception {
+OzoneConfiguration conf = getOzoneConf();
+serviceId = HddsUtils.getScmServiceId(conf);
+String scmAddress = scmOption.getScm();
+if (serviceId != null) {
+  nodes = SCMNodeInfo.buildNodeInfo(conf);
+}
+
+ScmNodeTarget targetScmNode = new ScmNodeTarget();
+try (ScmClient scmClient = scmOption.createScmClient(conf, targetScmNode)) 
{

Review Comment:
   simplify the code, all node: iterate all, query, scm: get matching node, 
query, default: get leader or only one if empty, query



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]

2026-02-10 Thread via GitHub


sreejasahithi commented on PR #9611:
URL: https://github.com/apache/ozone/pull/9611#issuecomment-3882129655

   @sumitagrawl could you please review this patch.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]

2026-01-23 Thread via GitHub


sreejasahithi commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2720755916


##
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/SafeModeCheckSubcommand.java:
##
@@ -45,12 +71,106 @@ public void execute(ScmClient scmClient) throws 
IOException {
   System.out.println("SCM is out of safe mode.");
 }
 if (isVerbose()) {
-  for (Map.Entry> entry :
-  scmClient.getSafeModeRuleStatuses().entrySet()) {
-Pair value = entry.getValue();
-System.out.printf("validated:%s, %s, %s%n",
-value.getLeft(), entry.getKey(), value.getRight());
+  printSafeModeRules(scmClient.getSafeModeRuleStatuses());
+}
+  }
+
+  private void executeForSpecificNodeInHA(ScmClient scmClient, String 
serviceId) throws IOException {
+String scmAddress = getScmOption().getScm();
+
+System.out.println("Service ID: " + serviceId);
+
+final OzoneConfiguration conf = getOzoneConf();
+
+List nodes = SCMNodeInfo.buildNodeInfo(conf);
+
+// Find the node matching the --scm address
+List matchedNodes = nodes.stream()
+.filter(node -> matchesAddress(node, scmAddress))
+.collect(Collectors.toList());
+
+if (matchedNodes.isEmpty()) {
+  throw new IOException("Specified --scm address " + scmAddress +
+  " does not match any node in service " + serviceId +
+  ". Available nodes: " + nodes.stream()
+  .map(n -> n.getScmClientAddress() + " [" + n.getNodeId() + "]")
+  .collect(Collectors.joining(", ")));
+}
+
+queryNode(scmClient, matchedNodes.get(0));
+  }
+
+  private void executeForAllNodes(ScmClient scmClient) throws IOException {
+final OzoneConfiguration conf = getOzoneConf();
+String serviceId = HddsUtils.getScmServiceId(conf);
+
+if (serviceId == null) {
+  executeForSingleNode(scmClient);
+  return;
+}
+
+System.out.println("Service ID: " + serviceId);
+List nodes = SCMNodeInfo.buildNodeInfo(conf);
+
+for (SCMNodeInfo node : nodes) {
+  queryNode(scmClient, node);
+}
+  }
+
+  private void queryNode(ScmClient scmClient, SCMNodeInfo node) {
+String nodeId = node.getNodeId();
+
+try {
+  boolean inSafeMode = scmClient.inSafeModeForNode(nodeId);
+
+  System.out.printf("%s [%s]: %s%n",
+  node.getScmClientAddress(),
+  nodeId,
+  inSafeMode ? "IN SAFE MODE" : "OUT OF SAFE MODE");
+
+  if (isVerbose()) {
+Map> rules = 
scmClient.getSafeModeRuleStatusesForNode(nodeId);
+if (rules != null && !rules.isEmpty()) {
+  printSafeModeRules(rules);
+}
   }
+} catch (Exception e) {
+  System.out.printf("%s [%s]: ERROR: Failed to get safe mode status - 
%s%n",
+  node.getScmClientAddress(), nodeId, e.getMessage());
+}
+  }
+
+  /**
+   * Check if the given SCMNodeInfo matches the target address.
+   * Tries to match by direct string comparison and by resolved address.
+   */
+  private boolean matchesAddress(SCMNodeInfo node, String targetAddress) {
+String nodeAddress = node.getScmClientAddress();
+
+// Direct match
+if (nodeAddress.equals(targetAddress)) {
+  return true;
+}
+
+// Try normalizing both addresses and comparing
+try {
+  InetSocketAddress target = NetUtils.createSocketAddr(targetAddress);
+  InetSocketAddress nodeAddr = NetUtils.createSocketAddr(nodeAddress);
+
+  // Match by resolved IP and port
+  return target.getPort() == nodeAddr.getPort() &&
+  target.getAddress().equals(nodeAddr.getAddress());
+} catch (Exception e) {
+  // If address resolution fails, no match
+  return false;

Review Comment:
   I have removed the logging here because it creates unwanted noise in the CLI 
output. Instead, I have ensured that actual errors are properly surfaced when 
no leader can be determined or when the node specified in --scm option doesn't 
match, clear error messages are thrown to the user.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]

2026-01-21 Thread via GitHub


aryangupta1998 commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2711408445


##
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java:
##
@@ -1033,6 +1033,42 @@ public Map> 
getSafeModeRuleStatuses()
 }
   }
 
+  @Override
+  public boolean inSafeModeForNode(String nodeId) throws IOException {
+Map auditMap = Maps.newHashMap();
+auditMap.put("nodeId", nodeId);
+try {
+  boolean result = inSafeMode();

Review Comment:
   ```suggestion
 boolean result = scm.isInSafeMode();
   ```
   It’s better to use `scm.isInSafeMode()` instead of `inSafeMode()`, since 
inSafeMode() is a generic API and may evolve in the future (for example, to 
include leader-level logic). Using scm.isInSafeMode() makes it explicit that we 
are returning the local SCM node’s safe mode state.



##
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/SafeModeCheckSubcommand.java:
##
@@ -33,24 +42,181 @@
 mixinStandardHelpOptions = true,
 versionProvider = HddsVersionProvider.class)
 public class SafeModeCheckSubcommand extends ScmSubcommand {
+  @CommandLine.Option(names = {"--all", "-a"},
+  description = "Show safe mode status for all SCM nodes in the service. " 
+
+  "When multiple SCM service IDs are configured, --service-id must be 
specified.")
+  private boolean allNodes;
+
+  private String serviceId;
+  private List nodes;
 
   @Override
   public void execute(ScmClient scmClient) throws IOException {
-boolean execReturn = scmClient.inSafeMode();
+OzoneConfiguration conf = getOzoneConf();
+serviceId = HddsUtils.getScmServiceId(conf);
+String scmAddress = getScmOption().getScm();
+if (serviceId != null) {
+  nodes = SCMNodeInfo.buildNodeInfo(conf);
+}
+
+if (allNodes) {
+  executeForAllNodes(scmClient);
+} else if (StringUtils.isNotEmpty(scmAddress)) {
+  executeForSpecificNode(scmClient, scmAddress);
+} else {
+  executeForSingleNode(scmClient);
+}
+  }
+
+  private void executeForSingleNode(ScmClient scmClient) throws IOException {
+boolean inSafeMode;
+Map> rules = null;
+String leaderNodeId;
 
-// Output data list
-if (execReturn) {
+// If SCM HA mode, query the leader node.
+if (serviceId != null) {
+  leaderNodeId = findLeaderNodeId(scmClient);

Review Comment:
   ```suggestion
 leaderNodeId = findLeaderNodeId(scmClient);
   if (leaderNodeId == null) {
 throw new IOException("Unable to determine SCM leader for the service");
   }
   ```



##
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocol.java:
##
@@ -73,6 +73,14 @@ public interface StorageContainerLocationProtocol extends 
Closeable {
   Type.StopReplicationManager,
   Type.ForceExitSafeMode));
 
+  /**
+   * Read-only commands that can execute on followers without leader check.
+   * These commands respect the --scm parameter and query the specified SCM.
+   */
+  Set FOLLOWER_READABLE_COMMAND_TYPE = 
Collections.unmodifiableSet(EnumSet.of(

Review Comment:
   nit: FOLLOWER_READABLE_COMMAND_TYPES



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]

2026-01-16 Thread via GitHub


sreejasahithi commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2697944257


##
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/SafeModeCheckSubcommand.java:
##
@@ -33,9 +42,26 @@
 mixinStandardHelpOptions = true,
 versionProvider = HddsVersionProvider.class)
 public class SafeModeCheckSubcommand extends ScmSubcommand {
+  @CommandLine.Option(names = {"--all", "-a"},
+  description = "Show safe mode status for all SCM nodes in the service. " 
+
+  "When multiple SCM service IDs are configured, --service-id must be 
specified.")
+  private boolean allNodes;
 
   @Override
   public void execute(ScmClient scmClient) throws IOException {
+final OzoneConfiguration conf = getOzoneConf();
+String serviceId = HddsUtils.getScmServiceId(conf);
+
+if (allNodes) {
+  executeForAllNodes(scmClient);
+} else if (StringUtils.isNotEmpty(getScmOption().getScm()) && serviceId != 
null) {
+  executeForSpecificNodeInHA(scmClient, serviceId);
+} else {
+  executeForSingleNode(scmClient);

Review Comment:
   Thanks @ashishkumar50 for finding this bug, you are right now that we are 
allowing follower to also accept status command there can be a possibility 
where when we run safemode status command with no additional option it can 
return the status of the follower.
   
   I have fixed this issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]

2026-01-16 Thread via GitHub


sreejasahithi commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2697944257


##
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/SafeModeCheckSubcommand.java:
##
@@ -33,9 +42,26 @@
 mixinStandardHelpOptions = true,
 versionProvider = HddsVersionProvider.class)
 public class SafeModeCheckSubcommand extends ScmSubcommand {
+  @CommandLine.Option(names = {"--all", "-a"},
+  description = "Show safe mode status for all SCM nodes in the service. " 
+
+  "When multiple SCM service IDs are configured, --service-id must be 
specified.")
+  private boolean allNodes;
 
   @Override
   public void execute(ScmClient scmClient) throws IOException {
+final OzoneConfiguration conf = getOzoneConf();
+String serviceId = HddsUtils.getScmServiceId(conf);
+
+if (allNodes) {
+  executeForAllNodes(scmClient);
+} else if (StringUtils.isNotEmpty(getScmOption().getScm()) && serviceId != 
null) {
+  executeForSpecificNodeInHA(scmClient, serviceId);
+} else {
+  executeForSingleNode(scmClient);

Review Comment:
   Thanks @ashishkumar50 for finding this bug, now that we are allowing 
follower to also accept status command there can be a possibility where when we 
run safemode status command with no additional option it can return the status 
of the follower.
   
   I have fixed this issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]

2026-01-14 Thread via GitHub


sadanand48 commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2689399814


##
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/SafeModeCheckSubcommand.java:
##
@@ -45,12 +71,106 @@ public void execute(ScmClient scmClient) throws 
IOException {
   System.out.println("SCM is out of safe mode.");
 }
 if (isVerbose()) {
-  for (Map.Entry> entry :
-  scmClient.getSafeModeRuleStatuses().entrySet()) {
-Pair value = entry.getValue();
-System.out.printf("validated:%s, %s, %s%n",
-value.getLeft(), entry.getKey(), value.getRight());
+  printSafeModeRules(scmClient.getSafeModeRuleStatuses());
+}
+  }
+
+  private void executeForSpecificNodeInHA(ScmClient scmClient, String 
serviceId) throws IOException {
+String scmAddress = getScmOption().getScm();
+
+System.out.println("Service ID: " + serviceId);
+
+final OzoneConfiguration conf = getOzoneConf();
+
+List nodes = SCMNodeInfo.buildNodeInfo(conf);
+
+// Find the node matching the --scm address
+List matchedNodes = nodes.stream()
+.filter(node -> matchesAddress(node, scmAddress))
+.collect(Collectors.toList());
+
+if (matchedNodes.isEmpty()) {
+  throw new IOException("Specified --scm address " + scmAddress +
+  " does not match any node in service " + serviceId +
+  ". Available nodes: " + nodes.stream()
+  .map(n -> n.getScmClientAddress() + " [" + n.getNodeId() + "]")
+  .collect(Collectors.joining(", ")));
+}
+
+queryNode(scmClient, matchedNodes.get(0));
+  }
+
+  private void executeForAllNodes(ScmClient scmClient) throws IOException {
+final OzoneConfiguration conf = getOzoneConf();
+String serviceId = HddsUtils.getScmServiceId(conf);
+
+if (serviceId == null) {
+  executeForSingleNode(scmClient);
+  return;
+}
+
+System.out.println("Service ID: " + serviceId);
+List nodes = SCMNodeInfo.buildNodeInfo(conf);
+
+for (SCMNodeInfo node : nodes) {
+  queryNode(scmClient, node);
+}
+  }
+
+  private void queryNode(ScmClient scmClient, SCMNodeInfo node) {
+String nodeId = node.getNodeId();
+
+try {
+  boolean inSafeMode = scmClient.inSafeModeForNode(nodeId);
+
+  System.out.printf("%s [%s]: %s%n",
+  node.getScmClientAddress(),
+  nodeId,
+  inSafeMode ? "IN SAFE MODE" : "OUT OF SAFE MODE");
+
+  if (isVerbose()) {
+Map> rules = 
scmClient.getSafeModeRuleStatusesForNode(nodeId);
+if (rules != null && !rules.isEmpty()) {
+  printSafeModeRules(rules);
+}
   }
+} catch (Exception e) {
+  System.out.printf("%s [%s]: ERROR: Failed to get safe mode status - 
%s%n",
+  node.getScmClientAddress(), nodeId, e.getMessage());
+}
+  }
+
+  /**
+   * Check if the given SCMNodeInfo matches the target address.
+   * Tries to match by direct string comparison and by resolved address.
+   */
+  private boolean matchesAddress(SCMNodeInfo node, String targetAddress) {
+String nodeAddress = node.getScmClientAddress();
+
+// Direct match
+if (nodeAddress.equals(targetAddress)) {
+  return true;
+}
+
+// Try normalizing both addresses and comparing
+try {
+  InetSocketAddress target = NetUtils.createSocketAddr(targetAddress);
+  InetSocketAddress nodeAddr = NetUtils.createSocketAddr(nodeAddress);
+
+  // Match by resolved IP and port
+  return target.getPort() == nodeAddr.getPort() &&
+  target.getAddress().equals(nodeAddr.getAddress());
+} catch (Exception e) {
+  // If address resolution fails, no match
+  return false;

Review Comment:
   nit : Log the exception here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]

2026-01-14 Thread via GitHub


sadanand48 commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2689399814


##
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/SafeModeCheckSubcommand.java:
##
@@ -45,12 +71,106 @@ public void execute(ScmClient scmClient) throws 
IOException {
   System.out.println("SCM is out of safe mode.");
 }
 if (isVerbose()) {
-  for (Map.Entry> entry :
-  scmClient.getSafeModeRuleStatuses().entrySet()) {
-Pair value = entry.getValue();
-System.out.printf("validated:%s, %s, %s%n",
-value.getLeft(), entry.getKey(), value.getRight());
+  printSafeModeRules(scmClient.getSafeModeRuleStatuses());
+}
+  }
+
+  private void executeForSpecificNodeInHA(ScmClient scmClient, String 
serviceId) throws IOException {
+String scmAddress = getScmOption().getScm();
+
+System.out.println("Service ID: " + serviceId);
+
+final OzoneConfiguration conf = getOzoneConf();
+
+List nodes = SCMNodeInfo.buildNodeInfo(conf);
+
+// Find the node matching the --scm address
+List matchedNodes = nodes.stream()
+.filter(node -> matchesAddress(node, scmAddress))
+.collect(Collectors.toList());
+
+if (matchedNodes.isEmpty()) {
+  throw new IOException("Specified --scm address " + scmAddress +
+  " does not match any node in service " + serviceId +
+  ". Available nodes: " + nodes.stream()
+  .map(n -> n.getScmClientAddress() + " [" + n.getNodeId() + "]")
+  .collect(Collectors.joining(", ")));
+}
+
+queryNode(scmClient, matchedNodes.get(0));
+  }
+
+  private void executeForAllNodes(ScmClient scmClient) throws IOException {
+final OzoneConfiguration conf = getOzoneConf();
+String serviceId = HddsUtils.getScmServiceId(conf);
+
+if (serviceId == null) {
+  executeForSingleNode(scmClient);
+  return;
+}
+
+System.out.println("Service ID: " + serviceId);
+List nodes = SCMNodeInfo.buildNodeInfo(conf);
+
+for (SCMNodeInfo node : nodes) {
+  queryNode(scmClient, node);
+}
+  }
+
+  private void queryNode(ScmClient scmClient, SCMNodeInfo node) {
+String nodeId = node.getNodeId();
+
+try {
+  boolean inSafeMode = scmClient.inSafeModeForNode(nodeId);
+
+  System.out.printf("%s [%s]: %s%n",
+  node.getScmClientAddress(),
+  nodeId,
+  inSafeMode ? "IN SAFE MODE" : "OUT OF SAFE MODE");
+
+  if (isVerbose()) {
+Map> rules = 
scmClient.getSafeModeRuleStatusesForNode(nodeId);
+if (rules != null && !rules.isEmpty()) {
+  printSafeModeRules(rules);
+}
   }
+} catch (Exception e) {
+  System.out.printf("%s [%s]: ERROR: Failed to get safe mode status - 
%s%n",
+  node.getScmClientAddress(), nodeId, e.getMessage());
+}
+  }
+
+  /**
+   * Check if the given SCMNodeInfo matches the target address.
+   * Tries to match by direct string comparison and by resolved address.
+   */
+  private boolean matchesAddress(SCMNodeInfo node, String targetAddress) {
+String nodeAddress = node.getScmClientAddress();
+
+// Direct match
+if (nodeAddress.equals(targetAddress)) {
+  return true;
+}
+
+// Try normalizing both addresses and comparing
+try {
+  InetSocketAddress target = NetUtils.createSocketAddr(targetAddress);
+  InetSocketAddress nodeAddr = NetUtils.createSocketAddr(nodeAddress);
+
+  // Match by resolved IP and port
+  return target.getPort() == nodeAddr.getPort() &&
+  target.getAddress().equals(nodeAddr.getAddress());
+} catch (Exception e) {
+  // If address resolution fails, no match
+  return false;

Review Comment:
   nit : Log the exception here before returning false



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]

2026-01-13 Thread via GitHub


priyeshkaratha commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2688872727


##
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java:
##
@@ -1032,6 +1033,24 @@ public Map> 
getSafeModeRuleStatuses()
 }
   }
 
+  @Override
+  public boolean inSafeModeForNode(String nodeId) throws IOException {
+boolean result = inSafeMode();
+AUDIT.logReadSuccess(
+buildAuditMessageForSuccess(SCMAction.IN_SAFE_MODE, null)
+);
+return result;
+  }
+
+  @Override
+  public Map> 
getSafeModeRuleStatusesForNode(String nodeId) throws IOException {
+Map> result = getSafeModeRuleStatuses();
+AUDIT.logReadSuccess(

Review Comment:
   Please use logReadSuccess with auditMap with nodeId added and also add 
auditlog on failure since its called for nodeId



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]

2026-01-13 Thread via GitHub


ashishkumar50 commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2685945921


##
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/SafeModeCheckSubcommand.java:
##
@@ -45,12 +71,106 @@ public void execute(ScmClient scmClient) throws 
IOException {
   System.out.println("SCM is out of safe mode.");
 }
 if (isVerbose()) {
-  for (Map.Entry> entry :
-  scmClient.getSafeModeRuleStatuses().entrySet()) {
-Pair value = entry.getValue();
-System.out.printf("validated:%s, %s, %s%n",
-value.getLeft(), entry.getKey(), value.getRight());
+  printSafeModeRules(scmClient.getSafeModeRuleStatuses());
+}
+  }
+
+  private void executeForSpecificNodeInHA(ScmClient scmClient, String 
serviceId) throws IOException {
+String scmAddress = getScmOption().getScm();

Review Comment:
   scmAddress is not mandatory option.



##
hadoop-ozone/cli-admin/src/main/java/org/apache/hadoop/hdds/scm/cli/SafeModeCheckSubcommand.java:
##
@@ -33,9 +42,26 @@
 mixinStandardHelpOptions = true,
 versionProvider = HddsVersionProvider.class)
 public class SafeModeCheckSubcommand extends ScmSubcommand {
+  @CommandLine.Option(names = {"--all", "-a"},
+  description = "Show safe mode status for all SCM nodes in the service. " 
+
+  "When multiple SCM service IDs are configured, --service-id must be 
specified.")
+  private boolean allNodes;
 
   @Override
   public void execute(ScmClient scmClient) throws IOException {
+final OzoneConfiguration conf = getOzoneConf();
+String serviceId = HddsUtils.getScmServiceId(conf);
+
+if (allNodes) {
+  executeForAllNodes(scmClient);
+} else if (StringUtils.isNotEmpty(getScmOption().getScm()) && serviceId != 
null) {
+  executeForSpecificNodeInHA(scmClient, serviceId);
+} else {
+  executeForSingleNode(scmClient);

Review Comment:
   In normal or existing behaviour we need safemode status from leader node 
most of the time. When no scm address is passed, whether we are getting safe 
mode status from leader node or not? Because now follower also can accept 
safemode and can return the status.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]

2026-01-12 Thread via GitHub


octachoron commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2684404784


##
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java:
##
@@ -1033,6 +1033,24 @@ public Map> 
getSafeModeRuleStatuses()
 }
   }
 
+  @Override
+  public boolean inSafeModeForNode(String nodeId) throws IOException {
+boolean result = inSafeMode();
+AUDIT.logReadSuccess(
+buildAuditMessageForSuccess(SCMAction.IN_SAFE_MODE, null)
+);
+return result;
+  }
+
+  @Override
+  public Map> 
getSafeModeRuleStatusesForNode(String nodeId) throws IOException {
+Map> result = getSafeModeRuleStatuses();
+AUDIT.logReadSuccess(
+buildAuditMessageForSuccess(SCMAction.GET_SAFE_MODE_RULE_STATUSES, 
null)
+);
+return result;
+  }

Review Comment:
   Great, this makes it clear. Thank you for the confirmation and walkthrough. 
This sounds like the right thing to do to me as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]

2026-01-12 Thread via GitHub


octachoron commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2684390281


##
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/protocol/StorageContainerLocationProtocolServerSideTranslatorPB.java:
##
@@ -212,7 +213,8 @@ public ScmContainerLocationResponse 
submitRequest(RpcController controller,
   ScmContainerLocationRequest request) throws ServiceException {
 // not leader or not belong to admin command.

Review Comment:
   Thank you for the update, looks good to me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]

2026-01-12 Thread via GitHub


octachoron commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2684386988


##
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java:
##
@@ -870,6 +874,50 @@ public boolean forceExitSafeMode() throws IOException {
 
   }
 
+  @Override
+  public boolean inSafeModeForNode(String nodeId) throws IOException {
+InSafeModeRequestProto request = 
InSafeModeRequestProto.getDefaultInstance();
+
+try {
+  StorageContainerLocationProtocolPB proxy = fpp.getProxyForNode(nodeId);
+  ScmContainerLocationRequest wrapper = 
ScmContainerLocationRequest.newBuilder()
+  .setCmdType(Type.InSafeMode)
+  .setVersion(ClientVersion.CURRENT_VERSION)
+  .setTraceID(TracingUtil.exportCurrentSpan())
+  .setInSafeModeRequest(request)
+  .build();
+  ScmContainerLocationResponse response = 
proxy.submitRequest(NULL_RPC_CONTROLLER, wrapper);
+  return response.getInSafeModeResponse().getInSafeMode();
+} catch (Exception e) {
+  throw new IOException("Failed to get safe mode status from SCM node " + 
nodeId, e);
+}
+  }
+
+  @Override
+  public Map> 
getSafeModeRuleStatusesForNode(String nodeId) throws IOException {
+GetSafeModeRuleStatusesRequestProto request = 
GetSafeModeRuleStatusesRequestProto.getDefaultInstance();
+
+try {
+  StorageContainerLocationProtocolPB proxy = fpp.getProxyForNode(nodeId);
+  ScmContainerLocationRequest wrapper = 
ScmContainerLocationRequest.newBuilder()
+  .setCmdType(Type.GetSafeModeRuleStatuses)
+  .setVersion(ClientVersion.CURRENT_VERSION)
+  .setTraceID(TracingUtil.exportCurrentSpan())
+  .setGetSafeModeRuleStatusesRequest(request)
+  .build();
+  ScmContainerLocationResponse response = 
proxy.submitRequest(NULL_RPC_CONTROLLER, wrapper);

Review Comment:
   I think we have achieved exactly what we wanted to here. Thanks for the 
refactor!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]

2026-01-12 Thread via GitHub


octachoron commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2684383575


##
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java:
##
@@ -870,6 +874,50 @@ public boolean forceExitSafeMode() throws IOException {
 
   }
 
+  @Override
+  public boolean inSafeModeForNode(String nodeId) throws IOException {
+InSafeModeRequestProto request = 
InSafeModeRequestProto.getDefaultInstance();
+
+try {
+  StorageContainerLocationProtocolPB proxy = fpp.getProxyForNode(nodeId);
+  ScmContainerLocationRequest wrapper = 
ScmContainerLocationRequest.newBuilder()
+  .setCmdType(Type.InSafeMode)
+  .setVersion(ClientVersion.CURRENT_VERSION)
+  .setTraceID(TracingUtil.exportCurrentSpan())
+  .setInSafeModeRequest(request)
+  .build();
+  ScmContainerLocationResponse response = 
proxy.submitRequest(NULL_RPC_CONTROLLER, wrapper);

Review Comment:
   Looks like it did improve things. :slightly_smiling_face: Thank you for the 
change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]

2026-01-12 Thread via GitHub


sreejasahithi commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2681640796


##
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java:
##
@@ -1033,6 +1033,24 @@ public Map> 
getSafeModeRuleStatuses()
 }
   }
 
+  @Override
+  public boolean inSafeModeForNode(String nodeId) throws IOException {
+boolean result = inSafeMode();
+AUDIT.logReadSuccess(
+buildAuditMessageForSuccess(SCMAction.IN_SAFE_MODE, null)
+);
+return result;
+  }
+
+  @Override
+  public Map> 
getSafeModeRuleStatusesForNode(String nodeId) throws IOException {
+Map> result = getSafeModeRuleStatuses();
+AUDIT.logReadSuccess(
+buildAuditMessageForSuccess(SCMAction.GET_SAFE_MODE_RULE_STATUSES, 
null)
+);
+return result;
+  }

Review Comment:
   Your understanding is correct. These methods cannot be reached, but they 
exist because it is required by the StorageContainerLocationProtocol interface 
contract.
   They are never called because client-side translator uses 
submitRequestToNode() which gets a direct proxy to the target SCM and sends a 
regular Type.InSafeMode request.
   The server receives this as a standard InSafeMode request and routes it to 
the existing inSafeMode() method
   The nodeId parameter only exists in the client-side for proxy selection.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]

2026-01-10 Thread via GitHub


octachoron commented on code in PR #9611:
URL: https://github.com/apache/ozone/pull/9611#discussion_r2678784429


##
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java:
##
@@ -870,6 +874,50 @@ public boolean forceExitSafeMode() throws IOException {
 
   }
 
+  @Override
+  public boolean inSafeModeForNode(String nodeId) throws IOException {
+InSafeModeRequestProto request = 
InSafeModeRequestProto.getDefaultInstance();
+
+try {
+  StorageContainerLocationProtocolPB proxy = fpp.getProxyForNode(nodeId);
+  ScmContainerLocationRequest wrapper = 
ScmContainerLocationRequest.newBuilder()
+  .setCmdType(Type.InSafeMode)
+  .setVersion(ClientVersion.CURRENT_VERSION)
+  .setTraceID(TracingUtil.exportCurrentSpan())
+  .setInSafeModeRequest(request)
+  .build();
+  ScmContainerLocationResponse response = 
proxy.submitRequest(NULL_RPC_CONTROLLER, wrapper);

Review Comment:
   What happens here is functionally the same thing as `submitRequest()`, other 
than the way the proxy is selected. Would it make sense to make that official, 
and refactor this part into a general `submitRequest()` variant that also takes 
a node ID? I believe that would have many benefits: separation of concerns, 
improving tests through said separation of concerns, DRY, readability 
(especially when comparing to `inSafeMode()`), reusability (e.g., in this PR a 
few lines down), etc.



##
hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/scm/protocolPB/StorageContainerLocationProtocolClientSideTranslatorPB.java:
##
@@ -870,6 +874,50 @@ public boolean forceExitSafeMode() throws IOException {
 
   }
 
+  @Override
+  public boolean inSafeModeForNode(String nodeId) throws IOException {
+InSafeModeRequestProto request = 
InSafeModeRequestProto.getDefaultInstance();
+
+try {
+  StorageContainerLocationProtocolPB proxy = fpp.getProxyForNode(nodeId);
+  ScmContainerLocationRequest wrapper = 
ScmContainerLocationRequest.newBuilder()
+  .setCmdType(Type.InSafeMode)
+  .setVersion(ClientVersion.CURRENT_VERSION)
+  .setTraceID(TracingUtil.exportCurrentSpan())
+  .setInSafeModeRequest(request)
+  .build();
+  ScmContainerLocationResponse response = 
proxy.submitRequest(NULL_RPC_CONTROLLER, wrapper);
+  return response.getInSafeModeResponse().getInSafeMode();
+} catch (Exception e) {
+  throw new IOException("Failed to get safe mode status from SCM node " + 
nodeId, e);
+}
+  }
+
+  @Override
+  public Map> 
getSafeModeRuleStatusesForNode(String nodeId) throws IOException {
+GetSafeModeRuleStatusesRequestProto request = 
GetSafeModeRuleStatusesRequestProto.getDefaultInstance();
+
+try {
+  StorageContainerLocationProtocolPB proxy = fpp.getProxyForNode(nodeId);
+  ScmContainerLocationRequest wrapper = 
ScmContainerLocationRequest.newBuilder()
+  .setCmdType(Type.GetSafeModeRuleStatuses)
+  .setVersion(ClientVersion.CURRENT_VERSION)
+  .setTraceID(TracingUtil.exportCurrentSpan())
+  .setGetSafeModeRuleStatusesRequest(request)
+  .build();
+  ScmContainerLocationResponse response = 
proxy.submitRequest(NULL_RPC_CONTROLLER, wrapper);

Review Comment:
   In this case, a `submitRequest()` helper with a node ID parameter (details 
in the other comment) would even allow reusing the part of the existing 
`getSafeModeRuleStatuses()` method that builds the map (as it is functionally 
identical).
   
   If we go with that approach in the end though, I vote to keep this new 
iteration of the map-building part, as it makes some readability improvements 
over the existing one.



##
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMClientProtocolServer.java:
##
@@ -1033,6 +1033,24 @@ public Map> 
getSafeModeRuleStatuses()
 }
   }
 
+  @Override
+  public boolean inSafeModeForNode(String nodeId) throws IOException {
+boolean result = inSafeMode();
+AUDIT.logReadSuccess(
+buildAuditMessageForSuccess(SCMAction.IN_SAFE_MODE, null)
+);
+return result;
+  }
+
+  @Override
+  public Map> 
getSafeModeRuleStatusesForNode(String nodeId) throws IOException {
+Map> result = getSafeModeRuleStatuses();
+AUDIT.logReadSuccess(
+buildAuditMessageForSuccess(SCMAction.GET_SAFE_MODE_RULE_STATUSES, 
null)
+);
+return result;
+  }

Review Comment:
   Just to make sure my understanding is correct (or find out that it is not 
:slightly_smiling_face:), these implementations cannot be reached through an 
actual client, right?
   
   As in, we need the methods in the StorageContainerLocationProtocol interface 
so that the calls can get all the way to the client-side translator, and so we 
definitely need to implement the new methods here too. On the other hand, after 
a target SCM is chosen i

Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]

2026-01-10 Thread via GitHub


octachoron commented on PR #9611:
URL: https://github.com/apache/ozone/pull/9611#issuecomment-3733929192

   @dombizita, absolutely, thank you! I don't think my vote is enough to merge 
though.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] HDDS-14108. Provide option in ‘scm safemode status’ to show status of all SCM nodes [ozone]

2026-01-10 Thread via GitHub


dombizita commented on PR #9611:
URL: https://github.com/apache/ozone/pull/9611#issuecomment-3732366825

   @octachoron would you like to take a look at it if you have time? It's 
related to what we discussed recently :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]