[ https://issues.apache.org/jira/browse/AMBARI-24380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567837#comment-16567837 ]
Akhil S Naik commented on AMBARI-24380:
---------------------------------------

Hi,

I had a similar issue: in the Ambari HBase configuration I accidentally changed HBASE_REGIONSERVER_OPTS so that the -Xmn option read -Xmn4096mm (note the extra m).

I then asked Ambari to do rolling HBase restarts with 2 servers at a time, tolerating up to 2 failures.

But Ambari took down all of my servers, reporting each RegionServer restart as successful when it actually was not. I see alerts in Ambari saying the RegionServers are not up.

Additionally, I tried starting a RegionServer manually and found that it does not stay up. Below is the error message:
{code:java}
[root@aanaik4 ambari-metrics-monitor]# /usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh --config /usr/hdp/current/hbase-regionserver/conf start regionserver
starting regionserver, logging to /var/log/hbase/hbase-root-regionserver-aanaik4.openstacklocal.out
Error: VM option 'G1NewSizePercent' is experimental and must be enabled via -XX:+UnlockExperimentalVMOptions.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. The program will exit.
{code}

Root cause: Ambari does not check whether a PID is created. It just sends the execute command (start RegionServer) and marks the task as successful.

Code reference: https://github.com/apache/ambari/blob/79cca1c7184f1661236971dac70d85a83fab6c11/ambari-server/src/main/resources/common-services/HBASE/2.0.0.3.0/package/scripts/hbase_service.py#L42
{code:java}
try:
  Execute ( daemon_cmd,
    not_if = no_op_test,
    user = params.hbase_user
  )
except:
  show_logs(params.log_dir, params.hbase_user)
  raise
{code}

I believe my issue and yours are the same. Both would be fixed by a change in this part of the code.

> Ambari HBase Rolling Restart failed to check RegionServers restarted
> successfully, continued to take down rest of RegionServers!
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-24380
>                 URL: https://issues.apache.org/jira/browse/AMBARI-24380
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>   Affects Versions: 2.5.2
>            Reporter: Hari Sekhon
>            Priority: Critical
>
> Ambari rolling-restart of HBase RegionServers failed to detect that
> RegionServers were not coming back online, continued to take down the rest of
> the RegionServers in the cluster.
> Adding the following JVM tuning setting to HBASE_OPTS in hbase-env.sh
> template near the start of the options:
> {code:java}
> -XX:G1NewSizePercent=3{code}
> before the following option (which was set a couple options further along, it
> needs to go after this option):
> {code:java}
> -XX:+UnlockExperimentalVMOptions{code}
> This resulted in both HMaster and RegionServer startup failures, but Ambari
> did not detect that the RegionServers were not coming back online, and
> proceeded to take down the rest of the RegionServers.
> Ambari should have checked that the first RegionServer restarted successfully
> and stayed up for the default 120 second rolling window before moving on to
> the second RegionServer.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
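
For illustration, the check that the comment and the issue description both ask for (confirm that a PID exists and that the process stays up for the rolling window before moving on) could look roughly like the sketch below. This is not Ambari code: the function names, the `pid_file` argument and the polling interval are hypothetical, and only the 120 second window comes from the issue description. It uses only the Python standard library and assumes a POSIX host.

```python
import os
import time


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if it is missing or malformed."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def pid_is_alive(pid):
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def wait_until_stable(pid_file, stable_secs=120, poll_secs=5, sleep=time.sleep):
    """Return True only if the daemon stays alive for the whole window.

    Hypothetical helper: stable_secs mirrors the 120 second rolling window
    mentioned in the issue; poll_secs is an arbitrary assumption.
    """
    waited = 0
    while True:
        pid = read_pid(pid_file)
        if pid is None or not pid_is_alive(pid):
            return False  # never started, or died inside the window
        if waited >= stable_secs:
            return True
        sleep(poll_secs)
        waited += poll_secs
```

A rolling restart could call something like `wait_until_stable` after each start command and abort the remaining batches when it returns False, rather than marking the task successful as soon as the command has been sent.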