[ https://issues.apache.org/jira/browse/AMBARI-24380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567837#comment-16567837 ]

Akhil S Naik commented on AMBARI-24380:
---------------------------------------

Hi,

I had a similar issue: in the Ambari HBase configuration I accidentally 
changed HBASE_REGIONSERVER_OPTS to set the -Xmn option to -Xmn4096mm (note the 
extra m).
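A malformed size such as -Xmn4096mm can be caught before a restart is ever 
triggered. The sketch below is a hypothetical pre-flight check, not something 
Ambari provides: it flags any -Xmn token whose value is not a plain integer 
with at most one k, m or g suffix. That accepted syntax is an assumption 
covering common HotSpot usage, not the JVM's full size grammar.

```python
import re
import shlex

# Hypothetical pre-flight check (not part of Ambari). Assumption: a valid size
# is one or more digits plus at most one k/m/g suffix, in either case.
_SIZE = re.compile(r"^\d+[kKmMgG]?$")


def invalid_xmn_options(opts):
    """Return every -Xmn token in `opts` whose size value looks malformed."""
    bad = []
    for token in shlex.split(opts):
        if token.startswith("-Xmn") and not _SIZE.match(token[len("-Xmn"):]):
            bad.append(token)
    return bad


if __name__ == "__main__":
    # Flags the doubled suffix from the misconfiguration described above.
    print(invalid_xmn_options("-XX:+UseG1GC -Xmn4096mm -Xms8g"))
```

Running such a check against the rendered HBASE_REGIONSERVER_OPTS before a 
rolling restart would stop the operation before any RegionServer goes down.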

I then asked Ambari to do a rolling HBase restart, 2 servers at a time, 
tolerating up to 2 failures. Ambari nevertheless took down all of my 
RegionServers, reporting each RegionServer restart as successful even though 
they did not actually come back up; Ambari alerts showed the RegionServers as 
not running.
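To make the failure mode concrete, here is a minimal sketch of how a batched 
rolling restart with a failure tolerance is meant to behave. It is not 
Ambari's implementation: restart and is_healthy are hypothetical callables 
standing in for the real restart command and post-restart health check. The 
tolerance can only stop the roll if is_healthy reflects reality; if the check 
always reports success, every batch gets taken down.

```python
from typing import Callable, List, Sequence


def rolling_restart(
    hosts: Sequence[str],
    restart: Callable[[str], None],      # hypothetical: restarts one host
    is_healthy: Callable[[str], bool],   # hypothetical: real post-start check
    batch_size: int = 2,
    tolerated_failures: int = 2,
) -> List[str]:
    """Restart hosts in batches and abort once failures exceed the tolerance.

    Returns the hosts that failed their health check. Raises RuntimeError as
    soon as more than `tolerated_failures` hosts have failed, so no further
    batches are restarted.
    """
    failed: List[str] = []
    for start in range(0, len(hosts), batch_size):
        batch = hosts[start:start + batch_size]
        for host in batch:
            restart(host)
        # Check the whole batch before moving on to the next one.
        failed.extend(h for h in batch if not is_healthy(h))
        if len(failed) > tolerated_failures:
            raise RuntimeError(
                f"aborting after batch {list(batch)}: failed hosts {failed}"
            )
    return failed
```

With 6 hosts that never come back, this sketch restarts the first two batches 
(4 hosts) and then aborts, leaving the remaining 2 running.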

Additionally, when I started a RegionServer manually, it did not stay up. 
Below is the error message:
{code:java}
[root@aanaik4 ambari-metrics-monitor]# 
/usr/hdp/current/hbase-regionserver/bin/hbase-daemon.sh --config 
/usr/hdp/current/hbase-regionserver/conf start regionserver
starting regionserver, logging to 
/var/log/hbase/hbase-root-regionserver-aanaik4.openstacklocal.out
Error: VM option 'G1NewSizePercent' is experimental and must be enabled via 
-XX:+UnlockExperimentalVMOptions.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. The program will exit.
{code}
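The error says G1NewSizePercent must be enabled via 
-XX:+UnlockExperimentalVMOptions, and the issue description below notes that 
the unlock flag has to come before the experimental option. A hypothetical 
checker for that ordering (not an Ambari or HBase feature) is sketched here. 
Which flags count as experimental depends on the JVM version, so the set is 
passed in by the caller; G1NewSizePercent is only the default from this issue.

```python
import shlex
from typing import Iterable, List

UNLOCK = "-XX:+UnlockExperimentalVMOptions"


def experimental_flags_before_unlock(
    opts: str, experimental: Iterable[str] = ("G1NewSizePercent",)
) -> List[str]:
    """Return experimental -XX flags that appear before the unlock flag.

    `experimental` holds flag names without the -XX: prefix or +/- sign; it is
    an assumption supplied per JVM version. If the unlock flag is missing,
    every experimental flag is reported.
    """
    names = set(experimental)
    offending = []
    for token in shlex.split(opts):
        if token == UNLOCK:
            break  # flags after the unlock option are accepted
        if token.startswith("-XX:"):
            # Strip "-XX:", any leading +/- sign, and any "=value" part.
            name = token[4:].lstrip("+-").split("=", 1)[0]
            if name in names:
                offending.append(token)
    return offending
```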

Root cause:

Ambari does not check whether the RegionServer PID file was created (i.e. 
whether the daemon is actually running). It just executes the start command 
(start regionserver) and marks the task as successful.
Code reference: 
https://github.com/apache/ambari/blob/79cca1c7184f1661236971dac70d85a83fab6c11/ambari-server/src/main/resources/common-services/HBASE/2.0.0.3.0/package/scripts/hbase_service.py#L42


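{code:python}
try:
    # not_if: skip the start command when no_op_test succeeds
    # (i.e. the daemon already appears to be running).
    Execute(daemon_cmd,
            not_if=no_op_test,
            user=params.hbase_user)
except:
    # Only reached if daemon_cmd itself fails. hbase-daemon.sh starts the
    # JVM in the background and returns, so a JVM that dies right after
    # launch does not raise here and the task is reported as successful.
    show_logs(params.log_dir, params.hbase_user)
    raise
{code}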
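In the snippet, no_op_test is only used as a not_if guard to decide whether to 
skip the start; nothing is checked after daemon_cmd returns. A post-start 
check of the kind suggested here (does the PID file exist, and is that process 
alive?) could look like the sketch below. It assumes a POSIX host and a PID 
file containing a single integer; the function name is hypothetical and the 
path is supplied by the caller, so this is not the check Ambari performs.

```python
import errno
import os


def pid_is_running(pid_file):
    """Return True if `pid_file` exists and names a live process (POSIX).

    Hypothetical helper for illustration; not part of Ambari.
    """
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        # Missing, unreadable, or malformed PID file: treat as not running.
        return False
    if pid <= 0:
        # os.kill(0, ...) would target the whole process group; reject it.
        return False
    try:
        # Signal 0 only performs existence/permission checks; nothing is sent.
        os.kill(pid, 0)
    except OSError as e:
        # EPERM: the process exists but is owned by another user.
        return e.errno == errno.EPERM
    return True
```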


I believe my issue and yours are the same. Both would be fixed by changing 
this part of the code so that Ambari verifies the daemon is actually running 
after the start command returns.


> Ambari HBase Rolling Restart failed to check RegionServers restarted 
> successfully, continued to take down rest of RegionServers!
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-24380
>                 URL: https://issues.apache.org/jira/browse/AMBARI-24380
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.5.2
>            Reporter: Hari Sekhon
>            Priority: Critical
>
> Ambari rolling-restart of HBase RegionServers failed to detect that 
> RegionServers were not coming back online, continued to take down the rest of 
> the RegionServers in the cluster.
> Adding the following JVM tuning setting to HBASE_OPTS in hbase-env.sh 
> template near the start of the options:
> {code:java}
> -XX:G1NewSizePercent=3{code}
> before the following option (which was set a couple of options further along; 
> it needs to go after this option):
> {code:java}
> -XX:+UnlockExperimentalVMOptions{code}
> This resulted in both HMaster and RegionServer startup failures, but Ambari 
> did not detect that the RegionServers were not coming back online, and 
> proceeded to take down the rest of the RegionServers.
> Ambari should have checked that the first RegionServer restarted successfully 
> and stayed up for the default 120 second rolling window before moving on to 
> the second RegionServer.
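The "stayed up for the default 120 second rolling window" requirement can be 
sketched as a polling loop: after the start command returns, call a liveness 
predicate (for example a PID-file check) at a fixed interval and fail the task 
if the process is ever seen down before the window ends. The function name, 
the 5-second interval, and the injectable clock/sleep parameters are 
assumptions for illustration, not Ambari's code.

```python
import time
from typing import Callable


def wait_until_stable(
    is_alive: Callable[[], bool],
    window_s: float = 120.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Raise RuntimeError unless is_alive() stays True for window_s seconds.

    Hypothetical post-start check: a JVM that exits right after launch fails
    on an early poll instead of the restart being reported as successful.
    """
    deadline = clock() + window_s
    while True:
        if not is_alive():
            raise RuntimeError("process is not running after start")
        if clock() >= deadline:
            return  # stayed up for the whole window
        sleep(interval_s)
```

Injecting clock and sleep keeps the loop testable without actually waiting 
two minutes.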



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
