Hi James,

Would you mind providing your Python script so we can take a look?
Thanks,
Nate

From: James Tanner <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Tuesday, November 24, 2015 at 3:51 PM
To: "[email protected]" <[email protected]>
Subject: Re: proper return for status() in a service script?

I killed the test service, restarted ambari-server, then tailed the logs to see if there were any clues ...

24 Nov 2015 15:47:35,132 INFO [qtp-ambari-agent-52] HeartBeatHandler:657 - State of service component TEST_SLAVE of service TEST of cluster TEST01 has changed from INSTALLED to STARTED at host node2.lab.net
24 Nov 2015 15:47:35,134 INFO [qtp-ambari-agent-52] HeartBeatHandler:657 - State of service component TEST_CLIENT of service TEST of cluster TEST01 has changed from INSTALLED to STARTED at host node2.lab.net
24 Nov 2015 15:47:37,775 INFO [Thread-23] AbstractPoolBackedDataSource:462 - Initializing c3p0 pool...
com.mchange.v2.c3p0.ComboPooledDataSource [ acquireIncrement -> 3, acquireRetryAttempts -> 30, acquireRetryDelay -> 1000, autoCommitOnClose -> false, automaticTestTable -> null, breakAfterAcquireFailure -> false, checkoutTimeout -> 0, connectionCustomizerClassName -> null, connectionTesterClassName -> com.mchange.v2.c3p0.impl.DefaultConnectionTester, dataSourceName -> 2rvxuc9dgfrzar1v9eztj|7320bccc, debugUnreturnedConnectionStackTraces -> false, description -> null, driverClass -> org.postgresql.Driver, factoryClassLocation -> null, forceIgnoreUnresolvedTransactions -> false, identityToken -> 2rvxuc9dgfrzar1v9eztj|7320bccc, idleConnectionTestPeriod -> 50, initialPoolSize -> 3, jdbcUrl -> jdbc:postgresql://localhost/ambari, lastAcquisitionFailureDefaultUser -> null, maxAdministrativeTaskTime -> 0, maxConnectionAge -> 0, maxIdleTime -> 0, maxIdleTimeExcessConnections -> 0, maxPoolSize -> 5, maxStatements -> 0, maxStatementsPerConnection -> 120, minPoolSize -> 1, numHelperThreads -> 3, numThreadsAwaitingCheckoutDefaultUser -> 0, preferredTestQuery -> select 0, properties -> {user=******, password=******}, propertyCycle -> 0, testConnectionOnCheckin -> true, testConnectionOnCheckout -> false, unreturnedConnectionTimeout -> 0, usesTraditionalReflectiveProxies -> false ]
24 Nov 2015 15:47:37,988 INFO [Thread-23] JobStoreTX:861 - Freed 0 triggers from 'acquired' / 'blocked' state.
24 Nov 2015 15:47:38,014 INFO [Thread-23] JobStoreTX:871 - Recovering 0 jobs that were in-progress at the time of the last shut-down.
24 Nov 2015 15:47:38,014 INFO [Thread-23] JobStoreTX:884 - Recovery complete.
24 Nov 2015 15:47:38,014 INFO [Thread-23] JobStoreTX:891 - Removed 0 'complete' triggers.
24 Nov 2015 15:47:38,015 INFO [Thread-23] JobStoreTX:896 - Removed 0 stale fired job entries.
24 Nov 2015 15:47:38,031 INFO [Thread-23] QuartzScheduler:575 - Scheduler ExecutionScheduler_$_NON_CLUSTERED started.
24 Nov 2015 15:47:38,723 INFO [qtp-ambari-agent-39] HeartBeatHandler:657 - State of service component TEST_CLIENT of service TEST of cluster TEST01 has changed from INSTALLED to STARTED at host node1.lab.net
24 Nov 2015 15:47:38,729 INFO [qtp-ambari-agent-39] HeartBeatHandler:657 - State of service component TEST_MASTER of service TEST of cluster CAS01 has changed from INSTALLED to STARTED at host node1.lab.net

Ambari flipped the state from "INSTALLED" to "STARTED", but I can tell from my service script's log output that no calls were ever made to it, and in particular no call to status(). What is Ambari actually doing when it decides to switch the state from INSTALLED to STARTED? It seems to be unrelated to the service script(s).

On Tue, Nov 24, 2015 at 3:08 PM, James Tanner <[email protected]> wrote:

What are the proper return values for "running" and "not running" in an Ambari service script?

If I reference the wiki, the status function should return nothing:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=38571133#Overview%28Ambari1.5.0orlater%29-CreateandAddtheService.1

If I reference the GlusterFS YARN service script included in an HDP stack, there is no return value, but a ComponentIsNotRunning exception should be raised if the component is down.

Regardless of what I return, it seems that the internal Ambari database status gets set to "running".
ambari=# select component_name,current_state from ambari.hostcomponentstate;
  component_name   | current_state
-------------------+---------------
 ZOOKEEPER_SERVER  | STARTED
 ZOOKEEPER_CLIENT  | INSTALLED
 ZOOKEEPER_CLIENT  | INSTALLED
 TEST_CLIENT       | STARTED
 TEST_CLIENT       | STARTED
 METRICS_MONITOR   | STARTED
 METRICS_COLLECTOR | STARTED
 ZOOKEEPER_SERVER  | STARTED
 TEST_SLAVE        | STARTED   # the service script raised the ComponentIsNotRunning exception for this when status() was called
 METRICS_MONITOR   | STARTED
 METRICS_COLLECTOR | STARTED
 TEST_MASTER       | STARTED   # the service script raised the ComponentIsNotRunning exception for this when status() was called
(12 rows)

I've also noticed via log statements that the status() function for the service is called upon startup of ambari-server or during a manual service state change, but Ambari never polls status at any regular interval. Is that supposed to be the case? If not, how is the displayed service state ever accurate?
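
For reference, the status() contract discussed in this thread (return nothing when the component is running, raise ComponentIsNotRunning when it is down) can be sketched outside Ambari roughly as follows. This is only a stand-alone illustration under stated assumptions: the exception class here is a local stand-in for the Ambari one of the same name, and check_pid_file, TestSlave and the pid file path are hypothetical names made up for the example. It assumes a POSIX host, and it does not explain why the server flips the state to STARTED.

```python
import errno
import os


class ComponentIsNotRunning(Exception):
    """Local stand-in for the Ambari exception of the same name."""


def check_pid_file(pid_file):
    """Hypothetical helper: return None if pid_file names a live process,
    otherwise raise ComponentIsNotRunning."""
    if not os.path.isfile(pid_file):
        raise ComponentIsNotRunning("no pid file: %s" % pid_file)
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except ValueError:
        raise ComponentIsNotRunning("unreadable pid in %s" % pid_file)
    if pid <= 0:
        raise ComponentIsNotRunning("invalid pid in %s" % pid_file)
    try:
        os.kill(pid, 0)  # signal 0 sends nothing, it only checks the pid
    except OSError as err:
        if err.errno == errno.ESRCH:  # no such process
            raise ComponentIsNotRunning("pid %d is not alive" % pid)
        # Any other error (e.g. EPERM) means the process exists.


class TestSlave(object):
    """Hypothetical component; a real script would subclass Ambari's Script."""

    pid_file = "/var/run/test/test_slave.pid"  # assumed location

    def status(self, env=None):
        # No return value on success; the exception signals "not running".
        check_pid_file(self.pid_file)
```

In an actual service script the exception would be imported from Ambari's resource_management library instead of being defined locally.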
