Re: Cannot get check status to work for a customer service

Alejandro Fernandez Sun, 24 May 2015 12:10:38 -0700

For server-side debugging, modify the ambari_server_main.py file to enable 
DEBUG; you can then connect a remote debugger from your favorite IDE to port 
5005.


For more verbose logging, change the debug level in the ambari.properties file.

For python logging on the agents, I believe the default logger level is already 
DEBUG.


Thanks,

Alejandro


________________________________
From: Tim To <[email protected]>
Sent: Friday, May 22, 2015 7:36 PM
To: Alejandro Fernandez
Cc: [email protected]
Subject: Re: Cannot get check status to work for a customer service

Thanks Alejandro,
We use python 2.7, and yes, I did test the kill -0 with the contents of the pid 
file and got the result that you show. It does work now. I had a line of 
Execute() code in the status function where the OS command was malformed, so the 
method blew up, but the exception was eaten so I didn't know that was a problem. 
Once I stripped out all the other code, it started working.
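
One way to make a swallowed failure like this visible is to wrap the status body so that any exception is logged with its traceback before it propagates. The sketch below is only an illustration using the Python standard library. The decorator name log_exceptions, the logger name, and the stand-in status function are hypothetical and are not part of Ambari.

```python
import functools
import logging
import traceback

# Hypothetical logger name, chosen for this example only.
logger = logging.getLogger("status_debug")


def log_exceptions(func):
    """Log any exception raised by func with its traceback, then re-raise it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # format_exc() returns the traceback of the exception being handled.
            logger.error("%s failed:\n%s", func.__name__, traceback.format_exc())
            raise
    return wrapper


@log_exceptions
def status():
    # Stand-in for a status body that fails before it reaches the pid check.
    raise OSError("malformed command")
```

Because the check_process_status code quoted later in this thread reports a stopped component by raising ComponentIsNotRunning, a wrapper like this would log that case too, so it is probably best kept as a temporary debugging aid.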
However, I still have the following questions:
- Is there a better way to debug? I tried "ambari-server start --debug" because 
the ambari code actually has extra debug statements that would have helped me, but 
I can't turn the debug flag on. Is there a config file that I can change to 
trigger those debug logger statements to be printed?

- This setup initially wasn't working because the metainfo file and the command 
script were missing from the cache directory. As a last resort, I manually 
copied the scripts over after restarting the ambari server a few times. 
I am hoping that I am missing a config step somewhere that causes this. Please 
confirm, since ambari wasn't updating the cache directories properly.

Tim

On Fri, May 22, 2015 at 7:01 PM, Alejandro Fernandez <[email protected]> wrote:
Hi Tim,

Only make the changes to ambari-server's 
/var/lib/ambari-server/resources/stacks/… folder, then restart ambari-server; 
this will cause the agents to update their cache.
Restarting ambari-server takes time, so I prefer to use a shell script or pdsh.

What python version are you using?
Have you tried to run this manually?

sleep 60&
[root@c6401 ~]# ps aux | grep sleep
root     29010  0.0  0.0 100904   592 pts/1    S    01:59   0:00 sleep 60
root     29039  0.0  0.0 103236   864 pts/1    S+   01:59   0:00 grep sleep
[root@c6401 ~]# kill -0 29010
[root@c6401 ~]# echo $?
0
[root@c6401 ~]# kill -0 290100000000
-bash: kill: 290100000000: arguments must be process or job Ids

Thanks,
Alejandro

From: Tim To <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, May 22, 2015 at 11:26 AM
To: "[email protected]" <[email protected]>
Subject: Re: Cannot get check status to work for a customer service

A quick update:
I was able to find the check_process_status method, and all it does is read the 
pid from the pid file and then do a kill -0 on the pid, so this should work, but it 
doesn't for some reason. Perhaps I missed a config step, so Ambari is not 
calling our python script somehow. But Ambari is calling the script to 
start our app, so any suggestion is appreciated.

One more question: I set up the custom script in
/var/lib/ambari-server/resources/stacks/HDP/2.2/... but when ambari starts it 
complains that it can't find my script at
/var/lib/ambari-agent/cache/stacks/HDP/2.2/services/... so I manually created 
the directory that ambari is looking for and copied the script there, and ambari 
was able to start our app. Does anyone know how to set a custom service up so I 
don't have to manually copy my script to the /var/lib/ambari-agent/cache/... 
directory?

Many thanks!

The portion of the check_process_status code I was talking about:

  try:
      pid = int(sudo.read_file(pid_file))
  except:
      Logger.debug("Pid file {0} does not exist".format(pid_file))
      raise ComponentIsNotRunning()

  try:
      # Kill will not actually kill the process
      # From the doc:
      # If sig is 0, then no signal is sent, but error checking is still
      # performed; this can be used to check for the existence of a
      # process ID or process group ID.
      os.kill(pid, 0)
  except OSError:
      Logger.debug("Process with pid {0} is not running. Stale pid file"
                   " at {1}".format(pid, pid_file))
      raise ComponentIsNotRunning()
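
For testing the same check outside Ambari, the logic quoted above can be approximated with only the standard library. This is a rough sketch, not Ambari's implementation: the function name pid_file_process_is_running is made up, it reads the pid file with plain open() rather than sudo.read_file, and it returns a boolean instead of raising ComponentIsNotRunning. Unlike the quoted code, it treats EPERM as "running", on the assumption that a permission error from signal 0 means the process exists but belongs to another user.

```python
import errno
import os


def pid_file_process_is_running(pid_file):
    """Return True if pid_file names a live process, False otherwise."""
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except (IOError, OSError, ValueError):
        # Missing, unreadable, or non-numeric pid file.
        return False

    if pid <= 0:
        # kill() interprets 0 and negative values as process groups,
        # so they cannot identify a single process here.
        return False

    try:
        # Signal 0 sends nothing; it only performs the existence
        # and permission checks.
        os.kill(pid, 0)
    except OverflowError:
        # The number does not fit the platform's pid type.
        return False
    except OSError as e:
        # EPERM: the process exists but we may not signal it.
        # Any other error (typically ESRCH) is treated as "not running".
        return e.errno == errno.EPERM
    return True
```

Running this from a Python shell on the agent host, with the same pid file path that is passed to check_process_status, gives a quick way to see whether the pid file itself is the problem or whether the failure happens earlier in the status function.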


On Fri, May 22, 2015 at 10:35 AM, Tim To <[email protected]> wrote:
Hi all,
I am using ambari 2.0 and hadoop 2.4. We have a customer process and I've been 
trying to set up monitoring and alerts for it based on what I learned from two 
popular sites:
The official ambari one:
https://cwiki.apache.org/confluence/display/AMBARI/Ambari
and another one that has more detailed explanations:
http://mozartanalytics.com/how-to-create-a-software-stack-for-ambari/?preview=true

So far I was able to set it up and I can deploy our custom service and have 
ambari start it for me (so I think the stop will work too). However, check 
status doesn't work. Based on comments from the second site, I'm trying to pass 
a pid file location to check_process_status() and magic should happen and 
Ambari would be able to tell whether this process is working or not.
Here's my python function for status:

  def status(self, env):
    print 'Status of Phemi Central';
    check_process_status('/home/testuser/appName/logs/pid-8888')

I manually checked the file after ambari started our app and it does contain 
the correct pid for the process, but ambari still thinks the app is "stopped".

- Any pointer as to how check status works and how I am supposed to set it up is 
appreciated.
- Any more detailed documentation on setting up monitoring and alerts in addition to 
the above-mentioned websites would be greatly helpful (even a confirmation that 
there is none would save me searching for more :) )
- I also checked out the latest ambari code from github but have a hard time 
locating where check status is done, so any help with looking for the code 
would also be helpful.

Thanks in advance everyone!!


--
Tim To
Software Engineer
PHEMI Health Systems
180-887 Great Northern Way
Vancouver, BC V5T 4T5
website<http://www.phemi.com/> twitter<https://twitter.com/PHEMISystems> 
Linkedin<http://www.linkedin.com/company/3561810?trk=tyah&trkInfo=tarId%3A1403279580554%2Ctas%3Aphemi%20hea%2Cidx%3A1-1-1>



