Hi Chris, For the following supervisor config (memmon limit set low intentionally):
[inet_http_server] port=127.0.0.1:9036 [supervisord] logfile=%(here)s/../log/supervisord.log logfile_maxbytes=50MB logfile_backups=10 loglevel=info pidfile=%(here)s/../var/supervisord.pid nodaemon=false [rpcinterface:supervisor] supervisor.rpcinterface_factory=supervisor.rpcinterface:make_main_rpcinterface [supervisorctl] serverurl=http://127.0.0.1:9036 [program:osi] command = %(here)s/../bin/paster serve %(here)s/../etc/karl.ini redirect_stderr = true [program:zeo] command = %(here)s/../bin/runzeo -C %(here)s/zeo.conf redirect_stderr = true [eventlistener:memmon] command=%(here)s/../bin/memmon -a 20MB events=TICK_60 When I run it under "supervisord -n -edebug" (using supervisor 3.0a7) on MacOS 10.6, I see this (see lines that being "restarting"): [chr...@snowpro bfgenv]$ bin/supervisord -n -edebug 2009-10-03 13:34:41,101 INFO Increased RLIMIT_NOFILE limit to 1024 2009-10-03 13:34:41,221 INFO RPC interface 'supervisor' initialized 2009-10-03 13:34:41,221 WARN cElementTree not installed, using slower XML parser for XML-RPC 2009-10-03 13:34:41,221 CRIT Server 'inet_http_server' running without any HTTP authentication checking 2009-10-03 13:34:41,222 INFO supervisord started with pid 49246 2009-10-03 13:34:42,226 INFO spawned: 'memmon' with pid 49248 2009-10-03 13:34:42,231 INFO spawned: 'osi' with pid 49249 2009-10-03 13:34:42,239 INFO spawned: 'zeo' with pid 49250 2009-10-03 13:34:43,241 INFO success: memmon entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) 2009-10-03 13:34:43,241 INFO success: osi entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) 2009-10-03 13:34:43,250 INFO success: zeo entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) 2009-10-03 13:34:43,370 DEBG 'memmon' stdout output: READY 2009-10-03 13:34:43,371 DEBG memmon: ACKNOWLEDGED -> READY 2009-10-03 13:34:43,705 DEBG 'zeo' stdout output: ------ 2009-10-03T13:34:43 INFO ZEO.runzeo (49250) created PID file './var/ZEO.pid' 2009-10-03 13:34:43,706 DEBG 'zeo' stdout output: ------ 2009-10-03T13:34:43 INFO ZEO.runzeo (49250) opening storage '1' using BlobStorage 2009-10-03 13:34:43,749 DEBG 'zeo' stdout output: ------ 2009-10-03T13:34:43 INFO ZODB.blob (49250) Blob directory `./var/blobs` has layout marker set. Selected `bushy` layout. 2009-10-03 13:34:43,793 DEBG 'zeo' stdout output: ------ 2009-10-03T13:34:43 INFO ZEO.StorageServer (49250) StorageServer created RW with storages: 1:RW:./var/osi.db 2009-10-03 13:34:44,156 DEBG 'zeo' stdout output: ------ 2009-10-03T13:34:44 INFO ZEO.zrpc (49250) listening on ('', 8884) 2009-10-03 13:34:45,279 DEBG 'zeo' stdout output: ------ 2009-10-03T13:34:45 INFO ZEO.StorageServer (49250) new connection ('127.0.0.1', 59853): <ManagedServerConnection ('127.0.0.1', 59853)> 2009-10-03 13:34:45,280 DEBG 'zeo' stdout output: ------ 2009-10-03T13:34:45 INFO ZEO.zrpc.Connection(S) (127.0.0.1:59853) received handshake 'Z308' 2009-10-03 13:34:45,414 DEBG 'zeo' stdout output: ------ 2009-10-03T13:34:45 INFO ZEO.StorageServer (49250) new connection ('127.0.0.1', 59854): <ManagedServerConnection ('127.0.0.1', 59854)> 2009-10-03 13:34:45,415 DEBG 'zeo' stdout output: ------ 2009-10-03T13:34:45 INFO ZEO.zrpc.Connection(S) (127.0.0.1:59854) received handshake 'Z308' 2009-10-03 13:35:01,422 DEBG event 0 sent to listener memmon 2009-10-03 13:35:01,423 DEBG 'memmon' stderr output: Checking any=20971520 2009-10-03 13:35:01,441 DEBG 'memmon' stderr output: RSS of memmon:memmon is 5775360 2009-10-03 13:35:01,447 DEBG 'memmon' stderr output: RSS of osi:osi is 28737536 Restarting osi:osi 2009-10-03 13:35:01,448 DEBG fd 10 closed, stopped monitoring <POutputDispatcher at 7294408 for <Subprocess at 7054920 with name osi in state RUNNING> (stdout)> 2009-10-03 13:35:01,449 DEBG killing osi (pid 49249) with signal SIGTERM 2009-10-03 13:35:01,456 INFO stopped: osi (terminated by SIGTERM) 2009-10-03 13:35:01,457 DEBG received SIGCHLD indicating a child quit 2009-10-03 13:35:01,457 DEBG 'zeo' stdout output: ------ 2009-10-03T13:35:01 INFO ZEO.StorageServer (49250/127.0.0.1:59854) disconnected 2009-10-03 13:35:01,457 DEBG 'zeo' stdout output: ------ 2009-10-03T13:35:01 INFO ZEO.StorageServer (49250/127.0.0.1:59853) disconnected 2009-10-03 13:35:02,465 INFO spawned: 'osi' with pid 49256 2009-10-03 13:35:03,468 INFO success: osi entered RUNNING state, process has stayed up for > than 1 seconds (startsecs) 2009-10-03 13:35:03,477 DEBG 'memmon' stdout output: RESULT 2 OKREADY 2009-10-03 13:35:03,477 DEBG memmon: BUSY -> ACKNOWLEDGED (processed) 2009-10-03 13:35:03,477 DEBG memmon: ACKNOWLEDGED -> READY 2009-10-03 13:35:03,478 DEBG 'memmon' stderr output: RSS of zeo:zeo is 10936320 2009-10-03 13:35:05,143 DEBG 'zeo' stdout output: ------ 2009-10-03T13:35:05 INFO ZEO.StorageServer (49250) new connection ('127.0.0.1', 59856): <ManagedServerConnection ('127.0.0.1', 59856)> 2009-10-03 13:35:05,144 DEBG 'zeo' stdout output: ------ 2009-10-03T13:35:05 INFO ZEO.zrpc.Connection(S) (127.0.0.1:59856) received handshake 'Z308' 2009-10-03 13:35:05,276 DEBG 'zeo' stdout output: ------ 2009-10-03T13:35:05 INFO ZEO.StorageServer (49250) new connection ('127.0.0.1', 59857): <ManagedServerConnection ('127.0.0.1', 59857)> 2009-10-03 13:35:05,277 DEBG 'zeo' stdout output: ------ 2009-10-03T13:35:05 INFO ZEO.zrpc.Connection(S) (127.0.0.1:59857) received handshake 'Z308' Chris Shenton wrote: > I've got supervisor with a memmon eventlistener from a buildout: > >> [supervisor] >> recipe = collective.recipe.supervisor >> user=nfw >> password=nfw >> eggs = supervisor >> programs = >> 10 zeo ${zeo:location}/bin/runzeo ${zeo:location} >> 20 instance ${instance:location}/bin/runzope ${instance:location} >> true >> plugins = >> superlance >> eventlisteners = >> MemmonAll TICK_60 ${buildout:bin-directory}/memmon [-a 100MB] > > It generates a plausible parts/supervisor/supervisord.conf: > >> ... >> [program:instance] >> command = /usr/local/cshenton/Projects/cjp/svn-buildout/parts/ >> instance/bin/runzope >> process_name = instance >> directory = /usr/local/cshenton/Projects/cjp/svn-buildout/parts/ >> instance >> priority = 20 >> redirect_stderr = true >> >> [eventlistener:MemmonAll] >> command = /usr/local/cshenton/Projects/cjp/svn-buildout/bin/memmon - >> a 100MB >> events = TICK_60 >> process_name=MemmonAll >> environment >> = >> SUPERVISOR_USERNAME >> =nfw,SUPERVISOR_PASSWORD=nfw,SUPERVISOR_SERVER_URL=http://localhost:9001 > > I start up supervisor, and the plone instance stabilizes at 97MB, > then after connecting to plone with a browser, jumps to 113MB: > >> 2009-10-03 10:30:34,798 INFO spawned: 'MemmonAll' with pid 11608 >> 2009-10-03 10:30:34,816 INFO spawned: 'zeo' with pid 11609 >> 2009-10-03 10:30:34,826 INFO spawned: 'instance' with pid 11610 >> ch...@mackeral:~$ ps -orss 11610 >> RSS >> 97444 >> ch...@mackeral:~$ ps -orss 11610 >> RSS >> 113512 > > But supervisor never restarts the instance. I don't see any info in > the logs that its checking all processes or anything else. > > I'm not sure how to debug it with pdb since supervisor spawns memmon. > Sticking print and logging statements into superlance/memmon.py > indicates to me that it never comes back from this wait call: > >> def runforever(self, test=False): >> while 1: >> # we explicitly use self.stdin, self.stdout, and >> self.stderr >> # instead of sys.* so we can unit test this code >> headers, payload = childutils.listener.wait(self.stdin, >> self.stdout) > > Any suggestions for debugging this? Thanks. > > > > _______________________________________________ > Supervisor-users mailing list > [email protected] > http://lists.supervisord.org/mailman/listinfo/supervisor-users > _______________________________________________ Supervisor-users mailing list [email protected] http://lists.supervisord.org/mailman/listinfo/supervisor-users
