RE: [Zope-dev] Possible Windows Service improvements.
Resurrecting a bit of an old thread:

From: Chris McDonough [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 4 August 2004 11:19 PM
Subject: Re: [Zope-dev] Possible Windows Service improvements.

...

[Chris] I'm a Windows signal idiot.

I was too. I think I understand them a little better now, after having played with both the signal module and the win32-specific functions.

[Chris] Is there a way that we can make the Zope process capture Windows signals and when the Windows equivalent of SIGTERM is sent to the process to shut it down cleanly? This is how it works on UNIX, but we circumvent trying to listen for signals on Windows entirely at startup. There are all sorts of hooks for clean shutdown now that we can co-opt if we can make the process capture a signal.

I've uploaded a patch to http://collector.zope.org/Zope/1527. I'd appreciate any comments - specifically about whether I have hooked the appropriate place. My first reaction was that the correct hooks were in SignalHandler.py and Signals.py - however, Windows signals really aren't suitable for hooking there - only SIGINT is supported. Trying to twist the code into pretending that signals on Windows worked like they do on Linux ended up as a bit of a mess. Hacking Lifetime.py was the cleanest solution.

[Chris] Note that the UNIX environment has a lot of additional niceties due to responses to signals (like logfile rotation) that Windows doesn't now, which tends to have the effect of relegating Windows to a second-class platform on which to run a production Zope instance.

I guess the correct way to do that gets back to the other issue this thread raised - cross-platform startup/error reporting and command handling. I fear that will take a little longer to implement. I hope to break this up into 2 tasks:

* Give Windows reliable shutdown behaviour now.
* Try to develop a basis for reliable cross-platform parent/child notification and control.
I think the first would allow us to gracefully shut down services - at the moment the child process is immediately terminated! The second would give us better startup and error recovery, but that seems less important to me. I hope to submit a patch for the service code shortly.

Mark

___
Zope-Dev maillist - [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
** No cross posts or HTML encoding! **
(Related lists -
http://mail.zope.org/mailman/listinfo/zope-announce
http://mail.zope.org/mailman/listinfo/zope )
RE: [Zope-dev] Possible Windows Service improvements.
Would a lockfile work OK to signify running state? IOW, Zope would lock a file as one of the last steps during startup (which it actually does now, but might do it a bit too early). The Windows service manager would attempt to lock the same file after timeout seconds (although I'm not sure where to get timeout from) on initial startup. If the file doesn't exist or the lock succeeds, the process is nonrestartable. If it exists and cannot be locked (Zope has locked it, signifying successful startup), the process is restartable.

This lets the parent know of successful startup, but doesn't help it during the 'starting' phase (if the lock never gets set, how long do we wait before giving up?) Ditto for shutdown - after requesting that the child shut down, how long do we wait for it to terminate? A single timeout leaves us where we started - it is too fragile to assume that if the app has not started after x seconds, it never will. By having the child actively report progress, the parent process can be more confident it is alive and normal, even if it is taking a very long time to start.

Wouldn't a socket work? As the parent process is starting the child, it could pass the port on the cmdline.

I think by *describing* the API, I made it sound more complicated than it is. It consists of 2 functions:

    def reportState(state, timeout, error_info=None):
        """Called by the child process, and implemented by the parent.

        Reports the status of the child process to the parent. state may be
        'starting', 'running', 'stopping', 'stopped', 'pausing' or 'paused'.
        timeout is how long before the parent will consider us dead if we
        don't report progress again. error_info may be specified as the
        program reports 'stopped'. If None, the shutdown was normal.
        """

    # This second one is optional, but abstracts away using signals :)
    def changeState(new_state):
        """Called by the parent process, and implemented by the child.

        Requests that the child process enter the specified state. Only
        'running', 'stopped' and 'paused' are valid (ie, the parent must
        never request any of the transition states).
        """

If we ignore the second and stick to signals, it seems fairly robust and achievable.

The other option is that we get rid of the 'parent' service altogether. Just have the service code use run.py directly, and run it in-process. The Windows built-in 'auto-restart' facilities for services can handle the bizarre errors, and offer the benefit of allowing the user to define the restart policy (even allowing them to reboot should it go down <wink>). There may be good reasons for that though.

Mark.
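To make the parent's half of reportState concrete, here is a minimal sketch of how a parent might record the child's reports and notice a missed deadline. The class name ChildMonitor, the method names and the injectable clock are all hypothetical, and enforcing the timeout only while the child is in a transition state is an assumption of this sketch rather than something the message specifies.

```python
import time

# State names are taken from the reportState description above.
TRANSITION_STATES = {"starting", "stopping", "pausing"}
STABLE_STATES = {"running", "stopped", "paused"}


class ChildMonitor:
    """Parent-side record of the child's last report (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock        # injectable so the logic can be tested
        self.state = None
        self.error_info = None
        self._deadline = None

    def report_state(self, state, timeout, error_info=None):
        """Record a report from the child and restart its deadline."""
        if state not in TRANSITION_STATES | STABLE_STATES:
            raise ValueError("unknown state: %r" % (state,))
        self.state = state
        self.error_info = error_info
        self._deadline = self._clock() + timeout

    def is_overdue(self):
        """True if the child is mid-transition and missed its own deadline.

        Assumption: stable states are never considered overdue.
        """
        if self.state not in TRANSITION_STATES:
            return False
        return self._clock() > self._deadline
```

A parent loop would call is_overdue() periodically and treat a True result as a hung child. How the report travels from child to parent (socket, pipe, or something else) is deliberately left open here.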
RE: [Zope-dev] Possible Windows Service improvements.
Would a lockfile work OK to signify running state? IOW, Zope would lock a file as one of the last steps during startup (which it actually does now, but might do it a bit too early). The Windows service manager would attempt to lock the same file after timeout seconds (although I'm not sure where to get timeout from) on initial startup. If the file doesn't exist or the lock succeeds, the process is nonrestartable. If it exists and cannot be locked (Zope has locked it, signifying successful startup), the process is restartable.

On Sun, 2004-08-08 at 21:10, Mark Hammond wrote:

[Me] By adding a layer around run.py, I believe we could arrange for these fatal errors to be handled with a special return code.

[Jim] I assume by fatal, you mean errors that we should not try to restart from.

Correct.

[Jim] Let me see if I understand the use cases here:
- Normal shutdown. (Should it be possible to shut down Zope through the web on Windows?)

I see no reason it makes less sense for Windows than it does for anywhere else <wink>.

[Jim]
- Start-up error. We want to log relevant information somewhere. We don't want to restart.
- Run-time (after startup) error. We also want to log a problem, but we do want to restart Zope.

Yep, I think that covers it.

[Jim] Note that we also need to consider uncontrolled exits, like segfaults.

Yes - if the segfault is at startup, it should be considered non-restartable. Once normal operations have started, a segfault should cause a restart.

[Jim] Perhaps there should be a framework with calls that a program can make to indicate normal exit, fatal (non-restartable) exit, and non-fatal (restartable) exit.

That could be done with process exit codes if all the child needs to do is report *exit* status - but that really doesn't cover enough bases. Given the number of ways programs can fail, it may be hard to guarantee, and doesn't handle uncontrolled exits or children going zombie.

What we need is something more authoritative - where the child process actively signals its state to the parent - ie, starting, running or stopping. pausing and paused may also make sense. If the child never reported 'running', it is non-restartable. If the child terminates without reporting graceful shutdown, it is restartable.

This still does not provide any way of handling the case when the child process is running, but failing to transition between states. We still need a timeout, but can make it more robust by having the child process report the status *and* the timeout the parent should use. Which, coincidentally, sounds exactly like the Windows Services API <wink>.

Is that sounding reasonable, or moving into too-complicated/YAGNI territory?

Thanks,

Mark.
RE: [Zope-dev] Possible Windows Service improvements.
[Me] By adding a layer around run.py, I believe we could arrange for these fatal errors to be handled with a special return code.

[Jim] I assume by fatal, you mean errors that we should not try to restart from.

Correct.

[Jim] Let me see if I understand the use cases here:
- Normal shutdown. (Should it be possible to shut down Zope through the web on Windows?)

I see no reason it makes less sense for Windows than it does for anywhere else <wink>.

[Jim]
- Start-up error. We want to log relevant information somewhere. We don't want to restart.
- Run-time (after startup) error. We also want to log a problem, but we do want to restart Zope.

Yep, I think that covers it.

[Jim] Note that we also need to consider uncontrolled exits, like segfaults.

Yes - if the segfault is at startup, it should be considered non-restartable. Once normal operations have started, a segfault should cause a restart.

[Jim] Perhaps there should be a framework with calls that a program can make to indicate normal exit, fatal (non-restartable) exit, and non-fatal (restartable) exit.

That could be done with process exit codes if all the child needs to do is report *exit* status - but that really doesn't cover enough bases. Given the number of ways programs can fail, it may be hard to guarantee, and doesn't handle uncontrolled exits or children going zombie.

What we need is something more authoritative - where the child process actively signals its state to the parent - ie, starting, running or stopping. pausing and paused may also make sense. If the child never reported 'running', it is non-restartable. If the child terminates without reporting graceful shutdown, it is restartable.

This still does not provide any way of handling the case when the child process is running, but failing to transition between states. We still need a timeout, but can make it more robust by having the child process report the status *and* the timeout the parent should use. Which, coincidentally, sounds exactly like the Windows Services API <wink>.

Is that sounding reasonable, or moving into too-complicated/YAGNI territory?

Thanks,

Mark.
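The two restart rules stated in this message are small enough to write down directly. The sketch below is only an illustration: the function name should_restart and the idea of passing the history as a list are hypothetical, and treating a final 'stopped' report as "do not restart" is an assumption, since the message only says which exits are restartable.

```python
def should_restart(reported_states):
    """Decide whether a child that has exited should be restarted.

    reported_states is the ordered list of states the child reported
    before it exited (for example ["starting", "running"]).
    """
    if "running" not in reported_states:
        # The child never reported 'running': a start-up failure,
        # which the message treats as non-restartable.
        return False
    if reported_states[-1] == "stopped":
        # Assumption: a reported graceful shutdown means stay down.
        return False
    # The child was running and vanished without reporting a shutdown
    # (a segfault, for instance): restartable.
    return True
```

Note that this deliberately says nothing about a child that is still alive but stuck between states; that case needs the reported timeout described above.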
Re: [Zope-dev] Possible Windows Service improvements.
Some quick hit-and-runs.

[Mark Hammond] FWIW, the way to do that better was to use WaitForSingleObject(hStopEvent, timeout_period*1000).

[Chris McDonough] Is this a more-or-less direct replacement for the usage of time.sleep() in the current incarnation of SvcDoRun?

The Windows WaitFor{Single,Multiple}Object calls are mondo cool. Think select() under Unix, but generalized to all sorts of things. The call above waits for hStopEvent to be signaled, but for no longer than the timeout period. The return value tells you whether you timed out or got the signal you were waiting for. Ironically, while the WaitFor... calls work with events, mutexes, processes, etc, etc, they *don't* work with sockets. Heh.

...

[Chris McDonough] It's actually impossible to *stop* Zope from using pieces of another system-installed Python, given that it apparently looks unconditionally to the registry to find some things. (See the thread revolving around http://mail.zope.org/pipermail/zope-dev/2004-March/021979.html for the details). This is actually not a desirable feature.

Mark knows about that, and it's fixed in current win32all. But current win32all builds are shipped as distutils-produced Windows installers, not as Wise installers, so none of our Windows-buildout scripts know what to do with a current release. Given what little (albeit crucial) use Zope makes of win32all, it might be wise for Zope to repackage a much smaller part of it.

...

[Chris McDonough] FWIW, Tim (most recently), I (second-most-recently), and Brian Lloyd (least recently) are really the only people who have put a concerted effort into keeping Zope running and installable acceptably under Windows.

Actually not! I've never built the Zope Windows installer, and the process in fact never ran to completion on Win98SE. I do build the Zope Z4I and ZRS Windows installers, and they've gotten a lot more recent attention. Z4I requires that Zope was already installed, and ZRS is independent of Zope. They're quite different from the Zope Windows installers in some respects (for example, they ship files with Windows line endings <wink>), but I haven't had time to fold those improvements back into the Zope installer.

...

[Chris McDonough] Tim has an excuse (he still works for ZC, which apparently does have customers that use Windows as a server platform and he uses Windows himself), but I think everyone would agree that he really should be working on more important things.

Rob doesn't agree, when a paying customer is waiting for a Windows ZRS installer. I don't blame him. Despite that I really, really want to <wink>.

[Chris McDonough] Brian has long ago disavowed any knowledge of making Zope run under Windows. So in other words, it's probably a very good thing you're here! ;-)

I've known Mark for a decade, and it's generally a very good thing when he's anywhere near. The trick for us (and it's a very small trick) is to convince him that the Windows Zope user experience is poor. He won't be able to refrain from fixing it then, even if it's entirely Windows's fault! You should see what he did with the SpamBayes Outlook client ...
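The difference between time.sleep() and a wait-with-timeout can be shown without Windows at all. The sketch below uses threading.Event from the standard library as a portable stand-in for the stop event; it is not the service code, and the names backoff_wait and run_with_backoff are hypothetical. In the real service the equivalent call would be the WaitForSingleObject(hStopEvent, timeout_period*1000) quoted above.

```python
import threading


def backoff_wait(stop_event, timeout_seconds):
    """Wait out a backoff interval, but return early if a stop is requested.

    Returns True if the stop event was set, False if the wait timed out.
    Unlike time.sleep(), this never makes a stop request wait for the
    whole interval to elapse.
    """
    return stop_event.wait(timeout_seconds)


def run_with_backoff(start_child, stop_event, delays):
    """Retry start_child() with the given delays between attempts.

    start_child is a callable returning True on success (hypothetical).
    """
    for delay in delays:
        if start_child():
            return "started"
        if backoff_wait(stop_event, delay):
            return "stopped"
    return "gave up"
```

The design point is the same as in the message: one call both sleeps and listens, and its return value says which of the two things happened.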
RE: [Zope-dev] Possible Windows Service improvements.
You left out the lack of 'zopectl debug' ;] I have managed to create it though

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Mark Hammond
Sent: Wednesday, August 04, 2004 7:35 AM
To: [EMAIL PROTECTED]
Subject: [Zope-dev] Possible Windows Service improvements.

Hi all,

I am starting to venture into the wonderful world of Zope! With the benefit of a complete lack of Zope experience, I have been able to look at the Windows service support from a fairly clean slate. However, I also realize this lack of experience means my ideas may be naive - hence I have attempted to split them into discrete issues for discrete rejection <wink>.

1) startup error redirection.

I've noticed that the main Zope service driver for Windows seems to work fine when everything is set up correctly, but when things go wrong it offers no clues as to what. This is reflected in collector item 1020 (poor error reporting on product initialisation failure under windows). Issue 1408 (Configuration file imports don't see INSTANCE_HOME when running Zope as a windows service), via the referenced thread, has evidence of someone burning a day due to this. It cost me a lot of time too :)

I propose: Each time the child process terminates with a non-zero return code, the tail x-bytes of the child output be written to the Windows event log, where x~2k.

2) reporting of successful start and backoff strategy.

A trivial startup error (eg, PYTHONPATH not set) will cause the Zope service to hopelessly retry for a number of minutes, and not respond to shutdown requests during a retry.

At the moment, as soon as the service starts it reports successful startup to Windows. It then begins an attempt to start the child. If the child immediately fails, the code immediately begins the backoff strategy. This strategy appears to have 2 main purposes:

* Startup may fail due to other 'services' not having yet started, so retry in the hope they become available.
* The process may die due to some obscure error - restart it.

On Windows, assuming we install the service to depend on the tcpip service, I see no reason that the first reason is valid. If the process fails quickly the first time we attempt to launch it, it is almost certainly going to fail every time we try to launch it.

The current strategy also means that 3rd party services could not themselves depend on the Zope service - the Zope service will report successful startup before it really has (and therefore the dependent service may itself fail). This isn't a known requirement today, but who knows! net start and other front ends also fail to detect fatal errors - they all say Zope started OK.

I propose: We insist the child process can be created and continues to run for x seconds (where x~5). If that fails, we report an error (never reporting to Windows that we started successfully). If the child process stays alive for this period, we report success to Windows, and then use the existing backoff strategy should it die. If the machine is heavily loaded, this 5 seconds may expire before the fatal error is hit in the child - in that case, we are simply doing what we do now - using the backoff strategy to hopelessly attempt a restart - ie, a win in most cases, and no loss in the others.

3) environment setting

The service process should set a number of environment variables before spawning the child - PYTHONPATH at a minimum, and according to issue #1408, INSTANCE_HOME. It already knows these values thanks to mkzopeinstance. I'm yet to determine where these values come from in the binary build, but I see no reason not to fix this (and possibly remove whatever magic the binary does).

I propose: A few trivial os.environ insertions based on the substitutions done by mkzopeinstance, before we create the child process(es). (Alternatively, we create an explicit new environment that we pass to CreateProcess, but I see no good reason for that.)

4) Currently, when the process is stopped, we immediately terminate the child process. This seems dangerous. We should find a way to gracefully terminate the child, and try that before we simply kill it.

I propose: That someone help me work out how to do this <wink>. I've already worked out how if the service knows the username/password of a Zope administrator, but it doesn't! Sending a Ctrl_C 'signal' doesn't work without hacks to run.py (and I'm yet to confirm it will even with such hacks).

I welcome any feedback on these issues. Obviously I am willing to back each of these proposals up (except 4!) with code that seems to work :) I would also welcome feedback on the best way to proceed (ie, create a new collector for each issue? thrash it out here? give up? <wink>, etc)

Note that none of these issues would require a win32all/pywin32 update. If anyone was really upset by issue 1423 (Zope 2.7.1 won't run as service under NT), and also able to test, I'd be willing to fix it - but that *would* require a pywin32
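As an illustration of proposal 1 in the quoted message, the sketch below runs a child process, merges its stdout and stderr into one pipe, and keeps only the last couple of kilobytes in memory. It is a portable sketch using the standard subprocess module, not the nt_svcutils code; the function name run_and_capture_tail is hypothetical, and the step of writing the tail to the Windows event log is intentionally left out.

```python
import subprocess
import sys


def run_and_capture_tail(args, tail_bytes=2048):
    """Run a child and return (exit_code, last tail_bytes of its output).

    stdout and stderr share one pipe, so the tail reflects the order in
    which the child actually wrote its output.
    """
    tail = b""
    with subprocess.Popen(args, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT) as proc:
        # Read until EOF, discarding everything except the newest bytes.
        for chunk in iter(lambda: proc.stdout.read(4096), b""):
            tail = (tail + chunk)[-tail_bytes:]
    # Leaving the with-block waits for the child, so returncode is set.
    return proc.returncode, tail


if __name__ == "__main__":
    code, tail = run_and_capture_tail(
        [sys.executable, "-c", "import sys; sys.exit('startup failed')"])
    if code != 0:
        # A service would log this tail rather than print it.
        print(code, tail.decode(errors="replace"))
```

Keeping only the tail bounds memory use no matter how chatty the child is, which matters for a process that may run for weeks before it fails.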
Re: [Zope-dev] Possible Windows Service improvements.
Mark Hammond wrote:
...

[Mark] Thanks Jim. I agree with those concerns. Note that we already fail to recognise startup failure and keep retrying even though we shouldn't, which is exactly what I am trying to solve.

Yup

[Me] Is there some sort of IPC that can be used here? On Unix, we use a reserved exit code to indicate a startup failure and arrange for Zope to exit with that.

[Mark] The Unix situation is a little different - if runzope itself has immediate startup failure, then everything immediately fails - ie, as far as Unix is concerned, the service itself failed. On Windows however, runzope is executed as a *child* of the service. If the child fatally fails, the service itself is still reporting success, and hopelessly attempting a restart. The service needs to know if the child fatally failed. This doesn't apply on Unix as runzope *is* the 'service', not the child of the service. By adding a layer around run.py, I believe we could arrange for these fatal errors to be handled with a special return code.

I assume by fatal, you mean errors that we should not try to restart from. Let me see if I understand the use cases here:

- Normal shutdown. (Should it be possible to shut down Zope through the web on Windows?)
- Start-up error. We want to log relevant information somewhere. We don't want to restart.
- Run-time (after startup) error. We also want to log a problem, but we do want to restart Zope.

Note that we also need to consider uncontrolled exits, like segfaults.

[Mark] Alternatively, if Zope itself never returns an error code of 1 (one), then we could use that - Python itself returns this for unhandled exceptions. That seems dangerous though. Can you offer some advice here?

Probably not at the level you are asking. I'm not familiar with any of the details here and can't make the time to be. I just know we need an explicit mechanism, rather than trying to guess based on timing.

[Mark] * Is an exit-code of 1 suitable for a fatal error? If so, this requires no changes to the child process. However, I assume it is not suitable.

Why not use an exit code of 0? I think that Zope could arrange to exit with an exit code of 0 if there is a startup error.

[Mark] * Is a special exit code, generated by a wrapper to run.py (eg, run_svc.py) suitable? If so, what value do you recommend?

Sorry, I don't understand these scripts well enough to comment.

[Mark] * Should some Win32 specific, robust IPC mechanism be investigated? This would cut deeper into a run.py wrapper, and obviously is not a general solution.

I think there is a general solution lurking here. But I'm too ignorant of the details to be more specific. :(

[Me] On another note, I'd really prefer to work out a general facility that can be used with any Python program, including both Zope 2 and Zope 3.

[Mark] The problem at the moment is that our facilities are *too* general - ie, without some coordination between the parent and child, the parent must guess.

Right. We need something at a lower level. It would be OK with me if this something had an API that the program being run had to use.

[Mark] The simplest coordination does seem to be process exit code, but that seems fragile.

Yup.

[Mark] But whatever coordination is chosen, any Python program that was willing to play the coordination game could use it. The simpler this game is to play, the more fragile the system is (ie, just using exit codes is simple, but fragile; using other IPC mechanisms could be made robust, but is not simple - especially not in a platform independent way.)

Right. They may not need to be platform independent though, if there is some sort of API in the middle.

Perhaps there should be a framework with calls that a program can make to indicate normal exit, fatal (non-restartable) exit, and non-fatal (restartable) exit. A long-running-process manager could provide some way of handling these events as well as handling exits without an application having signalled one of these events.

Jim

--
Jim Fulton           mailto:[EMAIL PROTECTED]       Python Powered!
CTO                  (540) 361-1714                 http://www.python.org
Zope Corporation     http://www.zope.com            http://www.zope.org
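The framework suggested at the end of this message could be sketched roughly as follows. Everything here is hypothetical (the ExitKind enum, the next_action function, the "restart"/"stop" return values); it only shows how a process manager might map the three signalled events, plus the unsignalled case, onto a restart decision. What to do about an exit with no signal at all is a policy choice the thread leaves open, so it is a parameter.

```python
import enum


class ExitKind(enum.Enum):
    NORMAL = "normal"            # a clean shutdown was requested
    FATAL = "fatal"              # e.g. a start-up error: do not restart
    RESTARTABLE = "restartable"  # e.g. a run-time error: restart


def next_action(signalled, restart_if_unsignalled=True):
    """Return "restart" or "stop" for a child that has just exited.

    signalled is the ExitKind the program reported before exiting, or
    None if it exited without reporting anything (a segfault, say).
    """
    if signalled is None:
        return "restart" if restart_if_unsignalled else "stop"
    if signalled is ExitKind.RESTARTABLE:
        return "restart"
    return "stop"
```

Because the program states its intent explicitly, the manager never has to infer anything from how long the child happened to stay alive.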
Re: [Zope-dev] Possible Windows Service improvements.
On Wed, 2004-08-04 at 07:35, Mark Hammond wrote:

[Mark] Hi all, I am starting to venture into the wonderful world of Zope! With the benefit of a complete lack of Zope experience, I have been able to look at the Windows service support from a fairly clean slate. However, I also realize this lack of experience means my ideas may be naive - hence I have attempted to split them into discrete issues for discrete rejection <wink>.

1) startup error redirection.

I've noticed that the main Zope service driver for Windows seems to work fine when everything is set up correctly, but when things go wrong it offers no clues as to what. This is reflected in collector item 1020 (poor error reporting on product initialisation failure under windows). Issue 1408 (Configuration file imports don't see INSTANCE_HOME when running Zope as a windows service), via the referenced thread, has evidence of someone burning a day due to this. It cost me a lot of time too :)

Yes, sorry about that! (I was the fool who checked in the NameError referenced in 1408).

[Mark] I propose: Each time the child process terminates with a non-zero return code, the tail x-bytes of the child output be written to the Windows event log, where x~2k.

This is a good idea. FWIW, I believe the Zope HEAD already has some work done towards this (in lib/python/nt_svcutils/service.py), although the child output goes to a logfile instead of the event log. It would be nice to make the output go to the event log and then backport this to 2.7.

[Mark] 2) reporting of successful start and backoff strategy.

A trivial startup error (eg, PYTHONPATH not set) will cause the Zope service to hopelessly retry for a number of minutes, and not respond to shutdown requests during a retry.

Yup. The reason it retry-restarts is because it's simple and stupid, and the reason it doesn't respond to shutdown requests during a retry is because the service code sleeps for the backoff interval after an unsuccessful startup. Any async requests that happen in the meantime are blocked waiting for this sleep to end. I'm not quite sure how to do that better.

[Mark] At the moment, as soon as the service starts it reports successful startup to Windows. It then begins an attempt to start the child. If the child immediately fails, the code immediately begins the backoff strategy. This strategy appears to have 2 main purposes:

* Startup may fail due to other 'services' not having yet started, so retry in the hope they become available.
* The process may die due to some obscure error - restart it.

One concrete example of the obscure error is that the Zope process handles a restart request from its Control Panel web interface.

[Mark] On Windows, assuming we install the service to depend on the tcpip service, I see no reason that the first reason is valid. If the process fails quickly the first time we attempt to launch it, it is almost certainly going to fail every time we try to launch it.

The current strategy also means that 3rd party services could not themselves depend on the Zope service - the Zope service will report successful startup before it really has (and therefore the dependent service may itself fail). This isn't a known requirement today, but who knows! net start and other front ends also fail to detect fatal errors - they all say Zope started OK.

I propose: We insist the child process can be created and continues to run for x seconds (where x~5). If that fails, we report an error (never reporting to Windows that we started successfully). If the child process stays alive for this period, we report success to Windows, and then use the existing backoff strategy should it die. If the machine is heavily loaded, this 5 seconds may expire before the fatal error is hit in the child - in that case, we are simply doing what we do now - using the backoff strategy to hopelessly attempt a restart - ie, a win in most cases, and no loss in the others.

That sounds good.

[Mark] 3) environment setting

The service process should set a number of environment variables before spawning the child - PYTHONPATH at a minimum, and according to issue #1408, INSTANCE_HOME. It already knows these values thanks to mkzopeinstance. I'm yet to determine where these values come from in the binary build, but I see no reason not to fix this (and possibly remove whatever magic the binary does).

I propose: A few trivial os.environ insertions based on the substitutions done by mkzopeinstance, before we create the child process(es). (Alternatively, we create an explicit new environment that we pass to CreateProcess, but I see no good reason for that.)

Note that the Zope Python install also has a sitecustomize.py that munges sys.path in order to get things set up properly. Others have claimed this is unnecessary and that the work that gets done in there could be done in the service code. It's a bit of a mess. At one point I flailed trying to make the child process inherit its
Re: [Zope-dev] Possible Windows Service improvements.
Mark Hammond wrote:
...

[Mark] I propose: We insist the child process can be created and continues to run for x seconds (where x~5).

I recommend against this strategy. It's too implicit and has bitten us on Unix in the past. The problem with the strategy is that if the program starts very fast and then dies, we don't restart it even though we should. Similarly, if the program starts very slowly, we don't recognize a startup failure and keep retrying even though we shouldn't.

Is there some sort of IPC that can be used here? On Unix, we use a reserved exit code to indicate a startup failure and arrange for Zope to exit with that.

On another note, I'd really prefer to work out a general facility that can be used with any Python program, including both Zope 2 and Zope 3.

Jim

--
Jim Fulton           mailto:[EMAIL PROTECTED]       Python Powered!
CTO                  (540) 361-1714                 http://www.python.org
Zope Corporation     http://www.zope.com            http://www.zope.org
RE: [Zope-dev] Possible Windows Service improvements.
[Me] I propose: We insist the child process can be created and continues to run for x seconds (where x~5).

[Jim] I recommend against this strategy. It's too implicit and has bitten us on Unix in the past. The problem with the strategy is that if the program starts very fast and then dies, we don't restart it even though we should. Similarly, if the program starts very slowly, we don't recognize a startup failure and keep retrying even though we shouldn't.

Thanks Jim. I agree with those concerns. Note that we already fail to recognise startup failure and keep retrying even though we shouldn't, which is exactly what I am trying to solve.

[Jim] Is there some sort of IPC that can be used here? On Unix, we use a reserved exit code to indicate a startup failure and arrange for Zope to exit with that.

The Unix situation is a little different - if runzope itself has immediate startup failure, then everything immediately fails - ie, as far as Unix is concerned, the service itself failed. On Windows however, runzope is executed as a *child* of the service. If the child fatally fails, the service itself is still reporting success, and hopelessly attempting a restart. The service needs to know if the child fatally failed. This doesn't apply on Unix as runzope *is* the 'service', not the child of the service.

By adding a layer around run.py, I believe we could arrange for these fatal errors to be handled with a special return code. Alternatively, if Zope itself never returns an error code of 1 (one), then we could use that - Python itself returns this for unhandled exceptions. That seems dangerous though.

Can you offer some advice here?

* Is an exit-code of 1 suitable for a fatal error? If so, this requires no changes to the child process. However, I assume it is not suitable.
* Is a special exit code, generated by a wrapper to run.py (eg, run_svc.py) suitable? If so, what value do you recommend?
* Should some Win32 specific, robust IPC mechanism be investigated? This would cut deeper into a run.py wrapper, and obviously is not a general solution.

[Jim] On another note, I'd really prefer to work out a general facility that can be used with any Python program, including both Zope 2 and Zope 3.

The problem at the moment is that our facilities are *too* general - ie, without some coordination between the parent and child, the parent must guess. The simplest coordination does seem to be process exit code, but that seems fragile. But whatever coordination is chosen, any Python program that was willing to play the coordination game could use it. The simpler this game is to play, the more fragile the system is (ie, just using exit codes is simple, but fragile; using other IPC mechanisms could be made robust, but is not simple - especially not in a platform independent way.)

Mark.
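For the second option in the list above (a wrapper such as run_svc.py that turns start-up failures into a special exit code), a minimal sketch might look like this. It is not the actual wrapper: run_wrapped, startup and serve are hypothetical names, and the value 64 is an arbitrary placeholder, since the thread never settles on a value. It is chosen only to avoid 0 and to avoid 1, which Python itself uses for unhandled exceptions.

```python
import traceback

# Hypothetical reserved value meaning "start-up failed, do not restart".
STARTUP_FAILED = 64


def run_wrapped(startup, serve):
    """Run startup() then serve(), returning a process exit code.

    Any exception raised during start-up is reported and mapped to the
    reserved code. Exceptions raised later, inside serve(), are left to
    propagate, so Python exits with its usual code 1 and the parent can
    treat that as a restartable run-time failure.
    """
    try:
        startup()
    except Exception:
        traceback.print_exc()
        return STARTUP_FAILED
    serve()
    return 0
```

A wrapper script would finish with sys.exit(run_wrapped(...)). As the message notes, this is simple but fragile: it cannot describe a child that hangs, and a crash in native code never reaches the except clause.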
RE: [Zope-dev] Possible Windows Service improvements.
I propose: Each time the child process terminates with a non-zero return code, the tail x-bytes of the child output be written to the Windows event log, where x~2k.

This is a good idea. FWIW, I believe the Zope HEAD already has some work done towards this (in lib/python/nt_svcutils/service.py), although the child output goes to a logfile instead of the event log. It would be nice to make the output go to the event log and then backport this to 2.7.

I started from HEAD, and indeed did find a good attempt at making this work. However, it was disabled by default, it wrote to a file, and had issues with blocking reads. I fixed this to capture the output in memory (but not all of it - just the tail), and to use a single pipe for both stdout and stderr.

2) reporting of successful start and backoff strategy. A trivial startup error (eg, PYTHONPATH not set) will cause the Zope service to hopelessly retry for a number of minutes, and not respond to shutdown requests during a retry.

Yup. The reason it retry-restarts is because it's simple and stupid and the reason it doesn't respond to shutdown requests during a retry is because the service code sleeps for the backoff interval after an unsuccessful startup. Any async requests that happen in the meantime are blocked waiting for this sleep to end. I'm not quite sure how to do that better.

FWIW, the way to do that better was to use WaitForSingleObject(hStopEvent, timeout_period*1000).

Note that the Zope Python install also has a sitecustomize.py that munges sys.path in order to get things set up properly.

Right! That is the magic I was referring to and had not yet found.

Others have claimed this is unnecessary and that the work that gets done in there could be done in the service code. It's a bit of a mess.

I agree with the others :) mkzopeinstance goes to lengths to provide all relevant information, and the service code does not take full advantage of that. It should be possible for Zope to work with a standard, external Python/pywin32 installation. I'm not suggesting this become a distribution option, but still a worthy goal; for developers, and to keep us honest <wink>. Note that with a few tweaks, Zope *does* build and work with an external Python/pywin32.

At one point I flailed trying to make the child process inherit its environment from the parent, and plastered over the problem with various sys.path and PYTHONPATH and other environment variable settings. The current situation is a result. Some guidance here would be helpful.

The child process *does* inherit the parent service's environment. Hence adding os.environ[] entries in the service does set them for subsequently created children. ie, setting os.environ[PYTHONPATH]=SOFTWARE_HOME in the service main code appears to avoid the sitecustomize.py requirement - the child process *does* see the new PYTHONPATH.

I'm a Windows signal idiot.

No - Windows is a signal idiot :)

Is there a way that we can make the Zope process capture Windows signals and when the Windows equivalent of SIGTERM is sent to the process to shut it down cleanly? This is how it works on UNIX

That makes sense, but:

but we circumvent trying to listen for signals on Windows entirely at startup.

Can you explain the above? Do you mean that on Windows, you take no special signal actions, as demonstrated in WindowsZopeStarter.registerSignals? I'll see what I can come up with though.

Note that the UNIX environment has a lot of additional niceties due to responses to signals (like logfile rotation) that Windows doesn't now, which tends to have the effect of relegating Windows to a second-class platform on which to run a production Zope instance.

Windows certainly has these features available - they are just not always spelt the same as they are on Unix. Sometimes they are even better <wink>. So there seems to be a chicken-and-egg problem - users will always consider it second class until Zope itself starts considering it first class. This is an observation, not a criticism :)

Mark.
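The point about environment inheritance can be checked with a small sketch like the one below. It assumes the child is started with Python's `subprocess` module and no explicit `env` argument, which is not necessarily how the service code creates its child; `child_pythonpath` and its argument are hypothetical names used only for illustration.

```python
import os
import subprocess
import sys


def child_pythonpath(software_home):
    """Set PYTHONPATH in this (parent) process and report what a child sees."""
    # Assumption: software_home stands in for Zope's SOFTWARE_HOME directory.
    os.environ["PYTHONPATH"] = software_home
    # No env= argument is passed, so the child receives a copy of the
    # parent's environment as it is at spawn time.
    result = subprocess.run(
        [sys.executable, "-c",
         "import os; print(os.environ.get('PYTHONPATH'))"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()
```

Because the variable is set in the parent before the child is created, nothing in the child (such as a sitecustomize.py) has to know about it.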
RE: [Zope-dev] Possible Windows Service improvements.
On Wed, 2004-08-04 at 19:56, Mark Hammond wrote:

On Windows however, runzope is executed as a *child* of the service. If the child fatally fails, the service itself is still reporting success, and hopelessly attempting a restart. The service needs to know if the child fatally failed. This doesn't apply on Unix as runzope *is* the 'service', not the child of the service.

FWIW, the equivalent of the service manager code under UNIX is zopectl rather than runzope. zopectl has its own failure detection and backoff algorithm that's a bit more complex than the Windows service code of the same ilk. Actually that's not entirely true: zopectl is a client that attempts to communicate with a separate daemonizing process via a UNIX domain socket. The daemonizing process is really the parent of the Zope process (it just invokes runzope). The majority of code for this is in lib/python/zdaemon/zdrun.py. The mainloop that implements the backoff algorithm is in the runforever method of the Daemonizer class in that file, and the thing that decides not to restart it if it exits with a known error code is in the reportstatus method of the same class. You probably care about none of this, but it's there if you do. ;-)

By adding a layer around run.py, I believe we could arrange for these fatal errors to be handled with a special return code. Alternatively, if Zope itself never returns an error code of 1 (one), then we could use that - Python itself returns this for unhandled exceptions. That seems dangerous though.

When you say "these errors" above, do you mean any unhandled exception? If so (and any nonzero exit code indicated a startup failure), would we really need cooperation from run.py for this? It seems like it could be done entirely inside of SvcDoRun. OTOH, if the real problem is that you can't stop the service from fruitlessly restarting itself in the face of an insoluble error because of the blocking sleep, it seems like you already solved that. Would it be a reasonable strategy to leave the backoff stupidity as-is if you were able to stop the service from flailing via the service manager CP applet and if it didn't report successful startup until the child actually starts successfully?

Can you offer some advice here?

* Is an exit-code of 1 suitable for a fatal error? If so, this requires no changes to the child process. However, I assume it is not suitable.
* Is a special exit code, generated by a wrapper to run.py (eg, run_svc.py) suitable? If so, what value do you recommend?
* Should some Win32 specific, robust IPC mechanism be investigated? This would cut deeper into a run.py wrapper, and obviously is not a general solution.

I guess I'll need to wait for the confusion evidenced by my last question to clear up before I'd be able to venture an answer to that.

On another note, I'd really prefer to work out a general facility that can be used with any Python program, including both Zope 2 and Zope 3.

The problem at the moment is that our facilities are *too* general - ie, without some coordination between the parent and child, the parent must guess. The simplest coordination does seem to be process exit code, but that seems fragile. But whatever coordination is chosen, any Python program that was willing to play the coordination game could use it. The simpler this game is to play, the more fragile the system is (ie, just using exit codes is simple, but fragile; using other IPC mechanisms could be made robust, but is not simple - especially not in a platform independent way.)

Mark.
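To make the restart discussion concrete, here is a minimal sketch of a parent loop that backs off between restarts, gives up on a known exit code, and stays responsive to a stop request while it waits. It is not the zdaemon or nt_svcutils code: `supervise`, `FATAL_EXIT_CODE` and the delay values are hypothetical, and `threading.Event` is used as a portable stand-in for the Win32 stop event. In a pywin32 service the equivalent wait would be `win32event.WaitForSingleObject(hStopEvent, milliseconds)`, as suggested earlier in the thread.

```python
import subprocess
import sys
import threading

# Hypothetical reserved code meaning "startup failed, do not retry".
FATAL_EXIT_CODE = 64


def supervise(argv, stop_event, first_delay=1.0, max_delay=60.0):
    """Run argv repeatedly; return the last exit code seen (None if never run).

    The loop ends when the child exits cleanly, exits with the reserved
    fatal code, or stop_event is set.
    """
    delay = first_delay
    last_code = None
    while not stop_event.is_set():
        # Simplification: this blocks until the child exits. A real service
        # would also watch the stop event while the child is running.
        last_code = subprocess.call(argv)
        if last_code in (0, FATAL_EXIT_CODE):
            break  # clean exit or known-hopeless failure: no restart
        # Unlike time.sleep(delay), wait() returns True as soon as the
        # event is set, so a stop request interrupts the backoff.
        if stop_event.wait(delay):
            break
        delay = min(delay * 2, max_delay)
    return last_code
```

The design choice worth noting is that the backoff itself can stay "simple and stupid"; only the sleep is replaced by a wait that a stop request can cut short.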
RE: [Zope-dev] Possible Windows Service improvements.
On Wed, 2004-08-04 at 20:24, Mark Hammond wrote:

This is a good idea. FWIW, I believe the Zope HEAD already has some work done towards this (in lib/python/nt_svcutils/service.py), although the child output goes to a logfile instead of the event log. It would be nice to make the output go to the event log and then backport this to 2.7.

I started from HEAD, and indeed did find a good attempt at making this work. However, it was disabled by default, it wrote to a file, and had issues with blocking reads. I fixed this to capture the output in memory (but not all of it - just the tail), and to use a single pipe for both stdout and stderr.

Great!

Yup. The reason it retry-restarts is because it's simple and stupid and the reason it doesn't respond to shutdown requests during a retry is because the service code sleeps for the backoff interval after an unsuccessful startup. Any async requests that happen in the meantime are blocked waiting for this sleep to end. I'm not quite sure how to do that better.

FWIW, the way to do that better was to use WaitForSingleObject(hStopEvent, timeout_period*1000).

Is this a more-or-less direct replacement for the usage of time.sleep() in the current incarnation of SvcDoRun? Do you think you might be able to attach your patched service.py (and related files) to a future mail?

Others have claimed this is unnecessary and that the work that gets done in there could be done in the service code. It's a bit of a mess.

I agree with the others :) mkzopeinstance goes to lengths to provide all relevant information, and the service code does not take full advantage of that. It should be possible for Zope to work with a standard, external Python/pywin32 installation. I'm not suggesting this become a distribution option, but still a worthy goal; for developers, and to keep us honest <wink>. Note that with a few tweaks, Zope *does* build and work with an external Python/pywin32.

Yup, it should be possible; there's really nothing special about the Zope-supplied Python, we just have a history of attempting to make Zope install under Windows without requiring that the user know anything about Python, and the easiest way to do this is to ship our own. It's actually impossible to *stop* Zope from using pieces of another system-installed Python, given that it apparently looks unconditionally to the registry to find some things. (See the thread revolving around http://mail.zope.org/pipermail/zope-dev/2004-March/021979.html for the details). This is actually not a desirable feature.

At one point I flailed trying to make the child process inherit its environment from the parent, and plastered over the problem with various sys.path and PYTHONPATH and other environment variable settings. The current situation is a result. Some guidance here would be helpful.

The child process *does* inherit the parent service's environment. Hence adding os.environ[] entries in the service does set them for subsequently created children. ie, setting os.environ[PYTHONPATH]=SOFTWARE_HOME in the service main code appears to avoid the sitecustomize.py requirement - the child process *does* see the new PYTHONPATH.

Yup, that confirms what the thread above says too.

I'm a Windows signal idiot.

No - Windows is a signal idiot :)

Actually, signals are idiotic, but... ;-)

Is there a way that we can make the Zope process capture Windows signals and when the Windows equivalent of SIGTERM is sent to the process to shut it down cleanly? This is how it works on UNIX

That makes sense, but:

but we circumvent trying to listen for signals on Windows entirely at startup.

Can you explain the above? Do you mean that on Windows, you take no special signal actions, as demonstrated in WindowsZopeStarter.registerSignals?

That's just what I mean. The signaling mechanism doesn't necessarily need to literally be signals, I just don't know of another way to asynchronously influence the state of a running process (UNIX or Windows). Maybe Zope needs to cooperate a bit, which would be fine.

I'll see what I can come up with though.

Note that the UNIX environment has a lot of additional niceties due to responses to signals (like logfile rotation) that Windows doesn't now, which tends to have the effect of relegating Windows to a second-class platform on which to run a production Zope instance.

Windows certainly has these features available - they are just not always spelt the same as they are on Unix. Sometimes they are even better <wink>. So there seems to be a chicken-and-egg problem - users will always consider it second class until Zope itself starts considering it first class. This is an observation, not a criticism :)

Sure. It probably won't take much work to match the features that are available under UNIX. FWIW, Tim (most recently), I (second-most-recently), and Brian Lloyd (least recently) are
RE: [Zope-dev] Possible Windows Service improvements.
FWIW, the equivalent of the service manager code under UNIX is zopectl rather than runzope.

Ahh, ok. That makes more sense now, thanks.

When you say "these errors" above, do you mean any unhandled exception? If so (and any nonzero exit code indicated a startup failure), would we really need cooperation from run.py for this? It seems like it could be done entirely inside of SvcDoRun.

Yes, that is what I tried to say here:

* Is an exit-code of 1 suitable for a fatal error? If so, this requires no changes to the child process. However, I assume it is not suitable.

An exit code of 1 is the only way I see to detect an unhandled exception in the child.

OTOH, if the real problem is that you can't stop the service from fruitlessly restarting itself in the face of an insoluble error because of the blocking sleep, it seems like you already solved that.

Yeah, it really isn't a huge issue. It leaves us with the fact that Zope always reports startup success, even in hopeless cases. This isn't a real problem - just a nicety - along the lines of making Zope a first-class service on Windows <wink>. I'm happy to let that idea rest.

To your other mail:

Do you think you might be able to attach your patched service.py (and related files) to a future mail?

I'll attach the patch in private mail. I just tried attaching it to collector 1020, but I don't have permission. It has the redirection and the sleep fix, but no changes to the startup strategy.

It's actually impossible to *stop* Zope from using pieces of another system-installed Python, given that it apparently looks unconditionally to the registry to find some things.

Yes, and it is my fault that code even exists. As far as I know, win32all was the only product that registered modules in this way, and later versions no longer do. I'd be +1 on getting this removed from Python.

Thanks, Mark.
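For readers without access to the patch, the "redirection" mentioned above (keeping only the tail of the child's output in memory, with stdout and stderr sharing one pipe) could look roughly like the sketch below. This is an illustration built on Python's `subprocess` module, not the patched service.py; `run_and_capture_tail` and the 2048-byte default are assumptions based on the "x~2k" figure from earlier in the thread.

```python
import subprocess
import sys

TAIL_BYTES = 2048  # roughly the "x~2k" proposed in the thread


def run_and_capture_tail(argv, tail_bytes=TAIL_BYTES):
    """Run argv to completion; return (exit_code, last tail_bytes of output)."""
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # one pipe carries both streams
    )
    tail = b""
    while True:
        chunk = proc.stdout.read(4096)
        if not chunk:
            break  # EOF: the child closed its end of the pipe
        # Keep only the most recent bytes so memory use stays bounded.
        tail = (tail + chunk)[-tail_bytes:]
    proc.stdout.close()
    return proc.wait(), tail
```

In this sketch the read loop blocks the calling thread until the child exits; a service would presumably run it on a separate thread so that stop requests are still handled, and would write `tail` to the event log only when the exit code is non-zero.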