RE: [Zope-dev] Possible Windows Service improvements.

2004-10-12 Thread Mark Hammond
Resurrecting a bit of an old thread:

 From: Chris McDonough [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, 4 August 2004 11:19 PM
 Subject: Re: [Zope-dev] Possible Windows Service improvements.
...

 I'm a Windows signal idiot.

I was too.  I think I understand them a little better now after having
played with both the signal module, and the win32 specific functions.

 Is there a way that we can make the Zope
 process capture Windows signals and when the Windows equivalent of
 SIGTERM is sent to the process to shut it down cleanly?  This is how
 it works on UNIX, but we circumvent trying to listen for signals on
 Windows entirely at startup.  There are all sorts of hooks for clean
 shutdown now that we can coopt if we can make the process capture a
 signal.

I've uploaded a patch to http://collector.zope.org/Zope/1527.  I'd
appreciate any comments - specifically about if I have hooked the
appropriate place.

My first reaction was that the correct hooks were in SignalHandler.py and
Signals.py - however, Windows signals really aren't suitable for hooking
there - only SIGINT is supported.  Trying to twist code into pretending
signals on Windows worked like Linux ended up with a bit of a mess.  Hacking
Lifetime.py was the cleanest solution.

 Note that the UNIX environment has a lot of additional niceties due to
 responses to signals (like logfile rotation) that Windows doesn't have now,
 which tends to have the effect of relegating Windows to a second-class
 platform on which to run a production Zope instance.

I guess the correct way to do that gets back to the other issue this thread
raised - cross-platform startup/error reporting and command handling.  I
fear that will take a little longer to implement.  I hope to break this up
into 2 tasks:

* Give Windows reliable shutdown behaviour now.
* Try and develop a basis for reliable cross-platform parent/child
notification and control.

I think the first would allow us to gracefully shut down services - at the
moment the child process is immediately terminated!  The second would give
us better startup and error recovery, but that seems less important to me.
I hope to submit a patch for the service code shortly.

Mark

___
Zope-Dev maillist  -  [EMAIL PROTECTED]
http://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists - 
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope )


RE: [Zope-dev] Possible Windows Service improvements.

2004-08-12 Thread Mark Hammond
 Would a lockfile work ok to signify running state?  IOW, Zope would
 lock a file as one of the last steps during startup (which it actually
 does now, but might do it a bit too early).  The Windows service manager
 would attempt to lock the same file after timeout seconds (although
 I'm not sure where to get timeout from) on initial startup.

 If the file doesn't exist or the lock succeeds, the process is
 nonrestartable.
 If it exists and cannot be locked (Zope has locked it, signifying
 successful startup), the process is restartable.

This lets the parent know of successful startup, but doesn't help it during
the 'starting' phase (if the lock never gets set, how long do we wait before
giving up?)  Ditto for shutdown - after requesting the child shutdown, how
long do we wait for it to terminate?

A single timeout leaves us where we started - it is too fragile to assume
that if the app has not started after x seconds, it never will.  By having
the child actively report progress, the parent process can be more confident
it is alive and normal, even if it is taking a very long time to start.

Wouldn't a socket work?  As the parent process is starting the child, it
could pass the port on the cmdline.
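A rough sketch of the socket idea, assuming a made-up line-based wire format (`"<state> <timeout>\n"`). Only the two helpers are shown. In real use the parent would listen on `127.0.0.1` port 0, pass the chosen port on the child's command line (via some hypothetical flag such as `--status-port`), and the child would connect back with `socket.create_connection`.

```python
import socket

VALID_STATES = {"starting", "running", "stopping", "stopped", "pausing", "paused"}


def encode_report(state, timeout):
    """Child side: serialise one status report as a single text line."""
    if state not in VALID_STATES:
        raise ValueError("unknown state: %r" % (state,))
    return ("%s %g\n" % (state, timeout)).encode("ascii")


def read_reports(sock):
    """Parent side: yield (state, timeout) pairs until the child disconnects."""
    buf = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # EOF: the child closed the socket or died
            return
        buf += chunk
        while b"\n" in buf:
            line, buf = buf.split(b"\n", 1)
            state, timeout = line.decode("ascii").split()
            yield state, float(timeout)
```

A side benefit of a socket is that the parent sees EOF when the child dies, even when the child never got the chance to report anything.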

I think by *describing* the API, I made it sound more complicated than it
is.  It consists of 2 functions:

def reportState(state, timeout, error_info=None):
    """Called by the child process, and implemented by the parent.
    Reports the status of the child process to the parent.

    state may be 'starting', 'running', 'stopping', 'stopped',
    'pausing' or 'paused'.

    timeout is how long before the parent will consider us dead
    if we don't report progress again.

    error_info may be specified as the program reports 'stopped'.
    If None, the shutdown was normal.
    """

# This second one is optional, but abstracts away using signals :)
def changeState(new_state):
    """Called by the parent process, and implemented by the child.
    Requests that the child process enter the specified state.
    Only 'running', 'stopped' and 'paused' are valid (ie, the
    parent must never request any of the transition states).
    """

If we ignore the second and stick to signals, it seems fairly robust and
achievable.
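On the parent side, `reportState` needs little more than the last state and a deadline. A minimal sketch, assuming a hypothetical `ChildMonitor` class, an initial timeout the parent picks before the first report arrives, and an injectable clock so the logic can be tested without sleeping:

```python
import time

STATES = ("starting", "running", "stopping", "stopped", "pausing", "paused")


class ChildMonitor:
    """Parent-side bookkeeping for the reportState() protocol (illustrative)."""

    def __init__(self, initial_timeout, clock=time.monotonic):
        self._clock = clock
        self.state = "starting"  # assumed until the child reports otherwise
        self.error_info = None
        self.ever_running = False
        self.deadline = clock() + initial_timeout

    def report_state(self, state, timeout, error_info=None):
        """Record a report from the child and push the deadline out."""
        if state not in STATES:
            raise ValueError("unknown state: %r" % (state,))
        self.state = state
        self.error_info = error_info
        if state == "running":
            self.ever_running = True
        self.deadline = self._clock() + timeout

    def is_overdue(self):
        """True if the child has gone quiet for longer than it promised."""
        return self.state != "stopped" and self._clock() > self.deadline
```

Because the child supplies each timeout, a slow startup only has to keep reporting 'starting' to stay alive. No single guessed timeout has to cover every machine.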

The other option is that we get rid of the 'parent' service altogether.
Just have the service code use run.py directly, and run it in-process.  The
Windows built-in 'auto-restart' facilities for services can handle the
bizarre errors, and offer the benefit of allowing the user to define the
restart policy (even allowing them to reboot should it go down <wink>).
There may be good reasons for that though.

Mark.



RE: [Zope-dev] Possible Windows Service improvements.

2004-08-09 Thread Chris McDonough
Would a lockfile work ok to signify running state?  IOW, Zope would
lock a file as one of the last steps during startup (which it actually
does now, but might do it a bit too early).  The Windows service manager
would attempt to lock the same file after timeout seconds (although
I'm not sure where to get timeout from) on initial startup.  If the
file doesn't exist or the lock succeeds, the process is nonrestartable. 
If it exists and cannot be locked (Zope has locked it, signifying
successful startup), the process is restartable.
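The service-side probe for this lockfile scheme could look like the following. This is a sketch only: it assumes both processes agree on the lock file path, and it uses `msvcrt.locking` on Windows and `fcntl.flock` elsewhere. It does not reflect how Zope currently takes its lock, and all function names are hypothetical.

```python
import os
import sys


def try_lock(path):
    """Open path and take a non-blocking exclusive lock.

    Returns the open file (keep it open to hold the lock) or None if
    another process already holds the lock.
    """
    f = open(path, "a+")
    try:
        if sys.platform == "win32":
            import msvcrt
            f.seek(0)
            msvcrt.locking(f.fileno(), msvcrt.LK_NBLCK, 1)
        else:
            import fcntl
            fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None
    return f


def release(f):
    """Drop a lock taken by try_lock and close the file."""
    if sys.platform == "win32":
        import msvcrt
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)
    f.close()


def zope_started(path):
    """Service-side probe: True only if the file exists and is locked."""
    if not os.path.exists(path):
        return False  # never created: non-restartable
    probe = try_lock(path)
    if probe is None:
        return True   # someone (Zope) holds the lock: startup succeeded
    release(probe)
    return False      # exists but unlocked: non-restartable
```

The probe answers "has it started?" at one instant only. The service still has to decide when to ask, which is the weakness of a single timeout.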



RE: [Zope-dev] Possible Windows Service improvements.

2004-08-08 Thread Mark Hammond
[Me]
  By adding a layer around run.py, I believe we could arrange
  for these fatal errors to be handled with a special return code.

[Jim]
 I assume by fatal, you mean errors that we should not try to restart
 from.

Correct.

 Let me see if I understand the use cases here:

 - Normal shutdown.  (Should it be possible to shut down Zope
through the web on Windows?)

I see no reason it makes less sense for Windows than it does for anywhere
else <wink>.


 - Start-up error. We want to log relevant information somewhere.
We don't want to restart.

 - Run-time (after startup) error.  We also want to log a problem,
but we do want to restart Zope.

Yep, I think that covers it.

 Note that we also need to consider uncontrolled exits, like segfaults.

Yes - if the segfault is at startup, it should be considered
non-restartable.  Once normal operations have started, a segfault should
cause a restart.

 Perhaps there should be a framework with calls that a program can
 make to indicate normal exit, fatal (non-restartable) exit,
 and non-fatal (restartable) exit.

That could be done with process exit codes if all the child needs to do is
report *exit* status - but that really doesn't cover enough bases.  Given the
number of ways programs can fail, a meaningful exit code may be hard to
guarantee, and exit codes don't handle uncontrolled exits or children going
zombie.

What we need is something more authoritative - where the child process
actively signals its state to the parent - ie starting, running or
stopping.  Pausing and paused may also make sense.  If the child never
reported 'running', it is non-restartable.  If the child terminates without
reporting graceful shutdown, it is restartable.
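The restart rules above fit in a small decision function. A minimal sketch, assuming the parent has recorded whether the child ever reported 'running' and whether it reported 'stopped' before exiting; `should_restart` is a hypothetical name. A 'stopped' report that carries error information is treated as a run-time error, which Jim's use cases say should be restarted.

```python
def should_restart(reported_running, reported_stopped, error_info=None):
    """Decide what the parent does after the child process has exited.

    reported_running: the child reached 'running' at some point.
    reported_stopped: the child reported 'stopped' before exiting.
    error_info: whatever the child passed along with 'stopped', if anything.
    """
    if not reported_running:
        # Died during startup (including a segfault): restarting won't help.
        return False
    if reported_stopped and error_info is None:
        # Graceful, requested shutdown: nothing to do.
        return False
    # Ran normally, then vanished without a clean 'stopped' or reported an error.
    return True
```

Segfaults need no special case: a crashed child simply never reports 'stopped', so the outcome depends only on whether it had reached 'running'.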

This still does not provide any way of handling the case when the child
process is running, but failing to transition between states.  We still need
a timeout, but can make it more robust by having the child process report
the status *and* the timeout the parent should use.

Which, coincidentally, sounds exactly like the Windows Services API <wink>.
Is that sounding reasonable, or moving into too-complicated/YAGNI territory?

Thanks,

Mark.



Re: [Zope-dev] Possible Windows Service improvements.

2004-08-05 Thread Tim Peters
Some quick hit-and-runs.

[Mark Hammond]
 FWIW, the way to do that better was to use WaitForSingleObject(hStopEvent,
 timeout_period*1000).

[Chris McDonough]
 Is this a more-or-less direct replacement for the usage of time.sleep()
 in the current incarnation of SvcDoRun?

The Windows WaitFor{Single,Multiple}Object calls are mondo cool. 
Think select() under Unix, but generalized to all sorts of things. 
The call above waits for hStopEvent to be signaled, but for no
longer than the timeout period.  The return value tells you whether
you timed out or got the signal you were waiting for.  Ironically,
while the WaitFor... calls work with events, mutexes, processes, etc,
etc, they *don't* work with sockets.  Heh.
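In pure Python, `threading.Event.wait(timeout)` behaves much like waiting on an event handle: it returns True as soon as the event is set and False if the timeout expires. The sketch below shows a backoff loop that stays responsive to stop requests. `start_child` is a hypothetical callable, and `stop_event` stands in for `hStopEvent`. The service code itself would use `win32event.WaitForSingleObject`, as in the quote above.

```python
import threading


def run_with_backoff(start_child, stop_event, delays=(1, 2, 4, 8)):
    """Restart a child with increasing delays, but wake at once on stop.

    start_child: hypothetical callable that runs the child and returns when
    it exits. stop_event plays the role of hStopEvent.
    Returns the number of times start_child was called.
    """
    attempts = 0
    for delay in delays:
        if stop_event.is_set():
            break
        start_child()
        attempts += 1
        # Unlike time.sleep(delay), this returns early (True) as soon as
        # stop_event is set, so a shutdown request is never stuck behind
        # the backoff interval.
        if stop_event.wait(delay):
            break
    return attempts
```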

...
 It's actually impossible to *stop* Zope from using pieces of another
 system-installed Python, given that it apparently looks unconditionally
 to the registry to find some things.
 (See the thread revolving around
 http://mail.zope.org/pipermail/zope-dev/2004-March/021979.html for the
 details).  This is actually not a desirable feature.

Mark knows about that, and it's fixed in current win32all.  But
current win32all builds are shipped as distutils-produced Windows
installers, not as Wise installers, so none of our Windows-buildout
scripts know what to do with a current release.

Given what little (albeit crucial) use Zope makes of win32all, it
might be wise for Zope to repackage a much smaller part of it.

 ...
 FWIW, Tim (most recently), I (second-most-recently), and Brian Lloyd
 (least recently) are really the only people who have put a concerted
 effort into keeping Zope running and installable acceptably under
 Windows.

Actually not!  I've never built the Zope Windows installer, and the
process in fact never ran to completion on Win98SE.  I do build the
Zope Z4I and ZRS Windows installers, and they've gotten a lot more
recent attention.  Z4I requires that Zope was already installed, and
ZRS is independent of Zope.  They're quite different from the Zope
Windows installers in some respects (for example, they ship files with
Windows line endings <wink>), but I haven't had time to fold those
improvements back into the Zope installer.

 ...
 Tim has an excuse (he still works for ZC, which apparently does have
 customers that use Windows as a server platform and he uses Windows
 himself), but I think everyone would agree that he really should be working on
 more important things.

Rob doesn't agree, when a paying customer is waiting for a Windows ZRS
installer.  I don't blame him.  Despite that I really, really want to
<wink>.

 Brian has long ago disavowed any knowledge of making Zope run under
 Windows.  So in other words, it's probably a very good thing you're here! ;-)

I've known Mark for a decade, and it's generally a very good thing
when he's anywhere near.  The trick for us (and it's a very small
trick) is to convince him that the Windows Zope user experience is
poor.  He won't be able to refrain from fixing it then, and even if
it's entirely Windows's fault!  You should see what he did with the
SpamBayes Outlook client ...


RE: [Zope-dev] Possible Windows Service improvements.

2004-08-05 Thread Matt Shaw
You left out the lack of 'zopectl debug' ;]  I have managed to create it
though.



-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Behalf Of Mark Hammond
Sent: Wednesday, August 04, 2004 7:35 AM
To: [EMAIL PROTECTED]
Subject: [Zope-dev] Possible Windows Service improvements.


Hi all,
  I am starting to venture into the wonderful world of Zope!  With the
benefit of a complete lack of Zope experience, I have been able to look at
the Windows service support from a fairly clean slate.  However, I also
realize this lack of experience means my ideas may be naive - hence I have
attempted to split them into discrete issues for discrete rejection <wink>.

1) startup error redirection.
I've noticed that the main Zope service driver for Windows seems to work
fine when everything is setup correctly, but when things go wrong it offers
no clues as to what.  This is reflected in collector item 1020 (poor error
reporting on product initialisation failure under windows).  Issue 1408
(Configuration file imports don't see INSTANCE_HOME when running Zope as a
windows service), via the referenced thread, has evidence of someone
burning a day due to this.  It cost me a lot of time too :)

I propose:
Each time the child process terminates with a non-zero return code, the tail
x-bytes of the child output be written to the Windows event log, where x~2k.
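Extracting the tail is cheap if the child's output is captured to a file. A minimal sketch, assuming that capture (the HEAD code mentioned later in the thread reportedly writes a logfile) and a caller-supplied `report` callable. The actual event-log call is left out, and both function names are hypothetical.

```python
import os

TAIL_BYTES = 2048  # "x~2k" from the proposal


def output_tail(path, limit=TAIL_BYTES):
    """Return at most the last `limit` bytes of the child's captured output."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - limit))
        data = f.read()
    # Decode leniently: the cut may land in the middle of a multi-byte character.
    return data.decode("utf-8", errors="replace")


def on_child_exit(returncode, output_path, report):
    """Call report(text) with the output tail only for non-zero exits."""
    if returncode != 0:
        report("Zope child exited with code %d:\n%s"
               % (returncode, output_tail(output_path)))
```

Seeking from the end keeps the cost constant however large the log grows, and the last 2k is usually where the traceback is.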

2) reporting of successful start and backoff strategy.
A trivial startup error (eg, PYTHONPATH not set) will cause the Zope service
to hopelessly retry for a number of minutes, and not respond to shutdown
requests during a retry.

At the moment, as soon as the service starts it reports successful startup
to Windows.  It then begins an attempt to start the child.  If the child
immediately fails, the code immediately begins the backoff strategy.  This
strategy appears to have 2 main purposes:
* Startup may fail due to other 'services' not having yet started, so retry
in the hope they become available.
* The process may die due to some obscure error - restart it.

On Windows, assuming we install the service to depend on the tcpip
service, I see no reason that the first reason is valid.  If the process
fails quickly the first time we attempt to launch it, it is almost certainly
going to fail every time we try and launch it.

The current strategy also means that 3rd party services could not themselves
depend on the Zope service - the Zope service will report successful startup
before it really has (and therefore the dependent service may itself fail).
This isn't a known requirement today, but who knows!  net start and other
front ends also fail to detect fatal errors - they all say Zope started OK.

I propose:
We insist the child process can be created and continues to run for x
seconds (where x~5).  If that fails, we report an error (never reporting to
Windows that we started successfully).  If the child process stays alive for
this period, we report success to Windows, and then use the existing backoff
strategy should it die.  If the machine is heavily loaded, this 5 seconds
may expire before the fatal error is hit in the child - in that case, we are
simply doing what we do now - using the backoff strategy to hopelessly
attempt a restart - ie, a win in most cases, and no loss in the others.
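The grace-period check maps directly onto `subprocess.Popen.wait(timeout=...)`, which raises `TimeoutExpired` while the child is still alive. A minimal sketch with a hypothetical function name. The service would report success to Windows only in the still-running branch.

```python
import subprocess


def start_with_grace_period(cmd, grace=5.0):
    """Start cmd; treat it as started only if it survives `grace` seconds.

    Returns (proc, None) if the child is still alive after the grace
    period, or (None, returncode) if it exited early (a startup failure
    under this proposal).
    """
    proc = subprocess.Popen(cmd)
    try:
        returncode = proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        return proc, None  # still running: now report success to Windows
    return None, returncode
```

The grace period is still a guess about timing. A child that fails after 5 seconds on a loaded machine falls through to the existing backoff behaviour.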

3) environment setting
The service process should set a number of environment variables before
spawning the child - PYTHONPATH at a minimum, and according to issue #1408,
INSTANCE_HOME.  It already knows these values thanks to mkzopeinstance.  I'm
yet to determine where these values come from in the binary build, but I
see no reason not to fix this (and possibly remove whatever magic the binary
does).

I propose:
A few trivial os.environ insertions based on the substitutions done by
mkzopeinstance, before we create the child process(es).  (Alternatively, we
could create an explicit new environment to pass to CreateProcess, but I see no
good reason for that.)
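Either variant amounts to a few dictionary assignments. The sketch below builds an explicit environment (the alternative mentioned in the proposal). Assigning the same keys into `os.environ` before spawning has the same effect on the child. `software_home` and `instance_home` stand in for the values mkzopeinstance substitutes. The `lib/python` layout is an assumption for illustration.

```python
import os


def child_environment(software_home, instance_home, base=None):
    """Copy the service's environment and add what the Zope child needs.

    software_home/instance_home stand in for the values mkzopeinstance
    substitutes into the service script; the paths are illustrative.
    """
    env = dict(os.environ if base is None else base)
    paths = [os.path.join(software_home, "lib", "python")]
    if env.get("PYTHONPATH"):
        paths.append(env["PYTHONPATH"])  # keep anything already configured
    env["PYTHONPATH"] = os.pathsep.join(paths)
    env["INSTANCE_HOME"] = instance_home
    return env
```

The result would be passed as `subprocess.Popen(cmd, env=...)`. Copying the parent's environment first keeps variables such as `SystemRoot` that Windows processes need.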

4) Currently, when the process is stopped, we immediately terminate the
child process.  This seems dangerous.  We should find a way to gracefully
terminate the child, and try that before we simply kill it.

I propose:
That someone help me work out how to do this <wink>.  I've already worked
out how if the service knows the username/password of a Zope administrator,
but it doesn't!  Sending a Ctrl_C 'signal' doesn't work without hacks to
run.py (and I'm yet to confirm it will even with such hacks).
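Whatever mechanism is eventually chosen, the "try gracefully, then kill" sequence has the same shape. A minimal sketch, assuming a hypothetical `request_graceful_stop` callable in place of the still-undecided mechanism. On Windows, `Popen.kill()` calls TerminateProcess, which is today's immediate termination, kept only as the fallback.

```python
import subprocess


def stop_child(proc, request_graceful_stop, grace=30.0):
    """Ask the child to stop cleanly; kill it only if it ignores us.

    request_graceful_stop is a hypothetical callable standing in for
    whatever mechanism is chosen (signal, socket message, ...).
    Returns True if the child exited on its own, False if it was killed.
    """
    request_graceful_stop(proc)
    try:
        proc.wait(timeout=grace)
        return True
    except subprocess.TimeoutExpired:
        proc.kill()  # TerminateProcess on Windows: the current behaviour
        proc.wait()
        return False
```

For testing, closing the child's stdin is a convenient stand-in for a stop request, since a child blocked reading stdin sees EOF and exits.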

I welcome any feedback on these issues.  Obviously I am willing to back each
of these proposals up (except 4!) with code that seems to work :)  I would
also welcome feedback on the best way to proceed (ie, create a new collector
for each issue?  thrash it out here?  give up?wink, etc)

Note that none of these issues would require a win32all/pywin32 update.  If
anyone was really upset by issue 1423 (Zope 2.7.1 won't run as service
under NT), and also able to test, I'd be willing to fix it - but that
*would* require a pywin32 

Re: [Zope-dev] Possible Windows Service improvements.

2004-08-05 Thread Jim Fulton
Mark Hammond wrote:
...
 Thanks Jim.

 I agree with those concerns.  Note that already we do not recognise startup
 failure and keep retrying even though we shouldn't, and that is exactly what
 I am trying to solve.

Yup

  Is there some sort of IPC that can be used here?  On Unix, we use a
  reserved exit code to indicate a startup failure and arrange for Zope
  to exit with that.

 The Unix situation is a little different - if runzope itself has immediate
 startup failure, then everything immediately fails - ie, as far as Unix is
 concerned, the service itself failed.

 On Windows however, runzope is executed as a *child* of the service.  If
 the child fatally fails, the service itself is still reporting success, and
 hopelessly attempting a restart.  The service needs to know if the child
 fatally failed.  This doesn't apply on Unix as runzope *is* the 'service',
 not the child of the service.

 By adding a layer around run.py, I believe we could arrange for these fatal
 errors to be handled with a special return code.

I assume by fatal, you mean errors that we should not try to restart
from.

Let me see if I understand the use cases here:

- Normal shutdown.  (Should it be possible to shut down Zope
  through the web on Windows?)
- Start-up error. We want to log relevant information somewhere.
  We don't want to restart.
- Run-time (after startup) error.  We also want to log a problem,
  but we do want to restart Zope.

Note that we also need to consider uncontrolled exits, like segfaults.

 Alternatively, if Zope itself never returns an error code of 1 (one), then
 we could use that - Python itself returns this for unhandled exceptions.
 That seems dangerous though.

 Can you offer some advice here?

Probably not at the level you are asking.  I'm not familiar with any of the
details here and can't make the time to be.  I just know we need an explicit
mechanism, rather than trying to guess based on timing.

 * Is an exit-code of 1 suitable for a fatal error? If so, this requires no
 changes to the child process.  However, I assume it is not suitable.

Why not use an exit code of 0?
I think that Zope could arrange to exit with an exit code of 0 if there is a
startup error.

 * Is a special exit code, generated by a wrapper to run.py (eg, run_svc.py)
 suitable?  If so, what value do you recommend?

Sorry, I don't understand these scripts well enough to comment.

 * Should some Win32 specific, robust IPC mechanism be investigated?  This
 would cut deeper into a run.py wrapper, and obviously is not a general
 solution.

I think there is a general solution lurking here.  But I'm too ignorant
of the details to be more specific. :(

  On another note, I'd really prefer to work out a general facility that
  can be used with any Python program, including both Zope 2 and Zope 3.

 The problem at the moment is that our facilities are *too* general - ie,
 without some coordination between the parent and child, the parent must
 guess.

Right.  We need something at a lower level.  It would be OK with me
if this something had an API that the program being run had to use.

 The simplest coordination does seem to be process exit code, but
 that seems fragile.

Yup.

 But whatever coordination is chosen, any Python program that was willing
 to play the coordination game could use it.  The simpler this game is to
 play, the more fragile the system is (ie, just using exit codes is simple,
 but fragile; using other IPC mechanisms could be made robust, but is not
 simple - especially not in a platform independent way.)

Right.  They may not need to be platform independent though, if there is
some sort of API in the middle.

Perhaps there should be a framework with calls that a program can
make to indicate normal exit, fatal (non-restartable) exit, and non-fatal
(restartable) exit. A long-running-process manager could provide some
way of handling these events as well as handling exits without an application
having signalled one of these events.
Jim
--
Jim Fulton   mailto:[EMAIL PROTECTED]   Python Powered!
CTO  (540) 361-1714http://www.python.org
Zope Corporation http://www.zope.com   http://www.zope.org


Re: [Zope-dev] Possible Windows Service improvements.

2004-08-04 Thread Chris McDonough
On Wed, 2004-08-04 at 07:35, Mark Hammond wrote:
 Hi all,
   I am starting to venture into the wonderful world of Zope!  With the
 benefit of a complete lack of Zope experience, I have been able to look at
 the Windows service support from a fairly clean slate.  However, I also
 realize this lack of experience means my ideas may be naive - hence I have
 attempted to split them into discrete issues for discrete rejection <wink>.
 
 1) startup error redirection.
 I've noticed that the main Zope service driver for Windows seems to work
 fine when everything is setup correctly, but when things go wrong it offers
 no clues as to what.  This is reflected in collector item 1020 (poor error
 reporting on product initialisation failure under windows).  Issue 1408
 (Configuration file imports don't see INSTANCE_HOME when running Zope as a
 windows service), via the referenced thread, has evidence of someone
 burning a day due to this.  It cost me a lot of time too :)

Yes, sorry about that!  (I was the fool who checked in the NameError
referenced in 1408).

 I propose:
 Each time the child process terminates with a non-zero return code, the tail
 x-bytes of the child output be written to the Windows event log, where x~2k.

This is a good idea.  FWIW, I believe the Zope HEAD already has some
work done towards this (in lib/python/nt_svcutils/service.py), although
the child output goes to a logfile instead of the event log.  It would
be nice to make the output go to the event log and then backport this to
2.7.

 2) reporting of successful start and backoff strategy.
 A trivial startup error (eg, PYTHONPATH not set) will cause the Zope service
 to hopelessly retry for a number of minutes, and not respond to shutdown
 requests during a retry.

Yup.  The reason it retry-restarts is because it's simple and stupid and
the reason it doesn't respond to shutdown requests during a retry is
because the service code sleeps for the backoff interval after an
unsuccessful startup.  Any async requests that happen in the meantime
are blocked waiting for this sleep to end.  I'm not quite sure how to do
that better.

 At the moment, as soon as the service starts it reports successful startup
 to Windows.  It then begins an attempt to start the child.  If the child
 immediately fails, the code immediately begins the backoff strategy.  This
 strategy appears to have 2 main purposes:
 * Startup may fail due to other 'services' not having yet started, so retry
 in the hope they become available.
 * The process may die due to some obscure error - restart it.

One concrete example of the obscure error is that the Zope process
handles a restart request from its Control Panel web interface.

 On Windows, assuming we install the service to depend on the tcpip
 service, I see no reason that the first reason is valid.  If the process
 fails quickly the first time we attempt to launch it, it is almost certainly
 going to fail every time we try and launch it.
 
 The current strategy also means that 3rd party services could not themselves
 depend on the Zope service - the Zope service will report successful startup
 before it really has (and therefore the dependent service may itself fail).
 This isn't a known requirement today, but who knows!  net start and other
 front ends also fail to detect fatal errors - they all say Zope started OK.
 
 I propose:
 We insist the child process can be created and continues to run for x
 seconds (where x~5).  If that fails, we report an error (never reporting to
 Windows that we started successfully).  If the child process stays alive for
 this period, we report success to Windows, and then use the existing backoff
 strategy should it die.  If the machine is heavily loaded, this 5 seconds
 may expire before the fatal error is hit in the child - in that case, we are
 simply doing what we do now - using the backoff strategy to hopelessly
 attempt a restart - ie, a win in most cases, and no loss in the others.

That sounds good.

 
 3) environment setting
 The service process should set a number of environment variables before
 spawning the child - PYTHONPATH at a minimum, and according to issue #1408,
 INSTANCE_HOME.  It already knows these values thanks to mkzopeinstance.  I'm
 yet to determine where these values come from in the binary build, but I
 see no reason not to fix this (and possibly remove whatever magic the binary
 does).
 
 I propose:
 A few trivial os.environ insertions based on the substitutions done by
 mkzopeinstance, before we create the child process(es).  (Alternatively, we
 could create an explicit new environment to pass to CreateProcess, but I see no
 good reason for that.)

Note that the Zope Python install also has a sitecustomize.py that
munges sys.path in order to get things set up properly.  Others have
claimed this is unnecessary and that the work that gets done in there
could be done in the service code.  It's a bit of a mess.  At one point
I flailed trying to make the child process inherit its 

Re: [Zope-dev] Possible Windows Service improvements.

2004-08-04 Thread Jim Fulton
Mark Hammond wrote:
...
I propose:
We insist the child process can be created and continues to run for x
seconds (where x~5). 
I recommend against this strategy.  It's too implicit and has bitten us on
Unix in the past. The problem with the strategy is that if the program starts
very fast and then dies, we don't restart it even though we should. Similarly,
if the program starts very slowly, we don't recognize a startup failure and
keep retrying even though we shouldn't.
Is there some sort of IPC that can be used here?  On Unix, we use a reserved
exit code to indicate a startup failure and arrange for Zope to exit with that.
On another note, I'd really prefer to work out a general facility that
can be used with any Python program, including both Zope 2 and Zope 3.
Jim
--
Jim Fulton   mailto:[EMAIL PROTECTED]   Python Powered!
CTO  (540) 361-1714http://www.python.org
Zope Corporation http://www.zope.com   http://www.zope.org


RE: [Zope-dev] Possible Windows Service improvements.

2004-08-04 Thread Mark Hammond
[Me]
  I propose:
  We insist the child process can be created and continues to
  run for x seconds (where x~5).

[Jim]
 I recommend against this strategy.  It's too implicit and has bitten us on
 Unix in the past. The problem with the strategy is that if the program
 starts very fast and then dies, we don't restart it even though we should.
 Similarly, if the program starts very slowly, we don't recognize a startup
 failure and keep retrying even though we shouldn't.

Thanks Jim.

I agree with those concerns.  Note that already we do not recognise startup
failure and keep retrying even though we shouldn't, and that is exactly what
I am trying to solve.

 Is there some sort of IPC that can be used here?  On Unix, we use a
 reserved exit code to indicate a startup failure and arrange for Zope
 to exit with that.

The Unix situation is a little different - if runzope itself has immediate
startup failure, then everything immediately fails - ie, as far as Unix is
concerned, the service itself failed.

On Windows however, runzope is executed as a *child* of the service.  If
the child fatally fails, the service itself is still reporting success, and
hopelessly attempting a restart.  The service needs to know if the child
fatally failed.  This doesn't apply on Unix as runzope *is* the 'service',
not the child of the service.

By adding a layer around run.py, I believe we could arrange for these fatal
errors to be handled with a special return code.  Alternatively, if Zope
itself never returns an error code of 1 (one), then we could use that -
Python itself returns this for unhandled exceptions.  That seems dangerous
though.

Can you offer some advice here?
* Is an exit-code of 1 suitable for a fatal error? If so, this requires no
changes to the child process.  However, I assume it is not suitable.
* Is a special exit code, generated by a wrapper to run.py (eg, run_svc.py)
suitable?  If so, what value do you recommend?
* Should some Win32 specific, robust IPC mechanism be investigated?  This
would cut deeper into a run.py wrapper, and obviously is not a general
solution.
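
To make the second option concrete, here is a minimal sketch of what such a
wrapper could do. It assumes run.py can be split into a startup phase and a
serving phase (the `startup` and `serve` callables are hypothetical), and it
uses an arbitrary reserved exit code of 95; neither that value nor the
`run_guarded` helper exists in Zope.

```python
import traceback

# Hypothetical reserved exit code; the value is an arbitrary choice, not a
# Zope convention. The parent service would have to agree on it.
STARTUP_FAILURE_EXIT_CODE = 95


def run_guarded(startup, serve):
    """Run startup() then serve(), mapping startup errors to a reserved code.

    startup and serve are hypothetical callables standing in for the two
    halves of run.py: configuration/initialisation and the main loop.
    """
    try:
        startup()
    except Exception:
        # Print the traceback so the parent can capture it from stderr,
        # then tell the parent that retrying is pointless.
        traceback.print_exc()
        return STARTUP_FAILURE_EXIT_CODE
    serve()  # exceptions here propagate, so Python exits with code 1 as usual
    return 0
```

The wrapper's main block would then end with something like
`sys.exit(run_guarded(startup, serve))`. Only failures before the server is
up get the reserved code; a crash later still exits with Python's usual 1,
so the service would keep restarting in that case.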

 On another note, I'd really prefer to work out a general facility that
 can be used with any Python program, including both Zope 2 and Zope 3.

The problem at the moment is that our facilities are *too* general - ie,
without some coordination between the parent and child, the parent must
guess.  The simplest coordination does seem to be process exit code, but
that seems fragile.  But whatever coordination is chosen, any Python program
that was willing to play the coordination game could use it.  The simpler
this game is to play, the more fragile the system is (ie, just using exit codes
is simple, but fragile; using other IPC mechanisms could be made robust, but
is not simple - especially not in a platform independent way.)
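
For illustration, the parent's half of the exit-code game might look like the
sketch below. It assumes the same hypothetical reserved code (95) on the
child side and uses `subprocess.call`, so it is a modern-Python sketch rather
than the current service code. It also shows where the fragility lies: the
parent can only trust codes the child has promised to use, and has to guess
about everything else.

```python
import subprocess

# Hypothetical reserved code; must match whatever the child wrapper uses.
STARTUP_FAILURE_EXIT_CODE = 95


def classify_child_exit(argv):
    """Run the child to completion and decide what the parent should do."""
    returncode = subprocess.call(argv)
    if returncode == 0:
        return "stopped"    # clean shutdown: do not restart
    if returncode == STARTUP_FAILURE_EXIT_CODE:
        return "fatal"      # the child says a restart cannot help
    # Anything else, including Python's 1 for an unhandled exception,
    # is treated as a retryable crash.
    return "restart"
```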

Mark.



RE: [Zope-dev] Possible Windows Service improvements.

2004-08-04 Thread Mark Hammond
  I propose:
  Each time the child process terminates with a non-zero return code, the
  tail x-bytes of the child output be written to the Windows event log, where
  x~2k.

 This is a good idea.  FWIW, I believe the Zope HEAD already has some work
 done towards this (in lib/python/nt_svcutils/service.py), although the child
 output goes to a logfile instead of the event log.  It would be nice to make
 the output go to the event log and then backport this to 2.7.

I started from HEAD, and indeed did find a good attempt at making this work.
However, it was disabled by default, wrote to a file, and had issues
with blocking reads.  I fixed this to capture the output in memory (but not
all of it - just the tail), and to use a single pipe for both stdout and
stderr.
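
A minimal sketch of the capture-the-tail approach, assuming the child's exit
code and combined output are all the service needs. The function name and the
2048-byte default are illustrative; the resulting bytes could then be handed
to the event log, for example through pywin32's `win32evtlogutil.ReportEvent`,
which is left out here so the sketch runs on any platform.

```python
import subprocess

# "x~2k" from the proposal above; the exact figure is a tunable assumption.
TAIL_BYTES = 2048


def run_and_capture_tail(argv, tail_bytes=TAIL_BYTES):
    """Run argv with stdout and stderr merged into one pipe.

    Returns (returncode, tail), where tail holds at most the last
    tail_bytes bytes of the combined output.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)  # single pipe for both
    tail = b""
    # Keep draining the pipe so the child never blocks on a full buffer,
    # but only remember the most recent tail_bytes.
    while True:
        chunk = proc.stdout.read(4096)
        if not chunk:
            break
        tail = (tail + chunk)[-tail_bytes:]
    proc.stdout.close()
    return proc.wait(), tail
```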

  2) reporting of successful start and backoff strategy.
  A trivial startup error (eg, PYTHONPATH not set) will cause the Zope service
  to hopelessly retry for a number of minutes, and not respond to shutdown
  requests during a retry.

 Yup.  The reason it retry-restarts is because it's simple and stupid and the
 reason it doesn't respond to shutdown requests during a retry is because the
 service code sleeps for the backoff interval after an unsuccessful startup.
 Any async requests that happen in the meantime are blocked waiting for this
 sleep to end.  I'm not quite sure how to do that better.

FWIW, the way to do that better was to use WaitForSingleObject(hStopEvent,
timeout_period*1000).
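
In pywin32 terms that is `win32event.WaitForSingleObject(hStopEvent,
timeout_ms)`, which returns `win32event.WAIT_OBJECT_0` if the stop event was
signalled and `win32event.WAIT_TIMEOUT` if the interval simply expired. The
portable sketch below uses `threading.Event.wait()` as a stand-in to show the
shape of a backoff loop that stays responsive to stop requests; the helper
name, the `start_child` callable and the delays are assumptions.

```python
import threading


def restart_with_backoff(start_child, stop_event: threading.Event,
                         delays=(1, 2, 4, 8)):
    """Try start_child() repeatedly, pausing between attempts.

    start_child is a hypothetical callable returning True once the child
    is running. The pause uses stop_event.wait() rather than time.sleep(),
    so a stop request ends the pause immediately instead of being blocked.
    Returns (outcome, attempts).
    """
    attempts = 0
    for delay in delays:
        attempts += 1
        if start_child():
            return "started", attempts
        if stop_event.wait(delay):  # True as soon as the event is set
            return "stopped", attempts
    return "gave up", attempts
```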

 Note that the Zope Python install also has a sitecustomize.py that
 munges sys.path in order to get things set up properly.

Right!  That is the magic I was referring to and had not yet found.

 Others have
 claimed this is unnecessary and that the work that gets done in there
 could be done in the service code.  It's a bit of a mess.

I agree with the others :)  mkzopeinstance goes to lengths to provide all
relevant information, and the service code does not take full advantage of
that.

It should be possible for Zope to work with a standard, external
Python/pywin32 installation.  I'm not suggesting this become a distribution
option, but still a worthy goal; for developers, and to keep us honest
<wink>.  Note that with a few tweaks, Zope *does* build and work with an
external Python/pywin32.

 At one point I flailed trying to make the child process inherit its
 environment from the parent, and plastered over the problem with various
 sys.path and PYTHONPATH and other environment variable settings.  The
 current situation is a result.  Some guidance here would be helpful.

The child process *does* inherit the parent service's environment.  Hence
adding os.environ[] entries in the service does set them for subsequently
created children.

ie, setting os.environ["PYTHONPATH"] = SOFTWARE_HOME in the service main code
appears to avoid the sitecustomize.py requirement - the child process *does*
see the new PYTHONPATH.
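
A small sketch of that behaviour, with a hypothetical helper and an
illustrative string standing in for SOFTWARE_HOME: the variable is set in the
parent's `os.environ`, and a child started afterwards sees it.

```python
import os
import subprocess


def spawn_with_pythonpath(software_home, argv):
    """Set PYTHONPATH in this (service) process, then start a child.

    The child receives a copy of os.environ at creation time, so it sees
    the new PYTHONPATH without any sitecustomize.py tricks.
    """
    os.environ["PYTHONPATH"] = software_home
    return subprocess.check_output(argv)
```

Passing `env=dict(os.environ, PYTHONPATH=software_home)` to `subprocess`
would give the child the same view without changing the service's own
environment.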

 I'm a Windows signal idiot.

No - Windows is a signal idiot :)

 Is there a way that we can make the Zope
 process capture Windows signals and when the Windows equivalent of
 SIGTERM is sent to the process to shut it down cleanly?  This is how
 it works on UNIX

That makes sense, but:

 but we circumvent trying to listen for signals on
 Windows entirely at startup.

Can you explain the above?  Do you mean that on Windows, you take no special
signal actions, as demonstrated in WindowsZopeStarter.registerSignals?

I'll see what I can come up with though.

 Note that the UNIX environment has a lot of additional niceties due to
 responses to signals (like logfile rotation) that Windows doesn't have now,
 which tends to have the effect of relegating Windows to a second-class
 platform on which to run a production Zope instance.

Windows certainly has these features available - they are just not always
spelt the same as they are on Unix.  Sometimes they are even better <wink>.
So there seems to be a chicken-and-egg problem - users will always consider
it second class until Zope itself starts considering it first class.  This
is an observation, not a criticism :)

Mark.



RE: [Zope-dev] Possible Windows Service improvements.

2004-08-04 Thread Chris McDonough
On Wed, 2004-08-04 at 19:56, Mark Hammond wrote:

 On Windows however, runzope is executed as a *child* of the service.  If
 the child fatally fails, the service itself is still reporting success, and
 hopelessly attempting a restart.  The service needs to know if the child
 fatally failed.  This doesn't apply on Unix as runzope *is* the 'service',
 not the child of the service.

FWIW, the equivalent of the service manager code under UNIX is zopectl
rather than runzope.  zopectl has its own failure detection and backoff
algorithm that's a bit more complex than the Windows service code of the
same ilk.

Actually that's not entirely true: zopectl is a client that attempts
to communicate with a separate daemonizing process via a UNIX domain
socket.   The daemonizing process is really the parent of the Zope
process (it just invokes runzope).

The majority of code for this is in lib/python/zdaemon/zdrun.py.  The
mainloop that implements the backoff algorithm is in the runforever
method of the Daemonizer class in that file, and the thing that decides
not to restart it if it exits with a known error code is in the
reportstatus method of the same class.  You probably care about none
of this, but it's there if you do. ;-)

 By adding a layer around run.py, I believe we could arrange for these fatal
 errors to be handled with a special return code.  Alternatively, if Zope
 itself never returns an error code of 1 (one), then we could use that -
 Python itself returns this for unhandled exceptions.  That seems dangerous
 though.

When you say these errors above, do you mean any unhandled exception? 
If so (and any nonzero exit code indicated a startup failure), would we
really need cooperation from run.py for this? It seems like it could be
done entirely inside of SvcDoRun.

OTOH, if the real problem is that you can't stop the service from
fruitlessly restarting itself in the face of an insoluble error because
of the blocking sleep, it seems like you already solved that.  Would it
be a reasonable strategy to leave the backoff stupidity as-is if you were
able to stop the service from flailing via the service manager CP applet
and if it didn't report successful startup until the child actually
starts successfully?
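
One way the service could avoid reporting success until the child has really
started is a readiness handshake. The sketch below assumes the child prints a
hypothetical marker line (`STARTUP-COMPLETE`) once startup has finished; Zope
does not do this today, so both the marker and the helper are illustrative.

```python
import subprocess

# Hypothetical handshake line; Zope does not print anything like this today.
READY_MARKER = b"STARTUP-COMPLETE"


def start_and_wait_ready(argv):
    """Start the child and block until it reports readiness or dies.

    Returns (proc, ready). ready is False if the child's output ended
    without the marker, which the service can report as a failed start
    instead of claiming success.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    for line in proc.stdout:
        if line.strip() == READY_MARKER:
            return proc, True
    proc.wait()
    return proc, False
```

Once `ready` is True the parent must keep draining the pipe, or the child will
eventually block writing to it.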

 Can you offer some advice here?
 * Is an exit-code of 1 suitable for a fatal error? If so, this requires no
 changes to the child process.  However, I assume it is not suitable.
 * Is a special exit code, generated by a wrapper to run.py (eg, run_svc.py)
 suitable?  If so, what value do you recommend?
 * Should some Win32 specific, robust IPC mechanism be investigated?  This
 would cut deeper into a run.py wrapper, and obviously is not a general
 solution.

I guess I'll need to wait for the confusion evidenced by my last
question to clear up before I'd be able to venture an answer to that.
 
  On another note, I'd really prefer to work out a general facility that
  can be used with any Python program, including both Zope 2 and Zope 3.
 
 The problem at the moment is that our facilities are *too* general - ie,
 without some coordination between the parent and child, the parent must
 guess.  The simplest coordination does seem to be process exit code, but
 that seems fragile.  But whatever coordination is chosen, any Python program
 that was willing to play the coordination game could use it.  The simpler
 this game is to play, the more fragile the system is (ie, just using exit codes
 is simple, but fragile; using other IPC mechanisms could be made robust, but
 is not simple - especially not in a platform independent way.)
 
 Mark.
 



RE: [Zope-dev] Possible Windows Service improvements.

2004-08-04 Thread Chris McDonough
On Wed, 2004-08-04 at 20:24, Mark Hammond wrote:
  This is a good idea.  FWIW, I believe the Zope HEAD already has some work
  done towards this (in lib/python/nt_svcutils/service.py), although the child
  output goes to a logfile instead of the event log.  It would be nice to make
  the output go to the event log and then backport this to 2.7.
 
 I started from HEAD, and indeed did find a good attempt at making this work.
 However, it was disabled by default, wrote to a file, and had issues
 with blocking reads.  I fixed this to capture the output in memory (but not
 all of it - just the tail), and to use a single pipe for both stdout and
 stderr.

Great!

  Yup.  The reason it retry-restarts is because it's simple and stupid and the
  reason it doesn't respond to shutdown requests during a retry is because the
  service code sleeps for the backoff interval after an unsuccessful startup.
  Any async requests that happen in the meantime are blocked waiting for this
  sleep to end.  I'm not quite sure how to do that better.
 
 FWIW, the way to do that better was to use WaitForSingleObject(hStopEvent,
 timeout_period*1000).

Is this a more-or-less direct replacement for the usage of time.sleep()
in the current incarnation of SvcDoRun?

Do you think you might be able to attach your patched service.py (and
related files) to a future mail?

  Others have
  claimed this is unnecessary and that the work that gets done in there
  could be done in the service code.  It's a bit of a mess.
 
 I agree with the others :)  mkzopeinstance goes to lengths to provide all
 relevant information, and the service code does not take full advantage of
 that.
 
 It should be possible for Zope to work with a standard, external
 Python/pywin32 installation.  I'm not suggesting this become a distribution
 option, but still a worthy goal; for developers, and to keep us honest
 <wink>.  Note that with a few tweaks, Zope *does* build and work with an
 external Python/pywin32.

Yup, it should be possible; there's really nothing special about the
Zope-supplied Python, we just have a history of attempting to make Zope
install under Windows without requiring that the user know anything
about Python, and the easiest way to do this is to ship our own.

It's actually impossible to *stop* Zope from using pieces of another
system-installed Python, given that it apparently looks unconditionally
to the registry to find some things.
(See the thread revolving around
http://mail.zope.org/pipermail/zope-dev/2004-March/021979.html for the
details).  This is actually not a desirable feature.

  At one point I flailed trying to make the child process inherit its
  environment from the parent, and plastered over the problem with various
  sys.path and PYTHONPATH and other environment variable settings.  The
  current situation is a result.  Some guidance here would be helpful.
 
 The child process *does* inherit the parent service's environment.  Hence
 adding os.environ[] entries in the service does set them for subsequently
 created children.
 
 ie, setting os.environ["PYTHONPATH"] = SOFTWARE_HOME in the service main code
 appears to avoid the sitecustomize.py requirement - the child process *does*
 see the new PYTHONPATH.

Yup, that confirms what the thread above says too.

  I'm a Windows signal idiot.
 
 No - Windows is a signal idiot :)

Actually, signals are idiotic, but... ;-)

  Is there a way that we can make the Zope
  process capture Windows signals and when the Windows equivalent of
  SIGTERM is sent to the process to shut it down cleanly?  This is how
  it works on UNIX
 
 That makes sense, but:
 
  but we circumvent trying to listen for signals on
  Windows entirely at startup.
 
 Can you explain the above?  Do you mean that on Windows, you take no special
 signal actions, as demonstrated in WindowsZopeStarter.registerSignals?

That's just what I mean.

The signaling mechanism doesn't necessarily need to literally be
signals, I just don't know of another way to asynchronously influence
the state of a running process (UNIX or Windows).  Maybe Zope needs to
cooperate a bit, which would be fine.
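
One cooperative mechanism that needs no signals and behaves the same way on
Windows and Unix is to treat end-of-file on the child's stdin as a shutdown
request: the parent holds a pipe to the child's stdin and closes it to ask for
a clean stop. A minimal sketch, with the child program inlined as a string and
every name hypothetical:

```python
import subprocess
import sys

# Hypothetical child program: a watcher thread blocks on stdin and requests
# a clean shutdown when the parent closes its end of the pipe.
CHILD_SOURCE = r"""
import sys, threading
stop = threading.Event()

def watch_stdin():
    sys.stdin.read()   # returns only at EOF, i.e. when the parent closes the pipe
    stop.set()

threading.Thread(target=watch_stdin, daemon=True).start()
stop.wait()            # stand-in for the server's main loop
print("clean shutdown")
"""


def start_child():
    return subprocess.Popen([sys.executable, "-c", CHILD_SOURCE],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)


def request_stop(proc, timeout=10):
    """Ask the child to stop by closing its stdin; return (returncode, output)."""
    proc.stdin.close()
    output = proc.stdout.read()
    return proc.wait(timeout), output
```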

 I'll see what I can come up with though.
 
  Note that the UNIX environment has a lot of additional niceties due to
  responses to signals (like logfile rotation) that Windows doesn't have now,
  which tends to have the effect of relegating Windows to a second-class
  platform on which to run a production Zope instance.
 
 Windows certainly has these features available - they are just not always
 spelt the same as they are on Unix.  Sometimes they are even better <wink>.
 So there seems to be a chicken-and-egg problem - users will always consider
 it second class until Zope itself starts considering it first class.  This
 is an observation, not a criticism :)

Sure.  It probably won't take much work to match the features that are
available under UNIX.

FWIW, Tim (most recently), I (second-most-recently), and Brian Lloyd
(least recently) are 

RE: [Zope-dev] Possible Windows Service improvements.

2004-08-04 Thread Mark Hammond
 FWIW, the equivalent of the service manager code under UNIX is zopectl
 rather than runzope.

Ahh, ok.  That makes more sense now, thanks.

 When you say these errors above, do you mean any unhandled exception?
 If so (and any nonzero exit code indicated a startup failure), would we
 really need cooperation from run.py for this? It seems like it could be
 done entirely inside of SvcDoRun.

Yes, that is what I tried to say here:

  * Is an exit-code of 1 suitable for a fatal error? If so,
  this requires no changes to the child process.  However, I
  assume it is not suitable.

As the exit code of 1 is the only way I see to detect an unhandled exception
in the child.
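
A quick way to see the ambiguity: from the parent's side, an unhandled
exception and a deliberate `sys.exit(1)` are indistinguishable, since CPython
exits with status 1 in both cases. The helper below is illustrative only.

```python
import subprocess
import sys


def exit_code_of(source):
    """Run a Python snippet in a child interpreter and return its exit code."""
    return subprocess.call([sys.executable, "-c", source],
                           stderr=subprocess.DEVNULL)


# An unhandled exception and a deliberate sys.exit(1) look identical
# to the parent: both end with exit code 1.
crash = exit_code_of("raise RuntimeError('startup failed')")
deliberate = exit_code_of("import sys; sys.exit(1)")
```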

 OTOH, if the real problem is that you can't stop the service from
 fruitlessly restarting itself in the face of an insoluble error because
 of the blocking sleep, it seems like you already solved that.

Yeah, it really isn't a huge issue.  It leaves us with the fact that Zope
always reports startup success, even in hopeless cases.  This isn't a real
problem - just a nicety - along the lines of making Zope a first-class
service on Windows <wink>.

I'm happy to let that idea rest.

To your other mail:

 Do you think you might be able to attach your patched service.py (and
 related files) to a future mail?

I'll attach the patch in private mail.  I just tried attaching it to
collector 1020, but I don't have permission.  It has the redirection and the
sleep fix, but no changes to the startup strategy.

 It's actually impossible to *stop* Zope from using pieces of another
 system-installed Python, given that it apparently looks
 unconditionally to the registry to find some things.

Yes, and it is my fault that code even exists.  As far as I know, win32all
was the only product that registered modules in this way, and later versions
no longer do.  I'd be +1 on getting this removed from Python.

Thanks,

Mark.
