Lennart Poettering <lennart <at> poettering.net> writes:
> > I was wondering if there is some kind of guideline about whether
> > packaged .service files in Fedora, etc. should specify Restart=,
> > RestartSec=, etc.
[...]
> There's currently no policy on this, but I generally do believe it would
> make a lot of sense to automatically respawn most services when they
> crash.

I admin a couple of (web) servers too, and I have to say I agree. I think I represent the GNOME audience of sysadmins - the damn thing should just work out of the box and, if it's broken, fix itself. Let me just point out some things I've learned over the years with Monit.

Conceptually, it's less complicated than one might think. First, one has to accept that services sometimes die for no good reason (a cosmic ray, or a weird thread-related bug in some Apache module that happens once a year). If you do not set up baby-sitting, you'll be fine for half a year, and then you're toast. I think lazy sysadmins like me all learn it the hard way. I honestly believe it's a big bug in a distro that you can install Apache and not get baby-sitting out of the box.

Now when a problem happens, either it is fixable by a restart (perfect, almost no downtime), it disappears by itself (some network-related problems), or human intervention is required (I or some other developer made a mistake, or there's a hardware problem). Trying a restart is usually safe once the service has failed.

If, and only if, human intervention is required, I want a notification (i.e. an email). Ideally, only one. Logs are good for diagnostics, but I don't want to check logs regularly; I want the system to tell me what's wrong when it needs help.

Of course, if something fails all the time, it's probably going to require human intervention even if a restart fixes it intermittently. Say Apache died 10 times today - that's abnormal in my book, and I would like to get a notification.

There are also some extra opportunities for checking whether the service is working, e.g. you can check a web app with an HTTP request and see if you get a 200 back. In my experience, these are nice as a kind of unit test so you don't accidentally break something after an upgrade or a change, but they find fewer errors than the simple "is the process running" check.

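To make the policy above a bit more concrete, here is a rough sketch in Python. It is only an illustration of the decision logic (always try a restart, notify once when a restart fails or when the service has died too often within a day); the class name, method names and thresholds are my own assumptions and do not correspond to anything Monit or systemd actually provides.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BabysitPolicy:
    """Hypothetical baby-sitting policy; all names here are illustrative."""

    max_deaths: int = 10             # deaths tolerated inside the window
    window_seconds: float = 86400.0  # "today", taken here as a sliding 24 hours
    _deaths: deque = field(default_factory=deque, repr=False)
    _notified: bool = field(default=False, repr=False)

    def _notify_once(self) -> list[str]:
        # Ideally the admin gets only one notification per incident.
        if self._notified:
            return []
        self._notified = True
        return ["notify"]

    def on_death(self, now: float) -> list[str]:
        """Actions to take when the process is found dead at time `now`."""
        # Forget deaths that have fallen out of the window.
        while self._deaths and now - self._deaths[0] >= self.window_seconds:
            self._deaths.popleft()
        self._deaths.append(now)

        # Trying a restart is usually safe once the service has failed.
        actions = ["restart"]
        # Dying all the time is abnormal even if restarts keep helping.
        if len(self._deaths) >= self.max_deaths:
            actions += self._notify_once()
        return actions

    def on_restart_failed(self) -> list[str]:
        """The restart did not help, so a human has to look at it."""
        return self._notify_once()

    def acknowledge(self) -> None:
        """The admin has dealt with the problem; start counting afresh."""
        self._deaths.clear()
        self._notified = False
```

A supervisor loop would call on_death() whenever the "is the process running" check fails, carry out the returned actions, and call on_restart_failed() if the service does not come back up. Keeping the policy free of any process handling means it can be tried out with plain timestamps.
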
Anyway, I'm not saying systemd should do all this by itself; from a web server admin's perspective I just think it would be neat if we could move closer to this experience with a default distro server install. Of course, people will want to tweak things, and that's fine, but that's no argument against a good default setup.

-- 
Ole Laursen
http://people.iola.dk/olau/

_______________________________________________
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel