Re: patch: sv check should wait when svrun is not ready

Avery Payne Tue, 17 Feb 2015 16:20:21 -0800


On 2/17/2015 11:02 AM, Buck Evan wrote:

I think there's only three cases here:
1. Users that would have gotten immediate failure, and no amount of spinning would help. These users will see their error delayed by $SVWAIT seconds, but no other difference.
2. Users that would have gotten immediate failure, but could have gotten a success within $SVWAIT seconds. All of these users will of course be glad of the change.
3. Users that would not have gotten immediate failure. None of these users will see the slightest change in behavior.

Do you have a particular scenario in mind when you mention "breaking lots of existing installations elsewhere due to a default behavior change"? I don't see that there is any case this change would break.

I am not so much thinking of a specific case as I am looking at it from an integration perspective. I ask that you indulge me for a moment, and let me diverge from the discussion so I can clarify things.

My background is in maintaining business software. My employer has the source code to their ERP system and I make large and small modifications to adapt to changing business needs. During the process of working on "legacy code" in a "legacy language", I have to be mindful that there are side-effects to each change; I have to look at it from a viewpoint of "what is everything else in the system expecting when this code is called". This means thinking in terms of code-as-API, so that calls elsewhere don't break. Yes, I am aware of unit tests, etc., but trust me when I say it's not an option for the environment. So that means lots and lots of careful testing by hand, and being very mindful of how things fit together.

With that viewpoint in mind, let's turn back to my words, which were admittedly overstated. When I said "breaking lots of existing installations" I was trying to describe a point of view, for I was looking at it from a pragmatic standpoint of "if there is code out there that expects behavior X, but is given behavior Y, then the probability of something breaking increases". From my point of view, when you run "sv check (something)", that's no different than making an API call because "sv check (something)" typically happens inside of a script, which in turn implies a language and environment. The behavior of the "sv check" call, and specifically the side-effects of it, are taken into consideration "elsewhere"; I can't say where else because I can't see specific installations, and it's entirely possible that there is *nothing* out there that would be broken, and I'm writing this all for naught. But the point remains - the API is set, the behavior of the "call" is set, and deviating from that requires that everyone downstream make changes to ensure that their scripts don't break.

So, I think it becomes a question of "can I guarantee that the side effect created by the change will not adversely impact something else, since I can't directly observe what will be impacted?" Which is why I suggested the option switch. Introducing a new switch means that the existing behavior will be kept, but we can now use the new behavior by explicitly asking for it. In effect, we're extending our API without breaking existing "calls from legacy code".
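
As a rough sketch of what "explicitly asking for it" could look like from the calling script's side, here is a small shell wrapper that fails fast by default and only retries when told to. This is not a patch to sv: the function name check_dep, the --wait flag, and the CHECK_CMD variable are all hypothetical names made up for illustration, and CHECK_CMD stands in for whatever command performs a single check attempt.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of runit: fail fast unless --wait is given.
# CHECK_CMD is an assumed stand-in for a command that makes one check attempt.

check_dep() {
    wait_secs=0
    if [ "$1" = "--wait" ]; then
        # Opt-in behavior: keep retrying for up to this many seconds.
        wait_secs=$2
        shift 2
    fi
    svc=$1
    elapsed=0
    while :; do
        # One check attempt; success returns immediately.
        if ${CHECK_CMD:-sv check} "$svc" >/dev/null 2>&1; then
            return 0
        fi
        # Without --wait, wait_secs is 0, so the first miss is a failure.
        [ "$elapsed" -ge "$wait_secs" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

Existing callers that write "check_dep child-service" keep the fail-fast behavior; only a caller that writes "check_dep --wait 5 child-service" gets the retry loop.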

The only example I could give would be my own project at the moment, although what follows is admittedly a weak argument. Blocking-on-check would pretty much destroy the script work I've done for peer-based dependency management, because a single dependency would cause the "parent" service to hang while it waited for the "child" to come up. This happens because the use of "sv check (child)" follows the convention of "check, and either succeed fast or fail fast", and the parent's script is written with the goal of exiting out because of a child's failure. Each fail-to-start is logged in the parent, so it's clear that parent service X failed because child service Y was not confirmed as running. Without that fast-fail, the logged hint never occurs; the sysadmin now has to figure out which of three possible services in a dependency chain is causing the hang. While this is implemented differently from other installations, there are known cases similar to what I am doing, where people have ./run scripts like this:


#!/bin/sh
sv check child-service || exit 1
exec parent-service
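
A slightly longer sketch of the same pattern, showing the "logged hint" described above: each child is checked once, and the first one that is not confirmed as running is named on stderr before the parent gives up. The function name require_children, the CHECK_CMD variable, and the wording of the log message are assumptions for illustration, not taken from any existing project.

```shell
#!/bin/sh
# Hypothetical helper for a parent ./run script.
# CHECK_CMD is an assumed stand-in for the check command; it defaults to "sv check".

require_children() {
    for child in "$@"; do
        if ! ${CHECK_CMD:-sv check} "$child" >/dev/null 2>&1; then
            # Name the failing child so the log shows why the parent exited.
            echo "parent-service: child $child not confirmed as running, exiting" >&2
            return 1
        fi
    done
    return 0
}
```

In a ./run script this would be used as "require_children child-service || exit 1" followed by "exec parent-service". The hint is only useful if the check returns promptly; if the check blocks, the message is delayed or never printed.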

A secondary example would be that the existing set of scripts in the project are written with an eye towards supporting three environments, which is possible due to their similar behavior. This consistency makes the project possible for daemontools and s6, as well as runit. A change in runit's behavior implies that I can no longer rely on that consistency.

Perhaps I am understanding the environment clearly but misunderstanding the intent of the change. If I am not grokking your intentions, just send a short note to the effect of "sorry, wrong idea" and I'll stop. :)
