On 2/17/2015 11:02 AM, Buck Evan wrote:
> I think there's only three cases here:
>
> 1. Users that would have gotten immediate failure, and no amount of spinning would help. These users will see their error delayed by $SVWAIT seconds, but no other difference.
> 2. Users that would have gotten immediate failure, but could have gotten a success within $SVWAIT seconds. All of these users will of course be glad of the change.
> 3. Users that would not have gotten immediate failure. None of these users will see the slightest change in behavior.
>
> Do you have a particular scenario in mind when you mention "breaking lots of existing installations elsewhere due to a default behavior change"? I don't see that there is any case this change would break.

I am not so much thinking of a specific case as I am looking at it from an integration perspective. I ask that you indulge me for a moment, and let me diverge from the discussion so I can clarify things.

My background is in maintaining business software. My employer has the source code to their ERP system, and I make large and small modifications to adapt it to changing business needs. When working on "legacy code" in a "legacy language", I have to be mindful that every change has side-effects; I have to look at it from the viewpoint of "what is everything else in the system expecting when this code is called?" This means thinking in terms of code-as-API, so that calls elsewhere don't break. Yes, I am aware of unit tests, etc., but trust me when I say they are not an option in this environment. That means lots and lots of careful testing by hand, and being very mindful of how things fit together.

With that viewpoint in mind, let's turn back to my words, which were admittedly overstated. When I said "breaking lots of existing installations" I was trying to describe a point of view: I was looking at it from the pragmatic standpoint of "if there is code out there that expects behavior X, but is given behavior Y, then the probability of something breaking increases". From my point of view, running "sv check (something)" is no different from making an API call, because "sv check (something)" typically happens inside a script, which in turn implies a language and environment. The behavior of the "sv check" call, and specifically its side-effects, are taken into consideration "elsewhere". I can't say where else because I can't see specific installations, and it's entirely possible that there is *nothing* out there that would break, and I'm writing this all for naught. But the point remains: the API is set, the behavior of the "call" is set, and deviating from that requires everyone downstream to make changes to ensure that their scripts don't break.

So, I think it becomes a question of "can I guarantee that the side effect created by the change will not adversely impact something else, since I can't directly observe what will be impacted?" Which is why I suggested the option switch. Introducing a new switch means that the existing behavior will be kept, but we can now use the new behavior by explicitly asking for it. In effect, we're extending our API without breaking existing "calls from legacy code".
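To make the opt-in idea concrete, here is a minimal sketch in plain sh. The wrapper name `check_service`, its `-w SECONDS` flag, and the convention of passing the check as a command are all hypothetical, not runit's actual interface. The point is only that the default path still succeeds or fails immediately, and waiting happens only when the caller explicitly asks for it.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of runit): check_service [-w SECONDS] CMD [ARGS...]
# Without -w: run the check once and return its status (fast fail,
# i.e. the existing behavior that current scripts rely on).
# With -w: retry once per second until the check succeeds or SECONDS
# have elapsed (the new behavior, only when explicitly requested).
check_service() {
    max_wait=0
    if [ "$1" = "-w" ]; then
        max_wait=$2
        shift 2
    fi
    "$@" && return 0              # succeed fast
    while [ "$max_wait" -gt 0 ]; do
        sleep 1
        max_wait=$((max_wait - 1))
        "$@" && return 0          # became ready while we waited
    done
    return 1                      # fail, immediately or after the wait
}
```

With this, `check_service sv check child-service || exit 1` behaves like the existing one-shot call, while `check_service -w 7 sv check child-service` opts in to waiting. Existing callers never see the delay because they never pass the flag.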

The only example I could give would be my own project at the moment, although what follows is admittedly a weak argument. Blocking-on-check would pretty much destroy the script work I've done for peer-based dependency management, because a single dependency would cause the "parent" service to hang while it waited for the "child" to come up. This happens because the use of "sv check (child)" follows the convention of "check, and either succeed fast or fail fast", and the parent's script is written to exit when a child fails. Each fail-to-start is logged in the parent, so it's clear that parent service X failed because child service Y was not confirmed as running. Without that fast-fail, the logged hint never appears; the sysadmin now has to figure out which of three possible services in a dependency chain is causing the hang. While my implementation differs from other installations, there are known cases similar to what I am doing, where people have ./run scripts like this:

#!/bin/sh
sv check child-service || exit 1
exec parent-service
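A slightly fuller sketch of the same pattern, showing where the fast-fail log hint comes from: the parent checks each dependency in turn and, on the first failure, names that child on stderr before exiting. The service names are placeholders, and the `CHECK_CMD` variable (defaulting to `sv check`) and `check_deps` function are hypothetical, added only so the logic can be exercised without a live runit install.

```shell
#!/bin/sh
# Hypothetical helper for a parent ./run script. CHECK_CMD defaults to
# "sv check"; it is left unquoted at the call site on purpose so that
# "sv check" splits into a command and its first argument.
CHECK_CMD=${CHECK_CMD:-"sv check"}

# check_deps CHILD...: return 0 if every child checks out; otherwise
# log the first child that failed to stderr and return 1 (fail fast).
check_deps() {
    for child in "$@"; do
        if ! $CHECK_CMD "$child" >/dev/null 2>&1; then
            echo "parent: dependency '$child' not confirmed running" >&2
            return 1
        fi
    done
    return 0
}

# In a real ./run script (placeholder service names):
#   exec 2>&1                          # send stderr to the service's log
#   check_deps child-a child-b || exit 1
#   exec parent-service
```

Because each check returns immediately, the log line points straight at the missing child; if the check blocked instead, the parent would simply hang with nothing logged.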

A secondary example would be that the existing set of scripts in the project is written with an eye towards supporting three environments, which is possible because they behave similarly. This consistency is what makes the project work with daemontools and s6 as well as runit. A change in runit's behavior means I can no longer rely on that consistency.

Perhaps I am understanding the environment clearly but misunderstanding the intent of the change. If I am not grokking your intentions, just send a short note to the effect of "sorry, wrong idea" and I'll stop. :)
