On 5/14/2015 3:25 PM, Jonathan de Boyne Pollard wrote:
> The most widespread general purpose practice for "breaking" (i.e. avoiding) this kind of ordering is of course opening server sockets early. Client and server then don't need to be so strongly ordered.
This is where I've resisted using sockets. Not because they are bad - they are not. I've resisted because they are difficult to make 100% portable between environments. Let me explain.

First, there is the question of "what environment am I running in?" This breaks down into several sub-questions: "what variable settings do I have", "what does my directory structure look like", and "what tools are available". That last one - what tools are installed - is what kills me. While I can be assured that the bulk of a framework will be present, there is no guarantee that I will have the UCSPI tools around.

Let's say I decide to support only frameworks that package UCSPI out of the box, so that socket activation is guaranteed to be possible - ignoring the fact that I just jettisoned several other frameworks in the process simply to support this one feature. So we press on with the design assumption "it is safe to assume that UCSPI is installed and therefore can be encoded into run scripts". Now we have another problem - integration. Using sockets means I need a well-defined namespace to locate the sockets themselves, and that means a well-known area in the filesystem, because the filesystem is what organizes the namespace. So where do the sockets live? /var/run? /run? /var/sockets? /insert-my-own-flavor-here?

Let's take it a step further: I decide on some name - I'll pull one out of a hat and simply call it /var/run/ucspi-sockets - and ignore all of the toes I'm stepping on in the process, including the possibility that some distribution already has that name reserved. Now that I have (a) the assurance that UCSPI is supported and (b) a place for UCSPI to get its groove on, we have the next problem: getting all of the services to play nice within this context. Do I write everything to depend on UCSPI sockets so that I get automatic blocking? Do I make it entirely the choice of the administrator to activate this feature via a "switch" that can be thrown? Or is it used for edge cases only? Getting consistency out of it would be great, but then I back the admin into a corner with "this is design policy and you get it, like it or not". If I go with admin-controlled, that means yet another code path in an already bloaty ./run.sh script that may or may not activate; the admin has their day with it, but the number of potential problem vectors grows. Or I can hybridize it and do it for edge cases only, but now the admin is left scratching their head asking "why is it here, but not there? It's not consistent - what were they thinking?"

Personally, I would do the following:

* Create a socket directory in whatever passes for /var/run, and name it /var/run/ucspi-sockets.

* For each service definition that has active sockets, there would be /var/run/ucspi-sockets/{directory}, where {directory} is the name of the service, and inside of that a socket file named /var/run/ucspi-sockets/{directory}/socket. That is about as generic and "safe" as I can get, given that /var/run on Linux is in some cases a symlink that points to /run. It is consistent - the admin knows where to find the socket every single time, and is assured that the socket inside the directory is the one that connects to the service. It is a reasonable name - the odds of /var/run/ucspi-sockets being taken for anything else are fairly low, and the odds of me stepping on top of some other construct in that directory are low as well, because any existing sub-directory in that location is probably there for the same reason.

* Make socket activation an admin-controlled feature that is disabled by default. You want socket activation, you ask for it first. The admin gets control, I get more headache, and mostly everyone can be happy.
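As a sketch of that layout, the per-service directory could be prepared like this. The service name "mydaemon" is made up, and /tmp stands in for /var/run/ucspi-sockets only so the sketch can run without root; the socket itself would be created by whatever UCSPI server (e.g. s6-ipcserver) listens there:

```shell
# Hypothetical layout for one service; "mydaemon" is an illustrative name
# and /tmp is used here only so this can run unprivileged.
base=/tmp/ucspi-sockets          # stands in for /var/run/ucspi-sockets
svc=mydaemon
mkdir -p "$base/$svc"
# A UCSPI server would create and listen on the well-known path:
echo "$base/$svc/socket"
```

The point of the fixed {directory}/socket shape is that clients never have to guess: given a service name, the socket path is fully determined.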

We've answered the "where" and the "when"; now we are left with the "how". I suspect that you and Laurent would argue that I shouldn't be using sockets inside of ./run at all - that it belongs in the layer above, in service management proper - meaning the entire construct shouldn't exist at that level. Which means I shouldn't even support it inside of ./run. Which means I can't package this feature in my scripts. And we're back to square one.

Let's say I ignore this advice (at my own peril) and provide support for those frameworks that don't have external management layers on top of them. This was the entire reason I wrote my silly peer-level dependency support to begin with, so that "other folks" would have one or two of these features available to them, even though they don't have external management like nosh or s6-rc or anopa. It's a poor man's solution, but I'm not presenting it any other way, you get what you see. So doing UCSPI sockets as an optional feature is probably OK, as long as it's clear that I'm not giving you full management out of the box.

If I were to write socket support in, I would guess it would augment the existing ./needs approach by checking for a socket first (when that feature is enabled), and then, failing to find one, proceeding to peer-level dependency management (when that is enabled). That gives four combinations:

* "No sockets and no peer dependencies" - the default out-of-box experience, and the one that is 100% compatible with all frameworks. Nothing is checked, everything run-loops as expected, and you can drop the ./run scripts into things like nosh or s6-rc or anopa with confidence.

* "Sockets but no peer dependencies" - a check for a socket is performed; if one is present it is used, otherwise the service run-loops.

* "No sockets but peer dependencies" - no socket check is performed, but the script walks the dependency tree and starts things that way, run-looping as needed.

* "Sockets and peer dependencies" - if a socket is found it is used; if not, peer dependencies are used; and if either fails, it run-loops.

This degrades gracefully: sockets receive preferential treatment, falling back to peer resolution when the feature is enabled but no socket is available, and if you accidentally enable a feature where it isn't needed or wanted, things don't blow up horribly - the end result is a run-loop that can be caught and controlled.
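The fallback order could be sketched in plain sh roughly as follows. The switch names SOCKETS and PEERDEPS, the service name "myservice", and the idea of signalling the chosen mode by echoing a word are all illustrative stand-ins, not a real API:

```shell
#!/bin/sh
# Sketch only: switch names and the socket path are assumptions.
SOCKETS=0      # admin switch: check for a UCSPI socket first
PEERDEPS=0     # admin switch: fall back to peer-level dependency handling
sock=/var/run/ucspi-sockets/myservice/socket   # "myservice" is hypothetical

start_mode() {
    if [ "$SOCKETS" = 1 ] && [ -S "$sock" ]; then
        echo socket       # a live socket exists: connect through it
    elif [ "$PEERDEPS" = 1 ]; then
        echo peer-deps    # no usable socket: walk ./needs and start peers
    else
        echo run-loop     # default: just exec and let the supervisor retry
    fi
}

start_mode
```

With both switches off this prints "run-loop", which is the safe default: a failure in either optional path lands back in ordinary supervision behavior.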

Both features would be selectable by the admin, and both are independent of each other - enabling neither, one, the other, or both are all possibilities. In situations where dependency management is externally handled, you would simply keep both features turned off. In the case of sockets, things would launch and dependent services would block on the socket. In the case of peer dependencies, the home user - who doesn't give two cares about any of this and just "wants it to work" - gets what they want: ease of use. If you want the full belt-and-suspenders experience, turn on both switches, sit back, and enjoy the light show. Everyone wins.

Of course, there are no immediate plans to support UCSPI, although I've already made the mistake of baking in some support with a bcron definition. I think I need to go back and revisit that entry...

As a side note, I'm beginning to suspect that the desire for "true parallel startup" is more a mirage caused by desire than a product of design - more an ideal we aspire to than something that was actually thought through. If you have sequenced dependencies, can you truly gain much time by attempting parallel startup? Is the gain worth the effort? Can we even speed things up when fsck is deemed mandatory by the admin for a given situation? Questions like these make me wonder if this is really a feasible feature at all.
