Greg Padgett has posted comments on this change. Change subject: supervdsm: don't let _runAs return early due to EINTR ......................................................................
Patch Set 1: (6 comments) Thanks Nir, comments inline https://gerrit.ovirt.org/#/c/45752/1//COMMIT_MSG Commit Message: Line 13: ending child command execution ending in unintentional failure. Line 14: Line 15: This is particularly prevalant when subsequent _runAs executions are Line 16: performed. Termination of the first will result in a SIGCHLD which Line 17: interrupts polling for the second, as exposed by commit 777c36d6. > So this is a regression triggered by the new check in 777c36d6? Yes, you could say that. It revealed a race that seems to happen every time in hosted engine setup but (in my case at least) less frequently when setting up a file domain on a regular host. Line 18: Line 19: Because poll will return early without an exception if the low-level Line 20: call is interrupted, NoIntrCall() cannot be used and thus a new helper Line 21: function is provided that retries based on the time elapsed since the Line 23: Line 24: PEP-475 will handle this in Python 3.5, but in the meantime this Line 25: workaround is needed. Line 26: Line 27: Change-Id: I25af73a8fe67e7bc434f60a7d8492e33dc3b2ffa > We want to backport this to 3.6 - right? Done Line 28: Bug-Url: https://bugzilla.redhat.com/1259310 https://gerrit.ovirt.org/#/c/45752/1/lib/vdsm/utils.py File lib/vdsm/utils.py: Line 353: This is a workaround until we get the PEP-475 fix for EINTR. It Line 354: ensures that a multiprocessing.connection.poll() will not return Line 355: before the timeout due to an interruption. Line 356: """ Line 357: end = monotonic_time() + timeout > Lets call it deadline for consistently with other code using this pattern. Done Line 358: remaining = timeout Line 359: Line 360: while not mpConnection.poll(remaining): Line 361: remaining = end - monotonic_time() Line 359: Line 360: while not mpConnection.poll(remaining): Line 361: remaining = end - monotonic_time() Line 362: if remaining > 0: Line 363: continue > Why not: Done Line 364: return False Line 365: Line 366: return True Line 367: https://gerrit.ovirt.org/#/c/45752/1/vdsm/supervdsmServer File vdsm/supervdsmServer: Line 244: proc.start() Line 245: Line 246: needReaping = True Line 247: try: Line 248: if not utils.NoIntrMPConnPoll(pipe, RUN_AS_TIMEOUT): > Another nicer option - subclass Pipe, and fix its poll function: I like this, but multiprocessing.Pipe returns a multiprocessing.connection object that holds the poll method. We could wrap Pipe and return a new connection object with a wrapped poll method, but it seems overly complicated compared to simply calling a method with our fix. Thoughts? Line 249: try: Line 250: Line 251: os.kill(proc.pid, signal.SIGKILL) Line 252: except OSError as e: Line 244: proc.start() Line 245: Line 246: needReaping = True Line 247: try: Line 248: if not utils.NoIntrMPConnPoll(pipe, RUN_AS_TIMEOUT): > This name is little ugly and it is also not really generic helper for utils Done Line 249: try: Line 250: Line 251: os.kill(proc.pid, signal.SIGKILL) Line 252: except OSError as e: -- To view, visit https://gerrit.ovirt.org/45752 To unsubscribe, visit https://gerrit.ovirt.org/settings Gerrit-MessageType: comment Gerrit-Change-Id: I25af73a8fe67e7bc434f60a7d8492e33dc3b2ffa Gerrit-PatchSet: 1 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Greg Padgett <[email protected]> Gerrit-Reviewer: Adam Litke <[email protected]> Gerrit-Reviewer: Dan Kenigsberg <[email protected]> Gerrit-Reviewer: Francesco Romani <[email protected]> Gerrit-Reviewer: Greg Padgett <[email protected]> Gerrit-Reviewer: Jenkins CI Gerrit-Reviewer: Nir Soffer <[email protected]> Gerrit-Reviewer: Piotr Kliczewski <[email protected]> Gerrit-Reviewer: [email protected] Gerrit-HasComments: Yes _______________________________________________ vdsm-patches mailing list [email protected] https://lists.fedorahosted.org/mailman/listinfo/vdsm-patches
