Nir Soffer has posted comments on this change. Change subject: threadpool: docs: add design notes ......................................................................
Patch Set 4: (5 comments) http://gerrit.ovirt.org/#/c/29190/4/lib/threadpool/README.rst File lib/threadpool/README.rst: Line 49: Line 50: The countermeasures implemented in WatchedThreadPool are the following. Line 51: Line 52: 1. each worker takes a task exclusively. Line 53: each task must be taken exactly by one worker and exactly once for each Why do you care about the task used only once? the thread pool should care about task reusing. The task should simply provide the interface that allow the worker to run it. If a task can run multiple times, thats the task creator concern. Line 54: request. Each task is consumed by the first worker thread which takes it. Line 55: Thus, at any given time at most one worker thread can get stuck Line 56: on an unresponsive task. Line 57: Periodic task are implemented by letting a worker thread to produce a clone Line 54: request. Each task is consumed by the first worker thread which takes it. Line 55: Thus, at any given time at most one worker thread can get stuck Line 56: on an unresponsive task. Line 57: Periodic task are implemented by letting a worker thread to produce a clone Line 58: of the task that was consumed and reinjecting it into the queue. We don't need periodic tasks. Lets simplify by removing implementing only what we need. Line 59: Line 60: 2. each worker thread is made capable to answer to a couple of simple yet Line 61: important queries: Line 62: - what are you doing? Line 59: Line 60: 2. each worker thread is made capable to answer to a couple of simple yet Line 61: important queries: Line 62: - what are you doing? Line 63: - how long is this task taking? Nice, but the time it takes to run a task say nothing about task state. Some task takes lot of time (moving storage etc.). So the real questions is "are you stuck?" and only the task can answer this. Line 64: Line 65: 3. a watcher thread is added to the pool. Line 66: the watcher thread does not do any real work, but instead periodically Line 67: checks the health of each active worker thread, and detects long running Line 66: the watcher thread does not do any real work, but instead periodically Line 67: checks the health of each active worker thread, and detects long running Line 68: tasks. Then the watcher can implement any stuck-detection policy. Line 69: The simplest one is to consider a thread 'stuck' if any task elaboration Line 70: time exceeds a given threshold. The watcher thread should not be part of the pool, it is unrelated to the pool. It should detect sick tasks, not long running tasks. Only the task can detect this so the watcher thread does not have to implement a detection policy. Line 71: Line 72: 4. the worker threads detected as 'stuck' are transparently detached by the Line 73: pool and replaced. Line 74: This is transparently done by the watcher thread, which ensures the pool Line 75: has a constant active workforce. The threads detected as stuck are taken out Line 76: the pool and put into a limbo until they eventually unblock. Line 77: The task detected as blocked are *not* automatically retried. This is Line 78: very important because, coupled with point #1, avoids the lemmings effect Line 79: of more and more worker threads frozen attempting to do a blocking task. I don't think we should remove stuck threads but instead cancel stuck tasks. Any task doing io can be killed by closing the file descriptor it uses. Task doing cpu bound stuff should check if it was canceled from time to time. We need to check if closing a file descriptor does wake up a thread reading from it when the thread enter D state. Removing stuck thread should be last resort if we cannot cancel the task in another way. Line 80: Line 81: With all in place, WatchedThreadPool can effectively minimize the waste Line 82: of resources, and can deal with unresponsive tasks gracefully. Line 83: -- To view, visit http://gerrit.ovirt.org/29190 To unsubscribe, visit http://gerrit.ovirt.org/settings Gerrit-MessageType: comment Gerrit-Change-Id: I400799e300f5d012dae5d158c4379fe57db1bd37 Gerrit-PatchSet: 4 Gerrit-Project: vdsm Gerrit-Branch: master Gerrit-Owner: Francesco Romani <[email protected]> Gerrit-Reviewer: Antoni Segura Puimedon <[email protected]> Gerrit-Reviewer: Dan Kenigsberg <[email protected]> Gerrit-Reviewer: Dima Kuznetsov <[email protected]> Gerrit-Reviewer: Federico Simoncelli <[email protected]> Gerrit-Reviewer: Francesco Romani <[email protected]> Gerrit-Reviewer: Michal Skrivanek <[email protected]> Gerrit-Reviewer: Nir Soffer <[email protected]> Gerrit-Reviewer: Saggi Mizrahi <[email protected]> Gerrit-Reviewer: Vinzenz Feenstra <[email protected]> Gerrit-Reviewer: Yaniv Bronhaim <[email protected]> Gerrit-Reviewer: [email protected] Gerrit-Reviewer: oVirt Jenkins CI Server Gerrit-HasComments: Yes _______________________________________________ vdsm-patches mailing list [email protected] https://lists.fedorahosted.org/mailman/listinfo/vdsm-patches
