Francesco Romani has uploaded a new change for review.

Change subject: threadpool: docs: add design notes
......................................................................

threadpool: docs: add design notes

This patch adds documentation for the new threadpool
package, in order to provide the rationale and
the design notes.

Change-Id: I400799e300f5d012dae5d158c4379fe57db1bd37
Signed-off-by: Francesco Romani <[email protected]>
---
A lib/threadpool/README.rst
1 file changed, 118 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/90/29190/1

diff --git a/lib/threadpool/README.rst b/lib/threadpool/README.rst
new file mode 100644
index 0000000..e4f529e
--- /dev/null
+++ b/lib/threadpool/README.rst
@@ -0,0 +1,118 @@
+WatchedThreadPool
+==================
+
+A thread pool with a builtin watchdog.
+written by Francesco Romani <fromani a redhat d com>
+(C) 2014 Red Hat Inc.
+
+Description
+===========
+
+This is the implementation of a plain thread pool with a few additions to
+handle blocking tasks, and compatible with `Futures`_.
+
+Rationale
+=========
+
+Usually, the canonical way to deal with blocking I/O is to make the I/O channel
+(usually file or socket handles) non-blocking.
+However, it is possible that the I/O operations are encapsulated in a third party
+library that may not make use of non-blocking I/O.
+
+If that is the case, the solution pool to deal with this API is quickly
+depleted because most forms of event loop/reactor patterns require
+non-blocking operations. The straightforward solution becomes to leverage
+threading. Because we are ultimately dealing with I/O based concurrency,
+the infamous GIL of the CPython runtime is not a big deal, and the threading
+approach can be well suited to Python.
+
+However, another problem related to I/O arises, in the form of blocking
+operations. An I/O operation can block for a long time, possibly forever.
+With a thread pool, this means that one or possibly more threads from the
+pool are taken out, thus the pool slows down, and eventually may become
+unable to do any work.
+
+A common solution is to replenish the worker pool by adding new threads to
+replace the blocked ones. In some circumstances, this may lead to
+a horde of leaked zombie threads (consider the example of the network
+being unreachable because of a broken cable).
+Moreover, the client code may want to be notified if a task is detected
+as stuck (for some definition of 'stuck') and possibly take some
+countermeasures.
+
+WatchedThreadPool provides a solution for the above scenario, by implementing
+a few additions to a regular thread pool to deal with the problems outlined
+above.
+
+Design
+======
+
+The countermeasures implemented in WatchedThreadPool are the following.
+
+1. each worker takes a task exclusively.
+   each task must be taken by exactly one worker and exactly once for each
+   request. Thus, at any given time at most one worker thread can get stuck
+   on an unresponsive task.
+   In the case of a periodic task, the task is reinjected in the work queue
+   by the worker thread itself, once it is done.
+
+2. each worker thread is made capable of answering a couple of simple yet
+   important queries:
+   what are you doing?
+   how long is this task taking?
+
+3. a watcher thread is added to the pool.
+   the watcher thread does not do any real work, but instead periodically
+   checks the health of each active worker thread, and detects long running tasks.
+   Then the watcher can implement any stuck-detection policy.
+   The simplest one is to consider a thread 'stuck' if the processing
+   time of any task exceeds a given threshold.
+
+4. the worker threads detected as 'stuck' are transparently detached from the
+   pool and replaced.
+   This is transparently done by the watcher thread, which ensures the pool
+   has a constant active workforce. The threads detected as stuck are taken out
+   of the pool and put into a limbo until they eventually unblock.
+   The task detected as blocked is *not* automatically retried. This is
+   very important because, coupled with point #1, it avoids the lemmings effect
+   with more and more worker threads frozen attempting to do a blocking task.
+
+With all in place, WatchedThreadPool can effectively minimize the waste
+of resources, and can deal with unresponsive tasks gracefully.
+
+Futures
+=======
+
+concurrent.futures_ is a package that has been part of the Python standard
+library since version 3.2 and provides a handy framework for async operations.
+A backport_ for Python 2.x is available as well.
+WatchedThreadPool provides an integration module to work with this package
+seamlessly.
+
+.. _concurrent.futures: https://docs.python.org/3/library/concurrent.futures.html
+.. _backport: https://pypi.python.org/pypi/futures
+
+
+Tests
+=====
+
+run from the package's top level directory (the one which contains this README)
+
+$ PYTHONPATH=. nosetests
+
+or
+
+$ PYTHONPATH=. py.test
+
+if you want the coverage report:
+
+$ PYTHONPATH=. nosetests -v --with-coverage --cover-package=threadpool
+
+TODO
+====
+
+Not in priority order:
+
+* make this one a real proper package
+* storage's threadpool compatibility
+* logging integration


-- 
To view, visit http://gerrit.ovirt.org/29190
To unsubscribe, visit http://gerrit.ovirt.org/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I400799e300f5d012dae5d158c4379fe57db1bd37
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Francesco Romani <[email protected]>
_______________________________________________
vdsm-patches mailing list
[email protected]
https://lists.fedorahosted.org/mailman/listinfo/vdsm-patches
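
The four design points in the README can be illustrated with a minimal sketch. This is not the code from this change: the names `Worker`, `WatchedPool`, `status`, `limbo` and `stuck_tasks`, as well as the `timeout` and `interval` parameters, are assumptions made for illustration only. The sketch uses only the Python 3 standard library and implements the simplest policy mentioned in the README (a task is 'stuck' once it runs longer than a threshold).

```python
import queue
import threading
import time


class Worker(threading.Thread):
    """Worker able to answer: what are you doing, and for how long?"""

    def __init__(self, tasks):
        super().__init__(daemon=True)
        self._tasks = tasks
        self._lock = threading.Lock()
        self._current = None   # (task, start time) while busy, else None
        self.detached = False  # set by the watcher once detected as stuck

    def status(self):
        """Return (task, elapsed seconds), or (None, 0.0) when idle."""
        with self._lock:
            if self._current is None:
                return None, 0.0
            task, started = self._current
            return task, time.monotonic() - started

    def run(self):
        while not self.detached:
            task = self._tasks.get()  # a queued task goes to exactly one worker
            if task is None:          # sentinel: shut down
                return
            with self._lock:
                self._current = (task, time.monotonic())
            try:
                task()
            except Exception:
                pass                  # a failing task must not kill the worker
            finally:
                with self._lock:
                    self._current = None


class WatchedPool:
    """Hypothetical pool: a watcher replaces workers whose task runs too long."""

    def __init__(self, size=2, timeout=0.2, interval=0.05):
        self._tasks = queue.Queue()
        self._timeout = timeout
        self._interval = interval
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self.limbo = []        # detached workers, kept until they unblock
        self.stuck_tasks = []  # reported to the client, never retried
        self._workers = [self._spawn() for _ in range(size)]
        self._watcher = threading.Thread(target=self._watch, daemon=True)
        self._watcher.start()

    def _spawn(self):
        worker = Worker(self._tasks)
        worker.start()
        return worker

    def submit(self, task):
        self._tasks.put(task)

    def active_workers(self):
        with self._lock:
            return len(self._workers)

    def _watch(self):
        # Event.wait returns False on timeout, True once shutdown is requested.
        while not self._stop.wait(self._interval):
            with self._lock:
                for i, worker in enumerate(self._workers):
                    task, elapsed = worker.status()
                    if task is not None and elapsed > self._timeout:
                        worker.detached = True
                        self.stuck_tasks.append(task)
                        self.limbo.append(worker)
                        self._workers[i] = self._spawn()  # keep workforce constant

    def shutdown(self):
        self._stop.set()
        self._watcher.join()
        for _ in range(self.active_workers()):
            self._tasks.put(None)
```

In this sketch a detached worker simply exits its loop once the blocked call returns, and the stuck task is only recorded in `stuck_tasks` rather than put back on the queue, which mirrors point 4 of the design: replacing the worker keeps the pool at full strength, while not retrying keeps a single bad task from freezing one worker after another.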
