Change in vdsm[master]: threadpool: docs: add design notes

fromani Wed, 25 Jun 2014 04:05:54 -0700

Francesco Romani has uploaded a new change for review.

Change subject: threadpool: docs: add design notes
......................................................................


threadpool: docs: add design notes

This patch adds documentation for the new threadpool package,
describing its rationale and design.

Change-Id: I400799e300f5d012dae5d158c4379fe57db1bd37
Signed-off-by: Francesco Romani <[email protected]>
---
A lib/threadpool/README.rst
1 file changed, 118 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/90/29190/1

diff --git a/lib/threadpool/README.rst b/lib/threadpool/README.rst
new file mode 100644
index 0000000..e4f529e
--- /dev/null
+++ b/lib/threadpool/README.rst
@@ -0,0 +1,118 @@
+=================
+WatchedThreadPool
+=================
+
+A thread pool with a built-in watchdog.
+
+| Written by Francesco Romani <fromani a redhat d com>
+| (C) 2014 Red Hat Inc.
+
+Description
+===========
+
+This package implements a plain thread pool with a few additions to
+handle blocking tasks; it is also compatible with `Futures`_.
+
+Rationale
+=========
+
+The canonical way to deal with blocking I/O is to make the I/O channel
+(usually a file or socket handle) non-blocking.
+However, the I/O operations may be encapsulated in a third-party
+library that does not use non-blocking I/O.
+
+In that case, the pool of solutions for dealing with such an API is
+quickly depleted, because most forms of the event loop/reactor pattern
+require non-blocking operations. The straightforward solution is to
+leverage threading. Because we are ultimately dealing with I/O-bound
+concurrency, the infamous GIL of the CPython runtime is not a big deal:
+CPython releases the GIL while a thread waits in a blocking I/O call,
+so threading is well suited to this task in Python.
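To make the GIL argument concrete, here is a minimal Python 3 sketch (not part of the package) in which `time.sleep` stands in for a blocking third-party I/O call. `blocking_call` and `run_in_threads` are hypothetical helpers. Because the waiting threads release the GIL, five 0.2 s calls should finish in roughly 0.2 s of wall-clock time rather than 1 s.

```python
import threading
import time


def blocking_call(delay, results, index):
    # Stand-in for a third-party call stuck in blocking I/O; like a
    # blocking read(), time.sleep() releases the GIL while it waits.
    time.sleep(delay)
    results[index] = delay


def run_in_threads(n, delay):
    """Run n blocking calls in n threads; return (elapsed seconds, results)."""
    results = [None] * n
    threads = [threading.Thread(target=blocking_call, args=(delay, results, i))
               for i in range(n)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start, results


if __name__ == "__main__":
    elapsed, _ = run_in_threads(n=5, delay=0.2)
    # The calls overlap, so this is close to 0.2s rather than 1.0s.
    print("elapsed: %.2fs" % elapsed)
```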
+
+However, I/O brings another problem: blocking operations.
+An I/O operation can block for a long time, possibly forever.
+In a thread pool, this means that one or more threads are taken out
+of the pool, so the pool slows down and may eventually become
+unable to do any work.
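The starvation effect can be reproduced with the standard `concurrent.futures.ThreadPoolExecutor`, used here only for illustration (it is not WatchedThreadPool). A `threading.Event` that is not set plays the role of an I/O call that never returns; once both workers are blocked, even a trivial task cannot run.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def stuck_task(unblock):
    # Simulates an I/O call that does not return until `unblock` is set.
    unblock.wait()
    return "unblocked"


def quick_task():
    return "done"


unblock = threading.Event()
pool = ThreadPoolExecutor(max_workers=2)

# Two stuck tasks take every worker out of the pool...
stuck = [pool.submit(stuck_task, unblock) for _ in range(2)]
# ...so even a trivial task cannot make progress.
quick = pool.submit(quick_task)
done, _ = wait([quick], timeout=0.5)
starved = quick not in done
print("pool starved:", starved)   # pool starved: True

unblock.set()                      # the I/O channel comes back
print(quick.result(timeout=5))     # done
pool.shutdown()
```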
+
+A common solution is to replenish the pool by adding new threads to
+replace the blocked ones. In some circumstances, this may leak a horde
+of zombie threads (consider a network that is unreachable because of a
+broken cable: every retry blocks yet another thread).
+Moreover, the client code may want to be notified when a task is detected
+as stuck (for some definition of 'stuck') and possibly take some
+countermeasures.
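A minimal sketch of the naive replenishing strategy, assuming a hypothetical `fetch_status` call that blocks on an unreachable network (simulated with a `threading.Event` that is never set). Each retry abandons the blocked thread and starts a fresh one, so the number of stuck threads grows with the number of retries.

```python
import threading

# Hypothetical stand-in for a network behind a broken cable: never set.
cable_fixed = threading.Event()


def fetch_status():
    # Blocks until the network comes back, i.e. possibly forever.
    cable_fixed.wait()


def replenish_and_retry(attempts, patience=0.05):
    """Naive strategy: give up on a blocked thread, spawn a new one, retry."""
    spawned = []
    for _ in range(attempts):
        worker = threading.Thread(target=fetch_status, daemon=True)
        worker.start()
        spawned.append(worker)
        worker.join(timeout=patience)  # wait a bit, then abandon the thread
    return spawned


leaked = replenish_and_retry(attempts=5)
print(sum(t.is_alive() for t in leaked))  # 5: one zombie per retry
```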
+
+WatchedThreadPool addresses these scenarios by adding a few features
+to a regular thread pool.
+
+Design
+======
+
+The countermeasures implemented in WatchedThreadPool are the following.
+
+1. Each worker takes a task exclusively.
+   Each task must be taken by exactly one worker, exactly once per
+   request. Thus, at any given time at most one worker thread can get stuck
+   on a given unresponsive task.
+   A periodic task is re-injected into the work queue by the worker
+   thread itself, once it is done.
+
+2. Each worker thread can answer a couple of simple yet important
+   queries:
+
+   * what are you doing?
+   * how long is this task taking?
+
+3. A watcher thread is added to the pool.
+   The watcher thread does no real work; instead, it periodically
+   checks the health of each active worker thread and detects long-running
+   tasks. The watcher can then implement any stuck-detection policy.
+   The simplest one is to consider a thread 'stuck' if the elaboration
+   time of its current task exceeds a given threshold.
+
+4. Worker threads detected as 'stuck' are detached from the pool and
+   replaced.
+   The watcher thread does this transparently, ensuring that the pool keeps
+   a constant active workforce. The threads detected as stuck are taken out
+   of the pool and put into a limbo until they eventually unblock.
+   The task detected as blocked is *not* automatically retried. This is
+   very important because, coupled with point #1, it avoids the lemmings
+   effect: more and more worker threads freezing while attempting the same
+   blocking task.
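Points 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not the actual WatchedThreadPool code: `Task`, `Worker` and `describe()` are hypothetical names, the work queue is a standard `queue.Queue` (whose `get()` hands each item to exactly one consumer), and periodic tasks are re-injected with a `threading.Timer` after `period` seconds.

```python
import queue
import threading
import time


class Task(object):
    """Hypothetical task record; `period` is None for one-shot tasks."""

    def __init__(self, name, func, period=None):
        self.name = name
        self.func = func
        self.period = period


class Worker(threading.Thread):
    """Illustrative worker for points 1 and 2; not WatchedThreadPool's API."""

    def __init__(self, tasks):
        super().__init__(daemon=True)
        self._tasks = tasks
        self._lock = threading.Lock()
        self._current = None      # Task being run, if any
        self._started_at = None   # time.monotonic() when it started

    def describe(self):
        """Answer 'what are you doing?' and 'how long is it taking?'."""
        with self._lock:
            if self._current is None:
                return None, 0.0
            return self._current.name, time.monotonic() - self._started_at

    def run(self):
        while True:
            # Queue.get() hands each queued task to exactly one worker.
            task = self._tasks.get()
            if task is None:  # sentinel: stop this worker
                return
            with self._lock:
                self._current = task
                self._started_at = time.monotonic()
            try:
                task.func()
            except Exception:
                pass  # a real pool would log the error; the worker survives
            finally:
                with self._lock:
                    self._current = None
                    self._started_at = None
            if task.period is not None:
                # Periodic task: only the worker that just finished it puts it
                # back, after `period` seconds, so at most one copy is pending.
                timer = threading.Timer(task.period, self._tasks.put, args=(task,))
                timer.daemon = True
                timer.start()
```

Because only the worker that just ran a periodic task puts it back, a periodic task that gets stuck occupies a single worker and is never queued a second time.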
+
+With all this in place, WatchedThreadPool can minimize the waste of
+resources and deal with unresponsive tasks gracefully.
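A minimal sketch of points 3 and 4, assuming a fixed-size pool fed by a `queue.Queue` and the simple threshold policy described above. `WatchedPool`, `Worker`, `status()`, `limbo` and `stuck_tasks` are hypothetical names, not the package's API; `stuck_tasks` stands in for whatever notification mechanism the client code would use.

```python
import queue
import threading
import time


class Worker(threading.Thread):
    """Minimal worker that records which task it runs and since when."""

    def __init__(self, tasks):
        super().__init__(daemon=True)
        self.tasks = tasks
        self.lock = threading.Lock()
        self.current = None    # (task name, start time) while busy
        self.detached = False  # set by the watcher when moved to limbo

    def status(self):
        with self.lock:
            if self.current is None:
                return None, 0.0
            name, started = self.current
            return name, time.monotonic() - started

    def run(self):
        # A detached worker exits as soon as its blocking call returns,
        # without taking (or retrying) any further task.
        while not self.detached:
            name, func = self.tasks.get()
            with self.lock:
                self.current = (name, time.monotonic())
            try:
                func()
            except Exception:
                pass  # a real pool would log the error
            finally:
                with self.lock:
                    self.current = None


class WatchedPool(object):
    """Hypothetical pool with a watcher thread (points 3 and 4)."""

    def __init__(self, size, threshold, interval=0.05):
        self.tasks = queue.Queue()
        self.threshold = threshold  # seconds before a task counts as stuck
        self.interval = interval    # seconds between health checks
        self.workers = [self._spawn() for _ in range(size)]
        self.limbo = []             # stuck workers, left to unblock alone
        self.stuck_tasks = []       # names reported to the client code
        threading.Thread(target=self._watch, daemon=True).start()

    def _spawn(self):
        worker = Worker(self.tasks)
        worker.start()
        return worker

    def submit(self, name, func):
        self.tasks.put((name, func))

    def _watch(self):
        while True:
            time.sleep(self.interval)
            for i, worker in enumerate(self.workers):
                name, elapsed = worker.status()
                if name is not None and elapsed > self.threshold:
                    # Simplest policy: a task over the threshold is stuck.
                    worker.detached = True
                    self.limbo.append(worker)
                    self.stuck_tasks.append(name)    # reported, never retried
                    self.workers[i] = self._spawn()  # constant workforce
            # Forget limbo threads that have finally unblocked and exited.
            self.limbo = [w for w in self.limbo if w.is_alive()]
```

The stuck task is recorded but never resubmitted, and a detached worker exits on its own once its blocking call returns instead of going back to the queue.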
+
+Futures
+=======
+
+concurrent.futures_ is a package that has been part of the Python standard
+library since version 3.2; it provides a convenient framework for
+asynchronous operations. A backport_ for Python 2.x is available as well.
+WatchedThreadPool provides an integration module to work with this package
+seamlessly.
+
+.. _concurrent.futures: https://docs.python.org/3/library/concurrent.futures.html
+.. _backport: https://pypi.python.org/pypi/futures
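The actual integration module is not part of this patch. As a hedged sketch of how any queue-fed pool can interoperate with `concurrent.futures`, the hypothetical `QueueExecutor` below subclasses `concurrent.futures.Executor` and completes each `Future` through the documented `set_running_or_notify_cancel()`, `set_result()` and `set_exception()` methods.

```python
import queue
import threading
from concurrent.futures import Executor, Future


class QueueExecutor(Executor):
    """Hypothetical adapter exposing a queue-fed worker pool as an Executor."""

    def __init__(self, workers=2):
        self._tasks = queue.Queue()
        self._threads = [threading.Thread(target=self._work, daemon=True)
                         for _ in range(workers)]
        for t in self._threads:
            t.start()

    def submit(self, fn, *args, **kwargs):
        future = Future()
        self._tasks.put((future, fn, args, kwargs))
        return future

    def _work(self):
        while True:
            item = self._tasks.get()
            if item is None:  # shutdown sentinel
                return
            future, fn, args, kwargs = item
            # Skip tasks whose future was cancelled while still queued.
            if not future.set_running_or_notify_cancel():
                continue
            try:
                future.set_result(fn(*args, **kwargs))
            except BaseException as exc:
                future.set_exception(exc)

    def shutdown(self, wait=True, **kwargs):
        # One sentinel per worker thread, queued after any pending work.
        for _ in self._threads:
            self._tasks.put(None)
        if wait:
            for t in self._threads:
                t.join()


if __name__ == "__main__":
    with QueueExecutor(workers=2) as ex:
        print(list(ex.map(pow, [2, 3], [5, 2])))  # [32, 9]
```

Since `Executor` builds `map()` and the context-manager protocol on top of `submit()` and `shutdown()`, those two methods are all such an adapter needs to provide.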
+
+
+Tests
+=====
+
+Run the tests from the package's top-level directory (the one that
+contains this README)::
+
+    $ PYTHONPATH=. nosetests
+
+or::
+
+    $ PYTHONPATH=. py.test
+
+To get a coverage report::
+
+    $ PYTHONPATH=. nosetests -v --with-coverage --cover-package=threadpool
+
+TODO
+====
+
+Not in priority order:
+
+* turn this into a proper package
+* compatibility with the storage threadpool
+* logging integration


-- 
To view, visit http://gerrit.ovirt.org/29190
To unsubscribe, visit http://gerrit.ovirt.org/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I400799e300f5d012dae5d158c4379fe57db1bd37
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Francesco Romani <[email protected]>
_______________________________________________
vdsm-patches mailing list
[email protected]
https://lists.fedorahosted.org/mailman/listinfo/vdsm-patches
