KUDU-1913: cap number of threads on server-wide pools The last piece of work is to establish an upper bound on the number of threads that may be started in the Raft and Prepare server-wide threadpools. Such caps will make it easier for admins to reason about appropriate values for the configuration of the Kudu processes' RLIMIT_NPROC resource.
KUDU-1913 proposed a cap of "number of cores + number of disks", but a lively Slack discussion yielded a better solution: set the cap at some percentage of the process' RLIMIT_NPROC value. Given that the rest of Kudu generally uses a constant number of threads, this should prevent spikes from ever exceeding the RLIMIT_NPROC and crashing the server due to an election storm. This patch implements a cap of 10% per pool and also provides a new gflag as an "escape hatch" (in case we were horribly wrong). Note: it's still possible for a massive number of "hot" replicas to exceed RLIMIT_NPROC by virtue of each replica's log append thread, but the server is more likely to run out of memory for MemRowSets before that happens. Change-Id: I194907a7f8a483c9cba71eba8caed6bc6090f618 Reviewed-on: http://gerrit.cloudera.org:8080/9522 Tested-by: Kudu Jenkins Reviewed-by: David Ribeiro Alves <davidral...@gmail.com> Reviewed-by: Todd Lipcon <t...@apache.org> Project: http://git-wip-us.apache.org/repos/asf/kudu/repo Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/debcb8ea Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/debcb8ea Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/debcb8ea Branch: refs/heads/master Commit: debcb8ea21e95b57f5a9924b07acf43ca7a4a389 Parents: ede0cf0 Author: Adar Dembo <a...@cloudera.com> Authored: Tue Mar 6 16:33:45 2018 -0800 Committer: Adar Dembo <a...@cloudera.com> Committed: Thu Mar 8 02:43:34 2018 +0000 ---------------------------------------------------------------------- src/kudu/kserver/kserver.cc | 68 +++++++++++++++++++++++++++++++++------- 1 file changed, 57 insertions(+), 11 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/kudu/blob/debcb8ea/src/kudu/kserver/kserver.cc ---------------------------------------------------------------------- diff --git a/src/kudu/kserver/kserver.cc b/src/kudu/kserver/kserver.cc index e9ce303..b695979 100644 --- a/src/kudu/kserver/kserver.cc +++ b/src/kudu/kserver/kserver.cc @@ -17,17 +17,44 @@ #include "kudu/kserver/kserver.h" -#include <limits> +#include <cstdint> #include <memory> +#include <mutex> +#include <ostream> #include <string> #include <utility> +#include <gflags/gflags.h> +#include <glog/logging.h> + +#include "kudu/fs/fs_manager.h" +#include "kudu/gutil/strings/substitute.h" #include "kudu/rpc/messenger.h" +#include "kudu/util/env.h" +#include "kudu/util/flag_tags.h" #include "kudu/util/metrics.h" #include "kudu/util/status.h" #include "kudu/util/threadpool.h" +DEFINE_int64(server_thread_pool_max_thread_count, -1, + "Maximum number of threads to allow in each server-wide thread " + "pool. If -1, Kudu will use 10% of its running thread per " + "effective uid resource limit as per getrlimit(). It is an error " + "to use a value of 0."); +TAG_FLAG(server_thread_pool_max_thread_count, advanced); +TAG_FLAG(server_thread_pool_max_thread_count, evolving); + +static bool ValidateThreadPoolThreadLimit(const char* /*flagname*/, int64_t value) { + if (value == 0) { + LOG(ERROR) << "Invalid thread pool thread limit: cannot be 0"; + return false; + } + return true; +} +DEFINE_validator(server_thread_pool_max_thread_count, &ValidateThreadPoolThreadLimit); + using std::string; +using strings::Substitute; namespace kudu { @@ -56,6 +83,30 @@ METRIC_DEFINE_histogram(server, op_apply_run_time, "Operation Apply Run Time", "that operations consist of very large batches.", 10000000, 2); +namespace { + +int64_t GetThreadPoolThreadLimit(Env* env) { + // Maximize this process' running thread limit first, if possible. + static std::once_flag once; + std::call_once(once, [&]() { + env->IncreaseResourceLimit(Env::ResourceLimitType::RUNNING_THREADS_PER_EUID); + }); + + int64_t rlimit = env->GetResourceLimit(Env::ResourceLimitType::RUNNING_THREADS_PER_EUID); + // See server_thread_pool_max_thread_count. + if (FLAGS_server_thread_pool_max_thread_count == -1) { + return rlimit / 10; + } + LOG_IF(FATAL, FLAGS_server_thread_pool_max_thread_count > rlimit) << + Substitute( + "Configured server-wide thread pool running thread limit " + "(server_thread_pool_max_thread_count) $0 exceeds euid running " + "thread limit (ulimit) $1", + FLAGS_server_thread_pool_max_thread_count, rlimit); + return FLAGS_server_thread_pool_max_thread_count; +} + +} // anonymous namespace KuduServer::KuduServer(string name, const ServerBaseOptions& options, @@ -75,20 +126,15 @@ Status KuduServer::Init() { .set_metrics(std::move(metrics)) .Build(&tablet_apply_pool_)); - // These pools are shared by all replicas hosted by this server. - // - // Submitted tasks use blocking IO (raft_pool_) or acquire long-held locks - // (tablet_prepare_pool_) so we configure no upper bound on the maximum - // number of threads in each pool (otherwise the default value of "number of - // CPUs" may cause blocking tasks to starve other "fast" tasks). However, the - // effective upper bound is the number of replicas as each will submit its - // own tasks via a dedicated token. + // These pools are shared by all replicas hosted by this server, and thus + // are capped at a portion of the overall per-euid thread resource limit. + int64_t server_wide_pool_limit = GetThreadPoolThreadLimit(fs_manager_->env()); RETURN_NOT_OK(ThreadPoolBuilder("prepare") - .set_max_threads(std::numeric_limits<int>::max()) + .set_max_threads(server_wide_pool_limit) .Build(&tablet_prepare_pool_)); RETURN_NOT_OK(ThreadPoolBuilder("raft") .set_trace_metric_prefix("raft") - .set_max_threads(std::numeric_limits<int>::max()) + .set_max_threads(server_wide_pool_limit) .Build(&raft_pool_)); return Status::OK();