On 10/07/14 06:50, Stephan Erb wrote:
Seems like there is a workaround: I can emulate my desired configuration
to prevent swap usage, by disabling swap on the host and starting the
slave without "--cgroups_limit_swap". Then everything works as expected,
i.e., a misbehaving task is killed immediately.
However, I still don't know why 'cgroups_limit_swap' is not working as
advertised.
Best Regards,
Stephan
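The workaround quoted above only holds if swap really is off on every slave host before the slave starts. Below is a minimal sketch of a pre-flight check, assuming a Linux host where active swap areas are listed in /proc/swaps (one header line, then one line per active area). The function names are hypothetical helpers for illustration; they are not part of mesos.

```python
from typing import List


def active_swap_devices(proc_swaps_text: str) -> List[str]:
    """Return the swap devices/files listed in /proc/swaps-formatted text.

    Assumes the Linux /proc/swaps layout: a header line
    (Filename Type Size Used Priority) followed by one line per
    active swap area. Hypothetical helper, not a mesos API.
    """
    lines = proc_swaps_text.strip().splitlines()
    # Skip the header; the first column of each remaining line is the device/file.
    return [line.split()[0] for line in lines[1:] if line.strip()]


def swap_is_disabled(path: str = "/proc/swaps") -> bool:
    """True if no swap area is active on this host (hypothetical helper)."""
    with open(path) as f:
        return not active_swap_devices(f.read())
```

Calling swap_is_disabled() on the slave host before launching it (and running "swapoff -a" if it returns False) is one way to make the "no swap" assumption explicit rather than implicit.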
Stephan,
I do not think that anyone has mastered systemd and is fully happy with
all of the low level capabilities it promises to master. It is a work in
progress. It is *huge* and much is not documented. Now we're running
clustering software (mesos) on these new systems? It's a needle in the
haystack when memory issues are deeply rooted. How do you know they are
deeply rooted? Easy: when you cannot find a simple solution. I use
Gentoo for this work, because my intention is to build up both openrc
and systemd mesos clusters, to ferret out deep memory issues. I sure
hope others (developers?) have methodologies planned for deep memory
issue data-collection, analysis, testing and resolution. I think many
of the dev-folks are holding those cards close to their chest. I'm a
bit more open, older, and doubting that systemd is so wonderful in its
current offering. I salute those "brave souls" that have swallowed the
systemd theory and wish them all the best and great success.
Me, I'm old and crusted and depend on the "old traditional ways" whilst
I wait for systemd to mature. Either way, you are going to need tools
such as ftrace/trace-cmd/kernelshark and some very "tuned" kernels to
push the capabilities of mesos, imho. So until I get my clusters built
and accepting batch jobs, I cannot really help you out.
SystemTap, DTrace, valgrind, etc. are tools that may help. I'm still
trying to get kernelshark working on Gentoo Linux. I wish I could be of
more help to you. I think it would be an excellent idea if folks would
include their platform (OS, kernel, mesos-version, spark-version, etc.)
in their postings. As for me, I'm working on too many things in
parallel to get these mesos-spark clusters ready to bang on a
bit. I'm not much for just downloading and running a bunch of binaries
and tweaking a few config files. In my decades of experience with
embedded systems, high-strung mathematics and distributed processing, a
bunch of binaries will simply not work when you run into deep problems
like OOM. It's going to take building up from 100% source code and
diagnosing these problems all along the way. OOM for an "in-memory"
distributed system is just one of the deep, kernel related problems we
are going to face, imho. You may/will exhaust user-space remedies when
the real issues are deeply related to systemd and the low-level kernel
resource-allocation decisions that have been abstracted away into
systemd. Anything as complex as systemd is going to take years to become
stable and decades to master and then document, imho.
Certainly, I hope I'm very, very wrong. When somebody builds a mesos
cluster and runs a (10K)^3 cell array with PDE/FEM codes on it,
please let me know, so I can download your binaries? When your
mesos-cluster is running batch jobs of almost any commonly found Linux
applications, please drop the list some fan-mail.
WE need deep tools, and this community should share what tools they have
as these problems are worked through, imho.
hth,
James