On 10/07/14 06:50, Stephan Erb wrote:
Seems like there is a workaround: I can emulate my desired configuration
to prevent swap usage, by disabling swap on the host and starting the
slave without "--cgroups_limit_swap". Then everything works as expected,
i.e., a misbehaving task is killed immediately.
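
A minimal sketch of how one might confirm that swap really is off on the
host before relying on this workaround. It assumes a Linux host that
exposes /proc/swaps; the function names are hypothetical and are not part
of mesos.

```python
def active_swap_devices(swaps_text):
    """Return the device names listed in /proc/swaps-style text."""
    lines = swaps_text.strip().splitlines()
    # The first line is the column header (Filename, Type, Size, ...),
    # so every remaining non-empty line describes one active swap area.
    return [line.split()[0] for line in lines[1:] if line.strip()]


def swap_is_disabled(path="/proc/swaps"):
    """True when the kernel reports no active swap areas (Linux only)."""
    with open(path) as f:
        return not active_swap_devices(f.read())
```

On a host where swap has been turned off, swap_is_disabled() should
return True; any remaining entry means the workaround is not yet in
effect.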

However, I still don't know why 'cgroups_limit_swap' is not working as
advertised.

Best Regards,
Stephan


Stephan,

I do not think that anyone has mastered systemd or is fully happy with all of the low level capabilities it promises to deliver. It is a work in progress. It is *huge* and much of it is not documented. Now we're running clustering software (mesos) on these new systems? It's a needle in the haystack when memory issues are deeply rooted. How do you know they are deeply rooted? Easy: when you cannot find a simple solution.

I use Gentoo for this work, because my intention is to build up both openrc and systemd mesos clusters, to ferret out deep memory issues. I sure hope others (developers?) have methodologies planned for deep memory issue data collection, analysis, testing and resolution. I think many of the dev-folks are holding those cards close to their chest. I'm a bit more open, older, and doubting that systemd is so wonderful in its current offering. I salute those "brave souls" that have swallowed the systemd theory and wish them all the best and great success.

Me, I'm old and crusted and depend on the "old traditional ways" whilst
I wait for systemd to mature. Either way, you are going to need tools
such as ftrace/trace-cmd/kernelshark and some very "tuned" kernels to
push the capabilities of mesos, imho. So until I get my clusters built
and accepting batch jobs, I cannot really help you out.

Systemtap, dtrace, valgrind, etc etc are tools that may help. I'm still trying to get kernelshark working on gentoo linux. I wish I could be of more help to you. I think it would be an excellent idea if folks would include their platform (OS, kernel, mesos-version, spark-version etc etc) in their postings.

For me, I'm working on too many things in parallel in order to get these mesos-spark clusters ready to bang on a bit. I'm not much for just downloading and running a bunch of binaries and tweaking a few config files. In my decades of experience with embedded systems, high_strung mathematics and distributed processing, a bunch of binaries will simply not work when you run into deep problems like OOM. It's going to take building up from 100% source code and diagnosing these problems all along the way.

OOM for an "in-memory" distributed system is just one of the deep, kernel related problems we are going to face, imho. You may/will exhaust user space remedies when the real issues are deeply related to systemd and the low level kernel resource allocation decisions that have been abstracted away into systemd. Anything as complex as systemd is going to take years to become stable and decades to master and then document, imho.
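
On the suggestion that postings include platform details: here is a
rough sketch of a helper that gathers the basics for pasting into a
message. It uses only Python's standard platform module. The function
name and the "extra" argument are hypothetical, and the mesos and spark
versions are supplied by hand rather than detected.

```python
import platform


def platform_report(extra=None):
    """Build a short, pasteable summary of the host platform."""
    fields = {
        "OS": platform.system(),       # e.g. Linux
        "Kernel": platform.release(),  # kernel release string
        "Arch": platform.machine(),    # e.g. x86_64
    }
    # Versions of mesos, spark, etc. are passed in by the poster;
    # this sketch does not try to detect them.
    fields.update(extra or {})
    return "\n".join(f"{key}: {value}" for key, value in fields.items())


if __name__ == "__main__":
    print(platform_report({"Mesos": "<fill in>", "Spark": "<fill in>"}))
```

The output is one "name: value" line per field, which could go at the
top of a posting.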


Certainly, I hope I'm very, very wrong. When somebody builds a mesos cluster and runs a (10K)^3 cell array with PDE/FEM codes on it, please let me know, so I can download your binaries. When your mesos-cluster is running batch jobs of most any commonly found linux applications, please drop the list some fan-mail.


WE need deep_tools, and this community should share what tools they have as these problems are worked through, imho.


hth,
James
