Kiril,

We ran into this problem too, and it took us a while to work out how the
ulimit setting is applied to Mesos executors.

Here is what we learned, which seems to work:

The startup script /usr/bin/mesos-init-wrapper reads configuration values
from /etc/default/mesos.

The value of the ULIMIT parameter in this file is passed as the argument
list to the 'ulimit' command.

Therefore, this is what we have in our /etc/default/mesos file:

ULIMIT="-n 20000"

This value is picked up by the line in /usr/bin/mesos-init-wrapper which
looks like this:

  [[ ! ${ULIMIT:-} ]]    || ulimit $ULIMIT

...and sets your limits correctly. Setting system-wide ulimits alone is not
sufficient.
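
As a rough sketch of that mechanism (not the wrapper itself), the function
below does the same thing in portable sh: if ULIMIT is empty it does nothing,
otherwise its words become the arguments to 'ulimit'. The function name
apply_ulimit and the value 256 are made up for illustration. The call runs in
a subshell so your own shell's limits are left alone.

```shell
# apply_ulimit: hypothetical helper mirroring the wrapper's ULIMIT line.
# ULIMIT is deliberately left unquoted so "-n 256" splits into two arguments.
apply_ulimit() {
  [ -z "${ULIMIT:-}" ] || ulimit $ULIMIT
}

# Demo in a subshell: lower the open-files limit, then print the new value.
# This assumes the current hard limit is at least 256.
( ULIMIT="-n 256"; apply_ulimit; ulimit -n )
```

Because 'ulimit' without -S or -H sets both the soft and the hard limit, a
value set this way cannot be raised again later by an unprivileged process.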

Beware of upgrades that replace the contents of /etc/default/mesos; check
afterwards that your ULIMIT line is still there.
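
A quick check along these lines can help after an upgrade. This is only a
sketch: the function names are hypothetical, the grep assumes the setting is
written as a plain, uncommented ULIMIT= line, and max_open_files reads
/proc/<pid>/limits, so it is Linux-only.

```shell
# has_ulimit_setting FILE: succeeds if FILE has an uncommented ULIMIT= line.
has_ulimit_setting() {
  grep -Eq '^[[:space:]]*ULIMIT=' "$1"
}

# max_open_files PID: prints the soft "Max open files" limit of a running
# process, taken from the fourth column of /proc/PID/limits.
max_open_files() {
  awk '/^Max open files/ {print $4}' "/proc/$1/limits"
}
```

For example, run 'has_ulimit_setting /etc/default/mesos' to confirm the
setting survived, and 'max_open_files <pid>' with the PID of the running agent
or executor to confirm the limit actually took effect.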


Thanks,
June Taylor
*(she / her)*
System Administrator, Minnesota Population Center
University of Minnesota

On Thu, Dec 29, 2016 at 11:07 AM, Kiril Menshikov <kmenshi...@gmail.com>
wrote:

> Hi,
>
> I have an executor which runs Java programs. One executor executes around
> 800 tasks. The last ~100 failed with "Too many open files". I increased the
> nofile and nproc limits. While debugging I could not confirm that the
> problem is in the tasks, but sometimes Linux reaches the limits. Some boxes
> are fine, without 'Too many open files' errors, but some have such errors.
>
> I run the executor through mesos-containerizer, and isolation is posix/cpu,
> posix/mem.
>
> Can someone explain why this happens? Is it better to create a separate
> executor for each task? The tasks have common code but different commands.
>
> Any help is welcome.
>
> mesos-containerizer launch
>   --command={"shell":true,"value":"java -cp executor-all-1.0.jar com.stone.mesos.MesosTestExecutor"}
>   --environment={"LIBPROCESS_IP":"10.10.10.10","LIBPROCESS_PORT":"0","MESOS_AGENT_ENDPOINT":"10.10.10.10:5051","MESOS_CHECKPOINT":"1","MESOS_DIRECTORY":"\/var\/lib\/mesos\/slaves\/7e30a916-1296-4f47-813a-0972030b6907-S14\/frameworks\/7e30a916-1296-4f47-813a-0972030b6907-0020\/executors\/client_b12.tar-0f9e8f80-a217-4b28-bb5e-4dd7cc587381\/runs\/ee8857e2-ac19-4f07-810d-c2e71fbf522e","MESOS_EXECUTOR_ID":"client_b12.tar-0f9e8f80-a217-4b28-bb5e-4dd7cc587381","MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD":"5secs","MESOS_FRAMEWORK_ID":"7e30a916-1296-4f47-813a-0972030b6907-0020","MESOS_HTTP_COMMAND_EXECUTOR":"0","MESOS_NATIVE_JAVA_LIBRARY":"\/usr\/local\/lib\/libmesos-1.1.0.so","MESOS_NATIVE_LIBRARY":"\/usr\/local\/lib\/libmesos-1.1.0.so","MESOS_RECOVERY_TIMEOUT":"15mins","MESOS_SANDBOX":"\/var\/lib\/mesos\/slaves\/7e30a916-1296-4f47-813a-0972030b6907-S14\/frameworks\/7e30a916-1296-4f47-813a-0972030b6907-0020\/executors\/client_b12.tar-0f9e8f80-a217-4b28-bb5e-4dd7cc587381\/runs\/ee8857e2-ac19-4f07-810d-c2e71fbf522e","MESOS_SLAVE_ID":"7e30a916-1296-4f47-813a-0972030b6907-S14","MESOS_SLAVE_PID":"slave(1)@10.10.10.10:5051","MESOS_SUBSCRIPTION_BACKOFF_MAX":"2secs","PATH":"\/usr\/local\/sbin:\/usr\/local\/bin:\/usr\/sbin:\/usr\/bin:\/sbin:\/bin"}
>   --help=false
>   --pipe_read=12
>   --pipe_write=13
>   --pre_exec_commands=[]
>   --runtime_directory=/var/run/mesos/containers/ee8857e2-ac19-4f07-810d-c2e71fbf522e
>   --unshare_namespace_mnt=false
>   --user=ec2-user
>   --working_directory=/var/lib/mesos/slaves/7e30a916-1296-4f47-813a-0972030b6907-S14/frameworks/7e30a916-1296-4f47-813a-0972030b6907-0020/executors/client_b12.tar-0f9e8f80-a217-4b28-bb5e-4dd7cc587381/runs/ee8857e2-ac19-4f07-810d-c2e71fbf522e
>
> Thanks,
> -Kiril
>
