Hi,

I have an executor that runs Java programs. A single executor runs around 800
tasks, and the last ~100 failed with "Too many open files". I increased the
nofile and nproc limits. While debugging, I could not confirm that the problem
is in the tasks themselves, but sometimes Linux still reaches the limits. Some
boxes run fine without "Too many open files" errors, but others hit them.
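
One way to check whether the raised nofile limit actually applies to the
executor process is to log the open and maximum descriptor counts from inside
the JVM. Below is a minimal sketch, assuming a HotSpot/OpenJDK JVM on Linux,
where the platform bean implements com.sun.management.UnixOperatingSystemMXBean.
The class name FdUsage and the method fdCounts are hypothetical and are not part
of the executor described here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical helper, not part of the executor: reports descriptor usage of this JVM.
public class FdUsage {

    // Returns {open, max} file descriptor counts for the current JVM process,
    // or null when the platform bean does not expose them (non-Unix JVMs).
    static long[] fdCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] counts = fdCounts();
        if (counts == null) {
            System.out.println("descriptor counts are not available on this JVM");
        } else {
            System.out.println("open fds: " + counts[0] + ", limit: " + counts[1]);
        }
    }
}
```

Calling fdCounts() before and after each task would show whether the open count
keeps growing across tasks, and whether the reported limit matches the new
nofile value on the boxes that fail.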

I run the executor through mesos-containerizer, with isolation set to
posix/cpu,posix/mem.

Can someone explain why this happens? Would it be better to create a separate
executor for each task? The tasks share common code but have different commands.

Any help is welcome.

mesos-containerizer launch --command={"shell":true,"value":"java -cp executor-all-1.0.jar com.stone.mesos.MesosTestExecutor"} --environment={"LIBPROCESS_IP":"10.10.10.10","LIBPROCESS_PORT":"0","MESOS_AGENT_ENDPOINT":"10.10.10.10:5051","MESOS_CHECKPOINT":"1","MESOS_DIRECTORY":"\/var\/lib\/mesos\/slaves\/7e30a916-1296-4f47-813a-0972030b6907-S14\/frameworks\/7e30a916-1296-4f47-813a-0972030b6907-0020\/executors\/client_b12.tar-0f9e8f80-a217-4b28-bb5e-4dd7cc587381\/runs\/ee8857e2-ac19-4f07-810d-c2e71fbf522e","MESOS_EXECUTOR_ID":"client_b12.tar-0f9e8f80-a217-4b28-bb5e-4dd7cc587381","MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD":"5secs","MESOS_FRAMEWORK_ID":"7e30a916-1296-4f47-813a-0972030b6907-0020","MESOS_HTTP_COMMAND_EXECUTOR":"0","MESOS_NATIVE_JAVA_LIBRARY":"\/usr\/local\/lib\/libmesos-1.1.0.so","MESOS_NATIVE_LIBRARY":"\/usr\/local\/lib\/libmesos-1.1.0.so","MESOS_RECOVERY_TIMEOUT":"15mins","MESOS_SANDBOX":"\/var\/lib\/mesos\/slaves\/7e30a916-1296-4f47-813a-0972030b6907-S14\/frameworks\/7e30a916-1296-4f47-813a-0972030b6907-0020\/executors\/client_b12.tar-0f9e8f80-a217-4b28-bb5e-4dd7cc587381\/runs\/ee8857e2-ac19-4f07-810d-c2e71fbf522e","MESOS_SLAVE_ID":"7e30a916-1296-4f47-813a-0972030b6907-S14","MESOS_SLAVE_PID":"slave(1)@10.10.10.10:5051","MESOS_SUBSCRIPTION_BACKOFF_MAX":"2secs","PATH":"\/usr\/local\/sbin:\/usr\/local\/bin:\/usr\/sbin:\/usr\/bin:\/sbin:\/bin"} --help=false --pipe_read=12 --pipe_write=13 --pre_exec_commands=[] --runtime_directory=/var/run/mesos/containers/ee8857e2-ac19-4f07-810d-c2e71fbf522e --unshare_namespace_mnt=false --user=ec2-user --working_directory=/var/lib/mesos/slaves/7e30a916-1296-4f47-813a-0972030b6907-S14/frameworks/7e30a916-1296-4f47-813a-0972030b6907-0020/executors/client_b12.tar-0f9e8f80-a217-4b28-bb5e-4dd7cc587381/runs/ee8857e2-ac19-4f07-810d-c2e71fbf522e

Thanks,
-Kiril