Joerg Schad created MESOS-4998:
----------------------------------

Summary: Problematic fork/clone performance at high load.
Key: MESOS-4998
URL: https://issues.apache.org/jira/browse/MESOS-4998
Project: Mesos
Issue Type: Epic
Reporter: Joerg Schad
Assignee: Joerg Schad
Creating a new subprocess in Mesos involves forking/cloning a new process. In most cases (executors, perf, ...), the parent of the new process is the agent (slave) process. This can lead to problematic behavior, especially when several new processes are created at the same time.

The problem is that a normal fork() (or the clone syscall used by libprocess) gives the child a copy-on-write (COW) view of the parent's address space until the child execs its new binary. Between fork and exec, Mesos also performs several setup actions, such as placing the new process in a systemd unit or assigning it to the freezer cgroup, so this window is not negligible.

Under COW, the parent's writable private memory is marked read-only in both processes. The first write to each shared page therefore triggers a page fault, and the kernel copies that page. This applies to the parent process as well, so the agent pays a page fault plus a page copy for every shared page it writes to while a child is still in this window.

We simulated the number of page faults incurred when forking/cloning new processes with this benchmark: https://github.com/joerg84/forking-benchmark

Results can be seen here: https://docs.google.com/presentation/d/1SUjKAVHdrutLPpFJy3Q1yhinG5FOMw3HbbEdzuhZ7A8
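A minimal sketch of the effect described above, assuming Linux and Python 3.9+. This is not the code from joerg84/forking-benchmark; `touch_pages` and `measure` are hypothetical helpers written for illustration. The parent writes one byte per page of a buffer twice: once before forking, and once while a forked child that has not exec'd is still alive. It counts minor page faults with `getrusage`. The child only sleeps, standing in for an executor that is still doing setup (cgroups, systemd units) before exec.

```python
import os
import resource
import time

PAGE = os.sysconf("SC_PAGE_SIZE")


def minor_faults() -> int:
    """Minor (soft) page faults incurred so far by this process."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def touch_pages(buf: bytearray) -> int:
    """Write one byte per page of buf; return the minor faults this caused."""
    before = minor_faults()
    for off in range(0, len(buf), PAGE):
        buf[off] = (buf[off] + 1) & 0xFF
    return minor_faults() - before


def measure(size_mib: int = 64, child_delay_s: float = 1.0) -> tuple:
    """Return (faults_before_fork, faults_after_fork) for writing the buffer.

    Hypothetical helper: the child never execs and never touches the buffer;
    it just lingers, like a freshly forked executor still being set up.
    """
    buf = bytearray(size_mib * 1024 * 1024)  # zero-filled, so already populated
    touch_pages(buf)                         # make sure every page is resident
    baseline = touch_pages(buf)              # pages private and writable: ~0 faults

    pid = os.fork()
    if pid == 0:                             # child: wait, then exit without cleanup
        time.sleep(child_delay_s)
        os._exit(0)

    after_fork = touch_pages(buf)            # pages write-protected by COW: faults
    os.waitpid(pid, 0)                       # reap the child
    return baseline, after_fork


if __name__ == "__main__":
    base, cow = measure()
    print(f"minor faults writing 64 MiB: {base} before fork, {cow} after fork")
```

The first number should be close to zero. The second should be roughly one fault per page written, or fewer if the kernel copies transparent huge pages whole. Exact counts depend on kernel version and configuration.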