Joerg Schad created MESOS-4998:
----------------------------------

             Summary: Problematic fork/clone performance at high load.
                 Key: MESOS-4998
                 URL: https://issues.apache.org/jira/browse/MESOS-4998
             Project: Mesos
          Issue Type: Epic
            Reporter: Joerg Schad
            Assignee: Joerg Schad


Creating a new subprocess in Mesos involves forking/cloning a new process. In 
most cases (executors, perf, ...) the parent of the new process is the 
agent/slave process. This can lead to problematic behavior, especially when 
several new processes are created at the same time.

The problem is that a normal fork() (or the clone syscall used by 
libprocess) gives the child a copy-on-write (COW) view of the parent's address 
space until the child execs its new binary. Note that between fork and exec 
Mesos performs several setup actions, such as placing the new process in a 
systemd unit or assigning it to the freezer cgroup.
The COW property implies that existing memory is marked read-only, so the 
first write to each page triggers a page fault and a copy of that page. 
Note that this also applies to the parent process, so writes by the agent to 
such pages are very costly until the child execs or exits.
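
To make the parent-side cost concrete, here is a minimal sketch (not the 
benchmark linked in this issue, and not Mesos code) that counts the minor page 
faults the parent takes when it rewrites memory it had already touched, once 
without a child and once while a forked child is still alive. The names 
measure, touch and minor_faults are made up for this illustration; it assumes 
a Unix system where getrusage reports ru_minflt, and it disables transparent 
huge pages for the buffer where Python exposes that option, so that faults 
are counted per small page.

```python
import mmap
import os
import resource

PAGE = mmap.PAGESIZE


def minor_faults():
    # Minor (no I/O) page faults taken so far by this process.
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


def touch(buf, npages):
    # Write one byte in every page of the buffer.
    for i in range(npages):
        buf[i * PAGE] = (buf[i * PAGE] + 1) % 256


def faults_while_touching(buf, npages):
    before = minor_faults()
    touch(buf, npages)
    return minor_faults() - before


def measure(npages=2048):
    # Private anonymous memory, so fork() shares it copy-on-write.
    buf = mmap.mmap(-1, npages * PAGE,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    if hasattr(mmap, "MADV_NOHUGEPAGE"):
        # Assumption: Linux with Python 3.8+; keeps the mapping on small pages.
        buf.madvise(mmap.MADV_NOHUGEPAGE)

    touch(buf, npages)                        # fault every page in once
    baseline = faults_while_touching(buf, npages)

    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: stands in for the window between fork and exec.
        os.close(w)
        os.read(r, 1)                         # block until the parent is done
        os._exit(0)

    os.close(r)
    after_fork = faults_while_touching(buf, npages)  # parent writes, child alive
    os.close(w)                               # EOF on the pipe releases the child
    os.waitpid(pid, 0)
    buf.close()
    return baseline, after_fork


if __name__ == "__main__":
    baseline, after_fork = measure()
    print("parent faults without child:", baseline)
    print("parent faults with live child:", after_fork)
```

On a typical Linux machine the second number should be roughly the number of 
pages written, while the first stays near zero. The exact values depend on 
the kernel and on the interpreter's own memory activity.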

We simulated the page faults incurred when forking/cloning new processes with 
this benchmark:
https://github.com/joerg84/forking-benchmark

Results can be seen here: 
https://docs.google.com/presentation/d/1SUjKAVHdrutLPpFJy3Q1yhinG5FOMw3HbbEdzuhZ7A8



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
