Joseph Wu created MESOS-5723:
--------------------------------

             Summary: SSL-enabled libprocess will leak incoming links to forks
                 Key: MESOS-5723
                 URL: https://issues.apache.org/jira/browse/MESOS-5723
             Project: Mesos
          Issue Type: Bug
          Components: libprocess
    Affects Versions: 0.28.0, 0.27.0, 0.26.0, 0.25.0, 0.24.0
            Reporter: Joseph Wu
            Assignee: Joseph Wu
            Priority: Blocker
             Fix For: 1.0.0


Encountered two different buggy behaviors that can be traced to the same 
underlying problem.

Repro #1 (non-crashy):
(1) Start a master.  Doesn't matter if SSL is enabled or not.
(2) Start an agent, with SSL enabled.  Downgrade support has the same problem.  
The master/agent {{link}} to one another.
(3) Run a sleep task.  Keep this alive.  If you inspect FDs at this point, 
you'll notice the task has inherited the {{link}} FD (master -> agent).
(4) Restart the agent.  Due to (3), the master's {{link}} stays open.
(5) Check master's logs for the agent's re-registration message.
(6) Check the agent's logs for re-registration.  The message will not appear.  
The master is actually using the old {{link}} which is not connected to the 
agent.
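
One way to do the FD inspection from step (3) on Linux is to read the 
symlinks under /proc/<pid>/fd.  The following is only a sketch: 
{{listOpenFds}} is a hypothetical helper, not part of Mesos or libprocess, 
and it assumes a Linux-style /proc filesystem.  An inherited socket would 
show up with the same {{socket:[inode]}} target in both the agent and the 
task.

```cpp
#include <dirent.h>
#include <limits.h>
#include <unistd.h>

#include <cstdlib>
#include <map>
#include <string>

// Hypothetical helper (not a Mesos/libprocess API): maps each open FD of
// `pid` to its link target as reported by Linux's /proc/<pid>/fd, e.g.
// "socket:[12345]" or "pipe:[6789]".  Returns an empty map if the
// directory cannot be read.
std::map<int, std::string> listOpenFds(pid_t pid)
{
  std::map<int, std::string> result;
  const std::string dir = "/proc/" + std::to_string(pid) + "/fd";

  DIR* d = opendir(dir.c_str());
  if (d == nullptr) {
    return result;
  }

  while (dirent* entry = readdir(d)) {
    const std::string name = entry->d_name;
    if (name == "." || name == "..") {
      continue;
    }

    // Each entry is a symlink named after the FD number.
    char target[PATH_MAX];
    const std::string path = dir + "/" + name;
    ssize_t length = readlink(path.c_str(), target, sizeof(target) - 1);
    if (length == -1) {
      continue; // The FD may have been closed in the meantime.
    }
    target[length] = '\0';

    result[std::atoi(name.c_str())] = target;
  }

  closedir(d);
  return result;
}
```

Calling this once with the agent's PID and once with the task's PID, then 
comparing the {{socket:[...]}} targets, should reveal any sockets the two 
processes share.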

----

Repro #2 (crashy):
(1) Start a master.  Doesn't matter if SSL is enabled or not.
(2) Start an agent, with SSL enabled.  Downgrade support has the same problem.
(3) Run ~100 sleep tasks one after another and keep them all alive.  Each 
task links back to the agent.  Due to an FD leak, each task will inherit the 
incoming links from all other actors...
(4) At some point, the agent will run out of FDs and kernel panic.

----

It appears that the SSL socket {{accept}} call is missing {{os::nonblock}} and 
{{os::cloexec}} calls:
https://github.com/apache/mesos/blob/4b91d936f50885b6a66277e26ea3c32fe942cf1a/3rdparty/libprocess/src/libevent_ssl_socket.cpp#L794-L806

For reference, here's the {{poll}} socket's {{accept}}:
https://github.com/apache/mesos/blob/4b91d936f50885b6a66277e26ea3c32fe942cf1a/3rdparty/libprocess/src/poll_socket.cpp#L53-L75
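
To illustrate what the missing calls amount to, here is a minimal sketch 
using plain POSIX {{fcntl}} rather than the {{os::nonblock}} and 
{{os::cloexec}} wrappers themselves.  The helper names ({{setNonblock}}, 
{{setCloexec}}, {{prepareAcceptedFd}}) are made up for this example and are 
not libprocess functions; the actual fix may look different.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: puts `fd` into non-blocking mode.
// Returns false if either fcntl call fails.
bool setNonblock(int fd)
{
  int flags = fcntl(fd, F_GETFL);
  if (flags == -1) {
    return false;
  }
  return fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Hypothetical helper: marks `fd` close-on-exec, so the kernel closes it
// when a forked child calls exec.  Without this flag the child keeps the
// descriptor open after exec.
bool setCloexec(int fd)
{
  int flags = fcntl(fd, F_GETFD);
  if (flags == -1) {
    return false;
  }
  return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) != -1;
}

// Hypothetical helper: what an accept path would do with each newly
// accepted descriptor before handing it to the rest of the program.
bool prepareAcceptedFd(int fd)
{
  return setNonblock(fd) && setCloexec(fd);
}
```

Note that a short window remains between {{accept}} returning and the flag 
being set, during which a concurrent fork+exec could still inherit the FD.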




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
