I'm voting -1 (non-binding) for the same reason as David Robinson. I would classify the FD leak as serious and a violation of the isolation that the agent provides.
It should be backported to 1.1.0, just as it was backported to 1.0.2.

On Mon, Oct 24, 2016 at 5:37 PM, David Robinson <drobin...@twitter.com> wrote:

> -1
>
> Can the fix for MESOS-6420 be backported? The Mesos agent leaks sockets
> when the port mapping network isolator is enabled, the leaked sockets are
> passed to the executor (the close-on-exec flag is not set) and that can
> cause problems for certain frameworks. The Aurora executor uses Kazoo (the
> Python ZooKeeper library) for service announcement, Kazoo uses Python's
> select() call for polling its file descriptors and Python's select() chokes
> when there are > 1024 file descriptors. The end result for Aurora is that
> after an agent runs > 1024 tasks any new tasks will fail to announce (will
> not be registered in ZooKeeper) and will therefore be unknown to other
> services.
>
> On Tue, Oct 18, 2016 at 1:01 PM, Till Toenshoff <toensh...@me.com> wrote:
>
>> Hi all,
>>
>> Please vote on releasing the following candidate as Apache Mesos 1.1.0.
>>
>>
>> 1.1.0 includes the following:
>> --------------------------------------------------------------------------------
>> * [MESOS-2449] - **Experimental** support for launching a group of tasks
>>   via a new `LAUNCH_GROUP` Offer operation. Mesos will guarantee that either
>>   all tasks or none of the tasks in the group are delivered to the executor.
>>   Executors receive the task group via a new `LAUNCH_GROUP` event.
>>
>> * [MESOS-2533] - **Experimental** support for HTTP and HTTPS health checks.
>>   Executors may now use the updated `HealthCheck` protobuf to implement
>>   HTTP(S) health checks. Both default executors (command and docker) leverage
>>   the `curl` binary for sending HTTP(S) requests and connect to `127.0.0.1`,
>>   hence a task must listen on all interfaces. On Linux, for BRIDGE and USER
>>   modes, the docker executor enters the task's network namespace.
>>
>> * [MESOS-3421] - **Experimental** Support sharing of resources across
>>   containers. Currently persistent volumes are the only resources allowed to
>>   be shared.
>>
>> * [MESOS-3567] - **Experimental** support for TCP health checks. Executors
>>   may now use the updated `HealthCheck` protobuf to implement TCP health
>>   checks. Both default executors (command and docker) connect to `127.0.0.1`,
>>   hence a task must listen on all interfaces. On Linux, for BRIDGE and USER
>>   modes, the docker executor enters the task's network namespace.
>>
>> * [MESOS-4324] - Allow access to persistent volumes as read-only or
>>   read-write by tasks. Mesos doesn't allow persistent volumes to be created
>>   as read-only, but in 1.1 it starts allowing tasks to use the volumes as
>>   read-only. This is mainly motivated by shared persistent volumes but
>>   applies to regular persistent volumes as well.
>>
>> * [MESOS-5275] - **Experimental** support for Linux capabilities. Frameworks
>>   or operators now have fine-grained control over the capabilities that a
>>   container may have. This allows a container to run as root, but not have
>>   all the privileges associated with the root user (e.g., CAP_SYS_ADMIN).
>>
>> * [MESOS-5344] - **Experimental** support for partition-aware Mesos
>>   frameworks. In previous Mesos releases, when an agent is partitioned from
>>   the master and then reregisters with the cluster, all tasks running on the
>>   agent are terminated and the agent is shut down. In Mesos 1.1, partitioned
>>   agents will no longer be shut down when they reregister with the master. By
>>   default, tasks running on such agents will still be killed (for backward
>>   compatibility); however, frameworks can opt in to the new PARTITION_AWARE
>>   capability. If they do this, their tasks will not be killed when a
>>   partition is healed. This allows frameworks to define their own policies
>>   for how to handle partitioned tasks. Enabling the PARTITION_AWARE
>>   capability also introduces a new set of task states: TASK_UNREACHABLE,
>>   TASK_DROPPED, TASK_GONE, TASK_GONE_BY_OPERATOR, and TASK_UNKNOWN. These new
>>   states are intended to eventually replace the TASK_LOST state.
>>
>> * [MESOS-6077] - **Experimental** A new default executor is introduced which
>>   frameworks can use to launch task groups as nested containers. All the
>>   nested containers share resources like cpu, memory, network and volumes.
>>
>> * [MESOS-6014] - **Experimental** A new port-mapper CNI plugin, the
>>   `mesos-cni-port-mapper`, has been introduced. For Mesos containers, with
>>   the CNI port-mapper plugin, users can now expose container ports through
>>   host ports using DNAT. This is especially useful when Mesos containers are
>>   attached to isolated CNI networks such as private bridge networks, and the
>>   services running in the container need to be exposed outside these
>>   isolated networks.
>>
>>
>> The CHANGELOG for the release is available at:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.1.0-rc1
>> --------------------------------------------------------------------------------
>>
>> The candidate for Mesos 1.1.0 release is available at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.1.0-rc1/mesos-1.1.0.tar.gz
>>
>> The tag to be voted on is 1.1.0-rc1:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.1.0-rc1
>>
>> The MD5 checksum of the tarball can be found at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.1.0-rc1/mesos-1.1.0.tar.gz.md5
>>
>> The signature of the tarball can be found at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.1.0-rc1/mesos-1.1.0.tar.gz.asc
>>
>> The PGP key used to sign the release is here:
>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>
>> The JAR is up in Maven in a staging repository here:
>> https://repository.apache.org/content/repositories/orgapachemesos-1158
>>
>> Please vote on releasing this package as Apache Mesos 1.1.0!
>>
>> The vote is open until Fri Oct 21 21:57:02 CEST 2016 and passes if a
>> majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Mesos 1.1.0
>> [ ] -1 Do not release this package because ...
>>
>> Thanks,
>> Alex & Till
>>
>
> --
> David Robinson
> SRE - Mesos
> @daverobinson
>
> --
> Zameer Manji
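
P.S. For anyone who wants to see the failure mode David describes, here is a rough Python sketch. It is not taken from Mesos, Aurora, or Kazoo; the helper names (`set_cloexec`, `child_sees_fd`, `select_accepts`, `demo`) are made up for illustration, and it assumes Linux with CPython 3, where FD_SETSIZE is 1024. It shows two things: a descriptor without the close-on-exec flag survives into a child process, and Python's select() refuses any descriptor whose number is 1024 or higher (strictly, the limit is on the descriptor number rather than the count of open descriptors).

```python
import fcntl
import os
import select
import socket
import subprocess
import sys


def set_cloexec(fd: int) -> None:
    """Set close-on-exec so the kernel closes fd when the process calls exec()."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)


def child_sees_fd(fd: int) -> bool:
    """Spawn a child Python process and report whether fd is open inside it."""
    probe = "import os, sys; os.fstat(int(sys.argv[1]))"
    result = subprocess.run(
        [sys.executable, "-c", probe, str(fd)],
        close_fds=False,  # do not close fds ourselves; let close-on-exec decide
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def select_accepts(fd: int) -> bool:
    """Return False if Python's select() rejects fd as out of range."""
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        return False
    return True


def demo() -> dict:
    sock = socket.socket()
    # Python marks the descriptors it creates close-on-exec by default (PEP 446).
    # F_DUPFD returns a copy with the flag cleared, which imitates a socket
    # that was created without close-on-exec. The copy gets a number >= 100.
    fd = fcntl.fcntl(sock.fileno(), fcntl.F_DUPFD, 100)
    try:
        results = {"leaked_without_cloexec": child_sees_fd(fd)}
        set_cloexec(fd)
        results["leaked_with_cloexec"] = child_sees_fd(fd)
        results["select_ok_low_fd"] = select_accepts(fd)
        # CPython compares the descriptor number with FD_SETSIZE before it
        # calls the kernel, so a bare integer is enough to trigger the error.
        results["select_ok_fd_1024"] = select_accepts(1024)
        return results
    finally:
        os.close(fd)
        sock.close()


if __name__ == "__main__":
    print(demo())
```

If the sketch is right, this is why every leaked socket matters: each one pushes the executor's own descriptors to higher numbers, and once a descriptor that Kazoo polls lands at 1024 or above, select() raises instead of polling.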