Benjamin Mahler created MESOS-5988:
--------------------------------------

             Summary: PollSocketImpl can write to a stale fd.
                 Key: MESOS-5988
                 URL: https://issues.apache.org/jira/browse/MESOS-5988
             Project: Mesos
          Issue Type: Bug
          Components: libprocess
            Reporter: Benjamin Mahler
            Priority: Blocker
             Fix For: 1.0.1


When tracking down MESOS-5986 with [~greggomann] and [~anandmazumdar]. We were 
curious why PollSocketImpl avoids the same issue. It seems that PollSocketImpl 
has a similar race, however in the case of PollSocketImpl we will simply write 
to a stale file descriptor.

One example is {{PollSocketImpl::send(const char*, size_t)}}:

https://github.com/apache/mesos/blob/1.0.0/3rdparty/libprocess/src/poll_socket.cpp#L241-L245
{code}
Future<size_t> PollSocketImpl::send(const char* data, size_t size)
{
  return io::poll(get(), io::WRITE)
    .then(lambda::bind(&internal::socket_send_data, get(), data, size));
}

Future<size_t> socket_send_data(int s, const char* data, size_t size)
{
  CHECK(size > 0);

  while (true) {
    ssize_t length = send(s, data, size, MSG_NOSIGNAL);

#ifdef __WINDOWS__
    int error = WSAGetLastError();
#else
    int error = errno;
#endif // __WINDOWS__

    if (length < 0 && net::is_restartable_error(error)) {
      // Interrupted, try again now.
      continue;
    } else if (length < 0 && net::is_retryable_error(error)) {
      // Might block, try again later.
      return io::poll(s, io::WRITE)
        .then(lambda::bind(&internal::socket_send_data, s, data, size));
    } else if (length <= 0) {
      // Socket error or closed.
      if (length < 0) {
        const string error = os::strerror(errno);
        VLOG(1) << "Socket error while sending: " << error;
      } else {
        VLOG(1) << "Socket closed while sending";
      }
      if (length == 0) {
        return length;
      } else {
        return Failure(ErrnoError("Socket send failed"));
      }
    } else {
      CHECK(length > 0);

      return length;
    }
  }
}
{code}

If the last reference to the {{Socket}} goes away before the 
{{socket_send_data}} loop completes, then we will write to a stale fd!

It turns out that we have avoided this issue because in libprocess we happen to 
keep a reference to the {{Socket}} around when sending:

https://github.com/apache/mesos/blob/1.0.0/3rdparty/libprocess/src/process.cpp#L1678-L1707
{code}
void send(Encoder* encoder, Socket socket)
{
  switch (encoder->kind()) {
    case Encoder::DATA: {
      size_t size;
      const char* data = static_cast<DataEncoder*>(encoder)->next(&size);
      socket.send(data, size)
        .onAny(lambda::bind(
            &internal::_send,
            lambda::_1,
            socket,
            encoder,
            size));
      break;
    }
    case Encoder::FILE: {
      off_t offset;
      size_t size;
      int fd = static_cast<FileEncoder*>(encoder)->next(&offset, &size);
      socket.sendfile(fd, offset, size)
        .onAny(lambda::bind(
            &internal::_send,
            lambda::_1,
            socket,
            encoder,
            size));
      break;
    }
  }
}
{code}

However, this may not be true in all call-sites going forward. Currently, it 
appears that http::Connection can trigger this bug.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to