On Wed, 15 Nov 2017, Juergen Gross wrote:
> >>>         while(mutex_is_locked(&map->active.in_mutex.owner) ||
> >>>               mutex_is_locked(&map->active.out_mutex.owner))
> >>>                 cpu_relax();
> >>>
> >>> ?
> >> I'm not convinced there isn't a race.
> >>
> >> In pvcalls_front_recvmsg() sock->sk->sk_send_head is being read and only
> >> then in_mutex is taken. What happens if pvcalls_front_release() resets
> >> sk_send_head and manages to test the mutex before the mutex is locked?
> >>
> >> Even in case this is impossible: the whole construct seems to be rather
> >> fragile.

I agree it looks fragile, and I agree that it might be best to avoid the
usage of in_mutex and out_mutex as refcounts. More comments on this
below.

 
> > I think we can wait until pvcalls_refcount is 1 (i.e. it's only us) and
> > not rely on mutex state.
> 
> Yes, this would work.

Yes, I agree it would work and for the sake of getting something in
shape for the merge window I am attaching a patch for it. Please go
ahead with it. Let me know if you need anything else immediately, and
I'll work on it ASAP.



However, I should note that this is a pretty big hammer we are using:
the refcount is global, while we only need to wait until it's only us
_on this specific socket_.

We really need a per socket refcount. If we don't want to use the mutex
internal counters, then we need another one.

See the appended patch that introduces a per socket refcount. However,
for the merge window, also using pvcalls_refcount is fine.

The race Juergen is concerned about is only theoretically possible:

recvmsg:                                 release:
  
  test sk_send_head                      clear sk_send_head
  <only few instructions>                <prepare a message>
  grab in_mutex                          <send a message to the server>
                                         <wait for an answer>
                                         test in_mutex

Without kernel preemption is not possible for release to clear
sk_send_head and test in_mutex after recvmsg tests sk_send_head and
before recvmsg grabs in_mutex.

But maybe we need to disable kernel preemption in recvmsg and sendmsg to
stay on the safe side?

The patch below introduces a per active socket refcount, so that we
don't have to rely on in_mutex and out_mutex for refcounting. It also
disables preemption in sendmsg and recvmsg in the region described
above.

I don't think this patch should go in immediately. We can take our time
to figure out the best fix.


diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 0c1ec68..8c1030b 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -68,6 +68,7 @@ struct sock_mapping {
                        struct pvcalls_data data;
                        struct mutex in_mutex;
                        struct mutex out_mutex;
+                       atomic_t sock_refcount;
 
                        wait_queue_head_t inflight_conn_req;
                } active;
@@ -497,15 +498,20 @@ int pvcalls_front_sendmsg(struct socket *sock, struct 
msghdr *msg,
        }
        bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
 
+       preempt_disable();
        map = (struct sock_mapping *) sock->sk->sk_send_head;
        if (!map) {
+               preempt_enable();
                pvcalls_exit();
                return -ENOTSOCK;
        }
 
+       atomic_inc(&map->active.sock_refcount);
        mutex_lock(&map->active.out_mutex);
+       preempt_enable();
        if ((flags & MSG_DONTWAIT) && !pvcalls_front_write_todo(map)) {
                mutex_unlock(&map->active.out_mutex);
+               atomic_dec(&map->active.sock_refcount);
                pvcalls_exit();
                return -EAGAIN;
        }
@@ -528,6 +534,7 @@ int pvcalls_front_sendmsg(struct socket *sock, struct 
msghdr *msg,
                tot_sent = sent;
 
        mutex_unlock(&map->active.out_mutex);
+       atomic_dec(&map->active.sock_refcount);
        pvcalls_exit();
        return tot_sent;
 }
@@ -600,13 +607,17 @@ int pvcalls_front_recvmsg(struct socket *sock, struct 
msghdr *msg, size_t len,
        }
        bedata = dev_get_drvdata(&pvcalls_front_dev->dev);
 
+       preempt_disable();
        map = (struct sock_mapping *) sock->sk->sk_send_head;
        if (!map) {
+               preempt_enable();
                pvcalls_exit();
                return -ENOTSOCK;
        }
 
+       atomic_inc(&map->active.sock_refcount);
        mutex_lock(&map->active.in_mutex);
+       preempt_enable();
        if (len > XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER))
                len = XEN_FLEX_RING_SIZE(PVCALLS_RING_ORDER);
 
@@ -625,6 +636,7 @@ int pvcalls_front_recvmsg(struct socket *sock, struct 
msghdr *msg, size_t len,
                ret = 0;
 
        mutex_unlock(&map->active.in_mutex);
+       atomic_dec(&map->active.sock_refcount);
        pvcalls_exit();
        return ret;
 }
@@ -1048,8 +1060,7 @@ int pvcalls_front_release(struct socket *sock)
                 * is set to NULL -- we only need to wait for the existing
                 * waiters to return.
                 */
-               while (!mutex_trylock(&map->active.in_mutex) ||
-                          !mutex_trylock(&map->active.out_mutex))
+               while (atomic_read(&map->active.sock_refcount) > 0)
                        cpu_relax();
 
                pvcalls_front_free_map(bedata, map);
xen/pvcalls: fix potential endless loop in pvcalls-front.c

mutex_trylock() returns 1 if you take the lock and 0 if not. Assume you
take in_mutex on the first try, but you can't take out_mutex. Next times
you call mutex_trylock() in_mutex is going to fail. It's an endless
loop.

Solve the problem by waiting until the global refcount is 1 instead (the
refcount is 1 when the only active pvcalls frontend function is
pvcalls_front_release).

Reported-by: Dan Carpenter <dan.carpen...@oracle.com>
Signed-off-by: Stefano Stabellini <sstabell...@kernel.org>
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com


diff --git a/drivers/xen/pvcalls-front.c b/drivers/xen/pvcalls-front.c
index 0c1ec68..72a74c3 100644
--- a/drivers/xen/pvcalls-front.c
+++ b/drivers/xen/pvcalls-front.c
@@ -1048,8 +1048,7 @@ int pvcalls_front_release(struct socket *sock)
 		 * is set to NULL -- we only need to wait for the existing
 		 * waiters to return.
 		 */
-		while (!mutex_trylock(&map->active.in_mutex) ||
-			   !mutex_trylock(&map->active.out_mutex))
+		while (atomic_read(&pvcalls_refcount) > 1)
 			cpu_relax();
 
 		pvcalls_front_free_map(bedata, map);
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel

Reply via email to