• Vitaliy Makkoveev [2023-10-04 23:38]:
On 5 Oct 2023, at 00:31, Alexander Bluhm <alexander.bl...@gmx.net> wrote:

On Wed, Oct 04, 2023 at 11:03:27PM +0300, Vitaliy Makkoveev wrote:
On Wed, Oct 04, 2023 at 09:13:59PM +0200, Alexander Bluhm wrote:
On Wed, Oct 04, 2023 at 08:42:48PM +0200, Kirill Miazine wrote:
If it happns again, could you send an 'ps axlww | grep ifconifg'
output?  Then we see the wait channel where it hangs in the kernel.

$ ps axlww
   UID   PID  PPID CPU PRI  NI   VSZ   RSS WCHAN   STAT   TT       TIME COMMAND

Here it happened again:

     0 75339 23922   0  10   0   360   296 wg_ifq  D+U    p0    0:00.00
ifconfig wg1 destroy

wg_peer_destroy()
        ...
        NET_LOCK();
        while (!ifq_empty(&sc->sc_if.if_snd)) {
                NET_UNLOCK();
                tsleep_nsec(sc, PWAIT, "wg_ifq", 1000);
                NET_LOCK();
        }
        NET_UNLOCK();

This net lock dance looks fishy.  And the sleep has a timeout of 1
milli second.  But that is may be per packet.  So if you have a
long queue or the queue refills somehow, it will take forever.

I think the difference in the usage is constant traffic that keeps
the send queue full.  The timeout hides the problem when there are
only a few packets.


This should ensure wg_qstart() will not dereference dying `peer'. Looks
crappy and potentially could block forever, but should work. However
netlock it unnecessary here. netlocked wg_output() could fill `if_snd'
while netlock released before tsleep(), so it serializes nothing but
stops packets processing.

Kirill, does this diff help?

I doubt that it changes much.  When netlock is not taken, the queue
can still be filled with packets.

Removing this ugly netlock makes sense anyway.  But without any
synchronisation just reading a variable feels wrong.  Can we add a
read once like for mq_len in sys/mbuf.h?  And the ifq_set_maxlen()
also looks very unsafe.  For mbuf queues I added a mutex, interface
queues should do the same.

ok?


I guess this is uniprocessor machine, so synchronisation is not
related.
new diff doesn't prevent hang in test scenario either.

wg destroy would hang on both UP and MP machines -- the fresh Vultr test machine is MP.

diff is ok mvs.

bluhm

Index: net/ifq.c
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/net/ifq.c,v
retrieving revision 1.50
diff -u -p -r1.50 ifq.c
--- net/ifq.c   30 Jul 2023 05:39:52 -0000      1.50
+++ net/ifq.c   4 Oct 2023 21:04:20 -0000
@@ -529,6 +529,14 @@ ifq_hdatalen(struct ifqueue *ifq)
        return (len);
}

+void
+ifq_set_maxlen(struct ifqueue *ifq, unsigned int maxlen)
+{
+       mtx_enter(&ifq->ifq_mtx);
+       ifq->ifq_maxlen = maxlen;
+       mtx_leave(&ifq->ifq_mtx);
+}
+
unsigned int
ifq_purge(struct ifqueue *ifq)
{
Index: net/ifq.h
===================================================================
RCS file: /data/mirror/openbsd/cvs/src/sys/net/ifq.h,v
retrieving revision 1.38
diff -u -p -r1.38 ifq.h
--- net/ifq.h   30 Jul 2023 05:39:52 -0000      1.38
+++ net/ifq.h   4 Oct 2023 21:09:04 -0000
@@ -435,6 +435,7 @@ void                 ifq_deq_commit(struct ifqueue *,
void             ifq_deq_rollback(struct ifqueue *, struct mbuf *);
struct mbuf     *ifq_dequeue(struct ifqueue *);
int              ifq_hdatalen(struct ifqueue *);
+void            ifq_set_maxlen(struct ifqueue *, unsigned int);
void             ifq_mfreem(struct ifqueue *, struct mbuf *);
void             ifq_mfreeml(struct ifqueue *, struct mbuf_list *);
unsigned int     ifq_purge(struct ifqueue *);
@@ -448,9 +449,8 @@ int          ifq_deq_sleep(struct ifqueue *, st
                     const char *, volatile unsigned int *,
                     volatile unsigned int *);

-#define        ifq_len(_ifq)                   ((_ifq)->ifq_len)
-#define        ifq_empty(_ifq)                 (ifq_len(_ifq) == 0)
-#define        ifq_set_maxlen(_ifq, _l)        ((_ifq)->ifq_maxlen = (_l))
+#define ifq_len(_ifq)          READ_ONCE((_ifq)->ifq_len)
+#define ifq_empty(_ifq)                (ifq_len(_ifq) == 0)

static inline int
ifq_is_priq(struct ifqueue *ifq)
@@ -490,8 +490,8 @@ int          ifiq_input(struct ifiqueue *, stru
int              ifiq_enqueue(struct ifiqueue *, struct mbuf *);
void             ifiq_add_data(struct ifiqueue *, struct if_data *);

-#define        ifiq_len(_ifiq)                 ml_len(&(_ifiq)->ifiq_ml)
-#define        ifiq_empty(_ifiq)               ml_empty(&(_ifiq)->ifiq_ml)
+#define ifiq_len(_ifiq)                READ_ONCE(ml_len(&(_ifiq)->ifiq_ml))
+#define ifiq_empty(_ifiq)      (ifiq_len(_ifiq) == 0)

#endif /* _KERNEL */


Reply via email to