On Tue, Aug 15, 2017 at 17:14 +0200, Mike Belopuhov wrote:
> Hi,
>
> I've just triggered an assert in hfsc_deferred (a callout) on an
> MP kernel running on an SP virtual machine:
>
> panic: kernel diagnostic assertion "HFSC_ENABLED(ifq)" failed: file
> "/home/mike/src/openbsd/sys/net/hfsc.c", line 950
> Stopped at      db_enter+0x9:   leave
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> *247463  28420      0         0x3          0    0  pfctl
> db_enter() at db_enter+0x9
> panic(ffffffff817f78f0,4,ffffffff81a3ffc0,ffffffff8110c140,ffff8000000c2060,
> ffffffff81598b1c) at panic+0x102
> __assert(ffffffff81769d93,ffffffff817d7350,3b6,ffffffff817d72bd) at
> __assert+0x35
> hfsc_deferred(ffff8000000c2060) at hfsc_deferred+0x9e
> timeout_run(ffff80000004adc8) at timeout_run+0x4c
> softclock(0) at softclock+0x146
> softintr_dispatch(0) at softintr_dispatch+0x9f
> Xsoftclock() at Xsoftclock+0x1f
> --- interrupt ---
> end of kernel
> end trace frame: 0x728d481974c08548, count: 7
> 0x2cfe9c031c90000:
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{0}> ps
>    PID     TID   PPID    UID  S       FLAGS  WAIT  COMMAND
> *28420  247463   5000      0  7         0x3        pfctl
>
> pfctl runs in a loop reloading the ruleset.  So at some point we
> disable HFSC on the interface but lose a race with hfsc_deferred
> before re-enabling it.
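
For context, here is roughly what hfsc_deferred() looks like before the
fix, reconstructed from the context lines of the diff at the end of this
mail (a sketch, not a verbatim copy of sys/net/hfsc.c):

	void
	hfsc_deferred(void *arg)
	{
		struct ifnet *ifp = arg;
		struct ifqueue *ifq = &ifp->if_snd;
		struct hfsc_if *hif;

		KERNEL_ASSERT_LOCKED();
		/* this is the assertion that fires in the panic above */
		KASSERT(HFSC_ENABLED(ifq));

		if (!ifq_empty(ifq))
			ifq_start(ifq);

		/* ifq_q is read without taking the ifq mutex */
		hif = ifq->ifq_q;

		/* XXX HRTIMER nearest virtual/fit time is likely less than 1/HZ. */
		timeout_add(&hif->hif_defer, 1);
	}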
While talking to visa@, I've added this piece that explains where the
race is happening:

When we *disable* hfsc, we put back the priq ifq ops:

    203  ifq_attach(struct ifqueue *ifq, const struct ifq_ops *newops, void *opsarg)
    204  {
    205          struct mbuf_list ml = MBUF_LIST_INITIALIZER();
    206          struct mbuf_list free_ml = MBUF_LIST_INITIALIZER();
    207          struct mbuf *m;
    208          const struct ifq_ops *oldops;
    209          void *newq, *oldq;
    210
    211          newq = newops->ifqop_alloc(ifq->ifq_idx, opsarg);
    212
    213          mtx_enter(&ifq->ifq_mtx);
    214          ifq->ifq_ops->ifqop_purge(ifq, &ml);
    215          ifq->ifq_len = 0;
    216
    217          oldops = ifq->ifq_ops;
    218          oldq = ifq->ifq_q;
    219
    220          ifq->ifq_ops = newops;
    221          ifq->ifq_q = newq;
    222
    223          while ((m = ml_dequeue(&ml)) != NULL) {
    224                  m = ifq->ifq_ops->ifqop_enq(ifq, m);
    225                  if (m != NULL) {
    226                          ifq->ifq_qdrops++;
    227                          ml_enqueue(&free_ml, m);
    228                  } else
    229                          ifq->ifq_len++;
    230          }
    231          mtx_leave(&ifq->ifq_mtx);
    232
    233          oldops->ifqop_free(ifq->ifq_idx, oldq);
    234
    235          ml_purge(&free_ml);
    236  }

Line 214 calls hfsc_purge, on line 231 we release the IPL_NET mutex
protecting us from interrupts, and then finally on line 233 we call
hfsc_free, which does the timeout_del.  This opens a window of
opportunity: after we release the mutex, a network interrupt can fire
and run the softclock softintr before control returns to ifq_attach,
i.e. before hfsc_free gets a chance to delete the timeout.

And visa@ has pointed out another potential race between the
ifqop_alloc on line 211 (aka hfsc_alloc, which does a timeout_add) and
line 213, where we grab the IPL_NET mutex before setting ifq_ops on
line 220.  Since the NET_LOCK no longer provides interrupt protection
against softclock softintrs, an IPL_NET interrupt can fire and, with
unlucky timing, trigger hfsc_deferred before we set ifq_ops.

To avoid this race I'm proposing to move the timeout_add from
hfsc_alloc to hfsc_enq, i.e. arm the timeout only when there are
packets to deal with.

> IFQ has a mechanism to lock the underlying object and I believe this
> is the right tool for this job.  Any other ideas?
>
> I don't think it's a good idea to hold the mutex (ifq_q_enter and
> ifq_q_leave effectively lock and unlock it) during the ifq_start,
> so we have to make a concession and run the ifq_start before knowing
> whether or not HFSC is attached.  IMO, it's a small price to pay to
> avoid clutter.  The kernel lock assertion is pointless at this point.
>
> OK?
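
For reference, the locking mechanism mentioned above is the
ifq_q_enter/ifq_q_leave pair in sys/net/ifq.c.  Paraphrased from
memory (so check the tree for the authoritative version), it looks
roughly like this:

	/*
	 * Rough paraphrase of ifq_q_enter()/ifq_q_leave(), not a
	 * verbatim copy.  ifq_q_enter() takes the IPL_NET ifq mutex
	 * and only hands out the queue state if the expected ops are
	 * still attached, so a caller like hfsc_deferred() can never
	 * look at a half-switched ifq.
	 */
	void *
	ifq_q_enter(struct ifqueue *ifq, const struct ifq_ops *ops)
	{
		mtx_enter(&ifq->ifq_mtx);
		if (ifq->ifq_ops == ops)
			return (ifq->ifq_q);	/* mutex stays held */

		mtx_leave(&ifq->ifq_mtx);
		return (NULL);			/* ops have been swapped */
	}

	void
	ifq_q_leave(struct ifqueue *ifq, void *q)
	{
		KASSERT(q == ifq->ifq_q);
		mtx_leave(&ifq->ifq_mtx);
	}

With that, hfsc_deferred can bail out cleanly whenever pfctl has
swapped the ops underneath it.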

A new diff combining all three modifications:

diff --git sys/net/hfsc.c sys/net/hfsc.c
index 410bea733c6..b81afd43531 100644
--- sys/net/hfsc.c
+++ sys/net/hfsc.c
@@ -584,14 +584,13 @@ hfsc_idx(unsigned int nqueues, const struct mbuf *m)
 void *
 hfsc_alloc(unsigned int idx, void *q)
 {
 	struct hfsc_if *hif = q;
 
+	KASSERT(idx == 0); /* when hfsc is enabled we only use the first ifq */
 	KASSERT(hif != NULL);
-
-	timeout_add(&hif->hif_defer, 1);
 
 	return (hif);
 }
 
 void
 hfsc_free(unsigned int idx, void *q)
@@ -825,12 +824,15 @@ hfsc_enq(struct ifqueue *ifq, struct mbuf *m)
 	}
 
 	dm = hfsc_class_enqueue(cl, m);
 
 	/* successfully queued. */
-	if (dm != m && hfsc_class_qlength(cl) == 1)
+	if (dm != m && hfsc_class_qlength(cl) == 1) {
 		hfsc_set_active(hif, cl, m->m_pkthdr.len);
+		if (!timeout_pending(&hif->hif_defer))
+			timeout_add(&hif->hif_defer, 1);
+	}
 
 	/* drop occurred. */
 	if (dm != NULL)
 		PKTCNTR_INC(&cl->cl_stats.drop_cnt, dm->m_pkthdr.len);
 
@@ -944,20 +946,22 @@ hfsc_deferred(void *arg)
 {
 	struct ifnet *ifp = arg;
 	struct ifqueue *ifq = &ifp->if_snd;
 	struct hfsc_if *hif;
 
-	KERNEL_ASSERT_LOCKED();
-	KASSERT(HFSC_ENABLED(ifq));
+	if (!HFSC_ENABLED(ifq))
+		return;
 
 	if (!ifq_empty(ifq))
 		ifq_start(ifq);
 
-	hif = ifq->ifq_q;
-
+	hif = ifq_q_enter(&ifp->if_snd, ifq_hfsc_ops);
+	if (hif == NULL)
+		return;
 	/* XXX HRTIMER nearest virtual/fit time is likely less than 1/HZ. */
 	timeout_add(&hif->hif_defer, 1);
+	ifq_q_leave(&ifp->if_snd, hif);
 }
 
 void
 hfsc_cl_purge(struct hfsc_if *hif, struct hfsc_class *cl, struct mbuf_list *ml)
 {
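
To spell out the interleaving this closes (an illustration of the
analysis above, not code from the tree):

	/*
	 * The disable-side window in ifq_attach(), written out as a
	 * timeline (illustrative only):
	 *
	 *   pfctl: ifq_attach()                softclock
	 *   -------------------                ---------
	 *   mtx_enter(&ifq->ifq_mtx)
	 *   ifqop_purge -> hfsc_purge()        (line 214)
	 *   ifq_ops switched back to priq      (line 220)
	 *   mtx_leave(&ifq->ifq_mtx)           (line 231)
	 *                                      hfsc_deferred()
	 *                                        KASSERT(HFSC_ENABLED(ifq))
	 *                                        -> panic, HFSC already detached
	 *   ifqop_free -> hfsc_free()          (line 233)
	 *     timeout_del() runs too late
	 *
	 * With the diff applied, hfsc_deferred() returns early instead
	 * of asserting, takes the ifq mutex via ifq_q_enter() before
	 * touching hif, and the timeout is only armed from hfsc_enq()
	 * while the hfsc ops are actually attached.
	 */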