On Wed, Apr 28, 2021 at 03:31:41PM +0200, Claudio Jeker wrote:
> On Wed, Apr 28, 2021 at 02:56:27PM +0200, Alexandr Nedvedicky wrote:
> > I would say the la->la_mq queue should be empty once we dispatch all
> > packets to wire.  If it is not empty, then something went wrong and
> > packets got recirculated back to la->la_mq.  I think dropping them is
> > better plan than keeping them in queue.  There is not much we can do
> > in this case than to drop them.
>
> Another option is to assert on the condition.  It is an error case
> that should not exist.
Such a diff is below.  I will test extensively tomorrow.  If anyone
wants to try now, be my guest.  Is the comment correct?

> There are a few more issues in the arp code once the concurrency goes
> up.  The way the link-local address is updated looks not save to me.
> At least arpresolve() and arpcache() need to make sure that access to
> rt_gateway is either serialized or the updates happen so that the
> result is always correct.  At the moment this is not the case.

arpcache() is running with the kernel lock, so it is serialized.

> I assume that arp_rtrequest() is accessed with an exclusive NET_LOCK
> if not then there is more trouble waiting (setting rt_llinfo early is
> one of them).

As arp_rtrequest() is called from the packet output path, it will be
running with the shared net lock.  There is no easy way to upgrade from
a shared to an exclusive rwlock.  For TCP and other protocol input
functions I added another network queue.  This allows upgrading once
for multiple packets.  I don't want to do this for ARP.

My strategy is to fix the easy things in ARP with mutexes.  We can also
put complex functions under the kernel lock to protect them against
running in parallel.  As long as these functions are not in the hot
path, the kernel lock does not matter for performance.

The goal is to put real MP pressure on the stack: multiple CPUs
processing packets in parallel.  We should not search for the perfect
solution for every corner case before running with real MP load.  I
think the kernel lock can help us there.  It is done for arpcache() and
I have tried it for arpresolve() and nd6_resolve() when testing
multiple nettasks.  I hope this works and is a practicable way forward.
bluhm

Index: netinet/if_ether.c
===================================================================
RCS file: /cvs/src/sys/netinet/if_ether.c,v
retrieving revision 1.248
diff -u -p -r1.248 if_ether.c
--- netinet/if_ether.c	28 Apr 2021 21:21:44 -0000	1.248
+++ netinet/if_ether.c	28 Apr 2021 21:44:22 -0000
@@ -689,12 +689,15 @@ arpcache(struct ifnet *ifp, struct ether
 	len = ml_len(&ml);
 	while ((m = ml_dequeue(&ml)) != NULL)
 		ifp->if_output(ifp, m, rt_key(rt), rt);
-	/* XXXSMP we discard if other CPU enqueues */
-	if (mq_len(&la->la_mq) > 0) {
-		/* mbuf is back in queue.  Discard. */
-		atomic_sub_int(&la_hold_total, len + mq_purge(&la->la_mq));
-	} else
-		atomic_sub_int(&la_hold_total, len);
+	atomic_sub_int(&la_hold_total, len);
+
+	/*
+	 * This function is protected by kernel lock.  No other CPU can insert
+	 * to la_mq while we feed the packets to if_output().  If a packet
+	 * is reinserted from there we have a loop.  This should not happen
+	 * and we want to find such cases.
+	 */
+	KASSERT(mq_len(&la->la_mq) == 0);
 	return (0);
 }