Hello,

</snip>
> Such a diff is below.  I will test extensively towmorrow.  If anyone
> wants to try now, be my guest.
> 
> Is the comment correct?

    I think comment is not quite right.

</snip>

the packet gets inserted into la->la_mq via arpresolve(), which
is not protected by KERNEL_LOCK().

arpresolve() runs as a reader on netlock, right?


> Index: netinet/if_ether.c
> ===================================================================
> RCS file: /cvs/src/sys/netinet/if_ether.c,v
> retrieving revision 1.248
> diff -u -p -r1.248 if_ether.c
> --- netinet/if_ether.c        28 Apr 2021 21:21:44 -0000      1.248
> +++ netinet/if_ether.c        28 Apr 2021 21:44:22 -0000
> @@ -689,12 +689,15 @@ arpcache(struct ifnet *ifp, struct ether
>       len = ml_len(&ml);
>       while ((m = ml_dequeue(&ml)) != NULL)
>               ifp->if_output(ifp, m, rt_key(rt), rt);
> -     /* XXXSMP we discard if other CPU enqueues */
> -     if (mq_len(&la->la_mq) > 0) {
> -             /* mbuf is back in queue. Discard. */
> -             atomic_sub_int(&la_hold_total, len + mq_purge(&la->la_mq));
> -     } else
> -             atomic_sub_int(&la_hold_total, len);
> +     atomic_sub_int(&la_hold_total, len);
> +
> +     /*
> +      * This function is protected by kernel lock.  No other CPU can insert
> +      * to la_mq while we feed the packets to if_output().  If a packet
> +      * is reinserted from there we have a loop.  This should not happen
> +      * and we want to find such cases.
> +      */
> +     KASSERT(mq_len(&la->la_mq) == 0);

    tripping the assert here means the ARP cache entry either
    expired or got deleted, while we were dispatching queued packets 
    to wire in a while() loop above.

    we enter dispatch loop above after ARP successfully resolves IP address.
    if ARP entry gets lost while we dispatching packets to wire we end up
    queuing packets into la->la_mq queue. This happens in arpresolve().
    arpresolve() is called on behalf of ifp->if_output()

    given arpcache() runs under KERNEL_LOCK() we can rule out case user
    runs 'arp -a -d' command, which would delete ARP cache. the 'arp -a -d'
    is excluded by KERNEL_LOCK().

    so if the assert fires something very strange must be happening.

I would suggest comment below:

/*
 * If la_mq is not empty, then something very strange has happened.
 * We expect la_mq to be empty, because while() loop above dispatches
 * queued packets to wire after ARP resolves IP address.
 *
 * la_mq not being empty means arpresolve() called on behalf of
 * ifp->if_output() could not find ARP entry we've just put to
 * cache. Quite unexpected given we run under KERNEL_LOCK().
 */

thanks and
regards
sashan

Reply via email to