Hi riastradh@n.o, Thank you for your reviews.
In fist, I reflect your reviews and restructure patch series. Here is git format-patch patch series. http://www.netbsd.org/~knakahara/fix-gif-softint-using-psref2/fix-gif-softint-using-psref2-2.tgz And here is the unified patch, maybe unnecessary. http://www.netbsd.org/~knakahara/fix-gif-softint-using-psref2/unified-fix-gif-softint-using-psref2-2.patch On 2016/03/01 3:07, Taylor R Campbell wrote: > Date: Wed, 17 Feb 2016 13:56:02 +0900 > From: Kengo NAKAHARA <k-nakah...@iij.ad.jp> > > I fix above bugs and rebase, here is "git format-patch" patch series. > > http://www.netbsd.org/~knakahara/fix-gif-softint-using-psref2/fix-gif-softint-using-psref2.tgz > # I think this can be applied cleanly this time... > And here is the unified patch. > > http://www.netbsd.org/~knakahara/fix-gif-softint-using-psref2/unified-fix-gif-softint-using-psref2.patch > > Could you comments this patch? > > Some comments on all the patches in that box: > > commit 739235b87a941f9fb0e7150e9d7c0f888a3fd30f > Author: k-nakahara <k-nakah...@iij.ad.jp> > Date: Fri Jan 15 13:10:10 2016 +0900 > > fix: let gif(4) promise softint(9) contract snip > No need for membar_sync here: the xc_broadcast/xc_wait after this has > the same effect. I see. I will remove the unnecessary membar_sync(). > @@ -908,24 +921,15 @@ void > gif_delete_tunnel(struct ifnet *ifp) > { > struct gif_softc *sc = ifp->if_softc; > - struct sockaddr *osrc, *odst; > - void *osi; > int s; > > s = splsoftnet(); > + encap_lock_enter(); > > Why is this called encap_lock_enter/exit? It seems to me that this is > specific to gif -- nothing else uses it outside if_gif.c, and nothing > in ip_encap.c relies on it. Can you document the resources that > encap_lock is supposed to serialize? Uh...sorry, it's my mistake. As you point out below, encap_attach() check duplicated configuration. In contrast, encap_attach_func() does not check. So, I erroneously assumed that only encap_attach() is required to hold encap_lock. However, not only encap_attach() but also encap_attach_func() is required to hold encap_lock to avoid encap[46]_lookup() and encap_add() race. I will add encap_lock() to the following functions. - stf_clone_create()@if_stf.c - add_vif()@ip_mroute.c - ipe4_attach()@xform_ipip.c. > It looks like this is what I referred to as the global lock for the > system's gif(4) configuration, gif_lock, in my sketch at > <https://mail-index.netbsd.org/tech-net/2016/01/11/msg005475.html>. Yes, right. :) BTW, as the KERNEL_LOCK ranges are increased by the following integration encaptab.lock and encap_lock, I make encap_lock_enter/exit interruptable which you present the above mail. Thanks. > In that case, it serializes changes to the systemwide set of gif(4) > tunnels, and preserves the invariant that there are no overlapping > gif(4) tunnels. > > But maybe it will *also* be needed to serialize changes to the set of > ip_encap tunnels too -- in that case, the name is good, but it needs > to be used in ip_encap as well. More on that below. > > It also looks like this change is just about gif(4) -- it is > independent of the change to use an rwlock for ip_encap. Can you > split this one into two changes? Or are there locking rules related > to both encap_lock and the rwlock you added that I haven't guessed? I bundled the modifications which is required to fix panic caused by the race between gif(4) packet processing and ioctl, but that is a bad idea, sorry. I will split gif(4) change and ip_encap change to the following two patches. - 0001-let-gif-4-promise-softint-9-contract-1-2-gif-4-side.patch - 0002-let-gif-4-promise-softint-9-contract-2-2-ip_encap-si.patch > commit 568b52fa26e6141251fc4e63264fa93920e468ff > Author: k-nakahara <k-nakah...@iij.ad.jp> > Date: Thu Feb 4 17:43:29 2016 +0900 > > add receive side stop code to gif_encap_pause() > > Can you explain in a little more detail why this change is necessary? > My memory is fuzzy -- I recall I suggested something like it, but I > suggested that because I thought the softint was scheduled by the > receive side, not by the send side. Sorry about a lack of explanation. This patch does not relate with softint. This patch fixed a race in radix tree. In my parallel test which several tens of "ifconfig {tunnnel,deletetunnel}" and gif(4) flood ping run, I encountered panic at rn_match()@radix.c. It must be caused by parallel run between rn_match() and rn_delete(). This is caused by radix tree which is not MP-safe, however in this gif(4) case, I think more serious problem is receiving packet processing is not suspended nevertheless sending packet processing is. In our design, I think, if ioctl(writing configuration) begin, packet processing(reading configuration) should be suspended. So, I added above patch. # I think ip_encap may not use radix tree in the future. # If encap[46]_lookup() can eliminate linear search, it does not # need to use radix tree to fix many tunnels scaling issue. > The rest of the changes look pretty good except for one part: where > you commented > > /* check if anyone have already attached with exactly same config */ > > in encap_attach, you cannot rely on that check to *continue* to be > true after pserialize_read_exit: someone else might concurrently call > encap_attach. You will need, e.g., encap_lock to cover *all* of > encap_attach, so that there can be only one change to the set of > ip_encap tunnels at any given time. > > So perhaps this is what you want: > > - Caller must do encap_lock_enter/encap_attach/encap_lock_exit. > - encap_attach should kassert encap_lock_held. > - No need for pserialize read section in encap_attach. > - No need for *separate* encaptab.lock and encap_lock -- these locks > could be one and the same. (Of course, if the caller of encap_attach > holds encap_lock, it is not necessary for encap_add to acquire it.) > > This way, gif(4) can do > > 1. encap_lock_enter > 2. check gif configurations > 3. establish/disestablish softints > 4. encap_attach > 5. encap_lock_exit I see. I will integrate encaptab.lock and encap_lock, and implement above your design. The modification is the following patch. - 0010-reflect-reastradh-n.o-s-2016-03-02-review.patch Could you comment these patches? Thanks, -- ////////////////////////////////////////////////////////////////////// Internet Initiative Japan Inc. Device Engineering Section, Core Product Development Department, Product Division, Technology Unit Kengo NAKAHARA <k-nakah...@iij.ad.jp>