On Mon, Jun 10, 2019 at 11:46:55AM -0300, Martin Pieuchot wrote:
> On 10/06/19(Mon) 09:29, Alexandr Nedvedicky wrote:
> > Hello,
> >
> > sorry for extra delay (was off-line over the weekend).
> >
> > On Sat, Jun 08, 2019 at 09:46:24PM +1000, Jonathan Matthew wrote:
> > > On Tue, Jun 04, 2019 at 01:50:51AM +0200, Alexandr Nedvedicky wrote:
> > > > Hello,
> > > >
> > > > I've managed to get pf_test() running in parallel on the forwarding
> > > > path in my experimental tree, and there was some fallout. PF died on
> > > > the KASSERT() in pf_state_key_link_reverse() at line 7371:
> > > >
> > > > 7368 pf_state_key_link_reverse(struct pf_state_key *sk, ...)
> > > > 7369 {
> > > > 7370 /* Note that sk and skrev may be equal, then we ... */
> > > > 7371 KASSERT(sk->reverse == NULL);
> > > > 7372 KASSERT(skrev->reverse == NULL);
> > > > 7373 sk->reverse = pf_state_key_ref(skrev);
> > > > 7374 skrev->reverse = pf_state_key_ref(sk);
> > > > 7375 }
> > > >
> > > >
> > > > The pf_state_key_link_reverse() function is called from
> > > > pf_find_state() here:
> > > >
> > > > 1074	if (sk == NULL) {
> > > > 1075		if ((sk = RB_FIND(pf_state_tree, &pf_statetbl,
> > > > 1076		    (struct pf_state_key *)key)) == NULL)
> > > > 1077			return (PF_DROP);
> > > > 1078		if (pd->dir == PF_OUT && pkt_sk &&
> > > > 1079		    pf_compare_state_keys(pkt_sk, sk, ...) == 0)
> > > > 1080			pf_state_key_link_reverse(sk, pkt_sk);
> > > > 1081		else if (pd->dir == PF_OUT && pd->m->m_pkthdr.pf.inp &&
> > > > 1082		    !pd->m->m_pkthdr.pf.inp->inp_pf_sk && !sk->inp)
> > > > 1083			pf_state_key_link_inpcb(sk, pd->m->m_pkt...);
> > > > 1084	}
> > > >
> > > > pf_find_state() performs a lookup in the PF state table and as such
> > > > runs under the PF_STATE_ENTER_READ() lock, so multiple threads may
> > > > actually be running in pf_state_key_link_reverse() at the same time.
> > > > The thread that loses the race trips the assert at line 7371. The
> > > > patch below uses atomic ops to handle the race properly.
> > >
> > > Does this mean you're ending up with packets from a single flow being
> > > processed on multiple threads at the same time?
> >
> > yes, that's correct. The patch above comes from my private branch [1].
> > mpi@ pointed out in an off-line email exchange that the patch unlocks
> > local inbound packets too, which comes a bit early. However, for the
> > forwarding case things seem to work (given all the panics I could see
> > and fix).
>
> The way I understand Jono's question is: "Do we want this locking?" If
> we decide that a single flow will always be processed by a single CPU,
> is this necessary? If it isn't, should we add an assert to document this
> design decision?
That's why I was asking. I'm not sure we want to make that guarantee, or at
least not until we have a better idea of how rx hashing and things like tunnel
protocols and nat interact. Using atomics here seems like a reasonable way to
defer that and allow progress in pf.

Are there other side-effects from pf_test() that we'll have to worry about?