On Thu, May 28, 2015 at 13:34 +0200, Alexandr Nedvedicky wrote:
> On Thu, May 28, 2015 at 11:43:02AM +0200, Mike Belopuhov wrote:
> > On Thu, May 28, 2015 at 01:17 +0200, Alexandr Nedvedicky wrote:
> > > Hello,
> > >
> > >
> > > On Wed, May 27, 2015 at 07:44:15PM +0200, Mike Belopuhov wrote:
> > > > On Wed, May 27, 2015 at 10:39 +0200, Alexandr Nedvedicky wrote:
> > > > > Hello,
> > > > >
> > > > > > > - if (pf_state_insert(BOUND_IFACE(r, pd->kif), skw, sks,
> > > > > > > s)) {
> > > > > > > - pf_state_key_detach(s, PF_SK_STACK);
> > > > > > > - pf_state_key_detach(s, PF_SK_WIRE);
> > > > > >
> > > > > > This bug is not yours, but doing two pf_state_key_detach is wrong
> > > > > > and results in all kinds of protection fault fireworks. The right
> > > > > > thing is to do pf_detach_state that would handle the case where
> > > > > > PF_SK_STACK == PF_SK_WIRE (i.e. we need only one invocation).
> > > > > good catch.
> > > > >
> > > > > >
> > > > > > In csfailed case we do pf_remove_src_node and then call
> > > > > > pf_src_tree_remove_state. With your diff this means that
> > > > > > pf_src_tree_remove_state will be dereferencing sni->sn that
> > > > > > might be pool_put by the pf_remove_src_node. Therefore their
> > > > > > order needs to be reversed.
> > > > >
> > > > > I see. Another option to fix it would be to do:
> > > > >
> > > > > sni->sn = sns[i];
> > > > > sns[i] = NULL;
> > > > >
> > > >
> > > > I'm not sure I follow. Where do you want to do it?
> > > > If it's when we pool_get sni's, then this would mean
> > > > we won't run pf_remove_src_node and cleanup source
> > > > nodes that might have been created.
> > > >
> > >
> > > sorry I was too brief. The snippet below comes from
> > > pf_create_state() with your patch applied:
> > >
> > > 3560 for (i = 0; i < PF_SN_MAX; i++)
> > > 3561 if (sns[i] != NULL) {
> > > 3562 struct pf_sn_item *sni;
> > > 3563
> > > 3564 sni = pool_get(&pf_sn_item_pl, PR_NOWAIT);
> > > 3565 if (sni == NULL) {
> > > 3566 REASON_SET(&reason, PFRES_MEMORY);
> > > 3567 goto csfailed;
> > > 3568 }
> > > 3569 sni->sn = sns[i];
> > > 3570 sns[i] = NULL;
> > > 3571 SLIST_INSERT_HEAD(&s->src_nodes, sni,
> > > next);
> > > 3572 sni->sn->states++;
> > > 3573 }
> > > 3574
> > >
> > > the point of my suggestion is to transfer ownership of source node from
> > > local variable sns[] to state. so the check performed later in csfailed
> > > at line 3617, will find NULL pointer in sns[i] field and won't attempt
> > > to call pf_remove_src_node().
> > >
> > > 3610 csfailed:
> > > 3611 if (s) {
> > > 3612 pf_normalize_tcp_cleanup(s); /* safe even w/o
> > > init */
> > > 3613 pf_src_tree_remove_state(s);
> > > 3614 }
> > > 3615
> > > 3616 for (i = 0; i < PF_SN_MAX; i++)
> > > 3617 if (sns[i] != NULL)
> > > 3618 pf_remove_src_node(sns[i]);
> > > 3619
> > >
> > >
> > > your patch updated by my suggestion is further below.
> > >
> >
> > But that introduces a potential leak, as I said in my mail:
> > we won't run pf_remove_src_node and cleanup source nodes that
> > might have been created.
> >
> > sns[] is an array of pointers to independently tracked objects
> > (inserted into the pf_src_tree).
> >
>
> yes you are right, I was completely wrong. Still my only concern is what
> happens in case we fail to allocate source node item (sni) for let's say
> PF_SN_NAT type, while there is PF_SN_ROUTE type in sns[] to be processed.
>
> I'm referring to line numbers above. So we fail to allocate PF_SN_NAT
> source node and taking goto at 3567, arriving to 3610.
>
> We do the right thing for state clean up (lines 3612, 3613). And now
> arriving to for loop 3616, which will remove all source nodes from
> sns[], including PF_SN_ROUTE. I completely agree we should do
> pf_remove_src_node(sns[PF_SN_NAT]) as we took reference at 3572 for it.
But we'll drop this "reference" in pf_src_tree_remove_state,
then how will sns[PF_SN_NAT] and sns[PF_SN_ROUTE] be different?
> I'm not sure if we should be calling pf_remove_src_node(sns[PF_SN_ROUTE])
> here since line 3572 was not executed for it, for loop at 3560 terminated
> prematurely. Probably introducing new for-loop counter 'j' in csfailed:
> will fix it:
> > > 3616 for (j = 0; j < i; j++)
>
sns[] array was prepared for this state so if we can't insert
the state, sns entries must be cleaned up. pf_remove_src_node
checks the number of associated states and if source node should
expire some time later.
Speaking of PF_SN_ROUTE, pf_set_rt_ifp should be probably called
before we insert the state for the very same reason, plus it
should check the pf_map_addr return value and do the cleanup.
>
> regards
> sasha
>