On Thu, Apr 27, 2023 at 02:54:38PM +0200, Claudio Jeker wrote:
> On Thu, Apr 27, 2023 at 01:55:33PM +0300, Vitaliy Makkoveev wrote:
> > Now only direct netlock used for inet sockets protection. The unlocked
> > access to all other sockets is safe, but we could lost consistency for a
> > little. Since the solock() used for sockets protection, make locking
> > path common and use it. Make it shared because this is read-only access
> > to sockets data. For same reason use shared netlock while performing
> > inpcb tables foreach walkthrough.
> 
> This is still wrong. fill_file() is not allowed to sleep. So you can not
> use a rwlock in here. fill_file is called inside a allprocess
> LIST_FOREACH loop and sleeping there will blow up.

Yes.

> Maybe it is enough to grab a shared NETLOCK around the LIST_FOREACH() to
> ensure that we don't sleep inside.

I have tired to fix this for years.  It is not easy.  Taking the
net lock to early results in a lock ordering problem.  Down in
fill_file() fdplock() is called:

#define fdplock(fdp) NET_ASSERT_UNLOCKED(); rw_enter_write(&(fdp)->fd_lock);

This guarantees lock ordering fdlock before netlock.  You probably
run into NFS problems if you want to relax that.

> Please fix this properly. Right now running fstat is like playing russian
> roulette.

Shared netlock is possible russian roulette with less bullets
compared to exclusive netlock.  For network sockets pcb mutex and
pru_lock the situation gets better.  For non-network sockets this
diff does rw_enter_write(&so->so_lock).  Do we need that?  It means
more sleeping points and more opportunity to crash.

bluhm

> > Index: sys/kern/kern_sysctl.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/kern_sysctl.c,v
> > retrieving revision 1.411
> > diff -u -p -r1.411 kern_sysctl.c
> > --- sys/kern/kern_sysctl.c  22 Jan 2023 12:05:44 -0000      1.411
> > +++ sys/kern/kern_sysctl.c  27 Apr 2023 10:18:01 -0000
> > @@ -1173,14 +1173,11 @@ fill_file(struct kinfo_file *kf, struct 
> >  
> >             if (so == NULL) {
> >                     so = (struct socket *)fp->f_data;
> > +                   solock_shared(so);
> > +                   locked = 1;
> > +           } else {
> >                     /* if so is passed as parameter it is already locked */
> > -                   switch (so->so_proto->pr_domain->dom_family) {
> > -                   case AF_INET:
> > -                   case AF_INET6:
> > -                           NET_LOCK();
> > -                           locked = 1;
> > -                           break;
> > -                   }
> > +                   soassertlocked(so);
> >             }
> >  
> >             kf->so_type = so->so_type;
> > @@ -1203,14 +1200,13 @@ fill_file(struct kinfo_file *kf, struct 
> >                     kf->so_splicelen = -1;
> >             if (so->so_pcb == NULL) {
> >                     if (locked)
> > -                           NET_UNLOCK();
> > +                           sounlock_shared(so);
> >                     break;
> >             }
> >             switch (kf->so_family) {
> >             case AF_INET: {
> >                     struct inpcb *inpcb = so->so_pcb;
> >  
> > -                   NET_ASSERT_LOCKED();
> >                     if (show_pointers)
> >                             kf->inp_ppcb = PTRTOINT64(inpcb->inp_ppcb);
> >                     kf->inp_lport = inpcb->inp_lport;
> > @@ -1232,7 +1228,6 @@ fill_file(struct kinfo_file *kf, struct 
> >             case AF_INET6: {
> >                     struct inpcb *inpcb = so->so_pcb;
> >  
> > -                   NET_ASSERT_LOCKED();
> >                     if (show_pointers)
> >                             kf->inp_ppcb = PTRTOINT64(inpcb->inp_ppcb);
> >                     kf->inp_lport = inpcb->inp_lport;
> > @@ -1279,7 +1274,7 @@ fill_file(struct kinfo_file *kf, struct 
> >                 }
> >             }
> >             if (locked)
> > -                   NET_UNLOCK();
> > +                   sounlock_shared(so);
> >             break;
> >         }
> >  
> > @@ -1375,7 +1370,7 @@ sysctl_file(int *name, u_int namelen, ch
> >             if (arg == DTYPE_SOCKET) {
> >                     struct inpcb *inp;
> >  
> > -                   NET_LOCK();
> > +                   NET_LOCK_SHARED();
> >                     mtx_enter(&tcbtable.inpt_mtx);
> >                     TAILQ_FOREACH(inp, &tcbtable.inpt_queue, inp_queue)
> >                             FILLSO(inp->inp_socket);
> > @@ -1395,7 +1390,7 @@ sysctl_file(int *name, u_int namelen, ch
> >                             FILLSO(inp->inp_socket);
> >                     mtx_leave(&rawin6pcbtable.inpt_mtx);
> >  #endif
> > -                   NET_UNLOCK();
> > +                   NET_UNLOCK_SHARED();
> >             }
> >             fp = NULL;
> >             while ((fp = fd_iterfile(fp, p)) != NULL) {
> > 
> 
> -- 
> :wq Claudio

Reply via email to