On Thu, Apr 27, 2023 at 02:54:38PM +0200, Claudio Jeker wrote:
> On Thu, Apr 27, 2023 at 01:55:33PM +0300, Vitaliy Makkoveev wrote:
> > Now only the direct netlock is used for inet socket protection. The
> > unlocked access to all other sockets is safe, but we could lose
> > consistency for a little while. Since solock() is used for socket
> > protection, make the locking path common and use it. Make it shared
> > because this is read-only access to socket data. For the same reason
> > use the shared netlock while performing the inpcb tables foreach
> > walkthrough.
>
> This is still wrong. fill_file() is not allowed to sleep, so you cannot
> use a rwlock in here. fill_file() is called inside an allprocess
> LIST_FOREACH() loop and sleeping there will blow up.
Yes.
> Maybe it is enough to grab a shared NETLOCK around the LIST_FOREACH() to
> ensure that we don't sleep inside.
I have tried to fix this for years. It is not easy. Taking the
net lock too early results in a lock ordering problem. Down in
fill_file() fdplock() is called:
#define fdplock(fdp) do { \
	NET_ASSERT_UNLOCKED(); \
	rw_enter_write(&(fdp)->fd_lock); \
} while (0)
This guarantees the lock ordering fdplock before netlock. You will
probably run into NFS problems if you want to relax that.
> Please fix this properly. Right now running fstat is like playing russian
> roulette.
Shared netlock is still russian roulette, just with fewer bullets
than the exclusive netlock. With the PCB mutex and pru_lock for
network sockets the situation gets better. For non-network sockets
this diff does rw_enter_write(&so->so_lock). Do we need that? It
means more sleeping points and more opportunities to crash.
bluhm
> > Index: sys/kern/kern_sysctl.c
> > ===================================================================
> > RCS file: /cvs/src/sys/kern/kern_sysctl.c,v
> > retrieving revision 1.411
> > diff -u -p -r1.411 kern_sysctl.c
> > --- sys/kern/kern_sysctl.c 22 Jan 2023 12:05:44 -0000 1.411
> > +++ sys/kern/kern_sysctl.c 27 Apr 2023 10:18:01 -0000
> > @@ -1173,14 +1173,11 @@ fill_file(struct kinfo_file *kf, struct
> >
> > if (so == NULL) {
> > so = (struct socket *)fp->f_data;
> > + solock_shared(so);
> > + locked = 1;
> > + } else {
> > /* if so is passed as parameter it is already locked */
> > - switch (so->so_proto->pr_domain->dom_family) {
> > - case AF_INET:
> > - case AF_INET6:
> > - NET_LOCK();
> > - locked = 1;
> > - break;
> > - }
> > + soassertlocked(so);
> > }
> >
> > kf->so_type = so->so_type;
> > @@ -1203,14 +1200,13 @@ fill_file(struct kinfo_file *kf, struct
> > kf->so_splicelen = -1;
> > if (so->so_pcb == NULL) {
> > if (locked)
> > - NET_UNLOCK();
> > + sounlock_shared(so);
> > break;
> > }
> > switch (kf->so_family) {
> > case AF_INET: {
> > struct inpcb *inpcb = so->so_pcb;
> >
> > - NET_ASSERT_LOCKED();
> > if (show_pointers)
> > kf->inp_ppcb = PTRTOINT64(inpcb->inp_ppcb);
> > kf->inp_lport = inpcb->inp_lport;
> > @@ -1232,7 +1228,6 @@ fill_file(struct kinfo_file *kf, struct
> > case AF_INET6: {
> > struct inpcb *inpcb = so->so_pcb;
> >
> > - NET_ASSERT_LOCKED();
> > if (show_pointers)
> > kf->inp_ppcb = PTRTOINT64(inpcb->inp_ppcb);
> > kf->inp_lport = inpcb->inp_lport;
> > @@ -1279,7 +1274,7 @@ fill_file(struct kinfo_file *kf, struct
> > }
> > }
> > if (locked)
> > - NET_UNLOCK();
> > + sounlock_shared(so);
> > break;
> > }
> >
> > @@ -1375,7 +1370,7 @@ sysctl_file(int *name, u_int namelen, ch
> > if (arg == DTYPE_SOCKET) {
> > struct inpcb *inp;
> >
> > - NET_LOCK();
> > + NET_LOCK_SHARED();
> > mtx_enter(&tcbtable.inpt_mtx);
> > TAILQ_FOREACH(inp, &tcbtable.inpt_queue, inp_queue)
> > FILLSO(inp->inp_socket);
> > @@ -1395,7 +1390,7 @@ sysctl_file(int *name, u_int namelen, ch
> > FILLSO(inp->inp_socket);
> > mtx_leave(&rawin6pcbtable.inpt_mtx);
> > #endif
> > - NET_UNLOCK();
> > + NET_UNLOCK_SHARED();
> > }
> > fp = NULL;
> > while ((fp = fd_iterfile(fp, p)) != NULL) {
> >
>
> --
> :wq Claudio