On Sat, Jan 09, 2016 at 08:25:05AM +0100, Mateusz Guzik wrote: > On Sat, Jan 09, 2016 at 02:25:02PM +0800, Paul Goyette wrote: > > On Sat, 9 Jan 2016, Mateusz Guzik wrote: > > > > >While I don't know all the details, it is clear that the purpose would > > >be much better served by ktrace and I would argue efforts should be > > >spent there. > > > > filemon's purpose is somewhat different than that of ktrace. There > > is a limited set of events that filemon captures, and it only > > captures successful operations. Its output is quite simple and easy > > to parse by a human of shell script. > > > > ktrace already have limited support for specifying what events are > wanted. One can add a special filemon-like mode. > > The output of kdump is definitely less friendly for scripts, but it is > way easier to write a script formatting it than it is to fix filemon. > > > >>From usability perspective what seems to be needed is few hacks to > > >ktrace to only log the stuff make is interested in and parsing code for > > >bmake to translate to what looks like current filemon format for > > >aformentioned copying. > > > > Perhaps. But filemon is also a run-time loadable module (and it is > > preferred that it NOT be built-in to any kernels). If I remember > > correctly, ktrace has hooks all over the system and does not readily > > lend itself to modularization. > > > > I doubt that compiling in ktrace on a which can handle filemon is an > issue, and even then it should outweight hops around filemon. > > For correct of a minimalistic safe filemon module the kernel would > already have to grow hooks in few places (see below). > > > > >Filemon monitors stuff at syscall level, stores the fd and remembers > > >pids. > > > > The fd is no longer being stored... :) But yes, it does still > > remember pids, so it can determine if an event belongs to a monitored > > process tree. Again IIRC, ktrace is somewhat more intrusive, in that > > it manipulates various process flags which can affect behavior of the > > target (monitored) process. > > It has a dedicated field in struct proc which is the only sensible way > of implementing this I can think of. > > > >> error = kauth_authorize_process(curproc->p_cred, > > >> KAUTH_PROCESS_CANSEE, tp, > > >> KAUTH_ARG(KAUTH_REQ_PROCESS_CANSEE_ENTRY), NULL, NULL); > > > > > >Now that the check is pased the process could have executed something > > >setuid. > > > > Which process could have executed something setuid? How would that > > matter? (I will confess I haven't checked, but I would be surprised > > if "executing something setuid" would change the result of kauth() > > check. But I could be wrong here.) > > > > You start monitoring given process and proceed to execve a setuid file, > which gives you information what it is doing, even though the > kauth_authorize_process check would fail. This is basically what you > yourself stated below about pid reuse. > > So this either needs to prevent processes from changing credentials when > traced or stop tracing when such a change occurs. The former seems like > a rather bad idea and latter is already done by ktrace. > > > >I'm somewhat surprised this code is here. Should not > > >kauth_authorize_process assert that tp has stable credentials in some > > >way, in effect just crashing filemon consumers? > > > > > >> if (!error) { > > >> filemon->fm_pid = tp->p_pid; > > >> } > > > > Why does the target process need stable credentials? It's not doing > > anything for filemon. > > > > See below. > > > >Pid is stored, but tp is not held. What if the 'traced' process exits? > > >grep only reveals the following bit in filemon_wrapper_sys_exit: > > > > > >> if (filemon->fm_pid == curproc->p_pid) > > >> filemon_printf(filemon, "# Bye bye\n"); > > > > > >So filemon will continue tracing anything which happened to reuse the > > >pid. > > > > Yes, this is an outstanding issue. More particularly, the new user > > of the pid might not be accessible by the monitoring process (ie, it > > would fail the earlier kauth() check). But holding the pointer to > > the proc isn't really the answer either - although less likely it is > > entirely possible that the address could have been reused. > > > > What you need is the ability to clear tracing bits on process exit. > ktrace definitely has relevant bits in place already. > > > (It's too bad we don't have "generation numbers" on the pid, which > > would easily detect reuse.) > > That would require struct proc to be type stable, which seems wasteful. > > > >> lp = p; > > >> p = p->p_pptr; > > >> rw_exit(&lp->p_reflock); > > > > > >I guess the lock is needed to read the pointer to the parent. > > > > Correct. > > > > >Nothing was done to keep the parent around, which means it could have > > >exited and be freed by now. Which in turn makes the second loop > > >iteration a use-after-free. > > > > Hmmm, this code used to work. Looks like the locking got messed up > > somewhere. It should do (in pseudo code - I'm too lazy to write it > > all out!) > > > > lock(p) > > while (p != NULL) { > > TAILQ_FOREACH(filemon) { > > if (p->p_pid == filemon->fm_pid) { > > release(p); > > return filemon; > > } > > } > > last_p = p; > > p = p->pptr; > > if (p != NULL) > > lock(p); > > release(last_p); > > } > > > > We should lock the new (parent) process before releasing the initial > > lock. > > > > I will fix this. > > > > Just locking the parent like that looks like a deadlock potential. > > I just checked both ptrace and exit code. The parent<->child > relationship is protected with proc_lock and p_lock. p_reflock happens > to be held in exit1 and also cover the area, but is not held in ptrace. > > Unclear what's the protocol here without proc_lock held. If the sample > from ptrace is to be trusted, it's locking the lower address first (and > thus trylocking the lower address if we already got the higher one). > > The sample from sys_ptrace: > > > struct proc *parent = t->p_pptr; > > > > if (parent->p_lock < t->p_lock) { > > if (!mutex_tryenter(parent->p_lock)) { > > mutex_exit(t->p_lock); > > mutex_enter(parent->p_lock); > > As a side note this looks like a bug. t->p_lock is not relocked, so the > code ends up without the lock held. There is a similar sample in fork1. > > > } > > } else if (parent->p_lock > t->p_lock) { > > mutex_enter(parent->p_lock); > > } > > parent->p_slflag |= PSL_CHTRACED; > > proc_reparent(t, p); > > if (parent->p_lock != t->p_lock) > > mutex_exit(parent->p_lock); > > > > That said, even if following p_pptr was the right thing to do, I don't > think this would get away without hacks or holding proc_lock. > > While the first iteration uses curproc and could reliably lock the > parent despite possible relocking, the need to keep lower < higher order > may meen you need to temporarily unlock the process, but for that to not > be a problem you need to hold the process and/or restart the loop > possibly with proc_mutex held. Looks very hairy. > > With a dedicated pointer stored in struct proc (like with ktrace) there > is no need to do any lookups in the first place, so probles like this > don't arose. > > > > > > >The first iteration is safe if the argument is curproc (which it is), as > > >it clearly cannot disappear. > > > > > >Turns out this traversal is even more wrong since p_pptr does not have > > >to be the process you are looking for - ptrace is reparenting tracees. > > > > If the process has been reparented, then its activity will no longer > > be monitored and reported by filemon. > > > > Sorry, I forgot to state the corner case. I can't test it, but if NetBSD > allows you to trace your parent, the loop above becomes infinite. > > Also you can get surprise events from ptrace tracees. > > So again, I highly doubt filemon in its current form can be made safe to > use while preserving any advantage over ktrace. > > The thing is there is filemon and bmake in FreeBSD as well. While I > don't have the time to modify ktrace for this purpose, I would definitely > help both projects if bmake started moving away from filemon. >
Interesting typo. I meant to write: it would help both projects, not that I would do it. Still, I'm happy to rant. > However, this is just my $0,03 as I don't contribute to this project. > -- Mateusz Guzik <mjguzik gmail.com>