Re: [PATCH v18] exec: Fix dead-lock in de_thread with ptrace_attach

2025-11-20 Thread Eric W. Biederman
"Eric W. Biederman"  writes:

> Bernd Edlinger  writes:
>
>> This introduces signal->exec_bprm, which is used to
>> fix the case when at least one of the sibling threads
>> is traced, and therefore the trace process may dead-lock
>> in ptrace_attach, but de_thread will need to wait for the
>> tracer to continue execution.
>
> A small quibble it isn't a dead lock.  It isn't even really a live lock,
> as it is possible to SIGKILL our way out.
>
> Thinking about this there is a really silly and simple way we can deal
> with this situation for PTRACE_ATTACH.  We can send SIGSTOP and wait for
> the thread to stop before doing anything with cred_guard_mutex.
>
> PTRACE_ATTACH already implies sending SIGSTOP so as long as we have
> enough permissions to send SIGSTOP I don't see that being a problem.
>
> The worst case I can see is that we get a case where we stop the
> process, the permission check fails under cred_guard_mutex and
> and ptrace attach has fails and has to send SIGCONT to undo it's
> premature SIGSTOP.  That might almost be visible, but it would still
> be legitimate because we can still check that we have permission to
> send SIGSTOP.

Bah no I am full of it.

The challenging behavior is in the semantics of the kernel operations.
We need to describe it as such please.

It is the same class of problem as a single threaded process calls exec
with a pipe attached to both stdin and stdout of the new process.

For the stdin and stdout we can say just use pull and nonblocking I/O.

The problem is that both PTRACE_ATTACH and PTRACE_SEIZE block over
the duration of exec, and if exec is waiting for a thread to exit,
and that thread is blocked in PTRACE_EVENT_EXIT waiting for that very
same tracer those processes will hang. Not deadlock.


I haven't seen anyone clearly describe the problem lately so I am
repeating it.


Just looking at the code I don't think there is any fundamental reason
to call commit_creds after de_thread.  If we can change that we can sort
this out without any change in userspace semantics.

If we can't move commit_creds we have to either give
PTRACE_ATTACH/PTRACE_SEIZE a non-block mode, or break out of
PTRACE_EVENT_EXIT in de_thread.

I will post a proof of concept of moving commit_creds in just a minute.

Eric



Re: [PATCH v18] exec: Fix dead-lock in de_thread with ptrace_attach

2025-11-20 Thread Eric W. Biederman
Bernd Edlinger  writes:

> This introduces signal->exec_bprm, which is used to
> fix the case when at least one of the sibling threads
> is traced, and therefore the trace process may dead-lock
> in ptrace_attach, but de_thread will need to wait for the
> tracer to continue execution.

A small quibble it isn't a dead lock.  It isn't even really a live lock,
as it is possible to SIGKILL our way out.

Thinking about this there is a really silly and simple way we can deal
with this situation for PTRACE_ATTACH.  We can send SIGSTOP and wait for
the thread to stop before doing anything with cred_guard_mutex.

PTRACE_ATTACH already implies sending SIGSTOP so as long as we have
enough permissions to send SIGSTOP I don't see that being a problem.

The worst case I can see is that we get a case where we stop the
process, the permission check fails under cred_guard_mutex and
and ptrace attach has fails and has to send SIGCONT to undo it's
premature SIGSTOP.  That might almost be visible, but it would still
be legitimate because we can still check that we have permission to
send SIGSTOP.

Eric



[PATCH v18] exec: Fix dead-lock in de_thread with ptrace_attach

2025-11-18 Thread Bernd Edlinger
This introduces signal->exec_bprm, which is used to
fix the case when at least one of the sibling threads
is traced, and therefore the trace process may dead-lock
in ptrace_attach, but de_thread will need to wait for the
tracer to continue execution.

The problem happens when a tracer tries to ptrace_attach
to a multi-threaded process, that does an execve in one of
the threads at the same time, without doing that in a forked
sub-process.  That means: There is a race condition, when one
or more of the threads are already ptraced, but the thread
that invoked the execve is not yet traced.  Now in this
case the execve locks the cred_guard_mutex and waits for
de_thread to complete.  But that waits for the traced
sibling threads to exit, and those have to wait for the
tracer to receive the exit signal, but the tracer cannot
call wait right now, because it is waiting for the ptrace
call to complete, and this never does not happen.
The traced process and the tracer are now in a deadlock
situation, and can only be killed by a fatal signal.

The solution is to detect this situation and allow
ptrace_attach to continue by temporarily releasing the
cred_guard_mutex, while de_thread() is still waiting for
traced zombies to be eventually released by the tracer.
In the case of the thread group leader we only have to wait
for the thread to become a zombie, which may also need
co-operation from the tracer due to PTRACE_O_TRACEEXIT.

When a tracer wants to ptrace_attach a task that already
is in execve, we simply retry the ptrace_may_access
check with the new PTRACE_MODE_BPRMCREDS flag to check
that the tracer has permission to trace the new credentials
and dumpability which are about to be used after execve
completes.  If the ptrace_attach happens on a thread that
is a sibling-thread of the thread doing execve, it is
sufficient to check against the old credentials, as this
thread will be waited for, before the new credentials are
installed.

Other threads die quickly since the cred_guard_mutex is
released, but a deadly signal is already pending.  In case
the mutex_lock_killable misses the signal, the non-zero
current->signal->exec_bprm makes sure they release the
mutex immediately and return with -ERESTARTNOINTR.

This means there is no API change, unlike the previous
version of this patch which was discussed here:

https://lore.kernel.org/lkml/[email protected]/

See tools/testing/selftests/ptrace/vmaccess.c
for a test case that gets fixed by this change.

Note that since the test case was originally designed to
test the ptrace_attach returning an error in this situation,
the test expectation needed to be adjusted, to allow the
API to succeed at the first attempt.

Signed-off-by: Bernd Edlinger 
---
 fs/exec.c |  86 +++---
 fs/proc/base.c|  12 ++
 include/linux/cred.h  |   1 +
 include/linux/ptrace.h|   1 +
 include/linux/sched/signal.h  |  18 +++
 kernel/cred.c |  30 -
 kernel/ptrace.c   |  29 -
 kernel/seccomp.c  |  18 ++-
 security/apparmor/lsm.c   |   5 +-
 security/commoncap.c  |   5 +-
 security/landlock/task.c  |   7 +-
 security/selinux/hooks.c  |   7 +-
 security/smack/smack_lsm.c|   5 +-
 security/yama/yama_lsm.c  |  11 +-
 tools/testing/selftests/ptrace/vmaccess.c | 135 --
 15 files changed, 324 insertions(+), 46 deletions(-)

v10: Changes to previous version, make the PTRACE_ATTACH
return -EAGAIN, instead of execve return -ERESTARTSYS.
Added some lessions learned to the description.

v11: Check old and new credentials in PTRACE_ATTACH again without
changing the API.

Note: I got actually one response from an automatic checker to the v11 patch,

https://lore.kernel.org/lkml/[email protected]/

which is complaining about:

>> >> kernel/ptrace.c:425:26: sparse: sparse: incorrect type in assignment 
>> >> (different address spaces) @@ expected struct cred const *old_cred @@ 
>> >> got struct cred const [noderef] __rcu *real_cred @@

   417  struct linux_binprm *bprm = task->signal->exec_bprm;
   418  const struct cred *old_cred;
   419  struct mm_struct *old_mm;
   420  
   421  retval = 
down_write_killable(&task->signal->exec_update_lock);
   422  if (retval)
   423  goto unlock_creds;
   424  task_lock(task);
 > 425  old_cred = task->real_cred;

v12: Essentially identical to v11.

- Fixed a minor merge conflict in linux v5.17, and fixed the
above mentioned nit by adding __rcu to the declaration.

- re-tested the patch with all linux versions from v5.11 to v6.6

v10 was an alternative approach which did imp