> Date: Fri, 24 Jan 2025 10:21:33 +0100
> From: Peter Skvarka <p...@softinengines.com>
>
> Unfortunately I am not able to create minimal sample.
> It happens in wxWidgets library function wxExecute() used by Codeblocks
> IDE and I can reproduce it only with started codeblocks process. The
> code of wxExecute() is complicated and it is problem to extract only
> essential parts and create minimal reproducer.

You could try ktracing the process to record its system calls:

1. Start up the Codeblocks IDE and find its pid, say 12345.
2. In a terminal, run `ktrace -p 12345'.
3. Trigger the hang.
4. In the terminal, run `ktrace -C' to stop tracing.
5. Run `kdump' to print the trace of syscalls.

> I have this additional info: Forked parent retrieves child output
> through pipe and it waits for child finishing with select() infinitely
> and child stays zombie - I am seeing flag Z in ps -auxd list.
> I checked command line arguments passed to execv(), it is ok ("gcc
> -dumpversion")

It's hard to tell from just this information, but one possibility is
that the parent and child are communicating through a pipe, and the
parent has kept the writing side of the pipe open after forking the
child.

Suppose the parent is waiting in select() for the reading side of the
pipe to be ready, and the child exits. If the child had the last
descriptor for the writing side, then the parent would wake up -- but
if the parent also has a descriptor for the writing side, and the
parent isn't handling SIGCHLD, then select() might wait forever in a
deadlock.

> Child's C++ code from fork() to execv() does not uses pthread
> synchronization objects, it only prepares pipes and calls execv().
> So it is question if reason is bad using of fork() in multithreaded
> application, or for example bad usage of pipes or something other. I
> think that it is NetBSD specific thing because of no similar report on
> other os-es.

This is most likely an application error. I wouldn't be surprised if
there are nefarious locks lurking underneath some innocent-looking C++
tokens. And I wouldn't be surprised if there's a mistake in handling
file descriptors and child waits. The application could be
accidentally relying on the way other operating systems implement some
kind of undefined behaviour it triggers. But there's too little
information to say so far.

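As a rough illustration of the pipe scenario described above, here is a
minimal sketch in C. It is not taken from wxExecute(); the function
name run_and_capture() and its buffer handling are made up for this
example, and it assumes descriptors 0-2 are already open in the parent.
The point is the close() of the writing side in the parent before it
waits in select().

```c
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper (not wxExecute): run argv[0] with its stdout
 * connected to a pipe, store up to buflen-1 bytes of output in buf,
 * reap the child, and return the number of bytes stored, or -1.
 * Assumes descriptors 0-2 are already open in the caller.
 */
ssize_t
run_and_capture(char *const argv[], char *buf, size_t buflen)
{
    int fds[2];
    size_t used = 0;
    pid_t pid;

    if (buflen == 0 || pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: make the writing side our stdout, then exec. */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);
    }

    /*
     * Parent: drop our copy of the writing side.  Without this
     * close(), the pipe never reports EOF after the child exits,
     * and the select() below can block forever.
     */
    close(fds[1]);

    for (;;) {
        fd_set rset;
        char tmp[256];
        ssize_t n;

        FD_ZERO(&rset);
        FD_SET(fds[0], &rset);
        if (select(fds[0] + 1, &rset, NULL, NULL, NULL) == -1) {
            if (errno == EINTR)
                continue;
            break;
        }
        n = read(fds[0], tmp, sizeof(tmp));
        if (n == -1 && errno == EINTR)
            continue;
        if (n <= 0)
            break;      /* 0 means EOF: every writer has closed */
        for (ssize_t i = 0; i < n && used < buflen - 1; i++)
            buf[used++] = tmp[i];
    }
    buf[used] = '\0';
    close(fds[0]);

    /* Reap the child so it does not linger as a zombie. */
    while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
        continue;
    return (ssize_t)used;
}
```

Removing the close(fds[1]) in the parent should reproduce the symptom
from the report: the child exits and shows up as Z in ps, while the
parent sits in select() waiting for an EOF that never arrives.
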
> Currently I have debug built and I am trying to retrieve more info why
> forked child stays zombie.
> Do you think that is possible to diagnose or to debug with gdb phase of
> changing state to zombie ?
> Or to look on some child's process structures
> What is real reason for zombie state ? Can it be holding of not closed
> pipe or file ? Or it can stay zombie
> when it is terminated from another process ? Is possible to investigate
> what resource is not freed by process which can be reason for entering
> into zombie state ?

When a process terminates, it becomes a zombie process until the
parent calls one of the wait() family of system calls. This is the
basic mechanism in Unix for managing processes; if you're not familiar
with the Unix process life cycle, you might want to find a tutorial on
Unix processes, like maybe the Stevens book (Advanced Programming in
the Unix Environment, 1992).
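
To make the zombie life cycle concrete, here is a small sketch in C.
The name spawn_and_reap() is invented for this example. It forks a
child that exits immediately, then collects it with waitpid(); between
the child's exit and the parent's waitpid() the child is a zombie, and
afterwards a second waitpid() on the same pid fails with ECHILD because
nothing is left to collect.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical demo: fork a child that exits at once with `code`,
 * then reap it.  Returns the child's exit status, or -1 on error.
 */
int
spawn_and_reap(int code)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);    /* child: terminate immediately */

    /*
     * From the child's exit until this waitpid() returns, the
     * child is a zombie (state Z in ps).  It holds no pipes or
     * files; it is only an exit status waiting to be collected.
     */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }

    /* The zombie is gone now: a second wait finds no such child. */
    if (waitpid(pid, NULL, 0) != -1 || errno != ECHILD)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

So a child stuck in state Z does not by itself point to an unreleased
resource in the child; it means the parent has not yet called wait().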