Mark,
On 11/26/20 05:14, Mark Thomas wrote:
On 26/11/2020 04:57, Christopher Schultz wrote:
<snip/>
After a normal clean-up the parent then calls close on the two file
descriptors associated with the pipe for a second time."
So the child cleans them up AND the parent cleans them up? Or the parent
cleans them up twice? The child should be able to call close() as many
times as it wants and only poison itself. Does the child process ever
exit()?
With the caveat that some of the below is educated guesswork: the
strace was configured to capture only the events we were interested in,
so I am having to fill in some of the gaps.
The parent "process" is a Java thread currently in native code in a 3rd
party library.
The parent creates a pipe which comes with two file descriptors. One for
the read end, one for the write end.
The parent process then forks. The child process now has copies of the
two file descriptors. (see man fork for more details).
The parent closes its fd for the write end of the pipe. The child closes
its fd for the read end of the pipe.
The child writes to the pipe and the parent reads from it.
The child exits and closes its fd for the write end of the pipe.
The parent closes its fd for the read end of the pipe.
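For anyone who wants to see that sequence concretely, here is a minimal
sketch of it in Python rather than native code. It is not the third-party
library's code (which is not shown in this thread); the function name
run_child_via_pipe is made up for illustration. Each process closes each
of its fds exactly once.

```python
import os


def run_child_via_pipe(message: bytes) -> bytes:
    """Fork a child that writes `message` to a pipe; return what the parent reads."""
    # The pipe comes with two fds: one for the read end, one for the write end.
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: it holds copies of both fds. It only writes, so it closes
        # its copy of the read end first.
        status = 1
        try:
            os.close(read_fd)
            os.write(write_fd, message)
            os.close(write_fd)
            status = 0
        finally:
            # Leave immediately without running the parent's cleanup handlers.
            os._exit(status)

    # Parent: it only reads, so it closes its copy of the write end.
    os.close(write_fd)

    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            # Empty read means every write end has been closed (EOF).
            break
        chunks.append(chunk)

    # The one and only close of the read end in the parent.
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)
```

Calling run_child_via_pipe(b"hello") returns the bytes the child wrote,
and afterwards both fd numbers are free for reuse by the process.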
At this point all is good. All the closes complete cleanly. Everything
has been shut down properly.
+1
The two fds allocated to the parent are back in the pool and may be
reused by other threads in the JVM.
The parent then attempts to close the fds associated with the pipe
again. For each fd, if it has not been reallocated, an EBADF error
occurs. If it has been reallocated, it is closed, thereby breaking
whatever was using it legitimately.
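The two outcomes can be reproduced in a single-threaded sketch, again in
Python and again purely illustrative: stale_close_demo is a hypothetical
name, and opening os.devnull stands in for "some other thread in the JVM
acquiring an fd". It relies on the POSIX rule that a new fd gets the
lowest free number, so the freshly opened fd normally lands on one of the
numbers the pipe just released.

```python
import errno
import os


def stale_close_demo():
    """Close a pipe's fds, let one number be reused, then close both again."""
    read_fd, write_fd = os.pipe()

    # The legitimate clean-up: each fd is closed once.
    os.close(read_fd)
    os.close(write_fd)

    # Stand-in for unrelated code opening something. The kernel hands out
    # the lowest free fd number, which is normally one the pipe just freed.
    victim_fd = os.open(os.devnull, os.O_RDONLY)

    # The bug: the stale fd numbers are closed a second time.
    outcomes = {}
    for fd in (read_fd, write_fd):
        try:
            os.close(fd)
            outcomes[fd] = "closed"  # number was reused: the victim is gone
        except OSError as exc:
            outcomes[fd] = errno.errorcode[exc.errno]  # not reused: EBADF

    # Check whether the unrelated fd survived.
    try:
        os.fstat(victim_fd)
        victim_usable = True
        os.close(victim_fd)  # still valid, so clean it up properly
    except OSError:
        victim_usable = False

    return victim_fd, outcomes, victim_usable
```

In the usual case one stale number reports EBADF and the other reports a
successful close, at which point the unrelated fd no longer works even
though its owner never closed it.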
Thanks for clarifying this. I was confused and thinking you were saying
that the child process was the one breaking things, but it's the parent
process. Since the parent is the JVM (the long-running process), all
hell breaks loose.
The parent process must be the JVM process, right? And the parent
process (this native library, running within the JVM process)
double-closes file descriptors, with some measurable delay?
Correct. In the instance where I did most of the log analysis the delay
was about 0.1 seconds. In other logs I did observe longer delays with
what looked like a very different failure mode.
That's the
only way this could make sense. And of course it will mess everything up
in really *really* unpredictable ways.
Yep.
Fascinating.
Thanks for the wild ride, Eric and Mark :)
-chris