Re: RFR 7162400: Intermittent java.io.IOException: Bad file number during HotSpotVirtualMachine.executeCommand

Mikael Gerdin Tue, 09 Jul 2013 10:13:48 -0700

On 07/09/2013 05:25 PM, Peter Allwin wrote:

Mikael,


That's a good point. Unfortunately, attach uses os::get_temp_directory, which
is hardcoded to use /tmp. We could add a whitebox API to allow us to
override this, but now we're on the border of noreg-hard land again IMO.

Any other opinions on this?

Can you use the "-XX:+PauseAtStartup" VM flag? It will create a vm.paused.<pid> file in the current working directory. You could extract the pid, touch the correct attach file in /tmp and then remove the vm.paused file to let the VM resume.


I didn't check if PauseAtStartup stops the VM early enough though.
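A rough shell sketch of the PauseAtStartup idea, in case it helps the discussion. The helper names and the ATTACH_DIR variable are hypothetical; the only things taken from this thread are the vm.paused.<pid> marker in the working directory and the .java_pid<pid> file in /tmp. Whether the VM pauses early enough is still the open question above.

```shell
#!/bin/sh
# Sketch only: plant a stale attach file for a VM held by -XX:+PauseAtStartup.
# ATTACH_DIR and all function names are hypothetical, not part of any JDK tool.

ATTACH_DIR="${ATTACH_DIR:-/tmp}"

# Print the pid encoded in a marker file name such as ./vm.paused.12345.
pid_from_pause_file() {
    name=$(basename "$1")
    printf '%s\n' "${name#vm.paused.}"
}

# Wait up to $2 seconds for a vm.paused.<pid> marker in directory $1
# and print its path. Returns 1 if none shows up in time.
wait_for_pause_file() {
    dir=$1; tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        for f in "$dir"/vm.paused.*; do
            [ -e "$f" ] && { printf '%s\n' "$f"; return 0; }
        done
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# Create the fake attach file for pid $1, then remove marker $2 so the VM resumes.
plant_stale_file_and_resume() {
    touch "$ATTACH_DIR/.java_pid$1" && rm -f "$2"
}

# Possible use (assumes java is on PATH and no other marker is in "."):
#   java -XX:+PauseAtStartup -version &
#   marker=$(wait_for_pause_file . 10) &&
#       plant_stale_file_and_resume "$(pid_from_pause_file "$marker")" "$marker"
```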

An alternate, even more hacky approach is to do something like (in bash):
(bash -c 'echo $$; touch /tmp/.java_pid$$; exec java -version')

Where you can extract the pid of the subshell process with $$ and then exec into the java launcher and keep the same pid (at least on Linux, unsure about the Solaris launcher).
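To check the "exec keeps the pid" part without involving java at all, something like the following should do. Here sh stands in for the java launcher, which is an assumption; a launcher that forks or re-execs differently could behave otherwise. The function name is made up for illustration.

```shell
# Print the pid twice: once from the wrapper bash, once from the program it
# exec()s into. "sh" is a stand-in for the java launcher (assumption).
show_pid_across_exec() {
    bash -c 'echo "$$"; exec sh -c "echo \$\$"'
}
```

If both printed lines are the same number, the process that was exec'ed runs under the pid that the wrapper shell reported.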


/Mikael



Thanks!

/peter

-----Original Message-----
From: Mikael Gerdin [mailto:mikael.ger...@oracle.com]
Sent: Tuesday, July 9, 2013 2:49 PM
To: Peter Allwin
Cc: serguei.spit...@oracle.com; daniel.daughe...@oracle.com;
serviceability-dev@openjdk.java.net; hotspot-runtime-
d...@openjdk.java.net
Subject: Re: RFR 7162400: Intermittent java.io.IOException: Bad file number
during HotSpotVirtualMachine.executeCommand

Peter,

On 2013-07-09 14:25, Peter Allwin wrote:

Hello!

It is reproducible by letting the test create .java_pid* files for all
possible process IDs on the system, setting the correct access flags,
launching the target VM and attempting to connect. There are some
caveats, but it should be doable.
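A minimal sketch of the file-creation step described above, not the actual repro script. The function names are hypothetical, mode 600 is only an assumption for the "correct access flags", and the directory is a parameter so the sketch can be tried in a scratch directory rather than in /tmp.

```shell
# Create empty .java_pid<N> files for every N in [first, last] under dir.
# Mode 600 is an assumption standing in for the "correct access flags".
create_stale_attach_files() {
    dir=$1; pid=$2; last=$3
    while [ "$pid" -le "$last" ]; do
        f="$dir/.java_pid$pid"
        touch "$f" && chmod 600 "$f" || return 1
        pid=$((pid + 1))
    done
}

# Remove every .java_pid* file under dir so a run leaves nothing behind.
# Only point this at a scratch directory: it does not tell real attach
# files apart from fake ones.
remove_stale_attach_files() {
    rm -f "$1"/.java_pid*
}
```

For example, create_stale_attach_files "$scratch" 1 32768 would cover a pid range of 1 to 32768; the real upper bound depends on the system.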

I'll convert the repro script to JTREG and add it to the webrev.


It's probably not a good idea to have a test which taints the system with
stale .java_pid* files.
If the test execution times out and the script isn't allowed to clean up I
imagine that other subsequent executions could fail.
Is there a way to tell the attach api to use a specific directory so you
won't need to taint /tmp?

/Mikael


Thanks for the reviews!

/peter

*From:*serguei.spit...@oracle.com [mailto:serguei.spit...@oracle.com]
*Sent:* Tuesday, July 9, 2013 1:26 AM
*To:* daniel.daughe...@oracle.com
*Cc:* Peter Allwin; serviceability-dev@openjdk.java.net;
hotspot-runtime-...@openjdk.java.net
*Subject:* Re: RFR 7162400: Intermittent java.io.IOException: Bad file
number during HotSpotVirtualMachine.executeCommand

Ok, thanks!

Peter, did you manage to reproduce this issue with your script?
If so, then, please, include it into the bug report and remove the
"noreg-sqe" label.

It is Ok if you did not reproduce it, though.

Thanks,
Serguei


On 7/8/13 4:20 PM, Daniel D. Daugherty wrote:

     I definitely don't insist... :-)

     BTW, I noticed this in Peter's e-mail:

     > Testing:
     > JPRT, reproducing script on Solaris, Linux.

     so maybe Peter already has this covered with "reproducing script"...

     Dan

     On 7/8/13 5:07 PM, serguei.spit...@oracle.com wrote:

         Dan,

         Thank you for the recommendation.
         But I'm still not sure it is the right thing to do.
         Even though there are multiple test cases associated with this
         bug, they cannot be used to verify the fix because an additional
         condition must be present as well.
         This condition is the presence of a stale door file, which is
         not that easy to reproduce.

         However, if you insist then I can change the label to
         "noreg-sqe" with the corresponding comment.

         Thanks,
         Serguei


         On 7/8/13 3:46 PM, Daniel D. Daugherty wrote:

             Serguei,

             There are a number of existing tests associated with this
             bug. I don't think that 'noreg-hard' is the right label. I
             think 'noreg-sqe' is the right one:

             noreg-sqe
                  Change can be verified by running an existing SQE test
                  suite; the bug should identify the suite and the
                  specific test case(s).


             Dan

             On 7/8/13 12:59 PM, serguei.spit...@oracle.com wrote:

                 Peter,

                 I've added the label "noreg-hard" with the comment to
                 the report.
                 It is not easy to reproduce the issue and demonstrate
                 the fix in a regression test.

                 Thanks,
                 Serguei


                 On 7/8/13 11:36 AM, serguei.spit...@oracle.com wrote:

                     Hi Peter,

                     The fix looks good.

                     Thanks,
                     Serguei

                     On 7/8/13 6:54 AM, Peter Allwin wrote:

                         Hello!

                         Looking for reviews of this change:

                         http://cr.openjdk.java.net/~allwin/7162400/webrev.01/

                         For CR:

                         http://bugs.sun.com/view_bug.do?bug_id=7162400

                         https://jbs.oracle.com/bugs/browse/JDK-7162400

                         Summary:

                         This change addresses an issue in the Attach API
                         on Solaris, Linux and BSD where an attaching
                         application can receive IOExceptions such as
                         "Bad file number" (Solaris), "Connection
                         refused" (Linux/BSD), or "well-known file is not
                         secure".

                         The attach process uses a file in the temporary
                         directory as a door (Solaris) or domain socket
                         (Linux,BSD) to communicate with the VM. In
                         certain circumstances stale files can be left in
                         the file system which can cause the attaching
                         application to believe that the VM is ready to
                         receive a connection when it's not. With this
                         change the stale file will be removed during VM
                         startup.

                         Note that there is still an issue if we don't
                         have permission to remove the stale file: in
                         that case the attaching process will still fail
                         to connect.
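The behaviour summarized above can be illustrated with a small shell sketch. This is not the HotSpot change itself (that is in the webrev); the function name is hypothetical and the directory is passed in, whereas the VM uses its temporary directory. It removes a leftover .java_pid<pid> file if it can and reports the case where removal is not permitted.

```shell
# Remove a leftover attach file for pid $2 in directory $1, if one exists.
# Returns 0 when no stale file remains, 1 when removal failed (for example
# because the file belongs to another user).
remove_stale_attach_file() {
    f="$1/.java_pid$2"
    [ -e "$f" ] || [ -L "$f" ] || return 0
    rm -f "$f" 2>/dev/null || true
    if [ -e "$f" ] || [ -L "$f" ]; then
        echo "warning: could not remove stale $f" >&2
        return 1
    fi
    return 0
}
```

The non-zero return corresponds to the remaining issue noted above: the stale file stays in place and an attaching process can still be misled by it.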

                         Testing:

                         JPRT, reproducing script on Solaris, Linux.

                         Credits:

                         Thanks to Staffan Larsen who worked on this
                         issue with me.

                         Regards,


                         Peter
