Hello, Thank you everyone for the feedback, I've incorporated the recommendations into a new revision:
http://cr.openjdk.java.net/~allwin/7162400/webrev.02/ Changes: - Fixed speling misteaks - Added jtreg regression test using Mikael's excellent suggestion of -XX:+PauseAtStartup, tested locally on linux and solaris. - Reverted use of MAX_PATH+1 vs. UNIX_MAX_PATH Also thanks to Christian Törnqvist for helping out with the jtreg test! /peter > -----Original Message----- > From: Mikael Gerdin [mailto:mikael.ger...@oracle.com] > Sent: Tuesday, July 9, 2013 7:13 PM > To: Peter Allwin > Cc: serviceability-dev@openjdk.java.net; hotspot-runtime- > d...@openjdk.java.net > Subject: Re: RFR 7162400: Intermittent java.io.IOException: Bad file number > during HotSpotVirtualMachine.executeCommand > > On 07/09/2013 05:25 PM, Peter Allwin wrote: > > Mikael, > > > > That's a good point, unfortunately attach uses os::get_temp_directory > > which is hardcoded to use /tmp. We could add a whitebox API to allow > > us to override this but now we're on the border to noreg-hard land again > IMO. > > > > Any other opinions on this? > > Can you use the "-XX:+PauseAtStartup" vm flag it will create a > vm.paused.<pid> file in the current work directory. You could extract the pid, > touch the correct attach file in /tmp and then remove the vm.paused to let > the VM resume. > > I didn't check if PauseAtStartup stops the VM early enough though. > > An alternate, even more hacky approach is to do something like (in bash): > (bash -c 'echo $$; touch /tmp/.java_pid$$; exec java -version') Where you > can extract the pid of the subshell process with $$ and then exec into the > java launcher and keep the same pid (at least on Linux, unsure about the > Solaris launcher). > > /Mikael > > > > > > > Thanks! > > > > /peter > > > >> -----Original Message----- > >> From: Mikael Gerdin [mailto:mikael.ger...@oracle.com] > >> Sent: Tuesday, July 9, 2013 2:49 PM > >> To: Peter Allwin > >> Cc: serguei.spit...@oracle.com; daniel.daughe...@oracle.com; > >> serviceability-dev@openjdk.java.net; hotspot-runtime- > >> d...@openjdk.java.net > >> Subject: Re: RFR 7162400: Intermittent java.io.IOException: Bad file > > number > >> during HotSpotVirtualMachine.executeCommand > >> > >> Peter, > >> > >> On 2013-07-09 14:25, Peter Allwin wrote: > >>> Hello! > >>> > >>> It is reproducible by letting the test create .java_pid* files for > >>> all possible process id's on the system, setting correct access > >>> flags, launching the target VM and attempting to connect. There are > >>> some caveats though but it should be doable. > >>> > >>> I'll convert the repro script to JTREG and add it to the webrev. > >> > >> It's probably not a good idea to have a test which taints the system > >> with > > stale > >> .java_pid* files. > >> If the test execution times out and the script isn't allowed to clean > >> up I imagine that other subsequent executions could fail. > >> Is there a way to tell the attach api to use a specific directory so > >> you > > won't > >> need to taint /tmp? > >> > >> /Mikael > >> > >>> > >>> Thanks for the reviews! > >>> > >>> /peter > >>> > >>> *From:*serguei.spit...@oracle.com > >>> [mailto:serguei.spit...@oracle.com] > >>> *Sent:* Tuesday, July 9, 2013 1:26 AM > >>> *To:* daniel.daughe...@oracle.com > >>> *Cc:* Peter Allwin; serviceability-dev@openjdk.java.net; > >>> hotspot-runtime-...@openjdk.java.net > >>> *Subject:* Re: RFR 7162400: Intermittent java.io.IOException: Bad > >>> file number during HotSpotVirtualMachine.executeCommand > >>> > >>> Ok, thanks! > >>> > >>> Peter, did you manage to reproduce this issue with your script? > >>> If so, then, please, include it into the bug report and remove the > >>> "noreg-sqe" label. > >>> > >>> It is Ok if you did not reproduce it, though. > >>> > >>> Thanks, > >>> Serguei > >>> > >>> > >>> On 7/8/13 4:20 PM, Daniel D. Daugherty wrote: > >>> > >>> I definitely don't insist... :-) > >>> > >>> BTW, I noticed this in Peter's e-mail: > >>> > >>> > Testing: > >>> > JPRT, reproducing script on Solaris, Linux. > >>> > >>> so maybe Peter already has this covered with "reproducing script"... > >>> > >>> Dan > >>> > >>> On 7/8/13 5:07 PM, serguei.spit...@oracle.com > >>> <mailto:serguei.spit...@oracle.com> wrote: > >>> > >>> Dan, > >>> > >>> Dan, thank you for the recommendation. > >>> But I'm still not sure it is a right thing to do. > >>> Even though, there are multiple test cases associated with this > >>> bug they > >>> can not be used to verify that fix because an additional > > condition > >>> must be present as well. > >>> This condition is a presence of stale door file which is not > >>> that easy to reproduce. > >>> > >>> However, if you insist then I can change the lable to the > >>> "noreg-sqe" > >>> with the corresponding comment. > >>> > >>> Thanks, > >>> Serguei > >>> > >>> > >>> On 7/8/13 3:46 PM, Daniel D. Daugherty wrote: > >>> > >>> Serguei, > >>> > >>> There are a number of existing tests associated with this > >>> bug. I don't > >>> think that 'noreg-hard' is the right label. I think > >>> 'noreg-sqe' is > >>> the right one: > >>> > >>> noreg-sqe > >>> Change can be verified by running an existing SQE test > >>> suite; the bug > >>> should identify the suite and the specific test > > case(s). > >>> > >>> Dan > >>> > >>> On 7/8/13 12:59 PM, serguei.spit...@oracle.com > >>> <mailto:serguei.spit...@oracle.com> wrote: > >>> > >>> Peter, > >>> > >>> I've added the label "noreg-hard" with the comment to > >>> the report. > >>> It is not easy to reproduce the issue and demonstrate > >>> the fix in a regression test. > >>> > >>> Thanks, > >>> Serguei > >>> > >>> > >>> On 7/8/13 11:36 AM, serguei.spit...@oracle.com > >>> <mailto:serguei.spit...@oracle.com> wrote: > >>> > >>> Hi Peter, > >>> > >>> The fix looks good. > >>> > >>> Thanks, > >>> Serguei > >>> > >>> On 7/8/13 6:54 AM, Peter Allwin wrote: > >>> > >>> Hello! > >>> > >>> Looking for reviews of this change: > >>> > >>> > > http://cr.openjdk.java.net/~allwin/7162400/webrev.01/ > >>> > >>> <http://cr.openjdk.java.net/%7Eallwin/7162400/webrev.01/> > >>> > >>> For CR: > >>> > >>> > >>> http://bugs.sun.com/view_bug.do?bug_id=7162400 > >>> > >>> > >>> https://jbs.oracle.com/bugs/browse/JDK-7162400 > >>> > >>> Summary: > >>> > >>> This change addresses an issue in the Attach API > >>> on Solaris, Linux and BSD where an attaching > >>> application can receive IOExceptions such as > >>> "Bad file number" (Solaris), "Connection > >>> refused" (Linux/BSD), or "well-known file is not > >>> secure". > >>> > >>> The attach process uses a file in the temporary > >>> directory as a door (Solaris) or domain socket > >>> (Linux,BSD) to communicate with the VM. In > >>> certain circumstances stale files can be left in > >>> the file system which can cause the attaching > >>> application to believe that the VM is ready to > >>> receive a connection when it's not. With this > >>> change the stale file will be removed during VM > >>> startup. > >>> > >>> Note that there is still an issue if we don't > >>> have permission to remove the stale file, the > >>> attaching process will fail to connect. > >>> > >>> Testing: > >>> > >>> JPRT, reproducing script on Solaris, Linux. > >>> > >>> Credits: > >>> > >>> Thanks to Staffan Larsen who worked on this > >>> issue with me. > >>> > >>> Regards, > >>> > >>> > >>> Peter > >>> > >