Peter, As UNIX_PATH_MAX is sort (108 bytes) we *must* check return value of snprintf (e.g. ll.446 of attachListener_linux.cpp) and make sure it's less or equal UNIX_PATH_MAX, otherwise we unlink wrong file in case of UNIX_PATH_MAX overflow.
-Dmitry On 2013-07-11 18:14, Peter Allwin wrote: > Hello, > > Thank you everyone for the feedback, I've incorporated the recommendations > into a new revision: > > http://cr.openjdk.java.net/~allwin/7162400/webrev.02/ > > Changes: > - Fixed speling misteaks > - Added jtreg regression test using Mikael's excellent suggestion of > -XX:+PauseAtStartup, tested locally on linux and solaris. > - Reverted use of MAX_PATH+1 vs. UNIX_MAX_PATH > > > Also thanks to Christian Törnqvist for helping out with the jtreg test! > > /peter > >> -----Original Message----- >> From: Mikael Gerdin [mailto:mikael.ger...@oracle.com] >> Sent: Tuesday, July 9, 2013 7:13 PM >> To: Peter Allwin >> Cc: serviceability-dev@openjdk.java.net; hotspot-runtime- >> d...@openjdk.java.net >> Subject: Re: RFR 7162400: Intermittent java.io.IOException: Bad file > number >> during HotSpotVirtualMachine.executeCommand >> >> On 07/09/2013 05:25 PM, Peter Allwin wrote: >>> Mikael, >>> >>> That's a good point, unfortunately attach uses os::get_temp_directory >>> which is hardcoded to use /tmp. We could add a whitebox API to allow >>> us to override this but now we're on the border to noreg-hard land again >> IMO. >>> >>> Any other opinions on this? >> >> Can you use the "-XX:+PauseAtStartup" vm flag it will create a >> vm.paused.<pid> file in the current work directory. You could extract the > pid, >> touch the correct attach file in /tmp and then remove the vm.paused to let >> the VM resume. >> >> I didn't check if PauseAtStartup stops the VM early enough though. >> >> An alternate, even more hacky approach is to do something like (in bash): >> (bash -c 'echo $$; touch /tmp/.java_pid$$; exec java -version') Where you >> can extract the pid of the subshell process with $$ and then exec into the >> java launcher and keep the same pid (at least on Linux, unsure about the >> Solaris launcher). >> >> /Mikael >> >>> >>> >>> Thanks! >>> >>> /peter >>> >>>> -----Original Message----- >>>> From: Mikael Gerdin [mailto:mikael.ger...@oracle.com] >>>> Sent: Tuesday, July 9, 2013 2:49 PM >>>> To: Peter Allwin >>>> Cc: serguei.spit...@oracle.com; daniel.daughe...@oracle.com; >>>> serviceability-dev@openjdk.java.net; hotspot-runtime- >>>> d...@openjdk.java.net >>>> Subject: Re: RFR 7162400: Intermittent java.io.IOException: Bad file >>> number >>>> during HotSpotVirtualMachine.executeCommand >>>> >>>> Peter, >>>> >>>> On 2013-07-09 14:25, Peter Allwin wrote: >>>>> Hello! >>>>> >>>>> It is reproducible by letting the test create .java_pid* files for >>>>> all possible process id's on the system, setting correct access >>>>> flags, launching the target VM and attempting to connect. There are >>>>> some caveats though but it should be doable. >>>>> >>>>> I'll convert the repro script to JTREG and add it to the webrev. >>>> >>>> It's probably not a good idea to have a test which taints the system >>>> with >>> stale >>>> .java_pid* files. >>>> If the test execution times out and the script isn't allowed to clean >>>> up I imagine that other subsequent executions could fail. >>>> Is there a way to tell the attach api to use a specific directory so >>>> you >>> won't >>>> need to taint /tmp? >>>> >>>> /Mikael >>>> >>>>> >>>>> Thanks for the reviews! >>>>> >>>>> /peter >>>>> >>>>> *From:*serguei.spit...@oracle.com >>>>> [mailto:serguei.spit...@oracle.com] >>>>> *Sent:* Tuesday, July 9, 2013 1:26 AM >>>>> *To:* daniel.daughe...@oracle.com >>>>> *Cc:* Peter Allwin; serviceability-dev@openjdk.java.net; >>>>> hotspot-runtime-...@openjdk.java.net >>>>> *Subject:* Re: RFR 7162400: Intermittent java.io.IOException: Bad >>>>> file number during HotSpotVirtualMachine.executeCommand >>>>> >>>>> Ok, thanks! >>>>> >>>>> Peter, did you manage to reproduce this issue with your script? >>>>> If so, then, please, include it into the bug report and remove the >>>>> "noreg-sqe" label. >>>>> >>>>> It is Ok if you did not reproduce it, though. >>>>> >>>>> Thanks, >>>>> Serguei >>>>> >>>>> >>>>> On 7/8/13 4:20 PM, Daniel D. Daugherty wrote: >>>>> >>>>> I definitely don't insist... :-) >>>>> >>>>> BTW, I noticed this in Peter's e-mail: >>>>> >>>>> > Testing: >>>>> > JPRT, reproducing script on Solaris, Linux. >>>>> >>>>> so maybe Peter already has this covered with "reproducing > script"... >>>>> >>>>> Dan >>>>> >>>>> On 7/8/13 5:07 PM, serguei.spit...@oracle.com >>>>> <mailto:serguei.spit...@oracle.com> wrote: >>>>> >>>>> Dan, >>>>> >>>>> Dan, thank you for the recommendation. >>>>> But I'm still not sure it is a right thing to do. >>>>> Even though, there are multiple test cases associated with > this >>>>> bug they >>>>> can not be used to verify that fix because an additional >>> condition >>>>> must be present as well. >>>>> This condition is a presence of stale door file which is not >>>>> that easy to reproduce. >>>>> >>>>> However, if you insist then I can change the lable to the >>>>> "noreg-sqe" >>>>> with the corresponding comment. >>>>> >>>>> Thanks, >>>>> Serguei >>>>> >>>>> >>>>> On 7/8/13 3:46 PM, Daniel D. Daugherty wrote: >>>>> >>>>> Serguei, >>>>> >>>>> There are a number of existing tests associated with this >>>>> bug. I don't >>>>> think that 'noreg-hard' is the right label. I think >>>>> 'noreg-sqe' is >>>>> the right one: >>>>> >>>>> noreg-sqe >>>>> Change can be verified by running an existing SQE > test >>>>> suite; the bug >>>>> should identify the suite and the specific test >>> case(s). >>>>> >>>>> Dan >>>>> >>>>> On 7/8/13 12:59 PM, serguei.spit...@oracle.com >>>>> <mailto:serguei.spit...@oracle.com> wrote: >>>>> >>>>> Peter, >>>>> >>>>> I've added the label "noreg-hard" with the comment to >>>>> the report. >>>>> It is not easy to reproduce the issue and demonstrate >>>>> the fix in a regression test. >>>>> >>>>> Thanks, >>>>> Serguei >>>>> >>>>> >>>>> On 7/8/13 11:36 AM, serguei.spit...@oracle.com >>>>> <mailto:serguei.spit...@oracle.com> wrote: >>>>> >>>>> Hi Peter, >>>>> >>>>> The fix looks good. >>>>> >>>>> Thanks, >>>>> Serguei >>>>> >>>>> On 7/8/13 6:54 AM, Peter Allwin wrote: >>>>> >>>>> Hello! >>>>> >>>>> Looking for reviews of this change: >>>>> >>>>> >>> http://cr.openjdk.java.net/~allwin/7162400/webrev.01/ >>>>> >>>>> <http://cr.openjdk.java.net/%7Eallwin/7162400/webrev.01/> >>>>> >>>>> For CR: >>>>> >>>>> >>>>> http://bugs.sun.com/view_bug.do?bug_id=7162400 >>>>> >>>>> >>>>> https://jbs.oracle.com/bugs/browse/JDK-7162400 >>>>> >>>>> Summary: >>>>> >>>>> This change addresses an issue in the Attach > API >>>>> on Solaris, Linux and BSD where an attaching >>>>> application can receive IOExceptions such as >>>>> "Bad file number" (Solaris), "Connection >>>>> refused" (Linux/BSD), or "well-known file is > not >>>>> secure". >>>>> >>>>> The attach process uses a file in the > temporary >>>>> directory as a door (Solaris) or domain > socket >>>>> (Linux,BSD) to communicate with the VM. In >>>>> certain circumstances stale files can be left > in >>>>> the file system which can cause the attaching >>>>> application to believe that the VM is ready > to >>>>> receive a connection when it's not. With this >>>>> change the stale file will be removed during > VM >>>>> startup. >>>>> >>>>> Note that there is still an issue if we don't >>>>> have permission to remove the stale file, the >>>>> attaching process will fail to connect. >>>>> >>>>> Testing: >>>>> >>>>> JPRT, reproducing script on Solaris, Linux. >>>>> >>>>> Credits: >>>>> >>>>> Thanks to Staffan Larsen who worked on this >>>>> issue with me. >>>>> >>>>> Regards, >>>>> >>>>> >>>>> Peter >>>>> >>> > > -- Dmitry Samersoff Oracle Java development team, Saint Petersburg, Russia * I would love to change the world, but they won't give me the sources.