Misha, Looks good. A couple of nits.
1. You might want to remove hotspot_containers section of ProblemList.txt since there are no bugs listed. 2. Can you make this timeout a constant just like TIME_TO_RUN_MAIN_PROCESS 73 mainContainer.waitForMainMethodStart(5*1000); 3. assertIsAlive() is not used except for the commented out tests. Do you think you’ll ultimately use this method or is this left over from previous attempts? 222 public void assertIsAlive() throws Exception { Bob. > On Aug 29, 2019, at 11:41 AM, mikhailo.seledt...@oracle.com wrote: > > I believe I need a second reviewer for this change. Could someone, please, > review this change version 2 ? (David already reviewed it). > > http://cr.openjdk.java.net/~mseledtsov/8228960.02/ > > > Thank you in advance, > > Misha > > > On 8/26/19 12:32 PM, mikhailo.seledt...@oracle.com wrote: >> Hi David, >> >> Thank you for review. >> >> On 8/26/19 12:57 AM, David Holmes wrote: >>> Hi Misha, >>> >>> On 24/08/2019 3:21 am, mikhailo.seledt...@oracle.com wrote: >>>> Finally got some time to work on this issue. >>>> Since I have encountered problem using files for passing messages between >>>> a container and a test driver (due to permissions), I looked for >>>> alternative solutions. I am using the output of a container process to >>>> signal when the main method has started, and it works. This simplifies >>>> things quite a bit as well. >>>> >>>> Normally, we use OutputAnalyzer test utility to collect the whole output >>>> once the process has completed, and then analyze the resulting output for >>>> "contains some string", match, etc. However, testutils/ProcessTools >>>> provides an API to consume the output as it is produced. I am using this >>>> API to detect when the main() method of the container has started. >>> >>> That seems reasonable. Do we want to make the following change to minimise >>> unneeded output processing: >>> >>> private Consumer<String> outputConsumer = s -> { >>> ! if (!mainMethodStarted && >>> s.contains(EventGeneratorLoop.MAIN_METHOD_STARTED)) { >>> System.out.println("MainContainer: setting >>> mainMethodStarted"); >>> mainMethodStarted = true; >>> } >>> }; >> Thank you for the suggestion. I will update the code accordingly. >>> >>>> Updated webrev: >>>> http://cr.openjdk.java.net/~mseledtsov/8228960.02/ >>> >>> Otherwise looks okay. Hopefully those other test cases will be enabled in >>> the not too distant future. >> >> I hope so as well. >> >> >> Thank you, >> >> Misha >> >>> >>> Thanks, >>> David >>> ----- >>> >>>> >>>> Testing: >>>> >>>> Ran the test on Linux-x64, various multiple nodes in a test cluster 50 >>>> times - All PASS >>>> >>>> >>>> Thank you, >>>> >>>> Misha >>>> >>>> On 8/13/19 2:05 PM, Bob Vandette wrote: >>>>> >>>>>> On Aug 13, 2019, at 3:28 PM, mikhailo.seledt...@oracle.com wrote: >>>>>> >>>>>> >>>>>> On 8/13/19 12:06 PM, Bob Vandette wrote: >>>>>>>> On Aug 13, 2019, at 2:57 PM, mikhailo.seledt...@oracle.com wrote: >>>>>>>> >>>>>>>> Hi Bob, >>>>>>>> >>>>>>>> The workdir (JTwork/scratch) is created with the "test user" >>>>>>>> permissions. Let me try to place the "signal" file in /tmp instead, >>>>>>>> since /tmp should normally have a 777 permission on Linux. >>>>>>> Aren’t you creating a file inside a docker container and then checking >>>>>>> for its existence outside of the container? >>>>>> Correct >>>>>>> Isn’t the root user running inside the container? >>>>>> By default it is. But it still fails to create a file, for some reason. >>>>>> Can be related to selinux settings (for instance, see this article: >>>>>> https://stackoverflow.com/questions/24288616/permission-denied-on-accessing-host-directory-in-docker/31334443), >>>>>> I can not change those. >>>>> Is your JTWork/scratch on an NFS mounted file system? If this is the >>>>> case then the problem is that root is equivalent to nobody on >>>>> mounted file systems and can’t create files unless the directory has 777 >>>>> permissions. I just confirmed this. You’d have to either run >>>>> the container test as test-user or change the scratch directory >>>>> permission. >>>>> >>>>> Bob. >>>>> >>>>>> My hope is that /tmp is configured to be accessed by a container engine >>>>>> as a general purpose directory, hence I was thinking to try it out. >>>>>> >>>>>>> Both processes don’t see the same /tmp right? So that shouldn’t help. >>>>>> In my next experiment, I will map a /tmp from host to be a /host-tmp >>>>>> inside the container (--volume /tmp:/host-tmp), then write a signal file >>>>>> to /host-tmp. >>>>>>> If scratch has 777 permissions, anyone can create a file. >>>>>> scratch has "rwxr-xr-x" >>>>>>> You have to be careful that you can clean up the >>>>>>> file from outside the container. I’d make sure to create it with 777. >>>>>> I do use deleteOnExit(), so it should work (unless the JVM crashes). I >>>>>> guess I could add extra layer of safety here, and set the permissions to >>>>>> 777. Thank you for advice. >>>>>> >>>>>> >>>>>> Thank you, >>>>>> >>>>>> Misha >>>>>> >>>>>>> Bob. >>>>>>> >>>>>>>> If this works, I will have to add some unique number to the file name, >>>>>>>> perhaps a PID of a child process. >>>>>>>> >>>>>>>> I will try this, and let you know how it works. >>>>>>>> >>>>>>>> >>>>>>>> Thank you, >>>>>>>> >>>>>>>> Misha >>>>>>>> >>>>>>>> On 8/13/19 6:34 AM, Bob Vandette wrote: >>>>>>>>> Sorry, I just looked at the webrev and you are trying the approach I >>>>>>>>> suggested. I thought you >>>>>>>>> were trying to use file change notification. >>>>>>>>> >>>>>>>>> Where does the workdir get created? Does it have 777 permissions? >>>>>>>>> >>>>>>>>> Bob. >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Aug 13, 2019, at 9:29 AM, Bob Vandette <bob.vande...@oracle.com> >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> What if you just poll for the creation of the file waiting some >>>>>>>>>> small amount of time between polling with a maximum timeout. >>>>>>>>>> >>>>>>>>>> Bob. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On Aug 12, 2019, at 8:22 PM, mikhailo.seledt...@oracle.com wrote: >>>>>>>>>>> >>>>>>>>>>> Unfortunately, this approach does not seem to work on many of our >>>>>>>>>>> test cluster machines. The creation of a "signal" file results in >>>>>>>>>>> "PermissionDenied". >>>>>>>>>>> >>>>>>>>>>> The possible reason is the selinux configuration, or some other >>>>>>>>>>> permission related stuff. The container tries to create a new file >>>>>>>>>>> on a mounted volume on a host system, but host system denies it. I >>>>>>>>>>> will look a bit deeper into this, but I think this type of issue >>>>>>>>>>> can be encountered on any automated test system. Hence, we may have >>>>>>>>>>> to abandon this approach. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Misha >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On 8/12/19 3:59 PM, mikhailo.seledt...@oracle.com wrote: >>>>>>>>>>>> Here is an updated webrev: >>>>>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8228960.01/ >>>>>>>>>>>> >>>>>>>>>>>> I am using a simple file-based mechanism to communicate between >>>>>>>>>>>> the processes. The "EventGeneratorLoop" process creates a specific >>>>>>>>>>>> "signal" file on a shared mounted volume, while the main test >>>>>>>>>>>> process waits for the file to exist before running the test cases. >>>>>>>>>>>> >>>>>>>>>>>> Passes on Linux-x64 Docker-enabled host. Testing in the test >>>>>>>>>>>> cluster is in progress. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Thank you, >>>>>>>>>>>> >>>>>>>>>>>> Misha >>>>>>>>>>>> >>>>>>>>>>>> On 8/7/19 5:11 PM, David Holmes wrote: >>>>>>>>>>>>> On 8/08/2019 9:04 am, Mikhailo Seledtsov wrote: >>>>>>>>>>>>>> Hi Severin, Bob, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thank you for reviewing the code. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 8/7/19, 11:38 AM, Bob Vandette wrote: >>>>>>>>>>>>>>> Can’t you come up with a better way of synchronizing the test >>>>>>>>>>>>>>> by possibly writing a >>>>>>>>>>>>>>> file and waiting for it to exist with a timeout? >>>>>>>>>>>>>> I will try out this approach. >>>>>>>>>>>>> This seems like a fundamental problem with jcmd - so cc'ing >>>>>>>>>>>>> serviceability-dev. >>>>>>>>>>>>> >>>>>>>>>>>>> But I'm pretty sure they recently addressed a similar issue with >>>>>>>>>>>>> the premature sending of the attach signal? >>>>>>>>>>>>> >>>>>>>>>>>>> David >>>>>>>>>>>>> ----- >>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Misha >>>>>>>>>>>>>>> Isn’t there a shared volume between the two >>>>>>>>>>>>>>> processes? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> We’ve been fighting test reliability for a while now. I can >>>>>>>>>>>>>>> only hope we’re getting >>>>>>>>>>>>>>> to the end. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Bob. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Aug 7, 2019, at 2:18 PM, Severin >>>>>>>>>>>>>>>> Gehwolf<sgehw...@redhat.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Misha, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, 2019-08-06 at 20:17 -0700, >>>>>>>>>>>>>>>> mikhailo.seledt...@oracle.com wrote: >>>>>>>>>>>>>>>>> Please review this change that fixes a container test >>>>>>>>>>>>>>>>> TestJcmdWithSideCar. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> My investigation indicated that a root cause for this failure >>>>>>>>>>>>>>>>> is: >>>>>>>>>>>>>>>>> JCMD -l shows 'Unknown' for class name because the main class >>>>>>>>>>>>>>>>> has not >>>>>>>>>>>>>>>>> been loaded yet. >>>>>>>>>>>>>>>>> The target test JVM has started, it is initializing, but has >>>>>>>>>>>>>>>>> not loaded >>>>>>>>>>>>>>>>> the main test class. >>>>>>>>>>>>>>>> That's what I've found too. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The proposed solution is to try 'jcmd -l' several times, with >>>>>>>>>>>>>>>>> a short >>>>>>>>>>>>>>>>> sleep in between. >>>>>>>>>>>>>>>> Thread.sleep() isn't great, but I'm not sure there is an >>>>>>>>>>>>>>>> alternative. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Also I have commented out the testCase02() due to another bug: >>>>>>>>>>>>>>>>> "JDK-8228850: jhsdb jinfo fails with ClassCastException: >>>>>>>>>>>>>>>>> s.j.h.oops.TypeArray cannot be cast to s.j.h.oops.Instance", >>>>>>>>>>>>>>>>> which is not a test bug. IMO, it is better to run the test >>>>>>>>>>>>>>>>> and skip a >>>>>>>>>>>>>>>>> sub-case than to skip the entire test. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8228960 >>>>>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8228960.00/ >>>>>>>>>>>>>>>> Looks OK to me. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> Severin >>>>>>>>>>>>>>>>