Misha,

Looks good.  A couple of nits.

1. You might want to remove hotspot_containers section of ProblemList.txt since 
there are no bugs listed.

2. Can you make this timeout a constant just like TIME_TO_RUN_MAIN_PROCESS

  73             mainContainer.waitForMainMethodStart(5*1000);

3. assertIsAlive() is not used except for the commented out tests.  Do you 
think you’ll ultimately use
this method or is this left over from previous attempts?

222         public void assertIsAlive() throws Exception {


Bob.


> On Aug 29, 2019, at 11:41 AM, mikhailo.seledt...@oracle.com wrote:
> 
> I believe I need a second reviewer for this change. Could someone, please, 
> review this change version 2 ? (David already reviewed it).
> 
> http://cr.openjdk.java.net/~mseledtsov/8228960.02/
> 
> 
> Thank you in advance,
> 
> Misha
> 
> 
> On 8/26/19 12:32 PM, mikhailo.seledt...@oracle.com wrote:
>> Hi David,
>> 
>>   Thank you for review.
>> 
>> On 8/26/19 12:57 AM, David Holmes wrote:
>>> Hi Misha,
>>> 
>>> On 24/08/2019 3:21 am, mikhailo.seledt...@oracle.com wrote:
>>>> Finally got some time to work on this issue.
>>>> Since I have encountered problem using files for passing messages between 
>>>> a container and a test driver (due to permissions), I looked for 
>>>> alternative solutions. I am using the output of a container process to 
>>>> signal when the main method has started, and it works. This simplifies 
>>>> things quite a bit as well.
>>>> 
>>>> Normally, we use OutputAnalyzer test utility to collect the whole output 
>>>> once the process has completed, and then analyze the resulting output for 
>>>> "contains some string", match, etc. However, testutils/ProcessTools 
>>>> provides an API to consume the output as it is produced. I am using this 
>>>> API to detect when the main() method of the container has started.
>>> 
>>> That seems reasonable. Do we want to make the following change to minimise 
>>> unneeded output processing:
>>> 
>>>          private Consumer<String> outputConsumer = s -> {
>>> !            if (!mainMethodStarted && 
>>> s.contains(EventGeneratorLoop.MAIN_METHOD_STARTED)) {
>>>                  System.out.println("MainContainer: setting 
>>> mainMethodStarted");
>>>                  mainMethodStarted = true;
>>>              }
>>>          };
>> Thank you for the suggestion. I will update the code accordingly.
>>> 
>>>> Updated webrev:
>>>>      http://cr.openjdk.java.net/~mseledtsov/8228960.02/
>>> 
>>> Otherwise looks okay. Hopefully those other test cases will be enabled in 
>>> the not too distant future.
>> 
>> I hope so as well.
>> 
>> 
>> Thank you,
>> 
>> Misha
>> 
>>> 
>>> Thanks,
>>> David
>>> -----
>>> 
>>>> 
>>>> Testing:
>>>> 
>>>>    Ran the test on Linux-x64, various multiple nodes in a test cluster 50 
>>>> times - All PASS
>>>> 
>>>> 
>>>> Thank you,
>>>> 
>>>> Misha
>>>> 
>>>> On 8/13/19 2:05 PM, Bob Vandette wrote:
>>>>> 
>>>>>> On Aug 13, 2019, at 3:28 PM, mikhailo.seledt...@oracle.com wrote:
>>>>>> 
>>>>>> 
>>>>>> On 8/13/19 12:06 PM, Bob Vandette wrote:
>>>>>>>> On Aug 13, 2019, at 2:57 PM, mikhailo.seledt...@oracle.com wrote:
>>>>>>>> 
>>>>>>>> Hi Bob,
>>>>>>>> 
>>>>>>>>    The workdir (JTwork/scratch) is created with the "test user" 
>>>>>>>> permissions. Let me try to place the "signal" file in /tmp instead, 
>>>>>>>> since /tmp should normally have a 777 permission on Linux.
>>>>>>> Aren’t you creating a file inside a docker container and then checking 
>>>>>>> for its existence outside of the container?
>>>>>> Correct
>>>>>>> Isn’t the root user running inside the container?
>>>>>> By default it is. But it still fails to create a file, for some reason. 
>>>>>> Can be related to selinux settings (for instance, see this article: 
>>>>>> https://stackoverflow.com/questions/24288616/permission-denied-on-accessing-host-directory-in-docker/31334443),
>>>>>>  I can not change those.
>>>>> Is your JTWork/scratch on an NFS mounted file system?  If this is the 
>>>>> case then the problem is that root is equivalent to nobody on
>>>>> mounted file systems and can’t create files unless the directory has 777 
>>>>> permissions.  I just confirmed this. You’d have to either run
>>>>> the container test as test-user or change the scratch directory 
>>>>> permission.
>>>>> 
>>>>> Bob.
>>>>> 
>>>>>> My hope is that /tmp is configured to be accessed by a container engine 
>>>>>> as a general purpose directory, hence I was thinking to try it out.
>>>>>> 
>>>>>>> Both processes don’t see the same /tmp right?   So that shouldn’t help.
>>>>>> In my next experiment, I will map a /tmp from host to be a /host-tmp 
>>>>>> inside the container (--volume /tmp:/host-tmp), then write a signal file 
>>>>>> to /host-tmp.
>>>>>>> If scratch has 777 permissions, anyone can create a file.
>>>>>> scratch has  "rwxr-xr-x"
>>>>>>> You have to be careful that you can clean up the
>>>>>>> file from outside the container.  I’d make sure to create it with 777.
>>>>>> I do use deleteOnExit(), so it should work (unless the JVM crashes). I 
>>>>>> guess I could add extra layer of safety here, and set the permissions to 
>>>>>> 777. Thank you for advice.
>>>>>> 
>>>>>> 
>>>>>> Thank you,
>>>>>> 
>>>>>> Misha
>>>>>> 
>>>>>>> Bob.
>>>>>>> 
>>>>>>>> If this works, I will have to add some unique number to the file name, 
>>>>>>>> perhaps a PID of a child process.
>>>>>>>> 
>>>>>>>> I will try this, and let you know how it works.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Thank you,
>>>>>>>> 
>>>>>>>> Misha
>>>>>>>> 
>>>>>>>> On 8/13/19 6:34 AM, Bob Vandette wrote:
>>>>>>>>> Sorry, I just looked at the webrev and you are trying the approach I 
>>>>>>>>> suggested. I thought you
>>>>>>>>> were trying to use file change notification.
>>>>>>>>> 
>>>>>>>>> Where does the workdir get created?  Does it have 777 permissions?
>>>>>>>>> 
>>>>>>>>> Bob.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On Aug 13, 2019, at 9:29 AM, Bob Vandette <bob.vande...@oracle.com> 
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> What if you just poll for the creation of the file waiting some 
>>>>>>>>>> small amount of time between polling with a maximum timeout.
>>>>>>>>>> 
>>>>>>>>>> Bob.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On Aug 12, 2019, at 8:22 PM, mikhailo.seledt...@oracle.com wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Unfortunately, this approach does not seem to work on many of our 
>>>>>>>>>>> test cluster machines. The creation of a "signal" file results in 
>>>>>>>>>>> "PermissionDenied".
>>>>>>>>>>> 
>>>>>>>>>>> The possible reason is the selinux configuration, or some other 
>>>>>>>>>>> permission related stuff. The container tries to create a new file 
>>>>>>>>>>> on a mounted volume on a host system, but host system denies it. I 
>>>>>>>>>>> will look a bit deeper into this, but I think this type of issue 
>>>>>>>>>>> can be encountered on any automated test system. Hence, we may have 
>>>>>>>>>>> to abandon this approach.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> 
>>>>>>>>>>> Misha
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On 8/12/19 3:59 PM, mikhailo.seledt...@oracle.com wrote:
>>>>>>>>>>>> Here is an updated webrev: 
>>>>>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8228960.01/
>>>>>>>>>>>> 
>>>>>>>>>>>> I am using a simple file-based mechanism to communicate between 
>>>>>>>>>>>> the processes. The "EventGeneratorLoop" process creates a specific 
>>>>>>>>>>>> "signal" file on a shared mounted volume, while the main test 
>>>>>>>>>>>> process waits  for the file to exist before running the test cases.
>>>>>>>>>>>> 
>>>>>>>>>>>> Passes on Linux-x64 Docker-enabled host. Testing in the test 
>>>>>>>>>>>> cluster is in progress.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Thank you,
>>>>>>>>>>>> 
>>>>>>>>>>>> Misha
>>>>>>>>>>>> 
>>>>>>>>>>>> On 8/7/19 5:11 PM, David Holmes wrote:
>>>>>>>>>>>>> On 8/08/2019 9:04 am, Mikhailo Seledtsov wrote:
>>>>>>>>>>>>>> Hi Severin, Bob,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>    Thank you for reviewing the code.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 8/7/19, 11:38 AM, Bob Vandette wrote:
>>>>>>>>>>>>>>> Can’t you come up with a better way of synchronizing the test 
>>>>>>>>>>>>>>> by possibly writing a
>>>>>>>>>>>>>>> file and waiting for it to exist with a timeout?
>>>>>>>>>>>>>> I will try out this approach.
>>>>>>>>>>>>> This seems like a fundamental problem with jcmd - so cc'ing 
>>>>>>>>>>>>> serviceability-dev.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> But I'm pretty sure they recently addressed a similar issue with 
>>>>>>>>>>>>> the premature sending of the attach signal?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> David
>>>>>>>>>>>>> -----
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Misha
>>>>>>>>>>>>>>> Isn’t there a shared volume between the two
>>>>>>>>>>>>>>> processes?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> We’ve been fighting test reliability for a while now.  I can 
>>>>>>>>>>>>>>> only hope we’re getting
>>>>>>>>>>>>>>> to the end.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Bob.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Aug 7, 2019, at 2:18 PM, Severin 
>>>>>>>>>>>>>>>> Gehwolf<sgehw...@redhat.com> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi Misha,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Tue, 2019-08-06 at 20:17 -0700, 
>>>>>>>>>>>>>>>> mikhailo.seledt...@oracle.com wrote:
>>>>>>>>>>>>>>>>> Please review this change that fixes a container test 
>>>>>>>>>>>>>>>>> TestJcmdWithSideCar.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> My investigation indicated that a root cause for this failure 
>>>>>>>>>>>>>>>>> is:
>>>>>>>>>>>>>>>>> JCMD -l shows 'Unknown' for class name because the main class 
>>>>>>>>>>>>>>>>> has not
>>>>>>>>>>>>>>>>> been loaded yet.
>>>>>>>>>>>>>>>>> The target test JVM has started, it is initializing, but has 
>>>>>>>>>>>>>>>>> not loaded
>>>>>>>>>>>>>>>>> the main test class.
>>>>>>>>>>>>>>>> That's what I've found too.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> The proposed solution is to try 'jcmd -l' several times, with 
>>>>>>>>>>>>>>>>> a short
>>>>>>>>>>>>>>>>> sleep in between.
>>>>>>>>>>>>>>>> Thread.sleep() isn't great, but I'm not sure there is an 
>>>>>>>>>>>>>>>> alternative.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Also I have commented out the testCase02() due to another bug:
>>>>>>>>>>>>>>>>> "JDK-8228850: jhsdb jinfo fails with ClassCastException:
>>>>>>>>>>>>>>>>> s.j.h.oops.TypeArray cannot be cast to s.j.h.oops.Instance",
>>>>>>>>>>>>>>>>> which is not a test bug. IMO, it is better to run the test 
>>>>>>>>>>>>>>>>> and skip a
>>>>>>>>>>>>>>>>> sub-case than to skip the entire test.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>      JBS: https://bugs.openjdk.java.net/browse/JDK-8228960
>>>>>>>>>>>>>>>>>      Webrev: 
>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~mseledtsov/8228960.00/
>>>>>>>>>>>>>>>> Looks OK to me.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Severin
>>>>>>>>>>>>>>>> 

Reply via email to