Hi Bob,

  Thank you for review.

On 8/29/19 10:55 AM, Bob Vandette wrote:
Misha,

Looks good.  A couple of nits.

1. You might want to remove hotspot_containers section of ProblemList.txt since 
there are no bugs listed.
I can remove that section.
2. Can you make this timeout a constant just like TIME_TO_RUN_MAIN_PROCESS

   73             mainContainer.waitForMainMethodStart(5*1000);
Will do.

3. assertIsAlive() is not used except for the commented out tests.  Do you 
think you’ll ultimately use
this method or is this left over from previous attempts?

222         public void assertIsAlive() throws Exception {

Currently it is not in use. However when test cases 02 and 03 are back online, this check will be used.


Since the changes you suggest are minor, I will not post an updated webrev (let me know if you would like to see the updated webrev).

I will make the updates recommended by you, run another test cycle, and integrate the changes.


Thank you,

Misha



Bob.


On Aug 29, 2019, at 11:41 AM, mikhailo.seledt...@oracle.com wrote:

I believe I need a second reviewer for this change. Could someone, please, 
review this change version 2 ? (David already reviewed it).

http://cr.openjdk.java.net/~mseledtsov/8228960.02/


Thank you in advance,

Misha


On 8/26/19 12:32 PM, mikhailo.seledt...@oracle.com wrote:
Hi David,

   Thank you for review.

On 8/26/19 12:57 AM, David Holmes wrote:
Hi Misha,

On 24/08/2019 3:21 am, mikhailo.seledt...@oracle.com wrote:
Finally got some time to work on this issue.
Since I have encountered problem using files for passing messages between a 
container and a test driver (due to permissions), I looked for alternative 
solutions. I am using the output of a container process to signal when the main 
method has started, and it works. This simplifies things quite a bit as well.

Normally, we use OutputAnalyzer test utility to collect the whole output once the process 
has completed, and then analyze the resulting output for "contains some 
string", match, etc. However, testutils/ProcessTools provides an API to consume the 
output as it is produced. I am using this API to detect when the main() method of the 
container has started.
That seems reasonable. Do we want to make the following change to minimise 
unneeded output processing:

          private Consumer<String> outputConsumer = s -> {
!            if (!mainMethodStarted && 
s.contains(EventGeneratorLoop.MAIN_METHOD_STARTED)) {
                  System.out.println("MainContainer: setting 
mainMethodStarted");
                  mainMethodStarted = true;
              }
          };
Thank you for the suggestion. I will update the code accordingly.
Updated webrev:
      http://cr.openjdk.java.net/~mseledtsov/8228960.02/
Otherwise looks okay. Hopefully those other test cases will be enabled in the 
not too distant future.
I hope so as well.


Thank you,

Misha

Thanks,
David
-----

Testing:

    Ran the test on Linux-x64, various multiple nodes in a test cluster 50 
times - All PASS


Thank you,

Misha

On 8/13/19 2:05 PM, Bob Vandette wrote:
On Aug 13, 2019, at 3:28 PM, mikhailo.seledt...@oracle.com wrote:


On 8/13/19 12:06 PM, Bob Vandette wrote:
On Aug 13, 2019, at 2:57 PM, mikhailo.seledt...@oracle.com wrote:

Hi Bob,

    The workdir (JTwork/scratch) is created with the "test user" permissions. Let me try 
to place the "signal" file in /tmp instead, since /tmp should normally have a 777 
permission on Linux.
Aren’t you creating a file inside a docker container and then checking for its 
existence outside of the container?
Correct
Isn’t the root user running inside the container?
By default it is. But it still fails to create a file, for some reason. Can be 
related to selinux settings (for instance, see this article: 
https://stackoverflow.com/questions/24288616/permission-denied-on-accessing-host-directory-in-docker/31334443),
 I can not change those.
Is your JTWork/scratch on an NFS mounted file system?  If this is the case then 
the problem is that root is equivalent to nobody on
mounted file systems and can’t create files unless the directory has 777 
permissions.  I just confirmed this. You’d have to either run
the container test as test-user or change the scratch directory permission.

Bob.

My hope is that /tmp is configured to be accessed by a container engine as a 
general purpose directory, hence I was thinking to try it out.

Both processes don’t see the same /tmp right?   So that shouldn’t help.
In my next experiment, I will map a /tmp from host to be a /host-tmp inside the 
container (--volume /tmp:/host-tmp), then write a signal file to /host-tmp.
If scratch has 777 permissions, anyone can create a file.
scratch has  "rwxr-xr-x"
You have to be careful that you can clean up the
file from outside the container.  I’d make sure to create it with 777.
I do use deleteOnExit(), so it should work (unless the JVM crashes). I guess I 
could add extra layer of safety here, and set the permissions to 777. Thank you 
for advice.


Thank you,

Misha

Bob.

If this works, I will have to add some unique number to the file name, perhaps 
a PID of a child process.

I will try this, and let you know how it works.


Thank you,

Misha

On 8/13/19 6:34 AM, Bob Vandette wrote:
Sorry, I just looked at the webrev and you are trying the approach I suggested. 
I thought you
were trying to use file change notification.

Where does the workdir get created?  Does it have 777 permissions?

Bob.


On Aug 13, 2019, at 9:29 AM, Bob Vandette <bob.vande...@oracle.com> wrote:

What if you just poll for the creation of the file waiting some small amount of 
time between polling with a maximum timeout.

Bob.


On Aug 12, 2019, at 8:22 PM, mikhailo.seledt...@oracle.com wrote:

Unfortunately, this approach does not seem to work on many of our test cluster machines. The 
creation of a "signal" file results in "PermissionDenied".

The possible reason is the selinux configuration, or some other permission 
related stuff. The container tries to create a new file on a mounted volume on 
a host system, but host system denies it. I will look a bit deeper into this, 
but I think this type of issue can be encountered on any automated test system. 
Hence, we may have to abandon this approach.


Thanks,

Misha


On 8/12/19 3:59 PM, mikhailo.seledt...@oracle.com wrote:
Here is an updated webrev: http://cr.openjdk.java.net/~mseledtsov/8228960.01/

I am using a simple file-based mechanism to communicate between the processes. The 
"EventGeneratorLoop" process creates a specific "signal" file on a shared 
mounted volume, while the main test process waits  for the file to exist before running the test 
cases.

Passes on Linux-x64 Docker-enabled host. Testing in the test cluster is in 
progress.


Thank you,

Misha

On 8/7/19 5:11 PM, David Holmes wrote:
On 8/08/2019 9:04 am, Mikhailo Seledtsov wrote:
Hi Severin, Bob,

    Thank you for reviewing the code.

On 8/7/19, 11:38 AM, Bob Vandette wrote:
Can’t you come up with a better way of synchronizing the test by possibly 
writing a
file and waiting for it to exist with a timeout?
I will try out this approach.
This seems like a fundamental problem with jcmd - so cc'ing serviceability-dev.

But I'm pretty sure they recently addressed a similar issue with the premature 
sending of the attach signal?

David
-----

Thanks,
Misha
Isn’t there a shared volume between the two
processes?

We’ve been fighting test reliability for a while now.  I can only hope we’re 
getting
to the end.

Bob.

On Aug 7, 2019, at 2:18 PM, Severin Gehwolf<sgehw...@redhat.com> wrote:

Hi Misha,

On Tue, 2019-08-06 at 20:17 -0700, mikhailo.seledt...@oracle.com wrote:
Please review this change that fixes a container test TestJcmdWithSideCar.

My investigation indicated that a root cause for this failure is:
JCMD -l shows 'Unknown' for class name because the main class has not
been loaded yet.
The target test JVM has started, it is initializing, but has not loaded
the main test class.
That's what I've found too.

The proposed solution is to try 'jcmd -l' several times, with a short
sleep in between.
Thread.sleep() isn't great, but I'm not sure there is an alternative.

Also I have commented out the testCase02() due to another bug:
"JDK-8228850: jhsdb jinfo fails with ClassCastException:
s.j.h.oops.TypeArray cannot be cast to s.j.h.oops.Instance",
which is not a test bug. IMO, it is better to run the test and skip a
sub-case than to skip the entire test.

      JBS: https://bugs.openjdk.java.net/browse/JDK-8228960
      Webrev: http://cr.openjdk.java.net/~mseledtsov/8228960.00/
Looks OK to me.

Thanks,
Severin

Reply via email to