Hi Dmitry,

It's hard to fix this issue without access to Oracle testing infrastructure.

Agree.
I'm not an Oracle employee. Thus I cannot access it.

Should I change the Assignee on JBS (JDK-8165869) to you?


Thanks,

Yasumasa


On 2016/10/25 15:09, Dmitry Samersoff wrote:
Yasumasa,

It's hard to fix this issue without access to Oracle testing infrastructure.

I'm working on the issue but unfortunately I was caught by some internal
problems so it progresses slowly than I expected.

I'm sorry about it.

-Dmitry


On 2016-10-24 13:24, Yasumasa Suenaga wrote:
Hi Dmitry, David,

I want to resume to work for this issue because this issue has been
marked P3.
Dmitry, do you have any idea for it?

IMHO, we can fix as below:

  http://cr.openjdk.java.net/~ysuenaga/JDK-8165869/webrev.00/

According to David, JvmtiExport::load_agent_library() should return zero
when the operation which is kicked by AttachListener is succeeded.
AttachListener will write response code from
JvmtiExport::load_agent_library() at first, and the client (e.g.
HotSpotVirtualMachine.java) checks it.

Thus I keep current implementation about return code, and I added the
code to send error message to the client.
It works fine with jdk/test/com/sun/tools/attach/BasicTests.java .

What do you think about it?


Thanks,

Yasumasa


On 2016/09/15 20:08, Dmitry Samersoff wrote:
Yasumasa, David,

I'm looking into this issue and come back as soon as I have a clear
picture.

It takes some time. Sorry!

-Dmitry

On 2016-09-15 14:01, Yasumasa Suenaga wrote:
Hi David,

I agree the call sequence which you write in the case of "jcmd"
operation.
However, we should think about "load" operation in AttachListener.

We can access JvmtiExport::load_agent_library() through two routes:

 Route "load": AttachListener -> JvmtiExport::load_agent_library()
 Route "jcmd": AttachListener -> jcmd() in attachListener.cpp ->
DCmd::parse_and_execute()

"load" sets error code (it means first data in the stream) when
JvmtiExport::load_agent_library() returns JNI_ERR.
OTOH "jcmd" sets error code if exception is occurred ONLY.

If we try to load invalid JVMTI agent, "load" writes JNI_ERR to stream,
however "jcmd" writes JNI_OK because pending exception is not available
at this point.
Currently, JCmd.java shows "Command executed successfully" when it
cannot read the message from the socket. In case of this issue,
JVMTIAgentLoadDCmd does not write any message because it cannot attach
the agent.
If return code which is in the top of stream should mean whether
communication with AttachListener is succeeded, I think HotSpot should
write error message or error code to the stream for JCmd.java .

I'm not sure this email is the answer for you.
Sorry for my English.


Yasumasa


On 2016/09/15 18:43, David Holmes wrote:
Hi Yasumasa,

On 15/09/2016 1:56 PM, Yasumasa Suenaga wrote:
Hi,

If this is done through jcmd then jcmd needs to know how to "unwrap"
the information it gets back from the Dcmd.

In case of JVMTI.agent_load dcmd, I think we should be able to get the
response as below:

  1. Invalid JVMTI agent
  2. Agent_OnAttach() did not return JNI_OK
  3. Succeeded

Currently, JvmtiExport::load_agent_library() returns JNI_OK to caller,
and jcmd() in attachListener.cpp returns JNI_ERR if exception is
occurred (in Agent_OnAttach()).

Can you respond to my comment in the bug report about the chain of
calls and explain exactly where you think things are going wrong in
this scenario. I'm having a lot of trouble trying to see how the
information flows back through the layers.

IMHO, HotSpot should return error code (in top of the stream:
written by
AttachListener at first). So we should be able to get some error
code in
case 1. and 2. Then the client (jcmd or other methods) can decide to
parse text information in the stream.

It returns the high-level response code first "did communication with
the target VM succeed", then the actual action response code. That
seems the right thing to do to me. It is up to the callers reading the
stream to understand how to read it. To me the issue lies somewhere on
the jcmd/dcmd side.

Thanks,
David


Sorry for my English.

Yasumasa


On 2016/09/14 7:03, David Holmes wrote:
On 13/09/2016 10:31 PM, Yasumasa Suenaga wrote:
Thanks David!
If we should not change the meaning of return code from
JvmtiExport::load_agent_library(), we should print return code as
below:
-------------------
diff -r 0cf03b9d9b1f src/share/vm/prims/jvmtiExport.cpp
--- a/src/share/vm/prims/jvmtiExport.cpp    Mon Sep 12 18:59:13 2016
+0000
+++ b/src/share/vm/prims/jvmtiExport.cpp    Tue Sep 13 21:12:14 2016
+0900
@@ -2412,6 +2412,10 @@
       result = JNI_OK;
     }
   }
+ // Print error code if Agent_OnAttach cannot be executed
+ if (result != JNI_OK) {
+ st->print_cr("%d", result);
+ }
   return result;
 }
-------------------

Not sure I see the point. "return result" will put the error code
into
the socket stream and that error will be seen and responded to. I
don't expect anything to then read further into the stream to see the
"result" repeated a second time. In other words if execute() doesn't
see a zero result then you go no further; if execute sees a zero
result then the actual action may have had an issue so you proceed to
read the "action's result".

It can pass com/sun/tools/attach/BasicTests.java, and we can get
-1 if
we try to attach invalid agent.

Can you please outline your test case for this again. If this is done
through jcmd then jcmd needs to know how to "unwrap" the information
it gets back from the Dcmd.

Thanks,
David


Yasumasa


On 2016/09/13 17:06, Dmitry Samersoff wrote:
David,

Thank you for the evaluation.

With that in mind I suspect the real fix for the original issue
is to
have Dcmd recognize that it also has to read two "results".
Though I'm
not sure how exactly.

This logic seems completely broken for me. But I don't see
anything we
can do right now - for jdk 9.

It requires careful changes of both - code and tests.

I can help with this task but not today.

-Dmitry

On 2016-09-13 08:08, David Holmes wrote:
On 13/09/2016 1:53 PM, David Holmes wrote:
On 13/09/2016 1:30 PM, Yasumasa Suenaga wrote:
Hi,

I could reproduce and I added a comment to JBS:
https://bugs.openjdk.java.net/browse/JDK-8165869?focusedCommentId=14000623&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14000623








In case of BasicTests.java, I think it is a test bug.

I don't agree as yet. I have not been able to isolate the exact
difference between what happens with your fix and what happens
without.
I know it relates to the presence of the error code, but also
how we
get
AgentInitializationException instead of AgentLoadException. I
need
to be
able to run the test outside of jtreg but that is proving to be
very
difficult to arrange. :(

I finally managed to connect all the pieces on this.

The basic problem is that with the proposed change
VirtualMachineImpl.execute() will throw AgentLoadException, which
then
leads to the InternalError. Without the change we reach this
block in
HotspotVirtualMachine.loadAgentLibrary:

int result = readInt(in);
if (result != 0) {
   throw new AgentInitializationException("Agent_OnAttach failed",
result);
}

and so get the expected AgentInitializationException.

Looking at the proposed change in jvmtiExport.cpp if we have the
original:

      st->print_cr("%d", result);
      result = JNI_OK;

then execute() manages to read a zero completion status from the
stream,
and then loadAgentLibrary is able to read the actual "result" -
whether
zero or not - from the stream. This is because the AttachListener
code
calls op->complete(result, &st) where st is the stream where we
wrote
the result value above in print_cr. Then if we look at, for
example,
LinuxAttachOperation::complete, we will write "result" to the
socket
first, followed by the contents of st. Hence on a successful
operation
we can read 0 for execute, and then 0 for loadAgent. On error we
read 0
for execute and the actual error code for loadAgent. With the
proposed
change execute() sees the real result (instead of JNI_OK) and so
throws
AgentLoadException.

So in summary there are two different error indicators written
into
the
stream, and we need the first to be zero unless some major error
with
the actual attach functionality occurred - otherwise even if the
"load"
failed in some other way, it is treated as a secondary error.

With that in mind I suspect the real fix for the original issue
is to
have Dcmd recognize that it also has to read two "results".
Though I'm
not sure how exactly.

David
-----

David



If it is accepted, I will upload webrev for this redo task.
However, I cannot access (and fix) Oracle internal test. Can
someone
help me?
(I cannot access JPRT. So I need a sponsor.)


Thanks,

Yasumasa


On 2016/09/13 9:00, David Holmes wrote:
I think I see the problem. The attach framework tries to
load the
"instrument" agent library. Prior to 8164913 if this fails it
actually
appears to succeed. Now it fails, and that error propagates
through.
I'm guessing, at the moment, that the failure is due to a
missing
module related option.

David

On 13/09/2016 9:54 AM, David Holmes wrote:
Hi Yasumasa,

On 13/09/2016 9:23 AM, Yasumasa Suenaga wrote:
Hi,

Hmm, it has been backouted...

I agree to David. This error seems to be a test bug.
Can you share the test? I want to evaluate it.

Sorry we can't share the tests. I took a quick look and it
seems it
may
be related to the result code parsing (that I thought we
ended up
not
touching!).

Can you run a test of HotSpotVirtualMachine.loadAgent(null)
and
see
what
happens? We see AgentLoadException from the code that
internally
tries
to load the "instrument" agent:

Exception in thread "main" Failure: Unexpected exception
during
test
execution: java.lang.InternalError: instrument library is
missing in
target VM
...
Caused by: java.lang.InternalError: instrument library is
missing in
target VM
...
Caused by: com.sun.tools.attach.AgentLoadException: Failed to
load
agent
library


I do not agree to fix this bug in JDK 10 because VM might lie
when
the
JVMTI agent could not be attached. IMHO this bug should be
fixed
in 9
GA, and we should fix testcase(s).

I agree. It has to be noted that "we" failed to run all the
appropriate
tests before pushing this, which would have discovered the
problem -
assuming it is actually a problem with the change and not an
unrelated
issue in our testing environment.

Unfortunately I have some high priority tasks to handle right
now,
so I
can't go into this in any more depth at the moment.

David
-----


Thanks,

Yasumasa


On 2016/09/13 6:15, David Holmes wrote:
For the record it looks like the tests involved are
broken, in
that
because the VM used to lie about the success of attaching
the
agent, the
tests expected different exceptions to occur.

David

On 13/09/2016 3:11 AM, harold seigel wrote:
Looks good.

Harold


On 9/12/2016 1:05 PM, Christian Tornqvist wrote:
Hi everyone,


Please review this (clean) backout of the change for
JDK-8164913, it
failed
several tests in the nightly testing. The failures are
tracked in:
https://bugs.openjdk.java.net/browse/JDK-8165869


Webrev:

http://cr.openjdk.java.net/~ctornqvi/webrev/8165881/webrev.00/




Bug:

https://bugs.openjdk.java.net/browse/JDK-8165881


Thanks,

Christian









Reply via email to