On 6/6/18 11:52 AM, Gary Adams wrote:
On 6/6/18, 2:20 PM, Chris Plummer wrote:
If that were the case, shouldn't
the delay be added to sendCommand() instead?
I'm still a bit unsure about how this sequence is suppose to
work in the first place:
sendCommand(command);
return receiveReply(startPos, compoundPromptOnly,
count);
Send command "cont"
Receive the reply "main[1]"
At this point the delay allows the event handler to run in the
debuggee process.
I guess the assumption is that the first prompt will always be
a simple one? Is that the case? If so, then the expectation is
that we will see:
main[1] cont
>
"Breakpoint hit:" <location>
main[1]
But then we can still see, even after your fix:
main[1] cont
"Breakpoint hit:" > <location>
main[1]
I'm not clear on what you are saying in this example.
Just sometimes I see ">" as the prompt and sometimes "main[1]",
but in either case the prompt could appear in the middle of the
output when the async command completes.
However, I think maybe the issue is different for step
commands, which seem to result in two compound prompts.
Remember your example:
Here's another recent failure with
eval001
1. cont and Break hit
2. step and Step completed (partial reply)
3. eval and error with second half of Step completed
====================================
Sending command: cont
reply[0]: >
reply[1]:
Breakpoint hit: "thread=main",
nsk.jdb.eval.eval001.eval001a.lastBreak(), line=23 bci=0
reply[2]:
23 static void lastBreak () {}
reply[3]:
reply[4]:
====================================
main[1]
Sending command: step
reply[0]:
reply[1]:
Step completed: main[1]
====================================
Sending command: eval
nsk.jdb.eval.eval001.eval001a.myStaticField
reply[0]: "thread=main",
nsk.jdb.eval.eval001.eval001a.runIt(), line=36 bci=53
reply[1]:
36 }
reply[2]:
reply[3]:
====================================
main[1]
# ERROR: jdb failed to report value of _expression_:
nsk.jdb.eval.eval001.eval001a.myStaticField
<snip>
# ERROR: expected : -2147483648 ;
# ERROR: reported: "thread=main",
nsk.jdb.eval.eval001.eval001a.runIt(), line=36 bci=53
Finally the sequence from the jdb.session file. Notice the
main[1] prompt
embedded in the Step completed reply.
main[1] cont
>
Breakpoint hit: "thread=main",
nsk.jdb.eval.eval001.eval001a.lastBreak(), line=23 bci=0
23 static void lastBreak () {}
main[1] step
Step completed: main[1] "thread=main",
nsk.jdb.eval.eval001.eval001a.runIt(), line=36 bci=53
36 }
main[1] eval
nsk.jdb.eval.eval001.eval001a.myStaticField
The part that uses "cont" is fine. The problems started when
the step command was issued. Rather than receiving:
Step completed: "thread=main",
nsk.jdb.eval.eval001.eval001a.runIt(), line=36 bci=53
We got:
Step completed: main[1]
====================================
Sending command: eval
nsk.jdb.eval.eval001.eval001a.myStaticField
reply[0]: "thread=main",
nsk.jdb.eval.eval001.eval001a.runIt(), line=36 bci=53
I don't see your change addressing this. The step command was
considered to be done as soon as the main[] prompt arrived,
even though it came right in the middle of the full "Step
completed: ..." message. Your delay won't change this. Yes, it
will make sure the rest of the reply has arrived, but the part
after main[1] will not be considered part of the step reply
because it will terminate at the main[1] prompt.
With the proposed change to delay after receiving a reply after
command, we would see
====================================
Sending command: cont
reply[0]: >
==================================== delay after command
Breakpoint hit: "thread=main",
nsk.jdb.eval.eval001.eval001a.lastBreak(), line=23 bci=0
23 static void lastBreak () {}
main[1]
====================================
Sending command: step
main[1]
==================================== delay after command
Step
completed: "thread=main",
nsk.jdb.eval.eval001.eval001a.runIt(), line=36 bci=53
36 }
Yes, you have a delay after the command. Yes, you are not going to
actually read the "Step completed:" reply until after it is fully
received. However, you've done nothing to introduce a delay after
the initial main[1] prompt is sent and before the "Step completed:
..." is sent. They are both sent on different threads and can still
be interspersed.
The point I'm trying to get across is I don't think the issue is
with the rate of commands being sent. I think the issue can be
isolated to a single step command that has the first prompt
appearing in the middle of the "Step completed: ..." output.
command thread: receive and execute "step" command
event thread: receive "step" breakpoint. print "Step completed: "
command thread: print "main[1]"
event thread: print text that comes after "Step completed: "
event thread: print "main[1]"
This has to be fixed in the sender of the output, not the receiver
of the output (or the sender of the command).
Chris
====================================
Sending command: eval nsk.jdb.eval.eval001.eval001a.myStaticField
main[1]
==================================== delay after command
...
I'll try some testing over night.
Chris
On 6/6/18 10:44 AM, Gary Adams wrote:
On
6/6/18, 1:26 PM, Chris Plummer wrote:
Hi Gary,
I thought the issue was having 1 command like "cont" or
"step" producing two prompts, and the first prompt ending up
in the middle of the output for the breakpoint. So instead
of:
main[1] cont
main[1]
"Breakpoint hit:" <location>
main[1]
We sometimes get:
main[1] cont
"Breakpoint hit:" main[1] <location>
main[1]
If you look at the commands sent :
stop at location1
stop at location2
cont
cont
the expectation is that breakpoint will be encountered and
then
the test will continue til another breakpoint is encountered.
The problem only occurs because the commands are sent
so quickly a race condition is introduced between the
command processing and the event handling in the debuggee
process. Since prompts are written both from the command
thread and the event handler, the output can end up
interspersed.
I don't think a delay on the receiving end like you have
added can effect this. You need to make sure the main[1]
produced for the "cont" command is in the output stream
before handling the breakpoint event. Dan's solution was a
bit different in that it actually delayed reaching the
breakpoint. I don't think your solution does the same.
My goal is to delay the sending of the next command,
because the debuggee is not actually ready for the command.
Chris
On 6/6/18 10:07 AM, Gary Adams wrote:
It may be sufficient when the
command prompt is delivered
to introduce a brief pause to allow an asynchronous event
to
be processed before another command can be sent. e.g.
throttling the stream of commands being sent
diff --git
a/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
b/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
--- a/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
+++ b/test/hotspot/jtreg/vmTestbase/nsk/share/jdb/Jdb.java
@@ -348,7 +348,17 @@
int startPos = stdoutBuffer.length();
sendCommand(command);
- return receiveReply(startPos, compoundPromptOnly,
count);
+ String[] reply = receiveReply(startPos,
compoundPromptOnly, count);
+
+ if (asyncCommand(command)) {
+ // delay to allow debuggee time to process
async command
+ try {
+ Thread.sleep(100);
+ } catch (InterruptedException e) {
+ // ignored
+ }
+ }
+ return reply;
}
On 6/6/18, 11:51 AM, Daniel D. Daugherty wrote:
The two prompts are for two
different things so I don't think
that removing one of the prompts is the right solution
either.
In the case of an asynchronous command like 'cont', you
get
the '<prompt=after-cont-command>' right away
because the command
processor is ready for another command. You will also
get the
'<prompt-after-breakpoint-event-location1>' prompt
when the
breakpoint is hit because that identifies the thread
that posted
the breakpoint event. If it takes a long time to hit the
breakpoint,
then it will be a long time before the breakpoint prompt
is seen.
Likewise, if the breakpoint is hit quickly, then the
breakpoint
prompt will be seen more quickly and will sometimes
interfere with
the command processor prompt. That's normal for an async
command.
One possible way to fix this kind of issue is to
structure the
test flow to slightly delay the target thread so it
takes longer
to get to the breakpoint.
'<prompt=after-cont-command>' will have
a better chance to be printed before (and not colliding
with) the
'<prompt-after-breakpoint-event-location1>'
prompt.
So if the test program has:
<some line of code>; // breakpoint here
change it to something like:
try { Thread.sleep(100); } catch
(InterruptedException ie) {};
<some line of code>; // breakpoint here
Of course, I'm assuming here that the test driver code
is looking
for the breakpoint to be hit (and not for
'<prompt=after-cont-command>').
Jerry and I did find a few jdb based tests looking for
the wrong prompt...
Dan
On 6/2/18 7:02 AM, gary.ad...@oracle.com
wrote:
On 6/1/18 6:54 PM, Chris Plummer
wrote:
Hi Gary,
I'm a bit unclear on one point. According to your
description, should we be seeing two prompts for
commands like "cont", and "step"? When you say we
are expecting:
"Breakpoint hit:" <location>
<prompt>
Isn't it really:
<prompt>
"Breakpoint hit:" <location>
<prompt>
Let me stretch out the log a little bit
The <prompt> is main[1] or Thread-1[1], etc.
and not distinguished for command or event completion.
<prompt>
stop at <location1>
<prompt-after-stop-command>
stop at <location2>
<prompt-after-stop-command>
cont
<prompt=after-cont-command>
cont
"Breakpoint hit:" <location1>
<prompt-after-cont-command>
"Breakpoint hit:"
<prompt-after-breakpoint-event-location1>
<location2>
<prompt-after-breakpoint-event-location2>
The source of the command or event prompts
was verified with a stacktrace in the printPrompt()
method.
I figure this must be the case, because your fix
removes the printing of a prompt, and we still need
one prompt, so there must have been two to start
with. I think in the above the last <prompt>
is being produced sequentially right after the
"Breakpoint hit:" output, so is never an issue. The
first <prompt> comes from the printPrompt()
call after issuing the "cont" command, and it's
possible to hit the breakpoint and start producing
more output before printPrompt() has completed, thus
causing the problem you are seeing.
printPrompt() is a very quick operation and is not
being interrupted.
But the output from the event handler is produced
incrementally.
- event header, e.g. "Breakpont hit:", "Step
completed:", etc.
- from vmInterrupted()
- printCurrentLocation()
- monitor command output (if any)
- printPrompt()
There seems to be some purpose to printing this
first prompt and I'm not so sure eliminating it is
the right thing to do. I think it might be there to
indicate that the command has been initiated,
because there could be some delay before the
breakpoint is hit. So the issue then is how to make
sure this prompt appears in the input stream before
any of the "Step completed:" or "Breakpoint hit:"
output. Possibly event processing should be blocked
until executeCommand() completes.
The primary purposes of prompts is for "user
interactive" nudge to type another command.
But the failures are from tests that are using the
presence of a prompt as an ACK of the
last command. This "prompt-protocol", does not include
any sequence numbers ,
does not pair prompt to command or event, and does not
block during
any multipart message formation.
To prevent the error we are observing we would have to
buffer all of the event output
"Breakpoint hit:" <location>
<monitored-commands>
<prompt for breakpoint event>
Another way to look at the current failure is to say
the event handler output
has been interrupted by a command sent prematurely.
This would never
be seen at user speeds typing in commands.
I should also mention any other output from the user
application can also
interfere with the command and event output streams. I
have seen
stray prompts in the logs from System.out.println().
That was the
reason earlier I was looking at changing the tests to
look for
specific prompts before continuing to the next
command.
\gra
thanks,
Chris
On 5/26/18 3:50 AM, gary.ad...@oracle.com
wrote:
This is a review request for
a previously closed test bug.
The test was recently moved to the open repos, and
the
proposed fix is in the open code.
Webrev: http://cr.openjdk.java.net/~gadams/8169718/webrev/
-------- Forwarded Message --------
Subject: RFR: JDK-8169718:
nsk/jdb/locals/locals002: ERROR: Cannot find
boolVar with expected value: false
Date: Fri, 25 May 2018 11:35:10 -0400
From: Gary Adams <gary.ad...@oracle.com>
Reply-To: gary.ad...@oracle.com
The jdb tests use stdin to send commands to a jdb
process
and parses the stdout to determine if a command
was
successful and when the process is prompting for
new commands
to be sent.
Some commands are synchronous, so when the command
is completed
a new prompt is sent back immediately.
Some commands are asynchronous, so there could be
a delay
until a breakpoint is reached. The event handler
then sends a prompt
when the application thread is stopped and new jdb
commands can be sent.
The problem causing the intermittent failures was
a corruption in the
output stream when prompts were being sent at the
wrong times.
Instead of receiving
"Breakpoint hit:" <location>
<prompt>
the log contained
"Breakpoint hit:" <prompt>
<location>
Once out of sync, jdb commands were being sent
prematurely
and the wrong values were being compared against
expected behavior.
The simple fix proposed here recognizes that
commands like "cont",
"step" and "next" are asynchronous commands and
should not send back
a prompt immediately. Instead. the event handler
will deliver the next prompt
when the next "Breakpoint hit:" or "Step
completed:" state change occurs.
The bulk of the testing was done on
windows-x64-debug builds where the
intermittent failures were observed in ~5 in 1000
testruns. The fix has
also been tested on linux-x64-debug,
solaris-sparcv9-debug,
and macosx-x64-debug, even though the failures
have never been reported
against those platforms.
Failures have been observed in many of the nsk/jdb
tests with similar corrupted
output streams, but never directly associated with
this issue before.
redefine001, caught_exception002, locals002,
eval001, next001,
stop_at003, step002, print002, trace001,
step_up001, read001,
clear004, kill001, set001
|