Hi Evangelos,
There are a couple of reasons, but mostly it's because we want to see if
we can improve the time it takes to explore ideas by using long-running
timing simulations instead of the sampling methodology. At the moment,
we tend to spend a lot of time working in functional simulation trying
to see if something has potential, then if we want to measure the
performance impact, we have to generate flexpoints and run timing
simulation. We've consistently been frustrated by the need to develop
models in the functional simulator and then port the same model to the
timing simulator. In addition, the time required to generate the
flexpoints also becomes a bit of a bottleneck, especially for the new
cloudsuite workloads.
So we've been thinking of using the in-order core so that long-running
timing simulations would hopefully run fast enough that we could use
them for early exploration of the performance potential of different
ideas. The thought here is that the older InOrder simulator would be
significantly faster than just putting the OoO simulator into in-order
mode. Do you have a rough estimate of the kind of speedup you experience
between in-order and out-of-order using the OoO simulator?
Thanks,
Jason
On 2013-03-29 9:26 AM, Evangelos Vlachos wrote:
Hi Jason,
Is there a reason why you want to use the InOrder simulator? It has
been discontinued since at least the last release. Even when I
started using Flexus (6-7 years ago) the older students
suggested that I use the OoO simulator and configure it to model an
InOrder core, just because the OoO codebase was getting more
attention. I believe we have been doing that ever since.
Regards,
Evangelos
On Mar 29, 2013, at 1:49 PM, Jason Zebchuk wrote:
We're using timing with an inorder core (InorderSimicsFeeder,
Execute, IFetch, and BPWarm instead of uArch, FetchAddressGenerate,
and uFetch, etc.).
In the first case, we set it to stop after the first cycle and it
actually ran for about 165 cycles or so until the first instruction
for each core completed. We're simulating 16 cores with a scientific
benchmark and most of the cores tried to fetch the same instruction
on the first cycle resulting in a lot of queuing. I tracked the
behavior in this case and it issued 1 instruction for each core and
completed just after every instruction would have finished.
In the second case, it was set to terminate after 15k cycles.
Looking at timestamps, that took a couple of minutes. The next 5k
cycles took about 2 hours and it still hadn't stopped executing.
Because it's so slow, I haven't tried to track down whether there are
any memory requests that are delayed this long in the hierarchy or
whether there's some other reason why it's still executing. From my
experience, it's pretty rare for a memory request to take that long,
especially considering that the in-order core should cause less
contention than an out-of-order core.
We did some debugging with gdb and it's definitely saving the
statistics every cycle, which creates a huge slowdown.
It looks like it's getting stuck in the loop in
nInorderSimicsFeeder::SimicsCycleManager::advanceCycles() in
components/InorderSimicsFeeder/CycleManager.hpp. I would expect that
trying to terminate the simulation should cause it to break out of
this loop, but it looks like that's not happening.
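As a rough illustration of the behavior expected here, the sketch below
shows a cycle loop that re-checks a termination flag on every iteration.
It is not the actual Flexus code: the class CycleLoopSketch, its members,
and the stop_cycle parameter are all hypothetical names made up for this
example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a cycle manager. This is NOT the Flexus
// SimicsCycleManager; every name here is invented for illustration.
class CycleLoopSketch {
public:
  // Called by whatever decides the run is over (assumed: a stop-cycle check).
  void requestTermination() { terminate_requested_ = true; }

  // Advance up to `count` cycles, but leave the loop as soon as
  // termination has been requested. Returns the cycles actually advanced.
  std::uint64_t advanceCycles(std::uint64_t count, std::uint64_t stop_cycle) {
    std::uint64_t advanced = 0;
    while (advanced < count && !terminate_requested_) {
      ++cycle_;
      ++advanced;
      if (cycle_ >= stop_cycle) {
        requestTermination();  // the loop condition sees this next iteration
      }
    }
    assert(advanced <= count);
    return advanced;
  }

  std::uint64_t cycle() const { return cycle_; }

private:
  std::uint64_t cycle_ = 0;
  bool terminate_requested_ = false;
};
```

With a stop cycle of 1, a call such as advanceCycles(100, 1) on a fresh
object would return after a single cycle instead of running all 100.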
Jason
On 2013-03-29 1:10 AM, Mahmood Naderan wrote:
Hi
> It tried to terminate after the first cycle, but it looks like it
> kept executing for several cycles afterwards. It kept printing out
> the following messages:
What is the end cycle? 1000?
> In one case, it executed 15k cycles very quickly, and then took a
> couple of hours executing another 5k cycles and it still hadn't
> stopped the simulation
Are you sure this behavior is the result of saving stats every cycle?
Are you using trace? Timing?
--
Regards,
Mahmood
------------------------------------------------------------------------
*From:* Jason Zebchuk <[email protected]>
*To:* "[email protected]" <[email protected]>
*Sent:* Friday, March 29, 2013 5:11 AM
*Subject:* Inorder simulation not stopping gracefully
Hi guys,
We tried running a simulation using the inorder core instead of the
out-of-order core, and we ran into a little problem.
We did:
flexus.set "-magic-break:stop_cycle" "1"
to stop after a single cycle. It tried to terminate after the first cycle, but
it looks like it kept executing for several cycles afterwards. It kept printing
out the following messages:
<breakpoint_tracker.cpp:447> {1}- Reached target cycle. Ending simulation.
<flexus.cpp:717> {1}- Terminating simulation. Timestamp: 2013-Mar-28 20:02:51
<flexus.cpp:718> {1}- Saving final stats_db.
This was repeated over and over (with the cycle number incrementing by one each
time) until the simulation eventually stopped.
It looks like it's waiting for outstanding memory requests to terminate before
exiting the simulation. Is this the normal behavior with the in-order core?
The real problem is that each cycle it tries to save the statistics. When we
try running longer simulations, the statistics get rather large so it advances
very slowly. We also saw cases where it would continue running for several
hours after it should have terminated. In one case, it executed 15k cycles very
quickly, and then took a couple of hours executing another 5k cycles and it
still hadn't stopped the simulation. I'm not sure if this is an issue with the
memory hierarchy taking a long time to complete all of the outstanding
requests, or if there's some other bug in this case.
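One way to avoid the repeated save, sketched under assumptions: make the
termination routine idempotent, so that calling it on every cycle while
requests drain only writes the statistics once. The class
TerminatorSketch and its members are hypothetical and not taken from
flexus.cpp; saveStats() is a placeholder for the expensive stats_db
write.

```cpp
#include <cassert>

// Hypothetical terminator; names are illustrative, not Flexus APIs.
class TerminatorSketch {
public:
  // Safe to call every cycle: only the first call saves the statistics.
  void terminate() {
    if (terminated_) {
      return;  // already terminating, skip the expensive save
    }
    terminated_ = true;
    saveStats();
  }

  bool terminated() const { return terminated_; }
  int saveCount() const { return save_count_; }

private:
  // Placeholder for the real (assumed expensive) statistics write.
  void saveStats() { ++save_count_; }

  bool terminated_ = false;
  int save_count_ = 0;
};
```

The trade-off in this sketch is that statistics are captured when
termination is first requested, so any cycles spent draining outstanding
requests afterwards would not be reflected in the saved file.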
Any thoughts you might have would be useful.
Thanks,
Jason