Hi Evangelos,

There are a couple of reasons, but mostly we want to see whether we can shorten the time it takes to explore ideas by using long-running timing simulations instead of the sampling methodology. At the moment, we tend to spend a lot of time in functional simulation trying to see whether something has potential; then, if we want to measure the performance impact, we have to generate flexpoints and run timing simulation. We've consistently been frustrated by the need to develop a model in the functional simulator and then port the same model to the timing simulator. In addition, the time required to generate the flexpoints becomes a bit of a bottleneck, especially for the new CloudSuite workloads.

So we've been thinking of using the in-order core, in the hope that long-running timing simulations would run fast enough for early exploration of the performance potential of different ideas. The thought here is that the InOrder simulator would be significantly faster than just putting the OoO simulator into in-order mode. Do you have a rough estimate of the speedup you see between in-order and out-of-order modes when using the OoO simulator?


Thanks,

Jason



On 2013-03-29 9:26 AM, Evangelos Vlachos wrote:
Hi Jason,

Is there a reason why you want to use the InOrder simulator? It has been discontinued since (at least) the last release. Even when I started using Flexus (6-7 years ago), the older students suggested that I use the OoO simulator and configure it to model an in-order core, simply because the OoO codebase was getting more attention. I believe we have been doing that ever since.

Regards,
Evangelos

On Mar 29, 2013, at 1:49 PM, Jason Zebchuk wrote:

We're using timing with an inorder core (InorderSimicsFeeder, Execute, IFetch, and BPWarm instead of uArch, FetchAddressGenerate, and uFetch, etc.).

In the first case, we set it to stop after the first cycle and it actually ran for about 165 cycles or so until the first instruction for each core completed. We're simulating 16 cores with a scientific benchmark and most of the cores tried to fetch the same instruction on the first cycle resulting in a lot of queuing. I tracked the behavior in this case and it issued 1 instruction for each core and completed just after every instruction would have finished.

In the second case, it was set to terminate after 15k cycles. Looking at timestamps, that took a couple of minutes. The next 5k cycles took about 2 hours and it still hadn't stopped executing. Because it's so slow, I haven't tried to track down whether there are any memory requests that are delayed this long in the hierarchy or whether there's some other reason why it's still executing. From my experience, it's pretty rare for a memory request to take that long, especially considering that the in-order core should cause less contention than an out-of-order core.

We did some debugging with gdb, and it's definitely saving the statistics every cycle, which creates a huge slowdown.

It looks like it's getting stuck in the loop in nInorderSimicsFeeder::SimicsCycleManager::advanceCycles() in components/InorderSimicsFeeder/CycleManager.hpp. I would expect that trying to terminate the simulation should cause it to break out of this loop, but it looks like that's not happening.
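
To illustrate the behavior I would expect, here is a minimal sketch of a cycle loop that exits as soon as termination is requested and saves the statistics only once. This is not the Flexus code: SimState, requestTerminate, and this advanceCycles are hypothetical stand-ins, and the "save" is just a counter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for simulator state; none of these names are Flexus APIs.
struct SimState {
  std::uint64_t cycle = 0;       // current simulated cycle
  std::uint64_t stop_cycle = 0;  // plays the role of the stop_cycle setting
  bool terminating = false;      // set once termination has been requested
  int stats_saves = 0;           // counts how often the stats would be written
};

// Request termination. Repeated calls are no-ops, so the (stand-in) stats
// save happens a single time instead of once per cycle.
inline void requestTerminate(SimState& s) {
  if (s.terminating) return;
  s.terminating = true;
  ++s.stats_saves;  // stands in for writing the final stats database
}

// Advance up to `count` cycles, leaving the loop as soon as termination
// has been requested. Returns the number of cycles actually advanced.
inline std::uint64_t advanceCycles(SimState& s, std::uint64_t count) {
  std::uint64_t advanced = 0;
  while (advanced < count && !s.terminating) {
    ++s.cycle;
    ++advanced;
    if (s.cycle >= s.stop_cycle) requestTerminate(s);
  }
  assert(s.stats_saves <= 1);  // invariant: stats are never saved twice
  return advanced;
}
```

With stop_cycle set to 1, a call such as advanceCycles(s, 100) on this model advances exactly one cycle and records a single save, which is the behavior we were hoping to see from the real loop.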


Jason



On 2013-03-29 1:10 AM, Mahmood Naderan wrote:

Hi

>It tried to terminate after the first cycle, but it looks like it kept executing for several cycles afterwards. It kept printing out the following messages:

What is the end cycle? 1000?


>In one case, it executed 15k cycles very quickly, and then took a couple of hours executing another 5k cycles and it still hadn't stopped the simulation

Are you sure this behavior is the result of saving stats every cycle?

Are you using trace? Timing?

--
Regards,
Mahmood



------------------------------------------------------------------------
*From:* Jason Zebchuk <[email protected]>
*To:* "[email protected]" <[email protected]>
*Sent:* Friday, March 29, 2013 5:11 AM
*Subject:* Inorder simulation not stopping gracefully

Hi guys,

We tried running a simulation using the inorder core instead of the 
out-of-order core, and we ran into a little problem.

We did:

flexus.set "-magic-break:stop_cycle" "1"

to stop after a single cycle. It tried to terminate after the first cycle, but 
it looks like it kept executing for several cycles afterwards. It kept printing 
out the following messages:

<breakpoint_tracker.cpp:447> {1}- Reached target cycle. Ending simulation.
<flexus.cpp:717> {1}- Terminating simulation. Timestamp: 2013-Mar-28 20:02:51
<flexus.cpp:718> {1}- Saving final stats_db.

This was repeated over and over (with the cycle number incrementing by one each 
time) until the simulation eventually stopped.

It looks like it's waiting for outstanding memory requests to terminate before 
exiting the simulation. Is this the normal behavior with the in-order core?

The real problem is that each cycle it tries to save the statistics.  When we 
try running longer simulations, the statistics get rather large so it advances 
very slowly. We also saw cases where it would continue running for several 
hours after it should have terminated. In one case, it executed 15k cycles very 
quickly, and then took a couple of hours executing another 5k cycles and it 
still hadn't stopped the simulation.  I'm not sure if this is an issue with the 
memory hierarchy taking a long time to complete all of the outstanding 
requests, or if there's some other bug in this case.
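
For reference, the shutdown behavior we had in mind looks roughly like the sketch below: once the stop cycle is reached, keep cycling only until outstanding requests complete or a drain limit is hit, then save the statistics once. Everything here is an assumption for illustration (DrainResult, drainAndStop, and max_drain_cycles are hypothetical names, and the memory system is reduced to a list of completion cycles); it is not how Flexus is implemented.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical result of a bounded drain; not a Flexus type.
struct DrainResult {
  std::uint64_t final_cycle;  // cycle at which the simulation stopped
  bool drained;               // true if every request completed in time
  int stats_saves;            // how many times stats would have been written
};

// Each entry of `completion_cycles` is the cycle at which one outstanding
// memory request finishes. Starting at `stop_cycle`, advance until all
// requests are done or `max_drain_cycles` extra cycles have elapsed, then
// perform a single (stand-in) stats save.
inline DrainResult drainAndStop(const std::vector<std::uint64_t>& completion_cycles,
                                std::uint64_t stop_cycle,
                                std::uint64_t max_drain_cycles) {
  const std::uint64_t deadline = stop_cycle + max_drain_cycles;

  // Number of requests still in flight at cycle `now`.
  auto outstanding = [&completion_cycles](std::uint64_t now) {
    return std::count_if(completion_cycles.begin(), completion_cycles.end(),
                         [now](std::uint64_t done) { return done > now; });
  };

  std::uint64_t cycle = stop_cycle;
  while (outstanding(cycle) > 0 && cycle < deadline) {
    ++cycle;  // no stats are written while draining
  }

  DrainResult result{cycle, outstanding(cycle) == 0, 0};
  ++result.stats_saves;  // the only save, after the drain ends
  return result;
}
```

The drain limit is a design choice in this sketch: it guarantees the run ends even if a request never completes, at the cost of possibly stopping with requests still in flight.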

Any thoughts you might have would be useful.


Thanks,

Jason


