Hi,
I managed to get the test-app working with CMPFlex. When I try
running './interactive -ma CMPFlex.OoO', however, it seems to work, but
it takes a long time. Running it with CMPFlex takes about 5-6 minutes,
but I have yet to see how long CMPFlex.OoO takes to run. I started
running it last Thursday or Friday, and I believe it ran until the
server crashed this morning. I don't know if the server crash was
related to running CMPFlex.OoO, but I'm pretty sure it didn't finish
running before it crashed.
The simulation appears to be working with CMPFlex.OoO: messages keep
appearing on the Simics terminal, and none of them appear to be errors.
The only suspicious messages are:
164 <CacheControllerImpl.cpp:651> {295471}- Upgrade reply does not
contain data
and I remember seeing an earlier post saying there's nothing wrong with
these messages.
Do you have any suggestions for what might be going wrong with the
simulation? Is it possible that it's just running really, really slowly
and it will eventually finish?
Thanks,
Jason
From twenisch at ece.cmu.edu Mon Feb 27 14:07:11 2006
From: twenisch at ece.cmu.edu (Thomas Wenisch)
List-Post: [email protected]
Date: Mon Feb 27 14:06:55 2006
Subject: [Simflex] SimFlex help
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <pine.lnx.4.53l-ece.cmu.edu.0602241410130.9...@dalmore.ece.cmu.edu>
Hi Shadi,
The SimFlex cache modules are designed to support an arbitrary number of
cache levels. The set of components in a simulation and the way they are
connected is described in the simulator wiring file, in this case
simulators/CMPFlex/wiring.cpp. To add a third cache level, you need to
add additional statements to each section of this file.
I suggest you examine this file and simulators/UniFlex/wiring.cpp.
Roughly speaking, you will want to duplicate all of the wiring description
for the L2 Cache in UniFlex, and insert it into CMPFlex, connecting it
beneath the shared L2.
Take a look at these files and the SimFlex tutorial slides to get started,
and then contact the mailing list again if you need more assistance.
Regards,
-Tom Wenisch
Computer Architecture Lab
Carnegie Mellon University
On Fri, 24 Feb 2006, Shadi Harb wrote:
> Hi all,
>
> I am wondering if somebody can help me out with this. I need to
> modify SimFlex in some way to implement a third level of cache (L3)
> in a CMP architecture. I would be grateful for any hints or
> suggestions.
>
> Thanks in advance
> Shadi
>
>
> _______________________________________________
> SimFlex mailing list
> [email protected]
> https://sos.ece.cmu.edu/mailman/listinfo/simflex
> SimFlex web page: http://www.ece.cmu.edu/~simflex
>
From twenisch at ece.cmu.edu Mon Feb 27 17:08:41 2006
From: twenisch at ece.cmu.edu (Thomas Wenisch)
List-Post: [email protected]
Date: Mon Feb 27 17:08:18 2006
Subject: [Simflex] Problem running test-app with CMPFlex.OoO
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <pine.lnx.4.53l-ece.cmu.edu.0602271408020.13...@dalmore.ece.cmu.edu>
Hi Jason,
On Mon, 27 Feb 2006, Jason Zebchuk wrote:
> Hi,
>
> I managed to get the test-app working with CMPFlex. When I try
> running './interactive -ma CMPFlex.OoO', however, it seems to work, but
> it takes a long time. Running it with CMPFlex takes about 5-6 minutes,
> but I have yet to see how long CMPFlex.OoO takes to run. I started
> running it last Thursday or Friday, and I believe it ran until the
> server crashed this morning. I don't know if the server crash was
> related to running CMPFlex.OoO, but I'm pretty sure it didn't finish
> running before it crashed.
>
> The simulation appears to be working with CMPFlex.OoO: messages keep
> appearing on the Simics terminal, and none of them appear to be errors.
> The only suspicious messages are:
>
> 164 <CacheControllerImpl.cpp:651> {295471}- Upgrade reply does not
> contain data
Yes, these messages are not serious.
>
> and I remember seeing an earlier post saying there's nothing wrong with
> these messages.
>
> Do you have any suggestions for what might be going wrong with the
> simulation? Is it possible that it's just running really, really slowly
> and it will eventually finish?
It sounds like it is running correctly. OoO simulation is slow; we
typically see a simulation rate in the kIPS range. The one thing you
might want to do is run for a while, and then stop the simulation and
inspect the code (using the Simics disassemble command) that is running to
make sure that it is user code that looks reasonable, and not something
that looks like the tight loop in the OS kernel-panic handler. If
something is amiss with Flexus, the starting checkpoints, etc., the
simulated Solaris will eventually panic.
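To put a kIPS-range simulation rate in perspective, here is a rough
back-of-envelope sketch in Python. The 1-billion-instruction workload, the
5 kIPS OoO rate, and the 10 MIPS fast-mode rate are illustrative assumptions,
not measurements of the test-app.

```python
def wall_clock_hours(instructions, instructions_per_second):
    """Hours needed to simulate `instructions` at a fixed simulation rate."""
    return instructions / instructions_per_second / 3600.0


# Hypothetical workload size; not a measurement of the test-app.
WORKLOAD_INSTRUCTIONS = 1_000_000_000

ooo_hours = wall_clock_hours(WORKLOAD_INSTRUCTIONS, 5e3)    # assumed 5 kIPS
fast_hours = wall_clock_hours(WORKLOAD_INSTRUCTIONS, 10e6)  # assumed 10 MIPS
print(f"OoO: {ooo_hours:.1f} h, fast mode: {fast_hours:.3f} h")
```

Under these assumptions the OoO run takes a bit over two days, while the
same instruction count finishes in under two minutes in a fast functional
mode. A multi-day run is therefore plausible even when nothing is wrong.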
Simulating more than 1M cycles is painful. To be able to get any
meaningful measurement results for anything but microbenchmarks, you will
have to employ some form of sampling as we describe in the tutorial.
What we typically do is use TraceFlex (or TraceCMPFlex) to rapidly create
cache + Simics state checkpoints spread over the application. We then
measure a small, fixed number of cycles from each starting point. I
suggest you take a look at the code for the MagicBreak component to see
how you can configure Flexus to stop / save checkpoints at certain points
in the simulation.
Regards,
-Tom Wenisch
Computer Architecture Lab
Carnegie Mellon University
>
>
> Thanks,
>
> Jason
From liqun.cheng at gmail.com Mon Feb 27 12:34:42 2006
From: liqun.cheng at gmail.com (Liqun Cheng)
List-Post: [email protected]
Date: Mon Feb 27 17:10:06 2006
Subject: [Simflex] question on representative queries in TPC-H
Message-ID: <[email protected]>
Hi,
My name is Liqun Cheng, a graduate student at the Univ. of Utah. I am trying to
set up Simics to run TPC-H on MySQL. The TPC-H implementation from tpc.org
provides 22 queries, of which I can finish 9 successfully. However, running
all these queries with a detailed timing model is almost infeasible. Could you
share some experience on which queries are considered to be representative?
And what techniques are appropriate to warm up the cache?
Thanks a lot!
Legion
From twenisch at ece.cmu.edu Mon Feb 27 17:33:33 2006
From: twenisch at ece.cmu.edu (Thomas Wenisch)
List-Post: [email protected]
Date: Mon Feb 27 17:33:18 2006
Subject: [Simflex] question on representative queries in TPC-H
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <pine.lnx.4.53l-ece.cmu.edu.0602271710480.13...@dalmore.ece.cmu.edu>
Hi Liqun,
On Mon, 27 Feb 2006, Liqun Cheng wrote:
> Hi,
>
> My name is Liqun Cheng, a graduate student in Univ. of Utah. I am trying to
> set up simics to run TPC-H on mysql. The TPC-H implementation from
> tpc.org provides 22 queries, in which I can finish 9 successfully.
> However, running all these queries with detailed timing model is almost
> infeasible. Could you guys share some experience on which queries are
> considered to be representative.
I suggest you take a look at this paper:
http://www.ece.cmu.edu/~simflex/publ/cascon2005.pdf "DBmbench: Fast and
Accurate Database Workload Representation on Modern Microarchitecture".
One of the contributions of that work is a categorization of TPC-H queries
into scan-bound, join-bound, and balanced behavior. Queries within the
same category have similar microarchitectural and cache behavior on real
hardware. We use this categorization in our research to choose a set of
queries for study. As an example, Query 1 is scan-bound, Query 2 is
join-bound (and extremely short), and Query 17 is balanced. While these
three queries don't reveal every behavior of TPC-H, they are more
representative than a selection at random.
> And what techniques are appropriate to warm up the cache?
>
We have some guidance on this in the SimFlex tutorial; I suggest you
review the slides (and attend at ISCA if you can ;)
We use systematic sampling to rapidly measure TPC-H queries. The key idea
is to create checkpoints which contain accurate cache state using a
simpler simulation model, and then launch your OoO simulations with these
warmed caches. Thus, you avoid the expense of OoO simulation while
warming the cache. This methodology is based on the SMARTS and Live-points
techniques we have developed for SPEC 2K; those papers are available on
the SimFlex publications page.
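As a minimal sketch of how per-sample measurements can be combined, the
Python below computes a sample mean and an approximate 95% confidence
interval. It uses the standard normal-approximation formula; the function
name and the example CPI values are hypothetical, and this is not the exact
procedure from the SMARTS paper.

```python
import math
import statistics


def sample_estimate(measurements, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval.

    `measurements` holds one value (for example, CPI) per measured sample.
    z=1.96 is the normal-approximation multiplier for 95% confidence.
    """
    n = len(measurements)
    if n < 2:
        raise ValueError("need at least two samples to estimate variance")
    mean = statistics.fmean(measurements)
    stdev = statistics.stdev(measurements)  # sample standard deviation
    half_width = z * stdev / math.sqrt(n)
    return mean, half_width


# Hypothetical per-checkpoint CPI values, for illustration only.
cpi_samples = [1.0, 2.0, 3.0]
mean, half_width = sample_estimate(cpi_samples)
print(f"CPI = {mean:.2f} +/- {half_width:.2f}")
```

The interval narrows roughly with the square root of the number of samples,
which is why many short measurements can stand in for one very long run.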
Here are some details that are specific to TPC-H. We prepare checkpoints
for detailed timing measurements using the following multi-tier procedure:
1) Spread Simics checkpoints evenly over the query execution.
We run Simics in fast mode (~10-25 MIPS) to create a set of 10-30
starting points spread across execution. These checkpoints contain only
architectural state - no caches. We do this step so that we can
parallelize step 2 over many machines (step 1 is necessarily serial).
2) Create accurate cache checkpoints between Simics checkpoints
We use the TraceFlex simulation mode in Flexus (~1.5 MIPS) to create
a chain of accurate cache images between each of the Simics checkpoints.
TraceFlex is a high-speed cache simulator, but does not maintain any
notion of time. The cache checkpoints are widely spread, typically 100M
or more cycles apart, and we chain the cache image from one simulation to
the next, so the caches are perfectly warm. A typical sample size might
be around 200 checkpoints.
3) Launch OoO timing jobs from each cache checkpoint
We run a short OoO simulation from each checkpoint (~150k cycles). We
use the beginning of each simulation to warm pipeline/queue/interconnect
state, and then use the final portion of each simulation to collect
statistics. More details on this appear in the tutorial.
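The three steps above can be sketched as a small planning calculation in
Python. The simulation rates are taken from the figures quoted above (using
the low end of the Simics range); the query length of 3 billion instructions
and the function and key names are hypothetical, chosen only for
illustration.

```python
SIMICS_FAST_IPS = 10e6  # low end of the ~10-25 MIPS quoted for Simics fast mode
TRACEFLEX_IPS = 1.5e6   # ~1.5 MIPS quoted for TraceFlex


def plan(total_instructions, simics_checkpoints, cache_checkpoints,
         ooo_cycles_per_checkpoint):
    """Rough cost estimate for the three-step checkpointing procedure."""
    # Step 1: evenly spaced starting points over the whole query.
    segment = total_instructions // simics_checkpoints
    starts = [i * segment for i in range(simics_checkpoints)]
    return {
        "simics_starts": starts,
        # Step 1 is serial, so it covers the full query length.
        "step1_serial_seconds": total_instructions / SIMICS_FAST_IPS,
        # Step 2 runs one segment per machine, in parallel.
        "step2_seconds_per_machine": segment / TRACEFLEX_IPS,
        # Step 3: total detailed cycles simulated across all checkpoints.
        "step3_total_ooo_cycles": cache_checkpoints * ooo_cycles_per_checkpoint,
    }


# Hypothetical query length; checkpoint counts follow the ranges given above.
estimate = plan(3_000_000_000, 30, 200, 150_000)
print(estimate["step1_serial_seconds"], estimate["step3_total_ooo_cycles"])
```

With these inputs the detailed OoO work totals 30M cycles, split into 200
independent jobs of 150k cycles each, rather than one long serial run.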
--
I hope the above helps. Let me know if you have any specific questions.
Regards,
-Tom Wenisch
Computer Architecture Lab
Carnegie Mellon University