Hello Yorgo,
Your drawing looks OK. However, the instruction fetch component does not
forward anything to the L1 cache;
it actually fetches instructions from it.
The most important thing you are missing in this picture is the NUCA cache
design. The way we achieve non-uniform cache
access time in the L2 is by using a tiled architecture. This means that every
core has a slice of the L2 cache
and a slice of the directory associated with it. So, every tile contains a
core with its private caches
and a slice of the shared L2 cache; the L2 cache is thus distributed across
the tiles. Blocks
in the L2 cache are address-interleaved, which means that the low-order bits
of a block's address determine
its tile. Tiles are connected through an on-chip interconnection
network
(the Network component you are referring to), which you can configure using
an external file
(e.g., the 16x3-mesh.topology file). If this confuses you, please send an
e-mail directly to me.
When we talk about the memory, we don't actually simulate DRAM there. We
simulate the DRAM controller and,
along with it, DRAM latencies. The responsible component is
MemoryLoopback. We have plans to
integrate a DRAM simulator to model those low-level details more accurately.
Regarding RTDirectory, it is an implementation of the Region Tracker. It is
part of a specific
directory implementation which we don't typically use (we use the standard
directory). There is also an
implementation of the Tagless directory scheme. If you want to know more
about these
(they are not related to NUCA designs), you can look up the corresponding
publications.
theL2Cfg.Cores.initialize(64) - this command tells the L2 cache how many L1
caches are out there.
If you have 16 cores, it should be 32 (instruction + data caches). Whatever
you write for this
parameter will most likely be overridden in the config file
(user-postload.simics).
In this file, you should provide your final configuration, which you can
change at runtime. You can also specify
various replacement policies for your caches and the coherence protocols you
wish to use.
Regards,
Djordje
________________________________________
From: el06041 [[email protected]]
Sent: Sunday, March 27, 2011 5:58 PM
To: Simflex
Subject: RE: NUCA Cache implementation in SimFlex 4
Hello and thanks for your response.
Following your guidelines, I have spent the past few days examining the
source code of both the simulators and the individual components,
so as to get more in-depth information about the implementations.
In particular, I have been examining the CMP.L2SharedNUCA.Inorder model
that comes with flexus-4.0. For the rest of this e-mail, I will be
referring to this model.
---------------------------------------------
First of all, I have drawn a layout of the architecture, trying to
understand the interconnections between the different components. It
is based on the wiring.cpp file of the simulator and can be found here:
https://pithos.grnet.gr/pithos/rest/[email protected]/files/SimFlex/CMP.L2SharedNUCA.Inorder-layout.pdf
If I am correct:
* The Feeder provides the instructions
* The Fetcher fetches the instructions and forwards them to L1
Instruction Cache and to BPWarm
* The BPWarm component must be the Branch Predictor
* The Execute component - very obviously - executes the instructions
and requests data from L1 Data Cache.
* L1 Instruction / Data cache components: obvious
* L2 cache: is the component I have to configure in order to implement
a shared NUCA cache (correct?)
* NetMapper: is the splitter that distributes requests among the
components
* Memory: must be the memory below L2 (i.e. RAM)
Have I understood the architecture correctly?
Moreover, what is the purpose of the "NIC" and "Network" components?
As I have seen:
* The Network component is an instance of a "NetShim/MemoryNetwork"
component, though it is not obvious how it relates to the
L2 cache.
* Concerning the NIC, I guess it must be a Network Interface
Controller. I have taken a look at the MultiNic component folder, where
I've seen that it has multiple implementations: MultiNic1, MultiNic2,
MultiNic3, MultiNic4, and a general MultiNicX, which presumably holds the
generic implementation.
I have seen that the various implementations define
different values for FLEXUS_MULTI_NIC_NUMPORTS: does this relate to
the number of components the NIC is connected to?
---------------------------------------------
On the L2 cache:
I have seen a sample configuration in wiring.cpp of
L2SharedNUCA.Inorder, as well as
flexus-4.0/components/CMPCache/CMPCache.hpp.
In wiring.cpp, the parameter theL2Cfg.Cores.initialize(64) initializes
64 cores. What are these cores and how are they related to the CPU cores
or the 64 banks of the cache, which are initialized at
theL2Cfg.Banks.initialize(64)?
What is more, I am trying to figure out where the following are
defined:
* The mapping between CPU cores and L2 banks, that is, to which L2 bank
each CPU core is mapped.
* Replacement/Migration policies. I have only noticed that the
coherence policy is in
flexus-4.0/components/CMPCache/NonInclusiveMESIPolicy.cpp, if I am
correct.
Finally, I have found in flexus-4.0/components/CMPCache/RTDirectory.hpp
the following scheme:
Physical address layout:
+---------+--------------+------+-------------+--------------+-------------+
|   Tag   | R Index High | Bank | R Index Low | RegionOffset | BlockOffset |
+---------+--------------+------+-------------+--------------+-------------+
                                              |<------ setLowShift ------->|
                                |<----------->|
                                  setMaskLow
I have not understood the purpose of the R Index High/Low fields.
I presume, by the way, that due to the presence of the "Bank" field, the
placement policy of data in the corresponding NUCA banks must be
static,
i.e. every block will *initially* always be placed in the same bank,
according to its address.
---------------------------------------------
On the Network component:
As I have seen in the wiring.cpp file, the parameter
"theNetworkCfg.NetworkTopologyFile.initialize()" selects the topology
file that will be used for the network.
An example file is 16x3-mesh.topology (it can initially be found in the
L2SharedNUCA.OoO folder), which defines a 4x4 grid of "switches", where
each switch has 4 ports to interconnect with other switches and 3 ports
that connect the switch to nodes (so (4x4)x3 = 48 nodes in total).
I have understood the topology and the routing tables that are being
defined, but I have not understood how these nodes and switches are
related to the L2-NUCA cache, if there is any relationship at all.
---------------------------------------------
Thank you in advance for your help.
I will be glad to provide any additional information you might need.
-George
On Wed, 23 Mar 2011 16:47:14 +0000, Djordje Jevdjic wrote:
> Hello,
>
> Thanks for your message.
>
> Concerning your first question: yes, all the messages exchanged
> through this list are in one of those archives. For technical reasons
> we decided to split them into two separate archives (the old and the
> new archive).
>
> Regarding your second question: I don't think you need to implement
> anything to have a NUCA simulator. NUCA systems have
> already been implemented (actually, almost all simulators we use in
> Flexus are NUCA simulators).
> The ones you listed below
> (flexus-4.0/simulators/CMP.L2SharedNUCA.Inorder and
> flexus-4.0/simulators/CMP.L2SharedNUCA.OoO)
> are examples of such architectures with a shared and tiled L2 cache.
> So, these things have already been implemented; there is no need to
> reimplement them.
>
> However, if you are interested in more details of the
> implementation, you can look at the source code and find some useful
> comments there.
> If you are examining the source code, it's a good idea to look at the
> code of individual components included in the simulator, not the
> simulator directory itself.
> You might also want to check the getting started guide on our
> website. Besides that and the Simflex publications, we don't maintain
> any further documentation.
>
> Also, keep in mind that the current version of Flexus works only with
> Simics 3. Whatever you try to do with Simics 4 will most likely not
> work.
> We are planning to move to Simics 4 soon.
>
> Regards,
> Djordje