Hello Yorgo,
  
Your drawing looks OK. However, the instruction fetch component does not 
forward anything to the L1 cache; it actually fetches instructions from it.

The most important thing missing from your picture is the NUCA cache 
design. We achieve non-uniform access time in the L2 by using a tiled 
architecture. This means that every core has a slice of the L2 cache and 
a slice of the directory associated with it. So, every tile contains a 
core with its private caches, plus a slice of the shared L2 cache; the L2 
cache is distributed across the tiles. Blocks in the L2 cache are 
address-interleaved, which means that the low-order bits of a block's 
address determine its tile. Tiles are connected through an on-chip 
interconnection network (the Network component you are referring to), 
which you can configure using an external file (e.g. the 
16x3-mesh.topology file). If this is confusing, please send an e-mail 
directly to me.

When we talk about the memory, we don't actually simulate DRAM there. We 
simulate the DRAM controller and, along with it, DRAM latencies. The 
responsible component is MemoryLoopback. We have plans to integrate a 
DRAM simulator to model those low-level details more accurately.

Regarding RTDirectory: it is an implementation of Region Tracker. It is 
part of a specific implementation that we don't typically use (we use the 
standard directory). There is also an implementation of the Tagless 
directory scheme. If you want to know more about these (they are not 
related to the NUCA design), you can look up the corresponding 
publications.

                                              
theL2Cfg.Cores.initialize(64) - this command tells the L2 cache how many 
L1 caches are out there. If you have 16 cores, it should be 32 
(instruction + data caches). Whatever you write for this parameter will 
most likely be overridden in the config file (user-postload.simics). In 
that file you should provide your final configuration, which you can 
change at runtime. You can also specify various replacement policies for 
your caches and the coherence protocols you wish to use.
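Concretely, for a 16-core machine the wiring might contain something like the following sketch in the wiring.cpp style (parameter names are the ones discussed above; the values are only an example and would normally be overridden in user-postload.simics):

```cpp
// Sketch of the relevant wiring.cpp parameters for a 16-core tiled CMP.
// Cores counts L1 caches (one I-cache + one D-cache per core): 16 cores -> 32.
theL2Cfg.Cores.initialize(32);   // number of L1 caches above the L2
theL2Cfg.Banks.initialize(16);   // one L2 slice (bank) per tile
// Note: user-postload.simics will most likely override these values.
```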

Regards,
Djordje

________________________________________
From: el06041 [[email protected]]
Sent: Sunday, March 27, 2011 5:58 PM
To: Simflex
Subject: RE: NUCA Cache implementation in SimFlex 4

 Hello, and thanks for your response.


 Following your guidelines, I have spent the past few days examining the
 source code of both the simulators and the individual components, so as
 to get a more in-depth understanding of the implementations.

 In particular, I have been examining the CMP.L2SharedNUCA.Inorder model
 that comes with flexus-4.0. For the rest of this e-mail, I will be
 referring to this model.

 ---------------------------------------------

 First of all, I have drawn a layout of the architecture, trying to
 understand the interconnections between the different components. It is
 based on the wiring.cpp file of the simulator and can be found here:

 
https://pithos.grnet.gr/pithos/rest/[email protected]/files/SimFlex/CMP.L2SharedNUCA.Inorder-layout.pdf

 If I am correct:

 * The Feeder provides the instructions
 * The Fetcher fetches the instructions and forwards them to L1
 Instruction Cache and to BPWarm
 * The BPWarm component must be the Branch Predictor
 * The Execute component - very obviously - executes the instructions
 and requests data from L1 Data Cache.

 * L1 Instruction / Data cache components: obvious
 * L2 cache: is the component I have to configure in order to implement
 a shared NUCA cache (correct?)
 * NetMapper: is the splitter that distributes requests among the
 components
 * Memory: must be the memory below L2 (i.e. RAM)

 Have I understood the architecture correctly?
 Moreover, what is the purpose of the "NIC" and "Network" components?

 As I have seen:
 * The Network component is an instance of a "NetShim/MemoryNetwork"
 component, though it is not very obvious what its relevance to the
 L2 cache is.

 * Concerning the NIC, I guess it must be a Network Interface
 Controller. I have taken a look in the MultiNic component folder, where
 I've seen that it has multiple implementations: MultiNic1, MultiNic2,
 MultiNic3, MultiNic4 and a general MultiNicX, which must hold the
 generic implementation.
 I have seen that the various implementations define different values
 for FLEXUS_MULTI_NIC_NUMPORTS: does it relate to the number of
 components the NIC is connected to?

 ---------------------------------------------

 On the L2 cache:
 I have seen a sample configuration in wiring.cpp of
 L2SharedNUCA.Inorder, as well as
 flexus-4.0/components/CMPCache/CMPCache.hpp.

 In wiring.cpp, the parameter theL2Cfg.Cores.initialize(64) initializes
 64 cores. What are these cores and how are they related to the CPU cores
 or the 64 banks of the cache, which are initialized at
 theL2Cfg.Banks.initialize(64)?

 What is more, I am trying to figure out where the following are
 defined:
 * The mapping between CPU cores and L2 banks, that is, to which L2
 bank each CPU core is mapped.
 * Replacement/migration policies. I have only noticed that the
 coherence policy is in
 flexus-4.0/components/CMPCache/NonInclusiveMESIPolicy.cpp, if I am
 correct.


 Finally, I have found in flexus-4.0/components/CMPCache/RTDirectory.hpp
 the following scheme:

 Physical address layout:
 +-----+--------------+------+-------------+--------------+-------------+
 | Tag | R Index High | Bank | R Index Low | RegionOffset | BlockOffset |
 +-----+--------------+------+-------------+--------------+-------------+
                             |<----------->|<------ setLowShift ------->|
                                setMaskLow

 I have not understood the purpose of the R Index High/Low fields.

 I presume, by the way, that due to the presence of the "Bank" field,
 the placement policy of the data in the corresponding NUCA banks must
 be static, i.e. every block will *initially* always be placed in the
 same bank, according to its address.

 ---------------------------------------------

 On the Network component:

 As I have seen in the wiring.cpp file, the parameter
 "theNetworkCfg.NetworkTopologyFile.initialize()" selects the topology
 file that will be used for the network.

 An example file is 16x3-mesh.topology (it can be found in the
 L2SharedNUCA.OoO folder), which defines a 4x4 grid of "switches", where
 each switch has 4 ports to interconnect with other switches and 3 ports
 that connect it to nodes (so (4x4)x3 = 48 nodes in total).

 I have understood the topology and the routing tables that are being
 defined, but I have not understood how these nodes and switches are
 related to the L2-NUCA cache, if there is any relationship at all.

 ---------------------------------------------


 Thank you in advance for your help.
 I will be glad to provide any additional information you might need.

 -George


 On Wed, 23 Mar 2011 16:47:14 +0000, Djordje Jevdjic wrote:
> Hello,
>
> Thanks for your message.
>
> Concerning your first question: yes, all the messages exchanged
> through this list are in one of those archives. For technical reasons
> we decided to split them into two separate archives (the old and the
> new archive).
>
> Regarding your second question: I don't think you need to implement
> anything to have a NUCA simulator. NUCA systems have
> been already implemented (actually, almost all simulators we use in
> Flexus are NUCA simulators).
> The ones you listed below (
> flexus-4.0/simulators/CMP.L2SharedNUCA.Inorder and
> flexus-4.0/simulators/CMP.L2SharedNUCA.OoO)
> are examples of such architectures with a shared and tiled L2 cache.
> So, things have already been implemented; there is no need to
> reimplement them.
>
> However, if you are interested to know more details of the
> implementations, you can look at the source code and find some useful
> comments there.
> If you are examining the source code, it's a good idea to look at the
> code of individual components included in the simulator, not the
> simulator directory itself.
> You might also want to check the getting started guide on our
> website. Besides that and the Simflex publications, we don't maintain
> any further documentation.
>
> Also, keep in mind that the current version of Flexus works only with
> Simics 3. Whatever you try to do with Simics 4 highly likely will not
> work.
> We are planning to move to Simics 4 soon.
>
> Regards,
> Djordje
