[Simflex] Private L2 caches design

Lide Duan Fri Mar 2 20:31:41 2007

Hi Jared,

I successfully ran the DSM simulator following your instructions. Thank you for
your help.


By "shared memory", I mean off-chip DRAM at a constant distance from all
cores, while the on-chip cache hierarchy still has two levels, both private.
The model "a CMP with cores/L1/L2 + some on-chip coherent network + external
memory" you mentioned is exactly what I want to implement.

However, I am wondering: if I remove the NetShim component (and also Nic, I
think) from DSM, how do the cores communicate with each other? Or, what
components could be used for the "some on-chip coherent network"?

Is it reasonable to connect all the core/L1/L2 systems to the current Local
Engine, and to have the Local Engine/Protocol Engine/Directory/Memory shared
by all cores (so we need only one copy of each)? If so, do I need to modify
the source code to maintain coherence?

In addition, could you please explain a little about the components used in
the DSM simulator? For example, does the Directory keep an entry for each
block in Memory, or for each block in the L2 cache?

Really appreciated!

Thanks,
Lide

On 3/2/07, Jared C. Smolens <[email protected]> wrote:
>
>
> Hi Lide,
>
> 1. The *.rom files should be copied into the current working directory in
> which you start Simics. I don't think you should have to change the
> Simics startup scripts to make this work.
>
> 2. By "shared memory" do you want to model a shared cache or simply shared
> DRAM?
>
> A shared L3 cache will require some effort, because the CmpCache's
> coherence protocol currently assumes a single level of cache above it. One
> way to correct this is to guarantee inclusion between the L1 and L2 (this
> is not enforced by the Cache component, but it's possible to change).
>
> Alternatively, if you only need DRAM with a constant distance from the
> cores, you could remove or disable the network component of the DSM
> simulator.  If you keep the directories and protocol engine, you should be
> able to model a CMP with cores/L1/L2 + some on-chip coherent network +
> external memory.
>
> - Jared
>
> Excerpts From "Lide Duan" <[email protected]>:
> Re: [Simflex] Private L2 caches des: "Lide Duan" <[email protected]>
> >Hi Jared,
> >
> >Thank you for your reply!
> >
> >I read through the source code of the components used in DSMFlex, trying
> >to grasp the main idea of this simulator. However, I still have some
> >questions.
> >
> >1. How can I run this simulator? I copied the generated .so file from
> >simulators/DSMFlex/ to SIMICS_ROOT/x86-linux/lib/, just as I did with
> >CMPFlex. I can see the new module flexus-DSMFlex-v9-iface-gcc when I run
> >"list-modules" in the Simics console, but if I load this module into a
> >checkpoint and run the simulated machine, I get the following:
> >
> >FATAL ERROR: No such file or directory (2): rxx_cntl: failed to open
> >microcode file he.rom
> >***  Simics getting shaky, switching to 'safe' mode.
> >Simics (main thread) received a segmentation fault. Will try to
> >recuperate.
> >
> >It seems that I need to copy a microcode file into some Simics directory
> >because this simulator uses Protocol Engines. Where should I place the
> >microcode file "he.rom"? Also, do I need to modify the Simics boot script
> >because of the new memory structure?
> >
> >2. You mentioned that I can re-tune some parameters of DSMFlex to get two
> >levels of private cache. Certainly, each core in DSMFlex has its own
> >private L1 and L2, but the memory is also distributed, unlike the shared
> >memory in CMPFlex. What I actually want to implement is a CMP cache system
> >with private L1 and private L2 caches but a memory shared by all cores,
> >just like CMPFlex. In that case, can I still use DSMFlex? The Local
> >Engine/Protocol Engine/Directory used in DSMFlex keep a directory entry for
> >each block in memory and are responsible for the coherence of the
> >distributed memories. But I think I need to maintain coherence among the
> >private L2 caches in my implementation. What do you think?
> >
> >Thanks,
> >Lide
> >
> >On 2/27/07, Jared C. Smolens <[email protected]> wrote:
> >>
> >>
> >> Hi Lide,
> >>
> >> If you only want two levels of cache (L1 & L2, both private to each
> >> core, with no shared cache), you might actually be able to use the
> >> DSMFlex simulators after re-tuning them for on-chip CMP
> >> latencies/bandwidth.
> >>
> >> The Cache/CmpCache components are used for "timing" simulations, whereas
> >> the TraceFlex simulator's Fast* components are for "functional"
> >> simulations (where all cache transactions are atomic and have zero
> >> latency). If you want correct coherence with timing, you will have to use
> >> the Cache/CmpCaches.
> >>
> >> 1. The snoop/request channels exist to prevent races between requests
> >> and acknowledgements, which can occur in timing simulations. The "snoop"
> >> channel is a high-priority channel for acknowledgement and eviction
> >> messages, while the request channel sends request messages. Prioritizing
> >> the snoop channel allows older requests to complete before starting new
> >> ones, avoiding deadlock scenarios.
> >>
> >> The Fast components have no concurrency and, therefore, don't need these
> >> channels. Their implementation is also far simpler because of this.
> >>
> >> 2. I'm not sure on this one.
> >>
> >> 3. We have found that DMA and non-allocating writes are important for
> >> correctly modeling cache behaviors of I/O-intensive commercial workloads.
> >>
> >> - Jared
> >>
> >> Excerpts From "Lide Duan" <[email protected]>:
> >> [Simflex] Private L2 caches design: "Lide Duan" <[email protected]>
> >> >Hi all,
> >> >
> >> >I am trying to implement a two-level CMP cache design, which has private
> >> >L1 and private L2 caches, based on the components provided by Flexus. The
> >> >existing simulator CMPFlex has a private L1 cache (Cache component) and a
> >> >shared L2 cache (CmpCache component), both with cache controllers but
> >> >with different controller implementations. In this case, the shared L2
> >> >cache is responsible for coherence among the private L1 caches. I have
> >> >read the source code of these components, and I think I can use the Cache
> >> >component as my private L2 cache by modifying only its ports, so that it
> >> >connects to the L1 caches on the front side and to the shared bus on the
> >> >back side. However, how can I maintain the coherence of the private L2
> >> >caches? I noticed that TraceFlex has the structure I want, and it uses
> >> >the FastBus component as the interconnect between the L2 caches to
> >> >maintain coherence. So I intended to focus on FastBus rather than
> >> >CmpCache.
> >> >
> >> >1. In CMPFlex, each Cache component has three ports (Request, Snoop, Out)
> >> >on both the front and back sides, but FastBus has only two ports
> >> >(FromCaches, ToSnoops) on the front side. How can I connect them? And what
> >> >are the main functions of the various ports?
> >> >2. In TraceFlex, FastBus isn't connected to the memory, so the back-side
> >> >ports (Writes, Reads, Evictions, etc.) are not used, right? Why is that?
> >> >3. There are also two ports (DMA, NonAllocateWrite) in FastBus connected
> >> >to the feeder. What are they used for? Do I need to use them in my
> >> >implementation?
> >> >Most likely, I will implement a new component as an external shared bus
> >> >connected to the L2 caches, just like FastBus in TraceFlex. But I am
> >> >worried about the correctness of the coherence. Do you have any
> >> >suggestions for simplifying the implementation?
> >> >
> >> >Any help would be appreciated!
> >> >
> >> >Regards,
> >> >Lide
> >>
> >> _______________________________________________
> >> SimFlex mailing list
> >> [email protected]
> >> https://sos.ece.cmu.edu/mailman/listinfo/simflex
> >> SimFlex web page: http://www.ece.cmu.edu/~simflex
> >>
>
>
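
Point 1 of Jared's February 27 reply explains that acknowledgements and evictions travel on a high-priority snoop channel, separate from the request channel, so that older transactions finish before new ones start. The C++ sketch below illustrates only that strict-priority arbitration; the MsgKind, Message, and TwoChannelPort names are hypothetical and do not correspond to the actual Flexus Cache/CmpCache port code.

```cpp
#include <cassert>
#include <deque>
#include <optional>

// Hypothetical message kinds; Flexus's real message classes are not modeled.
enum class MsgKind { Request, Ack, Eviction };

struct Message {
  MsgKind kind;
  int id;
};

// Sketch of a port with two channels. Acknowledgements and evictions use the
// high-priority snoop channel; new requests use the request channel.
class TwoChannelPort {
 public:
  void sendSnoop(const Message& m) {
    assert(m.kind != MsgKind::Request);  // snoop channel: acks and evictions only
    snoop_.push_back(m);
  }

  void sendRequest(const Message& m) {
    assert(m.kind == MsgKind::Request);  // request channel: new requests only
    request_.push_back(m);
  }

  // Deliver at most one message per call. The snoop channel is always drained
  // first; order is FIFO within each channel.
  std::optional<Message> arbitrate() {
    std::deque<Message>* q = !snoop_.empty()     ? &snoop_
                             : !request_.empty() ? &request_
                                                 : nullptr;
    if (q == nullptr) return std::nullopt;
    Message m = q->front();
    q->pop_front();
    return m;
  }

 private:
  std::deque<Message> snoop_;
  std::deque<Message> request_;
};
```

Because arbitrate() never hands out a request while an acknowledgement or eviction is waiting, a new request cannot overtake the responses an older transaction needs in order to complete.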
From jsmolens+ at ece.cmu.edu  Mon Mar  5 11:31:44 2007
From: jsmolens+ at ece.cmu.edu (Jared C. Smolens)
List-Post: [email protected]
Date: Mon Mar  5 11:31:47 2007
Subject: [Simflex] Private L2 caches design
Message-ID: <1910858270.1173112...@miura>


Hi Lide,

See inline...

Excerpts From "Lide Duan" <[email protected]>:
 Re: [Simflex] Private L2 caches des: "Lide Duan" <[email protected]>
>Hi Jared,
>
>I successfully ran the DSM simulator following your instructions. Thank
>you for your help.
>
>By "shared memory", I mean off-chip DRAM at a constant distance from all
>cores, while the on-chip cache hierarchy still has two levels, both
>private. The model "a CMP with cores/L1/L2 + some on-chip coherent network
>+ external memory" you mentioned is exactly what I want to implement.
>
>However, I am wondering: if I remove the NetShim component (and also Nic,
>I think) from DSM, how do the cores communicate with each other? Or, what
>components could be used for the "some on-chip coherent network"?

You'll need to keep the Nic component and some sort of interconnect. This is
not for coherence, but to allow messages to reach other cores/caches. This
can be either "NetShim" (a detailed network simulator) or "Network" (a
fixed-delay interconnect with infinite buffering). Either way, you'll want to
set the latencies to be representative of something on-chip, rather than
cross-chip.
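
For intuition, here is a minimal C++ sketch of a fixed-delay interconnect with unbounded buffering, as "Network" is described above. It assumes a single global cycle counter advanced by tick(); the FixedDelayNetwork class, the Packet struct, and their methods are hypothetical and are not the interface of the Flexus Network component.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical packet: source core, destination core, and delivery cycle.
struct Packet {
  int src;
  int dst;
  std::uint64_t deliverAt;
};

class FixedDelayNetwork {
 public:
  // Delivery happens on a tick, so a zero-cycle link cannot be modeled here.
  explicit FixedDelayNetwork(std::uint64_t latency) : latency_(latency) {
    assert(latency >= 1);
  }

  // Inject at the current cycle. Never blocks: buffering is unbounded.
  void send(int src, int dst) {
    inFlight_.push_back({src, dst, now_ + latency_});
  }

  // Advance one cycle and return every packet whose delay has elapsed.
  std::vector<Packet> tick() {
    ++now_;
    std::vector<Packet> arrived;
    // Fixed latency plus a monotonic clock keeps the queue sorted by deliverAt.
    while (!inFlight_.empty() && inFlight_.front().deliverAt <= now_) {
      arrived.push_back(inFlight_.front());
      inFlight_.pop_front();
    }
    return arrived;
  }

  std::uint64_t now() const { return now_; }

 private:
  std::uint64_t latency_;
  std::uint64_t now_ = 0;
  std::deque<Packet> inFlight_;
};
```

The constructor's latency argument is where an on-chip, rather than cross-chip, value would go. With a fixed latency and a monotonically increasing clock, packets arrive in injection order, so a plain FIFO suffices; a model with variable delays would need a priority queue keyed by delivery cycle.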

>Is it reasonable to connect all the core/L1/L2 systems to the current
>Local Engine, and to have the Local Engine/Protocol Engine/Directory/Memory
>shared by all cores (so we need only one copy of each)? If so, do I need
>to modify the source code to maintain coherence?

You should have one local engine/protocol engine/directory for each core.  
These components will maintain coherence for you.  
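
In a typical directory protocol, giving each core its own directory still leaves exactly one home directory per memory block, and that home serializes the coherence requests for the block. The sketch below shows one common way to pick the home: round-robin interleaving at cache-block granularity. The 64-byte block size, the interleaving policy, and the homeNode() helper are all assumptions for illustration; DSMFlex's actual address-to-home mapping may differ.

```cpp
#include <cassert>
#include <cstdint>

// Assumed cache-block size (illustrative, not taken from DSMFlex).
constexpr std::uint64_t kBlockBytes = 64;

// Hypothetical helper: map a physical address to the core whose directory
// is the home for that block, interleaving blocks round-robin across cores.
int homeNode(std::uint64_t paddr, int numCores) {
  assert(numCores > 0);
  std::uint64_t block = paddr / kBlockBytes;  // all bytes of a block share a home
  return static_cast<int>(block % static_cast<std::uint64_t>(numCores));
}
```

Every byte of a block maps to the same home, and consecutive blocks rotate across the cores' directories, so the directory state is spread evenly without any block being tracked by two directories.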

>In addition, could you please explain a little about the components used
>in the DSM simulator? For example, does the Directory keep an entry for
>each block in Memory, or for each block in the L2 cache?

The supplied directory is very similar to the one implemented in Piranha 
(see Barroso, ISCA'00).  It is my understanding that the directory in 
Flexus is a full directory.
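
In a textbook full-map directory, every memory block has an entry holding one presence bit per core, so the home can always name every sharer exactly, at the cost of N bits per block for N cores. The C++ sketch below is a hypothetical illustration of such an entry (the 16-core limit, the three-state encoding, and the method names are assumptions); it is not the Flexus or Piranha encoding.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sizing and states for illustration only.
constexpr std::size_t kMaxCores = 16;

enum class DirState { Invalid, Shared, Modified };

// One full-map entry per memory block: a state plus one presence bit per core.
struct DirEntry {
  DirState state = DirState::Invalid;
  std::bitset<kMaxCores> sharers;  // bit i set => core i may hold a copy

  // Read miss from `core`. If the block was Modified, the owner would first be
  // downgraded (messages not modeled); its presence bit stays set as a sharer.
  void addSharer(std::size_t core) {
    assert(core < kMaxCores);
    sharers.set(core);
    state = DirState::Shared;
  }

  // Write miss from `core`: returns every other core holding a copy (these
  // must be invalidated), then records `core` as the sole Modified owner.
  std::vector<std::size_t> grantExclusive(std::size_t core) {
    assert(core < kMaxCores);
    std::vector<std::size_t> toInvalidate;
    for (std::size_t i = 0; i < kMaxCores; ++i) {
      if (i != core && sharers.test(i)) toInvalidate.push_back(i);
    }
    sharers.reset();
    sharers.set(core);
    state = DirState::Modified;
    return toInvalidate;
  }
};
```

On a write miss, grantExclusive() returns exactly the sharers to invalidate; the invalidation and acknowledgement messages themselves, and the owner downgrade on a read miss, are left out of the sketch.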

- Jared
