Hi,
I was trying to figure this one out on my own, but I think I need to double
check with you (I kind of got lost in the code).
My question refers to what happens when a WriteReq arrives while a PrefetchReq
is in progress.
As far as I understand, when a Prefetch is injected in the hierarchy, provided
the block is not in the cache and there is no in-progress request for it (no
MAF entry for that block), the prefetch is sent as a read request. What
happens if, while this request is in progress, there is a WriteReq for the
same block? My impression is that the WriteReq is issued as if the prefetch
is not there, but I'm not sure.
Thank you,
Ioana
From ioana at eecg.toronto.edu Fri Oct 6 13:19:08 2006
From: ioana at eecg.toronto.edu (Ioana Burcea)
List-Post: [email protected]
Date: Fri Oct 6 13:19:15 2006
Subject: [Simflex] magic calls - weird behavior
Message-ID: <[email protected]>
Hi,
I have something funny going on and I was hoping you have seen this
before and have an idea about what's going on.
So, I have a MAGIC call for which I have a callback function where I
save a simics checkpoint. The checkpoint is fine, it saves, I run simics
with that checkpoint and that particular MAGIC call happens again. It
looks as if simics goes to the callback function before executing the
magic instruction and that's where I save the checkpoint.
Have you seen this before? Does anyone know how to fix this?
Thank you,
Ioana
From twenisch at ece.cmu.edu Fri Oct 6 13:25:48 2006
From: twenisch at ece.cmu.edu (Thomas Wenisch)
List-Post: [email protected]
Date: Fri Oct 6 13:25:54 2006
Subject: [Simflex] magic calls - weird behavior
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>
Hi,
From what I understand, this is Simics's intended behavior - the MAGIC hap
occurs before the magic instruction is executed. If you are using Simics
python code to catch the hap and write out checkpoints, you can look into
using the SIM_step_post() function to cause Simics to call a function in
the next cycle, after the MAGIC instruction has completed.
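For what it's worth, the deferral pattern Tom describes can be sketched in
plain Python with no Simics dependency. ToyStepQueue and every other name
below are made up for illustration only; the real call is SIM_step_post(),
whose exact signature is in the Simics reference manual.

```python
# Standalone sketch of the "defer the checkpoint by one step" pattern.
# In real Simics code you would use SIM_step_post() instead of this toy
# queue; all names here are hypothetical illustrations.

class ToyStepQueue:
    """Minimal model of a stepwise simulator with posted callbacks."""
    def __init__(self):
        self.step = 0
        self.posted = []          # (due_step, fn) pairs
        self.log = []

    def post(self, delay, fn):
        # analogous to SIM_step_post: run fn 'delay' steps from now
        self.posted.append((self.step + delay, fn))

    def run_steps(self, n):
        for _ in range(n):
            self.step += 1        # the "instruction" completes here
            due = [fn for (s, fn) in self.posted if s == self.step]
            self.posted = [(s, fn) for (s, fn) in self.posted
                           if s != self.step]
            for fn in due:
                fn()

sim = ToyStepQueue()

def on_magic_hap():
    # The hap fires BEFORE the magic instruction retires, so don't
    # checkpoint here; post the save one step into the future.
    sim.log.append(("hap", sim.step))
    sim.post(1, save_checkpoint)

def save_checkpoint():
    # By now the magic instruction has completed, so a checkpoint
    # restored here will not replay the hap.
    sim.log.append(("checkpoint", sim.step))

on_magic_hap()       # hap delivered at step 0, before the instruction
sim.run_steps(2)     # instruction retires at step 1; checkpoint saved then
print(sim.log)       # [('hap', 0), ('checkpoint', 1)]
```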
Regards,
-Tom Wenisch
From dimitris at cs.toronto.edu Fri Oct 6 20:40:46 2006
From: dimitris at cs.toronto.edu (Dimitris Tsirogiannis)
List-Post: [email protected]
Date: Sat Oct 7 03:21:24 2006
Subject: [Simflex] measure total number of instructions
Message-ID: <[email protected]>
Hi,
What is the best way to measure the total number of instructions for the
execution of an application running in a multi-core architecture with a
memory hierarchy based on the Piranha model? I know how to use flexpoints
for gathering statistics regarding the behavior of the memory hierarchy.
Do I need to go through flexus, or do I have to implement the Piranha
memory hierarchy model on top of simics and use functional simulation only
if I want to measure the total running time of my application?
Thanks a lot,
Dimitris
From ioana at eecg.toronto.edu Tue Oct 10 12:03:23 2006
From: ioana at eecg.toronto.edu (Ioana Burcea)
List-Post: [email protected]
Date: Tue Oct 10 12:03:28 2006
Subject: [Fwd: [Simflex] PrefetchReq and WriteReq]
Message-ID: <[email protected]>
Hi,
I posted these questions last week. I'm forwarding them again in the hope
of an answer :)
I would also like to double check that PrefetchRequests use the same
MAFs (MSHRs) as normal requests.
Thank you in advance,
Ioana
-------- Original Message --------
Subject: [Simflex] PrefetchReq and WriteReq
List-Post: [email protected]
Date: Thu, 05 Oct 2006 22:20:09 -0400
From: Ioana Burcea <[email protected]>
Reply-To: SimFlex software support <[email protected]>
To: SimFlex software support <[email protected]>
Hi,
I was trying to figure this one out on my own, but I think I need to double
check with you (I kind of got lost in the code).
My question refers to what happens when a WriteReq arrives while a
PrefetchReq is in progress.
As far as I understand, when a Prefetch is injected in the hierarchy,
provided the block is not in the cache and there is no in-progress
request for it (no MAF entry for that block), the prefetch is sent as a
read request. What happens if, while this request is in progress, there
is a WriteReq for the same block? My impression is that the WriteReq is
issued as if the prefetch is not there, but I'm not sure.
Thank you,
Ioana
_______________________________________________
SimFlex mailing list
[email protected]
https://sos.ece.cmu.edu/mailman/listinfo/simflex
SimFlex web page: http://www.ece.cmu.edu/~simflex
From ssomogyi at ece.cmu.edu Tue Oct 10 13:21:42 2006
From: ssomogyi at ece.cmu.edu (Stephen Somogyi)
List-Post: [email protected]
Date: Tue Oct 10 13:21:48 2006
Subject: [Fwd: [Simflex] PrefetchReq and WriteReq]
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>
Hi Ioana,
Sorry for the delayed response.
PrefetchReqs use almost the same logic as regular read requests. If there
is no request for the block outstanding (i.e., no matching entry in the
MAF), then a new entry is allocated and a ReadReq sent out to the next
level in the memory hierarchy. While this prefetch is outstanding, if any
other request arrives (read or write), it is entered into the MAF and will
be processed when the prefetch is satisfied.
However, if a PrefetchReq arrives while another request for the block is
outstanding, a PrefetchReadRedundant message is immediately returned (no
need to wait for the outstanding request to complete).
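The MAF arbitration described above can be captured in a toy model. This is
plain Python for illustration only; ToyMAF, the method names, and the message
strings are hypothetical stand-ins, not the actual Flexus classes.

```python
# Toy sketch of the MAF behavior Stephen describes: prefetches allocate
# an entry like reads, later requests queue behind the outstanding miss,
# and a redundant prefetch is answered immediately.
from collections import defaultdict

class ToyMAF:
    def __init__(self):
        self.entries = defaultdict(list)   # block addr -> waiting requests

    def handle(self, addr, kind):
        outstanding = addr in self.entries
        if kind == "PrefetchReq":
            if outstanding:
                # another request already in flight: reply right away,
                # no need to wait for the outstanding miss to complete
                return "PrefetchReadRedundant"
            # no MAF entry: allocate one and send a ReadReq downstream
            self.entries[addr].append(kind)
            return "ReadReq sent"
        # regular read/write: queue behind whatever is outstanding
        self.entries[addr].append(kind)
        return "queued" if outstanding else "ReadReq sent"

    def fill(self, addr):
        # miss reply arrives: wake every waiter, free the entry
        return self.entries.pop(addr, [])

maf = ToyMAF()
print(maf.handle(0x40, "PrefetchReq"))   # ReadReq sent
print(maf.handle(0x40, "WriteReq"))      # queued (waits for the prefetch)
print(maf.handle(0x40, "PrefetchReq"))   # PrefetchReadRedundant
print(maf.fill(0x40))                    # ['PrefetchReq', 'WriteReq']
```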
Stephen
From ioana at eecg.toronto.edu Wed Oct 11 12:30:49 2006
From: ioana at eecg.toronto.edu (Ioana Burcea)
List-Post: [email protected]
Date: Wed Oct 11 12:30:55 2006
Subject: [Simflex] Prefetch messages and ports
Message-ID: <[email protected]>
Hi,
This time I'm asking for some advice :)
I've been playing with some prefetch messages and I notice that if I
inject the PrefetchReqs on the Prefetch ports, they are sent forward as
ReadReq indeed, but on Prefetch out ports. Since I'm inserting them at
the L2 level, they are sent out through the Prefetch out port from the
cache, and nobody picks them up (since that particular port is not wired
in the Uniflex.OoO).
I can see two solutions to this:
1. I can inject the Prefetch messages using the Request channel. As far
as I understood from Tom, if I do this, the prefetch and request
messages are treated the same.
2. I could create a dummy component (something like the IDMux) that acts
like a funnel and spills both Request and Prefetch ports into the Memory
in port.
I'm more inclined to use the second solution, since it uses the
"arbitration" that the Cache component does, giving priority to the
Request messages as opposed to the Prefetch ones. Is there any issue
that I'm not aware of for using the second solution?
Thank you for your help,
Ioana
From twenisch at ece.cmu.edu Wed Oct 11 13:11:09 2006
From: twenisch at ece.cmu.edu (Thomas Wenisch)
List-Post: [email protected]
Date: Wed Oct 11 13:11:15 2006
Subject: [Simflex] measure total number of instructions
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>
Hi Dimitris,
If you want to get meaningful estimates of time, you must run one of
Flexus' timing models (in-order, e.g. UniFlex, or out-of-order, e.g.
UniFlex.OoO). If you run in Simics without Flexus, all instructions will
take a single cycle, so your "running time" would simply be a count of
the number of instructions in your application.
If your application is very short (< 10 million instructions, say), you
can simply load it up and run it to completion in Flexus. If it is long,
you will need to use sampling to get a performance estimate without
waiting weeks for a simulation to complete. Take a look at the SimFlex
tutorial, getting started guide, and associated publications for more
information on sampling.
Regards,
-Tom Wenisch
Computer Architecture Lab
Carnegie Mellon University
From twenisch at ece.cmu.edu Wed Oct 11 13:16:13 2006
From: twenisch at ece.cmu.edu (Thomas Wenisch)
List-Post: [email protected]
Date: Wed Oct 11 13:16:18 2006
Subject: [Simflex] Prefetch messages and ports
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>
Hi,
On Wed, 11 Oct 2006, Ioana Burcea wrote:
> Hi,
>
> 2. I could create a dummy component (something like the IDMux) that acts like
> a funnel and spills both Request and Prefetch ports into the Memory in port.
>
I would choose this one, to maintain the priority of regular requests over
prefetches. Funnelling together should be straight-forward - I would do
it in a component without any drive - that is, as soon as the message gets
pushed in, push it out to the Memory. I believe that is how the IDMux
works, right?
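A drive-less funnel of this kind might look like the toy sketch below. This
is plain Python for illustration; ToyFunnel and the port names are invented,
not the Flexus component API.

```python
# Toy sketch of a funnel with no drive(): two push-in ports forwarding
# straight to one push-out port. The message leaves in the same call in
# which it was pushed in, so the component needs no clocked drive phase.

class ToyFunnel:
    def __init__(self, memory_port):
        self.memory_port = memory_port   # downstream push target

    def push_request(self, msg):
        # regular Request in-port: forward immediately
        self.memory_port(("Request", msg))

    def push_prefetch(self, msg):
        # Prefetch in-port: forward immediately as well
        self.memory_port(("Prefetch", msg))

delivered = []
funnel = ToyFunnel(delivered.append)
funnel.push_request("WriteReq @0x80")
funnel.push_prefetch("ReadReq @0xc0")
print(delivered)
# [('Request', 'WriteReq @0x80'), ('Prefetch', 'ReadReq @0xc0')]
```

Note that any priority between Requests and Prefetches is still decided
upstream by the Cache's own arbitration; the funnel just merges the wires.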
Regards,
-Tom Wenisch
From ioana at eecg.toronto.edu Wed Oct 11 13:18:28 2006
From: ioana at eecg.toronto.edu (Ioana Burcea)
List-Post: [email protected]
Date: Wed Oct 11 13:18:35 2006
Subject: [Simflex] Prefetch messages and ports
In-Reply-To: <[email protected]>
References: <[email protected]>
<[email protected]>
Message-ID: <[email protected]>
> I would choose this one, to maintain the priority of regular requests
> over prefetches. Funnelling together should be straight-forward - I
> would do it in a component without any drive - that is, as soon as the
> message gets pushed in, push it out to the Memory. I believe that is
> how the IDMux works, right?
Thank you for such a prompt reply. Yes, that's how the IDMux works and
I'm happy to hear that there is nothing wrong with using the same idea
in this case.
Thanks again,
Ioana
From mrinal at ece.umn.edu Mon Oct 16 04:39:23 2006
From: mrinal at ece.umn.edu (Mrinal Nath)
List-Post: [email protected]
Date: Mon Oct 16 04:39:39 2006
Subject: [Simflex] A question related to sending adding messages on the
snoop channel
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>
Hi,
I am trying to make the L1 caches write-through. I have added a "write
through request" which is sent by the L1 to the L2 when a write-hit
occurs in the L1. All I have done is used "enqueueMessage" to send the
write through message on the BackSideOut_Snoop port of the L1.
The L2 does *not* send back any acknowledgment of the write through
(since I have made the L2 inclusive).
I am using the snoop channel to send this message (if I use the request
channel, then I get some races between the write-through message and
invalidate-acks or downgrade-acks). Using the snoop channel avoids the
races between the write-through message and the acks.
However, I have a big problem now: the watchdog timer expires because
one of my CPUs does not make any forward progress. I have tried tracking
the problem, but it is very difficult to track, since the timer expires
after a long time.
If I use the request channel, I don't get such a timer related problem
(I only have the races mentioned above).
Can anyone provide some insight about why adding a simple message on the
snoop channel from L1 to L2 is causing the processor to lose forward
progress?
Have I missed out some important/special requirement related to handling
snoop messages going from L1 to L2 ?
Any insight will be greatly appreciated.
Thanks,
- Mrinal
Jared C. Smolens wrote:
> Hi Mrinal,
>
> See inline...
>
> Excerpts From Mrinal Nath <[email protected]>:
> Re: [Simflex] Piranha Dir states: Mrinal Nath <[email protected]>
>> Hi Jared,
>> Thanks a lot for posting the state diagram.
>>
>> However, by going through the code, I noticed that the main memory
>> replies to a read request (which missed in L1 and L2) by returning a
>> "MissReplyWritable" message. And the directory keeps this line in the M
>> state, with the owner set to the requesting L1. (correct me if I am wrong)
>
> Yes, this is the behavior that should be happening.
>
>> So, if I am correct above, then in this particular example the state
>> diagram document is not consistent with the code. i.e. the document (on
>> pg 2) shows a transition from I to S (transition labeled 1) but
>> according to the code, the transition seems to be from I to M.
>>
>> Please let me know if I am on the right track, or I am missing something.
>
> You're right. It looks like the external requests diagram is missing a
> transition from GetS to M, upon receiving a writable external miss reply.
> There's also a transition from S to GetM that's missing. I have updated
> the diagram.
>
>> Also, when the memory replies to L2 for a miss, I would like to
>> allocate the block in the L2 and then pass it on to L1 (currently the
>> block is allocated only in L1). Can this be done? How? Is there any
>> inherent drawback/problem in allocating the line in L2 ?
>
> This can be done by setting up an eviction when the miss reply is
> received (handle_D_ExtGet() is probably the best place) and making sure
> the state/ownership bits are set properly. If the L2 size is similar to
> the aggregate L1 size, then you may be wasting cache space by replicating
> blocks. If the L2 is big, this will not matter so much.
>
> Also, be aware that Piranha is non-inclusive, so blocks allocated in an
> L1 *may* be in the L2, but that is not guaranteed (the L2 can replace a
> block without telling L1's who may also have a copy). If you require
> inclusion, you'll have to do a lot of pen-and-paper work to make sure you
> can guarantee that.
>
> Cheers,
>
> Jared
>
>> Thanks
>> - Mrinal
>>
>> Jared C. Smolens wrote:
>>> Hi Mrinal,
>>>
>>> I have posted the Flexus CMP Cache Coherence state diagram on the SimFlex
>>> webpage (click on Software). States such as D_S2MW are transient states.
>>> You have the correct interpretation for D_S2MW.
>>>
>>> Jared
>>>
>>> Excerpts From Mrinal Nath <[email protected]>:
>>> [Simflex] Piranha Dir states: Mrinal Nath <[email protected]>
>>>
>>>> Can someone please provide some details about what the PiranhaDirStates
>>>> mean.
>>>>
>>>> I can understand the stable states of D_M, D_O, D_S, D_I.
>>>> I think I also understand some of the other states like D_S2MW (I think
>>>> this means, "going from S to M state due to a write")
>>>>
>>>> But I cannot figure out what most of the other states mean. Some help
>>>> will be very useful, and will also serve as documentation about those
>>>> states.
>>>
>>>
>>>
>>> Jared Smolens ----------- Electrical and Computer Engineering
>>> www.rabidpenguin.org ------------- Carnegie Mellon University
>>> jsmolens AT ece.cmu.edu ------ HH A-313 ------ Pittsburgh, PA
>>>
>
>
> Jared Smolens ----------- Electrical and Computer Engineering
> www.rabidpenguin.org ------------- Carnegie Mellon University
> jsmolens AT ece.cmu.edu ------ HH A-313 ------ Pittsburgh, PA
>
From jsmolens+ at ece.cmu.edu Mon Oct 16 10:38:40 2006
From: jsmolens+ at ece.cmu.edu (Jared C. Smolens)
List-Post: [email protected]
Date: Mon Oct 16 10:38:45 2006
Subject: [Simflex] A question related to sending adding messages on the
Message-ID: <[email protected]>
Hi Mrinal,
Yes, messages like this do need to be on the snoop channel to prevent
races with downgrades, invalidates, and evictions.
Watchdog timeouts occur for two general reasons: deadlock or absurdly
long request-reply latencies.
Deadlocks are most likely due to resource leaks or protocol bugs. I
doubt you have introduced a protocol bug with your change, but it's
relatively easy to forget to free a resource. If I had to guess, I'd
look at MAF entries. If allocated, these must be freed along all return
paths from the cache, even if you don't send an ack. See below for
debugging techniques.
The latter is unlikely, but easy to test: simply extend the timeout and
see if things eventually pick up again. If so, you might want to look
for long queueing delays. 20K cycles in a CMP is a reasonable upper bound
here.
My general workflow for solving deadlocks is to identify the first
processor that has stalled and figure out what physical address request
it is waiting on (you may have to compile with a higher debugging mode,
such as -trace or -verb to determine this). I then re-run the job with
trace-level debugging to determine what happened to that message. Tip:
once you know the address, you can significantly speed this up by setting
the -$CACHE_NAME:trace_address option to the DECIMAL value of the
physical address on each level of cache.
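Since the trace_address option wants the address in decimal, a tiny helper
can do the conversion from the hex addresses you see in the trace. This is
plain Python; the exact "-NAME:trace_address=VALUE" syntax below is a guess,
so check how your build parses its options.

```python
# Hypothetical helper for Jared's tip: convert a hex physical address
# into the decimal form the trace_address option expects.
def trace_address_arg(cache_name, hex_addr):
    """Build a -CACHE:trace_address option string (format assumed)."""
    return "-%s:trace_address=%d" % (cache_name, int(hex_addr, 16))

# e.g. an address pulled from a trace dump:
print(trace_address_arg("L2", "0x7fff8040"))
# -L2:trace_address=2147450944
```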
Cheers,
Jared
Jared Smolens ----------- Electrical and Computer Engineering
www.rabidpenguin.org ------------- Carnegie Mellon University
jsmolens AT ece.cmu.edu ------ HH A-313 ------ Pittsburgh, PA