Hi,
I was trying to figure this one out on my own, but I think I need to double
check with you (I kind of got lost in the code).
My question refers to what happens when a WriteReq arrives while a PrefetchReq
is in progress.
As far as I understand, when a Prefetch is injected in the hierarchy, provided
the block is not in the cache and there is no in-progress request for it (no
MAF entry for that block), the prefetch is sent as a read request. What
happens if, while this request is in progress, there is a WriteReq for the
same block? My impression is that the WriteReq is issued as if the prefetch
is not there, but I'm not sure.
Thank you,
Ioana
From ioana at eecg.toronto.edu Fri Oct 6 13:19:08 2006
From: ioana at eecg.toronto.edu (Ioana Burcea)
List-Post: [email protected]
Date: Fri Oct 6 13:19:15 2006
Subject: [Simflex] magic calls - weird behavior
Message-ID: <[email protected]>
Hi,
I have something funny going on and I was hoping you have seen this
before and have an idea about what's going on.
So, I have a MAGIC call for which I have a callback function where I
save a simics checkpoint. The checkpoint is fine, it saves, I run simics
with that checkpoint and that particular MAGIC call happens again. It
looks as if simics goes to the callback function before executing the
magic instruction and that's where I save the checkpoint.
Have you seen this before? Does anyone know how to fix this?
Thank you,
Ioana
From twenisch at ece.cmu.edu Fri Oct 6 13:25:48 2006
From: twenisch at ece.cmu.edu (Thomas Wenisch)
List-Post: [email protected]
Date: Fri Oct 6 13:25:54 2006
Subject: [Simflex] magic calls - weird behavior
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>
Hi,
From what I understand, this is Simics's intended behavior - the MAGIC hap
occurs before the magic instruction is executed. If you are using Simics
python code to catch the hap and write out checkpoints, you can look into
using the SIM_step_post() function to cause Simics to call a function in
the next cycle, after the MAGIC instruction has completed.
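For what it's worth, the deferral pattern Tom describes can be sketched in
plain Python with no Simics dependency. ToyStepQueue and every other name
below are made up for illustration only; the real call is SIM_step_post(),
whose exact signature is in the Simics reference manual.

```python
# Standalone sketch of the "defer the checkpoint by one step" pattern.
# In real Simics code you would use SIM_step_post() instead of this toy
# queue; all names here are hypothetical illustrations.

class ToyStepQueue:
    """Minimal model of a stepwise simulator with posted callbacks."""
    def __init__(self):
        self.step = 0
        self.posted = []          # (due_step, fn) pairs
        self.log = []

    def post(self, delay, fn):
        # analogous to SIM_step_post: run fn 'delay' steps from now
        self.posted.append((self.step + delay, fn))

    def run_steps(self, n):
        for _ in range(n):
            self.step += 1        # the "instruction" completes here
            due = [fn for (s, fn) in self.posted if s == self.step]
            self.posted = [(s, fn) for (s, fn) in self.posted
                           if s != self.step]
            for fn in due:
                fn()

sim = ToyStepQueue()

def on_magic_hap():
    # The hap fires BEFORE the magic instruction retires, so don't
    # checkpoint here; post the save one step into the future.
    sim.log.append(("hap", sim.step))
    sim.post(1, save_checkpoint)

def save_checkpoint():
    # By now the magic instruction has completed, so a checkpoint
    # restored here will not replay the hap.
    sim.log.append(("checkpoint", sim.step))

on_magic_hap()       # hap delivered at step 0, before the instruction
sim.run_steps(2)     # instruction retires at step 1; checkpoint saved then
print(sim.log)       # [('hap', 0), ('checkpoint', 1)]
```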
Regards,
-Tom Wenisch
From dimitris at cs.toronto.edu Fri Oct 6 20:40:46 2006
From: dimitris at cs.toronto.edu (Dimitris Tsirogiannis)
List-Post: [email protected]
Date: Sat Oct 7 03:21:24 2006
Subject: [Simflex] measure total number of instructions
Message-ID: <[email protected]>
Hi,
What is the best way to measure the total number of instructions for the
execution of an application running in a multi-core architecture with a
memory hierarchy based on the Piranha model? I know how to use flexpoints
for gathering statistics regarding the behavior of the memory hierarchy.
Do I need to go through flexus, or do I have to implement the Piranha
memory hierarchy model on top of simics and use functional simulation only
if I want to measure the total running time of my application?
Thanks a lot,
Dimitris
From ioana at eecg.toronto.edu Tue Oct 10 12:03:23 2006
From: ioana at eecg.toronto.edu (Ioana Burcea)
List-Post: [email protected]
Date: Tue Oct 10 12:03:28 2006
Subject: [Fwd: [Simflex] PrefetchReq and WriteReq]
Message-ID: <[email protected]>
Hi,
I posted these questions last week. I'm forwarding them again in the hope
of an answer :)
I would also like to double check that PrefetchRequests use the same
MAFs (MSHRs) as normal requests.
Thank you in advance,
Ioana
-------- Original Message --------
Subject: [Simflex] PrefetchReq and WriteReq
List-Post: [email protected]
Date: Thu, 05 Oct 2006 22:20:09 -0400
From: Ioana Burcea <[email protected]>
Reply-To: SimFlex software support <[email protected]>
To: SimFlex software support <[email protected]>
Hi,
I was trying to figure this one out on my own, but I think I need to double
check with you (I kind of got lost in the code).
My question refers to what happens when a WriteReq arrives while a
PrefetchReq is in progress.
As far as I understand, when a Prefetch is injected in the hierarchy,
provided the block is not in the cache and there is no in-progress
request for it (no MAF entry for that block), the prefetch is sent as a
read request. What happens if, while this request is in progress, there
is a WriteReq for the same block? My impression is that the WriteReq is
issued as if the prefetch is not there, but I'm not sure.
Thank you,
Ioana
_______________________________________________
SimFlex mailing list
[email protected]
https://sos.ece.cmu.edu/mailman/listinfo/simflex
SimFlex web page: http://www.ece.cmu.edu/~simflex
From ssomogyi at ece.cmu.edu Tue Oct 10 13:21:42 2006
From: ssomogyi at ece.cmu.edu (Stephen Somogyi)
List-Post: [email protected]
Date: Tue Oct 10 13:21:48 2006
Subject: [Fwd: [Simflex] PrefetchReq and WriteReq]
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>
Hi Ioana,
Sorry for the delayed response.
PrefetchReqs use almost the same logic as regular read requests. If there
is no request for the block outstanding (i.e., no matching entry in the
MAF), then a new entry is allocated and a ReadReq sent out to the next
level in the memory hierarchy. While this prefetch is outstanding, if any
other request arrives (read or write), it is entered into the MAF and will
be processed when the prefetch is satisfied.
However, if a PrefetchReq arrives while another request for the block is
outstanding, a PrefetchReadRedundant message is immediately returned (no
need to wait for the outstanding request to complete).
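The MAF arbitration described above can be captured in a toy model. This is
plain Python for illustration only; ToyMAF, the method names, and the message
strings are hypothetical stand-ins, not the actual Flexus classes.

```python
# Toy sketch of the MAF behavior Stephen describes: prefetches allocate
# an entry like reads, later requests queue behind the outstanding miss,
# and a redundant prefetch is answered immediately.
from collections import defaultdict

class ToyMAF:
    def __init__(self):
        self.entries = defaultdict(list)   # block addr -> waiting requests

    def handle(self, addr, kind):
        outstanding = addr in self.entries
        if kind == "PrefetchReq":
            if outstanding:
                # another request already in flight: reply right away,
                # no need to wait for the outstanding miss to complete
                return "PrefetchReadRedundant"
            # no MAF entry: allocate one and send a ReadReq downstream
            self.entries[addr].append(kind)
            return "ReadReq sent"
        # regular read/write: queue behind whatever is outstanding
        self.entries[addr].append(kind)
        return "queued" if outstanding else "ReadReq sent"

    def fill(self, addr):
        # miss reply arrives: wake every waiter, free the entry
        return self.entries.pop(addr, [])

maf = ToyMAF()
print(maf.handle(0x40, "PrefetchReq"))   # ReadReq sent
print(maf.handle(0x40, "WriteReq"))      # queued (waits for the prefetch)
print(maf.handle(0x40, "PrefetchReq"))   # PrefetchReadRedundant
print(maf.fill(0x40))                    # ['PrefetchReq', 'WriteReq']
```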
Stephen
From ioana at eecg.toronto.edu Wed Oct 11 12:30:49 2006
From: ioana at eecg.toronto.edu (Ioana Burcea)
List-Post: [email protected]
Date: Wed Oct 11 12:30:55 2006
Subject: [Simflex] Prefetch messages and ports
Message-ID: <[email protected]>
Hi,
This time I'm asking for some advice :)
I've been playing with some prefetch messages and I notice that if I
inject the PrefetchReqs on the Prefetch ports, they are sent forward as
ReadReq indeed, but on Prefetch out ports. Since I'm inserting them at
the L2 level, they are sent out through the Prefetch out port from the
cache, and nobody picks them up (since that particular port is not wired
in the Uniflex.OoO).
I can see two solutions to this:
1. I can inject the Prefetch messages using the Request channel. As far
as I understood from Tom, if I do this, the prefetch and request
messages are treated the same.
2. I could create a dummy component (something like the IDMux) that acts
like a funnel and spills both Request and Prefetch ports into the Memory
in port.
I'm more inclined to use the second solution, since it uses the
"arbitration" that the Cache component does, giving priority to the
Request messages as opposed to the Prefetch ones. Is there any issue
that I'm not aware of for using the second solution?
Thank you for your help,
Ioana
From twenisch at ece.cmu.edu Wed Oct 11 13:11:09 2006
From: twenisch at ece.cmu.edu (Thomas Wenisch)
List-Post: [email protected]
Date: Wed Oct 11 13:11:15 2006
Subject: [Simflex] measure total number of instructions
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>
Hi Dimitris,
If you want to get meaningful estimates of time, you must run one of
Flexus' timing models (in-order, e.g. UniFlex, or out-of-order, e.g.
UniFlex.OoO). If you run in Simics without Flexus, all instructions will
take a single cycle, so your "running time" would simply be a count of
the number of instructions in your application.
If your application is very short (< 10 million instructions, say), you
can simply load it up and run it to completion in Flexus. If it is long,
you will need to use sampling to get a performance estimate without
waiting weeks for a simulation to complete. Take a look at the SimFlex
tutorial, getting started guide, and associated publications for more
information on sampling.
Regards,
-Tom Wenisch
Computer Architecture Lab
Carnegie Mellon University
From twenisch at ece.cmu.edu Wed Oct 11 13:16:13 2006
From: twenisch at ece.cmu.edu (Thomas Wenisch)
List-Post: [email protected]
Date: Wed Oct 11 13:16:18 2006
Subject: [Simflex] Prefetch messages and ports
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>
Hi,
On Wed, 11 Oct 2006, Ioana Burcea wrote:
> Hi,
>
> 2. I could create a dummy component (something like the IDMux) that acts like
> a funnel and spills both Request and Prefetch ports into the Memory in port.
>
I would choose this one, to maintain the priority of regular requests over
prefetches. Funnelling together should be straight-forward - I would do
it in a component without any drive - that is, as soon as the message gets
pushed in, push it out to the Memory. I believe that is how the IDMux
works, right?
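A drive-less funnel of this kind might look like the toy sketch below. This
is plain Python for illustration; ToyFunnel and the port names are invented,
not the Flexus component API.

```python
# Toy sketch of a funnel with no drive(): two push-in ports forwarding
# straight to one push-out port. The message leaves in the same call in
# which it was pushed in, so the component needs no clocked drive phase.

class ToyFunnel:
    def __init__(self, memory_port):
        self.memory_port = memory_port   # downstream push target

    def push_request(self, msg):
        # regular Request in-port: forward immediately
        self.memory_port(("Request", msg))

    def push_prefetch(self, msg):
        # Prefetch in-port: forward immediately as well
        self.memory_port(("Prefetch", msg))

delivered = []
funnel = ToyFunnel(delivered.append)
funnel.push_request("WriteReq @0x80")
funnel.push_prefetch("ReadReq @0xc0")
print(delivered)
# [('Request', 'WriteReq @0x80'), ('Prefetch', 'ReadReq @0xc0')]
```

Note that any priority between Requests and Prefetches is still decided
upstream by the Cache's own arbitration; the funnel just merges the wires.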
Regards,
-Tom Wenisch
From ioana at eecg.toronto.edu Wed Oct 11 13:18:28 2006
From: ioana at eecg.toronto.edu (Ioana Burcea)
List-Post: [email protected]
Date: Wed Oct 11 13:18:35 2006
Subject: [Simflex] Prefetch messages and ports
In-Reply-To: <[email protected]>
References: <[email protected]>
<[email protected]>
Message-ID: <[email protected]>
> I would choose this one, to maintain the priority of regular requests
> over prefetches. Funnelling together should be straight-forward - I
> would do it in a component without any drive - that is, as soon as the
> message gets pushed in, push it out to the Memory. I believe that is
> how the IDMux works, right?
Thank you for such a prompt reply. Yes, that's how the IDMux works and
I'm happy to hear that there is nothing wrong with using the same idea
in this case.
Thanks again,
Ioana
From mrinal at ece.umn.edu Mon Oct 16 04:39:23 2006
From: mrinal at ece.umn.edu (Mrinal Nath)
List-Post: [email protected]
Date: Mon Oct 16 04:39:39 2006
Subject: [Simflex] A question related to sending adding messages on the
snoop channel
In-Reply-To: <[email protected]>
References: <[email protected]>
Message-ID: <[email protected]>
Hi,
I am trying to make the L1 caches write-through. I have added a "write
through request" which is sent by the L1 to the L2 when a write-hit
occurs in the L1. All I have done is used "enqueueMessage" to send the
write through message on the BackSideOut_Snoop port of the L1.
The L2 does *not* send back any acknowledgment of the write through
(since I have made the L2 inclusive).
I am using the snoop channel to send this message (if I use the request
channel, then I get some races between the write-through message and
invalidate-acks or downgrade-acks). Using the snoop channel avoids the
races between the write-through message and the acks.
However, I have a big problem now: the watchdog timer expires because
one of my CPUs does not make any forward progress. I have tried tracking
the problem, but it is very difficult to track, since the timer expires
after a long time.
If I use the request channel, I don't get such a timer related problem
(I only have the races mentioned above).
Can anyone provide some insight about why adding a simple message on the
snoop channel from L1 to L2 is causing the processor to lose forward
progress?
Have I missed out some important/special requirement related to handling
snoop messages going from L1 to L2 ?
Any insight will be greatly appreciated.
Thanks,
- Mrinal
Jared C. Smolens wrote:
> Hi Mrinal,
>
> See inline...
>
> Excerpts From Mrinal Nath <[email protected]>:
> Re: [Simflex] Piranha Dir states: Mrinal Nath <[email protected]>
>> Hi Jared,
>> Thanks a lot for posting the state diagram.
>>
>> However, by going through the code, I noticed that the main memory
>> replies to a read request (which missed in L1 and L2) by returning a
>> "MissReplyWritable" message. And the directory keeps this line in the M
>> state, with the owner set to the requesting L1. (correct me if I am wrong)
>
> Yes, this is the behavior that should be happening.
>
>> So, if I am correct above, then in this particular example the state
>> diagram document is not consistent with the code. i.e. the document (on
>> pg 2) shows a transition from I to S (transition labeled 1) but
>> according to the code, the transition seems to be from I to M.
>>
>> Please let me know if I am on the right track, or I am missing something.
>
> You're right. It looks like the external requests diagram is missing a
> transition from GetS to M, upon receiving a writable external miss reply.
> There's also a transition from S to GetM that's missing. I have updated
> the diagram.
>
>> Also, when the memory replies to L2 for a miss, I would like to
>> allocate the block in the L2 and then pass it on to L1 (currently the
>> block is allocated only in L1). Can this be done? How? Is there any
>> inherent drawback/problem in allocating the line in L2 ?
>
> This can be done by setting up an eviction when the miss reply is
> received (handle_D_ExtGet() is probably the best place) and making sure
> the state/ownership bits are set properly. If the L2 size is similar to
> the aggregate L1 size, then you may be wasting cache space by replicating
> blocks. If the L2 is big, this will not matter so much.
>
> Also, be aware that Piranha is non-inclusive, so blocks allocated in an
> L1 *may* be in the L2, but that is not guaranteed (the L2 can replace a
> block without telling L1's who may also have a copy). If you require
> inclusion, you'll have to do a lot of pen-and-paper work to make sure you
> can guarantee that.
>
> Cheers,
>
> Jared
>
>> Thanks
>> - Mrinal
>>
>> Jared C. Smolens wrote:
>>> Hi Mrinal,
>>>
>>> I have posted the Flexus CMP Cache Coherence state diagram on the SimFlex
>>> webpage (click on Software). States such as D_S2MW are transient states.
>>> You have the correct interpretation for D_S2MW.
>>>
>>> Jared
>>>
>>> Excerpts From Mrinal Nath <[email protected]>:
>>> [Simflex] Piranha Dir states: Mrinal Nath <[email protected]>
>>>
>>>> Can someone please provide some details about what the PiranhaDirStates
>>>> mean.
>>>>
>>>> I can understand the stable states of D_M, D_O, D_S, D_I.
>>>> I think I also understand some of the other states like D_S2MW (I think
>>>> this means, "going from S to M state due to a write")
>>>>
>>>> But I cannot figure out what most of the other states mean. Some help
>>>> will be very useful, and will also serve as documentation about those
>>>> states.
>>>
>>>
>>>
>>> Jared Smolens ----------- Electrical and Computer Engineering
>>> www.rabidpenguin.org ------------- Carnegie Mellon University
>>> jsmolens AT ece.cmu.edu ------ HH A-313 ------ Pittsburgh, PA
>>>
>
>
> Jared Smolens ----------- Electrical and Computer Engineering
> www.rabidpenguin.org ------------- Carnegie Mellon University
> jsmolens AT ece.cmu.edu ------ HH A-313 ------ Pittsburgh, PA
>
From jsmolens+ at ece.cmu.edu Mon Oct 16 10:38:40 2006
From: jsmolens+ at ece.cmu.edu (Jared C. Smolens)
List-Post: [email protected]
Date: Mon Oct 16 10:38:45 2006
Subject: [Simflex] A question related to sending adding messages on the
Message-ID: <[email protected]>
Hi Mrinal,
Yes, messages like this do need to be on the snoop channel to prevent
races with downgrades, invalidates, and evictions.
Watchdog timeouts occur for two general reasons: deadlock or absurdly
long request-reply latencies.
Deadlocks are most likely due to resource leaks or protocol bugs. I
doubt you have introduced a protocol bug with your change, but it's
relatively easy to forget to free a resource. If I had to guess, I'd
look at MAF entries. If allocated, these must be freed along all return
paths from the cache, even if you don't send an ack. See below for
debugging techniques.
The latter is unlikely, but easy to test: simply extend the timeout and
see if things eventually pick up again. If so, you might want to look
for long queueing delays. 20K cycles in a CMP is a reasonable upper bound
here.
My general workflow for solving deadlocks is to identify the first
processor that has stalled and figure out what physical address request
it is waiting on (you may have to compile with a higher debugging mode,
such as -trace or -verb to determine this). I then re-run the job with
trace-level debugging to determine what happened to that message. Tip:
once you know the address, you can significantly speed this up by setting
the -$CACHE_NAME:trace_address option to the DECIMAL value of the
physical address on each level of cache.
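Since the trace_address option wants the address in decimal, a tiny helper
can do the conversion from the hex addresses you see in the trace. This is
plain Python; the exact "-NAME:trace_address=VALUE" syntax below is a guess,
so check how your build parses its options.

```python
# Hypothetical helper for Jared's tip: convert a hex physical address
# into the decimal form the trace_address option expects.
def trace_address_arg(cache_name, hex_addr):
    """Build a -CACHE:trace_address option string (format assumed)."""
    return "-%s:trace_address=%d" % (cache_name, int(hex_addr, 16))

# e.g. an address pulled from a trace dump:
print(trace_address_arg("L2", "0x7fff8040"))
# -L2:trace_address=2147450944
```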
Cheers,
Jared
Jared Smolens ----------- Electrical and Computer Engineering
www.rabidpenguin.org ------------- Carnegie Mellon University
jsmolens AT ece.cmu.edu ------ HH A-313 ------ Pittsburgh, PA