Re: [Xenomai] First call to rt_timer_tsc() causes an unexpected switch to secondary mode.

Tom Evans Thu, 16 Oct 2014 23:51:12 -0700

On 17/10/14 16:34, Gilles Chanteperdrix wrote:

On Fri, Oct 17, 2014 at 10:14:44AM +1100, Tom Evans wrote:


I think we're way off topic here. Should we stop?

Work out how many pixels per second you're processing and then
compare it to the memory bandwidth. You may be surprised at how slow
the memory system is.


The memory was a DDR3 running at 533/1066 MHz. I would not call that
slow. Given the fact that:
- there were two interleaved banks
- each bank processes 2 bytes at every half tick
that would be 4 Gbytes/sec.

That has to be slow. Measure your memcpy() speed and see how many MBytes/sec you're getting.
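A rough sketch of that comparison, assuming a POSIX system with clock_gettime() and a GCC-compatible compiler (the empty asm statement is only there to stop the copy loop from being optimised away). The function names peak_bytes_per_sec() and memcpy_mbytes_per_sec() are made up for this example. The first one just redoes the arithmetic quoted above: 533 MHz x 2 transfers per tick x 2 bytes x 2 banks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Theoretical peak in bytes/s:
 * clock * transfers per clock * bytes per transfer * banks. */
uint64_t peak_bytes_per_sec(uint64_t clock_hz, unsigned transfers_per_clock,
                            unsigned bytes_per_transfer, unsigned banks)
{
    return clock_hz * transfers_per_clock * bytes_per_transfer * banks;
}

/* Time `reps` calls of memcpy() over `len` bytes and return the copy rate
 * in MBytes/s (1 MByte = 1e6 bytes). Returns -1.0 if the buffers cannot
 * be allocated or the elapsed time reads as zero. */
double memcpy_mbytes_per_sec(size_t len, unsigned reps)
{
    assert(len > 0 && reps > 0);

    unsigned char *src = malloc(len);
    unsigned char *dst = malloc(len);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers first so page faults are not part of the timing. */
    memset(src, 0xA5, len);
    memset(dst, 0, len);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned i = 0; i < reps; i++) {
        memcpy(dst, src, len);
        /* Compiler barrier (GCC/Clang syntax): keeps every copy in place. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(src);
    free(dst);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        return -1.0;
    return ((double)len * (double)reps) / secs / 1e6;
}
```

Call memcpy_mbytes_per_sec() with a buffer much larger than the L2 cache (say 8 MBytes) and compare the result with peak_bytes_per_sec(533000000, 2, 2, 2) / 1e6, which is 4264 MBytes/sec. Keep in mind that every byte copied is one read plus one write, so the bus traffic is twice the figure reported.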

You're working your way through memory, possibly linearly, which SHOULD keep the memory pages open (and give you some speed), but can't.


What is happening at "the code level" for a memcpy() is:

1 - Read a word or 16 into the CPU from one address,
2 - Write them out to another address,
3 - Repeat until done.

What is actually happening in the hardware is:

1 - Read a word or 16 into the CPU from one address,
1a - Pick a RANDOM cache line to evict to make room,
1b - Write the data from that cache line to memory,
1c - Whoops, wrong DDR3 page, close the previous page and open THAT one,
1d - Read the data into that cache line,
1e - Whoops, wrong DDR3 page, close the previous page and open THAT one,
1f - Read from the cache into the CPU,
2 - Write them out to another address,
2a - Pick a RANDOM cache line to evict to make room,
2b - Whoops, wrong DDR3 page, close the page and open THAT one,
2c - Write the data from that cache line to memory,
2d - Read the data into that cache line,
2e - Whoops, wrong DDR3 page, close the previous page and open THAT one,
2f - Write from the CPU into that cache line
3 - Repeat until done.

The DDR3 can't keep on the same page and that slows it down. Opening the new page takes hugely longer than the double-clocked burst transfer.
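To put a rough shape on that, here is a toy model, not a description of any real memory controller. It assumes each burst either hits the open page (costing only the burst time) or misses it (costing an extra close-and-open penalty). The function name effective_bytes_per_ns() and all four parameters are invented for this sketch; the real timings have to come from the DDR3 datasheet or from measurement.

```c
#include <assert.h>

/* Toy model of DRAM throughput under page misses.
 * burst_bytes   : bytes moved per burst
 * burst_ns      : time for the burst transfer itself
 * page_open_ns  : extra time to close one page and open another
 * miss_rate     : fraction of bursts that land on the wrong page (0..1)
 * Returns the average rate in bytes per nanosecond (numerically GBytes/s). */
double effective_bytes_per_ns(double burst_bytes, double burst_ns,
                              double page_open_ns, double miss_rate)
{
    assert(burst_bytes > 0.0 && burst_ns > 0.0);
    assert(page_open_ns >= 0.0);
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);

    /* Average cost of one burst: the transfer plus the expected penalty. */
    return burst_bytes / (burst_ns + miss_rate * page_open_ns);
}
```

With made-up numbers of 16 bytes per burst, 8 ns per burst and a 24 ns page-open penalty, the model gives 2 bytes/ns when every access hits the open page and 0.5 bytes/ns when every access misses, so a quarter of the rate. The exact ratio depends entirely on the timings you plug in.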

Using Neon gets rid of one redundant read, but the writes still have to evict cache lines.

It might be better to FLUSH the entire cache, perform an L2-sized transfer and then flush it again. The flushes *might* be to linear addresses in open pages.

Otherwise it might be worth burst-reading to static RAM inside the CPU and then burst-writing that, again possibly with full (or specific) cache flushes.

I got my fastest memcpy() speed on an MCF5329 by reading 2k to the stack (in static RAM in the CPU) and then writing that back out. Copying twice was a LOT faster than any other method.
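The shape of that copy-twice approach is roughly the following. This is a portable sketch only, not the MCF5329 code: staged_copy() and STAGE_BYTES are names made up here, and on an ordinary system the stack lives in normal cached DRAM, so this version shows the structure without promising any speedup. The gain depends on the staging buffer sitting in on-chip static RAM, which is a platform-specific linker or startup setting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STAGE_BYTES 2048  /* 2k staging buffer, as in the description above */

/* Copy `len` bytes from src to dst in two hops through a small stack
 * buffer: read a chunk into the buffer, then write that chunk back out.
 * The source and destination regions must not overlap. */
void staged_copy(void *dst, const void *src, size_t len)
{
    unsigned char stage[STAGE_BYTES];
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);

    while (len > 0) {
        size_t chunk = len < STAGE_BYTES ? len : STAGE_BYTES;

        memcpy(stage, s, chunk);  /* burst-read into the staging buffer */
        memcpy(d, stage, chunk);  /* burst-write from the staging buffer */

        s += chunk;
        d += chunk;
        len -= chunk;
    }
}
```

The point of the two hops is that each one only touches a single DRAM region at a time, so the reads can stay in one open page and the writes in another, instead of ping-ponging between the two on every word.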


Tom



