On 13/01/2011, at 23:21, Boris Zbarsky wrote:
> On 1/13/11 4:37 PM, Glenn Maynard wrote:
> 
>> I suspect there's something simpler going on here, though--as you
>> said, copying a 10 MB buffer really should be very quick.
> 
> It's really not that quick, actually.  First, you have to allocate a new 10MB 
> buffer.  Then you have to memcpy into it.  Then you have to free it at some 
> point.  I just wrote a simple test C program that has a single 10MB array 
> initialized and then in a loop allocates a 10MB array, memcpys into it, and 
> then frees the 10MB allocation it just made.  It takes about 5ms per loop 
> iteration to run on my system (fairly high-end laptop that was new in July 
> 2010).  The time is split about 50-50 between the allocation and the memcpy.
> 
> Just to be clear, 2.5ms to copy 10MB means that my CPU is spending about 
> 0.25ns per byte.  It's a 2.66GHz CPU, so that's about 0.66 clock cycles per 
> byte, or about 1.5 bytes per clock cycle.  That's pretty believable if we're 
> having to stall the CPU every so often to wait for RAM.
> 
> Note that a key issue here is that 10MB is larger than half my L3 cache.  If 
> I stick to arrays that are small enough that both source and destination fit 
> in the cache, things are much faster.


Right, and there's no need to duplicate them, nor to occupy, say, 300 MB/s of 
memory bandwidth memcpying frames at 30 fps, nor to waste 5*30 = 150 ms of 
CPU time per second, gratuitously, when it can be avoided.
-- 
Jorge.
