On 13/01/2011, at 23:21, Boris Zbarsky wrote:

> On 1/13/11 4:37 PM, Glenn Maynard wrote:
>
>> I suspect there's something simpler going on here, though--as you
>> said, copying a 10 MB buffer really should be very quick.
>
> It's really not that quick, actually. First, you have to allocate a new 10MB
> buffer. Then you have to memcpy into it. Then you have to free it at some
> point. I just wrote a simple test C program that has a single 10MB array
> initialized and then in a loop allocates a 10MB array, memcpys into it, and
> then frees the 10MB allocation it just made. It takes about 5ms per loop
> iteration to run on my system (fairly high-end laptop that was new in July
> 2010). The time is split about 50-50 between the allocation and the memcpy.
>
> Just to be clear, 2.5ms to copy 10MB means that my CPU is spending about
> 0.25ns per byte. It's a 2.66GHz CPU, so that's about 0.66 clock cycles per
> byte, or about 1.5 bytes per clock cycle. That's pretty believable if we're
> having to stall the CPU every so often to wait for RAM.
>
> Note that a key issue here is that 10MB is larger than half my L3 cache. If
> I stick to arrays that are small enough that both source and destination fit
> in the cache, things are much faster.
Right, and there's no need to duplicate them, nor to tie up, say, 300 MB/s of memory bandwidth (10 MB × 30 fps) memcpying frames at 30 fps, nor to waste 5 ms × 30 = 150 ms of CPU time per second, gratuitously, when it can be avoided.

-- 
Jorge.