Hi,

On Wed, Apr 8, 2015 at 1:24 PM, Daniel Stone <daniel at fooishbar.org> wrote:
> Hi,
>
> On 8 April 2015 at 10:57, Vasilis Liaskovitis <vliaskov at gmail.com> wrote:
> > I have an issue where st_TexSubImage causes very high CPU load in
> > __memcpy_sse2_unaligned (Mesa 10.1.3, Xorg 1.15.1, radeon driver, HD 7870).
> >
> > Any obvious causes / tips for this? e.g. align textures or use a different
> > format/type? I've tried using GL_BGRA/GL_UNSIGNED_BYTE and
> > GL_BGRA/GL_UNSIGNED_INT_8_8_8_8_REV.
> >
> > __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:85
> > 85      ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
> > (gdb) bt
> > #0  __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:85
> > #1  0x00007fffb572f154 in memcpy (__len=7680, __src=<optimized out>,
> >     __dest=0x7fff5835f800) at /usr/include/x86_64-linux-gnu/bits/string3.h:51
> > #2  st_TexSubImage (ctx=0x1b91420, dims=<optimized out>, texImage=0x1f81710,
> >     xoffset=0, yoffset=0, zoffset=0, width=1920, height=1080, depth=1,
> >     format=32993, type=5121, pixels=0xdacf90, unpack=0x1bad590)
> >     at ../../../../src/mesa/state_tracker/st_cb_texture.c:752
>
> Your source (0xdacf90) is only aligned to a 16-byte boundary, not 32.
> This will cause issues particularly on ARM, where natural alignment is
> required (i.e. 32-byte load/stores must be on 32-byte boundaries). By
> contrast, the destination is already aligned to a 128-byte boundary.
> So fixing the caller, rather than Mesa, should take care of the
> problem.

Thanks for the reply and the observation. I aligned the source to a 32-byte
boundary (and even a 128-byte boundary), but it made no difference. By the
way, I am only using x86_64, not ARM. I believe Intel SSE2 only requires
16-byte alignment, but perhaps I am missing something.

Is this code path in st_TexSubImage using PBOs? I guess it depends on the
driver implementation (radeon in my case)?
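Daniel's point about source alignment can be checked directly from the addresses in the backtraces. Below is a minimal C sketch for diagnosis only; alignment_of() and alloc_pixels() are hypothetical helper names, not Mesa or GL functions. The first reports the largest power-of-two boundary an address sits on (masking with the two's-complement negation isolates the lowest set bit); the second uses C11 aligned_alloc to obtain a client-side source buffer for glTexSubImage2D aligned to, for example, 32 or 128 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: largest power-of-two boundary that addr is aligned
 * to. (~addr + 1) is the two's-complement negation, so the AND keeps only
 * the lowest set bit, e.g. 0xdacf90 -> 0x10 (16). Returns 0 for addr == 0. */
static uintptr_t alignment_of(uintptr_t addr)
{
    return addr & (~addr + 1);
}

/* Hypothetical helper: allocate a pixel buffer whose start is aligned to
 * `align` bytes (must be a power of two). C11 aligned_alloc originally
 * required size to be a multiple of the alignment, so round it up.
 * Release the buffer with free(). */
static void *alloc_pixels(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t rounded = (size + align - 1) & ~(align - 1);
    return aligned_alloc(align, rounded);
}
```

For the addresses quoted above, the source 0xdacf90 is 16-byte aligned, the st_TexSubImage destination 0x7fff5835f800 is 2048-byte aligned, and the PBO-path srcAddr 0x7fffeeecd000 further down is 4096-byte (page) aligned. Since memcpy must accept any alignment, this only shows whether alignment could plausibly matter, not that it does.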
Related: the pboUnpack demo (http://www.songho.ca/opengl/files/pboUnpack.zip) gives:

Transfer Rate: 236.5 MB/s. (59.1 FPS)

Does this sound reasonable for uploading with a PBO? The same
__memcpy_sse2_unaligned bottleneck is observed. Sample perf report output:

28,20%  pboUnpack  libc-2.19.so       [.] __memcpy_sse2_unaligned
16,63%  pboUnpack  pboUnpack          [.] 0x0000000000006542
 6,96%  pboUnpack  [kernel.kallsyms]  [k] clear_page_c_e
 2,52%  pboUnpack  [drm]              [k] drm_mm_insert_node_in_range_generic
 2,10%  pboUnpack  [kernel.kallsyms]  [k] get_page_from_freelist

Backtrace:

__memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:86
86      ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S: No such file or directory.
(gdb) bt
#0  __memcpy_sse2_unaligned () at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:86
#1  0x00007ffff2bddbbd in memcpy (__len=4194304, __src=<optimized out>, __dest=<optimized out>)
    at /usr/include/x86_64-linux-gnu/bits/string3.h:51
#2  memcpy_texture (dimensions=dimensions@entry=2, dstFormat=dstFormat@entry=MESA_FORMAT_B8G8R8A8_UNORM,
    dstRowStride=dstRowStride@entry=4096, dstSlices=dstSlices@entry=0x7fffffffd6e8,
    srcWidth=srcWidth@entry=1024, srcHeight=srcHeight@entry=1024, srcDepth=srcDepth@entry=1,
    srcFormat=srcFormat@entry=32993, srcType=srcType@entry=5121, srcAddr=srcAddr@entry=0x7fffeeecd000,
    srcPacking=srcPacking@entry=0x7ffff7f69180, ctx=<optimized out>)
    at ../../../../src/mesa/main/texstore.c:949
#3  0x00007ffff2be353d in _mesa_texstore_memcpy (srcPacking=0x7ffff7f69180, srcAddr=<optimized out>,
    srcType=5121, srcFormat=32993, srcDepth=<optimized out>, srcHeight=<optimized out>,
    srcWidth=<optimized out>, dstSlices=<optimized out>, dstRowStride=<optimized out>,
    dstFormat=MESA_FORMAT_B8G8R8A8_UNORM, baseInternalFormat=6408, dims=<optimized out>,
    ctx=0x7ffff7f4d010) at ../../../../src/mesa/main/texstore.c:3938
#4  _mesa_texstore (ctx=0x7ffff7f4d010, dims=2, baseInternalFormat=6408,
    dstFormat=MESA_FORMAT_B8G8R8A8_UNORM,
    dstRowStride=4096, dstSlices=0x7fffffffd6e8, srcWidth=1024, srcHeight=1024, srcDepth=1,
    srcFormat=32993, srcType=5121, srcAddr=0x7fffeeecd000, srcPacking=0x7ffff7f69180)
    at ../../../../src/mesa/main/texstore.c:3958
#5  0x00007ffff2be3812 in store_texsubimage (ctx=ctx@entry=0x7ffff7f4d010, texImage=texImage@entry=0x7c8690,
    xoffset=xoffset@entry=0, yoffset=yoffset@entry=0, zoffset=zoffset@entry=0,
    width=1024, height=1024, depth=1, format=32993, type=5121, pixels=0x0,
    packing=0x7ffff7f69180, caller=0x7ffff2d609c7 "glTexSubImage")
    at ../../../../src/mesa/main/texstore.c:4107
#6  0x00007ffff2be3aa5 in _mesa_store_texsubimage (ctx=ctx@entry=0x7ffff7f4d010, dims=<optimized out>,
    texImage=texImage@entry=0x7c8690, xoffset=xoffset@entry=0, yoffset=yoffset@entry=0,
    zoffset=zoffset@entry=0, width=<optimized out>, width@entry=1024, height=<optimized out>,
    height@entry=1024, depth=<optimized out>, depth@entry=1, format=<optimized out>,
    format@entry=32993, type=<optimized out>, type@entry=5121, pixels=<optimized out>,
    pixels@entry=0x0, packing=<optimized out>, packing@entry=0x7ffff7f69180)
    at ../../../../src/mesa/main/texstore.c:4171
#7  0x00007ffff2c3acaa in st_TexSubImage (ctx=0x7ffff7f4d010, dims=<optimized out>, texImage=0x7c8690,
    xoffset=0, yoffset=0, zoffset=0, width=1024, height=1024, depth=1, format=32993,
    type=5121, pixels=0x0, unpack=0x7ffff7f69180)
    at ../../../../src/mesa/state_tracker/st_cb_texture.c:787
#8  0x00007ffff2bce83d in texsubimage (ctx=0x7ffff7f4d010, dims=dims@entry=2, target=3553, level=0,
    xoffset=0, yoffset=0, zoffset=zoffset@entry=0, width=1024, height=1024,
    depth=depth@entry=1, format=format@entry=32993, type=type@entry=5121,
    pixels=pixels@entry=0x0) at ../../../../src/mesa/main/teximage.c:3445
#9  0x00007ffff2bd259c in _mesa_TexSubImage2D (target=<optimized out>, level=<optimized out>,
    xoffset=<optimized out>, yoffset=<optimized out>, width=<optimized out>, height=<optimized out>,
    format=32993, type=5121, pixels=0x0) at ../../../../src/mesa/main/teximage.c:3483

The pixels pointer in st_TexSubImage is 0x0 here, maybe because this is a
PBO-to-texture transfer: with a pixel unpack buffer bound, the pixels
argument is interpreted as a byte offset into that buffer rather than a
client-memory pointer. srcAddr in memcpy_texture() is 0x7fffeeecd000, which
looks sufficiently aligned, but maybe this is not the correct pointer to
look at.

Could there also be a CPU stall/sync issue when mapping a PBO?

A similar pboUnpack/memcpy performance issue was discussed briefly here
recently, with no conclusion:
http://people.freedesktop.org/~cbrill/dri-log/?channel=dri-devel&date=2015-01-01

Thanks,
- Vasilis

>
> Cheers,
> Daniel
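The two backtraces copy the pixels differently. In the st_TexSubImage case, memcpy is called with __len=7680, i.e. 1920 × 4 bytes, exactly one BGRA row, which is consistent with a row-by-row copy. In the PBO case, memcpy_texture issues a single 4194304-byte memcpy (1024 × 1024 × 4), presumably because the source and destination row strides (dstRowStride=4096) both equal the packed row size. The sketch below illustrates that general strided-copy pattern; copy_rows() and mib_per_sec() are hypothetical helpers, not Mesa's memcpy_texture. mib_per_sec() also shows that pboUnpack's figure is consistent with one full 4 MiB texture upload per frame: 4 MiB × 59.1 FPS ≈ 236.4 MiB/s, close to the reported 236.5 MB/s if that tool's "MB" means MiB.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified strided 2D copy (NOT Mesa's memcpy_texture).
 * If both strides equal the packed row size, the image is one contiguous
 * block and a single memcpy suffices; otherwise copy one row at a time,
 * leaving any per-row padding in dst untouched. */
static void copy_rows(unsigned char *dst, size_t dst_stride,
                      const unsigned char *src, size_t src_stride,
                      size_t row_bytes, size_t height)
{
    assert(row_bytes <= dst_stride && row_bytes <= src_stride);
    if (dst_stride == row_bytes && src_stride == row_bytes) {
        memcpy(dst, src, row_bytes * height);   /* one big copy */
        return;
    }
    for (size_t y = 0; y < height; y++)         /* one copy per row */
        memcpy(dst + y * dst_stride, src + y * src_stride, row_bytes);
}

/* Hypothetical helper: throughput in MiB/s for bytes_per_frame at fps. */
static double mib_per_sec(size_t bytes_per_frame, double fps)
{
    return (double)bytes_per_frame * fps / (1024.0 * 1024.0);
}
```

For example, mib_per_sec(1024 * 1024 * 4, 59.1) evaluates to about 236.4.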