Re: Fix catalyst brain damage to speed up Falcon BMS 2x
Updated patch following review.

diff --git a/dlls/wined3d/directx.c b/dlls/wined3d/directx.c
index acdcc57..9b37458 100644
--- a/dlls/wined3d/directx.c
+++ b/dlls/wined3d/directx.c
@@ -854,6 +854,11 @@ static void quirk_broken_rgba16(struct wined3d_gl_info *gl_info)
     gl_info->quirks |= WINED3D_QUIRK_BROKEN_RGBA16;
 }

+static void quirk_no_ARB_map_buffer_range(struct wined3d_gl_info *gl_info)
+{
+gl_info->quirks |= WINED3D_QUIRK_BROKEN_MAP_BUFFER_RANGE;
+}
+
 static void quirk_infolog_spam(struct wined3d_gl_info *gl_info)
 {
     gl_info->quirks |= WINED3D_QUIRK_INFO_LOG_SPAM;
@@ -967,6 +972,11 @@ static const struct driver_quirk quirk_table[] =
         quirk_r200_constants,
         "r200 vertex shader constants"
     },
+{
+    match_fglrx,
+    quirk_no_ARB_map_buffer_range,
+    "Slow on fglrx"
+}
 };

 /* Certain applications (Steam) complain if we report an outdated driver version. In general,
diff --git a/dlls/wined3d/utils.c b/dlls/wined3d/utils.c
index 4b29ec8..65b1f27 100644
--- a/dlls/wined3d/utils.c
+++ b/dlls/wined3d/utils.c
@@ -1681,6 +1681,9 @@ static void apply_format_fixups(struct wined3d_adapter *adapter, struct wined3d_
         gl_info->formats[idx].flags &= ~WINED3DFMT_FLAG_TEXTURE;
     }

+if ((gl_info->quirks & WINED3D_QUIRK_BROKEN_MAP_BUFFER_RANGE) && gl_info->supported[ARB_MAP_BUFFER_RANGE])
+    gl_info->supported[ARB_MAP_BUFFER_RANGE] = FALSE;
+
     /* ATI instancing hack: Although ATI cards do not support Shader Model
      * 3.0, they support instancing. To query if the card supports instancing
      * CheckDeviceFormat() with the special format MAKEFOURCC('I','N','S','T')
diff --git a/dlls/wined3d/wined3d_private.h b/dlls/wined3d/wined3d_private.h
index 45f6b29..37dd9a6 100644
--- a/dlls/wined3d/wined3d_private.h
+++ b/dlls/wined3d/wined3d_private.h
@@ -61,6 +61,7 @@
 #define WINED3D_QUIRK_BROKEN_RGBA16 0x0040
 #define WINED3D_QUIRK_INFO_LOG_SPAM 0x0080
 #define WINED3D_QUIRK_LIMITED_TEX_FILTERING 0x0100
+#define WINED3D_QUIRK_BROKEN_MAP_BUFFER_RANGE 0x0200

 /* Texture format fixups */
Re: Fix catalyst brain damage to speed up Falcon BMS 2x
On 2013-02-16 11:47, Stanislaw Halik wrote:
> Going to ask Ben Supnik from Laminar Research (X-Plane developer) and
> BCC him, since he has apparently run into the same issue. There's much
> info of fglrx woes (not really Linux specific, either) on
> http://developer.x-plane.com/

Sorry for double-post but this might be of much interest:

http://developer.x-plane.com/2012/02/three-nasty-bugs/

Overall "site:developer.x-plane.com radeon" is nice documentation for fglrx brokenness.

-sh
Re: Fix catalyst brain damage to speed up Falcon BMS 2x
On 2013-02-16 09:04, Stefan Dösinger wrote:
> What you really want to do is figure out why GL_ARB_map_buffer_range is
> slow on fglrx, and make sure that the problem is really fglrx specific.
> I fixed a number of dynamic buffer performance problems in the past
> months, but there are still problems if we're falling back to
> draw_strided_slow for some reason, like fixed function material tracking.

Thanks for reviewing this.

I'm going to ask Ben Supnik from Laminar Research (X-Plane developer) and BCC him, since he has apparently run into the same issue. There's a lot of information about fglrx woes (not really Linux-specific, either) on http://developer.x-plane.com/

He has said publicly that he is in contact with AMD, he has been friendly to OSS by releasing a Linux version of X-Plane, and he is an all-round cool fellow. Ben, please help!

> Other than being wrong conceptually, you're disabling dynamic buffers
> the wrong way: The "proper" way would be to add a quirk to the
> quirk_table in directx.c that removes ARB_map_buffer_range from the
> list of supported extensions if the driver vendor is AMD.

Like this? Patch attached.

I've run into hard GPU hangs with fglrx 13.2, with no VT switch possible either. This helps:

[Software\\Wine\\Direct3D]
"DirectDrawRenderer"="gdi"
"Multisampling"="disabled"
"OffscreenRenderingMode"="fbo"
"UseGLSL"="enabled"

Lack of GLSL disables HDR apparently. Without GDI, there's some nasty display corruption on FBOs. Also, Catalyst likes to hang the display when switching from 3D to 2D, and a VT switch helps.

But with all this busywork, performance is near-native. Catalyst at least supports indirect addressing (whatever that means) and doesn't choke on > 128 temps... FYI Mesa bug submitted: https://bugs.freedesktop.org/show_bug.cgi?id=55420

-sh

diff --git a/dlls/wined3d/directx.c b/dlls/wined3d/directx.c
index acdcc57..9b37458 100644
--- a/dlls/wined3d/directx.c
+++ b/dlls/wined3d/directx.c
@@ -854,6 +854,11 @@ static void quirk_broken_rgba16(struct wined3d_gl_info *gl_info)
     gl_info->quirks |= WINED3D_QUIRK_BROKEN_RGBA16;
 }

+static void quirk_no_ARB_map_buffer_range(struct wined3d_gl_info *gl_info)
+{
+gl_info->quirks |= WINED3D_QUIRK_BROKEN_MAP_BUFFER_RANGE;
+}
+
 static void quirk_infolog_spam(struct wined3d_gl_info *gl_info)
 {
     gl_info->quirks |= WINED3D_QUIRK_INFO_LOG_SPAM;
@@ -967,6 +972,11 @@ static const struct driver_quirk quirk_table[] =
         quirk_r200_constants,
         "r200 vertex shader constants"
     },
+{
+    match_fglrx,
+    quirk_no_ARB_map_buffer_range,
+    "Slow on fglrx"
+}
 };

 /* Certain applications (Steam) complain if we report an outdated driver version. In general,
diff --git a/dlls/wined3d/utils.c b/dlls/wined3d/utils.c
index 4b29ec8..65b1f27 100644
--- a/dlls/wined3d/utils.c
+++ b/dlls/wined3d/utils.c
@@ -1681,6 +1681,9 @@ static void apply_format_fixups(struct wined3d_adapter *adapter, struct wined3d_
         gl_info->formats[idx].flags &= ~WINED3DFMT_FLAG_TEXTURE;
     }

+if ((gl_info->quirks & WINED3D_QUIRK_BROKEN_MAP_BUFFER_RANGE) && gl_info->supported[ARB_MAP_BUFFER_RANGE])
+    gl_info->supported[ARB_MAP_BUFFER_RANGE] = FALSE;
+
     /* ATI instancing hack: Although ATI cards do not support Shader Model
      * 3.0, they support instancing. To query if the card supports instancing
      * CheckDeviceFormat() with the special format MAKEFOURCC('I','N','S','T')
diff --git a/dlls/wined3d/wined3d_private.h b/dlls/wined3d/wined3d_private.h
index 45f6b29..37dd9a6 100644
--- a/dlls/wined3d/wined3d_private.h
+++ b/dlls/wined3d/wined3d_private.h
@@ -61,6 +61,7 @@
 #define WINED3D_QUIRK_BROKEN_RGBA16 0x0040
 #define WINED3D_QUIRK_INFO_LOG_SPAM 0x0080
 #define WINED3D_QUIRK_LIMITED_TEX_FILTERING 0x0100
+#define WINED3D_QUIRK_BROKEN_MAP_BUFFER_RANGE 0x0200

 /* Texture format fixups */
Fwd: Re: Fix catalyst brain damage to speed up Falcon BMS 2x
Ben allowed me to forward this email. Kudos to him for all the knowledge his brainbox contains!

-------- Original Message --------
Subject: Re: Fix catalyst brain damage to speed up Falcon BMS 2x
Date: Sat, 16 Feb 2013 11:20:35 -0500
From: Ben Supnik
To: Stanislaw Halik

Hi Guys,

I'm afraid I don't know enough about the _specific_ situation you guys are seeing. I can tell you guys a few things from my GL work:

1. The ATI OpenGL Linux team is pretty accessible; do you guys have anyone in the fglrx beta program?

2. What we found was that for stream-draw buffers that need to be orphaned, mapped, unmapped and drawn, there was a fixed overhead in the ATI drivers compared to NV; this 'performance gap' is cross-platform - both NV and ATI use the same GL stack (more or less) for Windows and Linux, and we saw the slow-down on both.

3. We originally were using map buffer (not map buffer range) with a NULL glBufferData to "orphan" the buffer (the equivalent of d3d map-discard). I think I tried MBR and it didn't fix it - both were expensive because the fundamental memory mapping operation was slow.

4. The slowness was in milliseconds, e.g. "this hits our fps by 20% or 30%" - but it wasn't "this is 3x slower because it stalled the GPU." So if you're seeing truly face-meltingly bad performance, like a total pipeline stall, you have a different bug.

5. As a general statement, the original glMapBuffer is subject to a lot of heuristic behavior in the drivers; app developers are very fast and loose with how they use it, so the driver vendors tend to try to make it do the fastest, most useful, least crash-y thing because the apps use it like monkeys on type-writers. By comparison, MBR came out later and has much more specific semantics for particular optimizations, as a result, the MBR implementation will often do exactly what you say, _even_ if it's slower. Getting even one flag wrong in MBR can cause it to hit a face-meltingly slow path.

We worked around the perf cost of mapping a buffer on ATI hw by using pinned memory (but we do have a Linux-only bug where we get corrupt geometry with pinned memory - it works on Windows); I have some todo items to investigate the problem more thoroughly now.

Cheers
Ben

On 2/16/13 5:47 AM, Stanislaw Halik wrote:
> On 2013-02-16 09:04, Stefan Dösinger wrote:
>> What you really want to do is figure out why GL_ARB_map_buffer_range is
>> slow on fglrx, and make sure that the problem is really fglrx specific.
>> I fixed a number of dynamic buffer performance problems in the past
>> months, but there are still problems if we're falling back to
>> draw_strided_slow for some reason, like fixed function material tracking.
>
> Thanks for reviewing this.
>
> Going to ask Ben Supnik from Laminar Research (X-Plane developer) and
> BCC him, since he has apparently run into the same issue. There's much
> info of fglrx woes (not really Linux specific, either) on
> http://developer.x-plane.com/
>
> He said publicly to be in contact with AMD themselves, and been friendly
> to OSS by releasing an X-Plane Linux version, as well as overall cool
> fellow. Ben, Please help!
>
>> Other than being wrong conceptually, you're disabling dynamic buffers
>> the wrong way: The "proper" way would be to add a quirk to the
>> quirk_table in directx.c that removes ARB_map_buffer_range from the
>> list of supported extensions if the driver vendor is AMD.
>
> Like this? Patch attached.
>
> I've run into hard GPU hangs with fglrx 13.2, no VT switch either. This
> helps:
>
> [Software\\Wine\\Direct3D]
> "DirectDrawRenderer"="gdi"
> "Multisampling"="disabled"
> "OffscreenRenderingMode"="fbo"
> "UseGLSL"="enabled"
>
> Lack of GLSL disables HDR apparently. Without GDI, there's some nasty
> display corruption on FBOs. Also Catalyst likes to hang display when
> switching from 3D to 2D and VT switch helps.
>
> But with all this busywork, performance is near-native. Catalyst at
> least supports indirect addressing (whatever that means) and doesn't
> choke on > 128 temps... FYI Mesa bug submitted:
> https://bugs.freedesktop.org/show_bug.cgi?id=55420
>
> -sh

--
Scenery Home Page: http://scenery.x-plane.com/
Scenery blog: http://www.x-plane.com/blog/
Plugin SDK: http://www.xsquawkbox.net/xpsdk/
X-Plane Wiki: http://wiki.x-plane.com/
Scenery mailing list: x-plane-scen...@yahoogroups.com
Developer mailing list: x-plane-...@yahoogroups.com
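Ben's points 3 and 5 (map-discard as the D3D counterpart of orphaning, and MBR doing exactly what its flags say) can be illustrated with a small flag translation. This is only a sketch under stated assumptions: the function name and the choice of translation are hypothetical and are not wined3d's actual code; the constant values are copied from the Direct3D 9 and OpenGL headers so that the example compiles without either SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in d3d9types.h and glext.h, repeated here so the
 * sketch builds standalone. */
#define D3DLOCK_READONLY              0x00000010u
#define D3DLOCK_NOOVERWRITE           0x00001000u
#define D3DLOCK_DISCARD               0x00002000u

#define GL_MAP_READ_BIT               0x0001u
#define GL_MAP_WRITE_BIT              0x0002u
#define GL_MAP_INVALIDATE_BUFFER_BIT  0x0008u
#define GL_MAP_UNSYNCHRONIZED_BIT     0x0020u

/* Hypothetical helper: pick glMapBufferRange() access bits for a D3D lock.
 * GL forbids combining GL_MAP_READ_BIT with the invalidate or
 * unsynchronized bits, so the read-only case returns early. */
static uint32_t sketch_map_range_access(uint32_t lock_flags)
{
    uint32_t access;

    if (lock_flags & D3DLOCK_READONLY)
        return GL_MAP_READ_BIT;

    access = GL_MAP_WRITE_BIT;
    if (lock_flags & D3DLOCK_DISCARD)
        access |= GL_MAP_INVALIDATE_BUFFER_BIT;  /* orphan the old storage */
    else if (lock_flags & D3DLOCK_NOOVERWRITE)
        access |= GL_MAP_UNSYNCHRONIZED_BIT;     /* caller promises not to touch in-flight data */
    /* With neither flag the driver has to synchronise with the GPU first. */

    return access;
}
```

The last case is the one to watch: a lock that carries neither DISCARD nor NOOVERWRITE ends up as a plain write mapping, which forces the driver to wait for pending draws. That would be one way to land on the kind of slow path Ben describes, though nothing in this thread confirms that it is what happens here.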
Re: Fix catalyst brain damage to speed up Falcon BMS 2x
On 2013-02-16 15:27, Henri Verbeet wrote:
> The driver essentially runs out of GPRs to run the shader. It's probably
> a combination of lack of optimization and a shader that isn't so great
> to begin with. Though if this is with the LLVM shader backend for r600g
> you'll probably just want to disable that, in general you get better
> results without.

[Speaking out of my nether region:] Does it use registers at all due to lack of indirect addressing?
Re: Fix catalyst brain damage to speed up Falcon BMS 2x
On 16 February 2013 14:52, Stefan Dösinger wrote:
> I'm not sure what you mean by this. The mesa bug is incomprehensible as well,
> it isn't clear what exactly fails. I guess some shader doesn't compile, it
> might be helpful to attach the failing shader from Wine's debug output.
>

The driver essentially runs out of GPRs to run the shader. It's probably a combination of lack of optimization and a shader that isn't so great to begin with. Though if this is with the LLVM shader backend for r600g you'll probably just want to disable that, in general you get better results without.
Re: Fix catalyst brain damage to speed up Falcon BMS 2x
On 2013-02-16 14:52, Stefan Dösinger wrote:
>> Lack of GLSL disables HDR apparently. But enabling it changes FPS from
>> 70 to 90.
> You want GLSL on AMD GPUs because the ARB shader extensions support only
> Shader Model 2.0. I don't see the problem with changing the fps from 70
> to 90 either :-) . Did you accidentally swap the numbers?

Yes.

>> Without GDI, there's some nasty display corruption on FBOs.
> Uh, unless this is a DirectDraw / d3d7 application, the
> DirectDrawRenderer setting won't affect it. And if it is such an app,
> UseGLSL won't have any real effect. Are you sure about this?

Yes. Falcon BMS is the D3D7 code from Falcon 4.0 ported to D3D9. Falcon has a complex history: corporate acquisitions, code leaks, etc.

If you want, I can give you the required files and installation instructions. Always willing to shift the work onto someone else ;-) Especially since the number of triangles I have drawn using GL/D3D is nada.

> I'm not sure what you mean by this. The mesa bug is incomprehensible as
> well, it isn't clear what exactly fails. I guess some shader doesn't
> compile, it might be helpful to attach the failing shader from Wine's
> debug output.

There's a log attached in the bug report: some Lisp-syntax LLVM GLSL decompiler output. BMS tries to set properties, tons of them, using separate temps for each, and as a result uses over 128 temps. LLVM fails to optimize due to indirect addressing. Context is missing because it was discussed on freenode/#radeon.

As for the rest (temps, indirect addressing), I don't know; I'm not a GL programmer, sorry.

As for the patch getting in, please see:

http://lists.apple.com/archives/mac-opengl/2010/Feb/msg00026.html
http://lists.apple.com/archives/mac-opengl/2010/Jul/msg00057.html
http://lists.apple.com/archives/mac-opengl/2010/Feb/msg00027.html

I can't really back this up, I'm no GL programmer! But see also the original bug report:

http://bugs.winehq.org/show_bug.cgi?id=29071

Unless Ben replies (you might also want to contact him, he's an X-Plane developer), chances of getting this in are bleak. But what about a registry option for disabling it? Or an environment variable? Or whatever? Maintaining local changes is always hard for my cynical self ;-)

-sh
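For the "environment variable" idea, a user-controlled switch could look roughly like the sketch below. Everything in it is hypothetical: the variable name is made up for illustration and is not an existing Wine setting, and the helper names are not part of wined3d. The decision logic is kept separate from the getenv() call so it can be exercised without touching the process environment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name, invented for this sketch. */
#define SKETCH_ENV_NAME "SKETCH_WINED3D_NO_MAP_BUFFER_RANGE"

/* Any non-empty value other than "0" asks for the extension to be ignored. */
static bool sketch_override_requests_disable(const char *value)
{
    return value != NULL && value[0] != '\0' && strcmp(value, "0") != 0;
}

/* The extension is used only if the driver offers it and the user has not
 * opted out. */
static bool sketch_use_map_buffer_range(bool driver_supports_it, const char *override_value)
{
    if (sketch_override_requests_disable(override_value))
        return false;
    return driver_supports_it;
}

/* Convenience wrapper that reads the override from the environment. */
static bool sketch_use_map_buffer_range_from_env(bool driver_supports_it)
{
    return sketch_use_map_buffer_range(driver_supports_it, getenv(SKETCH_ENV_NAME));
}
```

An opt-out like this leaves the default behaviour untouched for every other application, which sidesteps the objection that one slow game should not change things for everyone. Whether such a knob would be accepted upstream is a separate question that this thread does not settle.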
Re: Fix catalyst brain damage to speed up Falcon BMS 2x
Hi,

The patch is conceptually wrong; tweaking how you disable ARB_map_buffer_range won't improve the odds of it getting in. As I said, you want to make sure that the problem is indeed fglrx-specific (i.e., test on other cards and see if map_buffer_range has a negative impact there), and if it is, we have to find out why it is slow in this case, rather than just disabling it. ARB_map_buffer_range works as intended on fglrx in many other applications (e.g. Half Life 2, 3DMark 2000 and 2001, UT2004, World in Conflict, …). Disabling it because it slows down one application is not the correct approach.

I'll still address some of the points below as it may help you with the wined3d code:

On 16.02.2013 at 12:18, Stanislaw Halik wrote:
>> Other than being wrong conceptually, you're disabling dynamic buffers
>> the wrong way: The "proper" way would be to add a quirk to the
>> quirk_table in directx.c that removes ARB_map_buffer_range from the
>> list of supported extensions if the driver vendor is AMD.
>
> Like this? Patch attached.

No, you can just set gl_info->supported[ARB_MAP_BUFFER_RANGE] to FALSE. See e.g. quirk_amd_dx9().

> Lack of GLSL disables HDR apparently. But enabling it changes FPS from 70 to
> 90.

You want GLSL on AMD GPUs because the ARB shader extensions support only Shader Model 2.0. I don't see the problem with changing the fps from 70 to 90 either :-) . Did you accidentally swap the numbers?

> Without GDI, there's some nasty display corruption on FBOs.

Uh, unless this is a DirectDraw / d3d7 application, the DirectDrawRenderer setting won't affect it. And if it is such an app, UseGLSL won't have any real effect. Are you sure about this? In a ddraw app, setting DirectDrawRenderer = gdi will disable any 3D acceleration.

> But with all this busywork, performance is near-native. Catalyst at
> least supports indirect addressing (whatever that means) and doesn't
> choke on > 128 temps... FYI Mesa bug submitted:

I'm not sure what you mean by this. The mesa bug is incomprehensible as well, it isn't clear what exactly fails. I guess some shader doesn't compile, it might be helpful to attach the failing shader from Wine's debug output.
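To make the mechanics of that suggestion concrete (a quirk that clears the extension flag directly instead of going through a new quirk bit), here is a minimal standalone sketch. The types, the `sketch_` names and the vendor-string check are cut-down, hypothetical stand-ins and not wined3d's real definitions; the real match functions take more arguments. It only shows the shape of the table walk and does not make disabling the extension any more acceptable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down stand-ins for wined3d's internal types. */
enum sketch_extension
{
    ARB_MAP_BUFFER_RANGE,
    ARB_VERTEX_BUFFER_OBJECT,
    SKETCH_EXTENSION_COUNT
};

struct sketch_gl_info
{
    bool supported[SKETCH_EXTENSION_COUNT];
};

struct sketch_driver_quirk
{
    bool (*match)(const char *gl_vendor);           /* does the quirk apply? */
    void (*apply)(struct sketch_gl_info *gl_info);  /* what it changes */
    const char *description;
};

/* Assumption: the proprietary driver is recognised by its vendor string. */
static bool sketch_match_fglrx(const char *gl_vendor)
{
    return strstr(gl_vendor, "ATI") != NULL;
}

/* Clear the extension flag directly; no separate quirk bit is involved. */
static void sketch_quirk_no_map_buffer_range(struct sketch_gl_info *gl_info)
{
    gl_info->supported[ARB_MAP_BUFFER_RANGE] = false;
}

static const struct sketch_driver_quirk sketch_quirk_table[] =
{
    {sketch_match_fglrx, sketch_quirk_no_map_buffer_range, "map_buffer_range is slow on fglrx"},
};

/* Apply every matching quirk once; returns how many were applied. */
static size_t sketch_apply_quirks(struct sketch_gl_info *gl_info, const char *gl_vendor)
{
    size_t i, applied = 0;

    assert(gl_info && gl_vendor);
    for (i = 0; i < sizeof(sketch_quirk_table) / sizeof(*sketch_quirk_table); ++i)
    {
        if (!sketch_quirk_table[i].match(gl_vendor))
            continue;
        sketch_quirk_table[i].apply(gl_info);
        ++applied;
    }
    return applied;
}
```

Compared with the posted patch, this drops the extra WINED3D_QUIRK_BROKEN_MAP_BUFFER_RANGE bit and the later check in apply_format_fixups(): the quirk callback is the only place that touches the flag.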
Re: Fix catalyst brain damage to speed up Falcon BMS 2x
Interesting, Google for: x-plane ARB_map_buffer_range "Ben Supnik" Sorry for double post. Also BCC'ing Ben. If Ben wishes to make himself seen, he will :)
Re: Fix catalyst brain damage to speed up Falcon BMS 2x
Resubmitting. Spam filter ate my message. Really sorry if you receive this twice!

On 2013-02-16 09:04, Stefan Dösinger wrote:
> What you really want to do is figure out why GL_ARB_map_buffer_range is
> slow on fglrx, and make sure that the problem is really fglrx specific.
> I fixed a number of dynamic buffer performance problems in the past
> months, but there are still problems if we're falling back to
> draw_strided_slow for some reason, like fixed function material tracking.

Thanks for reviewing this.

I'm going to ask Ben Supnik from Laminar Research (X-Plane developer) and BCC him, since he has apparently run into the same issue. There's a lot of information about fglrx woes (not really Linux-specific, either) on http://developer.x-plane.com/

He has said publicly that he is in contact with AMD, he has been friendly to OSS by releasing a Linux version of X-Plane, and he is an all-round cool fellow. Ben, please help!

http://developer.x-plane.com/2012/02/three-nasty-bugs/

Overall "site:developer.x-plane.com radeon" is nice documentation for fglrx brokenness.

> Other than being wrong conceptually, you're disabling dynamic buffers
> the wrong way: The "proper" way would be to add a quirk to the
> quirk_table in directx.c that removes ARB_map_buffer_range from the
> list of supported extensions if the driver vendor is AMD.

Like this? Patch attached.

I've run into hard GPU hangs with fglrx 13.2, with no VT switch possible either. This helps:

[Software\\Wine\\Direct3D]
"DirectDrawRenderer"="gdi"
"Multisampling"="disabled"
"OffscreenRenderingMode"="fbo"
"UseGLSL"="enabled"

Lack of GLSL disables HDR apparently. But enabling it changes FPS from 70 to 90. Without GDI, there's some nasty display corruption on FBOs. Also, Catalyst likes to hang the display when switching from 3D to 2D, and a VT switch helps.

But with all this busywork, performance is near-native. Catalyst at least supports indirect addressing (whatever that means) and doesn't choke on > 128 temps...
FYI Mesa bug submitted: https://bugs.freedesktop.org/show_bug.cgi?id=55420

-sh
Re: Fix catalyst brain damage to speed up Falcon BMS 2x
On 16.02.2013 at 08:39, Stanisław Halik wrote:
> +dynamic_buffer_ok = gl_info->supported[APPLE_FLUSH_BUFFER_RANGE] ||
> +    (!gl_info->[GL_AMDX_debug_output] &&
> gl_info->supported[ARB_MAP_BUFFER_RANGE]);

What you really want to do is figure out why GL_ARB_map_buffer_range is slow on fglrx, and make sure that the problem is really fglrx specific. I fixed a number of dynamic buffer performance problems in the past months, but there are still problems if we're falling back to draw_strided_slow for some reason, like fixed function material tracking.

If you absolutely have to do some fglrx hacks for performant dynamic buffers, look into GL_AMD_pinned_memory. I'd prefer not to use this extension though.

Other than being wrong conceptually, you're disabling dynamic buffers the wrong way: The "proper" way would be to add a quirk to the quirk_table in directx.c that removes ARB_map_buffer_range from the list of supported extensions if the driver vendor is AMD.