Re: Fix catalyst brain damage to speed up Falcon BMS 2x

2013-02-17 Thread Stanislaw Halik

Updated patch following review.


diff --git a/dlls/wined3d/directx.c b/dlls/wined3d/directx.c
index acdcc57..9b37458 100644
--- a/dlls/wined3d/directx.c
+++ b/dlls/wined3d/directx.c
@@ -854,6 +854,11 @@ static void quirk_broken_rgba16(struct wined3d_gl_info *gl_info)
     gl_info->quirks |= WINED3D_QUIRK_BROKEN_RGBA16;
 }
 
+static void quirk_no_ARB_map_buffer_range(struct wined3d_gl_info *gl_info)
+{
+    gl_info->quirks |= WINED3D_QUIRK_BROKEN_MAP_BUFFER_RANGE;
+}
+
 static void quirk_infolog_spam(struct wined3d_gl_info *gl_info)
 {
     gl_info->quirks |= WINED3D_QUIRK_INFO_LOG_SPAM;
@@ -967,6 +972,11 @@ static const struct driver_quirk quirk_table[] =
         quirk_r200_constants,
         "r200 vertex shader constants"
     },
+    {
+        match_fglrx,
+        quirk_no_ARB_map_buffer_range,
+        "Slow on fglrx"
+    }
 };
 
 /* Certain applications (Steam) complain if we report an outdated driver version. In general,
diff --git a/dlls/wined3d/utils.c b/dlls/wined3d/utils.c
index 4b29ec8..65b1f27 100644
--- a/dlls/wined3d/utils.c
+++ b/dlls/wined3d/utils.c
@@ -1681,6 +1681,9 @@ static void apply_format_fixups(struct wined3d_adapter *adapter, struct wined3d_
         gl_info->formats[idx].flags &= ~WINED3DFMT_FLAG_TEXTURE;
     }
 
+    if ((gl_info->quirks & WINED3D_QUIRK_BROKEN_MAP_BUFFER_RANGE) && gl_info->supported[ARB_MAP_BUFFER_RANGE])
+        gl_info->supported[ARB_MAP_BUFFER_RANGE] = FALSE;
+
     /* ATI instancing hack: Although ATI cards do not support Shader Model
      * 3.0, they support instancing. To query if the card supports instancing
      * CheckDeviceFormat() with the special format MAKEFOURCC('I','N','S','T')
diff --git a/dlls/wined3d/wined3d_private.h b/dlls/wined3d/wined3d_private.h
index 45f6b29..37dd9a6 100644
--- a/dlls/wined3d/wined3d_private.h
+++ b/dlls/wined3d/wined3d_private.h
@@ -61,6 +61,7 @@
 #define WINED3D_QUIRK_BROKEN_RGBA16 0x0040
 #define WINED3D_QUIRK_INFO_LOG_SPAM 0x0080
 #define WINED3D_QUIRK_LIMITED_TEX_FILTERING 0x0100
+#define WINED3D_QUIRK_BROKEN_MAP_BUFFER_RANGE   0x0200
 
 /* Texture format fixups */
 



Re: Fix catalyst brain damage to speed up Falcon BMS 2x

2013-02-17 Thread Stanislaw Halik

On 2013-02-16 11:47, Stanislaw Halik wrote:

Going to ask Ben Supnik from Laminar Research (X-Plane developer) and
BCC him, since he has apparently run into the same issue. There's much
info of fglrx woes (not really Linux specific, either) on
http://developer.x-plane.com/


Sorry for the double post, but this might be of interest:

http://developer.x-plane.com/2012/02/three-nasty-bugs/

Overall, searching for "site:developer.x-plane.com radeon" turns up good 
documentation of fglrx brokenness.


-sh





Re: Fix catalyst brain damage to speed up Falcon BMS 2x

2013-02-17 Thread Stanislaw Halik

On 2013-02-16 09:04, Stefan Dösinger wrote:

What you really want to do is figure out why GL_ARB_map_buffer_range
is slow on fglrx, and make sure that the problem is really fglrx
specific. I fixed a number of dynamic buffer performance problems in
the past months, but there are still problems if we're falling back
to draw_strided_slow for some reason, like fixed function material
tracking.


Thanks for reviewing this.

Going to ask Ben Supnik from Laminar Research (an X-Plane developer) and 
BCC him, since he has apparently run into the same issue. There's plenty of 
information about fglrx woes (not really Linux-specific, either) on 
http://developer.x-plane.com/


He has said publicly that he is in contact with AMD, has been friendly 
to OSS by releasing a Linux version of X-Plane, and is an overall cool 
fellow.


Ben, Please help!


Other than being wrong conceptually, you're disabling dynamic buffers
the wrong way: The "proper" way would be to add a quirk to the
quirk_table in directx.c that removes ARB_map_buffer_range from the
list of supported extensions if the driver vendor is AMD.


Like this? Patch attached.

I've run into hard GPU hangs with fglrx 13.2, no VT switch either. This 
helps:


[Software\\Wine\\Direct3D]
"DirectDrawRenderer"="gdi"
"Multisampling"="disabled"
"OffscreenRenderingMode"="fbo"
"UseGLSL"="enabled"

Lack of GLSL disables HDR apparently.

Without GDI, there's some nasty display corruption on FBOs.

Also Catalyst likes to hang display when switching from 3D to 2D and VT 
switch helps.


But with all this busywork, performance is near-native. Catalyst at 
least supports indirect addressing (whatever that means) and doesn't 
choke on > 128 temps... FYI Mesa bug submitted:


https://bugs.freedesktop.org/show_bug.cgi?id=55420

-sh




Fwd: Re: Fix catalyst brain damage to speed up Falcon BMS 2x

2013-02-16 Thread Stanisław Halik

Ben allowed me to forward this email.

Kudos to him for all the knowledge his brainbox contains!

-------- Original Message --------
Subject: Re: Fix catalyst brain damage to speed up Falcon BMS 2x
Date: Sat, 16 Feb 2013 11:20:35 -0500
From: Ben Supnik 
To: Stanislaw Halik 

Hi Guys,

I'm afraid I don't know enough about the _specific_ situation you guys
are seeing.  I can tell you guys a few things from my GL work:

1. The ATI OpenGL Linux team is pretty accessible; do you guys have
anyone in the fglrx beta program?

2. What we found was that for stream-draw buffers that need to be
orphaned, mapped, unmapped and drawn, there was a fixed overhead in the
ATI drivers compared to NV; this 'performance gap' is cross-platform -
both NV and ATI use the same GL stack (more or less) for Windows and
Linux, and we saw the slow-down on both.

3. We originally were using map buffer (not map buffer range) with a
NULL glBufferData to "orphan" the buffer (the equivalent of d3d
map-discard).  I think I tried MBR and it didn't fix it - both were
expensive because the fundamental memory mapping operation was slow.

4. The slowness was in milliseconds, e.g. "this hits our fps by 20% or
30%" - but it wasn't "this is 3x slower because it stalled the GPU."  So
if you're seeing truly face-meltingly bad performance, like a total
pipeline stall, you have a different bug.

5. As a general statement, the original glMapBuffer is subject to a lot
of heuristic behavior in the drivers; app developers are very fast and
loose with how they use it, so the driver vendors tend to try to make it
do the fastest, most useful, least crash-y thing because the apps use it
like monkeys on type-writers.  By comparison, MBR came out later and has
much more specific semantics for particular optimizations, as a result,
the MBR implementation will often do exactly what you say, _even_ if
it's slower.  Getting even one flag wrong in MBR can cause it to hit a
face-meltingly slow path.
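
To make point 5 concrete, here is a minimal sketch of how D3D-style lock flags might be translated into glMapBufferRange() access bits. The helper name map_access_from_lock_flags is hypothetical, and the flag mapping is an assumption for illustration rather than code from wined3d or X-Plane. The constants are defined locally (D3D values as in d3d9types.h, GL values as in the ARB_map_buffer_range spec) so the sketch compiles without GL or D3D headers.

```c
#include <assert.h>

/* Direct3D 9 lock flags, values as in d3d9types.h. */
#define D3DLOCK_READONLY    0x00000010u
#define D3DLOCK_NOOVERWRITE 0x00001000u
#define D3DLOCK_DISCARD     0x00002000u

/* GL_ARB_map_buffer_range access bits, values as in the extension spec. */
#define GL_MAP_READ_BIT              0x0001u
#define GL_MAP_WRITE_BIT             0x0002u
#define GL_MAP_INVALIDATE_BUFFER_BIT 0x0008u
#define GL_MAP_UNSYNCHRONIZED_BIT    0x0020u

/* Hypothetical helper: choose glMapBufferRange() access bits for a lock. */
static unsigned int map_access_from_lock_flags(unsigned int lock_flags)
{
    unsigned int access;

    if (lock_flags & D3DLOCK_READONLY)
        access = GL_MAP_READ_BIT;
    else if (lock_flags & D3DLOCK_DISCARD)
        /* Orphan the old storage, like glBufferData(..., NULL, ...). */
        access = GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT;
    else if (lock_flags & D3DLOCK_NOOVERWRITE)
        /* Caller promises not to touch data the GPU is still reading. */
        access = GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT;
    else
        /* Plain lock: old contents must stay visible, so the driver may sync. */
        access = GL_MAP_READ_BIT | GL_MAP_WRITE_BIT;

    /* The spec rejects READ combined with INVALIDATE or UNSYNCHRONIZED. */
    assert(!(access & GL_MAP_READ_BIT)
            || !(access & (GL_MAP_INVALIDATE_BUFFER_BIT | GL_MAP_UNSYNCHRONIZED_BIT)));
    return access;
}
```

Only the last branch can force the driver to wait for the GPU, so a dynamic buffer that keeps ending up there would be one candidate for the kind of slow path described above.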

We worked around the perf cost of mapping a buffer on ATI hw by using
pinned memory (but we do have a Linux-only bug where we get corrupt
geometry with pinned memory - it works on Windows); I have some todo
items to investigate the problem more thoroughly now.

Cheers
Ben

On 2/16/13 5:47 AM, Stanislaw Halik wrote:

On 2013-02-16 09:04, Stefan Dösinger wrote:

What you really want to do is figure out why GL_ARB_map_buffer_range
is slow on fglrx, and make sure that the problem is really fglrx
specific. I fixed a number of dynamic buffer performance problems in
the past months, but there are still problems if we're falling back
to draw_strided_slow for some reason, like fixed function material
tracking.


Thanks for reviewing this.

Going to ask Ben Supnik from Laminar Research (X-Plane developer) and
BCC him, since he has apparently run into the same issue. There's much
info of fglrx woes (not really Linux specific, either) on
http://developer.x-plane.com/

He said publicly to be in contact with AMD themselves, and been friendly
to OSS by releasing an X-Plane Linux version, as well as overall cool
fellow.

Ben, Please help!


Other than being wrong conceptually, you're disabling dynamic buffers
the wrong way: The "proper" way would be to add a quirk to the
quirk_table in directx.c that removes ARB_map_buffer_range from the
list of supported extensions if the driver vendor is AMD.


Like this? Patch attached.

I've run into hard GPU hangs with fglrx 13.2, no VT switch either. This
helps:

[Software\\Wine\\Direct3D]
"DirectDrawRenderer"="gdi"
"Multisampling"="disabled"
"OffscreenRenderingMode"="fbo"
"UseGLSL"="enabled"

Lack of GLSL disables HDR apparently.

Without GDI, there's some nasty display corruption on FBOs.

Also Catalyst likes to hang display when switching from 3D to 2D and VT
switch helps.

But with all this busywork, performance is near-native. Catalyst at
least supports indirect addressing (whatever that means) and doesn't
choke on > 128 temps... FYI Mesa bug submitted:

https://bugs.freedesktop.org/show_bug.cgi?id=55420

-sh



--
Scenery Home Page: http://scenery.x-plane.com/
Scenery blog: http://www.x-plane.com/blog/
Plugin SDK: http://www.xsquawkbox.net/xpsdk/
X-Plane Wiki: http://wiki.x-plane.com/
Scenery mailing list: x-plane-scen...@yahoogroups.com
Developer mailing list: x-plane-...@yahoogroups.com






Re: Fix catalyst brain damage to speed up Falcon BMS 2x

2013-02-16 Thread Stanisław Halik

On 2013-02-16 15:27, Henri Verbeet wrote:

The driver essentially runs out of GPRs to run the shader. It's
probably a combination of lack of optimization and a shader that isn't
so great to begin with. Though if this is with the LLVM shader backend
for r600g you'll probably just want to disable that, in general you
get better results without.


[Speaking out of my nether region:]

Does it use registers at all due to lack of indirect addressing?




Re: Fix catalyst brain damage to speed up Falcon BMS 2x

2013-02-16 Thread Henri Verbeet
On 16 February 2013 14:52, Stefan Dösinger wrote:
> I'm not sure what you mean by this. The mesa bug is incomprehensible as well, 
> it isn't clear what exactly fails. I guess some shader doesn't compile, it 
> might be helpful to attach the failing shader from Wine's debug output.
>
The driver essentially runs out of GPRs to run the shader. It's
probably a combination of lack of optimization and a shader that isn't
so great to begin with. Though if this is with the LLVM shader backend
for r600g you'll probably just want to disable that, in general you
get better results without.




Re: Fix catalyst brain damage to speed up Falcon BMS 2x

2013-02-16 Thread Stanisław Halik

On 2013-02-16 14:52, Stefan Dösinger wrote:

Lack of GLSL disables HDR apparently. But enabling it changes FPS from 70 to 90.

You want GLSL on AMD GPUs because the ARB shader extensions support only Shader 
Model 2.0. I don't see the problem with changing the fps from 70 to 90 either 
:-) . Did you accidentally swap the numbers?


Yes.


Without GDI, there's some nasty display corruption on FBOs.

Uh, unless this is a DirectDraw / d3d7 application, the DirectDrawRenderer 
setting won't affect it. And if it is such an app, UseGLSL won't have any real 
effect. Are you sure about this?


Yes. Falcon BMS is D3D7 from Falcon 4.0 ported to D3D9. Falcon has a 
complex history. Corporate acquisitions, code leaks, etc.


If you want, I can give you the required files and installation 
instructions. Always willing to shift the work onto someone else ;-)


Especially since the number of triangles I have drawn using GL/D3D is nada.


I'm not sure what you mean by this. The mesa bug is incomprehensible as well, 
it isn't clear what exactly fails. I guess some shader doesn't compile, it 
might be helpful to attach the failing shader from Wine's debug output.


There's a log attached to the Mesa bug. It is Lisp-syntax output from the 
LLVM GLSL decompiler: BMS sets tons of properties, using a separate temp 
for each, and as a result uses over 128 temps. LLVM fails to optimize this 
because of indirect addressing. Context is missing because it was discussed 
on freenode/#radeon.


As for the rest (temps, indirect addressing), don't know, not a GL 
programmer, sorry.


As for the patch getting in, please see:

http://lists.apple.com/archives/mac-opengl/2010/Feb/msg00026.html
http://lists.apple.com/archives/mac-opengl/2010/Jul/msg00057.html
http://lists.apple.com/archives/mac-opengl/2010/Feb/msg00027.html

I can't really back this up, I'm no GL programmer! But see also the 
original bug report:


http://bugs.winehq.org/show_bug.cgi?id=29071

Unless Ben replies (you might also want to contact him, he's an X-Plane 
developer), the chances of getting this in are bleak. But what about a 
registry option for disabling it? Or an environment variable? Or whatever? 
Maintaining local changes is always hard for my cynical self ;-)
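
A minimal sketch of what such an opt-out could look like, assuming a cut-down stand-in for struct wined3d_gl_info and a hypothetical environment variable named WINED3D_DISABLE_MAP_BUFFER_RANGE. Wine does not define this variable. The sketch only illustrates the idea of a user-controlled override applied after extension detection.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the real wined3d definitions (assumptions for illustration). */
typedef int BOOL;
#define TRUE 1
#define FALSE 0

enum wined3d_gl_extension { ARB_MAP_BUFFER_RANGE, WINED3D_GL_EXT_COUNT };

struct wined3d_gl_info
{
    BOOL supported[WINED3D_GL_EXT_COUNT];
};

/* Hypothetical variable name; not something Wine reads. */
#define DISABLE_MBR_ENV "WINED3D_DISABLE_MAP_BUFFER_RANGE"

/* Clear the extension only when the override value is exactly "1". */
static void apply_mbr_override(struct wined3d_gl_info *gl_info, const char *value)
{
    assert(gl_info);
    if (value && !strcmp(value, "1"))
        gl_info->supported[ARB_MAP_BUFFER_RANGE] = FALSE;
}

/* Convenience wrapper that takes the value from the environment. */
static void apply_mbr_override_from_env(struct wined3d_gl_info *gl_info)
{
    apply_mbr_override(gl_info, getenv(DISABLE_MBR_ENV));
}
```

Keeping the parsing separate from getenv() makes the same function usable for a registry value, which is where the other Direct3D settings in this thread live.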


-sh




Re: Fix catalyst brain damage to speed up Falcon BMS 2x

2013-02-16 Thread Stefan Dösinger
Hi,

The patch is conceptually wrong; tweaking how you disable ARB_map_buffer_range 
won't improve the odds of it getting in. As I said, you want to make sure that 
the problem is indeed fglrx-specific (i.e., test on other cards and see if 
map_buffer_range has a negative impact there), and if it is, we have to find 
out why it is slow in this case, rather than just disabling it.

ARB_map_buffer_range works as intended on fglrx in many other applications (e.g. 
Half Life 2, 3DMark 2000 and 2001, UT2004, World in Conflict, …). Disabling it 
because it slows down one application is not the correct approach.

I'll still address some of the points below as it may help you with the wined3d 
code:

On 16.02.2013 at 12:18, Stanislaw Halik wrote:
>> Other than being wrong conceptually, you're disabling dynamic buffers
>> the wrong way: The "proper" way would be to add a quirk to the
>> quirk_table in directx.c that removes ARB_map_buffer_range from the
>> list of supported extensions if the driver vendor is AMD.
> 
> Like this? Patch attached.
No, you can just set gl_info->supported[ARB_MAP_BUFFER_RANGE] to FALSE. See 
e.g. quirk_amd_dx9().
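
A minimal sketch of the difference, assuming a cut-down stand-in for struct wined3d_gl_info (the real structure has many more fields). The function names quirk_two_step, fixup_two_step and quirk_one_step are hypothetical and exist only for this comparison. The first two mirror the attached patch, the third mirrors the suggestion above.

```c
#include <assert.h>

/* Stand-ins for the real wined3d definitions (assumptions for illustration). */
typedef int BOOL;
#define TRUE 1
#define FALSE 0
#define WINED3D_QUIRK_BROKEN_MAP_BUFFER_RANGE 0x0200

enum wined3d_gl_extension { ARB_MAP_BUFFER_RANGE, WINED3D_GL_EXT_COUNT };

struct wined3d_gl_info
{
    unsigned int quirks;
    BOOL supported[WINED3D_GL_EXT_COUNT];
};

/* Patch, step 1: the quirk only records a flag. */
static void quirk_two_step(struct wined3d_gl_info *gl_info)
{
    gl_info->quirks |= WINED3D_QUIRK_BROKEN_MAP_BUFFER_RANGE;
}

/* Patch, step 2: a later fixup turns the flag into a disabled extension. */
static void fixup_two_step(struct wined3d_gl_info *gl_info)
{
    if ((gl_info->quirks & WINED3D_QUIRK_BROKEN_MAP_BUFFER_RANGE)
            && gl_info->supported[ARB_MAP_BUFFER_RANGE])
        gl_info->supported[ARB_MAP_BUFFER_RANGE] = FALSE;
}

/* Suggested form: the quirk clears the extension directly. */
static void quirk_one_step(struct wined3d_gl_info *gl_info)
{
    assert(gl_info);
    gl_info->supported[ARB_MAP_BUFFER_RANGE] = FALSE;
}
```

Both variants end with the extension reported as unsupported. The one-step form needs neither the new WINED3D_QUIRK_* bit nor the extra check in utils.c.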

> Lack of GLSL disables HDR apparently. But enabling it changes FPS from 70 to 
> 90.
You want GLSL on AMD GPUs because the ARB shader extensions support only Shader 
Model 2.0. I don't see the problem with changing the fps from 70 to 90 either 
:-) . Did you accidentally swap the numbers?

> Without GDI, there's some nasty display corruption on FBOs.
Uh, unless this is a DirectDraw / d3d7 application, the DirectDrawRenderer 
setting won't affect it. And if it is such an app, UseGLSL won't have any real 
effect. Are you sure about this?

In a ddraw app, setting DirectDrawRenderer = gdi will disable any 3D 
acceleration.

> But with all this busywork, performance is near-native. Catalyst at
> least supports indirect addressing (whatever that means) and doesn't
> choke on > 128 temps... FYI Mesa bug submitted:
I'm not sure what you mean by this. The mesa bug is incomprehensible as well, 
it isn't clear what exactly fails. I guess some shader doesn't compile, it 
might be helpful to attach the failing shader from Wine's debug output.





Re: Fix catalyst brain damage to speed up Falcon BMS 2x

2013-02-16 Thread Stanislaw Halik

Interesting, Google for:

x-plane ARB_map_buffer_range "Ben Supnik"

Sorry for double post. Also BCC'ing Ben. If Ben wishes to make himself 
seen, he will :)






Re: Fix catalyst brain damage to speed up Falcon BMS 2x

2013-02-16 Thread Stanislaw Halik
Resubmitting. Spam filter ate my message. Really sorry if you receive 
this twice!


On 2013-02-16 09:04, Stefan Dösinger wrote:

What you really want to do is figure out why GL_ARB_map_buffer_range
is slow on fglrx, and make sure that the problem is really fglrx
specific. I fixed a number of dynamic buffer performance problems in
the past months, but there are still problems if we're falling back
to draw_strided_slow for some reason, like fixed function material
tracking.


Thanks for reviewing this.

Going to ask Ben Supnik from Laminar Research (an X-Plane developer) and
BCC him, since he has apparently run into the same issue. There's plenty of
information about fglrx woes (not really Linux-specific, either) on
http://developer.x-plane.com/

He has said publicly that he is in contact with AMD, has been friendly
to OSS by releasing a Linux version of X-Plane, and is an overall cool
fellow.

Ben, Please help!

http://developer.x-plane.com/2012/02/three-nasty-bugs/

Overall "site:developer.x-plane.com radeon" is nice documentation for 
fglrx brokenness.



Other than being wrong conceptually, you're disabling dynamic buffers
the wrong way: The "proper" way would be to add a quirk to the
quirk_table in directx.c that removes ARB_map_buffer_range from the
list of supported extensions if the driver vendor is AMD.


Like this? Patch attached.

I've run into hard GPU hangs with fglrx 13.2, no VT switch either. This
helps:

[Software\\Wine\\Direct3D]
"DirectDrawRenderer"="gdi"
"Multisampling"="disabled"
"OffscreenRenderingMode"="fbo"
"UseGLSL"="enabled"

Lack of GLSL disables HDR apparently. But enabling it changes FPS from 
70 to 90.


Without GDI, there's some nasty display corruption on FBOs.

Also Catalyst likes to hang display when switching from 3D to 2D and VT
switch helps.

But with all this busywork, performance is near-native. Catalyst at
least supports indirect addressing (whatever that means) and doesn't
choke on > 128 temps... FYI Mesa bug submitted:

https://bugs.freedesktop.org/show_bug.cgi?id=55420

-sh




Re: Fix catalyst brain damage to speed up Falcon BMS 2x

2013-02-16 Thread Stefan Dösinger

On 16.02.2013 at 08:39, Stanisław Halik wrote:
> +dynamic_buffer_ok = gl_info->supported[APPLE_FLUSH_BUFFER_RANGE] ||
> + (!gl_info->[GL_AMDX_debug_output] && gl_info->supported[ARB_MAP_BUFFER_RANGE]);
What you really want to do is figure out why GL_ARB_map_buffer_range is slow on 
fglrx, and make sure that the problem is really fglrx specific. I fixed a 
number of dynamic buffer performance problems in the past months, but there are 
still problems if we're falling back to draw_strided_slow for some reason, like 
fixed function material tracking.

If you absolutely have to do some fglrx hacks for performant dynamic buffers, 
look into GL_AMD_pinned_memory. I'd prefer not to use this extension though.

Other than being wrong conceptually, you're disabling dynamic buffers the wrong 
way: The "proper" way would be to add a quirk to the quirk_table in directx.c 
that removes ARB_map_buffer_range from the list of supported extensions if the 
driver vendor is AMD.
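
A minimal sketch of the table-driven approach described above, assuming simplified stand-ins for wined3d's structures. The vendor-string match (match_amd_vendor) and the apply_quirks walker are hypothetical simplifications. The real match functions in directx.c take more inputs than the GL vendor string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the real wined3d definitions (assumptions for illustration). */
typedef int BOOL;
#define TRUE 1
#define FALSE 0

enum wined3d_gl_extension { ARB_MAP_BUFFER_RANGE, WINED3D_GL_EXT_COUNT };

struct wined3d_gl_info
{
    BOOL supported[WINED3D_GL_EXT_COUNT];
};

struct driver_quirk
{
    BOOL (*match)(const char *gl_vendor);
    void (*apply)(struct wined3d_gl_info *gl_info);
    const char *description;
};

/* Hypothetical match: treat a vendor string naming ATI as the AMD driver. */
static BOOL match_amd_vendor(const char *gl_vendor)
{
    return strstr(gl_vendor, "ATI Technologies") != NULL;
}

/* The quirk removes the extension from the supported list. */
static void quirk_no_map_buffer_range(struct wined3d_gl_info *gl_info)
{
    gl_info->supported[ARB_MAP_BUFFER_RANGE] = FALSE;
}

static const struct driver_quirk quirk_table[] =
{
    {match_amd_vendor, quirk_no_map_buffer_range, "ARB_map_buffer_range disabled on AMD"},
};

/* Apply every matching quirk; returns how many were applied. */
static unsigned int apply_quirks(struct wined3d_gl_info *gl_info, const char *gl_vendor)
{
    unsigned int i, applied = 0;

    assert(gl_info && gl_vendor);
    for (i = 0; i < sizeof(quirk_table) / sizeof(*quirk_table); ++i)
    {
        if (!quirk_table[i].match(gl_vendor))
            continue;
        quirk_table[i].apply(gl_info);
        ++applied;
    }
    return applied;
}
```

With this layout the rest of the code only ever checks gl_info->supported[], so a driver-specific workaround stays confined to one table entry.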