Hi, I spent a few hours debugging wined3d performance today. No, I found no magic fix for the slowness, just some semi-usable data.
First I wrote a hacky patch to avoid redundant FBO applications. This gave a tiny, tiny performance increase, see http://www.winehq.org/pipermail/wine-devel/2011-April/089832.html.

The main investigation concerned redundant shader applications. The aim was to find out how many of our glBindProgramARB calls re-bind the program that is already bound, and how much this costs. Depending on the game, between 20% and 90% of all BindProgram calls are redundant. I'll attach my debug hack so others can test their own apps. I used ARB shaders for testing because they can apply vertex and fragment programs separately.

This brings up two questions:
(a) How much does this cost?
(b) Why does this happen?

The costs: In my draw overhead tester, hacking out the redundant apply calls improved performance a lot, from about 101 fps to 157 fps. The biggest part of that is the GL calls. With the GL calls removed but the remaining shader logic left in place I get 144 fps.

Unfortunately this does not translate to any performance gains in real apps. I tried to filter out the redundant apply calls in the simplest way possible: track the current value per wined3d_context and check it before calling glBindProgramARB. This gave the 144 fps in the draw overhead tester, but no measurable increase in any other app (I tested StarCraft 2, HL2, Team Fortress 2, World in Conflict and a few others).

Given the number of redundant apply calls and their cost in the draw overhead tester, I had expected at least some improvement. Certainly not a 50% performance increase (the draw overhead tester performs no shader changes at all in the draw loop), but at least a 2-3% gain. So far I have no explanation why I didn't see that.

But why do those redundant apply calls happen? It seems like the state dirtification comes all the way from the stream sources and/or the vertex declaration. STREAMSRC is linked to VDECL, which is linked to VERTEXSHADER, which in turn reapplies the pixel shader. This means redundant vertex and pixel shader applications.
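
To make the chain and the per-context filter concrete, here is a minimal toy sketch in C. It is not the wined3d code: every name in it (toy_context, toy_bind_filtered and so on) is made up for illustration, the "driver" is a stub that only counts calls, and it assumes that program id 0 means "nothing bound yet" and that nobody binds a program behind the cache's back.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, not a wined3d identifier. */
struct toy_context
{
    unsigned int bound_vprog;   /* vertex program last passed to the driver */
    unsigned int bound_fprog;   /* fragment program last passed to the driver */
    unsigned int bind_calls;    /* binds that actually reached the driver */
    unsigned int skipped_binds; /* binds filtered out as redundant */
};

/* Stand-in for glBindProgramARB(); it only counts calls. */
static void toy_driver_bind(struct toy_context *ctx)
{
    ctx->bind_calls++;
}

/* Bind only if the requested program differs from the cached one. */
static void toy_bind_filtered(struct toy_context *ctx, unsigned int *cached, unsigned int id)
{
    assert(ctx != NULL && cached != NULL);
    if (*cached == id)
    {
        ctx->skipped_binds++;
        return;
    }
    *cached = id;
    toy_driver_bind(ctx);
}

/* Like the shader backend entry point in the mail: it always sets both programs. */
static void toy_shader_select(struct toy_context *ctx, unsigned int vs, unsigned int ps)
{
    toy_bind_filtered(ctx, &ctx->bound_fprog, ps);
    toy_bind_filtered(ctx, &ctx->bound_vprog, vs);
}

/* Models the chain: a stream source change dirties the vertex declaration,
 * which dirties the vertex shader, which reselects both shaders. */
static void toy_apply_streamsrc_change(struct toy_context *ctx, unsigned int vs, unsigned int ps)
{
    toy_shader_select(ctx, vs, ps);
}

/* Issue a number of draws, each preceded by a stream source change. */
void toy_run_draws(struct toy_context *ctx, unsigned int draws, unsigned int vs, unsigned int ps)
{
    unsigned int i;

    for (i = 0; i < draws; i++)
        toy_apply_streamsrc_change(ctx, vs, ps);
}
```

Starting from a zeroed toy_context, toy_run_draws(&ctx, 10, 1, 2) lets only the first two binds through and skips the other 18. That is the kind of ratio the debug hack reports for real apps, although the toy says nothing about what a skipped bind is worth in a real driver.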
Separating those states will be a major challenge. The vdecl<->vshader link shouldn't be needed any more, except in rare cases where GL_ARB_vertex_array_bgra is not supported and the application switches one attribute from D3DDECLTYPE_D3DCOLOR to a non-d3dcolor type.

If the vertex shader changes we still have to reparse the vertex declaration and reapply the stream sources, because the vshader determines the stream numbers. Maybe we can reduce the number of times this happens by ordering stream usages and indices so that shaders with compatible input get the same stream ordering.

vdecl and streamsrc are closely related. If the vdecl is changed we have to reapply the stream sources. The other way around shouldn't cause problems though: if only a stream source changes, there's no need to reapply any stream except the changed ones, and no need to reapply the vertex shader.

The vertex and pixel shader are linked for a few reasons: The shader backend API offers only a function to set both. Basic GLSL only offers a function to set both at once (GL_ARB_separate_shader_objects changes that). And even with ARB the pixel shader input may require some changes in the vertex shader output to get Shader Model 3.0 varyings right.

The shader backend API can be changed, but it has to be done in a way that doesn't hurt GLSL without ARB_separate_shader_objects. If we have classic GLSL we have to keep the link. With ARB we can conditionally reapply the vertex shader if the ps_input_signature has changed.

To complicate matters, there are additional states that affect the shaders, like fog, textures and clipping. We don't keep track of those dependencies.

So it's a lot of work to clean up these state dependencies, and we don't know how much it'll gain us :-(

Stefan
diff --git a/dlls/wined3d/arb_program_shader.c b/dlls/wined3d/arb_program_shader.c
index 1197aac..920118b 100644
--- a/dlls/wined3d/arb_program_shader.c
+++ b/dlls/wined3d/arb_program_shader.c
@@ -39,6 +39,15 @@ WINE_DECLARE_DEBUG_CHANNEL(d3d_constants);
WINE_DECLARE_DEBUG_CHANNEL(d3d_caps);
WINE_DECLARE_DEBUG_CHANNEL(d3d);
+unsigned long runs, vs_change, ps_change;
+unsigned long ps_change_ps_trigger;
+
+unsigned long shader_select_caller = 0;
+BOOL ps_dirty, vs_dirty;
+
+DWORD ps_dirtifiers[1024];
+unsigned int num_ps_dirtifiers;
+
/* Extract a line. Note that this modifies the source string. */
static char *get_line(char **ptr)
{
@@ -4575,7 +4584,9 @@ static void shader_arb_select(const struct wined3d_context *context, BOOL usePS,
const struct wined3d_gl_info *gl_info = context->gl_info;
const struct wined3d_state *state = &This->stateBlock->state;
int i;
+ GLint old_prog;
+ runs++;
/* Deal with pixel shaders first so the vertex shader arg function has the input signature ready */
if (usePS)
{
@@ -4589,6 +4600,20 @@ static void shader_arb_select(const struct wined3d_context *context, BOOL usePS,
priv->current_fprogram_id = compiled->prgId;
priv->compiled_fprog = compiled;
+ GL_EXTCALL(glGetProgramivARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_BINDING_ARB, &old_prog));
+ if(priv->current_fprogram_id != old_prog)
+ {
+ ps_change++;
+ }
+ else
+ {
+ unsigned a;
+ ERR("redundant ps apply triggered by %u states\n", num_ps_dirtifiers);
+ for(a = 0; a < num_ps_dirtifiers; a++)
+ {
+ ERR("%s\n", debug_d3dstate(ps_dirtifiers[a]));
+ }
+ }
/* Bind the fragment program */
GL_EXTCALL(glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, priv->current_fprogram_id));
checkGLcall("glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, priv->current_fprogram_id);");
@@ -4647,6 +4672,12 @@ static void shader_arb_select(const struct wined3d_context *context, BOOL usePS,
priv->current_vprogram_id = compiled->prgId;
priv->compiled_vprog = compiled;
+ GL_EXTCALL(glGetProgramivARB(GL_VERTEX_PROGRAM_ARB, GL_PROGRAM_BINDING_ARB, &old_prog));
+ if(priv->current_vprogram_id != old_prog)
+ {
+ vs_change++;
+ }
+
/* Bind the vertex program */
GL_EXTCALL(glBindProgramARB(GL_VERTEX_PROGRAM_ARB, priv->current_vprogram_id));
checkGLcall("glBindProgramARB(GL_VERTEX_PROGRAM_ARB, priv->current_vprogram_id);");
@@ -4675,6 +4706,9 @@ static void shader_arb_select(const struct wined3d_context *context, BOOL usePS,
glDisable(GL_VERTEX_PROGRAM_ARB);
checkGLcall("glDisable(GL_VERTEX_PROGRAM_ARB)");
}
+ num_ps_dirtifiers = 0;
+ ERR("Runs %lu, vs change %lu, ps change %lu. Ratio %f %f\n", runs, vs_change, ps_change,
+ ((float) vs_change) / runs, ((float) ps_change) / runs);
}
/* GL locking is done by the caller */
@@ -6257,12 +6291,15 @@ static void fragment_prog_arbfp(DWORD state_id, struct wined3d_stateblock *state
state_texfactor_arbfp(STATE_RENDER(WINED3DRS_TEXTUREFACTOR), stateblock, context);
state_arb_specularenable(STATE_RENDER(WINED3DRS_SPECULARENABLE), stateblock, context);
} else if(use_pshader && !isStateDirty(context, device->StateTable[STATE_VSHADER].representative)) {
+ ps_dirtifiers[num_ps_dirtifiers++] = STATE_PIXELSHADER;
device->shader_backend->shader_select(context, use_pshader, use_vshader);
+ shader_select_caller = 0;
}
return;
}
if(!use_pshader) {
+ GLint old_prog;
/* Find or create a shader implementing the fixed function pipeline settings, then activate it */
gen_ffp_frag_op(stateblock, &settings, FALSE);
desc = (const struct arbfp_ffp_desc *)find_ffp_frag_shader(&priv->fragment_shaders, &settings);
@@ -6287,6 +6324,9 @@ static void fragment_prog_arbfp(DWORD state_id, struct wined3d_stateblock *state
desc = new_desc;
}
+ GL_EXTCALL(glGetProgramivARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_BINDING_ARB, &old_prog));
+ if(desc->shader != old_prog) ps_change++;
+
/* Now activate the replacement program. GL_FRAGMENT_PROGRAM_ARB is already active(however, note the
* comment above the shader_select call below). If e.g. GLSL is active, the shader_select call will
* deactivate it.
@@ -6318,7 +6358,9 @@ static void fragment_prog_arbfp(DWORD state_id, struct wined3d_stateblock *state
* shader handler
*/
if(!isStateDirty(context, device->StateTable[STATE_VSHADER].representative)) {
+ ps_dirtifiers[num_ps_dirtifiers++] = STATE_PIXELSHADER;
device->shader_backend->shader_select(context, use_pshader, use_vshader);
+ shader_select_caller = 0;
if (!isStateDirty(context, STATE_VERTEXSHADERCONSTANT) && (use_vshader || use_pshader))
stateblock_apply_state(STATE_VERTEXSHADERCONSTANT, stateblock, context);
diff --git a/dlls/wined3d/context.c b/dlls/wined3d/context.c
index 3d07a35..e545923 100644
--- a/dlls/wined3d/context.c
+++ b/dlls/wined3d/context.c
@@ -1156,6 +1156,11 @@ static void Context_MarkStateDirty(struct wined3d_context *context, DWORD state,
DWORD idx;
BYTE shift;
+ if(rep == STATE_PIXELSHADER || rep == STATE_VDECL)
+ {
+ ps_dirtifiers[num_ps_dirtifiers++] = state;
+ }
+
if (isStateDirty(context, rep)) return;
context->dirtyArray[context->numDirtyEntries++] = rep;
@@ -2277,6 +2282,7 @@ BOOL context_apply_draw_state(struct wined3d_context *context, IWineD3DDeviceImp
state_table[rep].apply(rep, device->stateBlock, context);
}
LEAVE_GL();
+ num_ps_dirtifiers = 0;
context->numDirtyEntries = 0; /* This makes the whole list clean */
context->last_was_blit = FALSE;
diff --git a/dlls/wined3d/device.c b/dlls/wined3d/device.c
index 827365a..847e726 100644
--- a/dlls/wined3d/device.c
+++ b/dlls/wined3d/device.c
@@ -7093,6 +7093,10 @@ void IWineD3DDeviceImpl_MarkStateDirty(IWineD3DDeviceImpl *This, DWORD state) {
for (i = 0; i < This->context_count; ++i)
{
context = This->contexts[i];
+ if(rep == STATE_PIXELSHADER || rep == STATE_VDECL)
+ {
+ ps_dirtifiers[num_ps_dirtifiers++] = state;
+ }
if(isStateDirty(context, rep)) continue;
context->dirtyArray[context->numDirtyEntries++] = rep;
diff --git a/dlls/wined3d/state.c b/dlls/wined3d/state.c
index 5138059..2ee7f6e 100644
--- a/dlls/wined3d/state.c
+++ b/dlls/wined3d/state.c
@@ -3715,6 +3715,7 @@ void apply_pixelshader(DWORD state_id, struct wined3d_stateblock *stateblock, st
}
if(!isStateDirty(context, device->StateTable[STATE_VSHADER].representative)) {
+ ps_dirtifiers[num_ps_dirtifiers++] = STATE_PIXELSHADER;
device->shader_backend->shader_select(context, use_pshader, use_vshader);
if (!isStateDirty(context, STATE_VERTEXSHADERCONSTANT) && (use_vshader || use_pshader)) {
@@ -4698,6 +4699,7 @@ static void vertexdeclaration(DWORD state_id, struct wined3d_stateblock *statebl
* application
*/
if (!isStateDirty(context, STATE_PIXELSHADER)) {
+ ps_dirtifiers[num_ps_dirtifiers++] = STATE_VDECL;
device->shader_backend->shader_select(context, usePixelShaderFunction, useVertexShaderFunction);
if (!isStateDirty(context, STATE_VERTEXSHADERCONSTANT) && (useVertexShaderFunction || usePixelShaderFunction)) {
diff --git a/dlls/wined3d/wined3d_private.h b/dlls/wined3d/wined3d_private.h
index 0a9b36a..24d2810 100644
--- a/dlls/wined3d/wined3d_private.h
+++ b/dlls/wined3d/wined3d_private.h
@@ -55,6 +55,9 @@
typedef struct IWineD3DSurfaceImpl IWineD3DSurfaceImpl;
typedef struct IWineD3DDeviceImpl IWineD3DDeviceImpl;
+extern DWORD ps_dirtifiers[1024];
+extern unsigned int num_ps_dirtifiers;
+
/* Texture format fixups */
enum fixup_channel_source
