This patch series enables resource streamer gather constants for UBOs. With
this feature, UBO fetches are treated as push constants instead of pull
constants. The resource streamer hardware can gather and pack, with minimal
overhead, non-contiguous blocks of constant data from an arbitrary buffer
object (as is the case for UBO sources), so that the push constant state can
treat the gathered constants as one GRF block. I've initially targeted UBOs,
but in theory the same idea applies to any scattered uniform fetch, which I
plan to focus on next.

Tested mostly on Haswell. v2 has been incubating for some time and I believe
I've ironed out most of the major issues in the fs backend. All piglit tests
for fragment shaders are passing. The vec4 backend still needs some
additional fine-tuning, but it also passes all vertex and geometry shader
piglit tests except gs-mat4x3. I've added a new environment flag to select
which shader stages to optimize.

The initial posting, for anyone who needs the original overview of the series:
http://lists.freedesktop.org/archives/mesa-dev/2015-January/073594.html
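As a rough illustration of the gather step described above, here is a small
software model in C of what the resource streamer does in hardware: it copies
scattered blocks out of a source buffer and packs them back to back into one
contiguous destination. The names `struct gather_entry` and
`gather_constants()` are hypothetical and exist only for this sketch; they are
not part of the series or of the i965 driver, and the actual hardware
programming is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for one scattered block of constants. */
struct gather_entry {
   size_t offset; /* byte offset of the block within the source buffer */
   size_t size;   /* size of the block in bytes */
};

/*
 * Software model of a gather: pack the blocks named by 'entries' into
 * 'dst' in list order, with no gaps between them.
 *
 * Returns the number of bytes written, or 0 if any entry falls outside
 * the source buffer or the packed result would not fit in 'dst'.
 */
static size_t
gather_constants(const uint8_t *src, size_t src_size,
                 const struct gather_entry *entries, size_t num_entries,
                 uint8_t *dst, size_t dst_size)
{
   size_t written = 0;

   assert(src != NULL && dst != NULL);

   for (size_t i = 0; i < num_entries; i++) {
      const struct gather_entry *e = &entries[i];

      /* Reject blocks that read past the end of the source buffer. */
      if (e->offset > src_size || e->size > src_size - e->offset)
         return 0;

      /* Reject blocks that would overflow the packed destination. */
      if (e->size > dst_size - written)
         return 0;

      memcpy(dst + written, src + e->offset, e->size);
      written += e->size;
   }

   return written;
}
```

Called with entries at offsets 48, 0 and 16 of a 64-byte buffer, the function
yields a single packed block holding those three ranges in that order, which
is the property that lets the consumer treat the result as one contiguous
block.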

Entire series lives here:
git://people.freedesktop.org/~abj/mesa:rs_gather_constants_NIR

Below are some real-world results from Unreal Engine 4 demos which feature
heavy UBO usage. The benchmark enabled use of gather constants only for the
fragment shaders.

EffectsCave (NIR disabled):
x fs gather constants disabled
+ fs gather constants enabled
    N           Min           Max        Median           Avg        Stddev
x  10        4.6008       4.83961       4.80967      4.791587    0.06943449
+  10       5.05152       5.14954       5.11507      5.106432   0.031042147
Difference at 95.0% confidence
        0.314845 ± 0.0505323
        6.57079% ± 1.0546%

EffectsCave (NIR enabled):
x fs gather constants disabled
+ fs gather constants enabled
    N           Min           Max        Median           Avg        Stddev
x  10       3.99146       4.26072       4.19591      4.157199   0.093623634
+  10       4.51396       4.59149       4.58185      4.574359   0.022251777
Difference at 95.0% confidence
        0.41716 ± 0.0639358
        10.0346% ± 1.53795%

Reflections Subway (NIR disabled):
x fs gather constants disabled
+ fs gather constants enabled
    N           Min           Max        Median           Avg        Stddev
x  10       6.64539       7.28898       7.11371      7.083675    0.19290418
+  10       7.58844       7.66247       7.64003      7.632628   0.022702317
Difference at 95.0% confidence
        0.548953 ± 0.129049
        7.74955% ± 1.82178%

Reflections Subway (NIR enabled):
x fs gather constants disabled
+ fs gather constants enabled
    N           Min           Max        Median           Avg        Stddev
x  10       6.03644       6.19722       6.08858      6.097111   0.062671415
+  10       6.30447        6.4363       6.35115      6.358372   0.043168601
Difference at 95.0% confidence
        0.261261 ± 0.0505605
        4.285% ± 0.829254%

What's changed since initial posting:
* Lots of squashed patches (~50 --> ~30)!
* Use environment variable INTEL_UBO_GATHER=vs,fs,gs to selectively enable
  which shader stage to optimize with this feature.
* NIR support for the fs-backend.
* Remove unrelated fine-grained uniform support which I'll resubmit in a
  separate patch series.

Dependencies:
* You'll need the i915 kernel driver which enables the resource streamer.
  I plan to submit this in a separate patch series to the i915 mailing list:
  git://people.freedesktop.org/~abj/linux:intel_resource_streamer_2
* libdrm with updated headers:
  git://people.freedesktop.org/~abj/libdrm:libdrm_rs

Patch overview:
Patches 1-5:   Enables core resource streamer functionality and
               hardware-generated binding tables
Patches 6-10:  Switches on the hardware bits for gather push constants
Patches 11-16: Core compiler support
Patches 17-20: Support for original i965 fs backend
Patch 19:      Support for NIR fs backend
Patches 21-23: Support for vec4 backend
Patches 24-26: Required state setup and workarounds
Patch 29:      Switch on push constants whenever we have UBO entries.

Signed-off-by: Abdiel Janulgue <abdiel.janul...@linux.intel.com>
---
 src/glsl/nir/nir_types.cpp                    |  11 ++
 src/glsl/nir/nir_types.h                      |   4 +
 .../drivers/dri/i965/brw_binding_tables.c     | 180 +++++++++++++++++-
 src/mesa/drivers/dri/i965/brw_context.c       |  41 ++++
 src/mesa/drivers/dri/i965/brw_context.h       |  36 ++++
 src/mesa/drivers/dri/i965/brw_defines.h       |  47 +++++
 src/mesa/drivers/dri/i965/brw_fs.cpp          |  71 ++++++-
 src/mesa/drivers/dri/i965/brw_fs.h            |   6 +
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp      |  59 ++++++
 src/mesa/drivers/dri/i965/brw_fs_visitor.cpp  |  86 ++++++++-
 src/mesa/drivers/dri/i965/brw_gs.c            |  15 ++
 src/mesa/drivers/dri/i965/brw_program.c       |   5 +
 src/mesa/drivers/dri/i965/brw_shader.cpp      |   4 +-
 src/mesa/drivers/dri/i965/brw_shader.h        |  11 ++
 src/mesa/drivers/dri/i965/brw_state.h         |  19 +-
 src/mesa/drivers/dri/i965/brw_state_upload.c  |   9 +-
 src/mesa/drivers/dri/i965/brw_vec4.cpp        |  62 ++++--
 src/mesa/drivers/dri/i965/brw_vec4.h          |   3 +
 .../drivers/dri/i965/brw_vec4_visitor.cpp     |  80 ++++++++
 src/mesa/drivers/dri/i965/brw_vs.c            |  18 ++
 src/mesa/drivers/dri/i965/brw_wm.c            |  18 ++
 .../drivers/dri/i965/brw_wm_surface_state.c  |   6 +
 src/mesa/drivers/dri/i965/gen6_gs_state.c     |   2 +-
 src/mesa/drivers/dri/i965/gen6_vs_state.c     |  39 +++-
 src/mesa/drivers/dri/i965/gen6_wm_state.c     |   2 +-
 src/mesa/drivers/dri/i965/gen7_blorp.cpp      |   1 +
 src/mesa/drivers/dri/i965/gen7_disable.c      |   4 +
 src/mesa/drivers/dri/i965/gen7_vs_state.c     |  73 ++++++-
 src/mesa/drivers/dri/i965/gen7_wm_state.c     |   2 +-
 src/mesa/drivers/dri/i965/intel_batchbuffer.c |   7 +-
 src/mesa/drivers/dri/i965/intel_reg.h         |   3 +
 31 files changed, 881 insertions(+), 43 deletions(-)

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev