[Mesa-dev] [PATCH] spirv: handle gl_SampleMask

2017-01-24 Thread Iago Toral Quiroga
SPIR-V maps both gl_SampleMask and gl_SampleMaskIn to the same builtin (SampleMask). The only way to tell which one we are dealing with is to check if it is an input or an output. Fixes: dEQP-VK.pipeline.multisample_shader_builtin.sample_mask.write.* --- I am still waiting on Jenkins to report res
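The distinction described in this snippet can be sketched in a few lines of C. This is only an illustration: the enum and the function name are hypothetical and are not taken from the patch. The one fact it relies on is that gl_SampleMaskIn is a fragment shader input and gl_SampleMask is an output.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two GLSL variables that share the
 * SPIR-V SampleMask builtin. */
enum sample_mask_var {
   SAMPLE_MASK_IN,   /* gl_SampleMaskIn: read by the shader */
   SAMPLE_MASK_OUT   /* gl_SampleMask: written by the shader */
};

/* The builtin alone is ambiguous, so the decision is made from whether
 * the variable is an input or an output. */
enum sample_mask_var classify_sample_mask(bool is_input)
{
   return is_input ? SAMPLE_MASK_IN : SAMPLE_MASK_OUT;
}
```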

[Mesa-dev] [PATCH] spirv: acknowledge multisampled input attachments

2017-01-24 Thread Iago Toral Quiroga
This avoids tons of crashes in dEQP-VK.pipeline.multisample_shader_builtin.sample_id.* dEQP-VK.pipeline.multisample_shader_builtin.sample_mask.* dEQP-VK.pipeline.multisample_shader_builtin.sample_position.* --- With this patch some of the tests pass, some still fail and some (from the sample_mask

[Mesa-dev] [PATCH] anv: set UAV coherence required bit when needed

2017-01-16 Thread Iago Toral Quiroga
This is the same thing we do in the OpenGL driver (comment copied from there). This is required to ensure that we execute the fragment shader stage when side-effects (such as image or ssbo stores) are present but there are no color writes. I found this while writing a test to check rendering to a framebuffer w

[Mesa-dev] [PATCH] spirv: fix typo in warning message

2017-01-09 Thread Iago Toral Quiroga
--- src/compiler/spirv/spirv_to_nir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/compiler/spirv/spirv_to_nir.c b/src/compiler/spirv/spirv_to_nir.c index befee1a..2205072 100644 --- a/src/compiler/spirv/spirv_to_nir.c +++ b/src/compiler/spirv/spirv_to_nir.c @@ -559,7 +

[Mesa-dev] [PATCH] spirv: gl_PrimitiveID in the fragment shader is handled as an input

2017-01-09 Thread Iago Toral Quiroga
Geometry and Tessellation stages do handle this as a system value instead. Fixes: dEQP-VK.geometry.basic.primitive_id --- src/compiler/spirv/vtn_variables.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/compiler/spirv/vtn_variables.c b/src/compiler/spirv/vtn_variables

[Mesa-dev] [PATCH] isl: render target cube maps should be handled as 2D images, not cubes

2017-01-06 Thread Iago Toral Quiroga
This fixes layered rendering Vulkan CTS tests with cube (arrays). We also do this in the GL driver, see this code from gen8_depth_state.c for example: case GL_TEXTURE_CUBE_MAP_ARRAY: case GL_TEXTURE_CUBE_MAP: /* The PRM claims that we should use BRW_SURFACE_CUBE for this * situation, but ex

[Mesa-dev] [PATCH] anv: don't skip the VUE header if we are reading gl_Layer in a fragment shader

2017-01-05 Thread Iago Toral Quiroga
This is the same thing we do in the GL driver: the hardware provides gl_Layer in the VUE header, so when the fragment shader reads it we can't skip it. --- With this patch we now successfully read gl_Layer in fragment shaders. Layered rendering still does not work though, probably because we still need

[Mesa-dev] [PATCH] docs: Mark GL_ARB_gpu_shader_fp64 and OpenGL 4.0 as done for i965/hsw+

2017-01-05 Thread Iago Toral Quiroga
--- docs/features.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/features.txt b/docs/features.txt index 63b45af..f4a67df 100644 --- a/docs/features.txt +++ b/docs/features.txt @@ -107,7 +107,7 @@ GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600, radeonsi, l

[Mesa-dev] [PATCH] docs: add GL_ARB_gpu_shader_fp64 and OpenGL 4.0 support for Intel Haswell.

2017-01-04 Thread Iago Toral Quiroga
--- docs/relnotes/13.1.0.html | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/relnotes/13.1.0.html b/docs/relnotes/13.1.0.html index 4dce843..124da93 100644 --- a/docs/relnotes/13.1.0.html +++ b/docs/relnotes/13.1.0.html @@ -47,6 +47,8 @@ Note: some of the new features are only availabl

[Mesa-dev] [PATCH v2 4/4] i965: add a kernel_features bitfield to intel screen

2017-01-04 Thread Iago Toral Quiroga
We can use this to track various features that may or may not be supported by the hw / kernel. Currently, we usually do this by checking the generation and supported command parser versions in various places throughout the driver code. With this patch, we centralize all these checks in just one pl

[Mesa-dev] [PATCH v2 3/4] i965/gen7: Enable OpenGL 4.0 in Haswell when supported

2017-01-04 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/intel_extensions.c | 2 ++ src/mesa/drivers/dri/i965/intel_screen.c | 3 ++- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c index 8e67b57..aa89380 100644 --

[Mesa-dev] [PATCH v2 1/4] i965: Move the pipelined test for SO register access to the screen

2017-01-04 Thread Iago Toral Quiroga
From: Chris Wilson Moving the test to the screen places it alongside the other global HW feature tests that want to be shared between contexts. Also, we need to know if we support pipelined register writes at screen creation time so that we can tell if we can expose OpenGL 4.0 in gen7. Signed-o

[Mesa-dev] [PATCH v2 2/4] i965: get rid of brw->can_do_pipelined_register_writes

2017-01-04 Thread Iago Toral Quiroga
Instead, check the screen field directly. --- src/mesa/drivers/dri/i965/brw_context.c | 2 -- src/mesa/drivers/dri/i965/brw_context.h | 5 - src/mesa/drivers/dri/i965/gen7_l3_state.c| 5 +++-- src/mesa/drivers/dri/i965/intel_extensions.c | 2 +- src/mesa/drivers/dri/i965/intel_sc

[Mesa-dev] [PATCH v2 0/4] Enable OpenGL 4.0 on Haswell

2017-01-04 Thread Iago Toral Quiroga
ow we current were doing this and we save one flag, let me know if you prefer them to be separate flags. Chris Wilson (1): i965: Move the pipelined test for SO register access to the screen Iago Toral Quiroga (3): i965: get rid of brw->can_do_pipelined_register_writes i965/gen7: Enable OpenG

[Mesa-dev] [PATCH 2/6] i965: make intel_batchbuffer_emit_dword() take a batchbuffer as argument

2017-01-03 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/intel_batchbuffer.c | 6 +++--- src/mesa/drivers/dri/i965/intel_batchbuffer.h | 18 +- 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c b/src/mesa/drivers/dri/i965/intel_batchbuffer.c inde

[Mesa-dev] [PATCH 3/6] i965: make intel_batchbuffer_free() take a batchbuffer as argument

2017-01-03 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_context.c | 2 +- src/mesa/drivers/dri/i965/intel_batchbuffer.c | 8 src/mesa/drivers/dri/i965/intel_batchbuffer.h | 2 +- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.c b/src/mesa/drivers/dri

[Mesa-dev] [PATCH 5/6] i965: check if we support pipelined register writes at screen creation time

2017-01-03 Thread Iago Toral Quiroga
We need this feature to enable OpenGL 4.0 on gen7 hardware. Some hardware and kernel combinations support this while others don't, so we check for it by writing to a register and verifying the result. Unfortunately, we need to know this at screen creation time when we don't have a brw_context avail

[Mesa-dev] [PATCH 0/6] Enable OpenGL 4.0 on Haswell

2017-01-03 Thread Iago Toral Quiroga
rg/archives/mesa-dev/2016-October/133502.html Iago Toral Quiroga (6): i965: Make intel_bachbuffer_reloc() take a batchbuffer argument i965: make intel_batchbuffer_emit_dword() take a batchbuffer as argument i965: make intel_batchbuffer_free() take a batchbuffer as argument i965: remove br

[Mesa-dev] [PATCH 4/6] i965: remove brw_context dependency from intel_batchbuffer_init()

2017-01-03 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_context.c | 2 +- src/mesa/drivers/dri/i965/intel_batchbuffer.c | 59 +++ src/mesa/drivers/dri/i965/intel_batchbuffer.h | 3 +- 3 files changed, 36 insertions(+), 28 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_context.c

[Mesa-dev] [PATCH 6/6] i965/gen7: Enable OpenGL 4.0 in Haswell when supported

2017-01-03 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/intel_extensions.c | 2 ++ src/mesa/drivers/dri/i965/intel_screen.c | 11 --- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c index 5be8f3a..6d9c3

[Mesa-dev] [PATCH 1/6] i965: Make intel_bachbuffer_reloc() take a batchbuffer argument

2017-01-03 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/genX_blorp_exec.c | 4 ++-- src/mesa/drivers/dri/i965/intel_batchbuffer.c | 8 src/mesa/drivers/dri/i965/intel_batchbuffer.h | 18 +- 3 files changed, 15 insertions(+), 15 deletions(-) diff --git a/src/mesa/drivers/dri/i965/genX_blorp_exe

[Mesa-dev] [PATCH] nir/lower_tex: fix number of components in replace_gradient_with_lod()

2016-12-14 Thread Iago Toral Quiroga
We should make the dest in the textureLod() operation have the same number of components as the destination in the original textureGrad() Fixes regression in ES3-CTS.gtf.GL3Tests.shadow Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99072 --- src/compiler/nir/nir_lower_tex.c | 3 ++- 1 f

[Mesa-dev] [PATCH v2 5/7] i965/nir: enable lowering of texture gradient for shadow samplers

2016-12-12 Thread Iago Toral Quiroga
This gets the lowering on the Vulkan driver too, which is required for hardware that does not have the sample_l_d message (up to IvyBridge). --- src/mesa/drivers/dri/i965/brw_nir.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i9

[Mesa-dev] [PATCH v2 1/7] nir/lower_tex: generalize get_texture_size()

2016-12-12 Thread Iago Toral Quiroga
This was written specifically for RECT samplers. Make it more generic so we can call this from the gradient lowerings too. --- src/compiler/nir/nir_lower_tex.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/src/compiler/nir/nir_lower_tex.c b/src/compiler/nir/n

[Mesa-dev] [PATCH v2 7/7] nir/lower_tex: lower gradients on shadow cube maps if lower_txd_shadow is set

2016-12-12 Thread Iago Toral Quiroga
Even if lower_txd_cube_map isn't. Suggested by Ken to make the flag more consistent with its name. --- src/compiler/nir/nir_lower_tex.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/compiler/nir/nir_lower_tex.c b/src/compiler/nir/nir_lower_tex.c index a9370c3..95c9f

[Mesa-dev] [PATCH v2 6/7] i965: remove brw_lower_texture_gradients

2016-12-12 Thread Iago Toral Quiroga
This has been ported to NIR now so we don't need to keep the GLSL IR lowering any more. --- src/mesa/drivers/dri/i965/Makefile.sources | 1 - src/mesa/drivers/dri/i965/brw_context.h| 2 - src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 +- src/mesa/drivers/dri/i965/

[Mesa-dev] [PATCH v2 4/7] nir/lower_tex: add lowering for texture gradient on shadow samplers

2016-12-12 Thread Iago Toral Quiroga
This is ported from the Intel lowering pass that we use with GLSL IR. This takes care of lowering texture gradients on shadow samplers other than cube maps. Intel hardware requires this for gen < 8. v2 (Ken): - Use the helper function to retrieve ddx/ddy - Swizzle away size components we are not

[Mesa-dev] [PATCH v2 3/7] i965/nir: enable lowering of texture gradient for cube maps

2016-12-12 Thread Iago Toral Quiroga
This gets the lowering on the Vulkan driver too. Fixes Vulkan CTS cube map texture gradient tests in: dEQP-VK.glsl.texture_functions.texturegrad.* --- src/mesa/drivers/dri/i965/brw_nir.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dr

[Mesa-dev] [PATCH v2 2/7] nir/lower_tex: add lowering for texture gradient on cube maps

2016-12-12 Thread Iago Toral Quiroga
This is ported from the Intel lowering pass that we use with GLSL IR. The NIR pass only handles cube maps, not shadow samplers, which are also lowered for gen < 8 on Intel hardware. We will add support for that in a later patch, at which point we should be able to remove the GLSL IR lowering pass.

[Mesa-dev] [PATCH v2 0/7] i965: port texture gradient lowering to NIR

2016-12-12 Thread Iago Toral Quiroga
, where we need to pass 2 components instead of one for example. Since we are swizzling away undesired components in the current GLSL IR lowering and you gave your Rb without asking for that change I did not look too much into it, but I can spend some more time into it if you think it is worth

[Mesa-dev] [PATCH] spirv: Builtin Layer is an input for fragment shaders

2016-12-02 Thread Iago Toral Quiroga
This change makes it so we emit a load_input intrinsic when Layer is read in a fragment shader. --- Even with this, layered rendering does not seem to work in the Vulkan driver, so there is something else that is broken. We are probably not mapping the Layer input correctly somewhere. src/compil

[Mesa-dev] [PATCH 6/6] i965: remove brw_lower_texture_gradients

2016-11-30 Thread Iago Toral Quiroga
This has been ported to NIR now so we don't need to keep the GLSL IR lowering any more. --- src/mesa/drivers/dri/i965/Makefile.sources | 1 - src/mesa/drivers/dri/i965/brw_context.h| 2 - src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 +- src/mesa/drivers/dri/i965/

[Mesa-dev] [PATCH 5/6] i965/nir: enable lowering of texture gradient for shadow samplers

2016-11-30 Thread Iago Toral Quiroga
This gets the lowering on the Vulkan driver too, which is required for hardware that does not have the sample_l_d message (up to IvyBridge). --- src/mesa/drivers/dri/i965/brw_nir.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i9

[Mesa-dev] [PATCH 2/6] nir/lower_tex: add lowering for texture gradient on cube maps

2016-11-30 Thread Iago Toral Quiroga
This is ported from the Intel lowering pass that we use with GLSL IR. The NIR pass only handles cube maps, not shadow samplers, which are also lowered for gen < 8 on Intel hardware. We will add support for that in a later patch, at which point we should be able to remove the GLSL IR lowering pass.

[Mesa-dev] [PATCH 4/6] nir/lower_tex: add lowering for texture gradient on shadow samplers

2016-11-30 Thread Iago Toral Quiroga
This is ported from the Intel lowering pass that we use with GLSL IR. This takes care of lowering texture gradients on shadow samplers other than cube maps. Intel hardware requires this for gen < 8. --- src/compiler/nir/nir.h | 7 +++ src/compiler/nir/nir_lower_tex.c | 40 ++

[Mesa-dev] [PATCH 0/6] i965: port texture gradient lowering to NIR

2016-11-30 Thread Iago Toral Quiroga
GLSL IR lowering pass. Iago Toral Quiroga (6): nir/lower_tex: generalize get_texture_size() nir/lower_tex: add lowering for texture gradient on cube maps i965/nir: enable lowering of texture gradient for cube maps nir/lower_tex: add lowering for texture gradient on shadow samplers i965/nir

[Mesa-dev] [PATCH 3/6] i965/nir: enable lowering of texture gradient for cube maps

2016-11-30 Thread Iago Toral Quiroga
This gets the lowering on the Vulkan driver too. Fixes Vulkan CTS cube map texture gradient tests in: dEQP-VK.glsl.texture_functions.texturegrad.* --- src/mesa/drivers/dri/i965/brw_nir.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dr

[Mesa-dev] [PATCH 1/6] nir/lower_tex: generalize get_texture_size()

2016-11-30 Thread Iago Toral Quiroga
This was written specifically for RECT samplers. Make it more generic so we can call this from the gradient lowerings too. --- src/compiler/nir/nir_lower_tex.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/src/compiler/nir/nir_lower_tex.c b/src/compiler/nir/n

[Mesa-dev] [PATCH] anv/state: if enabled, use anisotropic filtering also with VK_FILTER_NEAREST

2016-11-24 Thread Iago Toral Quiroga
Fixes multiple Vulkan CTS tests that combine anisotropy and VK_FILTER_NEAREST in dEQP-VK.texture.filtering_anisotropy.* --- src/intel/vulkan/genX_state.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/intel/vulkan/genX_state.c b/src/intel/vulkan/genX_state.c index 4122395.

[Mesa-dev] [PATCH] anv/state: enable coordinate address rounding for Min/Mag filters

2016-11-18 Thread Iago Toral Quiroga
This patch improves pass rate of dEQP-VK.texture.explicit_lod.2d.sizes.* from 68.0% (98/144) to 83.3% (120/144) by enabling sampler address rounding mode when the selected filter is not nearest, which is the same thing we do for OpenGL. These tests check texture filtering for various texture sizes

[Mesa-dev] [PATCH v2] nir/spirv: implement ordered / unordered floating point comparisons properly

2016-11-17 Thread Iago Toral Quiroga
Besides the logical operation involved, these also require that we test if the operands are ordered / unordered. For ordered operations, both operands must be ordered (and they must pass the conditional test) while for unordered operations it is sufficient if only one of the operands is unordered
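As a rough illustration of the rule in this snippet, the two flavours of a less-than comparison can be written in plain C, where "unordered" means the operand is a NaN. The function names are made up for this sketch and are not the ones used in the patch.

```c
#include <math.h>
#include <stdbool.h>

/* Ordered: true only if neither operand is NaN and the comparison holds. */
bool ord_less_than(float a, float b)
{
   return !isnan(a) && !isnan(b) && a < b;
}

/* Unordered: true if at least one operand is NaN, or the comparison holds. */
bool unord_less_than(float a, float b)
{
   return isnan(a) || isnan(b) || a < b;
}
```

With a NaN operand the ordered form is false and the unordered form is true; with two ordinary numbers both forms reduce to the plain comparison.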

[Mesa-dev] [PATCH] anv: don't try to free VK_NULL_HANDLE objects

2016-11-16 Thread Iago Toral Quiroga
Initial Vulkan documentation stated that it was invalid to free VK_NULL_HANDLE objects, however this has been changed. From the Vulkan 1.0.33 specification. 2.6.2. Implicit Valid Usage. Valid Usage for Object Handles: "It is valid to pass VK_NULL_HANDLE to any vkDestroy* or vkFree* command, which
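A minimal sketch of the pattern the subject line suggests: a destroy function that returns early when handed a null handle. The types and names below are hypothetical stand-ins, not the anv or Vulkan definitions, and the return value exists only so the behaviour can be observed (real vkDestroy* commands return void).

```c
#include <stdlib.h>

/* Hypothetical handle type standing in for a Vulkan object handle. */
struct fake_object { int payload; };
typedef struct fake_object *fake_handle;
#define FAKE_NULL_HANDLE ((fake_handle)0)

/* Returns 1 if an object was freed, 0 if a null handle was ignored. */
int fake_destroy(fake_handle obj)
{
   if (obj == FAKE_NULL_HANDLE)
      return 0;   /* nothing to free, and not an error */

   free(obj);
   return 1;
}
```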

[Mesa-dev] [PATCH] nir/spirv: implement ordered / unordered floating point comparisons properly

2016-11-15 Thread Iago Toral Quiroga
Besides the logical operation involved, these also require that we test if the operands are ordered / unordered. For ordered operations, both operands must be ordered (and they must pass the conditional test) while for unordered operations it is sufficient if only one of the operands is unordered

[Mesa-dev] [PATCH 1/2] anv/format: handle unsupported formats properly

2016-11-14 Thread Iago Toral Quiroga
According to the spec for vkGetPhysicalDeviceImageFormatProperties: "If format is not a supported image format, or if the combination of format, type, tiling, usage, and flags is not supported for images, then vkGetPhysicalDeviceImageFormatProperties returns VK_ERROR_FORMAT_NOT_SUPPORTED." Mak

[Mesa-dev] [PATCH 2/2] anv/format: support VK_FORMAT_R8G8B8_SRGB

2016-11-14 Thread Iago Toral Quiroga
Fixes dEQP-VK.api.image_clearing.clear_color_image.1d_r8g8b8_srgb --- src/intel/vulkan/anv_formats.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/intel/vulkan/anv_formats.c b/src/intel/vulkan/anv_formats.c index bca9aeb..2adde8b 100644 --- a/src/intel/vulkan/anv_formats.

[Mesa-dev] [PATCH] glsl: validate output blocks against input blocks

2016-11-04 Thread Iago Toral Quiroga
Until now we were validating in/out blocks by listing the inputs in the consumer stage and then, for each output of the producer, we checked that it was a match if it was consumed. This method does not catch the case where the consumer has an input that is not present as an output in the producer stag

[Mesa-dev] [PATCH 3/2] i965/vec4: initialize the pull param count 0 every time we run the visitor

2016-11-03 Thread Iago Toral Quiroga
This was being initialized just once to 0, but we want to reset it every time we call run() on the visitor, which can happen more than once in scenarios where we failed the compilation using a method and we are falling back to a different configuration. This is possible, for example, with geometry

[Mesa-dev] [PATCH 2/2] i965/vec4: fix indentation in lower_attributes_to_hw_regs()

2016-11-02 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4.cpp | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 1fb65f3..03c83af 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drive

[Mesa-dev] [PATCH 1/2] i965/vec4: Fail compilation if we run out of registers during payload setup

2016-11-02 Thread Iago Toral Quiroga
Right now, if this happens we trigger an assertion when we try to create a GRF register > 128. Let the backend compiler fail the compilation in this case. For things like geometry shaders, this can trigger a compilation using interleaved attribute setup which might even help get the shader to compi

[Mesa-dev] [PATCH] glsl: add matrix layout information to interface block types

2016-10-21 Thread Iago Toral Quiroga
So far we have been checking that interface block definitions had matching matrix layouts by comparing the definitions of their fields, however, this does not cover the case where the interface blocks are defined with mismatching matrix layouts but don't define any field with a matrix type. In this

[Mesa-dev] [PATCH v2.1] i965/gen7: expose OpenGL 4.0 on Haswell

2016-10-20 Thread Iago Toral Quiroga
ARB_gpu_shader_fp64 was the last piece missing. Notice that some hardware and kernel combinations do not support pipelined register writes, which are required for some OpenGL 4.0 features, in which case the driver won't expose 4.0. v2 (Ian, Ken): - We should not set max_gl_core_version to 40 if

[Mesa-dev] [PATCH] glsl: Indirect array indexing on non-last SSBO member must fail compilation

2016-10-18 Thread Iago Toral Quiroga
After the changes in commit 5b2675093e863a52, we moved this check to the linker, but the spec expects this to be checked at compile-time. There are dEQP tests that expect an error at compile time and the spec seems to confirm that expectation: "Except for the last declared member of a shader storag

[Mesa-dev] [PATCH] i965/fs/nir: fix double pack from previous unpack optimization

2016-10-17 Thread Iago Toral Quiroga
It seems I initially wrote this as: if (cond_for_opt) { } and then I modified the style at some point to be like: if (!cond_for_opt) continue; But I did not re-write all the conditions accordingly. --- I tested this quickly on a haswell with our fp64 branch and it did not show any regre
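The snippet does not show which conditions were affected, so the following is only a generic sketch of the kind of slip it describes. When a guarded block is turned into an early `continue`, the whole condition has to be negated, which for a conjunction means applying De Morgan's law. Both function names are hypothetical.

```c
#include <stdbool.h>

/* Original style: the optimization runs when both conditions hold,
 * as in: if (a && b) { ...optimize... } */
bool should_optimize(bool a, bool b)
{
   return a && b;
}

/* Early-continue style: the iteration is skipped when the combined
 * condition fails, as in: if (!a || !b) continue;
 * Note that !(a && b) is (!a || !b), not (!a && !b). */
bool should_skip(bool a, bool b)
{
   return !a || !b;
}
```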

[Mesa-dev] [PATCH v2] glsl: SSBO unsized array declarations, if present, must be declared last

2016-10-14 Thread Iago Toral Quiroga
From the ARB_shader_storage_buffer_object spec: "In a shader storage block, the last member may be declared without an explicit size. In this case, the effective array size is inferred at run-time from the size of the data store backing the interface block. Such unsized arrays may be indexed

[Mesa-dev] [PATCH 2/2] glsl: SSBO unsized array declarations, if present, must be declared last

2016-10-14 Thread Iago Toral Quiroga
From the ARB_shader_storage_buffer_object spec: "In a shader storage block, the last member may be declared without an explicit size. In this case, the effective array size is inferred at run-time from the size of the data store backing the interface block. Such unsized arrays may be indexed

[Mesa-dev] [PATCH 1/2] glsl: fail compilation of compute shaders when unsupported

2016-10-14 Thread Iago Toral Quiroga
Generally, we only check for the presence of compute shaders during parsing when we find any language (like layout qualifiers) that is specific to compute shaders. However, it is possible to define an empty compute shader that does not use any language specific to compute shaders at all and we should f

[Mesa-dev] [PATCH v2.1] i965/vec4: teach register coalescing about 64-bit

2016-10-13 Thread Iago Toral Quiroga
Specifically, at least for now, we don't want to deal with the fact that channel sizes for fp64 instructions are twice the size, so prevent coalescing from instructions with a different type size. Also, we should check that if we are coalescing a register from another MOV we should be writing the

[Mesa-dev] [PATCH v2 102/103] i965/vec4: enable ARB_gpu_shader_fp64 for Haswell

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/intel_extensions.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c b/src/mesa/drivers/dri/i965/intel_extensions.c index eef81e2..0491145 100644 --- a/src/mesa/drivers/dri/i965/intel_extensions.c +++ b/src/mesa/drive

[Mesa-dev] [PATCH v2 103/103] i965/gen7: expose OpenGL 4.0 on Haswell

2016-10-11 Thread Iago Toral Quiroga
ARB_gpu_shader_fp64 was the last piece missing. Notice that some hardware and kernel combinations do not support pipelined register writes, which are required for some OpenGL 4.0 features, in which case the driver won't expose 4.0. --- src/mesa/drivers/dri/i965/intel_extensions.c | 2 ++ src/mesa/

[Mesa-dev] [PATCH v2 100/103] i965/vec4: prevent spilling of DOUBLE_TO_SINGLE destination

2016-10-11 Thread Iago Toral Quiroga
DOUBLE_TO_SINGLE opcodes are set up so that they use a dst register with a size of 2 even if they only produce a single-precision result (this is so that the opcode can use the larger register to produce a 64-bit aligned intermediary result as required by the hardware during the conversion process).

[Mesa-dev] [PATCH v2 101/103] i965/vec4: adjust spilling costs for 64-bit registers.

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp index 0bf9f05..a551267 100644 --- a/src/mesa/drive

[Mesa-dev] [PATCH v2 098/103] i965/vec4: support basic spilling of 64-bit registers

2016-10-11 Thread Iago Toral Quiroga
The current spilling code can't spill vgrf allocations larger than 1 but SIMD4x2 doubles require 2 vgrfs, so we need to permit this case (which is handled properly for DF data types by emitting 2 scratch messages and doing data shuffling). We accomplish this by not auto-disabling spilling for vgrf

[Mesa-dev] [PATCH v2 038/103] i965: move exec_size from fs_instruction to backend_instruction

2016-10-11 Thread Iago Toral Quiroga
We are going to need this in the vec4 backend too. --- src/mesa/drivers/dri/i965/brw_ir_fs.h | 7 --- src/mesa/drivers/dri/i965/brw_shader.h | 7 +++ src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 1 + 3 files changed, 8 insertions(+), 7 deletions(-) diff --git a/src/me

[Mesa-dev] [PATCH v2 061/103] i965/vec4/nir: do not emit 64-bit MAD

2016-10-11 Thread Iago Toral Quiroga
RepCtrl=1 does not work with 64-bit operands so we need to use RepCtrl=0. In that situation, the regioning generated for the sources seems to be equivalent to <4,4,1>:DF, so it will only work for components XY, which means that we have to move any other swizzle to a temporary so that we can source

[Mesa-dev] [PATCH v2 096/103] i965/vec4: prevent src/dst hazards during 64-bit register allocation

2016-10-11 Thread Iago Toral Quiroga
8-wide compressed DF operations are executed as two separate 4-wide DF operations. In that scenario, we have to be careful when we allocate register space for their operands to prevent the case where the first half of the instruction overwrites the source of the second half. To do this we mark com

[Mesa-dev] [PATCH v2 054/103] i965/vec4: translate 64-bit swizzles to 32-bit

2016-10-11 Thread Iago Toral Quiroga
The hardware can only operate with 32-bit swizzles, which is a rather limiting restriction. However, the idea is not to expose this to the optimization passes, which would be a mess to deal with. Instead, we let the bulk of the vec4 backend ignore this fact and we fix the swizzles right at codegen

[Mesa-dev] [PATCH v2 076/103] i965/vec4: fix scratch offset for 64bit data

2016-10-11 Thread Iago Toral Quiroga
A vec4 is 16 bytes and a dvec4 is 32 bytes so for doubles we have to multiply the reladdr by 2. The reg_offset part is in units of 16 bytes and is used to select the low/high 16-byte chunk of a full dvec4, so we don't want to multiply that part of the address. --- src/mesa/drivers/dri/i965/brw_vec
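The arithmetic in this snippet can be sketched as follows. This is a simplification with compile-time integers; the helper name and signature are hypothetical, and the driver itself works on registers and emitted instructions rather than plain values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: scratch offset in units of 16 bytes (one vec4).
 * reladdr counts whole elements, so it is doubled for 64-bit data
 * because a dvec4 takes 32 bytes. reg_offset already counts 16-byte
 * chunks (low/high half of a dvec4) and is added unscaled. */
uint32_t scratch_offset_16b(uint32_t reladdr, uint32_t reg_offset,
                            bool is_64bit)
{
   uint32_t scaled_reladdr = is_64bit ? reladdr * 2 : reladdr;
   return scaled_reladdr + reg_offset;
}
```

For example, the high half (reg_offset 1) of the dvec4 at reladdr 3 lands in 16-byte slot 3 * 2 + 1 = 7, whereas the same indices for 32-bit data give slot 4.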

[Mesa-dev] [PATCH v2 087/103] i965/vec4/tcs: fix input loading for 64-bit data

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp | 34 ++ 1 file changed, 30 insertions(+), 4 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp b/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp index 498fb7c..f62dc9c 100644 --- a/src/mesa/drivers/dri/i965/

[Mesa-dev] [PATCH v2 045/103] i965: move the group field from fs_inst to backend_instruction.

2016-10-11 Thread Iago Toral Quiroga
Just like the exec_size, we are going to need this in the vec4 backend when we implement a simd splitting pass. --- src/mesa/drivers/dri/i965/brw_ir_fs.h | 9 - src/mesa/drivers/dri/i965/brw_shader.h | 9 + src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 1 + 3 fi

[Mesa-dev] [PATCH v2 031/103] i965/vec4: implement hardware workaround for align16 double to float conversion

2016-10-11 Thread Iago Toral Quiroga
From the BDW PRM, Workarounds chapter: "DF->f format conversion for Align16 has wrong emask calculation when source is immediate." So detect the case and move the immediate source to a VGRF before we attempt the conversion. Notice that Broadwell and later are strictly scalar at the moment

[Mesa-dev] [PATCH v2 081/103] i965/vec4: fix move_push_constants_to_pull_constants() for 64-bit data

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4.cpp | 20 +--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 0788ba2..b0bc2d5 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src

[Mesa-dev] [PATCH v2 058/103] i965/vec4: fix pack_uniform_registers for doubles

2016-10-11 Thread Iago Toral Quiroga
We need to consider the fact that dvec3/4 require two vec4 slots. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 11 +-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index e5391b9..b79fd5e 10064
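A small sketch of the slot counting this snippet refers to, assuming a vec4 slot is 16 bytes and a double component is 8 bytes. The helper is hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: number of 16-byte vec4 slots needed by a vector
 * with the given component count. */
unsigned vec4_slots(unsigned components, bool is_64bit)
{
   assert(components >= 1 && components <= 4);

   unsigned bytes = components * (is_64bit ? 8u : 4u);
   return (bytes + 15) / 16;   /* round up to whole slots */
}
```

Under these assumptions a double or dvec2 fits in one slot, while a dvec3 (24 bytes) and a dvec4 (32 bytes) each need two.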

[Mesa-dev] [PATCH v2 056/103] i965/disasm: fix subreg for dst in Align16 mode

2016-10-11 Thread Iago Toral Quiroga
There is a single bit for this, so it is a binary 0 or 1 meaning offset 0B or 16B respectively. v2: - Since brw_inst_dst_da16_subreg_nr() is known to be 1, remove it from the expression (Curro) Reviewed-by: Francisco Jerez --- src/mesa/drivers/dri/i965/brw_disasm.c | 2 +- 1 file changed,

[Mesa-dev] [PATCH v2 049/103] i965/disasm: print NibCtrl for instructions with execsize < 8

2016-10-11 Thread Iago Toral Quiroga
v2 (Curro): - Print it also for execsize < 4. - QtrCtrl is still in effect, so print 2 * qtr_ctl + nib_ctl + 1 - Do not read the nib ctl from the instruction in gen < 7, the field only exists in gen7+. --- src/mesa/drivers/dri/i965/brw_disasm.c | 6 +- 1 file changed, 5 insertions(+)
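The expression quoted in the v2 notes can be sketched as a tiny helper. The name is hypothetical, and the value ranges (quarter control 0 to 3, nibble control 0 or 1) are assumptions made for this sketch.

```c
#include <assert.h>

/* Hypothetical helper: 1-based index of the 4-channel group selected by
 * the combination of quarter control and nibble control, following the
 * expression 2 * qtr_ctl + nib_ctl + 1 from the notes above. */
unsigned nibble_index(unsigned qtr_ctl, unsigned nib_ctl)
{
   assert(qtr_ctl < 4 && nib_ctl < 2);   /* assumed field ranges */
   return 2 * qtr_ctl + nib_ctl + 1;
}
```

Each quarter covers two nibbles, so quarter 1 with nibble 0 gives index 3, and quarter 3 with nibble 1 gives index 8.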

[Mesa-dev] [PATCH v2 023/103] i965/vec4/nir: implement double comparisons

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 21 ++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 37c3d7c..815082e 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_

[Mesa-dev] [PATCH v2 080/103] i965/vec4: fix indentation in move_push_constants_to_pull_constants()

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4.cpp | 60 +- 1 file changed, 30 insertions(+), 30 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 75e47f9..0788ba2 100644 --- a/src/mesa/drivers/dri/i965/brw_vec

[Mesa-dev] [PATCH v2 062/103] i965/vec4: do not emit 64-bit MAD

2016-10-11 Thread Iago Toral Quiroga
The previous patch made sure that we do not generate MAD instructions for any NIR's 64-bit ffma, but there is nothing preventing i965 from producing MAD instructions as a result of lowerings or optimization passes. This patch makes sure that any 64-bit MAD produced inside the driver after translati

[Mesa-dev] [PATCH v2 088/103] i965/vec4/tcs: fix outputs for 64-bit data

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp | 31 -- 1 file changed, 29 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp b/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp index f62dc9c..914396c 100644 --- a/src/mesa/drivers/dri/i965/

[Mesa-dev] [PATCH v2 073/103] i965/vec4: extend the DWORD multiply DepCtrl restriction to all gen8 platforms

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4.cpp | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 7af65ab..7f6acc3 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers

[Mesa-dev] [PATCH v2 036/103] i965/vec4: add a helper function to create double immediates

2016-10-11 Thread Iago Toral Quiroga
Gen7 hardware does not support double immediates so these need to be moved in 32-bit chunks to a regular vgrf instead. Instead of doing this every time we need to create a DF immediate, create a helper function that does the right thing depending on the hardware generation. v2 (Curro): - Use swi
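As a rough sketch of the "32-bit chunks" idea described above, the following C snippet splits a double into its low and high 32-bit halves. The function name split_double is hypothetical; how the driver actually emits the two halves into a vgrf is not shown in this excerpt and is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: split the bit pattern of a double into two
 * 32-bit chunks, as one would need when the hardware cannot take a
 * 64-bit immediate directly. */
static void split_double(double value, uint32_t *lo, uint32_t *hi)
{
   assert(lo != NULL && hi != NULL);
   assert(sizeof(double) == sizeof(uint64_t));

   uint64_t bits;
   memcpy(&bits, &value, sizeof(bits));   /* avoids aliasing problems */

   *lo = (uint32_t)(bits & 0xffffffffu);  /* low 32 bits */
   *hi = (uint32_t)(bits >> 32);          /* high 32 bits */
}
```

Each half can then be written with an ordinary 32-bit move.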

[Mesa-dev] [PATCH v2 040/103] i965/vec4: fix regs_read() for doubles

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index 75a8473..2bde628 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/

[Mesa-dev] [PATCH v2 078/103] i965/vec4: fix scratch writes for 64bit data

2016-10-11 Thread Iago Toral Quiroga
Mostly the same stuff as usual: we need to shuffle the data before we write and we need to emit two 32-bit write messages (with appropriate 32-bit writemask channels set) for a full dvec4 scratch write. --- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 64 ++ 1 file chang
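The following C sketch shows one way the writemask for a dvec4 could be split across the two 32-bit write messages mentioned above. It is an assumption, not the code from brw_vec4_visitor.cpp: it supposes that, after shuffling, each 64-bit component occupies two consecutive 32-bit channels, with X and Y going to the first message and Z and W to the second. The function name split_dvec4_writemask is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper. 'mask64' has one bit per 64-bit component
 * (bit 0 = X ... bit 3 = W). Each output mask has one bit per 32-bit
 * channel of the corresponding write message. */
static void split_dvec4_writemask(unsigned mask64,
                                  unsigned *mask32_first,
                                  unsigned *mask32_second)
{
   assert(mask64 <= 0xfu);

   *mask32_first = 0;
   *mask32_second = 0;

   if (mask64 & 0x1u) *mask32_first  |= 0x3u;  /* X -> channels 0,1 */
   if (mask64 & 0x2u) *mask32_first  |= 0xcu;  /* Y -> channels 2,3 */
   if (mask64 & 0x4u) *mask32_second |= 0x3u;  /* Z -> channels 0,1 */
   if (mask64 & 0x8u) *mask32_second |= 0xcu;  /* W -> channels 2,3 */
}
```

Under these assumptions a message whose mask comes out as zero could be skipped entirely.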

[Mesa-dev] [PATCH v2 072/103] i965/vec4: don't copy propagate misaligned registers

2016-10-11 Thread Iago Toral Quiroga
From: Samuel Iglesias Gonsálvez This means we would copy propagate partial reads or writes and that can affect the result. Signed-off-by: Samuel Iglesias Gonsálvez --- src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/driver

[Mesa-dev] [PATCH v2 046/103] i965/vec4: add a SIMD lowering pass

2016-10-11 Thread Iago Toral Quiroga
Generally, instructions in Align16 mode only ever write to a single register and don't need any form of SIMD splitting, that's why we have never had a SIMD splitting pass in the vec4 backend. However, double-precision instructions typically write 2 registers and in some cases they run into certain

[Mesa-dev] [PATCH v2 025/103] i965/vec4: fix indentation in get_nir_src()

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 860ec51..c825aeb 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/m

[Mesa-dev] [PATCH v2 057/103] i965/vec4: teach register coalescing about 64-bit

2016-10-11 Thread Iago Toral Quiroga
Specifically, at least for now, we don't want to deal with the fact that channel sizes for fp64 instructions are twice the size, so prevent coalescing from instructions with a different type size. Also, we should check that if we are coalescing a register from another MOV we should be reading the

[Mesa-dev] [PATCH v2 020/103] i965/vec4: don't copy propagate vector opcodes that operate in align1 mode

2016-10-11 Thread Iago Toral Quiroga
Basically, ALIGN1 mode will ignore swizzles on the input vectors so we don't want the copy propagation pass to mess with them. --- .../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 24 ++ 1 file changed, 24 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_

[Mesa-dev] [PATCH v2 050/103] i965/vec4: teach CSE about exec_size, group and doubles

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 31 +++--- 1 file changed, 24 insertions(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp index bef897a..229d7b2 100644 --- a/src/mesa/drivers/dri/i965/

[Mesa-dev] [PATCH v2 015/103] i965/vec4: We only support 32-bit integer ALU operations for now

2016-10-11 Thread Iago Toral Quiroga
Add asserts so we remember to address this when we enable 64-bit integer support, as suggested by Connor and Jason. Reviewed-by: Francisco Jerez --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 71 ++ 1 file changed, 53 insertions(+), 18 deletions(-) diff --git a/src

[Mesa-dev] [PATCH v2 091/103] i965/vec4/tes: consider register offsets during attribute setup

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_tes.cpp | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp b/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp index c8fa2ca..a1aa672 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp +++ b/src/m

[Mesa-dev] [PATCH v2 089/103] i965/vec4/tes: fix input loading for 64bit data types

2016-10-11 Thread Iago Toral Quiroga
--- src/mesa/drivers/dri/i965/brw_vec4_tes.cpp | 72 +++--- 1 file changed, 55 insertions(+), 17 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp b/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp index 226dcb4..f2a4507 100644 --- a/src/mesa/drivers/dri/i965

[Mesa-dev] [PATCH v2 070/103] i965/vec4: Prevent copy propagation from violating pre-gen8 restrictions

2016-10-11 Thread Iago Toral Quiroga
In gen < 8 instructions that write more than one register need to read more than one register too. Make sure we don't break that restriction by copy propagating from a uniform. --- src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 7 +++ 1 file changed, 7 insertions(+) diff --git a/sr

[Mesa-dev] [PATCH v2 071/103] i965/vec4: don't propagate single-precision uniforms into 4-wide instructions

2016-10-11 Thread Iago Toral Quiroga
Otherwise we end up producing code that violates the register region restriction that says that when execsize == width and hstride != 0 the vstride can't be 0. --- src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/mesa/driv
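The region restriction quoted above can be written as a small predicate. This is only a sketch: the function name region_violates_restriction is hypothetical, and it checks just the single rule stated in the commit message, not the full set of hardware regioning rules.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check for the one rule quoted above: when
 * execsize == width and hstride != 0, vstride must not be 0. */
static bool region_violates_restriction(int exec_size, int width,
                                        int hstride, int vstride)
{
   assert(exec_size > 0 && width > 0);

   return exec_size == width && hstride != 0 && vstride == 0;
}
```

For instance, a 4-wide instruction reading a region with width 4, hstride 1 and vstride 0 would be flagged by this check.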

[Mesa-dev] [PATCH v2 066/103] i965/vec4: Fix SSBO loads for 64-bit data

2016-10-11 Thread Iago Toral Quiroga
Same requirements as for UBO loads. --- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 31 +- 1 file changed, 26 insertions(+), 5 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index f234e65..001a62f 100

[Mesa-dev] [PATCH v2 060/103] i965/vec4: Skip swizzle to subnr in 3src instructions with DF operands

2016-10-11 Thread Iago Toral Quiroga
We make scalar sources in 3src instructions use subnr instead of swizzles because they don't really use swizzles. With doubles it is more complicated because we use vstride=0 in more scenarios in which they don't produce scalar regions. Also RepCtrl=1 is not allowed with 64-bit operands, so we sho

[Mesa-dev] [PATCH v2 090/103] i965/vec4/tes: fix setup_payload() for 64bit data types

2016-10-11 Thread Iago Toral Quiroga
Use a width of 2 with 64-bit attributes. Also, if we have a dvec3/4 attribute that gets split across two registers such that components XY are stored in the second half of a register and components ZW are stored in the first half of the next, we need to fix regioning for any instruction that reads

[Mesa-dev] [PATCH v2 086/103] i965/vec4/gs: fix input loading for 64bit data

2016-10-11 Thread Iago Toral Quiroga
From: Samuel Iglesias Gonsálvez v2 (Iago): - Adapt 64-bit path to component packing changes. Signed-off-by: Samuel Iglesias Gonsálvez Signed-off-by: Iago Toral Quiroga --- src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp | 51 ++- 1 file changed, 34 insertions(+), 17

[Mesa-dev] [PATCH v2 099/103] i965/vec4: avoid spilling of registers that mix 32-bit and 64-bit access

2016-10-11 Thread Iago Toral Quiroga
When 64-bit registers are (un)spilled, we need to execute data shuffling code before writing to or after reading from memory. If we have instructions that operate on 64-bit data via 32-bit instructions, (un)spills for the register produced by 32-bit instructions will not do data shuffling at all (b

[Mesa-dev] [PATCH v2 001/103] i965/nir: double/dvec2 uniforms only need to be padded to a single vec4 slot

2016-10-11 Thread Iago Toral Quiroga
From: Samuel Iglesias Gonsálvez max_vector_size is used in the vec4 backend to pad out the uniform components to match a size that is a multiple of a vec4. Double and dvec2 uniforms only require a single vec4 slot, not two. Signed-off-by: Samuel Iglesias Gonsálvez Signed-off-by: Iago Toral
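The slot arithmetic behind this statement can be sketched in C as follows. The function name vec4_slots and the 16-byte slot constant are illustrative assumptions (a vec4 slot holding four 32-bit components); this is not the actual i965 code that uses max_vector_size.

```c
#include <assert.h>

/* Hypothetical helper: number of vec4 slots (16 bytes each) needed for
 * a uniform with 'components' components of 'bit_size' bits each. */
static unsigned vec4_slots(unsigned components, unsigned bit_size)
{
   assert(components >= 1 && components <= 4);
   assert(bit_size == 32 || bit_size == 64);

   unsigned bytes = components * (bit_size / 8);
   return (bytes + 15) / 16;   /* round up to whole vec4 slots */
}
```

Under these assumptions a double (8 bytes) and a dvec2 (16 bytes) each fit in one slot, while a dvec3 or dvec4 needs two.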

[Mesa-dev] [PATCH v2 082/103] i965/vec4: make emit_pull_constant_load support 64-bit loads

2016-10-11 Thread Iago Toral Quiroga
This way callers don't need to know about 64-bit particularities and we reuse some code. --- src/mesa/drivers/dri/i965/brw_vec4.cpp | 22 ++- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 81 ++ 2 files changed, 50 insertions(+), 53 deletions(-) diff --git a
