SPIR-V maps both gl_SampleMask and gl_SampleMaskIn to the same
builtin (SampleMask). The only way to tell which one we are dealing with
is to check if it is an input or an output.
Fixes:
dEQP-VK.pipeline.multisample_shader_builtin.sample_mask.write.*
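A minimal sketch of the input/output distinction described above. The enum and function names are hypothetical stand-ins for illustration, not the actual spirv_to_nir types:

```c
/* Hypothetical variable modes; the real translator has its own enum. */
enum var_mode { VAR_MODE_INPUT, VAR_MODE_OUTPUT };

/* Hypothetical tags for the two GLSL-level builtins. */
enum sample_mask_kind { SAMPLE_MASK_IN, SAMPLE_MASK_OUT };

/* The SampleMask builtin alone is ambiguous, so classify the variable
 * by whether it is an input or an output. */
static enum sample_mask_kind
classify_sample_mask(enum var_mode mode)
{
   return mode == VAR_MODE_INPUT ? SAMPLE_MASK_IN : SAMPLE_MASK_OUT;
}
```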
---
I am still waiting on Jenkins to report res
This avoids tons of crashes in
dEQP-VK.pipeline.multisample_shader_builtin.sample_id.*
dEQP-VK.pipeline.multisample_shader_builtin.sample_mask.*
dEQP-VK.pipeline.multisample_shader_builtin.sample_position.*
---
With this patch some of the tests pass, some still fail and some
(from the sample_mask
This is the same thing we do in the OpenGL driver (comment copied from there).
This is required to ensure that we execute the fragment shader stage when
side-effects (such as image or ssbo stores) are present but there are no
color writes.
I found this while writing a test to check rendering to a framebuffer
w
---
src/compiler/spirv/spirv_to_nir.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/compiler/spirv/spirv_to_nir.c
b/src/compiler/spirv/spirv_to_nir.c
index befee1a..2205072 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -559,7 +
Geometry and Tessellation stages do handle this as a system value instead.
Fixes:
dEQP-VK.geometry.basic.primitive_id
---
src/compiler/spirv/vtn_variables.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/src/compiler/spirv/vtn_variables.c
b/src/compiler/spirv/vtn_variables
This fixes layered rendering Vulkan CTS tests with cube (arrays). We
also do this in the GL driver, see this code from gen8_depth_state.c
for example:
case GL_TEXTURE_CUBE_MAP_ARRAY:
case GL_TEXTURE_CUBE_MAP:
/* The PRM claims that we should use BRW_SURFACE_CUBE for this
* situation, but ex
This is the same thing we do in the GL driver: the hardware provides gl_Layer
in the VUE header, so when the fragment shader reads it we can't skip it.
---
With this patch we now successfully read gl_Layer in fragment shaders. Layered
rendering still does not work though, probably because we still need
---
docs/features.txt | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/docs/features.txt b/docs/features.txt
index 63b45af..f4a67df 100644
--- a/docs/features.txt
+++ b/docs/features.txt
@@ -107,7 +107,7 @@ GL 3.3, GLSL 3.30 --- all DONE: i965, nv50, nvc0, r600,
radeonsi, l
---
docs/relnotes/13.1.0.html | 2 ++
1 file changed, 2 insertions(+)
diff --git a/docs/relnotes/13.1.0.html b/docs/relnotes/13.1.0.html
index 4dce843..124da93 100644
--- a/docs/relnotes/13.1.0.html
+++ b/docs/relnotes/13.1.0.html
@@ -47,6 +47,8 @@ Note: some of the new features are only availabl
We can use this to track various features that may or may not be supported
by the hw / kernel. Currently, we usually do this by checking the generation
and supported command parser versions in various places throughout the driver
code. With this patch, we centralize all these checks in just once pl
---
src/mesa/drivers/dri/i965/intel_extensions.c | 2 ++
src/mesa/drivers/dri/i965/intel_screen.c | 3 ++-
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c
b/src/mesa/drivers/dri/i965/intel_extensions.c
index 8e67b57..aa89380 100644
--
From: Chris Wilson
Moving the test to the screen places it alongside the other global HW
feature tests that want to be shared between contexts.
Also, we need to know if we support pipelined register writes at
screen creation time so that we can tell if we can expose OpenGL 4.0
in gen7.
Signed-o
Instead, check the screen field directly.
---
src/mesa/drivers/dri/i965/brw_context.c | 2 --
src/mesa/drivers/dri/i965/brw_context.h | 5 -
src/mesa/drivers/dri/i965/gen7_l3_state.c| 5 +++--
src/mesa/drivers/dri/i965/intel_extensions.c | 2 +-
src/mesa/drivers/dri/i965/intel_sc
ow we current were doing this and we save
one flag, let me know if you prefer them to be separate flags.
Chris Wilson (1):
i965: Move the pipelined test for SO register access to the screen
Iago Toral Quiroga (3):
i965: get rid of brw->can_do_pipelined_register_writes
i965/gen7: Enable OpenG
---
src/mesa/drivers/dri/i965/intel_batchbuffer.c | 6 +++---
src/mesa/drivers/dri/i965/intel_batchbuffer.h | 18 +-
2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/intel_batchbuffer.c
b/src/mesa/drivers/dri/i965/intel_batchbuffer.c
inde
---
src/mesa/drivers/dri/i965/brw_context.c | 2 +-
src/mesa/drivers/dri/i965/intel_batchbuffer.c | 8
src/mesa/drivers/dri/i965/intel_batchbuffer.h | 2 +-
3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_context.c
b/src/mesa/drivers/dri
We need this feature to enable OpenGL 4.0 on gen7 hardware. Some hardware and
kernel combinations support this while others don't, so we check for it by
writing to a register and verifying the result.
Unfortunately, we need to know this at screen creation time when we don't have
a brw_context avail
rg/archives/mesa-dev/2016-October/133502.html
Iago Toral Quiroga (6):
i965: Make intel_bachbuffer_reloc() take a batchbuffer argument
i965: make intel_batchbuffer_emit_dword() take a batchbuffer as
argument
i965: make intel_batchbuffer_free() take a batchbuffer as argument
i965: remove br
---
src/mesa/drivers/dri/i965/brw_context.c | 2 +-
src/mesa/drivers/dri/i965/intel_batchbuffer.c | 59 +++
src/mesa/drivers/dri/i965/intel_batchbuffer.h | 3 +-
3 files changed, 36 insertions(+), 28 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_context.c
---
src/mesa/drivers/dri/i965/intel_extensions.c | 2 ++
src/mesa/drivers/dri/i965/intel_screen.c | 11 ---
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c
b/src/mesa/drivers/dri/i965/intel_extensions.c
index 5be8f3a..6d9c3
---
src/mesa/drivers/dri/i965/genX_blorp_exec.c | 4 ++--
src/mesa/drivers/dri/i965/intel_batchbuffer.c | 8
src/mesa/drivers/dri/i965/intel_batchbuffer.h | 18 +-
3 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/genX_blorp_exe
We should make the dest in the textureLod() operation have the same number
of components as the destination in the original textureGrad()
Fixes regression in ES3-CTS.gtf.GL3Tests.shadow
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99072
---
src/compiler/nir/nir_lower_tex.c | 3 ++-
1 f
This gets the lowering on the Vulkan driver too, which is required for
hardware that does not have the sample_l_d message (up to IvyBridge).
---
src/mesa/drivers/dri/i965/brw_nir.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_nir.c
b/src/mesa/drivers/dri/i9
This was written specifically for RECT samplers. Make it more generic so
we can call this from the gradient lowerings too.
---
src/compiler/nir/nir_lower_tex.c | 15 ++-
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/src/compiler/nir/nir_lower_tex.c b/src/compiler/nir/n
Even if lower_txd_cube_map isn't. Suggested by Ken to make the flag more
consistent with its name.
---
src/compiler/nir/nir_lower_tex.c | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/src/compiler/nir/nir_lower_tex.c b/src/compiler/nir/nir_lower_tex.c
index a9370c3..95c9f
This has been ported to NIR now so we don't need to keep the GLSL IR
lowering any more.
---
src/mesa/drivers/dri/i965/Makefile.sources | 1 -
src/mesa/drivers/dri/i965/brw_context.h| 2 -
src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 +-
src/mesa/drivers/dri/i965/
This is ported from the Intel lowering pass that we use with GLSL IR.
This takes care of lowering texture gradients on shadow samplers other
than cube maps. Intel hardware requires this for gen < 8.
v2 (Ken):
- Use the helper function to retrieve ddx/ddy
- Swizzle away size components we are not
This gets the lowering on the Vulkan driver too.
Fixes Vulkan CTS cube map texture gradient tests in:
dEQP-VK.glsl.texture_functions.texturegrad.*
---
src/mesa/drivers/dri/i965/brw_nir.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/mesa/drivers/dri/i965/brw_nir.c
b/src/mesa/drivers/dr
This is ported from the Intel lowering pass that we use with GLSL IR.
The NIR pass only handles cube maps, not shadow samplers, which are
also lowered for gen < 8 on Intel hardware. We will add support for
that in a later patch, at which point we should be able to remove
the GLSL IR lowering pass.
, where we need to pass 2
components instead of one for example.
Since we are swizzling away undesired components in the current GLSL IR lowering
and you gave your Rb without asking for that change I did not look too much
into it, but I can spend some more time into it if you think it is worth
This change makes it so we emit a load_input intrinsic when Layer
is read in a fragment shader.
---
Even with this, layered rendering does not seem to work in the Vulkan
driver, so there is something else that is broken. We are probably
not mapping the Layer input correctly somewhere.
src/compil
This has been ported to NIR now so we don't need to keep the GLSL IR
lowering any more.
---
src/mesa/drivers/dri/i965/Makefile.sources | 1 -
src/mesa/drivers/dri/i965/brw_context.h| 2 -
src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 2 +-
src/mesa/drivers/dri/i965/
This gets the lowering on the Vulkan driver too, which is required for
hardware that does not have the sample_l_d message (up to IvyBridge).
---
src/mesa/drivers/dri/i965/brw_nir.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_nir.c
b/src/mesa/drivers/dri/i9
This is ported from the Intel lowering pass that we use with GLSL IR.
The NIR pass only handles cube maps, not shadow samplers, which are
also lowered for gen < 8 on Intel hardware. We will add support for
that in a later patch, at which point we should be able to remove
the GLSL IR lowering pass.
This is ported from the Intel lowering pass that we use with GLSL IR.
This takes care of lowering texture gradients on shadow samplers other
than cube maps. Intel hardware requires this for gen < 8.
---
src/compiler/nir/nir.h | 7 +++
src/compiler/nir/nir_lower_tex.c | 40 ++
GLSL IR lowering pass.
Iago Toral Quiroga (6):
nir/lower_tex: generalize get_texture_size()
nir/lower_tex: add lowering for texture gradient on cube maps
i965/nir: enable lowering of texture gradient for cube maps
nir/lower_tex: add lowering for texture gradient on shadow samplers
i965/nir
This gets the lowering on the Vulkan driver too.
Fixes Vulkan CTS cube map texture gradient tests in:
dEQP-VK.glsl.texture_functions.texturegrad.*
---
src/mesa/drivers/dri/i965/brw_nir.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/mesa/drivers/dri/i965/brw_nir.c
b/src/mesa/drivers/dr
This was written specifically for RECT samplers. Make it more generic so
we can call this from the gradient lowerings too.
---
src/compiler/nir/nir_lower_tex.c | 15 ++-
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/src/compiler/nir/nir_lower_tex.c b/src/compiler/nir/n
Fixes multiple Vulkan CTS tests that combine anisotropy and VK_FILTER_NEAREST
in dEQP-VK.texture.filtering_anisotropy.*
---
src/intel/vulkan/genX_state.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/intel/vulkan/genX_state.c b/src/intel/vulkan/genX_state.c
index 4122395.
This patch improves pass rate of dEQP-VK.texture.explicit_lod.2d.sizes.*
from 68.1% (98/144) to 83.3% (120/144) by enabling sampler address
rounding mode when the selected filter is not nearest, which is the same
thing we do for OpenGL.
These tests check texture filtering for various texture sizes
Besides the logical operation involved, these also require that we test if the
operands are ordered / unordered.
For ordered operations, both operands must be ordered (and they must pass the
conditional test) while for unordered operations it is sufficient if only one
of the operands is unordered
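The ordered/unordered rule above can be sketched in plain C, using a less-than comparison as the example. The helper names are hypothetical and this is scalar host code, not the code the driver emits:

```c
#include <math.h>
#include <stdbool.h>

/* A pair of operands is unordered if at least one of them is NaN. */
static bool
is_unordered(double a, double b)
{
   return isnan(a) || isnan(b);
}

/* Ordered variant: both operands must be ordered AND pass the test. */
static bool
ord_less_than(double a, double b)
{
   return !is_unordered(a, b) && a < b;
}

/* Unordered variant: true if either operand is unordered OR the test passes. */
static bool
unord_less_than(double a, double b)
{
   return is_unordered(a, b) || a < b;
}
```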
Initial Vulkan documentation stated that it was invalid to free VK_NULL_HANDLE
objects; however, this has been changed. From the Vulkan 1.0.33 specification,
2.6.2. Implicit Valid Usage. Valid Usage for Object Handles:
"It is valid to pass VK_NULL_HANDLE to any vkDestroy* or vkFree* command,
which
Besides the logical operation involved, these also require that we test if the
operands are ordered / unordered.
For ordered operations, both operands must be ordered (and they must pass the
conditional test) while for unordered operations it is sufficient if only one
of the operands is unordered
According to the spec for vkGetPhysicalDeviceImageFormatProperties:
"If format is not a supported image format, or if the combination of format,
type, tiling, usage, and flags is not supported for images, then
vkGetPhysicalDeviceImageFormatProperties returns
VK_ERROR_FORMAT_NOT_SUPPORTED."
Mak
Fixes dEQP-VK.api.image_clearing.clear_color_image.1d_r8g8b8_srgb
---
src/intel/vulkan/anv_formats.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/intel/vulkan/anv_formats.c b/src/intel/vulkan/anv_formats.c
index bca9aeb..2adde8b 100644
--- a/src/intel/vulkan/anv_formats.
Until now we were validating in/out blocks by listing the inputs in the
consumer stage and then, for each output of the producer, we checked that
it was a match if it was consumed. This method does not catch the case
where the consumer has an input that is not present as an output in the
producer stag
This was being initialized just once to 0, but we want to reset it
every time we call run() on the visitor, which can happen more than
once in scenarios where we failed the compilation using a method and
we are falling back to a different configuration. This is possible,
for example, with geometry
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 1fb65f3..03c83af 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drive
Right now, if this happens we trigger an assertion when we try to create
a GRF register > 128. Let the backend compiler fail the compilation in this
case. For things like geometry shaders, this can trigger a compilation
using interleaved attribute setup which might even help get the shader to
compi
So far we have been checking that interface block definitions had matching
matrix layouts by comparing the definitions of their fields, however, this
does not cover the case where the interface blocks are defined with
mismatching matrix layouts but don't define any field with a matrix type.
In this
ARB_gpu_shader_fp64 was the last piece missing. Notice that some
hardware and kernel combinations do not support pipelined register
writes, which are required for some OpenGL 4.0 features, in which
case the driver won't expose 4.0.
v2 (Ian, Ken):
- We should not set max_gl_core_version to 40 if
After the changes in commit 5b2675093e863a52, we moved this check to the
linker, but the spec expects this to be checked at compile-time. There are
dEQP tests that expect an error at compile time and the spec seems to confirm
that expectation:
"Except for the last declared member of a shader storag
It seems I initially wrote this as:
if (cond_for_opt) {
}
and then I modified the style at some point to be like:
if (!cond_for_opt)
continue;
But I did not re-write all the conditions accordingly.
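To illustrate the kind of mistake described above: when a guard of the form `if (a && b) { ... }` becomes an early `continue`, the whole condition has to be negated, which turns `a && b` into `!a || !b`. A small sketch with hypothetical conditions, not the actual pass:

```c
#include <stdbool.h>

/* Original style: do the work inside the guarded block. */
static int
count_nested(const bool *a, const bool *b, int n)
{
   int count = 0;
   for (int i = 0; i < n; i++) {
      if (a[i] && b[i]) {
         count++;
      }
   }
   return count;
}

/* Early-continue style: the negation of (a && b) is (!a || !b). */
static int
count_early_continue(const bool *a, const bool *b, int n)
{
   int count = 0;
   for (int i = 0; i < n; i++) {
      if (!a[i] || !b[i])
         continue;
      count++;
   }
   return count;
}
```

Both functions should agree for every input; forgetting to flip `&&` into `||` is what makes the two styles diverge.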
---
I tested this quickly on a haswell with our fp64 branch and it did not
show any regre
From the ARB_shader_storage_buffer_object spec:
"In a shader storage block, the last member may be declared without an explicit
size. In this case, the effective array size is inferred at run-time from
the size of the data store backing the interface block. Such unsized
arrays may be indexed
From the ARB_shader_storage_buffer_object spec:
"In a shader storage block, the last member may be declared without an explicit
size. In this case, the effective array size is inferred at run-time from
the size of the data store backing the interface block. Such unsized
arrays may be indexed
Generally, we only check for the presence of compute shaders during
parsing when we find any language (like layout qualifiers) that is
specific to compute shaders; however, it is possible to define an
empty compute shader that does not use any language specific to compute
shaders at all and we should f
Specifically, at least for now, we don't want to deal with the fact that
channel sizes for fp64 instructions are twice the size, so prevent
coalescing from instructions with a different type size.
Also, we should check that if we are coalescing a register from another
MOV we should be writing the
---
src/mesa/drivers/dri/i965/intel_extensions.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c
b/src/mesa/drivers/dri/i965/intel_extensions.c
index eef81e2..0491145 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drive
ARB_gpu_shader_fp64 was the last piece missing. Notice that some
hardware and kernel combinations do not support pipelined register
writes, which are required for some OpenGL 4.0 features, in which
case the driver won't expose 4.0.
---
src/mesa/drivers/dri/i965/intel_extensions.c | 2 ++
src/mesa/
DOUBLE_TO_SINGLE opcodes are setup so that they use a dst register
with a size of 2 even if they only produce a single-precision
result (this is so that the opcode can use the larger register to
produce a 64-bit aligned intermediary result as required by the
hardware during the conversion process).
---
src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp | 15 +--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_reg_allocate.cpp
index 0bf9f05..a551267 100644
--- a/src/mesa/drive
The current spilling code can't spill vgrf allocations larger than 1
but SIMD4x2 doubles require 2 vgrfs, so we need to permit this case (which
is handled properly for DF data types by emitting 2 scratch messages and
doing data shuffling). We accomplish this by not auto-disabling spilling
for vgrf
We are going to need this in the vec4 backend too.
---
src/mesa/drivers/dri/i965/brw_ir_fs.h | 7 ---
src/mesa/drivers/dri/i965/brw_shader.h | 7 +++
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 1 +
3 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/src/me
RepCtrl=1 does not work with 64-bit operands so we need to use RepCtrl=0.
In that situation, the regioning generated for the sources seems to be
equivalent to <4,4,1>:DF, so it will only work for components XY, which
means that we have to move any other swizzle to a temporary so that we can
source
8-wide compressed DF operations are executed as two separate 4-wide
DF operations. In that scenario, we have to be careful when we allocate
register space for their operands to prevent the case where the first
half of the instruction overwrites the source of the second half.
To do this we mark com
The hardware can only operate with 32-bit swizzles, which is a rather
limiting restriction. However, the idea is not to expose this to the
optimization passes, which would be a mess to deal with. Instead, we let
the bulk of the vec4 backend ignore this fact and we fix the swizzles right
at codegen
A vec4 is 16 bytes and a dvec4 is 32 bytes so for doubles we have
to multiply the reladdr by 2. The reg_offset part is in units of 16
bytes and is used to select the low/high 16-byte chunk of a full
dvec4, so we don't want to multiply that part of the address.
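A rough sketch of the address arithmetic described above, assuming both values are expressed in 16-byte units. The function name and signature are hypothetical:

```c
#include <stdbool.h>

/* Returns an offset in units of 16 bytes (one vec4).
 * reladdr counts whole elements, so it doubles for 64-bit types because a
 * dvec4 spans two 16-byte chunks. reg_offset already selects a 16-byte
 * chunk (low/high half of a dvec4), so it is added unscaled. */
static unsigned
indirect_offset_in_vec4s(unsigned reladdr, unsigned reg_offset, bool is_64bit)
{
   unsigned scaled_reladdr = is_64bit ? reladdr * 2 : reladdr;
   return scaled_reladdr + reg_offset;
}
```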
---
src/mesa/drivers/dri/i965/brw_vec
---
src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp | 34 ++
1 file changed, 30 insertions(+), 4 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp
index 498fb7c..f62dc9c 100644
--- a/src/mesa/drivers/dri/i965/
Just like the exec_size, we are going to need this in the vec4 backend
when we implement a simd splitting pass.
---
src/mesa/drivers/dri/i965/brw_ir_fs.h | 9 -
src/mesa/drivers/dri/i965/brw_shader.h | 9 +
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 1 +
3 fi
From the BDW PRM, Workarounds chapter:
"DF->f format conversion for Align16 has wrong emask calculation when
source is immediate."
So detect the case and move the immediate source to a VGRF before we attempt
the conversion.
Notice that Broadwell and later are strictly scalar at the moment
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 20 +---
1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 0788ba2..b0bc2d5 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src
We need to consider the fact that dvec3/4 require two vec4 slots.
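As a quick illustration of the slot counting, here is a sketch that derives the number of 16-byte vec4 slots from the component count and bit size. The helper is hypothetical and not taken from the driver:

```c
/* Number of 16-byte vec4 slots needed for a vector type.
 * double and dvec2 (8 and 16 bytes) fit in one slot;
 * dvec3 and dvec4 (24 and 32 bytes) need two. */
static unsigned
vec4_slots(unsigned components, unsigned bit_size)
{
   unsigned bytes = components * (bit_size / 8);
   return (bytes + 15) / 16;
}
```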
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 11 +--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index e5391b9..b79fd5e 10064
There is a single bit for this, so it is a binary 0 or 1 meaning
offset 0B or 16B respectively.
v2:
- Since brw_inst_dst_da16_subreg_nr() is known to be 1, remove it
from the expression (Curro)
Reviewed-by: Francisco Jerez
---
src/mesa/drivers/dri/i965/brw_disasm.c | 2 +-
1 file changed,
v2 (Curro):
- Print it also for execsize < 4.
- QtrCtrl is still in effect, so print 2 * qtr_ctl + nib_ctl + 1
- Do not read the nib ctl from the instruction in gen < 7,
the field only exists in gen7+.
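The expression from the v2 notes can be sketched as a small helper. Treating the nibble control as 0 on gen < 7 is an assumption made for this illustration, as are the function and parameter names:

```c
/* Computes the number printed by the disassembler for the
 * quarter/nibble control: 2 * qtr_ctl + nib_ctl + 1.
 * The nibble control field only exists on gen7+, so this sketch
 * ignores the raw field value on earlier generations. */
static int
exec_group_number(int gen, int qtr_ctl, int nib_ctl_field)
{
   int nib_ctl = gen >= 7 ? nib_ctl_field : 0;
   return 2 * qtr_ctl + nib_ctl + 1;
}
```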
---
src/mesa/drivers/dri/i965/brw_disasm.c | 6 +-
1 file changed, 5 insertions(+)
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 21 ++---
1 file changed, 18 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 37c3d7c..815082e 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 60 +-
1 file changed, 30 insertions(+), 30 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 75e47f9..0788ba2 100644
--- a/src/mesa/drivers/dri/i965/brw_vec
The previous patch made sure that we do not generate MAD instructions
for any NIR's 64-bit ffma, but there is nothing preventing i965 from
producing MAD instructions as a result of lowerings or optimization
passes. This patch makes sure that any 64-bit MAD produced inside the
driver after translati
---
src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp | 31 --
1 file changed, 29 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_tcs.cpp
index f62dc9c..914396c 100644
--- a/src/mesa/drivers/dri/i965/
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 9 ++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 7af65ab..7f6acc3 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers
Gen7 hardware does not support double immediates so these need
to be moved in 32-bit chunks to a regular vgrf instead. Instead
of doing this every time we need to create a DF immediate,
create a helper function that does the right thing depending
on the hardware generation.
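For illustration, splitting a double into the two 32-bit chunks mentioned above might look like the following on the host side. This is only a sketch of the bit manipulation with a hypothetical helper name; it does not show how the driver emits the moves:

```c
#include <stdint.h>
#include <string.h>

/* Split the bit pattern of a double into low and high 32-bit halves,
 * so each half can be written with a 32-bit move. */
static void
split_double(double value, uint32_t *lo, uint32_t *hi)
{
   uint64_t bits;
   memcpy(&bits, &value, sizeof(bits));   /* well-defined type punning */
   *lo = (uint32_t)(bits & 0xffffffffu);
   *hi = (uint32_t)(bits >> 32);
}
```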
v2 (Curro):
- Use swi
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index 75a8473..2bde628 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/
Mostly the same stuff as usual: we need to shuffle the data before we
write and we need to emit two 32-bit write messages (with appropriate
32-bit writemask channels set) for a full dvec4 scratch write.
---
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 64 ++
1 file chang
From: Samuel Iglesias Gonsálvez
This means we would copy propagate partial reads or writes and that can affect
the result.
Signed-off-by: Samuel Iglesias Gonsálvez
---
src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/mesa/driver
Generally, instructions in Align16 mode only ever write to a single
register and don't need any form of SIMD splitting, that's why we
have never had a SIMD splitting pass in the vec4 backend. However,
double-precision instructions typically write 2 registers and in
some cases they run into certain
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 860ec51..c825aeb 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/m
Specifically, at least for now, we don't want to deal with the fact that
channel sizes for fp64 instructions are twice the size, so prevent
coalescing from instructions with a different type size.
Also, we should check that if we are coalescing a register from another
MOV we should be reading the
Basically, ALIGN1 mode will ignore swizzles on the input vectors so we don't
want the copy propagation pass to mess with them.
---
.../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 24 ++
1 file changed, 24 insertions(+)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_
---
src/mesa/drivers/dri/i965/brw_vec4_cse.cpp | 31 +++---
1 file changed, 24 insertions(+), 7 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_cse.cpp
index bef897a..229d7b2 100644
--- a/src/mesa/drivers/dri/i965/
Add asserts so we remember to address this when we enable 64-bit
integer support, as suggested by Connor and Jason.
Reviewed-by: Francisco Jerez
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 71 ++
1 file changed, 53 insertions(+), 18 deletions(-)
diff --git a/src
---
src/mesa/drivers/dri/i965/brw_vec4_tes.cpp | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
index c8fa2ca..a1aa672 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
+++ b/src/m
---
src/mesa/drivers/dri/i965/brw_vec4_tes.cpp | 72 +++---
1 file changed, 55 insertions(+), 17 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_tes.cpp
index 226dcb4..f2a4507 100644
--- a/src/mesa/drivers/dri/i965
In gen < 8 instructions that write more than one register need to read
more than one register too. Make sure we don't break that restriction
by copy propagating from a uniform.
---
src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 7 +++
1 file changed, 7 insertions(+)
diff --git a/sr
Otherwise we end up producing code that violates the register region
restriction that says that when execsize == width and hstride != 0
the vstride can't be 0.
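The restriction quoted above can be written as a small predicate. This sketch checks only that one rule, and the function name and parameters are hypothetical:

```c
#include <stdbool.h>

/* Returns false for regions that violate the rule:
 * when execsize == width and hstride != 0, vstride can't be 0. */
static bool
region_obeys_vstride_rule(unsigned exec_size, unsigned width,
                          unsigned hstride, unsigned vstride)
{
   if (exec_size == width && hstride != 0 && vstride == 0)
      return false;
   return true;
}
```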
---
src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 11 +++
1 file changed, 11 insertions(+)
diff --git a/src/mesa/driv
Same requirements as for UBO loads.
---
src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 31 +-
1 file changed, 26 insertions(+), 5 deletions(-)
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index f234e65..001a62f 100
We make scalar sources in 3src instructions use subnr instead of
swizzles because they don't really use swizzles.
With doubles it is more complicated because we use vstride=0 in
more scenarios in which they don't produce scalar regions. Also
RepCtrl=1 is not allowed with 64-bit operands, so we sho
Use a width of 2 with 64-bit attributes.
Also, if we have a dvec3/4 attribute that gets split across two registers
such that components XY are stored in the second half of a register and
components ZW are stored in the first half of the next, we need to fix
regioning for any instruction that reads
From: Samuel Iglesias Gonsálvez
v2 (Iago):
- Adapt 64-bit path to component packing changes.
Signed-off-by: Samuel Iglesias Gonsálvez
Signed-off-by: Iago Toral Quiroga
---
src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp | 51 ++-
1 file changed, 34 insertions(+), 17
When 64-bit registers are (un)spilled, we need to execute data shuffling
code before writing to or after reading from memory. If we have instructions
that operate on 64-bit data via 32-bit instructions, (un)spills for the
register produced by 32-bit instructions will not do data shuffling at all
(b
From: Samuel Iglesias Gonsálvez
max_vector_size is used in the vec4 backend to pad out the uniform
components to match a size that is a multiple of a vec4. Double and dvec2
uniforms only require a single vec4 slot, not two.
Signed-off-by: Samuel Iglesias Gonsálvez
Signed-off-by: Iago Toral
This way callers don't need to know about 64-bit particularities and
we reuse some code.
---
src/mesa/drivers/dri/i965/brw_vec4.cpp | 22 ++-
src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 81 ++
2 files changed, 50 insertions(+), 53 deletions(-)
diff --git a