From: Roland Scheidegger
Braces mismatch (flagged by CI, untested).
Fixes: 385d13f26d2 "util/atomic: Add a _return variant of p_atomic_add"
---
src/util/u_atomic.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/util/u_atomic.h b/src/util/u_atomic.h
index 9cbc6dd1eaa
From: Roland Scheidegger
LLVM 8 did remove both the signed and unsigned sse2/avx intrinsics in
the end, and provide arch-independent llvm intrinsics instead.
Fixes a crash when using snorm framebuffers (tested with piglit
arb_color_buffer_float-render GL_RGBA8_SNORM -auto).
CC:
---
src/gallium
From: Roland Scheidegger
The 1GB limit was arbitrary, increase this to 2GB (which is the max
possible without code changes).
---
src/gallium/drivers/llvmpipe/lp_limits.h | 6 +-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/src/gallium/drivers/llvmpipe/lp_limits.h
b/src/galli
From: Roland Scheidegger
Should fix some issues we're seeing. And use REALLOC instead of realloc.
---
src/gallium/drivers/llvmpipe/lp_cs_tpool.c | 6 +++---
src/gallium/drivers/llvmpipe/lp_state_cs.c | 3 ++-
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/src/gallium/drivers/llvm
From: Roland Scheidegger
LLVM 7.0 ditched the pmulu intrinsics.
This is only a trivial patch to use the fallback code instead.
It'll likely produce atrocious code since the pattern doesn't match what
llvm itself uses in its autoupgrade paths, hence the pattern won't be
recognized.
Should fix htt
From: Roland Scheidegger
These versions still need a wrapper but already have both success and
failure ordering.
(Compile tested on llvm 3.7, llvm 3.8.)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=02
---
src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 16 +++-
1 file ch
From: Roland Scheidegger
The x86asmprinter component is gone, and things seem to work by just
removing it.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110707
---
scons/llvm.py | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/scons/llvm.py b/scons/llvm.py
index a
From: Roland Scheidegger
The default null_output really needs to be static, otherwise the values
we'll eventually get later are doubly random (they are not initialized,
and even if they were it's a pointer to a local stack variable).
VMware bug 2349556.
---
src/gallium/auxiliary/gallivm/lp_bld_t
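
The point about static storage can be illustrated with a minimal sketch. The struct and function names below are hypothetical stand-ins, not the gallivm code: a default object with static storage duration is zero-initialized and stays valid after the function returns, whereas a pointer to a local stack variable does neither.

```c
/* Hypothetical stand-in for a default output record (not the real type). */
struct output_values {
   float v[4];
};

/* Returns a pointer to a default record.  Because the object is static,
 * it is zero-initialized and outlives the call; an automatic (stack)
 * variable here would be uninitialized and dangling after return. */
static const struct output_values *
default_output(void)
{
   static const struct output_values null_output = {{0.0f}};
   return &null_output;
}
```

Every call returns the same address, so callers may keep the pointer around safely.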
From: Roland Scheidegger
Transform feedback draws get the number of vertices from the transform
feedback object. In draw, we'll figure this out with the number of bytes
written divided by the stride. However, it is apparently possible we end
up with a stride of 0 there (not entirely sure it could
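
A minimal sketch of the bytes-written / stride computation with a guard for a zero stride. The function name is hypothetical, and returning 0 vertices for a zero stride is an assumption, since the truncated message does not show what the patch actually does in that case.

```c
#include <stddef.h>

/* Derive the vertex count of a transform feedback draw from the number
 * of bytes written.  A stride of 0 would divide by zero, so treat it as
 * "nothing to draw" (assumed behaviour, not taken from the patch). */
static unsigned
vertices_from_bytes(size_t bytes_written, unsigned stride)
{
   if (stride == 0)
      return 0;
   return (unsigned)(bytes_written / stride);
}
```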
From: Roland Scheidegger
Brian noticed there was an uninitialized var for the 8-wide case and 128
bit blocks, which made it always crash. Likewise, the 64bit block case
had another crash bug due to type mismatch.
Color decode (used for all s3tc formats) also had a bogus shuffle for
this case, lea
From: Roland Scheidegger
llvm 8 removed saturated unsigned add / sub x86 sse2 intrinsics, and
now llvm 9 removed the signed versions as well - they were proposed for
removal earlier, but the pattern to recognize those was very complex,
so it wasn't done then. However, instead of these arch-specif
From: Roland Scheidegger
0 is a valid value as max index, and the code handles it fine. This isn't
commonly seen, as it will only happen with array declarations of size 1.
The assert was introduced with a3c898dc97ec5f0e0b93b2ee180bdf8ca3bab14c.
Fixes piglit tests/shaders/complex-loop-analysis-bu
From: Roland Scheidegger
Whenever llvm removes an intrinsic (we're using), we're hitting segfaults
due to llvm doing calls to address 0 in the jitted code instead.
However, Jose figured out we can actually detect this with
LLVMGetIntrinsicID(), so use this to abort, so we don't have to wonder
wha
From: Roland Scheidegger
This intrinsic disappeared with llvm 6.0, using it ends up in segfaults
(due to llvm issuing call to NULL address in the jitted shaders).
Add code doing the same thing as the autoupgrade code in llvm so it
can be matched and replaced back with a pavgb.
While here, also imp
From: Roland Scheidegger
AoS sampling tries to use integers for coord wrapping when possible,
as it should be faster. However, for AVX, this was suboptimal, because
only floats can use 8x32bit vectors, whereas integers have to be split
into 4x32bit vectors. (I believe part of why it was slower wa
From: Roland Scheidegger
The calculated length of a line may be infinite, if the coords we
get are bogus. This leads to an infinite loop in line stippling.
To prevent this, test for this explicitly (although technically
on at least x86 sse it would actually work without the explicit
test, as long
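
To illustrate why an explicit test helps, here is a hedged sketch of a stipple-style loop that walks a line in fixed steps. The function and its parameters are hypothetical and do not reproduce the draw module's code. Without the `isfinite()` check, an infinite length would keep the loop condition true indefinitely.

```c
#include <math.h>

/* Count the stipple segments along a line of the given length.
 * Bogus coordinates can yield an infinite (or NaN) length, so reject
 * those up front instead of looping on "pos < length". */
static int
stipple_segments(float length, float segment_len)
{
   int n = 0;
   float pos;

   if (!isfinite(length) || segment_len <= 0.0f)
      return 0;

   for (pos = 0.0f; pos < length; pos += segment_len)
      n++;
   return n;
}
```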
From: Roland Scheidegger
Because we only have one file_max for the (2d) gs input file, the value
actually represents the max of attrib and vertex index (although I'm
not entirely sure if we really want the max, since the max valid value
of the vertex dimension can be easily deduced from the input
From: Roland Scheidegger
These have been removed. Unfortunately auto-upgrade doesn't work for
jit. (Worse, it seems we don't get a compilation error anymore when
compiling the shader, rather llvm will just do a call to a null
function in the jitted shaders making it difficult to detect when
intri
From: Roland Scheidegger
d3d10 requires NaNs to get converted to 0 for float->unorm conversions
(and float->int etc.). GL spec probably doesn't care in general, but it
would make sense to have reasonable behavior in any case imho - the
old code was converting negative NaNs to 0, and positive NaNs
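
The required behaviour (NaN converts to 0) can be sketched in scalar C as follows. This is only an illustration with a hypothetical function name; the actual change is in generated vector code, which is not shown in the truncated message.

```c
#include <math.h>
#include <stdint.h>

/* Convert a float to an 8-bit unorm value.  NaNs of either sign map
 * to 0, as d3d10 requires; other values are clamped to [0, 1] and
 * rounded to nearest. */
static uint8_t
float_to_unorm8(float f)
{
   if (isnan(f))
      return 0;
   if (f <= 0.0f)
      return 0;
   if (f >= 1.0f)
      return 255;
   return (uint8_t)(f * 255.0f + 0.5f);
}
```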
From: Roland Scheidegger
The pt emit path can only handle 65535 - the number of vertices is
truncated to a ushort, resulting in a too small buffer allocation, which
will crash.
Forcing the pipeline path looks suboptimal, then again this bug is
probably there ever since GS is supported, so it see
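
The truncation described above is easy to reproduce in isolation. The helpers below are hypothetical and only show the arithmetic: a count above 65535 silently wraps when stored in a ushort, so a buffer sized from the truncated value is too small.

```c
#include <stdint.h>

/* What the count becomes once it is squeezed into 16 bits. */
static uint16_t
truncated_vertex_count(unsigned count)
{
   return (uint16_t)count;
}

/* Check a caller could use to pick a path that handles large counts. */
static int
fits_in_ushort(unsigned count)
{
   return count <= UINT16_MAX;
}
```

For example, 70000 vertices wrap to 70000 - 65536 = 4464.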
From: Roland Scheidegger
Empty initializer braces aren't valid c (it's a gnu extension, and
it's valid in c++).
Hopefully fixes appveyor / msvc build...
Fixes a3150c1d06ae7766c3d3fe3b33432e55c3c7527e
---
src/compiler/nir/nir_format_convert.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
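
For illustration, a minimal sketch of the portable form. The struct is hypothetical and not taken from nir_format_convert.h. Writing `{0}` zero-initializes every member and is accepted by compilers that reject the empty-brace form in C.

```c
/* Hypothetical struct, for illustration only. */
struct format_bits {
   unsigned bits[4];
   unsigned count;
};

/* "{0}" initializes the first member explicitly and the rest
 * implicitly to zero; "{}" is the GNU extension to avoid in C. */
static struct format_bits
make_empty_format_bits(void)
{
   struct format_bits fb = {0};
   return fb;
}
```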
From: Roland Scheidegger
fold_assoc() called from fold_alu_op3() can lower the number of src to 2,
which then leads to an invalid access to n.src[2]->gvalue().
This didn't seem to have caused much harm in the past, but on Fedora 28
it will crash (presumably because -D_GLIBCXX_ASSERTIONS is used,
From: Roland Scheidegger
Empty initializer braces aren't valid c (it's a gnu extension, and
it's valid in c++).
Hopefully fixes appveyor / msvc build...
Fixes 6677e131b806b10754adcb7cf3f427a7fcc2aa09
---
src/compiler/glsl/gl_nir_link_atomics.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(
From: Roland Scheidegger
The sampleMaskIn workaround (b936f4d1ca0d2ab1e828ff6a6e617f12469687fa)
tries to figure out if the shader is running at per-sample frequency, but
there's a typo bug so it will only recognize per-sample linear inputs,
not per-sample perspective ones.
Spotted by Eric Engestr
From: Roland Scheidegger
This unifies the explicit rasterization discard as well as the implicit
rasterization disabled logic (which we need for another state tracker),
which really should do the exact same thing.
We'll now toss out the prims early on in setup with (implicit or
explicit) discard,
From: Roland Scheidegger
I've confirmed after 77554d220d6d74b4d913dc37ea3a874e9dc550e4 we no
longer need this to pass some tests from another api (as we no longer
generate the bogus extra null tris in the first place).
---
src/gallium/auxiliary/draw/draw_pipe_clip.c | 38
From: Roland Scheidegger
Use a single allocation of array type instead of the old-style array
allocation for the temp and immediate arrays.
Probably only makes a difference if they aren't used indirectly (so,
if we used them solely because there's too many temps or immediates).
In this case the s
From: Roland Scheidegger
We were never producing negative numbers for signed types.
Also fix only producing half the valid range for uint32, and
properly clamp signed values.
Because this now also properly tests snorm with actually negative
values, need to increase eps for such conversions. I be
From: Roland Scheidegger
The logic was flawed, since mul(x,y) will be <= 0 (exactly 0) when
the sign is the same but both numbers are sufficiently small
(if the product is smaller than 2^-128).
This could apparently lead to emitting a sufficient amount of
additional bogus vertices to overflow the
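
The underflow can be demonstrated with a small sketch. Both helpers are hypothetical; the `signbit()` variant is just one possible alternative and is not claimed to be what the patch uses.

```c
#include <math.h>

/* Flawed test: the product of two tiny same-signed floats underflows
 * to exactly 0, so "product <= 0" wrongly reports differing signs. */
static int
signs_differ_by_product(float x, float y)
{
   volatile float p = x * y;   /* force rounding to float precision */
   return p <= 0.0f;
}

/* Alternative that compares the sign bits directly and is therefore
 * unaffected by the magnitude of the operands. */
static int
signs_differ_by_signbit(float x, float y)
{
   return (signbit(x) != 0) != (signbit(y) != 0);
}
```

With x = y = 1e-30f the product test reports a sign change, while the sign-bit test does not.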
From: Roland Scheidegger
Simplifies the logic when to emit null tris (albeit the reasons why we
have to do this remain unclear).
This is strictly just logic simplification, the behavior doesn't change
at all.
---
src/gallium/auxiliary/draw/draw_pipe_clip.c | 19 +--
1 file change
From: Roland Scheidegger
If we dump the bitcode for off-line debug purposes, we really want the
pre-optimized bitcode, otherwise it's useless in identifying problems
with IR optimization (if you have a shader which takes an hour to do
IR optimization, it's also nice you don't have to wait that ho
From: Roland Scheidegger
Conversion to int can otherwise overflow if compile times are over
~71min. (Yes this can happen...)
---
src/gallium/auxiliary/gallivm/lp_bld_init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_init.c
b/src/gall
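
A sketch of the arithmetic behind the ~71min figure, assuming the time is measured in microseconds and stored in an unsigned 32-bit value (an assumption; the truncated diff does not show the types involved). 2^32 microseconds is about 71.6 minutes, so anything longer wraps around.

```c
#include <stdint.h>

/* Elapsed time squeezed into 32 bits: wraps after 2^32 us. */
static uint32_t
elapsed_us_32(int64_t start_us, int64_t end_us)
{
   return (uint32_t)(end_us - start_us);
}

/* Keeping the full 64-bit difference avoids the wraparound. */
static int64_t
elapsed_us_64(int64_t start_us, int64_t end_us)
{
   return end_us - start_us;
}
```

For a 72 minute compile (4320000000 us), the 32-bit version yields 25032704 us, roughly 25 seconds.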
From: Roland Scheidegger
LICM is simply too expensive, even though it presumably can help quite
a bit in some cases.
It was definitely cheaper in llvm 3.3, though as far as I can tell with
llvm 3.3 it failed to do anything in most cases. early-cse also actually
seems to cause licm to be able to m
From: Roland Scheidegger
This pass is quite cheap, and can simplify the IR quite a bit for our
generated IR.
In particular on a variety of shaders I've found the time saved by
other passes due to the simplified IR more than makes up for the cost
of this pass, and on top of that the end result is
From: Roland Scheidegger
If a src was referencing the same temp as the dst, the per-component
copy code didn't work.
e.g.
cndge r0.xy, r0.xx, |r2|, r3
got expanded into
mov r12.x, |r2|
cndge r0.x, r0.x, r12, r3
mov r12.y, |r2|
cndge r0.y, r0.x, r12, r3
hence for the second cndge r0.x
From: Roland Scheidegger
(For the pipe_tex_filter enum)
---
src/gallium/auxiliary/util/u_blit.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/src/gallium/auxiliary/util/u_blit.h
b/src/gallium/auxiliary/util/u_blit.h
index 085ea63..004ceae 100644
--- a/src/gallium/auxiliary/util/u_blit.h
+
From: Roland Scheidegger
The logic would not work correctly for line lengths smaller than 1.0,
even a degenerated line with length 0 would still produce a fragment
with anywhere between alpha 0.0 and 0.5.
---
src/gallium/auxiliary/draw/draw_pipe_aaline.c | 25 -
src/gall
From: Roland Scheidegger
In contrast to non-aa, where stippling is based on either dx or dy
(depending on if it's a x or y major line), stippling is based on
actual distance with smooth lines, so adjust for this.
(It looks like there's some minor artifacts with mesa demos
line-sample with wide l
From: Roland Scheidegger
The motivation actually was to get rid of the additional tex
instruction, since that requires the draw fallback code to intercept
all sampler / view calls (even if the fallback is never hit).
Basically, the idea is to use coverage of the pixel to calculate
the alpha value
From: Roland Scheidegger
The comment said it will only represent the lowest 32 regs. This was
not entirely true in practice, since at least on x86 you'll get
masked shifts (unless the compiler could recognize it already and toss
it out). It turns out this actually works out alright (presumably
no
From: Roland Scheidegger
There's no point, we know the highest non-null one.
---
src/gallium/auxiliary/cso_cache/cso_context.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/src/gallium/auxiliary/cso_cache/cso_context.c
b/src/gallium/auxiliary/cso_cache/cso_context.c
ind
From: Roland Scheidegger
We were setting view to NULL if the iteration was larger than i.
But in fact if the view is NULL the code did nothing anyway...
---
src/gallium/drivers/softpipe/sp_state_sampler.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/gallium/drivers
From: Roland Scheidegger
Some state trackers require 128.
(There are no plans to increase PIPE_MAX_SAMPLERS too, since with gl
state tracker it's unlikely more than 32 will be needed, if you need
more use bindless.)
---
src/gallium/include/pipe/p_state.h | 2 +-
1 file changed, 1 insertion(+), 1
From: Roland Scheidegger
We already stored the highest (potentially) used number.
---
src/gallium/auxiliary/draw/draw_context.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/gallium/auxiliary/draw/draw_context.c
b/src/gallium/auxiliary/draw/draw_context.c
index 9791ec5
From: Roland Scheidegger
Shaders coming from dx10 state trackers have a RET before the END.
And the epilog needs to be placed before the RET (otherwise it will
get ignored).
Hence figure out if a RET is in main, in this case we'll place
the epilog there rather than before the END.
(At a closer lo
From: Roland Scheidegger
We need this to handle some oddball dx10 format
(DXGI_FORMAT_R10G10B10_XR_BIAS_A2_UNORM). What you can do with this
format is very limited, hence we don't want to add it as a gallium
format (we could not express the properties of this format as
ordinary format properties
From: Roland Scheidegger
The writemask handling was busted, since writing defaults to output
meant they got overwritten by the tex sampling anyway. Albeit the
affected components were undefined, so maybe with some luck it
still would have worked with some drivers - if not could as well
kill it...
From: Roland Scheidegger
The hw gives us coverage for pixel, not for individual fragment shader
invocations, in case execution isn't per pixel (note eg, unlike cm, actually
cannot do "real" minSampleShading, it's either per-pixel or per-fragment, but
it doesn't really make a difference here).
Als
From: Roland Scheidegger
No functional change (compile tested only).
---
src/gallium/drivers/r600/cayman_msaa.c | 14 ++
src/gallium/drivers/r600/evergreen_state.c | 10 ++
src/gallium/drivers/r600/r600_pipe_common.h | 6 ++
3 files changed, 14 insertions(+), 16 de
From: Roland Scheidegger
For some reason, we were iterating through the code twice (first just for
instructions needing barycentrics, then for instructions and input dcls).
Move things around slightly so this is no longer necessary.
There also was an unneeded enabling of the fixed_pt_position_gpr
From: Roland Scheidegger
This parameter for _mesa_get_min_invocations_per_fragment() was once used
by the intel driver, but it's long gone.
---
src/mesa/program/program.c| 11 ---
src/mesa/program/program.h| 3 +--
src/mesa/state_tracker/st_atom_msaa.c | 2 +-
3 f
From: Roland Scheidegger
By the looks of it, hemlock is treated separately from cypress, but
certainly it won't need the stack workarounds cedar/redwood (and
seemingly every other eg chip except cypress/juniper) need.
(Discovered by accident.)
---
src/gallium/drivers/r600/sb/sb_bc.h | 1 +
From: Roland Scheidegger
The size/type query is always legal (if we made it that far).
This causes a difference for GL_TEXTURE_BUFFER - the reason is that these
parameters are valid only with GetTexLevelParameter() if gl 3.1 is supported,
but not if only ARB_texture_buffer_object is supported.
Ho
From: Roland Scheidegger
The code just considered all formats as being supported if they were either
a valid fbo or texture format.
This was quite awkward since then the query would return "supported" for
e.g. GL_RGB9E5 or compressed formats and target RENDERBUFFER (albeit the driver
could still
From: Roland Scheidegger
Testing for gles there is just confusing - this is about target being
supported, if it was valid at all was already determined earlier
(in _legal_parameters). It didn't make sense at all in any case, since
it would only have said false there for gles for 2d but not 2d arr
From: Roland Scheidegger
We are not allowed to modify the incoming coords values, or things may
crash (as we may be inside a llvm conditional and the values may be used
in another branch).
I recently broke this when fixing an issue with NaNs and seamless cube
map filtering, and it causes crashes
From: Roland Scheidegger
Some apps are known to require more than 16. Albeit they probably still won't
run with 18 (since all new hw/drivers support 32) it shouldn't hurt to at
least support 18 (seemingly the hw limit on all r600-ni chips - the blob also
supports 18, at least for eg+ by the looks
From: Roland Scheidegger
Just inline the little bit of code.
---
src/gallium/auxiliary/draw/draw_pt_vsplit.c | 23 ---
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/src/gallium/auxiliary/draw/draw_pt_vsplit.c
b/src/gallium/auxiliary/draw/draw_pt_vsplit.c
in
From: Roland Scheidegger
vsplit_add_cache uses the post-bias index for hashing, but the
vsplit_add_cache_uint/ushort/ubyte ones used the pre-bias index, therefore
the code for handling the special case (because -1 matches the initialization
value of the cache) wasn't actually working.
Commit 78a9
From: Roland Scheidegger
The command parser is very sad if we don't emit the relocs per hw query...
However, don't enable it. It mostly works, but piglit
arb_transform_feedback_overflow_query-basic shows 2 failures (it's really the
same case for the hw), conditional_render_any and conditional_re
From: Roland Scheidegger
ARB_ubo requires 12 UBOs (per stage) at least, but this limit has been
raised by GL 4.3 to 14, so don't advertise GL 4.3 without it (only checking
the vertex stage since all drivers probably have the same limit anyway for
other stages). (piglit has minmax tests for that k
From: Roland Scheidegger
We've seen some problems internally due to macro redefinition.
Fix this by adding HAVE_FUNC_ATTRIBUTE_NORETURN to c99_compat.h,
and defining it for msvc.
And avoid redefinition just in case.
---
include/c99_compat.h | 1 +
src/util/macros.h| 12
2 files
From: Roland Scheidegger
For eg/cm, the r600_gb_backend_map will always be 0. I assume this is a bug
in the drm kernel driver, as it just never fills the information in.
I am not entirely sure if the map is supposed to be needed for these chips,
since unlike on r600/r700 the value calculated
From: Roland Scheidegger
Juniper really has a maximum of 4 RBEs (16 pixels). However, predication
always locks up on my HD 5750, and through experiments it looks like if we're
pretending it has a maximum of 8, with 4 disabled, it works correctly.
My conclusion would be that there's a bug (likely
From: Roland Scheidegger
The logic had two fatal flaws which completely killed the default value.
1) drm will overwrite the value anyway even if the chip can't be handled
2) the default value logic is relying on num_render_backends, which was
filled in later.
Luckily no one is relying on it, but i
From: Roland Scheidegger
Contrary to what the comment said, this appears to work just fine on my rv770
(tested with piglit textureSize 140 fs/vs samplerBuffer).
I have no clue though if it's actually preferable to use it (unfortunately
we cannot get rid of the tex constants completely, as we sti
From: Roland Scheidegger
The offset looks bogus to me. Albeit in the end it doesn't matter, by the
looks of it offsets smaller than 4 get ignored there (not sure of the rules,
I suppose either non-dword aligned offsets never work there or the offset
must be at least aligned to the size of a singl
From: Roland Scheidegger
This fixes the new piglit test.
(I could not actually figure out where the hell that index_1 parameter comes
from but in any case it's completely the same as for ordinary texturing...)
While here also fix up the logic for early exit of setting up driver consts.
---
src/g
From: Roland Scheidegger
Ideally we'd support 16 (d3d11 requires 15, and mesa subtracts one for non-ubo
constants), but that's kind of impossible (it would be only doable if either
we'd somehow merge the mesa non-ubo constants with the driver constants, or
only use the driver constants with vtx f
From: Roland Scheidegger
With the exception of the default tess levels only ever accessed
by the default tcs shader, the LDS_INFO const buffer was only accessed by vtx
instructions, and not through kcache. No idea why really, but use this to our
advantage by not using a constant buffer slot for i
From: Roland Scheidegger
Similar to const buffers. The driver must not emit any tes-related state if tes
is disabled, since the hw slots are all shared by VS, therefore it would
overwrite them (the mesa state tracker might not do this, but it would be
perfectly legal to do so).
Nevertheless I thi
From: Roland Scheidegger
We only did this for the other stages, but obviously tess eval/ctrl need it
too.
This fixes the (newly modified) piglit texturing/textureSize test when run
with tes stage and bufferSampler.
---
src/gallium/drivers/r600/r600_state_common.c | 16
1 file ch
From: Roland Scheidegger
Evergreen clearly has 32 slots, so it should just work (and the affected array
is already sized with PIPE_MAX_ATTRIB).
Note: As dx10.1 chips, r600/r700 should support this too, but seemingly there's
only 16 resource slots for fetch shaders (fs). However, a quick looks see
From: Roland Scheidegger
It looks like this reloc belongs to setting the constant reg, which is skipped
for gs ring.
---
src/gallium/drivers/r600/evergreen_state.c | 7 +++
src/gallium/drivers/r600/r600_state.c | 7 +++
2 files changed, 6 insertions(+), 8 deletions(-)
diff --git a/
From: Roland Scheidegger
Maybe once upon a time it wasn't always true.
---
src/gallium/drivers/r600/r600_shader.c | 18 --
1 file changed, 18 deletions(-)
diff --git a/src/gallium/drivers/r600/r600_shader.c
b/src/gallium/drivers/r600/r600_shader.c
index 06d7ca02e9..6cdbfd3063 100644
From: Roland Scheidegger
piglit doesn't care, but I'm quite confident that the size actually bound
as range should be reported and not the base size of the resource.
Also, the array in the constant buffer looks overallocated by a factor of 4.
For eg, also decrease the size by another factor of 2
From: Roland Scheidegger
The spec says the missing texel (when we wrap around both x and y axis)
should be synthesized as the average of the 3 other texels. For bilinear
filtering however we instead adjusted the filter weights (because, while
the complexity looks similar, there would be 4 times a
From: Roland Scheidegger
Cube texture wrapping is a bit special since the values (post face
projection) always are within [0,1], so we took advantage of that and
omitted some clamps.
However, we can still get NaNs (either because the coords already had NaNs,
or the face projection generated them)
From: Roland Scheidegger
Care must be taken that all coords end up correct, the tests are very
sensitive that everything is correctly rounded. This doesn't matter
for bilinear filter (since picking a wrong texel with weight zero is
ok), and we could also switch the per-sample coords mistakenly.
W
From: Roland Scheidegger
I really intended to set this for all shader stages by
3835009796166968750ff46cf209f6d4208cda86 but missed it for compute shaders
(because it's in a different source file...).
---
src/gallium/drivers/r600/evergreen_compute.c | 5 +++--
1 file changed, 3 insertions(+), 2
From: Roland Scheidegger
The blend math gets a bit funky due to inverse blend factors being
in range [0,2] rather than [-1,1], our normalized math can't really
cover this.
src_alpha_saturate blend factor has a similar problem too.
(Note that piglit fbo-blending-formats test is mostly useless for
From: Roland Scheidegger
r600 used the clamped version for rcp, whereas both evergreen and cayman
used the ieee version. I don't know why that discrepancy exists (it does so
since day 1) but there does not seem to be a valid reason for this, so make
it consistent. This seems now safer than before
From: Roland Scheidegger
Both r600 and evergreen used the clamped version, whereas cayman used the
ieee one. I don't think there's a valid reason for this discrepancy, so let's
switch to the ieee version for r600 and evergreen too, since we generally
want to stick to ieee arithmetic.
With this, b
From: Roland Scheidegger
I believe this is the safe thing to do, especially ever since the driver
actually generates NaNs for muls too.
The ISA docs are not very helpful here, however the dx10 versions will pick
a non-nan result over a NaN one (this is also the ieee754 behavior), whereas
the non-
From: Roland Scheidegger
Float rts were always set as unorm instead of float.
Not sure of the consequences, but at least it looks like the blend clamp
would have been enabled, which is against the rules (only eg really bothered
to even attempt to specify this correctly, r600 always used clamp any
From: Roland Scheidegger
The docs are not very concise in what this really does, however both
Alex Deucher and Nicolai Hähnle suggested this only really affects instructions
using the CLAMP output modifier, and I've confirmed that with the newly
changed piglit isinf_and_isnan test.
So, with this
From: Roland Scheidegger
I believe this is the safe thing to do, especially ever since the driver
actually generates NaNs for muls too.
Albeit since the radeon ISA docs are inaccurate/wrong there, I'm not
entirely sure what the non-dx10 versions do, but (as required by dx10)
the dx10 versions sho
From: Roland Scheidegger
I don't know what this bit really does. The docs are somewhere between
misleading and wrong however, as at least the newer ones (that bit exists with
GCN as well) imply all NaNs would get converted to zeros, which is definitely
NOT the case (and that would not be dx10 com
From: Roland Scheidegger
r600 already used the clamped versions, but for some reason this was
different to eg/cayman.
(Note that it has been different since essentially forever, 7 years, since
df62338c491f2cace1a48f99de78e83b5edd82fd in particular, which changed
this for r600 but not eg (cayman w
From: Roland Scheidegger
13b303ff9265b89bdd9100e32f905e9cdadfad81 added the actual enums but
didn't remove the already existing ones. (And also duplicated
the "fragment" names instead of using the "vertex" names.)
---
docs/specs/enums.txt | 26 --
1 file changed, 8 i
From: Roland Scheidegger
Culling tris with zero area seems like a great idea, but apparently with
fill mode line (and point) we're supposed to draw them, at least some tests
for some other state tracker complained otherwise.
Such tris also always seem to be back facing (not sure if this can be
in
From: Roland Scheidegger
These assertions were revisited a couple of times in the past, and they
still weren't quite right.
The problem I was seeing (with some other state tracker) was a copy between
two 512x512 s3tc textures, but from mip level 0 to mip level 8. Therefore,
the destination has on
From: Roland Scheidegger
The logic for handling shadow coords was completely broken.
Fixes be3ab867bd444594f9d9e0f8e59d305d15769afd.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103265
---
src/gallium/auxiliary/tgsi/tgsi_util.c | 12 ++--
1 file changed, 6 insertions(+), 6 dele