from:"sroland"

[Mesa-dev] [PATCH] util/atomic: Fix p_atomic_add for unlocked and msvc paths

2019-12-09 Thread sroland

From: Roland Scheidegger Braces mismatch (flagged by CI, untested). Fixes: 385d13f26d2 "util/atomic: Add a _return variant of p_atomic_add" --- src/util/u_atomic.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/util/u_atomic.h b/src/util/u_atomic.h index 9cbc6dd1eaa

[Mesa-dev] [PATCH] gallivm: Fix saturated signed psub/padd intrinsics on llvm 8

2019-10-16 Thread sroland

From: Roland Scheidegger LLVM 8 did remove both the signed and unsigned sse2/avx intrinsics in the end, and provide arch-independent llvm intrinsics instead. Fixes a crash when using snorm framebuffers (tested with piglit arb_color_buffer_float-render GL_RGBA8_SNORM -auto). CC: --- src/gallium

[Mesa-dev] [PATCH] llvmpipe: increase max texture size to 2GB

2019-10-10 Thread sroland

From: Roland Scheidegger The 1GB limit was arbitrary, increase this to 2GB (which is the max possible without code changes). --- src/gallium/drivers/llvmpipe/lp_limits.h | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/llvmpipe/lp_limits.h b/src/galli

[Mesa-dev] [PATCH] llvmpipe: fix CALLOC vs. free mismatches

2019-09-05 Thread sroland

From: Roland Scheidegger Should fix some issues we're seeing. And use REALLOC instead of realloc. --- src/gallium/drivers/llvmpipe/lp_cs_tpool.c | 6 +++--- src/gallium/drivers/llvmpipe/lp_state_cs.c | 3 ++- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/src/gallium/drivers/llvm

[Mesa-dev] [PATCH] gallivm: use fallback code for mul_hi with llvm >= 7.0

2019-08-28 Thread sroland

From: Roland Scheidegger LLVM 7.0 ditched the pmulu intrinsics. This is only a trivial patch to use the fallback code instead. It'll likely produce atrocious code since the pattern doesn't match what llvm itself uses in its autoupgrade paths, hence the pattern won't be recognized. Should fix htt
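For reference, the value mul_hi produces is the upper half of a widened product. The scalar C sketch below only illustrates that definition; the helper name `mul_hi_u32` is made up here, and this is not the gallivm fallback itself, which emits vector IR.

```c
#include <stdint.h>

/* Hypothetical scalar helper: high 32 bits of the full 64-bit product
 * of two unsigned 32-bit values. */
static uint32_t mul_hi_u32(uint32_t a, uint32_t b)
{
    /* Widen first so the multiplication cannot lose the upper bits. */
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}
```

A vectorized fallback would apply the same widen, multiply, shift, narrow sequence per element.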

[Mesa-dev] [PATCH] gallivm: fix issue with AtomicCmpXchg wrapper on llvm 3.5-3.8

2019-08-02 Thread sroland

From: Roland Scheidegger These versions still need wrapper but already have both success and failure ordering. (Compile tested on llvm 3.7, llvm 3.8.) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=02 --- src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 16 +++- 1 file ch

[Mesa-dev] [PATCH] scons: fix build with llvm 9.

2019-05-23 Thread sroland

From: Roland Scheidegger The x86asmprinter component is gone, and things seem to work by just removing it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110707 --- scons/llvm.py | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/scons/llvm.py b/scons/llvm.py index a

[Mesa-dev] [PATCH] gallivm: fix default cbuf info.

2019-05-23 Thread sroland

From: Roland Scheidegger The default null_output really needs to be static, otherwise the values we'll eventually get later are doubly random (they are not initialized, and even if they were it's a pointer to a local stack variable). VMware bug 2349556. --- src/gallium/auxiliary/gallivm/lp_bld_t
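The storage-duration point can be shown in a few lines of C. This is a minimal sketch, assuming a made-up `struct cbuf_info` and helper `get_cbuf_info`; neither is the real gallivm type or function.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real output info structure. */
struct cbuf_info {
    unsigned num_outputs;
    unsigned format;
};

/* Returns the caller's info, or a zeroed default that outlives the call. */
static const struct cbuf_info *get_cbuf_info(const struct cbuf_info *user)
{
    /* static: initialized once and still valid after this function returns.
     * A plain local here would leave the caller with a dangling pointer. */
    static const struct cbuf_info null_output = { 0 };

    return user ? user : &null_output;
}
```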

[Mesa-dev] [PATCH] auxiliary/draw: fix crash with zero-stride draw auto

2019-05-15 Thread sroland

From: Roland Scheidegger transform feedback draws get the number of vertices from the transform feedback object. In draw, we'll figure this out with the number of bytes written divided by the stride. However, it is apparently possible we end up with a stride of 0 there (not entirely sure it could
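The division described above needs a guard when the stride can be zero. The snippet is cut off before the actual fix, so the following is only a sketch under the assumption that a zero stride is treated as "nothing to draw"; `so_vertex_count` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: vertex count implied by a stream-output buffer. */
static uint32_t so_vertex_count(uint32_t bytes_written, uint32_t stride)
{
    /* Assumption: a zero stride yields zero vertices instead of dividing. */
    if (stride == 0)
        return 0;

    return bytes_written / stride;
}
```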

[Mesa-dev] [PATCH] gallivm: fix broken 8-wide s3tc decoding

2019-05-06 Thread sroland

From: Roland Scheidegger Brian noticed there was an uninitialized var for the 8-wide case and 128 bit blocks, which made it always crash. Likewise, the 64bit block case had another crash bug due to type mismatch. Color decode (used for all s3tc formats) also had a bogus shuffle for this case, lea

[Mesa-dev] [PATCH] gallivm: fix saturated signed add / sub with llvm 9

2019-04-16 Thread sroland

From: Roland Scheidegger llvm 8 removed saturated unsigned add / sub x86 sse2 intrinsics, and now llvm 9 removed the signed versions as well - they were proposed for removal earlier, but the pattern to recognize those was very complex, so it wasn't done then. However, instead of these arch-specif
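Per element, a saturated signed add or subtract clamps the result to the range of the type rather than wrapping. A scalar C sketch for 8-bit lanes follows; the function names are illustrative only and are not taken from the patch.

```c
#include <stdint.h>

/* Clamp an intermediate result to the int8_t range. */
static int8_t clamp_to_i8(int value)
{
    if (value > INT8_MAX)
        return INT8_MAX;
    if (value < INT8_MIN)
        return INT8_MIN;
    return (int8_t)value;
}

/* Saturated signed add: computed in int, so the sum itself cannot overflow. */
static int8_t sadd_sat8(int8_t a, int8_t b)
{
    return clamp_to_i8((int)a + (int)b);
}

/* Saturated signed subtract, same idea. */
static int8_t ssub_sat8(int8_t a, int8_t b)
{
    return clamp_to_i8((int)a - (int)b);
}
```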

[Mesa-dev] [PATCH] gallivm: fix bogus assert in get_indirect_index

2019-04-15 Thread sroland

From: Roland Scheidegger 0 is a valid value as max index, and the code handles it fine. This isn't commonly seen, as it will only happen with array declarations of size 1. The assert was introduced with a3c898dc97ec5f0e0b93b2ee180bdf8ca3bab14c. Fixes piglit tests/shaders/complex-loop-analysis-bu

[Mesa-dev] [PATCH] gallivm: abort when trying to use non-existing intrinsic

2018-12-20 Thread sroland

From: Roland Scheidegger Whenever llvm removes an intrinsic (we're using), we're hitting segfaults due to llvm doing calls to address 0 in the jitted code instead. However, Jose figured out we can actually detect this with LLVMGetIntrinsicID(), so use this to abort, so we don't have to wonder wha

[Mesa-dev] [PATCH] gallivm: don't use pavg.b intrinsic on llvm >= 6.0

2018-12-20 Thread sroland

From: Roland Scheidegger This intrinsic disappeared with llvm 6.0, using it ends up in segfaults (due to llvm issuing call to NULL address in the jitted shaders). Add code doing the same thing as the autoupgrade code in llvm so it can be matched and replaced back with a pavgb. While here, also imp
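The operation pavgb performs on each byte is a rounded average computed at wider precision. A scalar C sketch of that arithmetic (the helper name is made up; the patch itself builds the equivalent vector IR):

```c
#include <stdint.h>

/* Rounded average of two unsigned bytes: (a + b + 1) >> 1,
 * evaluated in a wider type so the sum cannot overflow. */
static uint8_t avg_round_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}
```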

[Mesa-dev] [PATCH] gallivm: remove unused float coord wrapping for aos sampling

2018-12-06 Thread sroland

From: Roland Scheidegger AoS sampling tries to use integers for coord wrapping when possible, as it should be faster. However, for AVX, this was suboptimal, because only floats can use 8x32bit vectors, whereas integers have to be split into 4x32bit vectors. (I believe part of why it was slower wa

[Mesa-dev] [PATCH] draw: fix infinite loop in line stippling

2018-11-22 Thread sroland

From: Roland Scheidegger The calculated length of a line may be infinite, if the coords we get are bogus. This leads to an infinite loop in line stippling. To prevent this test for this explicitly (although technically on at least x86 sse it would actually work without the explicit test, as long
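An explicit test of this kind can be as small as a finiteness check on the computed length. The sketch below is an assumption about the shape of such a guard, not the code from the patch; `stipple_length_ok` is a hypothetical name.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical guard: a loop that walks along the line in fixed steps
 * only terminates if the length is a finite number. */
static bool stipple_length_ok(float length)
{
    return isfinite(length);   /* false for +inf, -inf and NaN */
}
```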

[Mesa-dev] [PATCH] gallivm: fix improper clamping of vertex index when fetching gs inputs

2018-11-07 Thread sroland

From: Roland Scheidegger Because we only have one file_max for the (2d) gs input file, the value actually represents the max of attrib and vertex index (although I'm not entirely sure if we really want the max, since the max valid value of the vertex dimension can be easily deduced from the input

[Mesa-dev] [PATCH] gallivm: don't use saturated unsigned add/sub intrinsics for llvm 8.0

2018-08-23 Thread sroland

From: Roland Scheidegger These have been removed. Unfortunately auto-upgrade doesn't work for jit. (Worse, it seems we don't get a compilation error anymore when compiling the shader, rather llvm will just do a call to a null function in the jitted shaders making it difficult to detect when intri

[Mesa-dev] [PATCH] util: return 0 for NaNs in float_to_ubyte

2018-08-02 Thread sroland

From: Roland Scheidegger d3d10 requires NaNs to get converted to 0 for float->unorm conversions (and float->int etc.). GL spec probably doesn't care in general, but it would make sense to have reasonable behavior in any case imho - the old code was converting negative NaNs to 0, and positive NaNs

[Mesa-dev] [PATCH] draw: force draw pipeline if there's more than 65535 vertices

2018-07-21 Thread sroland

From: Roland Scheidegger The pt emit path can only handle 65535 - the number of vertices is truncated to a ushort, resulting in a too small buffer allocation, which will crash. Forcing the pipeline path looks suboptimal, then again this bug is probably there ever since GS is supported, so it see
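The truncation is easy to reproduce in isolation. A small C sketch, with hypothetical helper names, showing what a ushort keeps of a larger count and the range check that would route such draws elsewhere:

```c
#include <stdbool.h>
#include <stdint.h>

/* What is left of a vertex count after it is stored in a ushort. */
static uint16_t truncate_count(unsigned count)
{
    return (uint16_t)count;   /* keeps only the low 16 bits */
}

/* Hypothetical check: can this count be handled without truncation? */
static bool count_fits_ushort(unsigned count)
{
    return count <= UINT16_MAX;
}
```

With 65536 vertices the truncated count is 0, which is how the buffer ends up too small.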

[Mesa-dev] [PATCH] nir: fix msvc build

2018-07-13 Thread sroland

From: Roland Scheidegger Empty initializer braces aren't valid c (it's a gnu extension, and it's valid in c++). Hopefully fixes appveyor / msvc build... Fixes a3150c1d06ae7766c3d3fe3b33432e55c3c7527e --- src/compiler/nir/nir_format_convert.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
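The portable spelling in C99/C11 is `{ 0 }`, which zero-initializes every member. A minimal sketch with a made-up struct (the real change touches a different type):

```c
/* Hypothetical aggregate used only for this illustration. */
struct example_state {
    unsigned counters[4];
    unsigned flags;
};

static struct example_state make_zeroed_state(void)
{
    /* `= {}` is accepted by gcc and clang as an extension (and by C++),
     * but msvc rejects it in C code; `= { 0 }` works everywhere. */
    struct example_state s = { 0 };

    return s;
}
```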

[Mesa-dev] [PATCH] r600/sb: fix crash in fold_alu_op3

2018-07-03 Thread sroland

From: Roland Scheidegger fold_assoc() called from fold_alu_op3() can lower the number of src to 2, which then leads to an invalid access to n.src[2]->gvalue(). This didn't seem to have caused much harm in the past, but on Fedora 28 it will crash (presumably because -D_GLIBCXX_ASSERTIONS is used,

[Mesa-dev] [PATCH] nir/linker: fix msvc build

2018-07-03 Thread sroland

From: Roland Scheidegger Empty initializer braces aren't valid c (it's a gnu extension, and it's valid in c++). Hopefully fixes appveyor / msvc build... Fixes 6677e131b806b10754adcb7cf3f427a7fcc2aa09 --- src/compiler/glsl/gl_nir_link_atomics.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(

[Mesa-dev] [PATCH] r600: fix copy/paste bug for sampleMaskIn workaround

2018-06-15 Thread sroland

From: Roland Scheidegger The sampleMaskIn workaround (b936f4d1ca0d2ab1e828ff6a6e617f12469687fa) tries to figure out if the shader is running at per-sample frequency, but there's a typo bug so it will only recognize per-sample linear inputs, not per-sample perspective ones. Spotted by Eric Engestr

[Mesa-dev] [PATCH] llvmpipe: improve rasterization discard logic

2018-05-21 Thread sroland

From: Roland Scheidegger This unifies the explicit rasterization discard as well as the implicit rasterization disabled logic (which we need for another state tracker), which really should do the exact same thing. We'll now toss out the prims early on in setup with (implicit or explicit) discard,

[Mesa-dev] [PATCH] draw: get rid of special logic to not emit null tris

2018-05-17 Thread sroland

From: Roland Scheidegger I've confirmed after 77554d220d6d74b4d913dc37ea3a874e9dc550e4 we no longer need this to pass some tests from another api (as we no longer generate the bogus extra null tris in the first place). --- src/gallium/auxiliary/draw/draw_pipe_clip.c | 38

[Mesa-dev] [PATCH] gallivm: Use alloca_undef with array type instead of alloca_array

2018-05-14 Thread sroland

From: Roland Scheidegger Use a single allocation of array type instead of the old-style array allocation for the temp and immediate arrays. Probably only makes a difference if they aren't used indirectly (so, if we used them solely because there's too many temps or immediates). In this case the s

[Mesa-dev] [PATCH] llvmpipe: Fix random number generation for unit tests

2018-05-07 Thread sroland

From: Roland Scheidegger We were never producing negative numbers for signed types. Also fix only producing half the valid range for uint32, and properly clamp signed values. Because this now also properly tests snorm with actually negative values, need to increase eps for such conversions. I be

[Mesa-dev] [PATCH 2/2] draw: fix different sign logic when clipping

2018-04-24 Thread sroland

From: Roland Scheidegger The logic was flawed, since mul(x,y) will be <= 0 (exactly 0) when the sign is the same but both numbers are sufficiently small (if the product is smaller than 2^-128). This could apparently lead to emitting a sufficient amount of additional bogus vertices to overflow the
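The underflow problem can be demonstrated with two tiny positive floats. The C sketch below contrasts a product-based test with a sign-bit comparison; both function names are invented here, and the sign-bit version is just one possible replacement, not necessarily what the patch does.

```c
#include <math.h>
#include <stdbool.h>

/* Flawed: the product of two tiny same-sign values underflows to 0. */
static bool different_sign_by_product(float x, float y)
{
    volatile float product = x * y;   /* force rounding to float */
    return product <= 0.0f;
}

/* Compares the sign bits directly, so magnitude does not matter. */
static bool different_sign_by_signbit(float x, float y)
{
    return (signbit(x) != 0) != (signbit(y) != 0);
}
```

For x = y = 1e-30f the first function reports a sign difference while the second does not. Note that the sign-bit version treats +0.0 and -0.0 as different, which may or may not be wanted.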

[Mesa-dev] [PATCH 1/2] draw: simplify clip null tri logic

2018-04-24 Thread sroland

From: Roland Scheidegger Simplifies the logic when to emit null tris (albeit the reasons why we have to do this remain unclear). This is strictly just logic simplification, the behavior doesn't change at all. --- src/gallium/auxiliary/draw/draw_pipe_clip.c | 19 +-- 1 file change

[Mesa-dev] [PATCH 4/4] gallivm: dump bitcode before optimization

2018-04-22 Thread sroland

From: Roland Scheidegger If we dump the bitcode for off-line debug purposes, we really want the pre-optimized bitcode, otherwise it's useless in identifying problems with IR optimization (if you have a shader which takes an hour to do IR optimization, it's also nice you don't have to wait that ho

[Mesa-dev] [PATCH 3/4] gallivm: (trivial) do division by 1000 with int64

2018-04-22 Thread sroland

From: Roland Scheidegger Conversion to int can otherwise overflow if compile times are over ~71min. (Yes this can happen...) --- src/gallium/auxiliary/gallivm/lp_bld_init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_init.c b/src/gall
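The ~71 minute figure matches 2^32 microseconds (about 4295 seconds). A C sketch of the two orderings, using an unsigned 32-bit truncation as a well-defined model of the narrowing; the function names are hypothetical.

```c
#include <stdint.h>

/* Divide in 64 bits: correct for any realistic duration. */
static int64_t us_to_ms_wide(int64_t elapsed_us)
{
    return elapsed_us / 1000;
}

/* Narrow to 32 bits first, then divide: wraps once the input
 * reaches 2^32 microseconds. */
static uint32_t us_to_ms_narrow_first(int64_t elapsed_us)
{
    return (uint32_t)elapsed_us / 1000u;
}
```

For a 75 minute compile (4,500,000,000 us) the wide version gives 4,500,000 ms, while the narrow-first version gives 205,032 ms.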

[Mesa-dev] [PATCH 2/4] gallivm: remove LICM pass

2018-04-22 Thread sroland

From: Roland Scheidegger LICM is simply too expensive, even though it presumably can help quite a bit in some cases. It was definitely cheaper in llvm 3.3, though as far as I can tell with llvm 3.3 it failed to do anything in most cases. early-cse also actually seems to cause licm to be able to m

[Mesa-dev] [PATCH 1/4] gallivm: add early cse pass

2018-04-22 Thread sroland

From: Roland Scheidegger This pass is quite cheap, and can simplify the IR quite a bit for our generated IR. In particular on a variety of shaders I've found the time saved by other passes due to the simplified IR more than makes up for the cost of this pass, and on top of that the end result is

[Mesa-dev] [PATCH] r600: fix abs for op3 sources

2018-03-12 Thread sroland

From: Roland Scheidegger If a src was referencing the same temp as the dst, the per-component copy code didn't work. e.g. cndge r0.xy, r0.xx, |r2|, r3 got expanded into mov r12.x, |r2| cndge r0.x, r0.x, r12, r3 mov r12.y, |r2| cndge r0.y, r0.x, r12, r3 hence for the second cndge r0.x

[Mesa-dev] [PATCH] u_blit: (trivial) u_blit.h needs to include p_defines.h

2018-03-09 Thread sroland

From: Roland Scheidegger (For the pipe_tex_filter enum) --- src/gallium/auxiliary/util/u_blit.h | 1 + 1 file changed, 1 insertion(+) diff --git a/src/gallium/auxiliary/util/u_blit.h b/src/gallium/auxiliary/util/u_blit.h index 085ea63..004ceae 100644 --- a/src/gallium/auxiliary/util/u_blit.h +

[Mesa-dev] [PATCH] draw: fix alpha value for very short aa lines

2018-03-08 Thread sroland

From: Roland Scheidegger The logic would not work correctly for line lengths smaller than 1.0, even a degenerated line with length 0 would still produce a fragment with anywhere between alpha 0.0 and 0.5. --- src/gallium/auxiliary/draw/draw_pipe_aaline.c | 25 - src/gall

[Mesa-dev] [PATCH 2/2] draw: fix line stippling with aa lines

2018-03-06 Thread sroland

From: Roland Scheidegger In contrast to non-aa, where stippling is based on either dx or dy (depending on if it's a x or y major line), stippling is based on actual distance with smooth lines, so adjust for this. (It looks like there's some minor artifacts with mesa demos line-sample with wide l

[Mesa-dev] [PATCH 1/2] draw: simplify (and correct) aaline fallback (v2)

2018-03-06 Thread sroland

From: Roland Scheidegger The motivation actually was to get rid of the additional tex instruction, since that requires the draw fallback code to intercept all sampler / view calls (even if the fallback is never hit). Basically, the idea is to use coverage of the pixel to calculate the alpha value

[Mesa-dev] [PATCH] draw: simplify (and correct) aaline fallback

2018-03-06 Thread sroland

From: Roland Scheidegger The motivation actually was to get rid of the additional tex instruction, since that requires the draw fallback code to intercept all sampler / view calls (even if the fallback is never hit). Basically, the idea is to use coverage of the pixel to calculate the alpha value

[Mesa-dev] [PATCH] tgsi/scan: use wrap-around shift behavior explicitly for file_mask

2018-03-01 Thread sroland

From: Roland Scheidegger The comment said it will only represent the lowest 32 regs. This was not entirely true in practice, since at least on x86 you'll get masked shifts (unless the compiler could recognize it already and toss it out). It turns out this actually works out alright (presumably no
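In C, shifting a 32-bit value by 32 or more is undefined, so wrap-around has to be requested by masking the shift count. A minimal sketch, assuming a hypothetical helper `file_mask_bit` that is not taken from the patch:

```c
#include <stdint.h>

/* Bit for a register index, with the index wrapped into 0..31.
 * The explicit `& 31` gives on every platform what x86 shift
 * instructions happen to do in hardware. */
static uint32_t file_mask_bit(unsigned reg_index)
{
    return UINT32_C(1) << (reg_index & 31u);
}
```

With this, register 32 maps to the same bit as register 0, and register 33 to the same bit as register 1.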

[Mesa-dev] [PATCH 1/2] cso: don't cycle through PIPE_MAX_SHADER_SAMPLER_VIEWS on context destroy

2018-02-27 Thread sroland

From: Roland Scheidegger There's no point, we know the highest non-null one. --- src/gallium/auxiliary/cso_cache/cso_context.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/gallium/auxiliary/cso_cache/cso_context.c b/src/gallium/auxiliary/cso_cache/cso_context.c ind

[Mesa-dev] [PATCH 2/2] softpipe: don't iterate through PIPE_MAX_SHADER_SAMPLER_VIEWS

2018-02-27 Thread sroland

From: Roland Scheidegger We were setting view to NULL if the iteration was larger than i. But in fact if the view is NULL the code did nothing anyway... --- src/gallium/drivers/softpipe/sp_state_sampler.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers

[Mesa-dev] [PATCH] RFC: gallium: increase PIPE_MAX_SHADER_SAMPLER_VIEWS to 128

2018-02-26 Thread sroland

From: Roland Scheidegger Some state trackers require 128. (There are no plans to increase PIPE_MAX_SAMPLERS too, since with gl state tracker it's unlikely more than 32 will be needed, if you need more use bindless.) --- src/gallium/include/pipe/p_state.h | 2 +- 1 file changed, 1 insertion(+), 1

[Mesa-dev] [PATCH] draw: don't needlessly iterate through all sampler view slots

2018-02-26 Thread sroland

From: Roland Scheidegger We already stored the highest (potentially) used number. --- src/gallium/auxiliary/draw/draw_context.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/auxiliary/draw/draw_context.c b/src/gallium/auxiliary/draw/draw_context.c index 9791ec5

[Mesa-dev] [PATCH] tgsi: Recognize RET in main for tgsi_transform

2018-02-13 Thread sroland

From: Roland Scheidegger Shaders coming from dx10 state trackers have a RET before the END. And the epilog needs to be placed before the RET (otherwise it will get ignored). Hence figure out if a RET is in main, in this case we'll place the epilog there rather than before the END. (At a closer lo

[Mesa-dev] [PATCH] tgsi: Recognize RET in main for tgsi_transform

2018-02-12 Thread sroland

From: Roland Scheidegger Shaders coming from dx10 state trackers have a RET before the END. And the epilog needs to be placed before the RET (otherwise it will get ignored). Hence figure out if a RET is in main, in this case we'll place the epilog there rather than before the END. (At a closer lo

[Mesa-dev] [PATCH 2/2] u_blit, u_simple_shaders: add shader to convert from xrbias format

2018-02-06 Thread sroland

From: Roland Scheidegger We need this to handle some oddball dx10 format (DXGI_FORMAT_R10G10B10_XR_BIAS_A2_UNORM). What you can do with this format is very limited, hence we don't want to add it as a gallium format (we could not express the properties of this format as ordinary format properties

[Mesa-dev] [PATCH 1/2] u_simple_shaders: fix mask handling in util_make_fragment_tex_shader_writemask

2018-02-06 Thread sroland

From: Roland Scheidegger The writemask handling was busted, since writing defaults to output meant they got overwritten by the tex sampling anyway. Albeit the affected components were undefined, so maybe with some luck it still would have worked with some drivers - if not could as well kill it...

[Mesa-dev] [PATCH 4/4] r600: partly fix sampleMaskIn value

2018-02-04 Thread sroland

From: Roland Scheidegger The hw gives us coverage for pixel, not for individual fragment shader invocations, in case execution isn't per pixel (note eg, unlike cm, actually cannot do "real" minSampleShading, it's either per-pixel or per-fragment, but it doesn't really make a difference here). Als

[Mesa-dev] [PATCH 1/4] r600/cm: (trivial) code cleanup for emitting msaa state

2018-02-04 Thread sroland

From: Roland Scheidegger No functional change (compile tested only). --- src/gallium/drivers/r600/cayman_msaa.c | 14 ++ src/gallium/drivers/r600/evergreen_state.c | 10 ++ src/gallium/drivers/r600/r600_pipe_common.h | 6 ++ 3 files changed, 14 insertions(+), 16 de

[Mesa-dev] [PATCH 3/4] r600: clean up fragment shader input scan code

2018-02-04 Thread sroland

From: Roland Scheidegger For some reason, we were iterating through the code twice (first just for instructions needing barycentrics, then for instructions and input dcls). Move things around slightly so this is no longer necessary. There also was an unneeded enabling of the fixed_pt_position_gpr

[Mesa-dev] [PATCH 2/4] mesa: (trivial) remove unused ignore_sample_qualifier_parameter

2018-02-04 Thread sroland

From: Roland Scheidegger This parameter for _mesa_get_min_invocations_per_fragment() was once used by the intel driver, but it's long gone. --- src/mesa/program/program.c| 11 --- src/mesa/program/program.h| 3 +-- src/mesa/state_tracker/st_atom_msaa.c | 2 +- 3 f

[Mesa-dev] [PATCH] r600: don't do stack workarounds for hemlock

2018-01-29 Thread sroland

From: Roland Scheidegger By the looks of it it seems hemlock is treated separately to cypress, but certainly it won't need the stack workarounds cedar/redwood (and seemingly every other eg chip except cypress/juniper) need. (Discovered by accident.) --- src/gallium/drivers/r600/sb/sb_bc.h | 1 +

[Mesa-dev] [PATCH 3/3] mesa: skip validation of legality of size/type queries for format queries

2018-01-26 Thread sroland

From: Roland Scheidegger The size/type query is always legal (if we made it that far). This causes a difference for GL_TEXTURE_BUFFER - the reason is that these parameters are valid only with GetTexLevelParameter() if gl 3.1 is supported, but not if only ARB_texture_buffer_object is supported. Ho

[Mesa-dev] [PATCH 2/3] mesa: restrict formats being supported by target type for formatquery

2018-01-26 Thread sroland

From: Roland Scheidegger The code just considered all formats as being supported if they were either a valid fbo or texture format. This was quite awkward since then the query would return "supported" for e.g. GL_RGB9E5 or compressed formats and target RENDERBUFFER (albeit the driver could still

[Mesa-dev] [PATCH 1/3] mesa: remove misleading gles checks for formatquery

2018-01-26 Thread sroland

From: Roland Scheidegger Testing for gles there is just confusing - this is about target being supported, if it was valid at all was already determined earlier (in _legal_parameters). It didn't make sense at all in any case, since it would only have said false there for gles for 2d but not 2d arr

[Mesa-dev] [PATCH] gallivm: fix crash with seamless cube filtering with different min/mag filter

2018-01-24 Thread sroland

From: Roland Scheidegger We are not allowed to modify the incoming coords values, or things may crash (as we may be inside a llvm conditional and the values may be used in another branch). I recently broke this when fixing an issue with NaNs and seamless cube map filtering, and it causes crashes

[Mesa-dev] [PATCH] r600: increase number of samplers/views from 16 to 18 on eg

2018-01-22 Thread sroland

From: Roland Scheidegger Some apps are known to require more than 16. Albeit they probably still won't run with 18 (since all new hw/drivers support 32) it shouldn't hurt to at least support 18 (seemingly the hw limit on all r600-ni chips - the blob also supports 18, at least for eg+ by the looks

[Mesa-dev] [PATCH] draw: remove VSPLIT_CREATE_IDX macro

2018-01-16 Thread sroland

From: Roland Scheidegger Just inline the little bit of code. --- src/gallium/auxiliary/draw/draw_pt_vsplit.c | 23 --- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/src/gallium/auxiliary/draw/draw_pt_vsplit.c b/src/gallium/auxiliary/draw/draw_pt_vsplit.c in

[Mesa-dev] [PATCH] draw: fix vsplit code when the (post-bias) index value is -1

2018-01-15 Thread sroland

From: Roland Scheidegger vsplit_add_cache uses the post-bias index for hashing, but the vsplit_add_cache_uint/ushort/ubyte ones used the pre-bias index, therefore the code for handling the special case (because -1 matches the initialization value of the cache) wasn't actually working. Commit 78a9

[Mesa-dev] [PATCH] r600: fix relocs for PIPE_QUERY_SO_OVERFLOW_ANY_PREDICATE query

2018-01-11 Thread sroland

From: Roland Scheidegger The command parser is very sad if we don't emit the relocs per hw query... However, don't enable it. It mostly works, but piglit arb_transform_feedback_overflow_query-basic shows 2 failures (it's really the same case for the hw), conditional_render_any and conditional_re

[Mesa-dev] [PATCH] mesa: require at least 14 UBOs for GL 4.3

2018-01-10 Thread sroland

From: Roland Scheidegger ARB_ubo requires 12 UBOs (per stage) at least, but this limit has been raised by GL 4.3 to 14, so don't advertize GL 4.3 without it (only checking the vertex stage since all drivers probably have the same limit anyway for other stages). (piglit has minmax tests for that k

[Mesa-dev] [PATCH] util: fix NORETURN for msvc, add HAVE_FUNC_ATTRIBUTE_NORETURN to c99_compat.h

2018-01-09 Thread sroland

From: Roland Scheidegger We've seen some problems internally due to macro redefinition. Fix this by adding HAVE_FUNC_ATTRIBUTE_NORETURN to c99_compat.h, and defining it for msvc. And avoid redefinition just in case. --- include/c99_compat.h | 1 + src/util/macros.h| 12 2 files
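A compiler-dependent noreturn macro with a redefinition guard usually looks something like the sketch below. The macro name `EXAMPLE_NORETURN` and the helpers are made up for illustration, and the compiler checks are an assumption; the patch itself keys off `HAVE_FUNC_ATTRIBUTE_NORETURN`.

```c
#include <stdio.h>
#include <stdlib.h>

/* Only define the macro if nobody else has, to avoid redefinition. */
#ifndef EXAMPLE_NORETURN
#  if defined(__GNUC__)
#    define EXAMPLE_NORETURN __attribute__((__noreturn__))
#  elif defined(_MSC_VER)
#    define EXAMPLE_NORETURN __declspec(noreturn)
#  else
#    define EXAMPLE_NORETURN
#  endif
#endif

/* Prints a message and terminates; never returns to the caller. */
static EXAMPLE_NORETURN void fatal_error(const char *msg)
{
    fputs(msg, stderr);
    abort();
}

/* The annotation lets the compiler see that the b == 0 path ends here. */
static int checked_div(int a, int b)
{
    if (b == 0)
        fatal_error("division by zero\n");
    return a / b;
}
```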

[Mesa-dev] [PATCH 1/3] r600: fix enabled_rb_mask on eg/cm

2018-01-08 Thread sroland

From: Roland Scheidegger For eg/cm, the r600_gb_backend_map will always be 0. I assume this is a bug in the drm kernel driver, as it just never fills the information in. I am not entirely sure if the map is supposed to be needed for these chips, since unlike on r600/r700 the value calculated

[Mesa-dev] [PATCH 3/3] r600: hack up num_render_backends on Juniper to 8

2018-01-08 Thread sroland

From: Roland Scheidegger Juniper really has a maximum of 4 RBEs (16 pixels). However, predication always locks up on my HD 5750, and through experiments it looks like if we're pretending it has a maximum of 8, with 4 disabled, it works correctly. My conclusion would be that there's a bug (likely

[Mesa-dev] [PATCH 2/3] winsys/radeon: fix up default enabled_rb_mask for r600

2018-01-08 Thread sroland

From: Roland Scheidegger The logic had two fatal flaws which completely killed the default value. 1) drm will overwrite the value anyway even if the chip can't be handled 2) the default value logic is relying on num_render_backends, which was filled in later. Luckily no one is relying on it, but i

[Mesa-dev] [PATCH] r600: RFC: use GET_BUFFER_RESINFO vtx fetch on eg instead of setting up consts

2018-01-03 Thread sroland

From: Roland Scheidegger Contrary to what the comment said, this appears to work just fine on my rv770 (tested with piglit textureSize 140 fs/vs samplerBuffer). I have no clue though if it's actually preferable to use it (unfortunately we cannot get rid of the tex constants completely, as we sti

[Mesa-dev] [PATCH 4/6] r600: RFC: use GET_BUFFER_RESINFO vtx fetch on eg instead of setting up consts

2018-01-02 Thread sroland

From: Roland Scheidegger Contrary to what the comment said, this appears to work just fine on my rv770 (tested with piglit textureSize 140 fs/vs samplerBuffer). I have no clue though if it's actually preferable to use it (unfortunately we cannot get rid of the tex constants completely, as we sti

[Mesa-dev] [PATCH 2/6] r600: don't use vtx offset for load_sample_position

2018-01-02 Thread sroland

From: Roland Scheidegger The offset looks bogus to me. Albeit in the end it doesn't matter, by the looks of it offsets smaller than 4 get ignored there (not sure of the rules, I suppose either non-dword aligned offsets never work there or the offset must be at least aligned to the size of a singl

[Mesa-dev] [PATCH 3/6] r600: fix sampler indexing with texture buffers sampling

2018-01-02 Thread sroland

From: Roland Scheidegger This fixes the new piglit test. (I could not actually figure out where the hell that index_1 parameter comes from but in any case it's completely the same as for ordinary texturing...) While here also fix up the logic for early exit of setting up driver consts. --- src/g

[Mesa-dev] [PATCH 1/6] r600: increase number of ubos by one to 14

2018-01-02 Thread sroland

From: Roland Scheidegger Ideally we'd support 16 (d3d11 requires 15, and mesa subtracts one for non-ubo constants), but that's kind of impossible (it would be only doable if either we'd somehow merge the mesa non-ubo constants with the driver constants, or only use the driver constants with vtx f

[Mesa-dev] [PATCH 5/6] r600: increase number of UBOs to 15

2018-01-02 Thread sroland

From: Roland Scheidegger With the exception of the default tess levels only ever accessed by the default tcs shader, the LDS_INFO const buffer was only accessed by vtx instructions, and not through kcache. No idea why really, but use this to our advantage by not using a constant buffer slot for i

[Mesa-dev] [PATCH 6/6] r600: don't emit tes samplers/views when tes isn't active

2018-01-02 Thread sroland

From: Roland Scheidegger Similar to const buffers. The driver must not emit any tes-related state if tes is disabled, since the hw slots are all shared by VS, therefore it would overwrite them (the mesa state tracker might not do this, but it would be perfectly legal to do so). Nevertheless I thi

[Mesa-dev] [PATCH 3/3] r600: set up constants needed for txq for buffers and cube maps with tes

2017-12-31 Thread sroland

From: Roland Scheidegger We only did this for the other stages, but obviously tess eval/ctrl need it too. This fixes the (newly modified) piglit texturing/textureSize test when run with tes stage and bufferSampler. --- src/gallium/drivers/r600/r600_state_common.c | 16 1 file ch

[Mesa-dev] [PATCH 2/3] r600: support 32 vertex attribs for evergreen

2017-12-31 Thread sroland

From: Roland Scheidegger Evergreen clearly has 32 slots, so it should just work (and the affected array is already sized with PIPE_MAX_ATTRIB). Note: As dx10.1 chips, r600/r700 should support this too, but seemingly there's only 16 resource slots for fetch shaders (fs). However, a quick looks see

[Mesa-dev] [PATCH 1/3] r600: don't emit reloc for ring buffer out into the blue

2017-12-31 Thread sroland

From: Roland Scheidegger It looks like this reloc belongs to setting the constant reg, which is skipped for gs ring. --- src/gallium/drivers/r600/evergreen_state.c | 7 +++ src/gallium/drivers/r600/r600_state.c | 7 +++ 2 files changed, 6 insertions(+), 8 deletions(-) diff --git a/

[Mesa-dev] [PATCH 1/2] r600: kill off native_integer shader ctx flag

2017-12-22 Thread sroland

From: Roland Scheidegger Maybe once upon a time it wasn't always true. --- src/gallium/drivers/r600/r600_shader.c | 18 -- 1 file changed, 18 deletions(-) diff --git a/src/gallium/drivers/r600/r600_shader.c b/src/gallium/drivers/r600/r600_shader.c index 06d7ca02e9..6cdbfd3063 100644

[Mesa-dev] [PATCH 2/2] r600: fix textureSize queries with tbos

2017-12-22 Thread sroland

From: Roland Scheidegger piglit doesn't care, but I'm quite confident that the size actually bound as range should be reported and not the base size of the resource. Also, the array in the constant buffer looks overallocated by a factor of 4. For eg, also decrease the size by another factor of 2

[Mesa-dev] [PATCH 1/2] gallivm: implement accurate corner behavior for textureGather with cube maps

2017-12-12 Thread sroland

From: Roland Scheidegger The spec says the missing texel (when we wrap around both x and y axis) should be synthesized as the average of the 3 other texels. For bilinear filtering however we instead adjusted the filter weights (because, while the complexity looks similar, there would be 4 times a

[Mesa-dev] [PATCH 2/2] gallivm: fix an issue with NaNs with seamless cube filtering

2017-12-12 Thread sroland

From: Roland Scheidegger Cube texture wrapping is a bit special since the values (post face projection) always are within [0,1], so we took advantage of that and omitted some clamps. However, we can still get NaNs (either because the coords already had NaNs, or the face projection generated them)

[Mesa-dev] [PATCH] gallivm: fix texture wrapping for texture gather for mirror modes

2017-12-09 Thread sroland

From: Roland Scheidegger Care must be taken that all coords end up correct, the tests are very sensitive that everything is correctly rounded. This doesn't matter for bilinear filter (since picking a wrong texel with weight zero is ok), and we could also switch the per-sample coords mistakenly. W

[Mesa-dev] [PATCH] r600: set DX10_CLAMP for compute shader too

2017-11-21 Thread sroland

From: Roland Scheidegger I really intended to set this for all shader stages by 3835009796166968750ff46cf209f6d4208cda86 but missed it for compute shaders (because it's in a different source file...). --- src/gallium/drivers/r600/evergreen_compute.c | 5 +++-- 1 file changed, 3 insertions(+), 2

[Mesa-dev] [PATCH] llvmpipe: fix snorm blending

2017-11-17 Thread sroland

From: Roland Scheidegger The blend math gets a bit funky due to inverse blend factors being in range [0,2] rather than [-1,1], our normalized math can't really cover this. src_alpha_saturate blend factor has a similar problem too. (Note that piglit fbo-blending-formats test is mostly useless for
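
The range problem described above can be shown with a little arithmetic. This is a minimal sketch, assuming the inverse factor is computed as 1 - f; the helper names are hypothetical and do not come from llvmpipe.

```c
/* Inverse blend factor, assuming the usual 1 - f definition. */
static float
inv_blend_factor(float f)
{
   return 1.0f - f;
}

/* A signed-normalized value can only represent the range [-1, 1]. */
static int
fits_snorm(float v)
{
   return v >= -1.0f && v <= 1.0f;
}
```

With f = -1.0 the inverse factor is 2.0, which `fits_snorm` rejects. With f = 1.0 it is 0.0, which fits. So the inverse factors span [0,2], and math that stays within normalized [-1,1] values cannot hold them.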

[Mesa-dev] [PATCH] llvmpipe: fix snorm blending

2017-11-16 Thread sroland

From: Roland Scheidegger The blend math gets a bit funky due to inverse blend factors being in range [0,2] rather than [-1,1], our normalized math can't really cover this. src_alpha_saturate blend factor has a similar problem too. (Note that piglit fbo-blending-formats test is mostly useless for

[Mesa-dev] [PATCH 3/5] r600: use ieee version of rcp

2017-11-09 Thread sroland

From: Roland Scheidegger r600 used the clamped version for rcp, whereas both evergreen and cayman used the ieee version. I don't know why that discrepancy exists (it does so since day 1) but there does not seem to be a valid reason for this, so make it consistent. This seems now safer than before

[Mesa-dev] [PATCH 4/5] r600: use ieee version of rsq

2017-11-09 Thread sroland

From: Roland Scheidegger Both r600 and evergreen used the clamped version, whereas cayman used the ieee one. I don't think there's a valid reason for this discrepancy, so let's switch to the ieee version for r600 and evergreen too, since we generally want to stick to ieee arithmetic. With this, b

[Mesa-dev] [PATCH 1/5] r600: use min_dx10/max_dx10 instead of min/max

2017-11-09 Thread sroland

From: Roland Scheidegger I believe this is the safe thing to do, especially ever since the driver actually generates NaNs for muls too. The ISA docs are not very helpful here, however the dx10 versions will pick a non-nan result over a NaN one (this is also the ieee754 behavior), whereas the non-

[Mesa-dev] [PATCH 5/5] r600: set the number type correctly for float rts in cb setup

2017-11-09 Thread sroland

From: Roland Scheidegger Float rts were always set as unorm instead of float. Not sure of the consequences, but at least it looks like the blend clamp would have been enabled, which is against the rules (only eg really bothered to even attempt to specify this correctly, r600 always used clamp any

[Mesa-dev] [PATCH 2/5] r600: use DX10_CLAMP bit in shader setup

2017-11-09 Thread sroland

From: Roland Scheidegger The docs are not very concise in what this really does, however both Alex Deucher and Nicolai Hähnle suggested this only really affects instructions using the CLAMP output modifier, and I've confirmed that with the newly changed piglit isinf_and_isnan test. So, with this

[Mesa-dev] [PATCH 1/4] r600: use min_dx10/max_dx10 instead of min/max

2017-11-08 Thread sroland

From: Roland Scheidegger I believe this is the safe thing to do, especially ever since the driver actually generates NaNs for muls too. Albeit since the radeon ISA docs are inaccurate/wrong there, I'm not entirely sure what the non-dx10 versions do, but (as required by dx10) the dx10 versions sho

[Mesa-dev] [PATCH 4/4] r600: set the number type correctly for float rts in cb setup

2017-11-08 Thread sroland

From: Roland Scheidegger Float rts were always set as unorm instead of float. Not sure of the consequences, but at least it looks like the blend clamp would have been enabled, which is against the rules (only eg really bothered to even attempt to specify this correctly, r600 always used clamp any

[Mesa-dev] [PATCH 2/4] r600: use mysterious DX10_CLAMP bit in pixel shader setup

2017-11-08 Thread sroland

From: Roland Scheidegger I don't know what this bit really does. The docs are somewhere between misleading and wrong however, as at least the newer ones (that bit exists with GCN as well) imply all NaNs would get converted to zeros, which is definitely NOT the case (and that would not be dx10 com

[Mesa-dev] [PATCH 3/4] r600: use ieee version of rcp

2017-11-08 Thread sroland

From: Roland Scheidegger r600 used the clamped version for rcp, whereas both evergreen and cayman used the ieee version. I don't know why that discrepancy exists (it does so since day 1) but there does not seem to be a valid reason for this, so make it consistent. This seems now safer than before

[Mesa-dev] [PATCH 2/2] r600: use the clamped versions of rcp/rsq for eg/cayman.

2017-11-07 Thread sroland

From: Roland Scheidegger r600 already used the clamped versions, but for some reason this was different to eg/cayman. (Note that it has been different since essentially forever, 7 years, since df62338c491f2cace1a48f99de78e83b5edd82fd in particular, which changed this for r600 but not eg (cayman w

[Mesa-dev] [PATCH 1/2] r600: use min_dx10/max_dx10 instead of min/max_dx10

2017-11-07 Thread sroland

From: Roland Scheidegger I believe this is the safe thing to do, especially ever since the driver actually generates NaNs for muls too. Albeit since the radeon ISA docs are inaccurate/wrong there, I'm not entirely sure what the non-dx10 versions do, but (as required by dx10) the dx10 versions sho

[Mesa-dev] [PATCH] docs: Fix GL_MESA_program_debug enums

2017-11-06 Thread sroland

From: Roland Scheidegger 13b303ff9265b89bdd9100e32f905e9cdadfad81 added the actual enums but didn't remove the already existing ones. (And also duplicated the "fragment" names instead of using the "vertex" names.) --- docs/specs/enums.txt | 26 -- 1 file changed, 8 i

[Mesa-dev] [PATCH] draw: don't cull tris with zero aera

2017-10-26 Thread sroland

From: Roland Scheidegger Culling tris with zero area seems like a great idea, but apparently with fill mode line (and point) we're supposed to draw them, at least some tests for some other state tracker complained otherwise. Such tris also always seem to be back facing (not sure if this can be in

[Mesa-dev] [PATCH] gallium/util: remove some block alignment assertions

2017-10-24 Thread sroland

From: Roland Scheidegger These assertions were revisited a couple of times in the past, and they still weren't quite right. The problem I was seeing (with some other state tracker) was a copy between two 512x512 s3tc textures, but from mip level 0 to mip level 8. Therefore, the destination has on
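
The sizes in the case above can be worked out with a small sketch. It assumes each mip level halves the dimension (never going below 1) and that s3tc uses 4x4 texel blocks; the helper names are hypothetical and are not the gallium utility functions.

```c
/* Hypothetical helper: dimension of a mip level, halving per level and
 * never going below 1. */
static unsigned
mip_size(unsigned base_size, unsigned level)
{
   unsigned size = base_size >> level;

   return size ? size : 1;
}

/* Hypothetical helper: number of blocks needed to cover `size` texels. */
static unsigned
blocks_covering(unsigned size, unsigned block_size)
{
   return (size + block_size - 1) / block_size;
}
```

For a 512x512 texture, `mip_size(512, 8)` is 2, so level 8 is 2x2 texels. That is smaller than one 4x4 block, yet `blocks_covering(2, 4)` is still 1, so an assertion that insists on block-aligned dimensions rejects a copy that is otherwise valid.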

[Mesa-dev] [PATCH] tgsi: fix tgsi_util_get_inst_usage_mask

2017-10-18 Thread sroland

From: Roland Scheidegger The logic for handling shadow coords was completely broken. Fixes be3ab867bd444594f9d9e0f8e59d305d15769afd. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103265 --- src/gallium/auxiliary/tgsi/tgsi_util.c | 12 ++-- 1 file changed, 6 insertions(+), 6 dele


