[Mesa-dev] [PATCH] llvmpipe: handle shader sample mask output

2017-10-17 Thread sroland
From: Roland Scheidegger This probably isn't all that useful for GL, but there are apis where sample_mask is a valid output even without msaa. Just discard the pixel if the sample_mask doesn't include the bit for sample 0. --- src/gallium/drivers/llvmpipe/lp_state_fs.c | 26
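
The discard rule described in this patch can be sketched in scalar C. This is only an illustration of the rule, not the llvmpipe JIT code; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: without msaa only sample 0 exists, so the pixel is
 * kept only if bit 0 of the shader-written sample mask is set. */
bool pixel_survives_sample_mask(uint32_t sample_mask)
{
    return (sample_mask & 1u) != 0;
}
```

A mask of 0x2 therefore discards the pixel even though it is non-zero, because it does not cover sample 0.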

[Mesa-dev] [PATCH] gallium: add new LOD opcode

2017-09-27 Thread sroland
From: Roland Scheidegger The operation performed is all the same as LODQ, but with the usual differences between dx10 and GL texture opcodes, that is separate resource and sampler indices (plus result swizzling, and setting z/w channels to zero). ---

[Mesa-dev] [PATCH] llvmpipe, gallivm: implement lod queries (LODQ opcode)

2017-09-17 Thread sroland
From: Roland Scheidegger This uses all the existing code to calculate lod values for mip linear filtering. Though we'll have to disable the simplifications (if we know some parts of the lod calculation won't actually matter for filtering purposes due to mip clamps etc.). For

[Mesa-dev] [PATCH 1/2] llvmpipe: enable PIPE_CAP_QUERY_PIPELINE_STATISTICS

2017-09-07 Thread sroland
From: Roland Scheidegger This was implemented since forever, but not enabled. It passes all piglit tests except one, arb_pipeline_statistics_query-frag. The reason is that the test (for drawing a 10x10 rect) expects between 100 and 150 pixel shader invocations. But since

[Mesa-dev] [PATCH 2/2] llvmpipe, draw: improve shader cache debugging

2017-09-07 Thread sroland
From: Roland Scheidegger With GALLIVM_DEBUG=perf set, output the relevant stats for shader cache usage whenever we have to evict shader variants. Also add some output when shaders are deleted (but not with the perf setting to keep this one less noisy). While here, also don't

[Mesa-dev] [PATCH] gallivm: fix gather implementation a bit

2017-09-07 Thread sroland
From: Roland Scheidegger gather is defined in terms of bilinear filtering, just without the filtering part. However, there's actually some subtle differences required in our implementation, because we use some tricks to simplify coord wrapping for the two coords per

[Mesa-dev] [PATCH] llvmpipe, tgsi: hook up dx10 gather4 opcode

2017-09-05 Thread sroland
From: Roland Scheidegger Trivial. We already support tg4 for legacy tex opcodes, so the actual texture sampling code already handles it. (Just like TG4, we don't handle additional capabilities and always sample red channel.) ---

[Mesa-dev] [PATCH] llvmpipe, draw: increase shader cache limits

2017-09-04 Thread sroland
From: Roland Scheidegger We're not particularly concerned with memory usage, if the tradeoff is shader recompiles. And it's common for apps to have a lot of shaders nowadays (and, since our shaders include a LOT of context state of course we may create quite a bit more

[Mesa-dev] [PATCH] st/mesa: fix view template initialization in try_pbo_readpixels

2017-08-31 Thread sroland
From: Roland Scheidegger I think this is what the code was meant to do, albeit as far as I can tell the redundant initialization some analyzers complain about should work as well just fine (only the first layer will be used, if the view contains one or more layers doesn't

[Mesa-dev] [PATCH] util: only use SCHED_IDLE in pthread_setschedparam() when it's defined

2017-08-26 Thread sroland
From: Roland Scheidegger Fixes build error when it's not. --- src/util/u_queue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/util/u_queue.c b/src/util/u_queue.c index 49361c3..449da7d 100644 --- a/src/util/u_queue.c +++ b/src/util/u_queue.c @@

[Mesa-dev] [PATCH 2/2] llvmpipe: enable PIPE_CAP_QUERY_SO_OVERFLOW

2017-08-15 Thread sroland
From: Roland Scheidegger The driver supported this since way before the GL spec for it existed. Just need to support both the per-stream and for all streams variants (which are identical due to only supporting 1 stream). Passes piglit

[Mesa-dev] [PATCH 1/2] softpipe: enable PIPE_CAP_QUERY_SO_OVERFLOW

2017-08-15 Thread sroland
From: Roland Scheidegger The driver was supposed to support this since way before the GL spec for it existed, albeit it was apparently broken, so fix and enable it. --- docs/features.txt | 2 +- src/gallium/drivers/softpipe/sp_query.c | 7 ++-

[Mesa-dev] [PATCH] gallivm: handle call attributes for llvm < 4.0 in lp_add_function_attr

2017-07-21 Thread sroland
From: Roland Scheidegger We had some caller using LLVMAddInstrAttributes, which couldn't be converted to lp_add_function_attr, because attributes were only handled for functions in this case, so fix this. For llvm >= 4.0, this already works correctly. (radeonsi seems to avoid

[Mesa-dev] [PATCH] draw: handle more TGSI_SEMANTIC_COLOR indices

2017-07-07 Thread sroland
From: Roland Scheidegger It could only handle indices 0/1, otherwise what happened was bad (accessing array out of bounds, no crash but kind of random). This is enough for the gl state tracker (primary/secondary color) but not enough for some other state trackers (d3d9 has no

[Mesa-dev] [PATCH] llvmpipe: initialize default fb correctly in setup

2017-06-23 Thread sroland
From: Roland Scheidegger If lp_setup_bind_framebuffer() is never called, then setup fb x1/y1 was not correctly initialized. This can happen if there's never a fb set - both cso and llvmpipe would consider setting this with no cbufs and no zsbuf a redundant change and

[Mesa-dev] [PATCH 2/2] llvmpipe: fix using 32bit rasterization mistakenly, causing overflows

2017-06-23 Thread sroland
From: Roland Scheidegger We use the bounding box (triangle extents) to figure out if 32bit rasterization could potentially overflow. However, we used the bounding box which already got rounded up to 0 for negative coords for this, which is incorrect, leading to overflows and

[Mesa-dev] [PATCH 1/2] llvmpipe: fill in debug vertex info for tri rasterization

2017-06-23 Thread sroland
From: Roland Scheidegger This is pretty useful for debugging rasterization issues, so turn it on based on DEBUG (the actual existence of the fields is also conditionalized on DEBUG, lines fill it out the same too). --- src/gallium/drivers/llvmpipe/lp_setup_tri.c | 2 +- 1

[Mesa-dev] [PATCH] llvmpipe: add LP_NEW_GS flag for updating vertex info

2017-05-26 Thread sroland
From: Roland Scheidegger The vertex information we compute here is really dependent on the last stage before FS. It just happened to work most of the time because new GS tend to come with new VS and/or FS... (The LP_NEW_GS flag was previously set but never used.) ---

[Mesa-dev] [PATCH] glsl: fix compile errors with mingw due to missing PRIx64 definitions

2017-01-23 Thread sroland
From: Roland Scheidegger Define __STDC_FORMAT_MACROS and include <inttypes.h> (same as ir_builder_print_visitor.cpp already does). Otherwise, some mingw builds error out (since 8e7e1ae0365ddc7edb0d4d98250ab46728e6c14a and bbce1c538dc0cb8bf3769510283d11847dc07540 presumably) with:

[Mesa-dev] [PATCH 2/3] gallivm: don't try to use fast rcp for fdiv

2017-01-23 Thread sroland
From: Roland Scheidegger The use of fast rcp instruction is disabled, and will always fall back to use a division instead (1 / x). Hence, if we get a division opcode, it doesn't make much sense trying to split that into rcp/mul. ---

[Mesa-dev] [PATCH 1/3] gallivm: (trivial) fix ddiv cpu implementation

2017-01-23 Thread sroland
From: Roland Scheidegger We can't use the cpu implementation of fdiv, as this one uses a different lp_build_context, which causes an assertion failure. Just use the default fdiv action (there is no fast rcp for doubles which we could potentially use anyway). ---

[Mesa-dev] [PATCH 3/3] tgsi: implement ddiv opcode

2017-01-23 Thread sroland
From: Roland Scheidegger softpipe (along with llvmpipe) claims to support arb_gpu_shader_fp64, so we really need to support that opcode. --- src/gallium/auxiliary/tgsi/tgsi_exec.c | 14 ++ 1 file changed, 14 insertions(+) diff --git

[Mesa-dev] [PATCH] llvmpipe: separate logicop / mask / color mask from blending

2017-01-04 Thread sroland
From: Roland Scheidegger Doing these operations with blend format means that we have to convert the destination into blend format, which is entirely pointless if we don't do blending. For instance, we'd convert half floats to floats, or 10/10/10/2 to unorm16, just to apply

[Mesa-dev] [PATCH 4/4] llvmpipe: do transpose/untwiddle after conversion for 8bit formats

2016-12-21 Thread sroland
From: Roland Scheidegger Generally we should do transpose after conversion, if the format has less than 32 bits per channel (if it has 32 bits, conversion is going to be a no-op anyway...). This is obviously because there are fewer vectors to deal with. Though the advantage for

[Mesa-dev] [PATCH 3/4] gallivm: generalize 4x4f->1x16ub special case conversion

2016-12-21 Thread sroland
From: Roland Scheidegger This special packing path can be easily extended to handle not just float->unorm8 but also float->snorm8 and uint32->uint8 and int32->int8 (i.e. all interesting cases for llvmpipe fs backend code). The packing parts all stay the same (only the last

[Mesa-dev] [PATCH 1/4] llvmpipe: use scalar load instead of vectors for small vectors in fs backend

2016-12-21 Thread sroland
From: Roland Scheidegger llvm has _huge_ problems trying to load things like <4 x i8> vectors and stitching such loads together to form 128bit vectors. My understanding of the problem is that the type legalizer tries to extend that to really a <4 x i32> vector and not a <16 x

[Mesa-dev] [PATCH 2/4] llvmpipe: use alpha from already converted color if possible

2016-12-21 Thread sroland
From: Roland Scheidegger For rgbx formats, there is no point in doing alpha conversion again (and with a different transpose even, so llvm can't eliminate it). Albeit it looks like there are some minimal changes needed in the blend code (found by code inspection, no test seemed to

[Mesa-dev] [PATCH 3/4] gallivm: optimize lp_build_unpack_arith_rgba_aos slightly

2016-12-20 Thread sroland
From: Roland Scheidegger This code uses a vector shift which has to be emulated on x86 unless there's AVX2. Luckily in some cases we can actually avoid the shift altogether, so do that. Also make sure we hit the fast lp_build_conv() path when applicable, albeit that's quite

[Mesa-dev] [PATCH 4/4] gallivm: implement aos unpack (to unorm8) for small unorm formats

2016-12-20 Thread sroland
From: Roland Scheidegger Using bit replication. This path now resembles something which might make sense. (The logic was mostly copied from llvmpipe fs backend.) I am not convinced though it is actually faster than SoA sampling (actually I'm quite certain it's always a loss
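
Bit replication, as named in this patch, widens a small unorm channel to 8 bits by repeating its bits into the vacated low bits. The following is a scalar sketch of that general technique, not the gallivm vector code; the function name and signature are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: expand a unorm value of 'bits' width (1..8) to
 * unorm8 by replicating its bits from the top down. */
uint8_t unorm_to_unorm8(uint32_t value, unsigned bits)
{
    uint32_t result = 0;
    int shift = 8 - (int)bits;

    assert(bits >= 1 && bits <= 8);

    /* Place copies of the source bits until all 8 destination bits are
     * covered; the last copy may be cut off on the right. */
    while (shift > -(int)bits) {
        result |= (shift >= 0) ? (value << shift) : (value >> -shift);
        shift -= (int)bits;
    }
    return (uint8_t)result;
}
```

For a 5-bit channel this reduces to (v << 3) | (v >> 2), which maps 0 to 0 and 31 to 255 without a multiply or divide.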

[Mesa-dev] [PATCH 1/4] llvmpipe: (trivial) minimally simplify mask construction

2016-12-20 Thread sroland
From: Roland Scheidegger SIMD instruction sets usually have comparisons for equal, not unequal. So use a different comparison against the mask itself - which also means we don't need an all-zero as well as an all-one (for the pxor) reg. Also add code to avoid scalar expansion

[Mesa-dev] [PATCH 2/4] gallivm: use 2 srcs for 32->16bit conversions in lp_bld_conv_auto

2016-12-20 Thread sroland
From: Roland Scheidegger If we only feed one source vector at a time, we cannot use pack intrinsics (as we only have a 64bit destination dst vector). lp_bld_conv_auto is specifically designed to alter the length and number of destination vectors, so this works just fine (if

[Mesa-dev] [PATCH 3/6] gallivm: optimize gather a bit, by using supplied destination type

2016-12-11 Thread sroland
From: Roland Scheidegger By using a dst_type in the gather interface, gather has some more knowledge about how values should be fetched. E.g. if this is a 3x32bit fetch and dst_type is a 4x32bit vector, gather will no longer do a ZExt with a 96bit scalar value to 128bit, but

[Mesa-dev] [PATCH 2/6] gallivm: optimize SoA AoS fallback fetch path a little

2016-12-11 Thread sroland
From: Roland Scheidegger We should do transpose, not extract/insert, at least with a "sufficient" number of channels (for 4 channels, the extract/insert shuffles generated otherwise look truly terrifying). Albeit we shouldn't fall back to that so often in any case. ---

[Mesa-dev] [PATCH 5/6] gallivm: generalize the compressed format soa fetch a bit

2016-12-11 Thread sroland
From: Roland Scheidegger This can now handle rgtc (unorm) too - this path no longer handles plain formats, but that's unnecessary since they now all have their proper SoA unpack (this will still be dog-slow though due to the actual fetch being per-pixel util fallbacks). ---

[Mesa-dev] [PATCH 4/6] gallivm: provide soa fetch path handling formats with more than 32bit

2016-12-11 Thread sroland
From: Roland Scheidegger This previously always fell back to AoS conversion. Even for 4-float formats (which is the optimal case by far for that fallback case) this was suboptimal, since it meant the conversion couldn't be done with 256bit vectors. While this may still only

[Mesa-dev] [PATCH 6/6] draw: use SoA fetch, not AoS one

2016-12-11 Thread sroland
From: Roland Scheidegger Now that there's some SoA fetch which never falls back, we should usually get results which are better or at least not worse (something like rgba32f will stay the same). I suppose though it might be worse in some cases where the format doesn't require

[Mesa-dev] [PATCH 1/6] gallivm: (trivial) handle non-aligned fetch for lp_build_fetch_rgba_soa

2016-12-11 Thread sroland
From: Roland Scheidegger SoA fetch so far always assumed that data was aligned. However, we want to use this for vertex fetch, and data might not be aligned there, so handle it in this path too (basically just pass the alignment through to other functions). (It looks like

[Mesa-dev] [PATCH] main: allow NEAREST_MIPMAP_NEAREST for stencil texturing

2016-12-05 Thread sroland
From: Roland Scheidegger As per GL 4.5 rules, which fixed a spec mistake in GL_ARB_stencil_texturing. The extension spec wasn't updated, but just allow it with older GL versions as well, hoping there aren't any crazy tests which want to see an error there... (Compile tested

[Mesa-dev] [PATCH 1/3] util: (trivial) ETC1 meets the criteria for fitting into unorm8

2016-12-03 Thread sroland
From: Roland Scheidegger Just like other similar compressed formats. --- src/gallium/auxiliary/util/u_format.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/gallium/auxiliary/util/u_format.c b/src/gallium/auxiliary/util/u_format.c index 72dd60f..3d28190

[Mesa-dev] [PATCH 2/3] gallivm: handle 16bit float fetches in lp_build_fetch_rgba_soa

2016-12-03 Thread sroland
From: Roland Scheidegger Note that we really want to _never_ reach the bottom of the function, which resorts to AoS fetch. Half floats can be handled just like other formats which fit into 32bit vectors (so, only 1x16 and 2x16 formats, albeit with more channels things are not

[Mesa-dev] [PATCH 3/3] gallivm: optimize gather a bit, by using supplied destination type

2016-12-03 Thread sroland
From: Roland Scheidegger By using a dst_type in the gather interface, gather has some more knowledge about how values should be fetched. E.g. if this is a 3x32bit fetch and dst_type is a 4x32bit vector, gather will no longer do a ZExt with a 96bit scalar value to 128bit, but

[Mesa-dev] [PATCH] glsl: fix ldexp lowering if bitfield insert lowering is also requested

2016-12-03 Thread sroland
From: Roland Scheidegger Trivial, this just resurrects the code which was there once upon a time (the code can't lower instructions generated in the lowering pass there, and even if it could it would probably be suboptimal). This fixes piglit mesa_shader_integer_functions

[Mesa-dev] [PATCH 3/5] draw: unify linear and elts draw jit functions

2016-11-13 Thread sroland
From: Roland Scheidegger The code for elts and linear paths was nearly 100% identical by now - with the elts path simply having some additional gather for the elements in the main loop (with some additional small differences before the main loop). Hence nuke the separate

[Mesa-dev] [PATCH 4/5] draw: simplify fetch some more

2016-11-13 Thread sroland
From: Roland Scheidegger Don't keep the ofbit. This is just a minor simplification, just adjust the buffer size so that there will always be an overflow if buffers aren't valid to fetch from. Also, get rid of control flow from the instanced path too. Not worried about

[Mesa-dev] [PATCH 5/5] draw: drop some overflow computations

2016-11-13 Thread sroland
From: Roland Scheidegger It turns out that no one actually cares if the address computations overflow, be it the stride mul or the offset adds. Wrap around seems to be explicitly permitted even by some other API (which is a _very_ surprising result, as these overflow
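
Letting the stride mul and the offset adds wrap around can be sketched with plain unsigned arithmetic, which C defines to wrap modulo 2^32 for uint32_t. The function and parameter names below are hypothetical and only illustrate the idea; this is not the draw jit code.

```c
#include <stdint.h>

/* Hypothetical fetch-offset computation with no overflow checks: both the
 * multiply and the adds simply wrap modulo 2^32. */
uint32_t fetch_offset_wrapping(uint32_t index, uint32_t stride,
                               uint32_t buffer_offset, uint32_t src_offset)
{
    return index * stride + buffer_offset + src_offset;
}
```

With index 0x80000000 and stride 2 the product wraps to 0, so the result is just the sum of the two offsets.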

[Mesa-dev] [PATCH 1/5] draw: drop unnecessary index overflow handling from vsplit code

2016-11-13 Thread sroland
From: Roland Scheidegger This was kind of strange, since it replaced indices which were only overflowing due to bias with MAX_UINT. This would cause an overflow later in the shader, except if stride was 0, however the vertex id would be essentially random then (-1 + eltBias).

[Mesa-dev] draw: simplify overflow handling, unify elts and linear jit code

2016-11-13 Thread sroland
Overflow handling is simplified quite a bit both in jit code and vsplit paths (basically just let things wrap around everywhere). This seems to be good enough for all apis. Also, elts and linear jit code is unified since the differences are minimal (even more so at the end of the series). The cost

[Mesa-dev] [PATCH 2/5] draw: use same argument order for jit draw linear / elts functions

2016-11-13 Thread sroland
From: Roland Scheidegger This is a bit simpler. Mostly to make it easier to unify the paths later... --- src/gallium/auxiliary/draw/draw_llvm.c | 48 ++ src/gallium/auxiliary/draw/draw_llvm.h | 8 ++--

[Mesa-dev] [PATCH 3/3] draw: simplify vsplit elts code a bit

2016-11-12 Thread sroland
From: Roland Scheidegger vsplit_get_base_idx explicitly returned idx 0 and set the ofbit in case of overflow. We'd then check the ofbit and use idx 0 instead of looking it up. This was necessary because DRAW_GET_IDX used to return DRAW_MAX_FETCH_IDX and not 0 in case of

[Mesa-dev] [PATCH 1/3] draw: use vectorized calculations for fetch (v2)

2016-11-12 Thread sroland
From: Roland Scheidegger Instead of doing all the math with scalars, use vectors. This means the overflow math needs to be done manually, albeit that's only really problematic for the stride/index mul, the rest has been pretty much moved outside the shader loop (albeit the

[Mesa-dev] [PATCH 2/3] draw: finally optimize bool clip mask generation

2016-11-12 Thread sroland
From: Roland Scheidegger lp_build_any_true_range is just what we need, though it will only produce optimal code with sse41 (ptest + set) - but even without it on 64bit x86 the code is still better (1 unpack, 2 movq + or + set), on 32bit x86 it's going to be roughly the same

[Mesa-dev] [PATCH 2/2] draw: use vectorized calculations for fetch

2016-11-03 Thread sroland
From: Roland Scheidegger Instead of doing all the math with scalars, use vectors. This means the overflow math needs to be done manually, albeit that's only really problematic for the stride/index mul, the rest has been pretty much moved outside the shader loop (albeit the

[Mesa-dev] [PATCH 1/2] gallivm: introduce 32x32->64bit lp_build_mul_32_lohi function

2016-11-03 Thread sroland
From: Roland Scheidegger This is used by shader umul_hi/imul_hi functions (and soon by draw). It's actually useful separating this out on its own, however the real reason for doing it is because we're using an optimized sse2 version, since the code llvm generates is atrocious
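
A scalar sketch of what a 32x32->64bit lo/hi multiply computes is shown below. It only illustrates the arithmetic and is not the gallivm function or its sse2 path; both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar equivalent: returns the low 32 bits of a * b and
 * stores the high 32 bits through *hi. */
uint32_t mul_32_lohi_sketch(uint32_t a, uint32_t b, uint32_t *hi)
{
    uint64_t wide = (uint64_t)a * (uint64_t)b;

    *hi = (uint32_t)(wide >> 32);
    return (uint32_t)wide;
}

/* An unsigned mul-hi operation only needs the high half. */
uint32_t umul_hi_sketch(uint32_t a, uint32_t b)
{
    uint32_t hi;

    (void)mul_32_lohi_sketch(a, b, &hi);
    return hi;
}
```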

[Mesa-dev] [PATCH 1/2] draw: fix undefined input handling some more...

2016-11-02 Thread sroland
From: Roland Scheidegger Previous fixes were incomplete - some code still iterated through the number of elements provided by velem layout instead of the number stored in the key (which is the same as the number defined by the vs). And also actually accessed the elements from

[Mesa-dev] [PATCH 2/2] draw: use vectorized calculations for fetch

2016-11-02 Thread sroland
From: Roland Scheidegger Instead of doing all the math with scalars, use vectors. This means the overflow math needs to be done manually, albeit that's only really problematic for the stride/index mul, the rest has been pretty much moved outside the shader loop (albeit the

[Mesa-dev] [PATCH] draw: use vectorized calculations for fetch

2016-10-31 Thread sroland
From: Roland Scheidegger Instead of doing all the math with scalars, use vectors. This means the overflow math needs to be done manually, albeit that's only really problematic for the stride/index mul, the rest has been pretty much moved outside the shader loop (albeit the

[Mesa-dev] [PATCH] gallivm: Use native packs and unpacks for the lerps

2016-10-17 Thread sroland
From: Roland Scheidegger For the texturing packs, things looked pretty terrible. For every lerp, we were repacking the values, and while those look sort of cheap with 128bit, with 256bit we end up with 2 of them instead of just 1 but worse, plus 2 extracts too (the unpack,

[Mesa-dev] [PATCH] draw: improve vertex fetch (v2)

2016-10-14 Thread sroland
From: Roland Scheidegger The per-element fetch has quite some calculations which are constant, these can be moved outside both the per-element as well as the main shader loop (llvm can figure out it's constant mostly on its own, however this can have a significant compile

[Mesa-dev] [PATCH] gallivm: print out time for jitting functions with GALLIVM_DEBUG=perf

2016-10-13 Thread sroland
From: Roland Scheidegger Compilation to actual machine code can easily take as much time as the optimization passes on the IR if not more, so print this out too. --- src/gallium/auxiliary/gallivm/lp_bld_init.c | 11 +++ 1 file changed, 11 insertions(+) diff --git

[Mesa-dev] [PATCH] draw: improved handling of undefined inputs

2016-10-13 Thread sroland
From: Roland Scheidegger Previous attempts to zero initialize all inputs were not really optimal (though no performance impact was measurable). In fact this is not really necessary, since we know the max number of inputs used. Instead, just generate fetch for up to max inputs

[Mesa-dev] [PATCH] draw: improve vertex fetch

2016-10-11 Thread sroland
From: Roland Scheidegger The per-element fetch has quite some calculations which are constant, these can be moved outside both the per-element as well as the main shader loop (llvm can figure out it's constant mostly on its own, however this can have a significant compile

[Mesa-dev] [PATCH] draw: initialize shader inputs

2016-10-11 Thread sroland
From: Roland Scheidegger This should make the code more robust if a shader tries to use inputs which aren't defined by the vertex element layout (which usually shouldn't happen). No piglit change. --- src/gallium/auxiliary/draw/draw_llvm.c | 7 +++ 1 file changed, 7

[Mesa-dev] [PATCH] llvmpipe: fix issues with depth clamp

2016-08-14 Thread sroland
From: Roland Scheidegger We only did depth clamp when the value was written from the fs. This is very wrong both for d3d10 and GL, and only passed the corresponding piglit test due to pure luck (it no longer does with the enhanced test). Also, interpolation clamped values to

[Mesa-dev] [PATCH 1/2] llvmpipe: fix depth clamping wrt reversed near/far values

2016-08-14 Thread sroland
From: Roland Scheidegger This wasn't handled before (the result was that no matter what value got clamped, it always ended up as the near value in this case) (if clamping actually happened). Fix this by using the util helper for that (the math is otherwise "mostly" the same,
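
One way to make a depth clamp independent of the near/far order is to clamp against the smaller and larger of the two values. The sketch below assumes this approach and uses a hypothetical function name; it is not the util helper the patch refers to.

```c
/* Hypothetical helper: clamp z to the depth range, also when near > far. */
float clamp_depth_sketch(float z, float near_val, float far_val)
{
    float lo = near_val < far_val ? near_val : far_val;
    float hi = near_val < far_val ? far_val : near_val;

    if (z < lo)
        return lo;
    if (z > hi)
        return hi;
    return z;
}
```

With near = 1.0 and far = 0.0, a value of 2.0 clamps to 1.0 and a value of -1.0 clamps to 0.0, rather than both ending up at the near value.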

[Mesa-dev] [PATCH] gallivm: don't use integer min/max sse intrinsics with llvm >= 3.9

2016-06-18 Thread sroland
From: Roland Scheidegger Apparently, these are deprecated. There's some AutoUpgrade feature which is supposed to promote these to cmp/select, which apparently doesn't work with jit code. It is possible it's not actually even meant to work (see the bug filed against llvm which

[Mesa-dev] [PATCH] gallium/util: don't use blocksize for minify for assertions

2016-06-13 Thread sroland
From: Roland Scheidegger The previous assertions required for texture sizes smaller than block_size that src_box.x + src_box.width still be block size. (e.g. for a texture with width 3, and src_box.x = 0, src_box.width would have to be 4 to not assert.) This caused some

[Mesa-dev] [PATCH] llvmpipe: hack-fix bugs due to bogus bind flags

2016-06-13 Thread sroland
From: Roland Scheidegger The gallium contract would be that bind flags must indicate all possible bindings a resource might get used with, but the fact is the mesa state tracker does not set bind flags correctly, and this is more or less unfixable due to GL. This caused a bug with

[Mesa-dev] [PATCH] llvmpipe: hack-fix bugs due to bogus bind flags

2016-06-10 Thread sroland
From: Roland Scheidegger The gallium contract would be that bind flags must indicate all possible bindings a resource might get used with, but the fact is the mesa state tracker does not set bind flags correctly, and this is more or less unfixable due to GL. This caused a bug with

[Mesa-dev] [PATCH] gallium/util: use enum pipe_prim_type instead of unsigned some more

2016-05-27 Thread sroland
From: Roland Scheidegger There were complaints from a mingw build: u_draw.h:134:14: error: invalid conversion from ‘uint {aka unsigned int}’ to ‘pipe_prim_type’ [-fpermissive] --- src/gallium/auxiliary/util/u_draw.h | 21 - 1 file changed, 16

[Mesa-dev] [PATCH] gallivm: eliminate a unnecessary AND with unorm lerps

2016-05-12 Thread sroland
From: Roland Scheidegger Instead of doing an add and then masking out the upper bits, we can simply do an add with a half wide type (this, of course, assumes the hw can actually do it...), so we'll get the required zero in the upper bits automatically. ---
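
The equivalence this patch relies on can be sketched per lane in scalar C: adding in a 16-bit lane and masking gives the same result as adding in an 8-bit lane and widening. The lane widths and function names are assumptions chosen for illustration.

```c
#include <stdint.h>

/* Variant 1: add in a 16-bit lane, then mask off the upper 8 bits. */
uint16_t add_then_mask(uint16_t a, uint16_t b)
{
    return (uint16_t)((a + b) & 0xff);
}

/* Variant 2: add in an 8-bit (half wide) lane. The sum wraps modulo 256,
 * so the upper bits of the widened result are zero without any AND. */
uint16_t add_half_wide(uint16_t a, uint16_t b)
{
    uint8_t sum = (uint8_t)((uint8_t)a + (uint8_t)b);

    return sum;
}
```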

[Mesa-dev] [PATCH] gallivm: improve dumping of bitcode

2016-05-10 Thread sroland
From: Roland Scheidegger Use GALLIVM_DEBUG=dumpbc for dumping of modules as bitcode. Instead of a fixed llvmpipe.bc name, use ir_.bc so multiple modules can be dumped (albeit it might still overwrite previous modules, particularly the modules from draw tend to always have the

[Mesa-dev] [PATCH] gallivm: print declarations of intrinsics with GALLIVM_DEBUG=ir

2016-05-09 Thread sroland
From: Roland Scheidegger Those aren't really interesting, however outputting them is helpful when trying to feed the IR to llvm llc (or opt) for debugging. --- src/gallium/auxiliary/gallivm/lp_bld_intr.c | 5 + 1 file changed, 5 insertions(+) diff --git

[Mesa-dev] [PATCH] gallivm: disable avx512 features

2016-05-08 Thread sroland
From: Roland Scheidegger We don't target this yet, and some llvm versions incorrectly enable it based on cpu string, causing crashes. (Albeit this is a losing battle, it is pretty much guaranteed when the next new feature comes along llvm will mistakenly enable it on some

[Mesa-dev] [PATCH] gallivm: use InternalLinkage instead of PrivateLinkage for texture functions

2016-05-08 Thread sroland
From: Roland Scheidegger At least with MCJIT the disassembler will crash otherwise when trying to disassemble such functions. --- src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git

[Mesa-dev] [PATCH] gallivm: make sampling more robust against bogus coordinates

2016-04-22 Thread sroland
From: Roland Scheidegger Some cases (especially those using fract for coord wrapping) did not handle NaNs (or Infs) correctly - the following code assumed the fract result could not be outside [0,1], but if the input is a NaN (or +-Inf) the fract result was NaN - which then
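
One common way to keep a possibly-NaN value inside [0,1] is to write the clamp so that a failed comparison selects a safe bound. The sketch below shows that general idea with a hypothetical function; it is an assumption for illustration, not necessarily how the patch fixes the sampling code.

```c
#include <math.h> /* only for the NAN and INFINITY macros used by callers */

/* Hypothetical helper: clamp to [0,1]. Every comparison with NaN is
 * false, so NaN fails the first test and maps to 0. */
float clamp_unit_nan_safe(float f)
{
    if (!(f > 0.0f))
        return 0.0f;
    if (f > 1.0f)
        return 1.0f;
    return f;
}
```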

[Mesa-dev] [PATCH] gallivm: fix bogus argument order to lp_build_sample_mipmap function

2016-04-20 Thread sroland
From: Roland Scheidegger Screwed up since 0753b135f6e83b171d8a1b08aea967374f3542bc. (Only an issue with different min/mag filters, and then only in some cases, which is probably why it went unnoticed for quite a while. The effect should have simply been nearest mip filter

[Mesa-dev] [PATCH] glsl: add forgotten textureOffset function for sampler2DArrayShadow

2016-04-18 Thread sroland
From: Roland Scheidegger This was part of EXT_gpu_shader4 - as such it should have been supported by glsl 130. It was however forgotten, and not added until glsl 430 - with the wrong syntax no less (glsl 430 mentions it was overlooked). glsl 440 (but revision 8 only) fixed

[Mesa-dev] [PATCH] glsl: add forgotten textureOffset function for sampler2DArrayShadow

2016-04-18 Thread sroland
From: Roland Scheidegger This was part of EXT_gpu_shader4 - as such it should have been supported by glsl 130. It was however forgotten, and not added until glsl 430 - with the wrong syntax no less (glsl 430 mentions it was overlooked). glsl 440 (but revision 8 only) fixed

[Mesa-dev] [PATCH] gallivm: don't use vector selects with llvm 3.7

2016-04-16 Thread sroland
From: Roland Scheidegger llvm 3.7 sometimes simply miscompiles vector selects. See https://bugs.freedesktop.org/show_bug.cgi?id=94972 --- src/gallium/auxiliary/gallivm/lp_bld_logic.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git

[Mesa-dev] [PATCH] llvmpipe: (trivial) initialize src1_alpha var to NULL

2016-04-15 Thread sroland
From: Roland Scheidegger The blend code would do a conditional assignment based on it, causing valgrind to complain. Since that variable was actually unused in this case, this doesn't fix anything but the warning. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94955

[Mesa-dev] [PATCH] gallivm: use llvm.nearbyint instead of llvm.round

2016-04-12 Thread sroland
From: Roland Scheidegger We used to use sse roundps intrinsic directly, but switched to use the llvm intrinsics for rounding with e4f01da15d8c6ce3e8c77ff3ff3d2ce2574a3f7b. However, llvm semantics follows standard math lib round function which is specced to do

[Mesa-dev] [PATCH] llvmpipe: fix lp_rast_plane alignment on 32bit

2016-03-14 Thread sroland
From: Roland Scheidegger Some rasterization code relies (for sse) on the first and third planes (but not the second for now) being 128bit aligned, and we didn't get that on 32bit - I mistakenly thought the 64bit number in the struct would get the thing aligned to 64bit even

[Mesa-dev] [PATCH] draw: fix line stippling

2016-03-13 Thread sroland
From: Roland Scheidegger The logic was comparing actual ints, not true/false values. This meant that it was always emitting multiple line segments instead of just one even if the stipple test had the same result, which looks inefficient, and the segments also overlapped thus
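
The difference between comparing raw ints and comparing true/false values can be sketched as follows. The function name is hypothetical, and the inputs stand for masked stipple-test results that are zero or have some bit set.

```c
#include <stdbool.h>

/* Hypothetical helper: two stipple-test results are in the same state if
 * both are zero or both are non-zero. Comparing the raw ints instead
 * would report a change whenever two different non-zero bits are set. */
bool same_stipple_state(unsigned prev_result, unsigned cur_result)
{
    return (prev_result != 0) == (cur_result != 0);
}
```

For example, results 0x2 and 0x8 differ as ints but are both "true", so no new segment would need to be started.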

[Mesa-dev] [PATCH] softpipe: fix misleading TGSI_QUAD_SIZE usage

2016-03-13 Thread sroland
From: Roland Scheidegger All these img filter loops iterate through NUM_CHANNELS, not QUAD_SIZE. In practice both are of course the same unchangeable value (4), but it makes the code look a bit confusing. Moreover, some of the functions were actually given an array of 4

[Mesa-dev] [PATCH] softpipe: fix anisotropic filtering crash

2016-03-12 Thread sroland
From: Roland Scheidegger The filt_args->offset wasn't assigned but was always used later leading to a crash (as far as I can tell, texel offsets don't actually make much sense with anisotropic filtering, but because there's no explicit setting if offsets are enabled there the

[Mesa-dev] [PATCH] gallivm, tgsi: provide fake sample_i_ms implementations

2016-02-17 Thread sroland
From: Roland Scheidegger Just like the rest of the msaa "implementation" it's just fake for now... --- src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c | 7 ++- src/gallium/auxiliary/tgsi/tgsi_exec.c | 8 +--- 2 files changed, 11 insertions(+), 4 deletions(-)

[Mesa-dev] [PATCH 3/3] llvmpipe: drop scissor planes early if the tri is fully inside them

2016-01-31 Thread sroland
From: Roland Scheidegger If the tri is fully inside a scissor edge (or rather, we just use the bounding box of the tri for the comparison), then we can drop these additional scissor "planes" early. We do not even need to allocate space for them in the tri. The math actually
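
The bounding-box comparison described here can be sketched as a per-edge test: a scissor edge only needs a plane if the triangle's bounding box crosses it. The struct, the bit assignment, and the function name are all hypothetical; this is not the lp_setup_tri.c code.

```c
/* Hypothetical inclusive integer box. */
struct box_sketch {
    int x0, y0, x1, y1;
};

/* Returns a bitmask of scissor edges that still need a plane:
 * bit 0 left, bit 1 right, bit 2 top, bit 3 bottom. */
unsigned scissor_planes_needed(const struct box_sketch *bbox,
                               const struct box_sketch *scissor)
{
    unsigned mask = 0;

    if (bbox->x0 < scissor->x0)
        mask |= 1u << 0;
    if (bbox->x1 > scissor->x1)
        mask |= 1u << 1;
    if (bbox->y0 < scissor->y0)
        mask |= 1u << 2;
    if (bbox->y1 > scissor->y1)
        mask |= 1u << 3;
    return mask;
}
```

A result of 0 means the bounding box is fully inside the scissor rect, so no extra planes (and no space for them) are needed.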

[Mesa-dev] [PATCH 2/3] llvmpipe: minor cleanup of sse2 for calc_fixed_position

2016-01-31 Thread sroland
From: Roland Scheidegger Just slightly simpler assembly. --- src/gallium/drivers/llvmpipe/lp_setup_tri.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/src/gallium/drivers/llvmpipe/lp_setup_tri.c b/src/gallium/drivers/llvmpipe/lp_setup_tri.c

[Mesa-dev] [PATCH 1/3] llvmpipe: use vector loads for (optimized) tri raster funcs

2016-01-31 Thread sroland
From: Roland Scheidegger When we switched to 64bit rasterization, we could no longer use straight aligned loads for loading the plane data. However, what the code actually does for loading 3 planes, is 12 scalar loads + 9 unpacks, and then there's another 8 unpacks for the

[Mesa-dev] [PATCH 2/2] gallivm: add PK2H/UP2H support

2016-01-30 Thread sroland
From: Roland Scheidegger Add support for these opcodes; the conversion functions were already there, albeit they need some new packing stuff. Just like the tgsi version, piglit won't like it for all the same reasons, so it's disabled (UP2H passes piglit arb_shader_language_packing

[Mesa-dev] [PATCH 1/2] tgsi: add PK2H/UP2H support

2016-01-30 Thread sroland
don't enable the cap bit (so the code is unused). (Code is from imirkin, comment from sroland) Signed-off-by: Ilia Mirkin <imir...@alum.mit.edu> Reviewed-by: Roland Scheidegger <srol...@vmvware.com> --- src/gallium/auxiliary/tgsi/tgsi_exec.c | 44 -- src

[Mesa-dev] [PATCH] llvmpipe, i915: add back NEW_RASTERIZER dependency when computing vertex info

2016-01-20 Thread sroland
From: Roland Scheidegger I removed this mistakenly in 2dbc20e45689e09766552517a74e2270e49817b5. I actually thought it should not be necessary and a piglit run didn't show any differences, but this shouldn't have been in there. draw_prepare_shader_outputs() is in fact

[Mesa-dev] [PATCH] llvmpipe: warn about illegal use of objects in different contexts

2016-01-19 Thread sroland
From: Roland Scheidegger Doing that is clearly a bug. We can't quite assert as st/mesa may hit this, but increase at least visibility of it a bit. (For the non-refcounted objects it would be illegal too, but we can't detect that unless we'd store the context ourselves. Plus,

[Mesa-dev] [PATCH 2/2] llvmpipe: turn depth clears into full depth/stencil clears for d24x8 formats

2016-01-17 Thread sroland
From: Roland Scheidegger If we have a d24x8 format, there is no stencil. Therefore, we can always clear these bits too, which means this will be some kind of memset rather than read-modify-write. This is good for some 7% increase or so in gears with huge window size - seems
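A rough sketch of the difference between a masked depth clear and a full-word clear, assuming the 24 depth bits sit in the low bits of each 32bit word. The bit layout and function names are assumptions for illustration, not the llvmpipe clear path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEPTH24_MASK 0x00ffffffu   /* assumed: depth in the low 24 bits */

/* Clears the depth bits of n 32bit words. With a real stencil channel the
 * upper 8 bits must survive, which needs a read-modify-write per word.
 * Without stencil (x8 padding) the whole word can simply be overwritten. */
static void clear_depth24(uint32_t *buf, size_t n, uint32_t depth24,
                          bool has_stencil)
{
   assert(depth24 <= DEPTH24_MASK);

   if (has_stencil) {
      for (size_t i = 0; i < n; i++)
         buf[i] = (buf[i] & ~DEPTH24_MASK) | depth24;
   } else {
      for (size_t i = 0; i < n; i++)
         buf[i] = depth24;   /* memset-like, no read needed */
   }
}
```

The second loop never reads the destination, which is what allows it to behave like a plain fill.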

[Mesa-dev] [PATCH 1/2] llvmpipe: drop scissor planes early if the tri is fully inside it

2016-01-17 Thread sroland
From: Roland Scheidegger If the tri is fully inside the scissor (or rather, we just use the bounding box of the tri for the comparison), then we can drop these additional scissor "planes" early. (We could, of course, not even emit the scissor planes in this case in the first

[Mesa-dev] [PATCH 1/4] mesa: (trivial) fix typo in python scripts

2016-01-17 Thread sroland
From: Roland Scheidegger --- src/gallium/auxiliary/util/u_format_parse.py | 2 +- src/mesa/main/format_parser.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/gallium/auxiliary/util/u_format_parse.py

[Mesa-dev] [PATCH 2/4] i965: Provide sse2 version for rgba8 <-> bgra8 swizzle

2016-01-17 Thread sroland
From: Roland Scheidegger The existing code used ssse3, and because it isn't built in a separate file compiled with ssse3 enabled, it is usually not used (that, of course, could be fixed...), whereas sse2 is always present at least with 64bit builds. It is actually trivial to do
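One possible way to do the rgba8 <-> bgra8 swap with plain sse2 is sketched below, using masks and 32bit shifts instead of the ssse3 byte shuffle. This is an illustration under that assumption and not necessarily the approach the patch takes; `swap_rb_8888` is a hypothetical name, and a scalar loop handles the tail and non-sse2 builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Swaps bytes 0 and 2 of every 32bit pixel, leaving bytes 1 and 3 alone. */
static void swap_rb_8888(uint32_t *dst, const uint32_t *src, size_t count)
{
   size_t i = 0;
   assert(count == 0 || (dst && src));

#if defined(__SSE2__)
   const __m128i agmask = _mm_set1_epi32((int)0xff00ff00u);
   for (; i + 4 <= count; i += 4) {
      __m128i px = _mm_loadu_si128((const __m128i *)(src + i));
      __m128i ag = _mm_and_si128(px, agmask);      /* bytes 1 and 3 */
      __m128i rb = _mm_andnot_si128(agmask, px);   /* bytes 0 and 2 */
      /* Exchange the two remaining bytes by shifting each 32bit lane. */
      rb = _mm_or_si128(_mm_slli_epi32(rb, 16), _mm_srli_epi32(rb, 16));
      _mm_storeu_si128((__m128i *)(dst + i), _mm_or_si128(ag, rb));
   }
#endif

   /* Scalar tail, also the full path when sse2 is unavailable. */
   for (; i < count; i++) {
      uint32_t p = src[i];
      dst[i] = (p & 0xff00ff00u) |
               ((p & 0x00ff0000u) >> 16) |
               ((p & 0x000000ffu) << 16);
   }
}
```

Applying the function twice returns the original pixels, since the swap is its own inverse.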

[Mesa-dev] [PATCH 4/4] main: add sse2/ssse3 code for handling all 4 channel ubyte unorm swizzles

2016-01-17 Thread sroland
From: Roland Scheidegger Like the previous patch, but this time instead of direct format pack functions, this handles convert_ubyte if the destination and source were both ubyte unorm with 4 channels (so this can do things like bgrx8->rgba8, apart from swizzling filling in

[Mesa-dev] [PATCH 3/4] main: add some sse2 ubyte pack functions for rgba8 / rgbx8 unorm formats

2016-01-17 Thread sroland
From: Roland Scheidegger This certainly isn't as generic as it would be ideally, but got to start somewhere... Handles just rgba8/rgbx8 formats (so just swizzling). Even when using cached regions, these functions are definitely quite a bit faster than the c ones (for larger

[Mesa-dev] [PATCH] i965: Provide sse2 version for rgba8 <-> bgra8 swizzle

2016-01-15 Thread sroland
From: Roland Scheidegger The existing code used ssse3, and because it isn't built in a separate file compiled with ssse3 enabled, it is usually not used (that, of course, could be fixed...), whereas sse2 is always present at least with 64bit builds. It is actually trivial to do
