[Mesa-dev] [PATCH 1/3] gallium/drivers: support more sampler views than samplers for more drivers

2013-11-25 Thread sroland
From: Roland Scheidegger srol...@vmware.com This adds support for this to more drivers, in particular for all the special ones useful for debugging. HW drivers are left alone, some should be able to support it if they want but they may not be interested at this point. ---

[Mesa-dev] [PATCH] llvmpipe: calculate more accurate interpolation value at origin

2013-11-20 Thread sroland
From: Roland Scheidegger srol...@vmware.com Some rounding errors could crop up when calculating a0. Use a more accurate method (barycentric interpolation essentially) to fix this, though to fix the REAL problem (which is that our interpolation will give very bad results with small triangles far

[Mesa-dev] [PATCH] llvmpipe: clean up state setup code a bit

2013-11-12 Thread sroland
From: Roland Scheidegger srol...@vmware.com In particular get rid of home-grown vector helpers which didn't add much. And while here fix formatting a bit. No functional change. --- src/gallium/drivers/llvmpipe/lp_state_setup.c | 183 + 1 file changed, 66 insertions(+),

[Mesa-dev] [PATCH] gallivm, llvmpipe: fix float-srgb conversion to handle NaNs

2013-11-11 Thread sroland
From: Roland Scheidegger srol...@vmware.com d3d10 requires us to convert NaNs to zero for any float-int conversion. We don't really do that but mostly seems to work. In particular I suspect the very common float-unorm8 path only really passes because it relies on sse2 pack intrinsics which just

[Mesa-dev] [PATCH] gallivm: fix indirect addressing of inputs

2013-11-06 Thread sroland
From: Roland Scheidegger srol...@vmware.com We weren't adding the soa offsets when constructing the indices for the gather functions. That meant that we were always returning the data in the first element. (Copied straight from the same fix for temps.) While here fix up a couple of broken

[Mesa-dev] [PATCH] gallivm: deduplicate some indirect register address code

2013-11-06 Thread sroland
From: Roland Scheidegger srol...@vmware.com There's only one minor functional change, for immediates the pixel offsets are no longer added since the values are all the same for all elements in any case (it might be better if those weren't stored as soa vectors in the first place maybe). ---

[Mesa-dev] [PATCH] gallivm: optimize lp_build_minify for sse

2013-11-05 Thread sroland
From: Roland Scheidegger srol...@vmware.com SSE can't handle true vector shifts (with variable shift count), so llvm is turning them into a mess of extracts, scalar shifts and inserts. It is however possible to emulate them in lp_build_minify with float muls, which should be way faster (saves
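The float-multiply trick mentioned in this snippet can be sketched in scalar C as follows. This is only an illustration of the idea, not the gallivm code: the function names are made up for the example, and it assumes sizes below 2^24 (so they are exactly representable as floats) and IEEE-754 single precision.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference: minify as usually defined, max(1, size >> level). */
static uint32_t minify_shift(uint32_t size, uint32_t level)
{
    assert(level < 32);
    uint32_t s = size >> level;
    return s > 0 ? s : 1;
}

/* Hypothetical helper: same result via a multiply by 2^-level. */
static uint32_t minify_fmul(uint32_t size, uint32_t level)
{
    assert(size < (1u << 24));  /* exactly representable as float */
    assert(level < 32);

    /* Build the IEEE-754 bit pattern of 2^-level (biased exponent only). */
    uint32_t bits = (127u - level) << 23;
    float scale;
    memcpy(&scale, &bits, sizeof scale);

    /* Multiplying by a power of two is exact here, and the float-to-int
     * conversion truncates, which matches the shift for non-negative sizes. */
    uint32_t s = (uint32_t)((float)size * scale);
    return s > 0 ? s : 1;
}
```

In a vectorized version the exponent construction is a plain integer subtract and a shift by a constant, both of which SSE handles per lane, so no variable per-lane shift is needed.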

[Mesa-dev] [PATCH] llvmpipe: fix bogus layer clamping in setup

2013-10-25 Thread sroland
From: Roland Scheidegger srol...@vmware.com The layer coming from GS needs to be clamped (not sure if that's actually the correct error behavior but we need something) as the number can be higher than the amount of layers in the fb. However, this code was using the layer calculation from the

[Mesa-dev] [PATCH] gallium: kill off PIPE_FORMAT_Z32_UNORM with extreme prejudice

2013-10-24 Thread sroland
From: Roland Scheidegger srol...@vmware.com This format, while still supported in OpenGL (but optional) and glx, is just causing major nuisance everywhere and needs special code in some places, because things like 1 depth_bits don't work. It is also the reason why we chose (just like in GL)

[Mesa-dev] [PATCH] gallivm: implement fully accurate corner filtering for seamless cube maps

2013-10-21 Thread sroland
From: Roland Scheidegger srol...@vmware.com d3d10 requires that cube corners are filtered with accurate weights (that is, the weight of the non-existing corner texel should be evenly distributed to the other 3 texels). OpenGL does not require this (but recommends it). This requires us to use
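The weight redistribution described here (the missing corner texel's weight spread evenly over the other three) could look roughly like this in scalar C. The function name and the array layout are assumptions for illustration; the real patch works on SoA vectors inside the sampling code.

```c
#include <assert.h>

/* w[0..3]: bilinear weights of the 2x2 footprint, expected to sum to 1.
 * missing: index of the texel that does not exist at a cube corner.
 * (Hypothetical helper, not a gallivm function.) */
static void redistribute_corner_weight(float w[4], int missing)
{
    assert(missing >= 0 && missing < 4);

    /* Each remaining texel receives one third of the missing weight. */
    float share = w[missing] / 3.0f;
    for (int i = 0; i < 4; i++) {
        if (i != missing)
            w[i] += share;
    }
    w[missing] = 0.0f;
}
```

The weights still sum to one afterwards, so the filtered result remains a proper weighted average of the three existing texels.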

[Mesa-dev] [PATCH 1/2] gallivm: implement seamless cube filtering

2013-10-18 Thread sroland
From: Roland Scheidegger srol...@vmware.com For seamless cube filtering it is necessary to determine new faces and new coords per sample. The logic for this is _seriously_ complex (what needs to happen is very asymmetric wrt face, x/y under/overflow), further complicated by the fact that if the 4

[Mesa-dev] [PATCH 2/2] llvmpipe: enable seamless cube filtering

2013-10-18 Thread sroland
From: Roland Scheidegger srol...@vmware.com --- src/gallium/drivers/llvmpipe/lp_screen.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/llvmpipe/lp_screen.c b/src/gallium/drivers/llvmpipe/lp_screen.c index 723e40e..4c81022 100644 ---

[Mesa-dev] [PATCH] llvmpipe: increase fs shader variant instruction cache limit by factor 4

2013-10-11 Thread sroland
From: Roland Scheidegger srol...@vmware.com The previous limit of 128*1024 was reported on IRC to cause frequent recompiles in some apps due to shader variant thrashing, leading to noticeable lags. Note that the LP_MAX_SHADER_VARIANTS limit (1024) was more or less impossible to

[Mesa-dev] [PATCH] softpipe: fix seamless cube filtering

2013-10-10 Thread sroland
From: Roland Scheidegger srol...@vmware.com Fix coord wrapping (and face selection too) in case of edges. Unfortunately, the coord wrapping is way more complicated than what the code did, as it depends on the face and the direction where the texel falls off the face (the logic needed to get this

[Mesa-dev] [PATCH 2/3] gallivm: handle explicit derivatives for cubemaps

2013-10-04 Thread sroland
From: Roland Scheidegger srol...@vmware.com They need some special handling. Quite complicated. Additionally, use the same code for implicit derivatives too if no_rho_approx and no_quad_lod is set, because it seems while generally it should be ok to use per quad lod for implicit derivatives

[Mesa-dev] [PATCH 1/3] gallivm: ignore rho approximation for cube maps

2013-10-04 Thread sroland
From: Roland Scheidegger srol...@vmware.com There's two reasons for this: 1) even when ignoring rho approximation for cube maps, the result is still not correct, but it's better as the max error at edges is now sqrt(2) instead of 2 (which was a full mip level), same as it is for ordinary 2d maps

[Mesa-dev] [PATCH 3/3] gallivm: kill old per-quad face selection code

2013-10-04 Thread sroland
From: Roland Scheidegger srol...@vmware.com Not used since ages, and it wouldn't work at all with explicit derivatives now (not that it did before as it ignored them but now the code would just use the derivs pre-projected which would be quite random numbers). v2: also get rid of 3 helper

[Mesa-dev] [PATCH 1/3] gallivm: ignore rho approximation for cube maps

2013-10-03 Thread sroland
From: Roland Scheidegger srol...@vmware.com There's two reasons for this: 1) even when ignoring rho approximation for cube maps, the result is still not correct, but it's better as the max error at edges is now sqrt(2) instead of 2 (which was a full mip level), same as it is for ordinary 2d maps

[Mesa-dev] [PATCH 2/3] gallivm: handle explicit derivatives for cubemaps

2013-10-03 Thread sroland
From: Roland Scheidegger srol...@vmware.com They need some special handling. Quite complicated. Additionally, use the same code for implicit derivatives too if no_rho_approx and no_quad_lod is set, because it seems while generally it should be ok to use per quad lod for implicit derivatives

[Mesa-dev] [PATCH 3/3] gallivm: kill old per-quad face selection code

2013-10-03 Thread sroland
From: Roland Scheidegger srol...@vmware.com Not used since ages, and it wouldn't work at all with explicit derivatives now (not that it did before as it ignored them but now the code would just use the derivs pre-projected which would be quite random numbers). ---

[Mesa-dev] [PATCH] gallivm: ignore rho approximation for cube maps

2013-09-30 Thread sroland
From: Roland Scheidegger srol...@vmware.com There's two reasons for this: 1) even when ignoring rho approximation for cube maps, the result is still not correct, but it's better as the max error at edges is now sqrt(2) instead of 2 (which was a full mip level), same as it is for ordinary 2d maps

[Mesa-dev] [PATCH] gallivm: adjust wrap mode to CLAMP_TO_EDGE always for cube maps.

2013-09-19 Thread sroland
From: Roland Scheidegger srol...@vmware.com Technically without seamless filtering enabled GL allows any wrap mode, which made sense when supporting true borders (can get seamless effect with border and CLAMP_TO_BORDER), but gallium doesn't support borders and d3d9 requires wrap modes to be

[Mesa-dev] [PATCH] gallivm: some bits of seamless cube filtering implementation

2013-09-13 Thread sroland
From: Roland Scheidegger srol...@vmware.com Simply adjust wrap mode to clamp_to_edge. This is all that's needed for a correct implementation for nearest filtering, and it's way better than using repeat wrap for instance for linear filtering (though obviously this doesn't actually do seamless

[Mesa-dev] [PATCH 2/3] softpipe: handle NULL sampler views for texture sampling / queries

2013-08-30 Thread sroland
From: Roland Scheidegger srol...@vmware.com Instead of crashing just return all zero. --- src/gallium/auxiliary/tgsi/tgsi_exec.c |1 + src/gallium/drivers/softpipe/sp_tex_sample.c | 30 +- 2 files changed, 26 insertions(+), 5 deletions(-) diff --git

[Mesa-dev] [PATCH 3/3] gallivm: handle unbound textures in texture sampling / texture queries

2013-08-30 Thread sroland
From: Roland Scheidegger srol...@vmware.com Turns out we don't need to do much extra work for detecting this case, since we are guaranteed to get an empty static texture state in this case, hence just rely on format being 0 and return all zero then. Previously needed dummy textures (would just

[Mesa-dev] [PATCH] draw: fix PIPE_MAX_SAMPLER/PIPE_MAX_SHADER_SAMPLER_VIEWS issues

2013-08-30 Thread sroland
From: Roland Scheidegger srol...@vmware.com pstipple/aaline stages used PIPE_MAX_SAMPLER instead of PIPE_MAX_SHADER_SAMPLER_VIEWS when dealing with sampler views. Now these stages can't actually handle sampler_unit != texture_unit anyway (they cannot work with d3d10 shaders at all due to using

[Mesa-dev] [PATCH 1/3] softpipe: check if so_target is NULL before accessing it

2013-08-30 Thread sroland
From: Roland Scheidegger srol...@vmware.com No idea if this is working right but copied straight from llvmpipe. (Not only does this check the so_target but also use buffer-data instead of buffer for the mapping.) Just trying to get rid of a segfault testing something else... ---

[Mesa-dev] [PATCH 1/2] gallivm: don't use AoS path if min/mag filter are different with multiple lods

2013-08-29 Thread sroland
From: Roland Scheidegger srol...@vmware.com Instead of enhancing the AoS path so it can deal with it, just use SoA. Fixing AoS path wouldn't be all that difficult (use all the same logic as SoA) but considered not worth it for now. --- src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c |7

[Mesa-dev] [PATCH 2/2] gallivm: (trivial) don't pass sampler_unit variable down to filtering funcs

2013-08-29 Thread sroland
From: Roland Scheidegger srol...@vmware.com The only reason this was needed was because the fetch texel function had to get the (dynamic) border color, but this is now done much earlier. --- src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c | 62 - 1 file changed, 22

[Mesa-dev] [PATCH 1/2] gallivm: refactor num_lods handling

2013-08-28 Thread sroland
From: Roland Scheidegger srol...@vmware.com This is just preparation for per-pixel (or per-quad in case of multiple quads) min/mag filter since some assumptions about number of miplevels being equal to number of lods no longer holds true. This change does not change behavior yet (though

[Mesa-dev] [PATCH 2/2] gallivm: don't calculate square root of rho if we use accurate rho method

2013-08-28 Thread sroland
From: Roland Scheidegger srol...@vmware.com While a sqrt here and there shouldn't hurt much (depending on the cpu) it is possible to completely omit it since rho is only used for calculating lod and there log2(x) == 0.5*log2(x^2). Depending on the exact path taken for calculating lod this means
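The identity used here can be written out as below. This assumes the usual 2D definition of rho from the OpenGL spec (the larger of the two derivative vector lengths); the exact set of terms in gallivm depends on the texture dimensionality.

```latex
\rho^2 = \max\!\left(
  \left(\frac{\partial s}{\partial x}\right)^2 + \left(\frac{\partial t}{\partial x}\right)^2,\;
  \left(\frac{\partial s}{\partial y}\right)^2 + \left(\frac{\partial t}{\partial y}\right)^2
\right),
\qquad
\lambda = \log_2 \rho = \tfrac{1}{2}\,\log_2 \rho^2
```

Since the maximum of the squared lengths is the square of the maximum length, the square root can be dropped and replaced by a multiply by 0.5 after the log2.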

[Mesa-dev] [PATCH] gallivm: support per-pixel min/mag filter in SoA path

2013-08-28 Thread sroland
From: Roland Scheidegger srol...@vmware.com Since we can have per-pixel lod we should also honor the filter per-pixel (in fact we didn't honor it per quad either in the multiple quad case). Do this by running the linear path and simply beating the weights into shape (the sample with the higher

[Mesa-dev] [PATCH 1/3] softpipe: support nested/overlapping queries for all query types

2013-08-23 Thread sroland
From: Roland Scheidegger srol...@vmware.com There's just no way resetting the counters is working with nested/overlapping queries. --- src/gallium/drivers/softpipe/sp_prim_vbuf.c |2 +- src/gallium/drivers/softpipe/sp_query.c | 33 +-- 2 files changed, 17

[Mesa-dev] [PATCH 2/3] llvmpipe: support nested/overlapping queries for all query types

2013-08-23 Thread sroland
From: Roland Scheidegger srol...@vmware.com There's just no way resetting the counters is working with nested/overlapping queries. --- src/gallium/drivers/llvmpipe/lp_query.c | 35 ++ src/gallium/drivers/llvmpipe/lp_query.h |1 -

[Mesa-dev] [PATCH 3/3] draw: clean up setting stream out information a bit

2013-08-23 Thread sroland
From: Roland Scheidegger srol...@vmware.com In particular no one is interested in the vertex count, so drop that, and also drop the duplicated num_primitives_generated / so.primitives_storage_needed variables in drivers. I am unable for now to figure out if primitives_storage_needed in SO stats

[Mesa-dev] [PATCH] gallivm: fix min/mag switchover point for nearest/none mip filter

2013-08-22 Thread sroland
From: Roland Scheidegger srol...@vmware.com Previously, the min/mag switchover point when using nearest/none mip filter was effectively -0.5 which can't be right. Looks like new OpenGL thinks it's ok if it's always 0.0 (older versions required 0.5 in some cases), let's hope everybody else thinks

[Mesa-dev] [PATCH 2/2] gallivm: add comment for bogus min/mag filter selection with nearest mip filter

2013-08-21 Thread sroland
From: Roland Scheidegger srol...@vmware.com Detected this hunting some other bug, not sure if it really needs fixing but it is definitely wrong. --- src/gallium/auxiliary/gallivm/lp_bld_sample.c |8 src/gallium/auxiliary/gallivm/lp_bld_sample_aos.c |2 +-

[Mesa-dev] [PATCH 1/2] gallivm: fix rho calculation for 1d case

2013-08-21 Thread sroland
From: Roland Scheidegger srol...@vmware.com Was using wrong (undefined) vector element (the elements are at 0/2 position, not 0/1). --- src/gallium/auxiliary/gallivm/lp_bld_sample.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git

[Mesa-dev] [PATCH 2/2] gallivm: do per-element lod for lod bias and explicit derivs too

2013-08-21 Thread sroland
From: Roland Scheidegger srol...@vmware.com Except for explicit derivs with cube maps which are very bogus anyway. Just like explicit lod this is only used if no_quad_lod is set in GALLIVM_DEBUG env var. Minification is terrible on cpus which don't support true vector shifts (but should work

[Mesa-dev] [PATCH 1/2] gallivm: (trivial) fix linear aos sampling of 3d compressed formats

2013-08-21 Thread sroland
From: Roland Scheidegger srol...@vmware.com block size depth is always 1 even for compressed formats (unless someone invents true 3d compressed formats at least which we can't represent). Nearest (and soa) path had it right. --- src/gallium/auxiliary/gallivm/lp_bld_sample_aos.c |4 ++-- 1

[Mesa-dev] [PATCH] gallivm: (trivial) fix int/uint border color clamping

2013-08-21 Thread sroland
From: Roland Scheidegger srol...@vmware.com Just a copy paste error. Fixes https://bugs.freedesktop.org/show_bug.cgi?id=68409. Note that the test passing before probably simply means it doesn't verify clamping of the border color itself as required by the OpenGL spec. ---

[Mesa-dev] [PATCH] gallivm: unify sin and cos implementation

2013-08-20 Thread sroland
From: Roland Scheidegger srol...@vmware.com The (complicated!) math is all identical, there's just minimal differences how sign bit is calculated plus there's an additional subtraction for the argument going into the polynomial for cos. The logic stays 100% the same (with a small exception, sign

[Mesa-dev] [PATCH 3/3] util: add avx2 and xop detection to cpu detection code

2013-08-19 Thread sroland
From: Roland Scheidegger srol...@vmware.com Going to need this soon (not going to bother with avx2 intrinsics at this time but don't want to do workarounds for true vector shifts if llvm itself can use them just fine and won't need the gazillion instruction emulation). Not really tested other

[Mesa-dev] [PATCH 2/3] gallivm: fix bogus aos path detection

2013-08-19 Thread sroland
From: Roland Scheidegger srol...@vmware.com Need to check the wrap mode of the actually used coords not a fixed 2. While checking more than necessary would only potentially disable aos and not cause any harm I'm pretty sure for 3d textures it could have caused assertion failures (if s,t coords

[Mesa-dev] [PATCH 1/3] gallivm: do clamping of border color correctly for all formats

2013-08-19 Thread sroland
From: Roland Scheidegger srol...@vmware.com Turns out it is actually very complicated to figure out what a format really is wrt range, as using channel information for determining unorm/snorm etc. doesn't work for a bunch of cases - namely compressed, subsampled, other. Also while here add

[Mesa-dev] [PATCH] gallivm: do clamping of border color correctly for all formats

2013-08-16 Thread sroland
From: Roland Scheidegger srol...@vmware.com Turns out it is actually very complicated to figure out what a format really is wrt range, as using channel information for determining unorm/snorm etc. doesn't work for a bunch of cases - namely compressed, subsampled, other. Also while here add

[Mesa-dev] [PATCH] llvmpipe: fix stencil bug if we have both stencil and depth tests

2013-08-15 Thread sroland
From: Roland Scheidegger srol...@vmware.com This is a very well hidden bug found by accident (only the fixed glean tstencil2 test so far seems to hit it). We must use new mask with combined s_pass values and orig_mask values for zpass/zfail stencil ops, otherwise both the sfail op and one of

[Mesa-dev] [PATCH] gallivm: implement better control of per-quad/per-element/scalar lod

2013-08-15 Thread sroland
From: Roland Scheidegger srol...@vmware.com There's a new debug value used to disable per-quad lod optimizations in fragment shader (ignored for vs/gs as the results are just too wrong typically). Also trying to detect if a supplied lod value is really a scalar (if it's coming from immediate or

[Mesa-dev] [PATCH 1/2] gallivm: change coordinate handling throughout functions

2013-08-14 Thread sroland
From: Roland Scheidegger srol...@vmware.com Instead of passing s,t,r coordinates pass a coord array - the reason is that I need to pass more coords (in particular for shadow coord, future will also need another one for cube map arrays) so just pass them as an array. Also, to simplify things, use

[Mesa-dev] [PATCH 2/2] gallivm: already pass coords in the right place in the sampler interface

2013-08-14 Thread sroland
From: Roland Scheidegger srol...@vmware.com This makes things a bit nicer, and more importantly it fixes an issue where a downgraded array texture (due to view reduced to 1 layer and addressed with (non-array) samplec instruction) would use the wrong coord as shadow reference value. (This could

[Mesa-dev] [PATCH] gallivm: already pass coords in the right place in the sampler interface

2013-08-14 Thread sroland
From: Roland Scheidegger srol...@vmware.com This makes things a bit nicer, and more importantly it fixes an issue where a downgraded array texture (due to view reduced to 1 layer and addressed with (non-array) samplec instruction) would use the wrong coord as shadow reference value. (This could

[Mesa-dev] [PATCH] gallivm: do per-sample depth comparison instead of doing it post-filter

2013-08-14 Thread sroland
From: Roland Scheidegger srol...@vmware.com Doing the comparisons pre-filter is highly recommended by OpenGL (and d3d9) and definitely required by d3d10. This actually doesn't do it pre-filter but more in-filter as otherwise need to push the comparisons even further down into fetch code and this

[Mesa-dev] [PATCH 1/4] ilo: implement new float comparison instructions

2013-08-13 Thread sroland
From: Roland Scheidegger srol...@vmware.com untested. --- src/gallium/drivers/ilo/shader/toy_tgsi.c | 20 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/src/gallium/drivers/ilo/shader/toy_tgsi.c b/src/gallium/drivers/ilo/shader/toy_tgsi.c index

[Mesa-dev] [PATCH 3/4] r600/radeonsi: implement new float comparison instructions

2013-08-13 Thread sroland
From: Roland Scheidegger srol...@vmware.com Also use ordered comparisons for old cmp instructions. Untested. --- src/gallium/drivers/r600/r600_shader.c | 18 --- .../drivers/radeon/radeon_setup_tgsi_llvm.c| 49 2 files changed, 48 insertions(+),

[Mesa-dev] [PATCH 2/4] nv50: implement new float comparison instructions

2013-08-13 Thread sroland
From: Roland Scheidegger srol...@vmware.com untested. --- .../drivers/nv50/codegen/nv50_ir_from_tgsi.cpp | 17 + 1 file changed, 17 insertions(+) diff --git a/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp b/src/gallium/drivers/nv50/codegen/nv50_ir_from_tgsi.cpp

[Mesa-dev] [PATCH 4/4] st/mesa: use new float comparison opcodes if native integers are supported

2013-08-13 Thread sroland
From: Roland Scheidegger srol...@vmware.com Should get rid of some float-to-int conversions (with negation). No piglit regressions (with llvmpipe). --- src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 45 ++-- 1 file changed, 15 insertions(+), 30 deletions(-) diff --git

[Mesa-dev] [PATCH] gallivm: fix border color with normalized texture formats

2013-08-13 Thread sroland
From: Roland Scheidegger srol...@vmware.com We need to put border color into texture format color space which essentially means clamping for non-float, normalized formats (not entirely sure if we're also meant to quantize the float but it's probably ok not to do it thankfully). For OpenGL we

[Mesa-dev] [PATCH 1/2] gallivm: simplify geometry shader mask handling a bit

2013-08-12 Thread sroland
From: Roland Scheidegger srol...@vmware.com Instead of reducing masks to 0/1 simply use the mask directly as -1. Also use some signed comparison instead of unsigned (as far as I understand these values have to be (very) small and signed means llvm doesn't have to apply additional logic to do the

[Mesa-dev] [PATCH] gallivm: simplify geometry shader mask handling a bit

2013-08-12 Thread sroland
From: Roland Scheidegger srol...@vmware.com Instead of reducing masks to 0/1 simply use the mask directly as -1. Also use some signed comparison instead of unsigned (as far as I understand these values have to be (very) small and signed means llvm doesn't have to apply additional logic to do the

[Mesa-dev] [PATCH] gallivm: fix exec_mask interaction with geometry shader after end of main

2013-08-12 Thread sroland
From: Roland Scheidegger srol...@vmware.com Because we must maintain an exec_mask even if there's currently nothing on the mask stack, we can still have an exec_mask at the end of the program. Effectively, this mask should be set back to default when returning from main. Without relying on

[Mesa-dev] [PATCH 1/3] gallium: add new float comparison instructions returning integer masks

2013-08-12 Thread sroland
From: Roland Scheidegger srol...@vmware.com Newer graphic languages don't want messy float mask results but instead true boolean mask results for float comparisons. Otherwise just need to convert the floats back to integers. Need to keep the old opcodes however due to both legacy (gl and d3d9)

[Mesa-dev] [PATCH 2/3] tgsi: implement new float comparison instructions returning integer masks

2013-08-12 Thread sroland
From: Roland Scheidegger srol...@vmware.com Also while here add a bunch of other forgotten (integer) instructions to tgsi_util_get_inst_usage_mask() (which isn't used for much except optimizing away unused input components), though it may still be incomplete. ---

[Mesa-dev] [PATCH 3/3] gallivm: implement new float comparison instructions returning integer masks

2013-08-12 Thread sroland
From: Roland Scheidegger srol...@vmware.com FSEQ/FSGE/FSLT/FSNE work just the same as SEQ/SGE/SLT/SNE except skip the select. And just for consistency use the same appropriate ordered/unordered comparisons for the old opcodes as well. --- src/gallium/auxiliary/gallivm/lp_bld_tgsi_action.c | 81
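The "skip the select" remark can be illustrated with a scalar sketch. The function names below are made up for the example and only mimic the opcode semantics described in this series: the new opcodes return an all-ones/all-zeros integer mask, and the old ones additionally select 1.0 or 0.0 with that mask.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* FSLT-style result: integer mask, ~0 if a < b, else 0.
 * An ordered compare: false if either operand is NaN. */
static uint32_t fslt_mask(float a, float b)
{
    return a < b ? ~0u : 0u;
}

/* SLT-style result: the same comparison followed by a select
 * between the float constants 1.0 and 0.0. */
static float slt_float(float a, float b)
{
    uint32_t mask = fslt_mask(a, b);
    assert(mask == 0u || mask == ~0u);

    /* 0x3f800000 is the IEEE-754 bit pattern of 1.0f; 0.0f is all zeros,
     * so the select reduces to a single AND with the mask. */
    uint32_t bits = mask & 0x3f800000u;
    float result;
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

A consumer that wants a boolean can use the mask directly, whereas with the float result it would first have to convert 1.0/0.0 back to an integer.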

[Mesa-dev] [PATCH] gallium: add new float comparison opcodes returning integer booleans

2013-08-09 Thread sroland
From: Roland Scheidegger srol...@vmware.com The old float comparison opcodes always return floats 0.0 and 1.0 (clarified in docs these were really floats, was always the case) for legacy graphics. But everybody else (opengl,opencl,d3d10) just has to work around their return results (converting

[Mesa-dev] [PATCH] gallivm: set non-existing values really to zero in size queries for d3d10

2013-08-08 Thread sroland
From: Roland Scheidegger srol...@vmware.com My previous attempt at doing so double-failed miserably (minification of zero still gives one, and even if it would not the value was never written anyway). While here also rename the confusingly named int_vec bld as we have int vecs of different sizes,

[Mesa-dev] [PATCH 2/2] gallivm: propagate scalar_lod to emit_size_query too

2013-08-07 Thread sroland
From: Roland Scheidegger srol...@vmware.com Clearly the returned values need to be per-element if the lod is per element. Does not actually change behavior yet. --- src/gallium/auxiliary/draw/draw_llvm_sample.c |2 ++ src/gallium/auxiliary/gallivm/lp_bld_sample.h |1 +

[Mesa-dev] [PATCH 1/2] gallium: clarify SVIEWINFO opcode

2013-08-07 Thread sroland
From: Roland Scheidegger srol...@vmware.com This opcode is quite problematic in tgsi, while it tries to mirror d3d10 resinfo it can't really do what's stated there due to missing the crazy return type modifiers. Hence specify this is ignored along with the swizzle. (Other options would be to have

[Mesa-dev] [PATCH 1/2] gallivm: don't clamp reference value for shadow comparison for float formats

2013-08-07 Thread sroland
From: Roland Scheidegger srol...@vmware.com This is wrong both for OpenGL and d3d. (In fact clamping is a side effect of converting to depth format, so this should really do quantization too at least in d3d10 for the comparisons to be truly correct.) ---

[Mesa-dev] [PATCH 2/2] softpipe: don't clamp reference value for shadow comparison for float formats

2013-08-07 Thread sroland
From: Roland Scheidegger srol...@vmware.com Clamping is only done for fixed-point formats as part of conversion to texture format. --- src/gallium/drivers/softpipe/sp_tex_sample.c | 44 +++--- 1 file changed, 32 insertions(+), 12 deletions(-) diff --git

[Mesa-dev] [PATCH] gallivm: honor d3d10 floating point rules for shadow comparisons

2013-08-07 Thread sroland
From: Roland Scheidegger srol...@vmware.com d3d10 specifies ordered comparisons for everything but not_equal which is unordered (http://msdn.microsoft.com/en-us/library/windows/desktop/cc308050.aspx). OpenGL probably doesn't care. --- src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c | 20
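The difference between ordered and unordered comparisons only shows up with NaN operands. A small C sketch, with hypothetical helper names, relying on the standard C semantics of `<` and `!=` for floats (and assuming the code is not built with fast-math style flags):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Build a quiet NaN from its bit pattern (illustrative helper). */
static float make_nan(void)
{
    uint32_t bits = 0x7fc00000u;
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Ordered less-than: false whenever either operand is NaN. */
static bool cmp_less_ordered(float a, float b)
{
    return a < b;
}

/* Unordered not-equal: true whenever either operand is NaN. */
static bool cmp_not_equal_unordered(float a, float b)
{
    return a != b;
}
```

So with a NaN reference value, every shadow comparison function fails except not_equal, which passes.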

[Mesa-dev] [PATCH 1/2] gallivm: honor d3d10's wishes of out-of-bounds behavior for texture size query

2013-08-07 Thread sroland
From: Roland Scheidegger srol...@vmware.com Specifically, must return 0 for non-existent mip levels (and non-existent textures which is an unsolved problem) for everything but total mip count. --- src/gallium/auxiliary/gallivm/lp_bld_sample_soa.c | 35 - 1 file changed, 27

[Mesa-dev] [PATCH 2/2] gallivm: use texture target from shader instead of static state for size query

2013-08-07 Thread sroland
From: Roland Scheidegger srol...@vmware.com d3d10 has no notion of distinct array resources either at the resource or sampler view level. However, shader dcl of resources certainly has, and d3d10 expects resinfo to return the values according to that - in particular a resource might have been a

[Mesa-dev] [PATCH] gallivm: fix out-of-bounds behavior for fetch/ld

2013-08-06 Thread sroland
From: Roland Scheidegger srol...@vmware.com For d3d10 and ARB_robust_buffer_access_behavior, we are required to return 0 for out-of-bounds coordinates (for which we can just enable the code that was already there but disabled). Additionally, also need to return 0 for out-of-bounds mip level and

[Mesa-dev] [PATCH] util: implement table-based + linear interpolation linear-to-srgb conversion

2013-08-05 Thread sroland
From: Roland Scheidegger srol...@vmware.com Should be much faster, seems to work in softpipe. While here (also it's now disabled) fix up the pow factor - the former value is what is in GL core it is however not actually accurate to fp32 standard (as it is 1.0/2.4), and if someone would do all the

[Mesa-dev] [PATCH 1/2] gallivm: fix comment wrt srgb accuracy.

2013-08-02 Thread sroland
From: Roland Scheidegger srol...@vmware.com I think it's actually not good enough now... --- src/gallium/auxiliary/gallivm/lp_bld_format_srgb.c |6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_format_srgb.c

[Mesa-dev] [PATCH 2/2] util: implement table-based + linear interpolation linear-to-srgb conversion

2013-08-02 Thread sroland
From: Roland Scheidegger srol...@vmware.com Should be much faster, seems to work in softpipe. While here (also it's now disabled) fix up the pow factor - the former value is what is in GL core it is however not actually accurate to fp32 standard (as it is 1.0/2.4), and if someone would do all the

[Mesa-dev] [PATCH] util: try much harder to set DAZ flag

2013-08-02 Thread sroland
From: Roland Scheidegger srol...@vmware.com While so far this only causes some harmless test failures, there's lots more cpus with DAZ. All 64bit capable ones can do it (particularly relevant for AMD cpus as they supported sse3 very very late) but if really necessary we can check support for that

[Mesa-dev] [PATCH] gallivm: use nearest rounding for float-unorm24 conversion

2013-07-31 Thread sroland
From: Roland Scheidegger srol...@vmware.com Previously we were using truncation, which gives the correct result only for numbers in [0.5-1.0] range (because there's no mantissa bits to do any rounding there). This is frequently hit (and probably only used there) when converting fragment depth to
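The effect of truncation versus round-to-nearest on a 24-bit unorm conversion can be seen in a scalar sketch. The helper names are hypothetical, and the double-precision multiply is used here only to keep the example exact and short; it is not how the gallivm code does the conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Truncating conversion of x in [0, 1] to a 24-bit unorm value. */
static uint32_t unorm24_trunc(float x)
{
    assert(x >= 0.0f && x <= 1.0f);
    return (uint32_t)((double)x * 16777215.0);
}

/* Round-to-nearest conversion: add one half before truncating. */
static uint32_t unorm24_nearest(float x)
{
    assert(x >= 0.0f && x <= 1.0f);
    return (uint32_t)((double)x * 16777215.0 + 0.5);
}
```

For example, 0.5 scales to 8388607.5, which truncation maps to 8388607 and rounding maps to 8388608.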

[Mesa-dev] [PATCH 1/3] gallium: clarify shift behavior with shift count = 32

2013-07-30 Thread sroland
From: Roland Scheidegger srol...@vmware.com Previously, nothing was said what happens with shift counts exceeding bit width of the values to shift. In theory 3 behaviors are possible: 1) undefined (classic c definition) 2) just shift out all bits (so result is zero, or -1 potentially for ashr) 3)

[Mesa-dev] [PATCH 3/3] gallivm: obey clarified shift behavior

2013-07-30 Thread sroland
From: Roland Scheidegger srol...@vmware.com llvm shifts are undefined for shift counts exceeding (or matching) bit width, so need to apply a mask. NOTE: there's internal callers using this which guarantee the shift count is smaller than the type width. However, all of these use constant shift

[Mesa-dev] [PATCH 2/3] tgsi: obey clarified shift behavior

2013-07-30 Thread sroland
From: Roland Scheidegger srol...@vmware.com c shifts are undefined for shift counts exceeding (or matching) bit width, so need to apply a mask (on x86 it actually would usually probably work as shifts do masking on int domain shifts - unless some auto-vectorizer would come along at last as simd
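A minimal sketch of the masking this message refers to, assuming 32-bit operands and a mask that keeps the low 5 bits of the shift count. The helper names are hypothetical and are not the ones used in tgsi_exec.c:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: left shift whose count is masked to 0..31, so a
 * count of 32 or more never reaches the C shift operator. */
static uint32_t shl_masked(uint32_t value, uint32_t count)
{
   return value << (count & 31u);
}

/* Hypothetical helper: logical right shift with the same masking. */
static uint32_t ushr_masked(uint32_t value, uint32_t count)
{
   return value >> (count & 31u);
}

/* Usage: counts of 32, 33 and 63 behave like 0, 1 and 31. */
static void shift_demo(void)
{
   assert(shl_masked(1u, 32u) == 1u);
   assert(shl_masked(1u, 33u) == 2u);
   assert(ushr_masked(0x80000000u, 63u) == 1u);
}
```

Masking makes the result well defined in C for every count, at the cost of one extra AND per shift.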

[Mesa-dev] [PATCH] gallivm: obey clarified shift behavior

2013-07-30 Thread sroland
From: Roland Scheidegger srol...@vmware.com llvm shifts are undefined for shift counts exceeding (or matching) bit width, so need to apply a mask for the tgsi shift instructions. v2: only use mask for the tgsi shift instructions, not for the build shift helpers. None of the internal callers need

[Mesa-dev] [PATCH 1/2] gallivm: handle texel swizzles correctly for d3d10-style sample opcodes

2013-07-26 Thread sroland
From: Roland Scheidegger srol...@vmware.com unlike OpenGL, the texel swizzle is embedded in the instruction, so honor that. (Technically we now execute both the sampler_view swizzle and the per-instruction swizzle but this should be quite ok.) --- src/gallium/auxiliary/gallivm/lp_bld_tgsi_soa.c

[Mesa-dev] [PATCH 2/2] tgsi: handle texel swizzles correctly for d3d10-style sample opcodes

2013-07-26 Thread sroland
From: Roland Scheidegger srol...@vmware.com Same as for gallivm (though these don't quite work correctly in softpipe, so untested). --- src/gallium/auxiliary/tgsi/tgsi_exec.c | 40 1 file changed, 35 insertions(+), 5 deletions(-) diff --git

[Mesa-dev] [PATCH 1/2] util: don't flush overflowing values to infinity in half-float conversion

2013-07-26 Thread sroland
From: Roland Scheidegger srol...@vmware.com I am not able to find _any_ rounding behavior specified for OpenGL for float to half-float conversions. However, it is specified for fp11/fp10 which suggests round to next finite value but round-to-zero would also be allowed, but finite values must not

[Mesa-dev] [PATCH 2/2] gallivm: fix float-SNORM conversion

2013-07-26 Thread sroland
From: Roland Scheidegger srol...@vmware.com Just like the UNORM case we need to use round to nearest, not trunc. (There's also another problem, we're using the formula for SNORM-float which will produce a value below -1.0 for the most negative value which according to both OpenGL and d3d10 would

[Mesa-dev] [PATCH] draw: always call and move util_cpu_detect() to draw context creation.

2013-07-23 Thread sroland
From: Roland Scheidegger srol...@vmware.com CPU detection is not really x86 specific, the ifdef in particular didn't even catch x86_64. Also move to draw context creation which seems a lot cleaner, and just call it always (which seems like a better idea than rely on drivers doing this especially

[Mesa-dev] [PATCH] draw: always call util_cpu_detect() in draw context creation.

2013-07-23 Thread sroland
From: Roland Scheidegger srol...@vmware.com Since disabling denorms in draw_vbo() we require the util_cpu_caps to be initialized there. Hence add another util_cpu_detect() call in draw_create_context() which should ensure this. (There is another call in draw_get_option_use_llvm() which only gets

[Mesa-dev] [PATCH] mesa: fix rgtc snorm decoding

2013-07-22 Thread sroland
From: Roland Scheidegger srol...@vmware.com The codeword must be unsigned (otherwise will shift in 1's from above when merging low/high parts so some texels decode wrong). This also affects gallium's util/u_format_rgtc. --- src/mesa/main/texcompress_rgtc_tmp.h |6 +++--- 1 file changed, 3
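To illustrate why the signedness matters, here is a small sketch, assuming a 3-bit code that starts at some bit offset in a low byte and spills into a high byte. The function names and layout are hypothetical and simplified; this is not the code from texcompress_rgtc_tmp.h:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract a 3-bit code starting at bit_off of `low`,
 * taking any missing upper bits from `high`. Both bytes are unsigned. */
static unsigned code_from_unsigned(uint8_t low, uint8_t high, unsigned bit_off)
{
   assert(bit_off < 8);
   return (unsigned)((low >> bit_off) | (high << (8 - bit_off))) & 0x7u;
}

/* Same extraction with the low byte held in a signed type. Right-shifting
 * a negative value is implementation-defined in C; compilers that use an
 * arithmetic shift fill the vacated bits with 1s. */
static unsigned code_from_signed_low(int8_t low, uint8_t high, unsigned bit_off)
{
   assert(bit_off < 8);
   return (unsigned)((low >> bit_off) | (high << (8 - bit_off))) & 0x7u;
}
```

For low = 0x80, high = 0x00 and bit_off = 6 the unsigned version yields 2. On compilers that sign-extend, the signed version yields 6, because the shifted-in 1 bits land where the bits from the high byte belong.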

[Mesa-dev] [PATCH 1/2] llvmpipe: fix blending with SRC_ALPHA_SATURATE with some formats without alpha

2013-07-17 Thread sroland
From: Roland Scheidegger srol...@vmware.com We were fixing up the blend factor to ZERO, however this only works correctly with fixed point render buffers where the input values are clamped to 0/1 (because src_alpha_saturate is min(As, 1-Ad) so can be negative with unclamped inputs). Haven't seen
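A small sketch of the arithmetic behind this, using the min(As, 1-Ad) formula quoted above. It assumes that a format without alpha reads destination alpha as 1.0; the helper names are hypothetical:

```c
#include <assert.h>

static float min_f(float a, float b)
{
   return a < b ? a : b;
}

/* Clamp to [0, 1], as happens for fixed point render buffers. */
static float clamp01(float x)
{
   return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

/* SRC_ALPHA_SATURATE factor as quoted in the message: min(As, 1 - Ad). */
static float src_alpha_saturate(float src_alpha, float dst_alpha)
{
   return min_f(src_alpha, 1.0f - dst_alpha);
}

/* Usage: with Ad = 1 the factor is zero only if As is not negative. */
static void saturate_demo(void)
{
   assert(src_alpha_saturate(0.7f, 1.0f) == 0.0f);
   assert(src_alpha_saturate(-0.5f, 1.0f) == -0.5f);
   assert(src_alpha_saturate(clamp01(-0.5f), 1.0f) == 0.0f);
}
```

Under that assumption, replacing the factor with ZERO is exact for clamped inputs, while an unclamped negative source alpha gives a negative factor instead.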

[Mesa-dev] [PATCH 2/2] llvmpipe: clamp inputs for srgb render buffers

2013-07-17 Thread sroland
From: Roland Scheidegger srol...@vmware.com Usually with fixed point renderbuffers clamping is done as part of conversion. However, since we blend in float format, we essentially skip all conversion steps pre-blend but since this is still a fixed point renderbuffer we must still clamp the inputs

[Mesa-dev] [PATCH] util/u_format_s3tc: handle srgb formats correctly.

2013-07-16 Thread sroland
From: Roland Scheidegger srol...@vmware.com Instead of just ignoring the srgb/linear conversions, simply call the corresponding conversion functions, for all of pack/unpack/fetch, both for float and unorm8 versions (though some don't make a whole lot of sense, i.e. unorm8/unorm8 srgb/linear

[Mesa-dev] [PATCH] llvmpipe: support sRGB framebuffers

2013-07-13 Thread sroland
From: Roland Scheidegger srol...@vmware.com Just use the new conversion functions to do the work. The way it's plugged in into the blend code is quite hacktastic but follows all the same hacks as used by packed float format already. Only support 4x8bit srgb formats (rgba/rgbx plus swizzle), 24bit

[Mesa-dev] [PATCH 1/2] gallivm: better support for fast rsqrt

2013-07-11 Thread sroland
From: Roland Scheidegger srol...@vmware.com We had to disable fast rsqrt before because it wasn't precise enough etc. However in situations when we know we're not going to need more precision we can still use a fast rsqrt (which can be several times faster than the quite expensive sqrt). Hence

[Mesa-dev] [PATCH 2/2] gallivm: handle srgb-to-linear and linear-to-srgb conversions

2013-07-11 Thread sroland
From: Roland Scheidegger srol...@vmware.com srgb-to-linear is using 3rd degree polynomial for now which should be _just_ good enough. Reverse is using some rational polynomials and is quite accurate, though not hooked into llvmpipe's blend code yet and hence unused (untested). Using a table might

[Mesa-dev] [PATCH] gallivm: do per-pixel lod calculations for explicit lod

2013-07-03 Thread sroland
From: Roland Scheidegger srol...@vmware.com d3d10 requires per-pixel lod calculations for explicit lod, lod bias and explicit derivatives, and we should probably do it for OpenGL too - at least if they are used from vertex or geometry shaders (so doesn't apply to lod bias) this doesn't just

[Mesa-dev] [PATCH] llvmpipe: fix timer query if there's no bins

2013-06-28 Thread sroland
From: Roland Scheidegger srol...@vmware.com b04a295a4a0cd2defe352b3193b5fa79ca8fc9fc removed seemingly unnecessary code in get_query. Turns out this code could in fact be reached - while timestamps are always binned, if there are no bins (which happens if fb size is 0) then the rasterization

[Mesa-dev] [PATCH] gallivm: do per-pixel lod calculations for explicit lod

2013-06-27 Thread sroland
From: Roland Scheidegger srol...@vmware.com d3d10 requires per-pixel lod calculations for explicit lod, lod bias and explicit derivatives, and we should probably do it for OpenGL too - at least if they are used from vertex or geometry shaders (so doesn't apply to lod bias) this doesn't just

[Mesa-dev] [PATCH] llvmpipe: fix a bug in opaque optimization

2013-06-26 Thread sroland
From: Roland Scheidegger srol...@vmware.com If there are queries active the opaque optimization resetting the bin needs to be disabled. (Not really tested since the bug was discovered by code inspection not an actual test failure.) --- src/gallium/drivers/llvmpipe/lp_query.c |2 ++
