Package: release.debian.org
Severity: normal
User: release.debian....@packages.debian.org
Usertags: unblock

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Please unblock package darktable

[ Reason ]

This version contains a fix for #989222. This involves a crash when
exporting raws of a certain format. According to Jonas this bug is
triggered by output from megapixels which is in bullseye and used by
(at least) the Librem 5 and pinephone (with mobian).

[ Impact ]

Users of some free software friendly phones will be unable to process
their images with darktable from bullseye.

[ Tests ]

I have verified the basic functionality of darktable is still OK.
Jonas tested the DNG images in question and verified that they
exported OK now.

[ Risks ]

darktable is a leaf package. The diff is a bit large, but most of it
is deletions of SSE2 specialized code. The additions are only 7 lines
and easy to sanity check.

[ Checklist ]
  [x] all changes are documented in the d/changelog
  [x] I reviewed all changes and I approve them
  [x] attach debdiff against the package in testing

[ Other info ]

I also attach a "reduced diff" with the deleted #ifdef __SSE__ blocks
collapsed.

unblock darktable/3.4.1-4

-----BEGIN PGP SIGNATURE-----

iQIzBAEBCAAdFiEEkiyHYXwaY0SiY6fqA0U5G1WqFSEFAmC+o1YACgkQA0U5G1Wq
FSGEug/+NjvWDdVP6jwcU0rXEUCHpgPbqYXygkVn4TIyVeqRh1e6DJCwU3mzkNo8
DnR7siTEdXp6F9e1MpCaN9G404ptk7MZasN6Aswu5Fj37knj6YzhYnrqp6fbgurL
w1dcbNhnSSlPf6czeDtSIe0uIIR3TNbhG0ICX8D6xhTumolW0+EtPHTcG8E9y7Ib
f+wlp/0mwwdpmeYB32ObkF8v4t7g4f9Y1SWrjPI0xZ/tgYiDgY8nOW39a4Nj0HQX
HzqW0oQXMaLsjFecEv7Wuf3VTWmmBubKKANvs++Lg/EQi3pbjeVMzDa2WuZBTxUL
YHe0bW012OWOtgnfuLuKdIvots8afNYpi1jtS58e4ZT1wHxEvUW2ww09jjcrnsdP
CnKFT5Ybg3WZ7rqUQ8VsYXkgCe5CdauFAlKdWluTK2SAXn7brfvnpzpUpTzFbxRN
zOtZfwPqsCJt8l3rPoMdLIlD5IQAxkPavyc1ow3bym/IIEiuVXCSSbohRHYyUBDT
lQyM7aAVi8aawGVpbB/2MeuBsdWMPCx37etU/Jz3YMtqhC1rIi6OMVoXWFb1BAAQ
sGjgRvrSes/2bkODcC/YBE9jNKinsLXbCbhQU50ObEQqHb7yeec9DsPe7NYfvhGN
22ueQyjNT1LguYVwsNzPE1WBobrSwghdFh8MFcJwNuqJR3SnEDI=
=o+Yk
-----END PGP SIGNATURE-----
diff -Nru darktable-3.4.1/debian/changelog darktable-3.4.1/debian/changelog --- darktable-3.4.1/debian/changelog 2021-05-20 14:07:16.000000000 -0300 +++ darktable-3.4.1/debian/changelog 2021-06-05 12:41:39.000000000 -0300 @@ -1,3 +1,11 @@ +darktable (3.4.1-4) unstable; urgency=medium + + * Bug fix: "crashes with 'Floating point exception (core dumped)' after + loading some DNG files", thanks to Jonas Smedegaard (Closes: #989222). + Cherry pick upstream commit 2ff4fc58e44. + + -- David Bremner <brem...@debian.org> Sat, 05 Jun 2021 12:41:39 -0300 + darktable (3.4.1-3) unstable; urgency=medium * Bug fix: "broken symlinks: /usr/share/darktable/js/*.js -> diff -Nru darktable-3.4.1/debian/patches/0002-Avoid-div-by-zero-in-dt_iop_clip_and_zoom_mosaic_hal.patch darktable-3.4.1/debian/patches/0002-Avoid-div-by-zero-in-dt_iop_clip_and_zoom_mosaic_hal.patch --- darktable-3.4.1/debian/patches/0002-Avoid-div-by-zero-in-dt_iop_clip_and_zoom_mosaic_hal.patch 1969-12-31 20:00:00.000000000 -0400 +++ darktable-3.4.1/debian/patches/0002-Avoid-div-by-zero-in-dt_iop_clip_and_zoom_mosaic_hal.patch 2021-06-05 12:41:39.000000000 -0300 @@ -0,0 +1,1001 @@ +From: Hanno Schwalm <ha...@schwalm-bremen.de> +Date: Fri, 14 May 2021 18:20:37 +0200 +Subject: Avoid div by zero in dt_iop_clip_and_zoom_mosaic_half_size (#8954) + +* Avoid div by zero in dt_iop_clip_and_zoom_mosaic_half_size_plain + +Fixes #8951 + +Although the file given in the issue is crippled we can avoid the crash. +In `dt_iop_clip_and_zoom_mosaic_half_size` and the sse friend there is possibly a div/0 +problem that should be checked. + +* Fixing same dib by zero in dt_iop_clip_and_zoom_mosaic_half_size_f + +* Remove sse code for dt_iop_clip_and_zoom_mosaic... 
after testing performance + +checked performance non-sse vs sse specific code +- with added local timers +- using gcc 10.2 +- testing -t 1/4/8/16 +- intel (xeon like 9900) with fixed clock rate + +in +- dt_iop_clip_and_zoom_mosaic_half_size +- dt_iop_clip_and_zoom_mosaic_half_size_f +- dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f +- dt_iop_clip_and_zoom_demosaic_half_size_f + +with consitant results. For all functions the sse specific code was somewhat slower (~20%) +than the vectorized compiler code. Number of omp cores didn't matter, just made the results +more measurable because of low execution times. + +So i removed all the sse specific code for less code burden and better performance. + +* Fix sse header plus div/0 + +At least for bayer images we absolutely want to be sure there is no div by zero as there might +be buggy dng files. +--- + src/develop/imageop_math.c | 890 +-------------------------------------------- + 1 file changed, 7 insertions(+), 883 deletions(-) + +diff --git a/src/develop/imageop_math.c b/src/develop/imageop_math.c +index ef55965..0066a83 100644 +--- a/src/develop/imageop_math.c ++++ b/src/develop/imageop_math.c +@@ -18,14 +18,8 @@ + + #include "develop/imageop_math.h" + #include <assert.h> // for assert +-#ifdef __SSE__ +-#include <emmintrin.h> // for _mm_set_epi32, _mm_add_epi32 +-#endif + #include <glib.h> // for MIN, MAX, CLAMP, inline + #include <math.h> // for round, floorf, fmaxf +-#ifdef __SSE__ +-#include <xmmintrin.h> // for _mm_set_ps, _mm_mul_ps, _mm_set... +-#endif + #include "common/darktable.h" // for darktable, darktable_t, dt_code... + #include "common/imageio.h" // for FILTERS_ARE_4BAYER + #include "common/interpolation.h" // for dt_interpolation_new, dt_interp... 
+@@ -177,7 +171,7 @@ int dt_iop_clip_and_zoom_roi_cl(int devid, cl_mem dev_out, cl_mem dev_in, const + + #endif + +-void dt_iop_clip_and_zoom_mosaic_half_size_plain(uint16_t *const out, const uint16_t *const in, ++void dt_iop_clip_and_zoom_mosaic_half_size(uint16_t *const out, const uint16_t *const in, + const dt_iop_roi_t *const roi_out, + const dt_iop_roi_t *const roi_in, const int32_t out_stride, + const int32_t in_stride, const uint32_t filters) +@@ -244,224 +238,12 @@ void dt_iop_clip_and_zoom_mosaic_half_size_plain(uint16_t *const out, const uint + num++; + } + } +- *outc = col / num; +- } +- } +-} +- +-#if defined(__SSE__) +-void dt_iop_clip_and_zoom_mosaic_half_size_sse2(uint16_t *const out, const uint16_t *const in, +- const dt_iop_roi_t *const roi_out, +- const dt_iop_roi_t *const roi_in, const int32_t out_stride, +- const int32_t in_stride, const uint32_t filters) +-{ +- // adjust to pixel region and don't sample more than scale/2 nbs! +- // pixel footprint on input buffer, radius: +- const float px_footprint = 1.f / roi_out->scale; +- // how many 2x2 blocks can be sampled inside that area +- const int samples = round(px_footprint / 2); +- +- // move p to point to an rggb block: +- int trggbx = 0, trggby = 0; +- if(FC(trggby, trggbx + 1, filters) != 1) trggbx++; +- if(FC(trggby, trggbx, filters) != 0) +- { +- trggbx = (trggbx + 1) & 1; +- trggby++; +- } +- const int rggbx = trggbx, rggby = trggby; +- +-#ifdef _OPENMP +-#pragma omp parallel for default(none) \ +- dt_omp_firstprivate(in, in_stride, out, out_stride, px_footprint, rggbx, rggby, roi_in, roi_out, samples) \ +- schedule(static) +-#endif +- for(int y = 0; y < roi_out->height; y++) +- { +- uint16_t *outc = out + out_stride * y; +- +- const float fy = (y + roi_out->y) * px_footprint; +- int py = (int)fy & ~1; +- const float dy = (fy - py) / 2; +- py = MIN(((roi_in->height - 6) & ~1u), py) + rggby; +- +- const int maxj = MIN(((roi_in->height - 5) & ~1u) + rggby, py + 2 * samples); +- +- for(int x = 
0; x < roi_out->width; x++) +- { +- __m128 col = _mm_setzero_ps(); +- +- const float fx = (x + roi_out->x) * px_footprint; +- int px = (int)fx & ~1; +- const float dx = (fx - px) / 2; +- px = MIN(((roi_in->width - 6) & ~1u), px) + rggbx; +- +- const int maxi = MIN(((roi_in->width - 5) & ~1u) + rggbx, px + 2 * samples); +- +- float p1, p2, p3, p4; +- float num = 0; +- +- // upper left 2x2 block of sampling region +- p1 = in[px + in_stride * py]; +- p2 = in[px + 1 + in_stride * py]; +- p3 = in[px + in_stride * (py + 1)]; +- p4 = in[px + 1 + in_stride * (py + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * (1 - dy)), _mm_set_ps(p4, p3, p2, p1))); +- +- // left 2x2 block border of sampling region +- for(int j = py + 2; j <= maxj; j += 2) +- { +- p1 = in[px + in_stride * j]; +- p2 = in[px + 1 + in_stride * j]; +- p3 = in[px + in_stride * (j + 1)]; +- p4 = in[px + 1 + in_stride * (j + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dx), _mm_set_ps(p4, p3, p2, p1))); +- } +- +- // upper 2x2 block border of sampling region +- for(int i = px + 2; i <= maxi; i += 2) +- { +- p1 = in[i + in_stride * py]; +- p2 = in[i + 1 + in_stride * py]; +- p3 = in[i + in_stride * (py + 1)]; +- p4 = in[i + 1 + in_stride * (py + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dy), _mm_set_ps(p4, p3, p2, p1))); +- } +- +- // 2x2 blocks in the middle of sampling region +- for(int j = py + 2; j <= maxj; j += 2) +- for(int i = px + 2; i <= maxi; i += 2) +- { +- p1 = in[i + in_stride * j]; +- p2 = in[i + 1 + in_stride * j]; +- p3 = in[i + in_stride * (j + 1)]; +- p4 = in[i + 1 + in_stride * (j + 1)]; +- col = _mm_add_ps(col, _mm_set_ps(p4, p3, p2, p1)); +- } +- +- if(maxi == px + 2 * samples && maxj == py + 2 * samples) +- { +- // right border +- for(int j = py + 2; j <= maxj; j += 2) +- { +- p1 = in[maxi + 2 + in_stride * j]; +- p2 = in[maxi + 3 + in_stride * j]; +- p3 = in[maxi + 2 + in_stride * (j + 1)]; +- p4 = in[maxi + 3 + in_stride * (j + 1)]; +- col = 
_mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(p4, p3, p2, p1))); +- } +- +- // upper right +- p1 = in[maxi + 2 + in_stride * py]; +- p2 = in[maxi + 3 + in_stride * py]; +- p3 = in[maxi + 2 + in_stride * (py + 1)]; +- p4 = in[maxi + 3 + in_stride * (py + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(p4, p3, p2, p1))); +- +- // lower border +- for(int i = px + 2; i <= maxi; i += 2) +- { +- p1 = in[i + in_stride * (maxj + 2)]; +- p2 = in[i + 1 + in_stride * (maxj + 2)]; +- p3 = in[i + in_stride * (maxj + 3)]; +- p4 = in[i + 1 + in_stride * (maxj + 3)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(p4, p3, p2, p1))); +- } +- +- // lower left 2x2 block +- p1 = in[px + in_stride * (maxj + 2)]; +- p2 = in[px + 1 + in_stride * (maxj + 2)]; +- p3 = in[px + in_stride * (maxj + 3)]; +- p4 = in[px + 1 + in_stride * (maxj + 3)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(p4, p3, p2, p1))); +- +- // lower right 2x2 block +- p1 = in[maxi + 2 + in_stride * (maxj + 2)]; +- p2 = in[maxi + 3 + in_stride * (maxj + 2)]; +- p3 = in[maxi + 2 + in_stride * (maxj + 3)]; +- p4 = in[maxi + 3 + in_stride * (maxj + 3)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * dy), _mm_set_ps(p4, p3, p2, p1))); +- +- num = (samples + 1) * (samples + 1); +- } +- else if(maxi == px + 2 * samples) +- { +- // right border +- for(int j = py + 2; j <= maxj; j += 2) +- { +- p1 = in[maxi + 2 + in_stride * j]; +- p2 = in[maxi + 3 + in_stride * j]; +- p3 = in[maxi + 2 + in_stride * (j + 1)]; +- p4 = in[maxi + 3 + in_stride * (j + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(p4, p3, p2, p1))); +- } +- +- // upper right +- p1 = in[maxi + 2 + in_stride * py]; +- p2 = in[maxi + 3 + in_stride * py]; +- p3 = in[maxi + 2 + in_stride * (py + 1)]; +- p4 = in[maxi + 3 + in_stride * (py + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(p4, p3, p2, p1))); +- +- num = ((maxj 
- py) / 2 + 1 - dy) * (samples + 1); +- } +- else if(maxj == py + 2 * samples) +- { +- // lower border +- for(int i = px + 2; i <= maxi; i += 2) +- { +- p1 = in[i + in_stride * (maxj + 2)]; +- p2 = in[i + 1 + in_stride * (maxj + 2)]; +- p3 = in[i + in_stride * (maxj + 3)]; +- p4 = in[i + 1 + in_stride * (maxj + 3)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(p4, p3, p2, p1))); +- } +- +- // lower left 2x2 block +- p1 = in[px + in_stride * (maxj + 2)]; +- p2 = in[px + 1 + in_stride * (maxj + 2)]; +- p3 = in[px + in_stride * (maxj + 3)]; +- p4 = in[px + 1 + in_stride * (maxj + 3)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(p4, p3, p2, p1))); +- +- num = ((maxi - px) / 2 + 1 - dx) * (samples + 1); +- } +- else +- { +- num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy); +- } +- +- num = 1.0f / num; +- col = _mm_mul_ps(col, _mm_set1_ps(num)); +- +- float fcol[4] __attribute__((aligned(64))); +- _mm_store_ps(fcol, col); +- +- const int c = (2 * ((y + rggby) % 2) + ((x + rggbx) % 2)); +- *outc = (uint16_t)(fcol[c]); +- outc++; ++ if(num) *outc = col / num; + } + } +- _mm_sfence(); +-} +-#endif +- +-void dt_iop_clip_and_zoom_mosaic_half_size(uint16_t *const out, const uint16_t *const in, +- const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in, +- const int32_t out_stride, const int32_t in_stride, +- const uint32_t filters) +-{ +- if(1)//(darktable.codepath.OPENMP_SIMD) +- return dt_iop_clip_and_zoom_mosaic_half_size_plain(out, in, roi_out, roi_in, out_stride, in_stride, filters); +-#if defined(__SSE__) +- else if(darktable.codepath.SSE2) +- return dt_iop_clip_and_zoom_mosaic_half_size_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters); +-#endif +- else +- dt_unreachable_codepath(); + } + +-void dt_iop_clip_and_zoom_mosaic_half_size_f_plain(float *const out, const float *const in, ++void dt_iop_clip_and_zoom_mosaic_half_size_f(float *const out, const float *const in, + const 
dt_iop_roi_t *const roi_out, + const dt_iop_roi_t *const roi_in, const int32_t out_stride, + const int32_t in_stride, const uint32_t filters) +@@ -643,223 +425,10 @@ void dt_iop_clip_and_zoom_mosaic_half_size_f_plain(float *const out, const float + } + + const int c = (2 * ((y + rggby) % 2) + ((x + rggbx) % 2)); +- *outc = col[c] / num; +- outc++; +- } +- } +-} +- +-#if defined(__SSE__) +-void dt_iop_clip_and_zoom_mosaic_half_size_f_sse2(float *const out, const float *const in, +- const dt_iop_roi_t *const roi_out, +- const dt_iop_roi_t *const roi_in, const int32_t out_stride, +- const int32_t in_stride, const uint32_t filters) +-{ +- // adjust to pixel region and don't sample more than scale/2 nbs! +- // pixel footprint on input buffer, radius: +- const float px_footprint = 1.f / roi_out->scale; +- // how many 2x2 blocks can be sampled inside that area +- const int samples = round(px_footprint / 2); +- +- // move p to point to an rggb block: +- int trggbx = 0, trggby = 0; +- if(FC(trggby, trggbx + 1, filters) != 1) trggbx++; +- if(FC(trggby, trggbx, filters) != 0) +- { +- trggbx = (trggbx + 1) & 1; +- trggby++; +- } +- const int rggbx = trggbx, rggby = trggby; +- +-#ifdef _OPENMP +-#pragma omp parallel for default(none) \ +- dt_omp_firstprivate(in, in_stride, out, out_stride, px_footprint, rggbx, \ +- rggby, roi_in, roi_out, samples) \ +- schedule(static) +-#endif +- for(int y = 0; y < roi_out->height; y++) +- { +- float *outc = out + out_stride * y; +- +- const float fy = (y + roi_out->y) * px_footprint; +- int py = (int)fy & ~1; +- const float dy = (fy - py) / 2; +- py = MIN(((roi_in->height - 6) & ~1u), py) + rggby; +- +- const int maxj = MIN(((roi_in->height - 5) & ~1u) + rggby, py + 2 * samples); +- +- for(int x = 0; x < roi_out->width; x++) +- { +- __m128 col = _mm_setzero_ps(); +- +- const float fx = (x + roi_out->x) * px_footprint; +- int px = (int)fx & ~1; +- const float dx = (fx - px) / 2; +- px = MIN(((roi_in->width - 6) & ~1u), px) + rggbx; +- +- const 
int maxi = MIN(((roi_in->width - 5) & ~1u) + rggbx, px + 2 * samples); +- +- float p1, p2, p3, p4; +- float num = 0; +- +- // upper left 2x2 block of sampling region +- p1 = in[px + in_stride * py]; +- p2 = in[px + 1 + in_stride * py]; +- p3 = in[px + in_stride * (py + 1)]; +- p4 = in[px + 1 + in_stride * (py + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * (1 - dy)), _mm_set_ps(p4, p3, p2, p1))); +- +- // left 2x2 block border of sampling region +- for(int j = py + 2; j <= maxj; j += 2) +- { +- p1 = in[px + in_stride * j]; +- p2 = in[px + 1 + in_stride * j]; +- p3 = in[px + in_stride * (j + 1)]; +- p4 = in[px + 1 + in_stride * (j + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dx), _mm_set_ps(p4, p3, p2, p1))); +- } +- +- // upper 2x2 block border of sampling region +- for(int i = px + 2; i <= maxi; i += 2) +- { +- p1 = in[i + in_stride * py]; +- p2 = in[i + 1 + in_stride * py]; +- p3 = in[i + in_stride * (py + 1)]; +- p4 = in[i + 1 + in_stride * (py + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dy), _mm_set_ps(p4, p3, p2, p1))); +- } +- +- // 2x2 blocks in the middle of sampling region +- for(int j = py + 2; j <= maxj; j += 2) +- for(int i = px + 2; i <= maxi; i += 2) +- { +- p1 = in[i + in_stride * j]; +- p2 = in[i + 1 + in_stride * j]; +- p3 = in[i + in_stride * (j + 1)]; +- p4 = in[i + 1 + in_stride * (j + 1)]; +- col = _mm_add_ps(col, _mm_set_ps(p4, p3, p2, p1)); +- } +- +- if(maxi == px + 2 * samples && maxj == py + 2 * samples) +- { +- // right border +- for(int j = py + 2; j <= maxj; j += 2) +- { +- p1 = in[maxi + 2 + in_stride * j]; +- p2 = in[maxi + 3 + in_stride * j]; +- p3 = in[maxi + 2 + in_stride * (j + 1)]; +- p4 = in[maxi + 3 + in_stride * (j + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(p4, p3, p2, p1))); +- } +- +- // upper right +- p1 = in[maxi + 2 + in_stride * py]; +- p2 = in[maxi + 3 + in_stride * py]; +- p3 = in[maxi + 2 + in_stride * (py + 1)]; +- p4 = in[maxi + 3 + in_stride 
* (py + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(p4, p3, p2, p1))); +- +- // lower border +- for(int i = px + 2; i <= maxi; i += 2) +- { +- p1 = in[i + in_stride * (maxj + 2)]; +- p2 = in[i + 1 + in_stride * (maxj + 2)]; +- p3 = in[i + in_stride * (maxj + 3)]; +- p4 = in[i + 1 + in_stride * (maxj + 3)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(p4, p3, p2, p1))); +- } +- +- // lower left 2x2 block +- p1 = in[px + in_stride * (maxj + 2)]; +- p2 = in[px + 1 + in_stride * (maxj + 2)]; +- p3 = in[px + in_stride * (maxj + 3)]; +- p4 = in[px + 1 + in_stride * (maxj + 3)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(p4, p3, p2, p1))); +- +- // lower right 2x2 block +- p1 = in[maxi + 2 + in_stride * (maxj + 2)]; +- p2 = in[maxi + 3 + in_stride * (maxj + 2)]; +- p3 = in[maxi + 2 + in_stride * (maxj + 3)]; +- p4 = in[maxi + 3 + in_stride * (maxj + 3)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * dy), _mm_set_ps(p4, p3, p2, p1))); +- +- num = (samples + 1) * (samples + 1); +- } +- else if(maxi == px + 2 * samples) +- { +- // right border +- for(int j = py + 2; j <= maxj; j += 2) +- { +- p1 = in[maxi + 2 + in_stride * j]; +- p2 = in[maxi + 3 + in_stride * j]; +- p3 = in[maxi + 2 + in_stride * (j + 1)]; +- p4 = in[maxi + 3 + in_stride * (j + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(p4, p3, p2, p1))); +- } +- +- // upper right +- p1 = in[maxi + 2 + in_stride * py]; +- p2 = in[maxi + 3 + in_stride * py]; +- p3 = in[maxi + 2 + in_stride * (py + 1)]; +- p4 = in[maxi + 3 + in_stride * (py + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(p4, p3, p2, p1))); +- +- num = ((maxj - py) / 2 + 1 - dy) * (samples + 1); +- } +- else if(maxj == py + 2 * samples) +- { +- // lower border +- for(int i = px + 2; i <= maxi; i += 2) +- { +- p1 = in[i + in_stride * (maxj + 2)]; +- p2 = in[i + 1 + in_stride * (maxj + 2)]; +- p3 = in[i + 
in_stride * (maxj + 3)]; +- p4 = in[i + 1 + in_stride * (maxj + 3)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(p4, p3, p2, p1))); +- } +- +- // lower left 2x2 block +- p1 = in[px + in_stride * (maxj + 2)]; +- p2 = in[px + 1 + in_stride * (maxj + 2)]; +- p3 = in[px + in_stride * (maxj + 3)]; +- p4 = in[px + 1 + in_stride * (maxj + 3)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(p4, p3, p2, p1))); +- +- num = ((maxi - px) / 2 + 1 - dx) * (samples + 1); +- } +- else +- { +- num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy); +- } +- +- num = 1.0f / num; +- col = _mm_mul_ps(col, _mm_set1_ps(num)); +- +- float fcol[4] __attribute__((aligned(64))); +- _mm_store_ps(fcol, col); +- +- const int c = (2 * ((y + rggby) % 2) + ((x + rggbx) % 2)); +- *outc = fcol[c]; ++ if(num) *outc = col[c] / num; + outc++; + } + } +- _mm_sfence(); +-} +-#endif +- +-void dt_iop_clip_and_zoom_mosaic_half_size_f(float *const out, const float *const in, +- const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in, +- const int32_t out_stride, const int32_t in_stride, +- const uint32_t filters) +-{ +- if(darktable.codepath.OPENMP_SIMD) +- return dt_iop_clip_and_zoom_mosaic_half_size_f_plain(out, in, roi_out, roi_in, out_stride, in_stride, filters); +-#if defined(__SSE__) +- else if(darktable.codepath.SSE2) +- return dt_iop_clip_and_zoom_mosaic_half_size_f_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters); +-#endif +- else +- dt_unreachable_codepath(); + } + + /** +@@ -951,7 +520,7 @@ void dt_iop_clip_and_zoom_mosaic_third_size_xtrans_f(float *const out, const flo + } + } + +-void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, const float *const in, ++void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f(float *out, const float *const in, + const dt_iop_roi_t *const roi_out, + const dt_iop_roi_t *const roi_in, + const int32_t out_stride, +@@ -1085,7 +654,7 @@ void 
dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, co + num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy); + } + +- const float pix = col / num; ++ const float pix = (num) ? col / num : 0.0f; + outc[0] = pix; + outc[1] = pix; + outc[2] = pix; +@@ -1095,256 +664,7 @@ void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, co + } + } + +-#if defined(__SSE__) +-void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_sse2(float *out, const float *const in, +- const dt_iop_roi_t *const roi_out, +- const dt_iop_roi_t *const roi_in, +- const int32_t out_stride, +- const int32_t in_stride) +-{ +- // adjust to pixel region and don't sample more than scale/2 nbs! +- // pixel footprint on input buffer, radius: +- const float px_footprint = 1.f / roi_out->scale; +- // how many pixels can be sampled inside that area +- const int samples = round(px_footprint); +- +-#ifdef _OPENMP +-#pragma omp parallel for default(none) \ +- dt_omp_firstprivate(in, in_stride, out_stride, px_footprint, roi_in, roi_out, samples) \ +- shared(out) \ +- schedule(static) +-#endif +- for(int y = 0; y < roi_out->height; y++) +- { +- float *outc = out + 4 * (out_stride * y); +- +- const float fy = (y + roi_out->y) * px_footprint; +- int py = (int)fy; +- const float dy = fy - py; +- py = MIN(((roi_in->height - 3)), py); +- +- const int maxj = MIN(((roi_in->height - 2)), py + samples); +- +- for(int x = 0; x < roi_out->width; x++) +- { +- __m128 col = _mm_setzero_ps(); +- +- const float fx = (x + roi_out->x) * px_footprint; +- int px = (int)fx; +- const float dx = fx - px; +- px = MIN(((roi_in->width - 3)), px); +- +- const int maxi = MIN(((roi_in->width - 2)), px + samples); +- +- float p; +- float num = 0; +- +- // upper left pixel of sampling region +- p = in[px + in_stride * py]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * (1 - dy)), _mm_set_ps(0.0f, p, p, p))); +- +- // left pixel border of sampling region +- for(int j = py + 
1; j <= maxj; j++) +- { +- p = in[px + in_stride * j]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dx), _mm_set_ps(0.0f, p, p, p))); +- } +- +- // upper pixel border of sampling region +- for(int i = px + 1; i <= maxi; i++) +- { +- p = in[i + in_stride * py]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dy), _mm_set_ps(0.0f, p, p, p))); +- } +- +- // pixels in the middle of sampling region +- for(int j = py + 1; j <= maxj; j++) +- for(int i = px + 1; i <= maxi; i++) +- { +- p = in[i + in_stride * j]; +- col = _mm_add_ps(col, _mm_set_ps(0.0f, p, p, p)); +- } +- +- if(maxi == px + samples && maxj == py + samples) +- { +- // right border +- for(int j = py + 1; j <= maxj; j++) +- { +- p = in[maxi + 1 + in_stride * j]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(0.0f, p, p, p))); +- } +- +- // upper right +- p = in[maxi + 1 + in_stride * py]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(0.0f, p, p, p))); +- +- // lower border +- for(int i = px + 1; i <= maxi; i++) +- { +- p = in[i + in_stride * (maxj + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(0.0f, p, p, p))); +- } +- +- // lower left pixel +- p = in[px + in_stride * (maxj + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(0.0f, p, p, p))); +- +- // lower right pixel +- p = in[maxi + 1 + in_stride * (maxj + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * dy), _mm_set_ps(0.0f, p, p, p))); +- +- num = (samples + 1) * (samples + 1); +- } +- else if(maxi == px + samples) +- { +- // right border +- for(int j = py + 1; j <= maxj; j++) +- { +- p = in[maxi + 1 + in_stride * j]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(0.0f, p, p, p))); +- } +- +- // upper right +- p = in[maxi + 1 + in_stride * py]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(0.0f, p, p, p))); +- +- num = ((maxj - py) / 2 + 1 - dy) * (samples + 1); +- } +- else if(maxj == py + 
samples) +- { +- // lower border +- for(int i = px + 1; i <= maxi; i++) +- { +- p = in[i + in_stride * (maxj + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(0.0f, p, p, p))); +- } +- +- // lower left pixel +- p = in[px + in_stride * (maxj + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(0.0f, p, p, p))); +- +- num = ((maxi - px) / 2 + 1 - dx) * (samples + 1); +- } +- else +- { +- num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy); +- } +- +- num = 1.0f / num; +- col = _mm_mul_ps(col, _mm_set_ps(0.0f, num, num, num)); +- _mm_stream_ps(outc, col); +- outc += 4; +- } +- } +- _mm_sfence(); +-} +-#endif +- +-void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f(float *out, const float *const in, +- const dt_iop_roi_t *const roi_out, +- const dt_iop_roi_t *const roi_in, +- const int32_t out_stride, const int32_t in_stride) +-{ +- if(darktable.codepath.OPENMP_SIMD) +- return dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(out, in, roi_out, roi_in, out_stride, +- in_stride); +-#if defined(__SSE__) +- else if(darktable.codepath.SSE2) +- return dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_sse2(out, in, roi_out, roi_in, out_stride, +- in_stride); +-#endif +- else +- dt_unreachable_codepath(); +-} +- +-#if 0 // gets rid of pink artifacts, but doesn't do sub-pixel sampling, so shows some staircasing artifacts. +-void +-dt_iop_clip_and_zoom_demosaic_half_size_f( +- float *out, +- const float *const in, +- const dt_iop_roi_t *const roi_out, +- const dt_iop_roi_t *const roi_in, +- const int32_t out_stride, +- const int32_t in_stride, +- const uint32_t filters, +- const float clip) +-{ +- // adjust to pixel region and don't sample more than scale/2 nbs! 
+- // pixel footprint on input buffer, radius: +- const float px_footprint = 1.f/roi_out->scale; +- // how many 2x2 blocks can be sampled inside that area +- const int samples = round(px_footprint/2); +- +- // move p to point to an rggb block: +- int trggbx = 0, trggby = 0; +- if(FC(trggby, trggbx+1, filters) != 1) trggbx ++; +- if(FC(trggby, trggbx, filters) != 0) +- { +- trggbx = (trggbx + 1)&1; +- trggby ++; +- } +- const int rggbx = trggbx, rggby = trggby; +- +-#ifdef _OPENMP +-#pragma omp parallel for default(none) shared(out) schedule(static) +-#endif +- for(int y=0; y<roi_out->height; y++) +- { +- float *outc = out + 4*(out_stride*y); +- +- const float fy = (y + roi_out->y)*px_footprint; +- int py = (int)fy & ~1; +- py = MIN(((roi_in->height-4) & ~1u), py) + rggby; +- +- int maxj = MIN(((roi_in->height-3)&~1u)+rggby, py+2*samples); +- +- const float fx = roi_out->x*px_footprint; +- +- for(int x=0; x<roi_out->width; x++) +- { +- __m128 col = _mm_setzero_ps(); +- +- fx += px_footprint; +- int px = (int)fx & ~1; +- px = MIN(((roi_in->width -4) & ~1u), px) + rggbx; +- +- const int maxi = MIN(((roi_in->width -3)&~1u)+rggbx, px+2*samples); +- +- int num = 0; +- +- const int idx = px + in_stride*py; +- const float pc = MAX(MAX(in[idx], in[idx+1]), MAX(in[idx + in_stride], in[idx+1 + in_stride])); +- +- // 2x2 blocks in the middle of sampling region +- __m128 sum = _mm_setzero_ps(); +- +- for(int j=py; j<=maxj; j+=2) +- for(int i=px; i<=maxi; i+=2) +- { +- const float p1 = in[i + in_stride*j]; +- const float p2 = in[i+1 + in_stride*j]; +- const float p3 = in[i + in_stride*(j + 1)]; +- const float p4 = in[i+1 + in_stride*(j + 1)]; +- +- if (!((pc >= clip) ^ (MAX(MAX(p1,p2),MAX(p3,p4)) >= clip))) +- { +- sum = _mm_add_ps(sum, _mm_set_ps(0,p4,p3+p2,p1)); +- num++; +- } +- } +- +- col = _mm_mul_ps(sum, _mm_div_ps(_mm_set_ps(0.0f,1.0f,0.5f,1.0f),_mm_set1_ps(num))); +- _mm_stream_ps(outc, col); +- outc += 4; +- } +- } +- _mm_sfence(); +-} +- +-#else +-// very fast and 
smooth, but doesn't handle highlights: +- +-void dt_iop_clip_and_zoom_demosaic_half_size_f_plain(float *out, const float *const in, ++void dt_iop_clip_and_zoom_demosaic_half_size_f(float *out, const float *const in, + const dt_iop_roi_t *const roi_out, + const dt_iop_roi_t *const roi_in, const int32_t out_stride, + const int32_t in_stride, const uint32_t filters) +@@ -1522,202 +842,6 @@ void dt_iop_clip_and_zoom_demosaic_half_size_f_plain(float *out, const float *co + } + } + +-#if defined(__SSE__) +-void dt_iop_clip_and_zoom_demosaic_half_size_f_sse2(float *out, const float *const in, +- const dt_iop_roi_t *const roi_out, +- const dt_iop_roi_t *const roi_in, const int32_t out_stride, +- const int32_t in_stride, const uint32_t filters) +-{ +- // adjust to pixel region and don't sample more than scale/2 nbs! +- // pixel footprint on input buffer, radius: +- const float px_footprint = 1.f / roi_out->scale; +- // how many 2x2 blocks can be sampled inside that area +- const int samples = round(px_footprint / 2); +- +- // move p to point to an rggb block: +- int trggbx = 0, trggby = 0; +- if(FC(trggby, trggbx + 1, filters) != 1) trggbx++; +- if(FC(trggby, trggbx, filters) != 0) +- { +- trggbx = (trggbx + 1) & 1; +- trggby++; +- } +- const int rggbx = trggbx, rggby = trggby; +- +-#ifdef _OPENMP +-#pragma omp parallel for default(none) \ +- dt_omp_firstprivate(in, in_stride, px_footprint, rggbx, rggby, out_stride, roi_in, roi_out, samples) \ +- shared(out) \ +- schedule(static) +-#endif +- for(int y = 0; y < roi_out->height; y++) +- { +- float *outc = out + 4 * (out_stride * y); +- +- const float fy = (y + roi_out->y) * px_footprint; +- int py = (int)fy & ~1; +- const float dy = (fy - py) / 2; +- py = MIN(((roi_in->height - 6) & ~1u), py) + rggby; +- +- const int maxj = MIN(((roi_in->height - 5) & ~1u) + rggby, py + 2 * samples); +- +- for(int x = 0; x < roi_out->width; x++) +- { +- __m128 col = _mm_setzero_ps(); +- +- const float fx = (x + roi_out->x) * px_footprint; +- 
int px = (int)fx & ~1; +- const float dx = (fx - px) / 2; +- px = MIN(((roi_in->width - 6) & ~1u), px) + rggbx; +- +- const int maxi = MIN(((roi_in->width - 5) & ~1u) + rggbx, px + 2 * samples); +- +- float p1, p2, p4; +- float num = 0; +- +- // upper left 2x2 block of sampling region +- p1 = in[px + in_stride * py]; +- p2 = in[px + 1 + in_stride * py] + in[px + in_stride * (py + 1)]; +- p4 = in[px + 1 + in_stride * (py + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * (1 - dy)), _mm_set_ps(0.0f, p4, p2, p1))); +- +- // left 2x2 block border of sampling region +- for(int j = py + 2; j <= maxj; j += 2) +- { +- p1 = in[px + in_stride * j]; +- p2 = in[px + 1 + in_stride * j] + in[px + in_stride * (j + 1)]; +- p4 = in[px + 1 + in_stride * (j + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dx), _mm_set_ps(0.0f, p4, p2, p1))); +- } +- +- // upper 2x2 block border of sampling region +- for(int i = px + 2; i <= maxi; i += 2) +- { +- p1 = in[i + in_stride * py]; +- p2 = in[i + 1 + in_stride * py] + in[i + in_stride * (py + 1)]; +- p4 = in[i + 1 + in_stride * (py + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(1 - dy), _mm_set_ps(0.0f, p4, p2, p1))); +- } +- +- // 2x2 blocks in the middle of sampling region +- for(int j = py + 2; j <= maxj; j += 2) +- for(int i = px + 2; i <= maxi; i += 2) +- { +- p1 = in[i + in_stride * j]; +- p2 = in[i + 1 + in_stride * j] + in[i + in_stride * (j + 1)]; +- p4 = in[i + 1 + in_stride * (j + 1)]; +- col = _mm_add_ps(col, _mm_set_ps(0.0f, p4, p2, p1)); +- } +- +- if(maxi == px + 2 * samples && maxj == py + 2 * samples) +- { +- // right border +- for(int j = py + 2; j <= maxj; j += 2) +- { +- p1 = in[maxi + 2 + in_stride * j]; +- p2 = in[maxi + 3 + in_stride * j] + in[maxi + 2 + in_stride * (j + 1)]; +- p4 = in[maxi + 3 + in_stride * (j + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(0.0f, p4, p2, p1))); +- } +- +- // upper right +- p1 = in[maxi + 2 + in_stride * py]; +- p2 = in[maxi + 3 
+ in_stride * py] + in[maxi + 2 + in_stride * (py + 1)]; +- p4 = in[maxi + 3 + in_stride * (py + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(0.0f, p4, p2, p1))); +- +- // lower border +- for(int i = px + 2; i <= maxi; i += 2) +- { +- p1 = in[i + in_stride * (maxj + 2)]; +- p2 = in[i + 1 + in_stride * (maxj + 2)] + in[i + in_stride * (maxj + 3)]; +- p4 = in[i + 1 + in_stride * (maxj + 3)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(0.0f, p4, p2, p1))); +- } +- +- // lower left 2x2 block +- p1 = in[px + in_stride * (maxj + 2)]; +- p2 = in[px + 1 + in_stride * (maxj + 2)] + in[px + in_stride * (maxj + 3)]; +- p4 = in[px + 1 + in_stride * (maxj + 3)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(0.0f, p4, p2, p1))); +- +- // lower right 2x2 block +- p1 = in[maxi + 2 + in_stride * (maxj + 2)]; +- p2 = in[maxi + 3 + in_stride * (maxj + 2)] + in[maxi + 2 + in_stride * (maxj + 3)]; +- p4 = in[maxi + 3 + in_stride * (maxj + 3)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * dy), _mm_set_ps(0.0f, p4, p2, p1))); +- +- num = (samples + 1) * (samples + 1); +- } +- else if(maxi == px + 2 * samples) +- { +- // right border +- for(int j = py + 2; j <= maxj; j += 2) +- { +- p1 = in[maxi + 2 + in_stride * j]; +- p2 = in[maxi + 3 + in_stride * j] + in[maxi + 2 + in_stride * (j + 1)]; +- p4 = in[maxi + 3 + in_stride * (j + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx), _mm_set_ps(0.0f, p4, p2, p1))); +- } +- +- // upper right +- p1 = in[maxi + 2 + in_stride * py]; +- p2 = in[maxi + 3 + in_stride * py] + in[maxi + 2 + in_stride * (py + 1)]; +- p4 = in[maxi + 3 + in_stride * (py + 1)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dx * (1 - dy)), _mm_set_ps(0.0f, p4, p2, p1))); +- +- num = ((maxj - py) / 2 + 1 - dy) * (samples + 1); +- } +- else if(maxj == py + 2 * samples) +- { +- // lower border +- for(int i = px + 2; i <= maxi; i += 2) +- { +- p1 = in[i + in_stride * (maxj + 
2)]; +- p2 = in[i + 1 + in_stride * (maxj + 2)] + in[i + in_stride * (maxj + 3)]; +- p4 = in[i + 1 + in_stride * (maxj + 3)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps(dy), _mm_set_ps(0.0f, p4, p2, p1))); +- } +- +- // lower left 2x2 block +- p1 = in[px + in_stride * (maxj + 2)]; +- p2 = in[px + 1 + in_stride * (maxj + 2)] + in[px + in_stride * (maxj + 3)]; +- p4 = in[px + 1 + in_stride * (maxj + 3)]; +- col = _mm_add_ps(col, _mm_mul_ps(_mm_set1_ps((1 - dx) * dy), _mm_set_ps(0.0f, p4, p2, p1))); +- +- num = ((maxi - px) / 2 + 1 - dx) * (samples + 1); +- } +- else +- { +- num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy); +- } +- +- num = 1.0f / num; +- col = _mm_mul_ps(col, _mm_set_ps(0.0f, num, 0.5f * num, num)); +- _mm_stream_ps(outc, col); +- outc += 4; +- } +- } +- _mm_sfence(); +-} +-#endif +-#endif +- +-void dt_iop_clip_and_zoom_demosaic_half_size_f(float *out, const float *const in, +- const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in, +- const int32_t out_stride, const int32_t in_stride, +- const uint32_t filters) +-{ +- if(darktable.codepath.OPENMP_SIMD) +- return dt_iop_clip_and_zoom_demosaic_half_size_f_plain(out, in, roi_out, roi_in, out_stride, in_stride, +- filters); +-#if defined(__SSE__) +- else if(darktable.codepath.SSE2) +- return dt_iop_clip_and_zoom_demosaic_half_size_f_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters); +-#endif +- else +- dt_unreachable_codepath(); +-} + + void dt_iop_clip_and_zoom_demosaic_third_size_xtrans_f(float *out, const float *const in, + const dt_iop_roi_t *const roi_out, diff -Nru darktable-3.4.1/debian/patches/series darktable-3.4.1/debian/patches/series --- darktable-3.4.1/debian/patches/series 2021-05-20 14:07:16.000000000 -0300 +++ darktable-3.4.1/debian/patches/series 2021-06-05 12:41:39.000000000 -0300 @@ -1 +1,2 @@ 0001-add-explicit-dependency-on-generate_conf.patch +0002-Avoid-div-by-zero-in-dt_iop_clip_and_zoom_mosaic_hal.patch
commit f007e678d47f5662326824725cae2ab9e2455e66
Author: Hanno Schwalm <ha...@schwalm-bremen.de>
Date:   Fri May 14 18:20:37 2021 +0200

    Avoid div by zero in dt_iop_clip_and_zoom_mosaic_half_size (#8954)

    * Avoid div by zero in dt_iop_clip_and_zoom_mosaic_half_size_plain

    Fixes #8951

    Although the file given in the issue is crippled we can avoid the crash.

    In `dt_iop_clip_and_zoom_mosaic_half_size` and the sse friend there is possibly a div/0 problem
    that should be checked.

    * Fixing same div by zero in dt_iop_clip_and_zoom_mosaic_half_size_f

    * Remove sse code for dt_iop_clip_and_zoom_mosaic... after testing performance

    checked performance non-sse vs sse specific code
    - with added local timers
    - using gcc 10.2
    - testing -t 1/4/8/16
    - intel (xeon like 9900) with fixed clock rate
    in
    - dt_iop_clip_and_zoom_mosaic_half_size
    - dt_iop_clip_and_zoom_mosaic_half_size_f
    - dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f
    - dt_iop_clip_and_zoom_demosaic_half_size_f
    with consistent results.

    For all functions the sse specific code was somewhat slower (~20%) than the vectorized compiler code.
    Number of omp cores didn't matter, just made the results more measurable because of low execution times.

    So I removed all the sse specific code for less code burden and better performance.

    * Fix sse header plus div/0

    At least for bayer images we absolutely want to be sure there is no div by zero as there might be
    buggy dng files.

diff --git a/src/develop/imageop_math.c b/src/develop/imageop_math.c
index ef559652d..0066a83c9 100644
--- a/src/develop/imageop_math.c
+++ b/src/develop/imageop_math.c
@@ -18,14 +18,8 @@
 #include "develop/imageop_math.h"
 #include <assert.h> // for assert
-#ifdef __SSE__...
-#endif
 #include <glib.h> // for MIN, MAX, CLAMP, inline
 #include <math.h> // for round, floorf, fmaxf
-#ifdef __SSE__...
-#endif
 #include "common/darktable.h" // for darktable, darktable_t, dt_code...
 #include "common/imageio.h" // for FILTERS_ARE_4BAYER
 #include "common/interpolation.h" // for dt_interpolation_new, dt_interp...
@@ -177,7 +171,7 @@ int dt_iop_clip_and_zoom_roi_cl(int devid, cl_mem dev_out, cl_mem dev_in, const
 #endif
-void dt_iop_clip_and_zoom_mosaic_half_size_plain(uint16_t *const out, const uint16_t *const in,
+void dt_iop_clip_and_zoom_mosaic_half_size(uint16_t *const out, const uint16_t *const in,
 const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
 const int32_t out_stride, const int32_t in_stride,
 const uint32_t filters)
@@ -244,224 +238,12 @@ void dt_iop_clip_and_zoom_mosaic_half_size_plain(uint16_t *const out, const uint
 num++;
 }
 }
- *outc = col / num;
- }
- }
-}
-
-#if defined(__SSE__)...
-#endif
-
-void dt_iop_clip_and_zoom_mosaic_half_size(uint16_t *const out, const uint16_t *const in,
- const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
- const int32_t out_stride, const int32_t in_stride,
- const uint32_t filters)
-{
- if(1)//(darktable.codepath.OPENMP_SIMD)
- return dt_iop_clip_and_zoom_mosaic_half_size_plain(out, in, roi_out, roi_in, out_stride, in_stride, filters);
-#if defined(__SSE__)
- else if(darktable.codepath.SSE2)
- return dt_iop_clip_and_zoom_mosaic_half_size_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters);
-#endif
- else
- dt_unreachable_codepath();
 }
-void dt_iop_clip_and_zoom_mosaic_half_size_f_plain(float *const out, const float *const in,
+void dt_iop_clip_and_zoom_mosaic_half_size_f(float *const out, const float *const in,
 const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
 const int32_t out_stride, const int32_t in_stride,
 const uint32_t filters)
@@ -643,223 +425,10 @@ void dt_iop_clip_and_zoom_mosaic_half_size_f_plain(float *const out, const float
 }
 const int c = (2 * ((y + rggby) % 2) + ((x + rggbx) % 2));
- *outc = col[c] / num;
- outc++;
- }
- }
-}
-
-#if defined(__SSE__)...
-#endif
-
-void dt_iop_clip_and_zoom_mosaic_half_size_f(float *const out, const float *const in,
- const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
- const int32_t out_stride, const int32_t in_stride,
- const uint32_t filters)
-{
- if(darktable.codepath.OPENMP_SIMD)
- return dt_iop_clip_and_zoom_mosaic_half_size_f_plain(out, in, roi_out, roi_in, out_stride, in_stride, filters);
-#if defined(__SSE__)
- else if(darktable.codepath.SSE2)
- return dt_iop_clip_and_zoom_mosaic_half_size_f_sse2(out, in, roi_out, roi_in, out_stride, in_stride, filters);
-#endif
- else
- dt_unreachable_codepath();
 }
 /**
@@ -951,7 +520,7 @@ void dt_iop_clip_and_zoom_mosaic_third_size_xtrans_f(float *const out, const flo
 }
 }
-void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, const float *const in,
+void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f(float *out, const float *const in,
 const dt_iop_roi_t *const roi_out,
 const dt_iop_roi_t *const roi_in,
 const int32_t out_stride,
@@ -1085,7 +654,7 @@ void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, co
 num = ((maxi - px) / 2 + 1 - dx) * ((maxj - py) / 2 + 1 - dy);
 }
- const float pix = col / num;
+ const float pix = (num) ? col / num : 0.0f;
 outc[0] = pix;
 outc[1] = pix;
 outc[2] = pix;
@@ -1095,256 +664,7 @@ void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(float *out, co
 }
 }
-#if defined(__SSE__)...
-#endif
-
-void dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f(float *out, const float *const in,
- const dt_iop_roi_t *const roi_out,
- const dt_iop_roi_t *const roi_in,
- const int32_t out_stride, const int32_t in_stride)
-{
- if(darktable.codepath.OPENMP_SIMD)
- return dt_iop_clip_and_zoom_demosaic_passthrough_monochrome_f_plain(out, in, roi_out, roi_in, out_stride,
- in_stride);
-#if defined(__SSE__)...
-#endif
- else
- dt_unreachable_codepath();
-}
-
-#if 0 // gets rid of pink artifacts, but doesn't do sub-pixel sampling, so shows some staircasing artifacts....
-#else
-// very fast and smooth, but doesn't handle highlights:
-
-void dt_iop_clip_and_zoom_demosaic_half_size_f_plain(float *out, const float *const in,
+void dt_iop_clip_and_zoom_demosaic_half_size_f(float *out, const float *const in,
 const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
 const int32_t out_stride, const int32_t in_stride,
 const uint32_t filters)
@@ -1522,202 +842,6 @@ void dt_iop_clip_and_zoom_demosaic_half_size_f_plain(float *out, const float *co
 }
 }
-#if defined(__SSE__)...
-#endif
-#endif
-
-void dt_iop_clip_and_zoom_demosaic_half_size_f(float *out, const float *const in,
- const dt_iop_roi_t *const roi_out, const dt_iop_roi_t *const roi_in,
- const int32_t out_stride, const int32_t in_stride,
- const uint32_t filters)
-{
- if(darktable.codepath.OPENMP_SIMD)
- return dt_iop_clip_and_zoom_demosaic_half_size_f_plain(out, in, roi_out, roi_in, out_stride, in_stride,
- filters);
-#if defined(__SSE__)...
-#endif
- else
- dt_unreachable_codepath();
-}
 void dt_iop_clip_and_zoom_demosaic_third_size_xtrans_f(float *out, const float *const in,
 const dt_iop_roi_t *const roi_out,
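For anyone sanity-checking the added lines: the guard visible in the reduced diff is the ternary `(num) ? col / num : 0.0f` in the monochrome passthrough hunk. Below is a minimal standalone sketch of that pattern, not code taken from darktable. The helper names `guarded_mean_f` and `guarded_mean_u16` are hypothetical, and the integer variant is only an assumption about how a `uint16_t` output path could be guarded in the same spirit. The integer case is where an actual crash would be expected, since integer division by zero is undefined behaviour in C and commonly traps, while a float division by zero on IEEE 754 platforms yields inf or NaN instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a darktable function): divide an accumulated
 * float sum by the number of contributing samples, returning 0.0f when no
 * sample contributed. Same shape as the ternary in the patch. */
float guarded_mean_f(const float sum, const float num)
{
  return (num != 0.0f) ? sum / num : 0.0f;
}

/* Hypothetical integer variant (an assumption, not the upstream code):
 * an unguarded sum / num here would be undefined behaviour when num is 0. */
uint16_t guarded_mean_u16(const uint32_t sum, const uint32_t num)
{
  return (num != 0u) ? (uint16_t)(sum / num) : (uint16_t)0;
}
```

With an empty sampling region both helpers return zero rather than dividing, so a malformed input produces a black pixel instead of a fault.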