Re: [Beignet] [PATCH v2] Provide more possible candidate of load/store as possible.

2017-03-09 Thread Song, Ruiling
The load/store optimizer did not merge aggressively. Normally the successive load instructions are not too far apart. The performance difference is much higher than I thought. So the performance number comes from the SKL platform? Have you tried this patch on a BDW? The performance behavior you observed

Re: [Beignet] [PATCH 5/5] Backend: add double support to bitselect

2017-03-09 Thread Song, Ruiling
LGTM - Ruiling > -Original Message- > From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of > rander > Sent: Tuesday, March 7, 2017 9:56 AM > To: beig...@freedesktop.org > Cc: Wang, Rander > Subject: [Beignet] [PATCH 5/5] Backend: add double

Re: [Beignet] [PATCH v2] Provide more possible candidate of load/store as possible.

2017-03-09 Thread yan . wang
Some typo. Sorry for it. I have modified it. yan.wang From: yan.wang Date: 2017-03-10 10:52 To: ruiling.song; beignet Subject: Re: [Beignet] [PATCH v2] Provide more possible candidate of load/store as possible. It comes from darktable performance tuning. For float type, maxVecSize is 4, so

Re: [Beignet] [PATCH 4/4] Backend:add double support for some relation function

2017-03-09 Thread Song, Ruiling
LGTM - Ruiling > -Original Message- > From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of > rander > Sent: Monday, March 6, 2017 5:48 PM > To: beig...@freedesktop.org > Cc: Wang, Rander > Subject: [Beignet] [PATCH 4/4] Backend:add double

Re: [Beignet] [PATCH v2] Provide more possible candidate of load/store as possible.

2017-03-09 Thread yan . wang
It comes from darktable performance tuning. For float type, maxVecSize is 4, so maxLimit = 4 * 8 = 32. I am not sure of the reason for maxLimit = maxVecSize * 8. 32 is too small a search range, so no further loads can be found after the leading load. It will improve the eaw_decompose kernel of darktable
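The arithmetic in the message above can be illustrated with a small standalone sketch. This is not Beignet's code: `searchLimit` only restates the maxLimit = maxVecSize * 8 formula quoted in the message, and `countCandidateLoads`, the boolean instruction model, and the window size of 64 used in the usage note are hypothetical names and values chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Restates the formula quoted in the thread: maxLimit = maxVecSize * 8.
constexpr std::size_t searchLimit(std::size_t maxVecSize) {
  return maxVecSize * 8;
}

// Hypothetical model of a basic block: isLoad[i] is true when instruction i
// is a load. Counts the loads that lie within `limit` instructions after the
// leading load at index `start`; loads beyond the window are never seen.
std::size_t countCandidateLoads(const std::vector<bool>& isLoad,
                                std::size_t start, std::size_t limit) {
  std::size_t found = 0;
  for (std::size_t i = start + 1;
       i < isLoad.size() && i - start <= limit; ++i) {
    if (isLoad[i]) ++found;
  }
  return found;
}
```

With a leading load at index 0 and further loads at indices 40 and 50, a window of searchLimit(4) = 32 finds no candidates, while an assumed larger window of 64 finds both. That is the situation the message describes for float vectors.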

Re: [Beignet] [PATCH v2] Provide more possible candidate of load/store as possible.

2017-03-09 Thread Song, Ruiling
> -Original Message- > From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of > yan.w...@linux.intel.com > Sent: Thursday, March 9, 2017 5:41 PM > To: beignet@lists.freedesktop.org > Cc: Yan Wang > Subject: [Beignet] [PATCH v2] Provide more

[Beignet] [PATCH] Backend: add double support to convert_u|char|u|short|u|int_rte(double x)

2017-03-09 Thread rander
Signed-off-by: rander --- backend/src/libocl/script/ocl_convert.sh | 9 + 1 file changed, 9 insertions(+) diff --git a/backend/src/libocl/script/ocl_convert.sh b/backend/src/libocl/script/ocl_convert.sh index ef65ff5..53fb82c 100755 ---

[Beignet] [PATCH] Backend: add double support to convert_u|long_rte(double x)

2017-03-09 Thread rander
Signed-off-by: rander --- backend/src/libocl/script/ocl_convert.sh | 39 ++-- 1 file changed, 37 insertions(+), 2 deletions(-) diff --git a/backend/src/libocl/script/ocl_convert.sh b/backend/src/libocl/script/ocl_convert.sh index

[Beignet] [PATCH] Backend: add double support to convert_float_rtn(double x)

2017-03-09 Thread rander
Signed-off-by: rander --- backend/src/libocl/include/ocl_float.h | 5 + backend/src/libocl/script/ocl_convert.sh | 26 ++ 2 files changed, 31 insertions(+) diff --git a/backend/src/libocl/include/ocl_float.h

[Beignet] [PATCH v2] Provide more possible candidate of load/store as possible.

2017-03-09 Thread yan . wang
From: Yan Wang Avoid a search range that is too small in some cases, such as vectors of float. It will lead to more load/store instructions being merged, improving performance. Signed-off-by: Yan Wang --- backend/src/llvm/llvm_loadstore_optimization.cpp | 2 +- 1 file

[Beignet] [PATCH] Provide more possible candidate of load/store as possible.

2017-03-09 Thread yan . wang
From: Yan Wang Avoid a search range that is too small in some cases, such as vectors of float. It will lead to more load/store instructions being merged, improving performance. Signed-off-by: Yan Wang --- backend/src/llvm/llvm_loadstore_optimization.cpp | 2 +- 1 file

[Beignet] [Patch V2 1/2] add extension intel_planar_yuv.

2017-03-09 Thread xionghu . luo
From: Luo Xionghu Create a w * (3/2 * h) size bo for the whole CL_NV12_INTEL format surface; the Y surface (format CL_R) shares the first w * h part, and the UV surface (format CL_RG) shares the remaining w * (1/2 * h) part. Set the correct bo offset for the UV surface on each platform. v2:
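The buffer split described in the commit message above can be sketched as simple size arithmetic. This is an illustration only, not the patch's code: `Nv12Layout` and `nv12Layout` are hypothetical names, and the sketch assumes one byte per Y sample, an even height, and tightly packed rows. The platform-specific UV offset adjustment that the message mentions is not modelled.

```cpp
#include <cstddef>

// Hypothetical description of one buffer object holding an NV12 surface.
struct Nv12Layout {
  std::size_t totalBytes;  // whole bo: w * (3/2 * h)
  std::size_t yBytes;      // Y plane: first w * h bytes
  std::size_t uvOffset;    // UV plane starts right after the Y plane (assumed)
  std::size_t uvBytes;     // UV plane: remaining w * (1/2 * h) bytes
};

// Computes the split for a w x h surface under the assumptions stated above.
constexpr Nv12Layout nv12Layout(std::size_t w, std::size_t h) {
  return Nv12Layout{w * h * 3 / 2, w * h, w * h, w * h / 2};
}
```

For a 640 x 480 surface this gives a 460800-byte bo, with the Y plane in the first 307200 bytes and the UV plane in the remaining 153600 bytes.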

[Beignet] [Patch V2 2/2] add utest for extension intel_planar_yuv.

2017-03-09 Thread xionghu . luo
From: Luo Xionghu v2: use read_only/write_only instead of read_write to run on OpenCL-1.2 platform; fix local size issue on IVB platform; Signed-off-by: Luo Xionghu --- kernels/image_planar_yuv.cl | 24 ++ utests/CMakeLists.txt | 1 +