The load/store optimizer does not merge aggressively.
Normally, successive load instructions are not far apart.
The performance difference is much larger than I expected.
So do the performance numbers come from the SKL platform? Have you tried this patch on
BDW?
The performance behavior you observed
LGTM
- Ruiling
> -----Original Message-----
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> rander
> Sent: Tuesday, March 7, 2017 9:56 AM
> To: beig...@freedesktop.org
> Cc: Wang, Rander
> Subject: [Beignet] [PATCH 5/5] Backend: add double
There were some typos, sorry about that.
I have fixed them.
yan.wang
From: yan.wang
Date: 2017-03-10 10:52
To: ruiling.song; beignet
Subject: Re: [Beignet] [PATCH v2] Provide more possible candidate of load/store
as possible.
It comes from darktable performance tuning.
For float type, maxVecSize is 4, so
LGTM
- Ruiling
> -----Original Message-----
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> rander
> Sent: Monday, March 6, 2017 5:48 PM
> To: beig...@freedesktop.org
> Cc: Wang, Rander
> Subject: [Beignet] [PATCH 4/4] Backend:add double
It comes from darktable performance tuning.
For float type, maxVecSize is 4, so maxLimit = 4 * 8 = 32.
I am not sure of the reason for maxLimit = maxVecSize * 8.
32 is too small a search range, so no further mergeable loads can be found after
the leading load.
It will improve eaw_decompose kernel of darktable
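To illustrate the point about the search range, here is a minimal sketch of a bounded scan for loads that are consecutive to a leading load. It is not the code in llvm_loadstore_optimization.cpp: the `Access` struct, the `countMergeable` function, and the instruction model are all hypothetical, and only the names maxVecSize and maxLimit come from the discussion above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one instruction in a basic block.
struct Access {
    int offset;   // element offset from a common base (meaningful for loads only)
    bool isLoad;  // true for a load, false for any other instruction
};

// Starting from the leading load at insts[start], scan at most maxLimit
// following instructions and count how many loads (including the leading one)
// access consecutive elements. At most maxVecSize loads are counted.
std::size_t countMergeable(const std::vector<Access>& insts, std::size_t start,
                           std::size_t maxVecSize, std::size_t maxLimit) {
    assert(start < insts.size() && insts[start].isLoad);
    std::size_t merged = 1;                     // the leading load itself
    int nextOffset = insts[start].offset + 1;   // next element we hope to find
    std::size_t scanned = 0;
    for (std::size_t i = start + 1;
         i < insts.size() && scanned < maxLimit && merged < maxVecSize;
         ++i, ++scanned) {
        if (insts[i].isLoad && insts[i].offset == nextOffset) {
            ++merged;
            ++nextOffset;
        }
    }
    return merged;
}
```

In this model, if 40 unrelated instructions sit between the leading load and the next consecutive load, a limit of 32 stops the scan before the second load is reached, while any limit above 40 finds it. The limit of 64 used to check this is an arbitrary illustrative value, not the value chosen in the patch.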
> -----Original Message-----
> From: Beignet [mailto:beignet-boun...@lists.freedesktop.org] On Behalf Of
> yan.w...@linux.intel.com
> Sent: Thursday, March 9, 2017 5:41 PM
> To: beignet@lists.freedesktop.org
> Cc: Yan Wang
> Subject: [Beignet] [PATCH v2] Provide more
Signed-off-by: rander
---
backend/src/libocl/script/ocl_convert.sh | 9 +
1 file changed, 9 insertions(+)
diff --git a/backend/src/libocl/script/ocl_convert.sh b/backend/src/libocl/script/ocl_convert.sh
index ef65ff5..53fb82c 100755
---
Signed-off-by: rander
---
backend/src/libocl/script/ocl_convert.sh | 39 ++--
1 file changed, 37 insertions(+), 2 deletions(-)
diff --git a/backend/src/libocl/script/ocl_convert.sh b/backend/src/libocl/script/ocl_convert.sh
index
Signed-off-by: rander
---
backend/src/libocl/include/ocl_float.h | 5 +
backend/src/libocl/script/ocl_convert.sh | 26 ++
2 files changed, 31 insertions(+)
diff --git a/backend/src/libocl/include/ocl_float.h
From: Yan Wang
Avoid a search range that is too small in some cases, such as a vector of float.
This lets more loads/stores be merged, which improves performance.
Signed-off-by: Yan Wang
---
backend/src/llvm/llvm_loadstore_optimization.cpp | 2 +-
1 file
From: Yan Wang
Avoid a search range that is too small in some cases, such as a vector of float.
This lets more loads/stores be merged, which improves performance.
Signed-off-by: Yan Wang
---
backend/src/llvm/llvm_loadstore_optimization.cpp | 2 +-
1 file
From: Luo Xionghu
Create a bo of size w * (3/2 * h) for the whole CL_NV12_INTEL format
surface. The y surface (format CL_R) shares the first w * h part, and the
uv surface (format CL_RG) shares the remaining w * (1/2 * h) part; set the
correct bo offset for the uv surface on each platform.
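
The layout arithmetic described above can be sketched as follows. This is only an illustration, not code from the patch: the `Nv12Layout` struct and `nv12Layout` function are hypothetical, and the sketch assumes one byte per sample, even dimensions, and a tightly packed buffer with no row-pitch or platform-specific alignment. The real uv offset may differ, since the patch adjusts it per platform.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of how the two planes share one buffer.
struct Nv12Layout {
    std::size_t totalBytes;  // size of the whole buffer: w * h * 3 / 2
    std::size_t yOffset;     // y plane starts at the beginning of the buffer
    std::size_t yBytes;      // y plane size: w * h
    std::size_t uvOffset;    // uv plane starts right after the y plane
    std::size_t uvBytes;     // uv plane size: w * h / 2
};

// Compute the plane sizes and offsets for a w x h NV12 surface,
// assuming 8-bit samples and no padding (an assumption of this sketch).
Nv12Layout nv12Layout(std::size_t w, std::size_t h) {
    assert(w % 2 == 0 && h % 2 == 0);  // chroma is subsampled 2x2
    Nv12Layout l;
    l.yOffset = 0;
    l.yBytes = w * h;
    l.uvOffset = l.yBytes;
    l.uvBytes = w * h / 2;
    l.totalBytes = l.yBytes + l.uvBytes;
    return l;
}
```

For a 640 x 480 surface this gives a 307200-byte y part followed by a 153600-byte uv part, 460800 bytes in total.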
v2:
From: Luo Xionghu
v2: use read_only/write_only instead of read_write to run on OpenCL-1.2
platform; fix local size issue on IVB platform;
Signed-off-by: Luo Xionghu
---
kernels/image_planar_yuv.cl | 24 ++
utests/CMakeLists.txt | 1 +