[Beignet] [PATCH] Add kernels performance output

2014-03-18 Thread Yongjia Zhang
If the environment variable OCL_OUTPUT_KERNEL_PERF is set to a non-zero value, Beignet prints timing information for each executed kernel after the program exits. Signed-off-by: Yongjia Zhang --- src/CMakeLists.txt | 3 +- src/cl_api.c | 23 - src/cl_command_queue.c |
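
For illustration only, here is a minimal sketch of how an environment-variable switch like this could gate a per-kernel timing report. Only the variable name OCL_OUTPUT_KERNEL_PERF comes from the patch description. The names perf_output_enabled, perf_record, perf_dump and the kernel_perf table are hypothetical and are not taken from the Beignet sources.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_KERNELS 64

/* Hypothetical per-kernel record; not Beignet's actual data structure. */
struct kernel_perf {
    const char *name;     /* caller must keep this string alive */
    unsigned long calls;  /* number of recorded executions */
    double total_ms;      /* accumulated execution time in milliseconds */
};

struct kernel_perf perf_table[MAX_KERNELS];
size_t perf_count = 0;

/* Returns 1 when OCL_OUTPUT_KERNEL_PERF is set to a non-zero integer, else 0. */
int perf_output_enabled(void)
{
    const char *value = getenv("OCL_OUTPUT_KERNEL_PERF");
    return value != NULL && atoi(value) != 0;
}

/* Accumulate one execution of kernel `name` that took `ms` milliseconds. */
void perf_record(const char *name, double ms)
{
    size_t i;
    for (i = 0; i < perf_count; i++) {
        if (strcmp(perf_table[i].name, name) == 0)
            break;
    }
    if (i == perf_count) {
        if (perf_count == MAX_KERNELS)
            return; /* table full: drop the sample */
        perf_table[i].name = name;
        perf_table[i].calls = 0;
        perf_table[i].total_ms = 0.0;
        perf_count++;
    }
    perf_table[i].calls++;
    perf_table[i].total_ms += ms;
}

/* Print the accumulated table, but only if the switch is on. */
void perf_dump(void)
{
    size_t i;
    if (!perf_output_enabled())
        return;
    for (i = 0; i < perf_count; i++)
        printf("%s: %lu calls, %.3f ms total\n",
               perf_table[i].name, perf_table[i].calls, perf_table[i].total_ms);
}
```

In a sketch like this, calling atexit(perf_dump) once during initialization would produce the report when the program exits, matching the behavior the patch describes.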

[Beignet] [PATCH] GBE: make byte/short vload/vstore process one element each time.

2014-03-18 Thread Ruiling Song
Per the OpenCL spec, the computed address (p + offset*n) is 8-bit aligned for char and 16-bit aligned for short in vloadn and vstoren. That is, we cannot assume that a vload4 on a char pointer is 4-byte aligned. The previous implementation makes Clang generate a load or store with alignment 4, which is in f
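
The alignment point can be modeled in plain C. This is a rough sketch, not the patch itself: char4_t, vload4_char and vstore4_char are hypothetical stand-ins for the OpenCL char4 type and the vload4/vstore4 built-ins. Each element is accessed with its own byte-sized load or store, so nothing stronger than 1-byte alignment is assumed for p + offset*4.

```c
#include <stddef.h>

/* Stand-in for OpenCL's char4; plain C has no built-in vector types. */
typedef struct {
    signed char s[4];
} char4_t;

/* Model of vload4 on a char pointer: read four elements starting at
 * p + offset*4, one element at a time, so the address only needs to be
 * 1-byte aligned. */
char4_t vload4_char(size_t offset, const signed char *p)
{
    char4_t v;
    const signed char *base = p + offset * 4;
    for (int i = 0; i < 4; i++)
        v.s[i] = base[i];
    return v;
}

/* Model of vstore4 on a char pointer: write four elements one at a time. */
void vstore4_char(char4_t v, size_t offset, signed char *p)
{
    signed char *base = p + offset * 4;
    for (int i = 0; i < 4; i++)
        base[i] = v.s[i];
}
```

For example, vload4_char(1, buf + 1) reads buf[5] through buf[8], an address that is generally not a multiple of 4. A single 4-byte access marked as 4-byte aligned would be incorrect there.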