Hi Karl,

I have been trying on and off to get this to work and I am completely
stumped. I have reversed the kernel so that it works on row-major data.
Right now, if I use the logic you suggested (with the '<' switched to
'>'), the kernel stops after the first column, so only the first element
of each row is computed. I have tried several variations of the check
but I am stuck. Here is the current kernel with all the different
attempts commented out (MdimPad and PdimPad are the padded dimensions).
If I don't have a size check, the device quickly runs out of resources
(Error: ViennaCL: FATAL ERROR: CL_OUT_OF_RESOURCES). Any thoughts? I feel
like I must be missing something simple at this point.

__kernel void iMatMult(const int Mdim, const int MdimPad,
                       const int Pdim, const int PdimPad,
                       __global const int *A, __global const int *B,
                       __global int *C) {

    // Get the index of the elements to be processed
    const int globalRow = get_global_id(0); // C Row ID
    const int globalCol = get_global_id(1); // C Col ID
    int tmp = 0;

    //int index = (globalRow + 1) * (globalCol + 1);
    //int max_index = MdimPad * PdimPad;
    //int max_index = (Mdim - 1) * MdimPad + Mdim;

    if (globalRow > MdimPad || globalCol > MdimPad)
        return;

    //if(index >= Mdim * Pdim){
    //    return;
    //}
    //if(globalRow >= MdimPad || globalCol >= PdimPad){
    //    return;
    //}
    //if(globalRow > Mdim - 1){
    //    return;
    //}
    //if((globalRow * MdimPad) >= max_index){
    //    return;
    //}

    //printf("index = %d\n", index);
    printf("globalCol = %d\n", globalCol);
    printf("globalRow = %d\n", globalRow);

    // Do the operation
    for(int k=0; k < Pdim; k++){
        tmp += A[globalRow * MdimPad + k] * B[globalCol+PdimPad*k];
    }
    C[globalCol+MdimPad*globalRow] = tmp;
}
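
For comparison, here is a minimal host-side C sketch of the row-major indexing with padding. It is not the ViennaCL implementation, only an emulation of one work-item per (row, col) pair so the index arithmetic can be checked on the CPU. All names are hypothetical: rows and cols are the logical size of C, inner is the shared dimension, and lda, ldb, ldc are the padded row lengths of A, B and C (in ViennaCL terms this would presumably be each matrix's own internal_size2()). The guard uses '>=' against the logical sizes and is joined with '||', so a work-item returns as soon as either index lies in the padding. Work-items that index past the buffers are one plausible cause of CL_OUT_OF_RESOURCES, although that is an assumption since the launch configuration is not shown.

```c
#include <stddef.h>

/* One emulated work-item of C = A * B for row-major int matrices.
 * Hypothetical parameters: rows x cols is the logical size of C,
 * inner is the shared dimension, lda/ldb/ldc are padded row lengths. */
static void imatmult_item(int row, int col,
                          int rows, int cols, int inner,
                          int lda, int ldb, int ldc,
                          const int *A, const int *B, int *C)
{
    /* Drop work-items that fall outside the logical matrix. */
    if (row >= rows || col >= cols)
        return;

    int tmp = 0;
    for (int k = 0; k < inner; k++) {
        /* Row-major: the row index is scaled by that matrix's own stride. */
        tmp += A[(size_t)row * lda + k] * B[(size_t)k * ldb + col];
    }
    C[(size_t)row * ldc + col] = tmp;
}

/* Emulates a 2D launch with a (possibly padded) global size. */
static void imatmult_launch(int global0, int global1,
                            int rows, int cols, int inner,
                            int lda, int ldb, int ldc,
                            const int *A, const int *B, int *C)
{
    for (int r = 0; r < global0; r++)
        for (int c = 0; c < global1; c++)
            imatmult_item(r, c, rows, cols, inner, lda, ldb, ldc, A, B, C);
}
```

Calling imatmult_launch with a global size larger than the logical size (for example 4 x 4 for a 2 x 2 result) should leave the padding of C untouched, which is an easy way to test the guard before moving the same arithmetic into the kernel.
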
Kind Regards,
Charles
On Mon, May 23, 2016 at 3:15 PM, Karl Rupp <r...@iue.tuwien.ac.at> wrote:
> Hey,
>
>> Ah yes, thanks Karl. I remember that now. With that said, are there
>> recommendations on how kernels should be written to address the padded
>> columns? I am imagining some if/else or loop limits on indices but
>> thought I would ask here before I start trying to do that. I am trying
>> to look through the kernels and I am seeing things along the lines of
>> 'global_size(0) < size' where I assume size refers to one of the
>> dimensions?
>>
>
> It depends on the respective assumptions and guarantees you make on the
> underlying data. The 'safest' way to deal with it is with something like
> the following (based on the kernel code you provided):
>
> const int globalRow = get_global_id(0); // C Row ID
> const int globalCol = get_global_id(1); // C Col ID
> int tmp = 0;
>
> if (globalRow < rowsC || globalCol < colsC)
> return;
>
> for(int k=0; k < Msize; k++) // Msize instead of Mdim here!
> tmp += A[k*Mdim+globalRow] * B[globalCol*Pdim+k];
>
> C[globalCol*Mdim+globalRow] = tmp;
>
> Note that this code assumes column-major (Fortran) data layout, whereas the
> standard layout in ViennaCL is row-major (C).
>
>> If so, I humbly recommend that although the padding is mentioned with
>> respect to the matrix types either an example or explanation would be
>> valuable in the custom kernel section (at the very least another
>> friendly reminder). Not all repetition is bad :)
>>
>
> Agreed. :-)
>
> Best regards,
> Karli
>
>
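
The layout note in the quoted reply (column-major versus row-major) comes down to which index gets multiplied by the padded leading dimension. A small C sketch of the two conventions follows. The helper names are hypothetical and ld stands for the padded leading dimension: the padded number of columns for row-major, the padded number of rows for column-major.

```c
#include <stddef.h>

/* Row-major (C): elements of one row are adjacent in memory,
 * so the row index is scaled by the padded row length. */
static size_t idx_row_major(size_t row, size_t col, size_t ld)
{
    return row * ld + col;
}

/* Column-major (Fortran): elements of one column are adjacent,
 * so the column index is scaled by the padded column length. */
static size_t idx_col_major(size_t row, size_t col, size_t ld)
{
    return col * ld + row;
}
```

Converting a kernel from one layout to the other then means swapping which index is scaled in every access to A, B and C, and using each matrix's own padded dimension as ld.
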
_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel