So fast_copy() still copies the memory and incurs copying overhead, even with a
MAIN_MEMORY context?

Is there a way to do a shallow copy (i.e. just pointer initialization) of the
matrix data buffer? Isn't that what some constructors of matrix or matrix_base
do?
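
For illustration, here is roughly the kind of wrapping I have in mind -- a
minimal sketch only, assuming matrix_base really does offer a host-buffer
wrapping constructor along these lines; the exact parameter list would need to
be checked against viennacl/matrix_def.hpp:

#include <vector>
#include <viennacl/matrix.hpp>

int main()
{
  std::size_t rows = 4, cols = 3;
  std::vector<double> buf(rows * cols, 1.0);   // plain row-major host data

  // Hypothetical wrap of the existing buffer, no copy:
  // (size1, start1, stride1, internal_size1,
  //  size2, start2, stride2, internal_size2, row_major)
  viennacl::matrix_base<double> A(&buf[0], viennacl::MAIN_MEMORY,
                                  rows, 0, 1, rows,
                                  cols, 0, 1, cols,
                                  true);

  A *= 2.0;   // would operate directly on buf, no separate matrix buffer
  return 0;
}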

What I am getting at is that I seem to incur a significant overhead just for
copying -- actually, it looks like double overhead: once when I prepare the
padded buffer as required by internal_size1()/internal_size2(), and then again
when I pass it into fast_copy(), which apparently copies the data once more,
even when we are using host memory matrices.
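
Roughly, this is what my current path looks like (a simplified sketch; the
fast_copy() overload and the exact padded layout may need double-checking for
the row-major case):

#include <vector>
#include <viennacl/matrix.hpp>

// Simplified sketch of the current two-step path for a row-major
// viennacl::matrix<double>: repack with padding, then fast_copy().
void copy_in(const double * src, std::size_t rows, std::size_t cols,
             viennacl::matrix<double> & vcl_A)
{
  // Step 1: repack the source into a buffer padded to the internal sizes.
  std::vector<double> padded(vcl_A.internal_size1() * vcl_A.internal_size2(), 0.0);
  for (std::size_t i = 0; i < rows; ++i)
    for (std::size_t j = 0; j < cols; ++j)
      padded[i * vcl_A.internal_size2() + j] = src[i * cols + j];

  // Step 2: fast_copy() then copies the padded buffer again into the matrix.
  viennacl::fast_copy(&padded[0], &padded[0] + padded.size(), vcl_A);
}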

All in all, by my estimates this copying back and forth (which, granted, is not
greatly optimized on our side) takes ~15-17 seconds out of 60 seconds total
when multiplying 10k x 10k dense arguments via ViennaCL. I also compile with
-march=haswell and -ffast-math; without those flags I seem to fall too far
behind what R + OpenBLAS can do in this test -- my processing time swells up to
2 minutes without the non-IEEE-compliant arithmetic optimizations.

If I could wrap the buffer and avoid copying for the MAIN_MEMORY context, I
would shave off another 10% or so of the execution time. That would make me
happier, as I could probably then beat OpenBLAS, given custom CPU architecture
flags.
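
Assuming the wrapping sketch above actually works, the multiply itself would
then read the wrapped Java-side buffers in place -- something along these
lines, with only the result matrix owning its own storage:

#include <viennacl/matrix.hpp>
#include <viennacl/linalg/prod.hpp>

// Hypothetical usage: A and B wrap existing host buffers (see sketch above),
// prod() reads them in place, and only the result C allocates new storage.
viennacl::matrix<double> multiply(viennacl::matrix_base<double> const & A,
                                  viennacl::matrix_base<double> const & B)
{
  viennacl::matrix<double> C(A.size1(), B.size2(),
                             viennacl::context(viennacl::MAIN_MEMORY));
  C = viennacl::linalg::prod(A, B);
  return C;
}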

On the other hand, BIDMat (which allegedly uses MKL) does the same test, in
double precision, in under 10 seconds. I can't fathom how, but it does. I have
a Haswell-E platform.

thank you.
dmitriy

On Tue, Jul 12, 2016 at 9:27 AM, Karl Rupp <r...@iue.tuwien.ac.at> wrote:

> Hi,
>
>> One question: you mentioned padding for the `matrix` type. When i
>> initialize the `matrix` instance, i only specify dimensions. how do I
>> know padding values?
>>
>
> if you want to provide your own padded dimensions, consider using
> matrix_base directly. If you want to query the padded dimensions, use
> internal_size1() and internal_size2() for the internal number of rows and
> columns.
>
> http://viennacl.sourceforge.net/doc/manual-types.html#manual-types-matrix
>
> Best regards,
> Karli
>
>
>
>
>> On Tue, Jul 12, 2016 at 5:53 AM, Karl Rupp <r...@iue.tuwien.ac.at
>> <mailto:r...@iue.tuwien.ac.at>> wrote:
>>
>>     Hi Dmitriy,
>>
>>     On 07/12/2016 07:17 AM, Dmitriy Lyubimov wrote:
>>
>>         Hi,
>>
>>         I am trying to create some elementary wrappers for VCL in javacpp.
>>
>>         Everything goes fine, except i really would rather not use those
>>         "cpu"
>>         types (std::map,
>>         std::vector) and rather initialize matrices directly by feeding
>>         row-major or CCS formats.
>>
>>         I see that matrix () constructor accepts this form of
>>         initialization;
>>         but it really states that
>>         it does "wrapping" for the device memory.
>>
>>
>>     Yes, the constructors either create their own memory buffer
>>     (zero-initialized) or wrap an existing buffer. These are the only
>>     reasonable options.
>>
>>
>>         Now, i can create a host matrix() using host memory and row-major
>>         packing. This works ok it seems.
>>
>>         However, these are still host instances. Can i copy host
>>         instances to
>>         instances on opencl context?
>>
>>
>>     Did you look at viennacl::copy() or viennacl::fast_copy()?
>>
>>
>>         That might be one way bypassing unnecessary (in my case)
>>         complexities of
>>         working with std::vector and std::map classes from java side.
>>
>>         But it looks like there's no copy() variation that would accept a
>>         matrix-on-host and matrix-on-opencl arguments (or rather, it of
>>         course
>>         declares those to be ambiguous since two methods fit).
>>
>>
>>     If you want to copy your OpenCL data into a viennacl::matrix, you
>>     may wrap the memory handle (obtained with .elements()) into a vector
>>     and copy that. If you have plain host data, use
>>     viennacl::fast_copy() and mind the data layout (padding of
>>     rows/columns!)
>>
>>
>>         For compressed_matrix, there seems to be a set() method, but i
>> guess
>>         this also requires CCS arrays in the device memory if I use it.
>> Same
>>         question, is there a way to send-and-wrap CCS arrays to an
>>         opencl device
>>         instance of compressed matrix without using std::map?
>>
>>
>>     Currently you have to use .set() if you want to bypass
>>     viennacl::copy() and std::map.
>>
>>     I acknowledge that the C++ type system is a pain when interfacing
>>     from other languages. We will make this much more convenient in
>>     ViennaCL 2.0. The existing interface in ViennaCL 1.x is too hard to
>>     fix without breaking lots of user code, so we won't invest time in
>>     that (contributions welcome, though :-) )
>>
>>     Best regards,
>>     Karli
>>
>>
>>
>>
>