Also: when I am using a wrapping constructor to initialize a MAIN_MEMORY
matrix around a preexisting row-major buffer, and then subsequently try to
use this matrix, I get the message:
ViennaCL: Internal memory error: not initialised!
Why?
On Wed, Jul 13, 2016 at 2:01 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:
> So fast_copy still copies the memory and has copying overhead, even with
> MAIN_MEMORY context?
>
> Is there a way to do a shallow copy (i.e. just pointer initialization)
> into the matrix data buffer? Isn't that what some constructors of matrix or
> matrix_base do?
>
> What I am getting at: it looks like I am incurring significant overhead
> just for copying -- actually, double overhead -- once when I prepare the
> padding required by internal_size1()/internal_size2(), and again when I
> pass the buffer into fast_copy(), which apparently copies once more, even
> when we are using host-memory matrices.
>
> All in all, by my estimates this copying back and forth (which, granted,
> is not greatly optimized on our side) takes ~15..17 seconds out of the
> 60 seconds total when multiplying 10k x 10k dense arguments via ViennaCL.
> I also compile with -march=haswell and -ffast-math; without those I seem
> to fall too far behind what R + OpenBLAS can do in this test. Without
> those optimizations for non-compliant arithmetic, my processing time
> swells to 2 minutes.
>
> If I can wrap the buffer and avoid copying for the MAIN_MEMORY context, I'd
> be shaving off another 10% or so of the execution time. That would make me
> happier, as I would probably be able to beat OpenBLAS given custom CPU
> architecture flags.
>
> On the other hand, BIDMat (which allegedly uses MKL) does the same test,
> in double precision, in under 10 seconds. I can't fathom how, but it does.
> I have a Haswell-E platform.
>
> Thank you,
> Dmitriy
>
> On Tue, Jul 12, 2016 at 9:27 AM, Karl Rupp <r...@iue.tuwien.ac.at> wrote:
>
>> Hi,
>>
>>> One question: you mentioned padding for the `matrix` type. When I
>>> initialize the `matrix` instance, I only specify dimensions. How do I
>>> know the padding values?
>>>
>>
>> If you want to provide your own padded dimensions, consider using
>> matrix_base directly. If you want to query the padded dimensions, use
>> internal_size1() and internal_size2() for the internal number of rows and
>> columns.
>>
>> http://viennacl.sourceforge.net/doc/manual-types.html#manual-types-matrix
>>
>> Best regards,
>> Karli
>>
>>
>>
>>
>>> On Tue, Jul 12, 2016 at 5:53 AM, Karl Rupp <r...@iue.tuwien.ac.at
>>> <mailto:r...@iue.tuwien.ac.at>> wrote:
>>>
>>> Hi Dmitriy,
>>>
>>> On 07/12/2016 07:17 AM, Dmitriy Lyubimov wrote:
>>>
>>> Hi,
>>>
>>> I am trying to create some elementary wrappers for VCL in
>>> JavaCPP.
>>>
>>> Everything goes fine, except I would really rather not use those
>>> "cpu" types (std::map, std::vector) and instead initialize matrices
>>> directly by feeding them row-major or CCS formats.
>>>
>>> I see that the matrix() constructor accepts this form of
>>> initialization, but it states that it does the "wrapping" for
>>> device memory.
>>>
>>>
>>> Yes, the constructors either create their own memory buffer
>>> (zero-initialized) or wrap an existing buffer. These are the only
>>> reasonable options.
>>>
>>>
>>> Now, I can create a host matrix() using host memory and row-major
>>> packing. This seems to work OK.
>>>
>>> However, these are still host instances. Can I copy host
>>> instances to
>>> instances in an OpenCL context?
>>>
>>>
>>> Did you look at viennacl::copy() or viennacl::fast_copy()?
>>>
>>>
>>> That might be one way of bypassing the (in my case unnecessary)
>>> complexities of
>>> working with the std::vector and std::map classes from the Java side.
>>>
>>> But it looks like there's no copy() variation that would accept a
>>> matrix-on-host and a matrix-on-OpenCL argument (or rather, the
>>> compiler of course
>>> declares those ambiguous, since two overloads fit).
>>>
>>>
>>> If you want to copy your OpenCL data into a viennacl::matrix, you
>>> may wrap the memory handle (obtained with .elements()) into a vector
>>> and copy that. If you have plain host data, use
>>> viennacl::fast_copy() and mind the data layout (padding of
>>> rows/columns!).
>>>
>>>
>>> For compressed_matrix, there seems to be a set() method, but I
>>> guess
>>> this also requires the CCS arrays to be in device memory if I use it.
>>> Same
>>> question: is there a way to send-and-wrap CCS arrays to an
>>> OpenCL-device
>>> instance of compressed_matrix without using std::map?
>>>
>>>
>>> Currently you have to use .set() if you want to bypass
>>> viennacl::copy() and std::map.
>>>
>>> I acknowledge that the C++ type system is a pain when interfacing
>>> from other languages. We will make this much more convenient in
>>> ViennaCL 2.0. The existing interface in ViennaCL 1.x is too hard to
>>> fix without breaking lots of user code, so we won't invest time in
>>> that (contributions welcome, though :-) )
>>>
>>> Best regards,
>>> Karli
>>>
>>>
>>>
>>>
>>
>
_______________________________________________
ViennaCL-devel mailing list
ViennaCL-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/viennacl-devel