Hi,

ViennaCL 1.6.0 should come out with an optimized GEMM kernel that achieves
(on large square matrices) 4 TFLOP/s SP and 1.8 TFLOP/s DP on a FirePro
W9100 (or 4 TFLOP/s SP on an R9 290x). This is for full matrices,
column-major NoTrans x Trans, or equivalently row-major Trans x NoTrans.
I have no benchmarks for the other operations at the moment... The W9100
will soon be made available to ViennaCL, though.
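
For reference, here is a minimal sketch (untested, with placeholder sizes) of
the layout/transposition combination these figures refer to, using the usual
ViennaCL types:

#include <cstddef>
#include <viennacl/matrix.hpp>
#include <viennacl/linalg/prod.hpp>

int main()
{
  typedef float ScalarType;   // SP case; use double for the DP figures
  std::size_t N = 4096;       // stand-in for "large square matrices"

  // Column-major operands: C = A * B^T, i.e. NoTrans x Trans
  viennacl::matrix<ScalarType, viennacl::column_major> A(N, N), B(N, N), C(N, N);
  C = viennacl::linalg::prod(A, viennacl::trans(B));

  // The equivalent row-major case would be C = A^T * B (Trans x NoTrans)
  // with viennacl::row_major matrices.
  return 0;
}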

Philippe


2014-07-06 16:13 GMT+02:00 Matthew Musto <[email protected]>:

> Philippe,
>
> I couldn't help but notice you have a FirePro W9100.  Any chance you have
> updated benchmarks similar to
> http://viennacl.sourceforge.net/viennacl-benchmarks.html?  I realize the
> intent of the graphical benchmark SoC project is to get more data from a
> variety of systems, but my curiosity is getting the better of me.
>
> Thanks,
> -Matt
>
>
> On Sun, Jul 6, 2014 at 7:43 AM, Philippe Tillet <[email protected]>
> wrote:
>
>> Hey,
>>
>> @Toby : I'll also try to take care of running the auto-tuner/benchmark
>> script on my HD5850 or Hawaii (it's the FirePro W9100, but you can
>> advertise an R9 290x for single precision, as it is much cheaper and uses
>> the same chip). Which one do you prefer?
>>
>>
>> @Karl : Yes, an easy fix :) Considering the portability problems we've had
>> with GEMM for ViennaCL 1.5.2, I want to be careful. There will be a better
>> default profile for CPUs/accelerators, a better default profile for GPUs in
>> general, and a better default profile for each of NVIDIA, AMD, and Intel
>> GPUs. (In the meantime, there is one crappy global default profile,
>> although the one-work-group issue is a bug anyway...)
>>
>> Philippe
>>
>>
>> 2014-07-06 13:37 GMT+02:00 Karl Rupp <[email protected]>:
>>
>>> Hey,
>>>
>>>
>>>> I made a small mistake when creating these "conservative" profiles. GEMV
>>>> runs with only one work group. I'll fix this, don't worry :)
>>>>
>>>
>>> ah, that's an easy fix then. Thanks!
>>>
>>> Best regards,
>>> Karli
>>>
>>>
>>>
>>>
>>>> 2014-07-06 13:31 GMT+02:00 Karl Rupp <[email protected]>:
>>>>
>>>>
>>>>     Hey,
>>>>
>>>>      > I'm getting on the plane in a couple of hours, so this might be
>>>>      > the last you hear from me till the middle of the night Europe time.
>>>>
>>>>     Have a good flight and enjoy Texas! :-)
>>>>
>>>>
>>>>
>>>>      >>> I suggest we start unifying in a couple of days indeed. I still
>>>>      >>> have a couple of things to merge, essentially having GEMM
>>>>      >>> dynamically generated for some cases and publishing the repo for
>>>>      >>> auto-tuning using pyviennacl. These have to be done soon so that
>>>>      >>> Toby can present some good benchmarks at the talk.
>>>>      >
>>>>      > I'm currently struggling to get decent performance out of
>>>>      > viennacl-dev master, even when not doing GEMM. Consider
>>>>      > single-precision dense GEMV using a square matrix and vector with
>>>>      > 4096 rows/cols. On the GTX 470 on krupp2, one execution takes
>>>>      > ~0.100s; on the C2050, ~0.106s. Execution overhead is about
>>>>      > 0.0003s. But NumPy with MKL takes only ~0.004s; I know that krupp2
>>>>      > has an 8(?)-core i7, so (something like) 512 rows/cols per core,
>>>>      > but I still didn't expect the gap to be like that. It's strange,
>>>>      > because my GeForce 610M takes ~0.090s, and my Intel Ivy Bridge M
>>>>      > GT2 GPU takes ~0.001s (at last competitive with MKL, though I'm
>>>>      > waiting to test correctness as I write this). And NumPy with
>>>>      > OpenBLAS on my i5 takes ~0.009s. Any hints?
>>>>
>>>>     So 16M entries with 4 bytes each need to be transferred for the
>>>>     matrix, which amounts to 64 MB of data. At an execution time of
>>>>     0.1 sec, this is equivalent to 640 MB/sec of memory bandwidth, which
>>>>     is about a factor of 100 off the peak. Way too low. I'll check this
>>>>     today, it can only be a tiny detail in the generator integration.
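>>>>
>>>>     In code form, a small self-contained sketch of this back-of-the-envelope
>>>>     estimate (matrix size and the ~0.1 sec timing as above):
>>>>
>>>>       #include <cstdio>
>>>>
>>>>       int main()
>>>>       {
>>>>         const double entries = 4096.0 * 4096.0;          // ~16.8M matrix entries
>>>>         const double bytes   = entries * sizeof(float);  // 64 MiB of matrix data
>>>>         const double seconds = 0.1;                      // measured time per GEMV
>>>>         // prints ~0.67 GB/s, i.e. the ~640 MB/sec figure above,
>>>>         // roughly a factor of 100 below the card's peak bandwidth
>>>>         std::printf("effective bandwidth: %.2f GB/s\n",
>>>>                     bytes / seconds / 1e9);
>>>>         return 0;
>>>>       }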
>>>>
>>>>
>>>>      >> These are all valid points. What about cropping this 'offset'
>>>>      >> and using something like the following:
>>>>      >>
>>>>      >> namespace viennacl { namespace linalg {
>>>>      >>
>>>>      >> void some_api_function() { ... }
>>>>      >>
>>>>      >> namespace detail
>>>>      >> {
>>>>      >>     void some_implementation_detail() { ... }
>>>>      >> }
>>>>      >>
>>>>      >> }}
>>>>      >>
>>>>      >> This would preserve the benefit of a visual separation of public
>>>>      >> API and private implementations, yet remove the 'global' indent
>>>>      >> offset from the source file.
>>>>      >
>>>>      > I like this; it's indeed what I tend to do when indentation gets
>>>>      > out of hand.
>>>>
>>>>     Thanks for the feedback :-)
>>>>
>>>>     Best regards,
>>>>     Karli
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>>
>
>
> --
> --------------------
> Matthew Musto
> [email protected]
>
