Re: OpenBLAS and performance

2017-12-22 Thread Ricardo Wurmus
Pjotr Prins writes: >> > If I compile for a target it >> > makes a large difference. >> >> The FAQ document[1] says this: >> >> The environment variable which control the kernel selection is >> OPENBLAS_CORETYPE (see driver/others/dynamic.c) e.g. export >>

Re: OpenBLAS and performance

2017-12-22 Thread Pjotr Prins
On Fri, Dec 22, 2017 at 04:10:39PM +0100, Ludovic Courtès wrote: > Static binding has a cost, as you write, but it gives us control over > the environment, and the ability to capture and replicate the software > environment. As a user, that’s something I value a lot. > I’d also argue that this

Re: OpenBLAS and performance

2017-12-22 Thread Ludovic Courtès
Hi, Dave Love skribis: > Ludovic Courtès writes: > >> Hello, >> >> Dave Love skribis: >> >>> Fedora sensibly builds separately-named libraries for different flavours >>> , but I'd

Re: OpenBLAS and performance

2017-12-22 Thread Dave Love
For what it's worth, I get 37000 Mflops from the dgemm.goto benchmark using the current Guix openblas and OPENBLAS_NUM_THREADS=1 at a size of 7000 on a laptop with "i5-6200U CPU @ 2.30GHz" (avx2). That looks about right, and it should more-or-less plateau at that size. For comparison, I get
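
For reference, a Mflops figure for a square dgemm is conventionally derived from an operation count of 2n^3. The sketch below is only that arithmetic; the function names are hypothetical, and the run time it prints is implied by the figures quoted above (size 7000, 37000 Mflops), not measured.

```python
def dgemm_mflops(n, seconds):
    """Mflops for an n x n x n dgemm that took `seconds` (assumes 2*n^3 flops)."""
    flops = 2.0 * n ** 3
    return flops / seconds / 1e6


def dgemm_seconds(n, mflops):
    """Run time implied by a given Mflops rate, under the same 2*n^3 assumption."""
    return 2.0 * n ** 3 / (mflops * 1e6)


if __name__ == "__main__":
    # Figures taken from the message above: n = 7000, 37000 Mflops.
    print(f"{dgemm_seconds(7000, 37000):.1f} s")
```

At that rate a single multiplication of size 7000 takes roughly 18.5 seconds.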

Re: OpenBLAS and performance

2017-12-22 Thread Dave Love
Ludovic Courtès writes: > Hello, > > Dave Love skribis: > >> Fedora sensibly builds separately-named libraries for different flavours >> , but I'd >> argue also for threaded versions being

Re: OpenBLAS and performance

2017-12-22 Thread Dave Love
Ricardo Wurmus writes: >> I was confused. I see the only version of the library shipped is built >> with pthreads. I think there should be serial, pthreads, and OpenMP >> versions, as for Fedora. > > Do these library variants have the same binary interface, so that a user >

Re: OpenBLAS and performance

2017-12-21 Thread Ricardo Wurmus
Hi Dave, > I wrote: > >> If you do provide some sort of threaded version for Python, then as far >> as I remember it must use pthreads, not OpenMP, though you want the >> OpenMP version for other purposes, and I hadn't realized there wasn't >> one currently. > > I was confused. I see the only

Re: OpenBLAS and performance

2017-12-21 Thread Ricardo Wurmus
Dave Love writes: > Another point about the OB package is that it excludes LAPACK for some > reason that doesn't seem to be recorded. I think that should be > included, partly for convenience, and partly because it optimizes some > of LAPACK. That was me, I think. I did this

Re: OpenBLAS and performance

2017-12-21 Thread Dave Love
Eric Bavier writes: > Related only to this specific case of BLAS libraries, and not to the > general idea of optimized libraries: > I recently discovered "FlexiBLAS" from the Max Planck Institute > https://www.mpi-magdeburg.mpg.de/projects/flexiblas which I thought >

Re: OpenBLAS and performance

2017-12-21 Thread Dave Love
Ricardo Wurmus writes: > Hi Pjotr, > >> I was just stating that the default openblas package does not perform >> well (it is single threaded, for one). > > Is it really single-threaded? I remember having a couple of problems > with OpenBLAS on our cluster when it is used

Re: OpenBLAS and performance

2017-12-21 Thread Ludovic Courtès
Hello, Dave Love skribis: > Fedora sensibly builds separately-named libraries for different flavours > , but I'd > argue also for threaded versions being available with the generic soname > in library sub-directories.

Re: OpenBLAS and performance

2017-12-21 Thread Ludovic Courtès
Pjotr Prins skribis: > On Wed, Dec 20, 2017 at 07:15:16PM +0100, Ricardo Wurmus wrote: [...] >> The FAQ document[1] says this: >> >> The environment variable which control the kernel selection is >> OPENBLAS_CORETYPE (see driver/others/dynamic.c) e.g. export >>

Re: OpenBLAS and performance

2017-12-21 Thread Pjotr Prins
On Thu, Dec 21, 2017 at 12:02:55AM +0100, Ricardo Wurmus wrote: > > Pjotr Prins writes: > > > On Wed, Dec 20, 2017 at 09:00:46PM +0100, Ricardo Wurmus wrote: > >> > I do think we need to default to a conservative openblas for general > >> > use. Question is how we

Re: OpenBLAS and performance

2017-12-20 Thread Eric Bavier
On Wed, 20 Dec 2017 21:32:15 +0100 Pjotr Prins wrote: > On Wed, Dec 20, 2017 at 09:00:46PM +0100, Ricardo Wurmus wrote: > > > I do think we need to default to a conservative openblas for general > > > use. Question is how we make it fly on dedicated hardware. > > >

Re: OpenBLAS and performance

2017-12-20 Thread Ricardo Wurmus
Pjotr Prins writes: > On Wed, Dec 20, 2017 at 09:00:46PM +0100, Ricardo Wurmus wrote: >> > I do think we need to default to a conservative openblas for general >> > use. Question is how we make it fly on dedicated hardware. >> >> Have you tried preloading the special

Re: OpenBLAS and performance

2017-12-20 Thread Pjotr Prins
On Wed, Dec 20, 2017 at 09:00:46PM +0100, Ricardo Wurmus wrote: > > I do think we need to default to a conservative openblas for general > > use. Question is how we make it fly on dedicated hardware. > > Have you tried preloading the special library with LD_PRELOAD? It is not a question of what

Re: OpenBLAS and performance

2017-12-20 Thread Ricardo Wurmus
Hi Pjotr, > I was just stating that the default openblas package does not perform > well (it is single threaded, for one). Is it really single-threaded? I remember having a couple of problems with OpenBLAS on our cluster when it is used with Numpy as both would spawn lots of threads. The

Re: OpenBLAS and performance

2017-12-20 Thread Pjotr Prins
On Wed, Dec 20, 2017 at 07:15:16PM +0100, Ricardo Wurmus wrote: > Is it really single-threaded? I remember having a couple of problems > with OpenBLAS on our cluster when it is used with Numpy as both would > spawn lots of threads. The solution was to limit OpenBLAS to at most > two threads.
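
A minimal sketch of the thread cap described above, assuming the program is started as a child process. OPENBLAS_NUM_THREADS and OPENBLAS_CORETYPE are the environment variables named in this thread; the helper name openblas_env and the example values are hypothetical and not part of OpenBLAS or Guix.

```python
import os


def openblas_env(num_threads, coretype=None, base=None):
    """Return a copy of an environment with OpenBLAS settings applied.

    Hypothetical helper for illustration only.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    # Copy so that the caller's environment is left untouched.
    env = dict(os.environ if base is None else base)
    # Cap the number of threads OpenBLAS may spawn.
    env["OPENBLAS_NUM_THREADS"] = str(num_threads)
    if coretype is not None:
        # Kernel selection; assumed to matter only for builds that
        # choose kernels at run time.
        env["OPENBLAS_CORETYPE"] = coretype
    return env


if __name__ == "__main__":
    env = openblas_env(2)
    print(env["OPENBLAS_NUM_THREADS"])
```

The resulting dictionary can be passed as the env argument of subprocess.run. As far as I know the variables have to be set before the library is loaded, which is why the sketch prepares the environment for a new process rather than changing it after NumPy has been imported.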

Re: OpenBLAS and performance

2017-12-20 Thread Pjotr Prins
On Wed, Dec 20, 2017 at 02:48:42PM +, Dave Love wrote: > I wrote: > > > If you do provide some sort of threaded version for Python, then as far > > as I remember it must use pthreads, not OpenMP, though you want the > > OpenMP version for other purposes, and I hadn't realized there wasn't >

Re: OpenBLAS and performance

2017-12-20 Thread Dave Love
I wrote: > If you do provide some sort of threaded version for Python, then as far > as I remember it must use pthreads, not OpenMP, though you want the > OpenMP version for other purposes, and I hadn't realized there wasn't > one currently. I was confused. I see the only version of the

Re: OpenBLAS and performance

2017-12-20 Thread Dave Love
Pjotr Prins writes: > The last weeks I have been toying with OpenBlas and tweaking it boosts > performance magnificently over the standard install we do now. How so? I haven't measured it from Guix, but I have with Fedora packages, and OB is basically equivalent to

Re: OpenBLAS and performance

2017-12-19 Thread Ludovic Courtès
Pjotr Prins skribis: > The last weeks I have been toying with OpenBlas and tweaking it boosts > performance magnificently over the standard install we do now. A > configuration for Haswell looks like: > > >

OpenBLAS and performance

2017-12-19 Thread Pjotr Prins
Over the last few weeks I have been toying with OpenBLAS, and tweaking it boosts performance magnificently over the standard install we do now. A configuration for Haswell looks like: https://gitlab.com/genenetwork/guix-bioinformatics/blob/master/gn/packages/gemma.scm#L64 It will benefit python-numpy