As a point of comparison, I've been running a PETSc CG algorithm on an
Nvidia K20. The simulation has 1.4e7 elements.
The PETSc AXPY takes 0.001 seconds in single precision. That's 26 GFlops.
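
As a rough sanity check on that figure, here is a minimal sketch. It assumes the usual AXPY cost of 2 flops per element (one multiply, one add), which the message does not state; the helper name `axpy_gflops` is made up for illustration. With the timing rounded to 1 ms this gives about 28 GFlops, so the quoted 26 GFlops would correspond to a measured time slightly above 1 ms.

```python
def axpy_gflops(n, seconds, flops_per_element=2):
    """Return the AXPY rate in GFlop/s for a vector of n elements.

    flops_per_element=2 is an assumption: one multiply and one add
    per element for y = a*x + y.
    """
    return flops_per_element * n / seconds / 1e9


n = 1.4e7                        # elements, from the message
rate = axpy_gflops(n, 1.0e-3)    # rate if the AXPY took exactly 1 ms
implied = 2 * n / 26e9           # time (s) that would yield 26 GFlop/s

print(f"{rate:.1f} GFlop/s at 1.0 ms")
print(f"{implied * 1e3:.2f} ms implied by 26 GFlop/s")
```

The small gap between the two numbers is consistent with the 0.001 s timing having been rounded.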
In another simulation using a double complex BiCG algorithm with 1e6
unknowns, the PETSc MatMult on the
Hi guys,
Today I got a gentle introduction to our testing machine equipped with
two Intel MICs. They are still beta, yet I could run some simple kernels
in native mode. As an example, without any modification of existing
OpenMP code for double-precision vector addition of 3e6 elements, I
Karl,
Could you get me an account as well?
I'm at INL right now, and we are trying to build libMesh with PETSc on a MIC.
Building PETSc turns out to be pretty straightforward, but libMesh is
another story.
libMesh can use TBB to parallelize FEM assembly, and I'm curious to see how
that performs.
T