Nick Maclaren,
Yes, this is a hard problem, but it is not endemic to Open MPI. It
hints at the distributed memory/process/thread issues that surface
either inside the various OSs or, alternatively, outside them in many
solution spaces.
Jeff Squyres' statement that "flexible dynamic processing is not..."
makes sense to me.
On Thu, 2011-02-17 at 08:49 -0700, Barrett, Brian W wrote:
> Why did "we" make this change? It was originally this way, and we changed it
> to the no-auth way for a reason.
>
> Brian
>
>
> - Original Message -
> From: Jeff Squyres [mailto:jsquy...@cisco.com]
> Sent:
Please, do.
On Wed, 2011-03-09 at 15:58 -0500, Jeff Squyres wrote:
> Crud. It's specifically listed in the NEWS, but somehow it didn't get
> included in the tarball. I'll investigate.
>
> Should we do a 1.5.3 in the immediate future with the affinity extension?
>
--
Kenneth A. Lloyd
Director of Systems Science
Thanks! I'll put it through its paces ASAP.
On Fri, 2011-03-11 at 12:34 -0500, Jeff Squyres wrote:
> The only difference is the addition of the "affinity" MPI extension that was
> missing in 1.5.2.
>
> It seems to be ok for me. If no one else finds any problems, I'll release it
> Sunday or so
Rolf,
I haven't had a chance to review the code yet, but how do these changes
relate to CUDA 4.0, especially the UVA and GPUDirect 2.0
implementations?
Ken
On Wed, 2011-04-13 at 09:47 -0700, Rolf vandeVaart wrote:
> WHAT: Add support to send data directly from CUDA device memory via
> MPI calls.
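A minimal sketch of what the proposed support would let an application
write, assuming a CUDA-aware build of the library (counts, tags, and
ranks here are illustrative):

    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv)
    {
        int rank;
        float *dbuf;                     /* device memory, not host */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        cudaMalloc((void **)&dbuf, 1024 * sizeof(float));

        if (rank == 0) {
            /* The device pointer goes straight to MPI; no explicit
               cudaMemcpy staging through host memory. */
            MPI_Send(dbuf, 1024, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(dbuf, 1024, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        }

        cudaFree(dbuf);
        MPI_Finalize();
        return 0;
    }

The point being that dbuf is never staged through host memory by the
application itself.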
George, yes. GPUDirect eliminated an additional (host) memory buffering
step between the HCA and the GPU that cost CPU cycles.
I was never very comfortable with the kernel patch it required, nor the
patched OFED needed to make it all work. Having said that, it did
provide a ~14% improvement in th...
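For context, a hedged sketch of the staging path in question: without
GPUDirect, device data passed through two distinct host buffers before
the HCA could DMA it; with GPUDirect v1, CUDA and the IB stack can share
one pinned buffer. The helper functions and buffer names are
hypothetical:

    #include <string.h>
    #include <mpi.h>
    #include <cuda_runtime.h>

    /* Hypothetical pre-GPUDirect path: device data staged through two
       distinct host buffers before the HCA can DMA it. */
    static void send_without_gpudirect(float *dbuf, float *cuda_pinned,
                                       float *ib_registered, int count,
                                       int dst)
    {
        cudaMemcpy(cuda_pinned, dbuf, count * sizeof(float),
                   cudaMemcpyDeviceToHost);
        /* The host-to-host copy (and the CPU cycles) that GPUDirect
           removes: */
        memcpy(ib_registered, cuda_pinned, count * sizeof(float));
        MPI_Send(ib_registered, count, MPI_FLOAT, dst, 0, MPI_COMM_WORLD);
    }

    /* With GPUDirect v1, CUDA and the IB stack can register the same
       pinned buffer, so a single device-to-host copy suffices. */
    static void send_with_gpudirect(float *dbuf, float *shared_pinned,
                                    int count, int dst)
    {
        cudaMemcpy(shared_pinned, dbuf, count * sizeof(float),
                   cudaMemcpyDeviceToHost);
        MPI_Send(shared_pinned, count, MPI_FLOAT, dst, 0, MPI_COMM_WORLD);
    }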
Point well made, Nick. In other words, irrespective of OS or language,
are we citing the need for "application-correcting code" from Open MPI
(relocate and/or retry), similar to ECC in memory?
Ken
On Thu, 2011-04-14 at 14:31 +0100, N.M. Maclaren wrote:
> On Apr 14 2011, Ralph Castain wrote:
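As a footnote on what "relocate and/or retry" could mean in practice: a
minimal sketch of application-correcting code at the MPI level,
assuming the application opts out of the default abort-on-error
handler (the single-retry policy is illustrative):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, out = 42, in = 0, rc;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Return error codes instead of aborting the job, so the
           application can correct and continue -- the ECC analogy. */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        /* Deadlock-free ring exchange. */
        rc = MPI_Sendrecv(&out, 1, MPI_INT, (rank + 1) % size, 0,
                          &in, 1, MPI_INT, (rank + size - 1) % size, 0,
                          MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (rc != MPI_SUCCESS) {
            /* Application-correcting code: retry once, then give up. */
            rc = MPI_Sendrecv(&out, 1, MPI_INT, (rank + 1) % size, 0,
                              &in, 1, MPI_INT, (rank + size - 1) % size, 0,
                              MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return rc == MPI_SUCCESS ? 0 : 1;
    }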
I'd suggest supporting CUDA device queries in carto and hwloc.
Ken
On Thu, 2011-04-14 at 11:25 -0400, Jeff Squyres wrote:
> On Apr 13, 2011, at 12:47 PM, Rolf vandeVaart wrote:
>
> > By default, the code is disabled and has to be configured into the library:
> > --with-cuda(=DIR) Build...
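To make the carto/hwloc suggestion concrete, a sketch of the per-GPU
data such a query would collect, using plain CUDA runtime calls; the
PCI fields are what a topology layer needs for locality mapping:

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int i, n = 0;

        cudaGetDeviceCount(&n);
        for (i = 0; i < n; i++) {
            struct cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            /* PCI domain/bus IDs let a topology layer hang the GPU
               off the right host bridge / NUMA node. */
            printf("GPU %d: %s (PCI %04x:%02x)\n",
                   i, prop.name, prop.pciDomainID, prop.pciBusID);
        }
        return 0;
    }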
Thanks Rolf. We'll try it out.
Ken
On Tue, 2011-04-19 at 13:45 -0700, Rolf vandeVaart wrote:
> WHAT: Second try to add support to send data directly from CUDA device
> memory via MPI calls.
>
> TIMEOUT: 4/26/2011
>
> DETAILS: Based on all the feedback (thanks to everyone who look...
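One CUDA 4.0/UVA detail worth noting here: under unified virtual
addressing, a library can ask the runtime whether a given pointer is
host or device memory and choose a code path accordingly. A sketch,
using the CUDA 4.x-era attribute field name (is_device_pointer is a
hypothetical helper):

    #include <cuda_runtime.h>

    /* Returns nonzero if ptr refers to device memory under UVA.
       memoryType is the CUDA 4.x-era field name for this attribute. */
    static int is_device_pointer(const void *ptr)
    {
        struct cudaPointerAttributes attr;
        if (cudaPointerGetAttributes(&attr, (void *)ptr) != cudaSuccess)
            return 0;   /* unknown to CUDA: treat as host memory */
        return attr.memoryType == cudaMemoryTypeDevice;
    }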
Before I jump in, is anyone already actively working in this area?
Ken
Thanks. I've read your (Joshua Hursey's) Ph.D. thesis on fault
tolerance using checkpointing with much interest. It would be of
further interest to gather the range of possible user requirements for
defining the behaviors expected in response to various faults.
Ken Lloyd
On Fri, 2011-04-22 at 1
Rolf,
We have identified a need in certain high-performance applications to
specify placement down the hierarchy: memory sections -> L3 -> L2 -> L1
-> specific core -> specific CPU -> specific machine. These tend to be
hybridized CUDA apps where other sections of the CPU handle non-CUDA
(non-GPU) functions; a sketch follows below.
In our di...
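A sketch of the kind of placement control meant above, expressed
against the hwloc API (the core index is illustrative and error
handling is omitted; machine-level placement would still fall to the
resource manager):

    #include <hwloc.h>

    int main(void)
    {
        hwloc_topology_t topo;
        hwloc_obj_t core;

        hwloc_topology_init(&topo);
        hwloc_topology_load(topo);

        /* Pick a specific core; its cpuset implies the L1/L2/L3
           caches below it and the socket (CPU) above it. */
        core = hwloc_get_obj_by_type(topo, HWLOC_OBJ_CORE, 0);
        if (core != NULL)
            hwloc_set_cpubind(topo, core->cpuset, HWLOC_CPUBIND_THREAD);

        hwloc_topology_destroy(topo);
        return 0;
    }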
Josh and Wesley,
Will you be presenting Resilient ORTE at Resilience 2011 in Bordeaux?
http://xcr.cenit.latech.edu/resilience2011/
=
Kenneth A. Lloyd
CEO - Director of Systems Science
Watt Systems Technologies Inc.
www.wattsys.com
kenneth.ll...@wattsys.com
> ...that has occurred, we'll probably be
> close to what we would call a "resilient" system.
>
> Until then, we are improving, but still far from "resilient".
>
> On Jun 24, 2011, at 10:24 AM, Ken Lloyd wrote: