[OMPI users] Issue about cm PML

2016-03-16 Thread dpchoudh .
Hello all, I have a simple test setup consisting of two Dell workstation nodes with similar hardware profiles. Both nodes have identical: 1. QLogic 4x DDR InfiniBand 2. Chelsio C310 iWARP Ethernet. Both of these cards are connected back to back, without a switch. With this setup, I can run

Re: [OMPI users] running OpenMPI jobs (either 1.10.1 or 1.8.7) on SoGE more problems

2016-03-16 Thread Ralph Castain
That’s an SGE error message - looks like your tmp file system on one of the remote nodes is full. We don’t control where SGE puts its files, but it might be that your backend nodes are having issues with us doing a tree-based launch (i.e., where each backend daemon launches more daemons along

[OMPI users] running OpenMPI jobs (either 1.10.1 or 1.8.7) on SoGE more problems

2016-03-16 Thread Lane, William
I'm getting an error message early on:
[csclprd3-0-11:17355] [[36373,0],17] plm:rsh: using "/opt/sge/bin/lx-amd64/qrsh -inherit -nostdin -V -verbose" for launching
unable to write to file /tmp/285019.1.verylong.q/qrsh_error: No space left on device
[csclprd3-6-10:18352] [[36373,0],21] plm:rsh:
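The "No space left on device" failure points at /tmp filling up on a backend node. A hedged sketch of a quick check (standard coreutils only; no SGE-specific assumptions):

```shell
# Report free space on the filesystem backing /tmp; if "Avail" is 0,
# qrsh cannot write its session/error files there, which produces
# exactly this "No space left on device" failure.
AVAIL=$(df -P /tmp | awk 'NR==2 {print $4}')
echo "/tmp has ${AVAIL} 1K-blocks available"
```

To check the remote nodes rather than the submit host, run the same command through the scheduler on each suspect node.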

Re: [OMPI users] locked memory and queue pairs

2016-03-16 Thread Cabral, Matias A
I didn't go into the code to see what is actually emitting this error message, but I suspect this may be a generic "out of memory" kind of error and not specific to the queue pair. To confirm, please add -mca pml_base_verbose 100 and -mca mtl_base_verbose 100 to see what is being
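A sketch of the suggested diagnostic run; the rank count and ./my_app are placeholders, while the two verbosity knobs are the ones named in the reply:

```shell
# pml/mtl verbosity makes Open MPI print which components are
# selected (and why others are rejected) during startup.
CMD="mpirun -np 4 -mca pml_base_verbose 100 -mca mtl_base_verbose 100 ./my_app"
echo "$CMD"
# Only attempt the run where Open MPI is actually installed.
command -v mpirun >/dev/null 2>&1 && $CMD || true
```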

Re: [OMPI users] locked memory and queue pairs

2016-03-16 Thread Michael Di Domenico
On Wed, Mar 16, 2016 at 3:37 PM, Cabral, Matias A wrote: > Hi Michael, > > I may be missing some context; if you are using the QLogic cards you will > always want to use the PSM MTL (-mca pml cm -mca mtl psm) and not the openib BTL. > As Tom suggests, confirm the limits

Re: [OMPI users] locked memory and queue pairs

2016-03-16 Thread Cabral, Matias A
Hi Michael, I may be missing some context; if you are using the QLogic cards you will always want to use the PSM MTL (-mca pml cm -mca mtl psm) and not the openib BTL. As Tom suggests, confirm the limits are set up on every node: could it be the alltoall is reaching a node that "others" are not?
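In concrete terms, the selection Matias recommends looks like this (hostfile name, rank count, and binary are placeholders):

```shell
# Force the cm PML with the PSM MTL for QLogic HCAs instead of the
# openib BTL, per the advice in the thread.
CMD="mpirun -np 32 -hostfile hosts -mca pml cm -mca mtl psm ./a.out"
echo "$CMD"
# Only attempt the run where Open MPI is actually installed.
command -v mpirun >/dev/null 2>&1 && $CMD || true
```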

Re: [OMPI users] locked memory and queue pairs

2016-03-16 Thread Michael Di Domenico
On Wed, Mar 16, 2016 at 12:12 PM, Elken, Tom wrote:
> Hi Mike,
> In this file,
> $ cat /etc/security/limits.conf
> ...
> do you see at the end:
> * hard memlock unlimited
> * soft memlock unlimited
> # -- All InfiniBand Settings End here --
> ?
Yes. I double

Re: [OMPI users] locked memory and queue pairs

2016-03-16 Thread Elken, Tom
Hi Mike, In this file:
$ cat /etc/security/limits.conf
...
do you see at the end:
* hard memlock unlimited
* soft memlock unlimited
# -- All InfiniBand Settings End here --
?
-Tom
> -----Original Message-----
> From: users [mailto:users-boun...@open-mpi.org] On Behalf Of Michael Di >
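For reference, the memlock stanza Tom is asking about typically sits at the tail of /etc/security/limits.conf and looks like this (a sketch; the surrounding comment text varies by site):

```
* hard memlock unlimited
* soft memlock unlimited
# -- All InfiniBand Settings End here --
```

Note that changes to limits.conf only take effect for new login sessions, and daemon-launched processes (e.g. under sshd or the SGE execd) only pick them up after the daemon itself is restarted.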

Re: [OMPI users] locked memory and queue pairs

2016-03-16 Thread Michael Di Domenico
On Thu, Mar 10, 2016 at 11:54 AM, Michael Di Domenico wrote: > When I try to run an Open MPI job with >128 ranks (16 ranks per node) > using alltoall or alltoallv, I'm getting an error that the process was > unable to get a queue pair. > > I've checked the max locked memory
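The limit that matters is the one the launched processes actually inherit, which can differ from what limits.conf says. A minimal sketch of the check:

```shell
# Print the max locked memory (in KiB, or "unlimited") in force
# for the current shell.
LIMIT=$(ulimit -l)
echo "max locked memory: $LIMIT"
```

To see what the MPI ranks themselves inherit, run it through the launcher on every node, e.g. `mpirun --hostfile hosts sh -c 'hostname; ulimit -l'` (hostfile name is a placeholder).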

Re: [OMPI users] Error with MPI_Register_datarep

2016-03-16 Thread Edgar Gabriel
On 3/16/2016 7:06 AM, Éric Chamberland wrote: On 2016-03-14 15:07, Rob Latham wrote: On mpich's discussion list the point was made that libraries like HDF5 and (Parallel-)NetCDF provide not only the sort of platform portability Eric desires, but also provide a self-describing file format.

Re: [OMPI users] Error with MPI_Register_datarep

2016-03-16 Thread Éric Chamberland
On 2016-03-14 15:07, Rob Latham wrote: On mpich's discussion list the point was made that libraries like HDF5 and (Parallel-)NetCDF provide not only the sort of platform portability Eric desires, but also provide a self-describing file format. ==rob But I do not agree with that. If

Re: [OMPI users] Fault tolerant feature in Open MPI

2016-03-16 Thread Husen R
In the case of an MPI application (not Gromacs), how do I relocate the MPI application from one node to another node while it is running? I'm sorry, but as far as I know the *ompi-restart* command is used to restart an application, based on a checkpoint file, once the application has already terminated (no longer
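For the record, the v1.6-era checkpoint/restart workflow being described looked roughly like this (a sketch; it requires an Open MPI build with BLCR support, and the PID and snapshot name are illustrative):

```shell
# Launch with C/R enabled, snapshot the job by the PID of mpirun,
# then restart from the snapshot (possibly on different nodes).
RUN="mpirun -np 4 -am ft-enable-cr ./my_app"
CKPT="ompi-checkpoint 12345"                        # 12345 = PID of mpirun
RESTART="ompi-restart ompi_global_snapshot_12345.ckpt"
echo "$RUN"; echo "$CKPT"; echo "$RESTART"
```

This matches Husen's point: the restart happens from a saved snapshot, not on a still-running job, so it relocates rather than live-migrates the application.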

Re: [OMPI users] Fault tolerant feature in Open MPI

2016-03-16 Thread Jeff Hammond
Just checkpoint-restart the app to relocate it. The overhead will be lower than trying to do it with MPI. Jeff On Wednesday, March 16, 2016, Husen R wrote: > Hi Jeff, > > Thanks for the reply. > > After consulting the Gromacs docs, as you suggested, Gromacs already > supports

Re: [OMPI users] Fault tolerant feature in Open MPI

2016-03-16 Thread Husen R
Hi Jeff, Thanks for the reply. After consulting the Gromacs docs, as you suggested, I found that Gromacs already supports checkpoint/restart. Thanks for the suggestion. Previously, I asked about checkpoint/restart in Open MPI because I want to checkpoint an MPI application and restart/migrate it while it is

Re: [OMPI users] Open SHMEM Error

2016-03-16 Thread Gilles Gouaillardet
Ray, from the shmem_ptr man page:

RETURN VALUES
shmem_ptr returns a pointer to the data object on the specified remote PE. If target is not remotely accessible, a NULL pointer is returned.

Since you are running your application on two hosts with one task per host, the target is not
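A hedged illustration of Gilles' point at the launcher level (hostnames and binary are placeholders): with one PE per host the target is not remotely accessible and shmem_ptr returns NULL, whereas packing both PEs onto one host makes the target's memory locally reachable.

```shell
# Two PEs on two hosts: shmem_ptr(target, other_pe) returns NULL.
CROSS_HOST="oshrun -np 2 -H nodeA,nodeB ./shmem_app"
# Two PEs on the same host: the other PE's symmetric heap can be
# mapped locally, so shmem_ptr can return a usable pointer.
SAME_HOST="oshrun -np 2 -H nodeA,nodeA ./shmem_app"
echo "$CROSS_HOST"; echo "$SAME_HOST"
```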

Re: [OMPI users] Open SHMEM Error

2016-03-16 Thread RYAN RAY
Dear Gilles, I have attached the source code and the hostfile. Regards, Ryan From: Gilles Gouaillardet gilles.gouaillar...@gmail.com Sent: Tue, 15 Mar 2016 15:44:48 To: Open MPI Users us...@open-mpi.org Subject: Re: [OMPI users] Open SHMEM Error Ryan, can you please post your source code and

Re: [OMPI users] Fault tolerant feature in Open MPI

2016-03-16 Thread Jeff Hammond
Why do you need OpenMPI to do this? Molecular dynamics trajectories are trivial to checkpoint and restart at the application level. I'm sure Gromacs already supports this. Please consult the Gromacs docs or user support for details. Jeff On Tuesday, March 15, 2016, Husen R
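Jeff's suggestion in concrete terms; Gromacs' mdrun writes checkpoints natively (a sketch with flag names per the Gromacs docs; the interval, input file, and checkpoint name are illustrative):

```shell
# Write a checkpoint every 15 minutes (-cpt); on restart, -cpi resumes
# from state.cpt, which is also how the job moves to different nodes.
RUN="gmx mdrun -s topol.tpr -cpt 15"
RESUME="gmx mdrun -s topol.tpr -cpi state.cpt"
echo "$RUN"; echo "$RESUME"
```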

[OMPI users] Fault tolerant feature in Open MPI

2016-03-16 Thread Husen R
Dear Open MPI users, Does the current stable release of Open MPI (the v1.10 series) support fault tolerance features? I got the information from the Open MPI FAQ that the checkpoint/restart support was last released as part of the v1.6 series. I just want to make sure about this. And by the way, does