Re: [gridengine users] May I build hybrid SGE?

2015-12-17 Thread Steven Du
Hi Reuti, Please ignore my previous email. I am getting closer to my goal by re-compiling the source. I have finished compiling successfully, except for the missing PAM (as I am not a root user, I cannot install PAM at the moment). However, it appears to work, and Qexecd is communicating with the 64-bit QMASTER now, and …

Re: [gridengine users] May I build hybrid SGE?

2015-12-17 Thread Steven Du
Hi Reuti, Thank you very much for the explanation and for pointing me to the valuable learning resource. I ran into a very weird issue. My running environment: one 64-bit QMASTER and a few 64-bit QEXECDs running well, plus one 32-bit Qexecd on 32-bit SLES10SP4. The 32-bit Qexecd works fine if it joins an all-32-bit SGE environmen…

Re: [gridengine users] GE 2011.11p1: got no connection within 60 seconds

2015-12-17 Thread Kamel Mazouzi
Hi, `mpirun` (Intel) is just a wrapper for mpdboot + mpiexec + mpdallexit, while `mpiexec.hydra` is the new Intel MPI process spawner, which has been tightly integrated with Grid Engine since version 4.3.1. Regards, On Thu, Dec 17, 2015 at 4:19 PM, Reuti wrote: > Maybe `mpirun` doesn't support/use Hydr…
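For context on what tight integration buys you: Hydra obtains the granted host list from Grid Engine itself, so neither an mpd ring nor a machinefile is needed. A minimal job-script sketch, assuming the PE name and Intel MPI install path mentioned elsewhere in this thread (verify both against your site before use):

```shell
#!/bin/sh
#$ -S /bin/sh
#$ -cwd
#$ -pe mpich_unstaged 16   # PE name taken from this thread; adjust locally

# Path follows the install reported later in the thread; verify on your site.
. /share/apps/intel/2013.0.028/impi/4.1.0.024/intel64/bin/mpivars.sh

# Under tight SGE integration, mpiexec.hydra discovers the granted hosts
# on its own; $NSLOTS is set by Grid Engine from the PE request above.
mpiexec.hydra -np "$NSLOTS" ./hello_world
```

Note there is no mpdboot/mpdallexit pair around the launch; that is the practical difference from the mpd-based `mpirun` wrapper described above.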

Re: [gridengine users] GE 2011.11p1: got no connection within 60 seconds

2015-12-17 Thread Reuti
Maybe `mpirun` doesn't support/use Hydra. Although not required, the MPI standard specifies `mpiexec` as a portable startup mechanism. Doesn't Intel MPI also have an `mpiexec`, which would match the `mpirun` behavior (and doesn't use Hydra)? -- Reuti > On 17.12.2015 at 15:06, Gowtham wrote: …

Re: [gridengine users] GE 2011.11p1: got no connection within 60 seconds

2015-12-17 Thread Gowtham
Yes sir. `mpirun` and `mpiexec.hydra` are both from the Intel Cluster Studio suite. To make sure of this, I ran a quick batch job with `which mpirun` and `which mpiexec.hydra`, and it returned /share/apps/intel/2013.0.028/impi/4.1.0.024/intel64/bin/mpirun and /share/apps/intel/2013.0.028/impi/4.1.0.024/i…
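The check Gowtham describes (confirming both launchers resolve into the same installation) can be scripted. A small sketch; `same_bindir` is a hypothetical helper, not part of Intel MPI or SGE, and the paths simply follow the ones reported in this thread:

```shell
# Report whether two binaries live in the same bin directory.
# (dirname only inspects the path string, so the files need not exist here.)
same_bindir() {
    if [ "$(dirname "$1")" = "$(dirname "$2")" ]; then
        echo "same"
    else
        echo "different"
    fi
}

same_bindir \
    /share/apps/intel/2013.0.028/impi/4.1.0.024/intel64/bin/mpirun \
    /share/apps/intel/2013.0.028/impi/4.1.0.024/intel64/bin/mpiexec.hydra
# prints "same"
```

If this ever prints "different", the two launchers come from different MPI installs and mixed behavior like the one in this thread is a likely consequence.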

Re: [gridengine users] GE 2011.11p1: got no connection within 60 seconds

2015-12-17 Thread Reuti
> On 17.12.2015 at 13:41, Gowtham wrote: > > I tried replacing the call to mpirun with mpiexec.hydra and it seems to work successfully as before. Please find below the contents of the *.sh.o file corresponding to the Hello, World! run spanning more than one compute node: Are both `mpi…

Re: [gridengine users] May I build hybrid SGE?

2015-12-17 Thread Reuti
> On 17.12.2015 at 13:19, Steven Du wrote: > > Hi Reuti/Joshua, > > I tried but failed. Here I wonder (I am using version 2011.11p1): > > 1. Do all grid members have to share the spooling dir via NFS, such as $SGE_CELL/common? Otherwise the client cannot be started. I doubt it, but once I u…

Re: [gridengine users] GE 2011.11p1: got no connection within 60 seconds

2015-12-17 Thread Gowtham
I tried replacing the call to mpirun with mpiexec.hydra, and it seems to work as successfully as before. Please find below the contents of the *.sh.o file corresponding to the Hello, World! run spanning more than one compute node: Parallel version of 'Go Huskies!' with 16 processors …

Re: [gridengine users] GE 2011.11p1: got no connection within 60 seconds

2015-12-17 Thread Gowtham
Here you go, Sir. These two PEs were created by me (not from Rocks) to help our researchers pick one depending on the nature of their job. If a software suite required that all processors/cores belong to the same physical compute node (e.g., MATLAB with Parallel Computing Toolbox), then they wo…
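The single-node-versus-spanning split Gowtham describes is normally expressed through the PE's allocation_rule. A hedged sketch of what a single-node PE could look like in `qconf -sp` output (the name and slot count are illustrative, not the poster's actual configuration):

```
pe_name            smp
slots              512
allocation_rule    $pe_slots    # all requested slots on one physical host
control_slaves     TRUE
job_is_first_task  FALSE
```

A PE meant to span compute nodes would instead use `$fill_up` or `$round_robin` as the allocation_rule, as with the mpich_unstaged PE discussed later in this thread.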

Re: [gridengine users] May I build hybrid SGE?

2015-12-17 Thread Steven Du
Hi Reuti/Joshua, I tried but failed. Here I wonder (I am using version 2011.11p1): 1. Do all grid members have to share the spooling dir via NFS, such as $SGE_CELL/common? Otherwise the client cannot be started. I doubt it, but once I used a different name, sge_execd could not be started. 2. This…

Re: [gridengine users] GE 2011.11p1: got no connection within 60 seconds

2015-12-17 Thread Reuti
> On 16.12.2015 at 21:32, Gowtham wrote: > > Hi Reuti, > > The MPI associated with Intel Cluster Studio 2013.0.028 is 4.1.0.024, and I do not need mpdboot. The PE used for this purpose is called mpich_unstaged (basically, a copy of the original mpich with the '$fill_up' rule). The only …
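Creating a PE like mpich_unstaged as "a copy of the original with the '$fill_up' rule" can be sketched as a text transform. In practice the input would come from `qconf -sp mpich` and the result would be loaded with `qconf -Ap` (both require manager rights); here the PE definition is inlined with illustrative values so the transform itself can be shown:

```shell
# Stand-in for the output of `qconf -sp mpich`; field values are illustrative.
cat > mpich.pe <<'EOF'
pe_name            mpich
slots              999
allocation_rule    $round_robin
EOF

# Rename the PE and swap its allocation rule; everything else is kept as-is.
sed -e 's/^pe_name .*/pe_name            mpich_unstaged/' \
    -e 's/^allocation_rule .*/allocation_rule    $fill_up/' \
    mpich.pe > mpich_unstaged.pe

cat mpich_unstaged.pe
```

With `$fill_up`, Grid Engine packs slots onto hosts until each is full before moving to the next, which suits MPI jobs that are allowed to span nodes.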