Hello everyone,

I am facing a problem when calling mpirun in a loop under SGE. My SGE version
is SGE6.1AR_snapshot3. The script I am submitting via SGE is:

xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
let i=0
while [ $i -lt 100 ]
do
        echo "############################################################################################"
        echo "Iteration :$i"
        /usr/local/openmpi-1.2.4/bin/mpirun -np $NP -hostfile $TMP/machines send
        let "i+=1"
        echo "############################################################################################"
done
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Now
the above script runs well for 15-20 iterations and then fails with the
following message:

-------------------------Error Message-------------------------------------------------------------------
error: executing task of job 3869 failed: execution daemon on host "n101" didn't accept task
[n199:11989] ERROR: A daemon on node n101 failed to start as expected.
[n199:11989] ERROR: There may be more information available from
[n199:11989] ERROR: the 'qstat -t' command on the Grid Engine tasks.
[n199:11989] ERROR: If the problem persists, please restart the
[n199:11989] ERROR: Grid Engine PE job
[n199:11989] ERROR: The daemon exited unexpectedly with status 1.
-----------------------------------------------------------------------------------------------------------

When
I ssh to n101, there is no orted or qrsh_starter running. While checking
its spool file, I came across the following message:

-----------------------------------------------Execd spool Error Message---------------------------------
|execd|n101|E|no free queue for job 3869 of user neeraj@n199 (localhost = n101)
-----------------------------------------------------------------------------------------------------------

What could be the reason for it?

While checking
the mailing list, I came across the following link:

http://www.open-mpi.org/community/lists/users/2007/03/2771.php

but I don't think it's the same problem. Any help is appreciated.

Regards
Neeraj
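
To pin down exactly which iteration fails, a variant of the loop that stops at the first non-zero exit status of mpirun may help. The following is only a minimal sketch, not the script that produced the errors above. The name run_until_failure is a hypothetical helper (it is not part of SGE or Open MPI), and the commented usage line assumes that $NP and $TMP/machines are provided by the job environment as in the original script.

```shell
#!/bin/bash
# Sketch only: run a command repeatedly and stop at the first failure,
# reporting the failing iteration and its exit status.
# run_until_failure is a hypothetical helper, not an SGE or Open MPI tool.
run_until_failure() {
    local max=$1          # maximum number of iterations
    shift                 # the remaining arguments are the command to run
    local i=0
    local status=0
    while [ "$i" -lt "$max" ]
    do
        echo "Iteration :$i"
        "$@"
        status=$?
        if [ "$status" -ne 0 ]
        then
            # Report on stderr and stop so the node state can be inspected
            echo "Iteration $i failed with exit status $status" >&2
            return "$status"
        fi
        i=$((i + 1))
    done
    return 0
}

# Intended use inside the SGE job script (assumption: $NP and
# $TMP/machines are set by the job environment):
# run_until_failure 100 /usr/local/openmpi-1.2.4/bin/mpirun -np "$NP" -hostfile "$TMP/machines" send
```

Stopping at the first failure keeps the failing iteration number next to the SGE error output, which makes it easier to match against the execd spool messages on the affected node.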
