Hello,

Some time ago we had an issue with the takedownnode script killing parallel
job processes.  The issue occurred if two parallel jobs owned by the same
user ran on a common node--when the takedownnode script ran for one of the
jobs, it would kill the processes of both jobs belonging to the user.

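We have run the patched script in production for quite some time, and
hopefully it will be useful to others experiencing this issue.  Instead of
calling killuser, it kills only the processes whose environment contains
the PBS_JOBID of the job being taken down.  The patch also changes
/nobackup to /scratch in the cleanup loop and drops a commented-out rm
line.  Neither I nor the actual patch author is able at this time to agree
to the licensing agreement for inclusion with xCAT, but I offer this
without warranty of any kind, etc.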
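
For anyone who wants to try the per-job process selection on its own before
touching takedownnode, here is a minimal sketch.  It is not part of the
patch: the function name job_pids is hypothetical, it assumes a Linux /proc
filesystem, and it reads /proc/<pid>/environ rather than parsing ps output.
It matches the whole PBS_JOBID=<jobid> entry, so a job ID that is only a
prefix of another job's ID is not picked up.  It only prints PIDs and kills
nothing.

```shell
#!/bin/sh
# job_pids JOBID
# Print the PID of every process whose environment contains exactly
# PBS_JOBID=JOBID.  Hypothetical helper, Linux-only (/proc/<pid>/environ).
job_pids() {
    jobid=$1
    for envfile in /proc/[0-9]*/environ
    do
        # Skip processes whose environment we are not allowed to read.
        [ -r "$envfile" ] || continue

        # Extract the PID from the path /proc/<pid>/environ.
        pid=${envfile#/proc/}
        pid=${pid%/environ}

        # environ is NUL-separated; turn it into one entry per line and
        # look for an exact, fixed-string match of the whole entry.
        # Errors from processes that exit mid-scan are discarded.
        if { tr '\0' '\n' < "$envfile"; } 2>/dev/null |
            grep -qxF "PBS_JOBID=$jobid"
        then
            echo "$pid"
        fi
    done
}
```

Running job_pids "$PBS_JOBID" by hand on a compute node shows which
processes would be selected.  If PBS_JOBID is exported in the calling
script's own environment, the caller and its children will appear in the
list too, so exclude them before killing anything.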

Cheers,
Drew.


-- --- ---- ----- ---- --- -- --- ---- ----- ---- --- -- --- ---- ----- ---- --
Drew Leske, Senior Systems Administrator and WestGrid Site Lead
Data Centre Services, University Systems, University of Victoria
Office: 250-472-5055, mobile: 250-588-4311
diff -uNr scripts.orig/takedownnode scripts/takedownnode
--- scripts.orig/takedownnode   2010-05-17 11:27:59.000000000 -0700
+++ scripts/takedownnode        2012-03-12 10:38:38.000000000 -0700
@@ -74,12 +74,18 @@
 then
        if [ "$USER" != "root" ]
        then
-               perl -pi -e "s/ $USER\b//g" /etc/security/access.conf
-               killuser $USER
+               perl -pi -e "s/ $USER\b//" /etc/security/access.conf
        fi
+
+       PROCESS_IDS="`ps jeax | grep PBS_JOBID=${PBS_JOBID} | grep -v 'grep'| awk '{print $2}'`"
+       for pid in $PROCESS_IDS
+       do
+               logger "Killing $pid belonging to job $PBS_JOBID"
+               kill -9 $pid
+       done
 fi
 
-for i in /scr /nobackup /tmp
+for i in /scr /scratch /tmp
 do
        if [ -d $i/$PBS_JOBID ]
        then
@@ -87,5 +93,3 @@
        fi
 done
 
-#rm -f /tmp/pvm[dl].$(id -u $USER) 2>&1
-
_______________________________________________
xCAT-user mailing list
https://lists.sourceforge.net/lists/listinfo/xcat-user
