Please have a look at the end of case.outputup_* which gives the real cpu and wall times and post those. It may be that the times being reported are misleading.
In addition, I do not understand why you are seeing an error and the script is continuing - it should not. Maybe some of the tasks are not working or there are bugs in the csh. It may be useful to post the dayfile.

---------------------------
Professor Laurence Marks
Department of Materials Science and Engineering
Northwestern University
www.numis.northwestern.edu 1-847-491-3996
"Research is to see what everybody else has seen, and to think what nobody else has thought"
Albert Szent-Gyorgi

On May 3, 2013 6:47 PM, "Oliver Albertini" <o...@georgetown.edu> wrote:

> Thanks to you both for the suggestions. The OS was recently updated
> beyond those versions mentioned in the link (now 6100-08).
>
> Adding the iostat statement to all the errclr.f files prevents the
> program from stopping altogether, although error messages still appear
> in the output:
>
> STOP LAPW0 END
> STOP LAPW0 END
> STOP LAPW0 END
> STOP LAPW0 END
> STOP LAPW0 END
> STOP LAPW1 - Error
> STOP LAPW1 END
> STOP LAPW1 END
> STOP LAPW1 END
> STOP LAPW1 END
> STOP LAPW1 - Error
> STOP LAPW1 END
> STOP LAPW1 END
> STOP LAPW1 END
> STOP LAPW1 END
> STOP LAPW2 - FERMI; weighs written
> STOP LAPW2 END
> STOP LAPW2 END
> STOP LAPW2 END
> STOP LAPW2 END
> STOP LAPW2 END
> STOP SUMPARA END
> STOP LAPW2 - FERMI; weighs written
> STOP LAPW2 END
> STOP LAPW2 END
> STOP LAPW2 END
> STOP LAPW2 END
> STOP LAPW2 END
> STOP SUMPARA END
> STOP CORE END
> STOP CORE END
> STOP MIXER END
>
> which are more prevalent when using higher processor counts.
> After completing a few runs with more processors, the times have
> continually increased:
>
>             real        user        sys
>   serial    6m43.33s    6m19.18s    0m13.59s
>   2proc     10m36.03s   1m4.68s     0m47.79s
>   4proc     11m11.25s   1m5.24s     0m52.17s
>   8proc     11m39.17s   1m6.18s     1m10.65s
>   16proc    14m31.16s   1m7.95s     2m7.63s
>
> After looking into various IBM Parallel Operating Environment (poe)
> environmental variables (MP_SHARED_MEMORY, MP_IO_BUFFER_SIZE,
> MP_EAGER_LIMIT) it seems like none of them are improving performance.
> Any ideas why this is getting slower?
>
>
> On Thu, May 2, 2013 at 8:49 PM, Gavin Abo <gs...@crimson.ua.edu> wrote:
>
>>> STOP LAPW0 END
>>> "inilpw.f", line 233: 1525-142 The CLOSE statement on unit 200 cannot be
>>> completed because an errno value of 2 (A file or directory in the path name
>>> does not exist.) was received while closing the file. The program will
>>> stop.
>>> STOP LAPW1 END
>>
>> If this is on operating system AIX 6.1
>> [http://zeus.theochem.tuwien.ac.at/pipermail/wien/2013-March/018560.html],
>> the following link mentions that a fix might be needed for some release
>> levels:
>>
>> http://www-01.ibm.com/support/docview.wss?uid=isg1IZ23555
>> _______________________________________________
>> Wien mailing list
>> Wien@zeus.theochem.tuwien.ac.at
>> http://zeus.theochem.tuwien.ac.at/mailman/listinfo/wien
>> SEARCH the MAILING-LIST at:
>> http://www.mail-archive.com/wien@zeus.theochem.tuwien.ac.at/index.html
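
As a rough sanity check on the real/user/sys figures quoted above, one can compare the CPU time (user + sys) with the wall-clock (real) time for each run. The following is only a minimal Python sketch: the function names `to_seconds`, `cpu_fraction` and `summarize` are hypothetical helpers, and the numbers are simply copied from the quoted message. It also assumes that the reported user and sys times cover all of the work, which may not hold if tasks are launched remotely, so a low ratio is a hint rather than a diagnosis.

```python
import re

# Timings copied from the quoted message: (label, real, user, sys).
TIMINGS = [
    ("serial", "6m43.33s", "6m19.18s", "0m13.59s"),
    ("2proc", "10m36.03s", "1m4.68s", "0m47.79s"),
    ("4proc", "11m11.25s", "1m5.24s", "0m52.17s"),
    ("8proc", "11m39.17s", "1m6.18s", "1m10.65s"),
    ("16proc", "14m31.16s", "1m7.95s", "2m7.63s"),
]


def to_seconds(stamp):
    """Convert a time-style string such as '6m43.33s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time format: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def cpu_fraction(real, user, sys_):
    """Return (user + sys) / real, i.e. the share of wall time spent on CPU."""
    return (to_seconds(user) + to_seconds(sys_)) / to_seconds(real)


def summarize(timings):
    """Return (label, real seconds, cpu seconds, cpu fraction) for each run."""
    rows = []
    for label, real, user, sys_ in timings:
        cpu = to_seconds(user) + to_seconds(sys_)
        rows.append((label, to_seconds(real), cpu, cpu_fraction(real, user, sys_)))
    return rows


if __name__ == "__main__":
    for label, real_s, cpu_s, frac in summarize(TIMINGS):
        print(f"{label:>7}  real {real_s:8.2f} s  cpu {cpu_s:8.2f} s  cpu/real {frac:5.2f}")
```

With these figures the serial run spends nearly all of its wall time on the CPU, while the parallel runs show a much smaller ratio. That would be consistent with time going into waiting (I/O, communication, or failed tasks) rather than computation, and is another reason to check the times at the end of the output files and the dayfile.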