Luis,

First of all, I wonder: To what extent is this problem reproducible? E.g., does your job always run on the same 4 nodes? Is it always the same node(s) that are slow? Does the problem also show up in other calculations (maybe just changing the number of k-points, or restarting the same case from scratch). Is it only lapw1 that is slow?

Second, how did you make those ‘top’s? As for ‘lapw0’ and ‘lapw1’, I am guessing that this is just because the snapshots were taken at different times (notice that the CPU times of lapw0 on the two nodes are quite different, too).

About the CPU usage on ‘n2’, I find this very suspicious. If it is as Peter said that the jobs are in the initialization and therefore not computing much, that may be fine; but I have to disagree with his assessment, because the memory usage of lapw1 on the two nodes is basically the same (if anything, the image sizes on ‘n2’ are slightly larger). Note also that it is *not* the case that other processes are using the CPU; the total usage is at 7.5 %.

It would be good to clarify that by getting a ‘top’ such that we know that lapw1 had been running for a while. To this end, top has an ‘-n’ option which says how many frames to output, e.g. ‘top -bn 10’.

I am also curious about the load averages. ‘n2’ has larger “mid-term” and “long-term” load averages than the others, and its “short-term” average is just as large. I am not sure what that means.

On 09/23/2015 02:21 PM, Luis Ogando wrote:
> I can not access the nodes. SSH among them is forbidden ! We have
> to ask the administrators for anything !! It is the hell !! Of
> course, only the PBS jobs can "travel" among the nodes.

I do not know about PBS Pro, but Torque and SGE have an option (I think ‘-I’ in either case) to submit an interactive job where you get a login on a node. Of course that is only a realistic option when the queuing time is not too long.
Otherwise, any information that a more sophisticated tool can give you will also be available from the command line (just more painful to extract!) via ‘top’, ‘ps’, ‘/proc’, etc. You can also put these things in a job script (which you apparently already did with ‘top’).

Good luck,

Elias
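
P.S. To make the suggestion about collecting ‘top’ output from within a job script concrete, here is a minimal sketch for a Linux node. The function name ‘snapshot_node’, the log file name and the intervals are made up for illustration; only ‘top -b -n’, ‘/proc/loadavg’ and the PBS variable ‘PBS_O_WORKDIR’ are standard. It only records the node on which the job script itself runs.

```shell
#!/bin/bash
# Sketch only: snapshot_node is a hypothetical helper, not part of WIEN2k or PBS.
# Usage: snapshot_node LOGFILE [COUNT] [INTERVAL_SECONDS]
snapshot_node() {
    local logfile=$1 count=${2:-10} interval=${3:-60}
    local i
    for ((i = 1; i <= count; i++)); do
        {
            echo "=== $(uname -n) $(date '+%F %T') snapshot $i/$count ==="
            cat /proc/loadavg          # 1-, 5- and 15-minute load averages (Linux)
            top -b -n 1 | head -n 15   # one batch-mode frame, head of the process list
        } >> "$logfile" 2>&1
        sleep "$interval"
    done
}

# In a job script, one might start it in the background before the calculation
# (assumed layout; adapt the run command to your case):
#   snapshot_node "$PBS_O_WORKDIR/node_state.log" 30 60 &
#   run_lapw -p
#   kill $! 2>/dev/null
```

Each snapshot is timestamped, so afterwards one can see whether lapw1 had already been running for a while when the CPU usage was low, and how the load averages developed during the run.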