Hard to say what the cause of the problem is without a better understanding of the code, but the root cause appears to be a code path that allows an MPI function to be called after MPI_Finalize has completed. From your description, it sounds like a race condition in the code is activating that code path.
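For illustration only (this is not taken from your code), here is a minimal C sketch of the pattern that produces this abort, plus a defensive guard using MPI_Finalized(); the shutdown_hook() routine is a hypothetical stand-in for whatever cleanup path your application runs:

#include <mpi.h>

/* Hypothetical cleanup routine, e.g. registered with atexit() or run by
 * a library destructor. If it runs after MPI_Finalize -- or races with
 * it from another thread -- any MPI call inside it, including
 * MPI_Comm_f2c(), triggers exactly the abort in your logs. */
static void shutdown_hook(MPI_Fint fortran_comm)
{
    int finalized = 0;
    MPI_Finalized(&finalized);  /* one of the few calls legal after finalize */
    if (finalized)
        return;                 /* MPI is gone; skip all MPI calls */

    MPI_Comm comm = MPI_Comm_f2c(fortran_comm);  /* safe only before finalize */
    /* ... use comm ... */
    (void)comm;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    MPI_Fint fcomm = MPI_Comm_c2f(MPI_COMM_WORLD);

    shutdown_hook(fcomm);   /* fine: MPI is still initialized */
    MPI_Finalize();
    shutdown_hook(fcomm);   /* without the MPI_Finalized() guard, this aborts */
    return 0;
}

Note that if the hook can run concurrently with MPI_Finalize, the check alone is not sufficient; you also need to serialize the shutdown (e.g. with a mutex, or by ordering the teardown). A timing-dependent race like that would explain why only some of your 8 jobs die.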
On Jan 19, 2014, at 6:33 AM, thomas.fo...@ulstein.com wrote:

> Yes. It's a shared NFS partition on the nodes.
>
> Sent from my iPhone
>
> > On 19 Jan 2014, at 13:29, "Reuti" <re...@staff.uni-marburg.de> wrote:
> >
> > Hi,
> >
> > On 18.01.2014 at 22:43, thomas.fo...@ulstein.com wrote:
> >
> > > I have had a cluster running well for a while, and 2 days ago we
> > > decided to upgrade it from 128 to 256 cores.
> > >
> > > Most of my deployment of nodes goes through Cobbler and scripting, and it
> > > has worked fine before on the first 8 nodes.
> >
> > Is the same version of Open MPI also installed on the new nodes?
> >
> > -- Reuti
> >
> > > But after adding the new nodes, everything is fucked up and I have no idea
> > > why :(
> > >
> > > *** The MPI_Comm_f2c() function was called after MPI_FINALIZE was
> > > invoked.
> > > *** This is disallowed by the MPI standard.
> > > *** Your MPI job will now abort.
> > > [dpn10.cfd.local:14994] Local abort after MPI_FINALIZE completed
> > > successfully; not able to aggregate error messages, and not able to
> > > guarantee that all other processes were killed!
> > > *** The MPI_Comm_f2c() function was called after MPI_FINALIZE was
> > > invoked.
> > > *** This is disallowed by the MPI standard.
> > > *** Your MPI job will now abort.
> > >
> > > The strange, random issue is that if I launch 8 32-core jobs, 3 end up
> > > running while the other 5 die with this error, and it's even using a few
> > > of the new nodes in the job.
> > >
> > > Any idea what is causing it? It's so random I don't know where to start...
> > >
> > > ./Thomas