It is hard to pinpoint the cause of the problem without a better 
understanding of the code, but the root cause appears to be a code path that 
lets you call an MPI function after MPI_Finalize has been called. From your 
description, it looks like a race condition in the code is what triggers 
that path.


On Jan 19, 2014, at 6:33 AM, thomas.fo...@ulstein.com wrote:

> Yes. It's a shared NFS partition on the nodes. 
> 
> Sent from my iPhone
> 
> > On 19 Jan 2014, at 13:29, "Reuti" <re...@staff.uni-marburg.de> wrote:
> > 
> > Hi,
> > 
> > On 18.01.2014, at 22:43, thomas.fo...@ulstein.com wrote:
> > 
> > > I have had a cluster running well for a while, and 2 days ago we 
> > > decided to upgrade it from 128 to 256 cores. 
> > > 
> > > Most of my node deployment goes through cobbler and scripting, and it 
> > > worked fine before on the first 8 nodes. 
> > 
> > Is the same version of Open MPI also installed on the new nodes?
> > 
> > -- Reuti
> > 
> > 
> > > But after adding the new nodes, everything is broken and I have no idea 
> > > why :( 
> > > 
> > > #*** The MPI_Comm_f2c() function was called after MPI_FINALIZE was 
> > > invoked. 
> > > *** This is disallowed by the MPI standard. 
> > > *** Your MPI job will now abort. 
> > > [dpn10.cfd.local:14994] Local abort after MPI_FINALIZE completed 
> > > successfully; not able to aggregate error messages, and not able to 
> > > guarantee that all other processes were killed! 
> > > *** The MPI_Comm_f2c() function was called after MPI_FINALIZE was 
> > > invoked. 
> > > *** This is disallowed by the MPI standard. 
> > > *** Your MPI job will now abort. 
> > > # 
> > > 
> > > The strange, random issue is that if I launch 8 32-core jobs, 3 end up 
> > > running while the other 5 die with this error, and it is even using a 
> > > few of the new nodes in the job. 
> > > 
> > > Any idea what is causing it? It's so random I don't know where to start. 
> > > 
> > > 
> > > ./Thomas 
> > > 
> > > This e-mail may contain confidential information, or otherwise 
> > > be protected against unauthorised use. Any disclosure, distribution or 
> > > other use of the information by anyone but the intended recipient is 
> > > strictly prohibited. 
> > > If you have received this e-mail in error, please advise the sender by 
> > > immediate reply and destroy the received documents and any copies hereof.
> > > 
> > > _______________________________________________
> > > users mailing list
> > > us...@open-mpi.org
> > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > > 
> > 
