Re: [OMPI users] segfault when resuming on different host

2011-12-29 Thread Lloyd Brown
Josh, when I use cr_{run,checkpoint,restart} to start, checkpoint, and restart a single-threaded, single-process app on a different host, it works, even with prelinking enabled. That's kinda why I assumed the problem was with the OpenMPI code, and didn't look at the BLCR FAQ that closely, to be h

Re: [OMPI users] segfault when resuming on different host

2011-12-29 Thread Josh Hursey
Often this type of problem is due to the 'prelink' option in Linux. BLCR has a FAQ item that discusses this issue and how to resolve it: https://upc-bugs.lbl.gov/blcr/doc/html/FAQ.html#prelink. I would give that a try. If that does not help, then you might want to try checkpointing a single (non-M

[OMPI users] segfault when resuming on different host

2011-12-29 Thread Lloyd Brown
Hi, all. I'm in the middle of testing some of the checkpoint/restart capabilities of OpenMPI with BLCR on our cluster. I've been able to checkpoint and restart successfully when I restart on the same nodes the job was running on previously. But when I try to restart on a different host, I always get