On Jan 19, 2007, at 6:19 PM, Arif Ali wrote:

> [0,1,59][btl_openib_component.c:1153:btl_openib_component_progress] from node16 to: node02 error polling HP CQ with status REMOTE ACCESS ERROR
> status number 10 for wr_id 268919352 opcode 256614836
> mpirun noticed that job rank 0 with PID 0 on node node02 exited on
> signal 15 (Terminated).
> 55 additional processes aborted (not shown)
Does this happen with btl_openib_flags=1? Does this also happen without
this setting? This doesn't happen with OpenMPI-1.2b3, right?


That's correct. I tried all the flags that were suggested, and a few more, which I listed in previous mails.

I can parse your text either way, so forgive me for belaboring the point:

- Does this happen with btl_openib_flags=1 on the nightly snapshot of OMPI v1.2?
- Does this happen without setting btl_openib_flags on the nightly snapshot of OMPI v1.2?
- What is the exact version of the nightly snapshot for OMPI v1.2 that you are using?

Yes, correct: this doesn't happen with 1.2b3.

Good to know.

Were you able to experiment with the various MCA parameters that I described in the other mail to see if such problems went away? (i.e., ensure that you're not running out of DMA-able memory)

--
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems
