We are running the OpenIB stack from Red Hat on a 2.6.9-34.ELsmp kernel
(dual Xeon), with Open MPI v1.0.2 compiled with gcc.

We still have the problem with btl_openib_endpoint.c reporting 0 byte(s)
for max inline data, and we realize that another IB stack addresses it.
However, a separate problem appears when running across more than a
single host: the job generates huge numbers of error messages.

The errors go something like this:

mca_mpool_openib_register: ibv_reg_mr(0x2ac2622000,1052672) failed with error: Cannot allocate memory
[0,1,1][btl_openib.c:496:mca_btl_openib_prepare_dst] mpool_register(0x2ac2622040,1048576) failed: base 0x2ac2222040 lb 0 offset 4194304
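
For what it's worth, the numbers in those two lines look self-consistent:
the ibv_reg_mr region appears to be the 1 MiB buffer from mpool_register
rounded out to page boundaries, and the buffer sits 4 MiB past the
reported base. A quick sketch to check the arithmetic. The 4096-byte
page size is my assumption, and page_span is a hypothetical helper of my
own, not anything from Open MPI:

```python
# Values copied from the two error lines above.
reg_addr, reg_len = 0x2ac2622000, 1052672   # arguments to ibv_reg_mr
buf_addr, buf_len = 0x2ac2622040, 1048576   # arguments to mpool_register
base, offset = 0x2ac2222040, 4194304        # "base" and "offset" fields

PAGE = 4096  # assumed page size, not stated in the log


def page_span(addr, length, page=PAGE):
    """Return (start, size) of the page-aligned region covering a buffer."""
    start = addr - (addr % page)          # round the start down to a page
    end = -(-(addr + length) // page) * page  # round the end up to a page
    return start, end - start


span_start, span_len = page_span(buf_addr, buf_len)
print(hex(span_start), span_len)
print("matches ibv_reg_mr args:", (span_start, span_len) == (reg_addr, reg_len))
print("buffer is base + offset:", buf_addr - base == offset)
```

So the failing call is for roughly 1 MiB (plus one page of alignment
slack) located 4 MiB into a larger buffer.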

We fixed the /etc/security/limits.conf problem, but I don't know what to
do about this one. The job seems to complete without error on 2 nodes
(4 processors), but scaling any larger just generates multi-megabyte
files of these error messages.

Any insights into this problem? All my searches lead back to
limits.conf, where we have the limit set to 8192. These are 8 GB
machines, if that makes any difference.
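
In case it helps anyone reproduce this, here is a minimal sketch of how
the locked-memory limit could be checked from inside a process. It
assumes that the 8192 in our limits.conf is the memlock value (which
limits.conf expresses in KB) and that memory registration counts against
RLIMIT_MEMLOCK; fits_memlock is a hypothetical helper, and 1052672 is
the registration size from the error above:

```python
import resource


def fits_memlock(nbytes, limit=None):
    """Return True if nbytes fits under the given locked-memory limit.

    With no limit given, use the soft RLIMIT_MEMLOCK of this process.
    """
    if limit is None:
        limit, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return limit == resource.RLIM_INFINITY or nbytes <= limit


# What this process actually sees (values are in bytes).
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print("memlock soft/hard:", soft, hard)

# Assumption: 8192 in limits.conf means 8192 KB of lockable memory.
assumed_limit = 8192 * 1024
reg_len = 1052672  # size of one failing registration, from the log

print("one registration fits:", fits_memlock(reg_len, assumed_limit))
print("eight registrations fit:", fits_memlock(8 * reg_len, assumed_limit))
```

Running something like this under mpirun on each node would show whether
the value from limits.conf is really in effect for the remotely launched
processes, which may not be the same as what an interactive login shell
reports.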

Thanks,
Bill
