We are running the OpenIB stack from Red Hat on a 2.6.9-34.ELsmp kernel on dual-Xeon nodes, with Open MPI v1.0.2 compiled with gcc.
We still have the problem with btl_openib_endpoint.c returning 0 byte(s) for max inline data, and we realize that another IB stack addresses it. However, a second problem appears when running across more than a single host, and it generates huge amounts of error messages. The errors look like this:

  mca_mpool_openib_register: ibv_reg_mr(0x2ac2622000,1052672) failed with error: Cannot allocate memory
  [0,1,1][btl_openib.c:496:mca_btl_openib_prepare_dst] mpool_register(0x2ac2622040,1048576) failed: base 0x2ac2222040 lb 0 offset 4194304

We fixed the /etc/security/limits.conf problem, but I don't know what to do about this one. The job seems to complete without error on 2 nodes (4 processors), but scaling any larger just generates megabyte-sized files of these error messages.

Any insights into this problem? All searches lead me to limits.conf, which we have set to 8192. These are 8G machines, if that makes any difference.

Thanks,
Bill
_______________________________________________
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general
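
One check that might help narrow this down, offered only as a sketch: memory registered with ibv_reg_mr is pinned and counts against the process's locked-memory limit (RLIMIT_MEMLOCK). Assuming the 8192 in limits.conf is the memlock entry (an assumption, the post does not say), that value is in KB, so the limit would be 8 MB. Limits from limits.conf are applied by PAM at login, so a process started by a remote launcher may see a different value. The Python script below prints the limit the process actually sees and estimates how many 1052672-byte registrations (the size in the ibv_reg_mr error above) fit under it. The helper names are hypothetical and not part of Open MPI or OpenIB.

```python
import resource

# Size of the region in the ibv_reg_mr error above, in bytes.
REGION_BYTES = 1052672


def memlock_limit_bytes():
    """Return the (soft, hard) locked-memory limits of this process, in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def describe(limit):
    """Format a limit value for printing."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit} bytes ({limit // 1024} KB)"


def registrations_that_fit(limit_bytes, region_bytes):
    """Number of pinned regions of region_bytes that fit under limit_bytes.

    Returns None when the limit is unlimited. This is a rough estimate:
    it ignores any other locked memory the process already holds.
    """
    if limit_bytes == resource.RLIM_INFINITY:
        return None
    return limit_bytes // region_bytes


if __name__ == "__main__":
    soft, hard = memlock_limit_bytes()
    print("soft memlock limit:", describe(soft))
    print("hard memlock limit:", describe(hard))
    print("regions of", REGION_BYTES, "bytes that fit:",
          registrations_that_fit(soft, REGION_BYTES))
```

Running this on each node the same way the MPI processes are started (for example through the same remote shell) would show whether the limit in limits.conf is the one in effect there. Under the assumed 8192 KB limit, only 7 regions of this size fit, which could be consistent with failures that appear only as the job scales, but that is a guess rather than a diagnosis.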