Hello John,

First, thanks for your feedback.

On 17 Dec 2010, at 16:13, John Hearns wrote:

On 17 December 2010 14:45, Gilbert Grosdidier
<gilbert.grosdid...@cern.ch> wrote:
Hello,
About this issue, for which I got NO feedback ;-)

Gilbert, as you have an SGI cluster, have you filed a support request with SGI?

gg= Yes, I filed one, but with no luck so far.

Also, which firmware do you have installed?
I have Firmware version: 2.5.0

gg= I don't know, and firmware_revs does not seem to be available.
The only thing I got on a worker node was from lspci:
03:00.0 InfiniBand: Mellanox Technologies MT26418 [ConnectX IB DDR, PCIe 2.0 5GT/s] (rev a0)


http://www.openfabrics.org/downloads/OFED/ofed-1.4/OFED-1.4-docs/mlx4_release_notes.txt

gg= Looking into this one, I noticed pointers to /etc/infiniband/connectx.conf
and /sbin/connectx_port_config, but they are not available either.


Features that are enabled with FW 2.5.0 only:
- Send with invalidate and Local invalidate send queue work requests.
- Resize CQ support.

gg= I also spotted some special hooks in the openib code for
HAVE_IBV_GET_DEVICE_LIST, HAVE_IBV_CREATE_XRC_RCV_QP and HAVE_IBV_FORK_INIT.
Could any of them be a problem with ConnectX HCAs?
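For context, HAVE_* macros like these are typically written by configure after a link test against the library, so they record that libibverbs exports a symbol, not that the ConnectX driver implements the verb behind it. A minimal Python sketch of such a link-level check follows; `library_exports` is a hypothetical helper, not part of Open MPI or OFED, and it assumes `ctypes.util.find_library("ibverbs")` can locate libibverbs on the node:

```python
import ctypes
import ctypes.util


def library_exports(libname: str, symbol: str) -> bool:
    """Report whether a shared library exports `symbol`.

    This mimics only a link-level check: it says nothing about whether the
    device driver behind the library actually implements the call.
    """
    path = ctypes.util.find_library(libname)  # e.g. "ibverbs" -> libibverbs.so.1
    if path is None:
        return False  # library not found on this node
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return False  # found but could not be loaded
    return hasattr(lib, symbol)  # a missing symbol raises AttributeError


if __name__ == "__main__":
    for sym in ("ibv_fork_init", "ibv_resize_cq", "ibv_create_xrc_rcv_qp"):
        print(sym, library_exports("ibverbs", sym))
```

Run on a worker node, a True result would only mirror what configure saw at build time, which is why a configure-time check cannot, by itself, reveal a verb that the HCA's driver fails at run time.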


Thanks,
Best,
G.






I recently spotted in the btl_openib.c code that this error message could come
from the ibv_resize_cq function being missing for the ConnectX HCA. Well...
I have not yet been able to figure out why or how this could occur, but I now
have a closely related question about the ConnectX InfiniBand HCA:
does anybody know which other IB functionalities could be unimplemented
for this HCA?
This would allow me to patch the OpenMPI code by hand accordingly,
since I currently believe these functionalities are not detected
as missing by the configure step.
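The run-time side of this can be sketched as a small classifier for the return code of a resize-CQ call. This is a hypothetical Python illustration, not the logic in btl_openib.c; it assumes that a provider lacking CQ resizing reports ENOSYS (possibly negated) and that any other non-zero code is a real failure:

```python
import errno


def classify_resize_rc(rc: int) -> str:
    """Classify the return code of a hypothetical resize-CQ call.

    Assumptions (not taken from btl_openib.c): 0 means success, ENOSYS means
    the provider does not implement resizing (some providers may return it
    negated), and any other non-zero value is a real failure.
    """
    if rc == 0:
        return "ok"
    if abs(rc) == errno.ENOSYS:
        return "unsupported"  # optional feature missing: keep the old CQ size
    return "failed"           # genuine error: report it and abort


if __name__ == "__main__":
    for rc in (0, errno.ENOSYS, -errno.ENOSYS, errno.ENOMEM):
        print(rc, classify_resize_rc(rc))
```

Under these assumptions a missing resize function could simply be tolerated, whereas a code such as ENOMEM would be reported as a genuine failure and would not be explained by a missing function alone.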
Thanks,
Best,
G.

On 15 Dec 2010, at 08:59, Gilbert Grosdidier wrote:

Hello,

Running with OpenMPI 1.4.3 on an SGI Altix cluster with 2048 cores, I got
this error message on all cores, right at startup:

btl_openib.c:211:adjust_cq] cannot resize completion queue, error: 12

What could be the culprit, please?
Is there a workaround?
Which parameter should be tuned?
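One quick check is to decode the number in the message. Assuming `error: 12` is a plain errno value (the libibverbs man page for ibv_resize_cq describes its non-zero return as an errno), 12 is ENOMEM on Linux, which would point to an allocation failure rather than an unimplemented call. A minimal Python sketch for decoding such codes on the node itself; `describe_errno` is a hypothetical helper:

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME: message' for an errno value on the current platform."""
    code = abs(code)  # tolerate a negated errno
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    print(describe_errno(12))  # the value reported by adjust_cq above
```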

Thanks in advance for any help,
Best,
G.






