Joseph,
Indeed, there was a problem in the MXM rpm.
The fixed MXM has been published at the same location:
http://mellanox.com/downloads/hpc/mxm/v1.1/mxm-latest.tar
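In case it saves a step, here is roughly the sequence I'd expect (the install
prefix below is only a placeholder - use wherever the MXM package actually lands):
  wget http://mellanox.com/downloads/hpc/mxm/v1.1/mxm-latest.tar
  tar xf mxm-latest.tar        # then install the package it contains
  # rebuild Open MPI against it:
  ./configure --with-mxm=/opt/mellanox/mxm ...
  make && make install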
-- YK
On 12/4/2012 9:20 AM, Joseph Farran wrote:
> Hi Mike.
>
> Removed the old mxm, downloaded and installed:
>
> /tmp/mxm/v
You can also set these parameters in /etc/modprobe.conf:
options mlx4_core log_num_mtt=24 log_mtts_per_seg=1
-- YK
On 11/30/2012 2:12 AM, Yevgeny Kliteynik wrote:
> On 11/30/2012 12:47 AM, Joseph Farran wrote:
>> I'll assume: /etc/modprobe.d/mlx4_en.conf
>
> Add thes
On 11/30/2012 12:47 AM, Joseph Farran wrote:
> I'll assume: /etc/modprobe.d/mlx4_en.conf
Add these to /etc/modprobe.d/mofed.conf:
options mlx4_core log_num_mtt=24
options mlx4_core log_mtts_per_seg=1
And then restart the driver.
You need to do it on all the machines.
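For example, on each node (this assumes the MLNX_OFED openibd init script and
the usual sysfs layout - use whatever restarts the mlx4 driver on your systems):
  /etc/init.d/openibd restart
  cat /sys/module/mlx4_core/parameters/log_num_mtt       # should now show 24
  cat /sys/module/mlx4_core/parameters/log_mtts_per_seg  # should now show 1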
-- YK
>
> On 11/29/2012 0
Joseph,
On 11/29/2012 11:50 PM, Joseph Farran wrote:
> make[2]: Entering directory
> `/data/apps/sources/openmpi-1.6.3/ompi/mca/mtl/mxm'
> CC mtl_mxm.lo
> CC mtl_mxm_cancel.lo
> CC mtl_mxm_component.lo
> CC mtl_mxm_endpoint.lo
> CC mtl_mxm_probe.lo
> CC mtl_mxm_recv.lo
> CC mtl_mxm_send.lo
> CCLD
On 11/28/2012 10:52 AM, Pavel Mezentsev wrote:
> You can try downloading and installing a fresher version of MXM from mellanox
> web site. There was a thread on the list with the same problem, you can
> search for it.
Indeed, that OFED version comes with an older version of MXM.
You can get the new
------
>
Randolph,
On 9/7/2012 7:43 AM, Randolph Pullen wrote:
> Yevgeny,
> The ibstat results:
> CA 'mthca0'
> CA type: MT25208 (MT23108 compat mode)
What you have is an InfiniHost III HCA, which is a 4x SDR card.
This card has a theoretical peak of 10 Gb/s signaling rate; with the IB 8b/10b
bit encoding that is 8 Gb/s of data, i.e. about 1 GB/s.
> And more interest
------
On 9/4/2012 7:21 PM, Yong Qin wrote:
> On Tue, Sep 4, 2012 at 5:42 AM, Yevgeny Kliteynik
> wrote:
>> On 8/30/2012 10:28 PM, Yong Qin wrote:
>>> On Thu, Aug 30, 2012 at 5:12 AM, Jeff Squyres wrote:
>>>> On Aug 29, 2012, at 2:25 PM, Yong Qin wrote:
>>>>
On 8/30/2012 10:28 PM, Yong Qin wrote:
> On Thu, Aug 30, 2012 at 5:12 AM, Jeff Squyres wrote:
>> On Aug 29, 2012, at 2:25 PM, Yong Qin wrote:
>>
>>> This issue has been observed on OMPI 1.6 and 1.6.1 with openib btl but
>>> not on 1.4.5 (tcp btl is always fine). The application is VASP and
>>> onl
Randolph,
Some clarification on the setup:
"Melanox III HCA 10G cards" - are those ConnectX 3 cards configured to Ethernet?
That is, when you're using openib BTL, you mean RoCE, right?
Also, have you had a chance to try some newer OMPI release?
Any 1.6.x would do.
-- YK
On 8/31/2012 10:53 AM,
Hi,
I just noticed that my previous mail bounced,
but it doesn't matter. Please ignore it if
you got it anyway - I re-read the thread and
there is a much simpler way to do it.
If you want to check whether LID L is reachable
through HCA H from port P, you can run this command:
smpquery --Ca H
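For instance, with placeholder values standing in for H, P and L (say mlx4_0,
port 1, LID 23), the query would look something like:
  smpquery --Ca mlx4_0 --Port 1 NodeInfo 23
If the query succeeds, the LID is reachable from that HCA/port.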
On 24-Jan-12 5:59 PM, Ronald Heerema wrote:
> I was wondering if anyone can comment on the current state of support for the
> openib btl when MPI_THREAD_MULTIPLE is enabled.
Short version - it's not supported.
Longer version - no one really spent time on testing it and fixing all
the places where
On 13-Jan-12 12:23 AM, Nathan Hjelm wrote:
> I would start by adjusting btl_openib_receive_queues . The default uses
> a per-peer QP which can eat up a lot of memory. I recommend using no
> per-peer and several shared receive queues.
> We use S,4096,1024:S,12288,512:S,65536,512
And here's the FAQ
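For example, passing that value on the mpirun command line (the btl list and
the application name here are placeholders):
  mpirun --mca btl openib,sm,self \
         --mca btl_openib_receive_queues S,4096,1024:S,12288,512:S,65536,512 ./app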
Hi,
Does OMPI with IMB work OK on the official OFED release?
Do the usual ibv performance tests (ibv_rc_*) work on your customized OFED?
-- YK
On 29-Dec-11 9:34 AM, Venkateswara Rao Dokku wrote:
> Hi,
> We tried running the Intel Benchmarks(IMB_3.2) on the customized
> OFED(that was build
Hi,
> By any chance is it a particular node (or pair of nodes) this seems to
> happen with?
No. I've got 40 nodes total with this hardware configuration, and the
problem has been seen on most/all nodes at one time or another. It
doesn't seem, based on the limited numb
On 16-Dec-11 4:28 AM, Jeff Squyres wrote:
> Very strange. I have a lot of older mthca-based HCAs in my Cisco MPI test
> cluster, and I don't see these kinds of problems.
>
> Mellanox -- any ideas?
So if I understand it right, you have a mixed cluster - some
machines with the ConnectX HCA family (ml
On 05-Oct-11 3:41 PM, Jeff Squyres wrote:
> On Oct 5, 2011, at 9:35 AM, Yevgeny Kliteynik wrote:
>
>>> Yevgeny -- can you check that out?
>>
>> Yep, indeed - configure doesn't abort when "--enable-openib-rdmacm"
>> is provided and "rdma/rdma
On 05-Oct-11 3:15 PM, Jeff Squyres wrote:
>> You shouldn't use the "--enable-openib-rdmacm" option - rdmacm
>> support is enabled by default, providing librdmacm is found on
>> the machine.
>
> Actually, this might be a configure bug. We have lots of other configure
> options that, even if "foo"
Jeff,
On 01-Oct-11 1:01 AM, Konz, Jeffrey (SSA Solution Centers) wrote:
> Encountered a problem when trying to run OpenMPI 1.5.4 with RoCE over 10GbE
> fabric.
>
> Got this run time error:
>
> An invalid CPC name was specified via the btl_openib_cpc_include MCA
> parameter.
>
> Local host:
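For what it's worth, RoCE endpoints are set up through the RDMA CM, so the CPC
list has to include it - a sketch (the btl list and application are placeholders):
  mpirun --mca btl openib,sm,self --mca btl_openib_cpc_include rdmacm ./app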
On 26-Sep-11 11:27 AM, Yevgeny Kliteynik wrote:
> On 22-Sep-11 12:09 AM, Jeff Squyres wrote:
>> On Sep 21, 2011, at 4:24 PM, Sébastien Boisvert wrote:
>>
>>>> What happens if you run 2 ibv_rc_pingpong's on each node? Or N
>>>> ibv_rc_pingpongs?
>
On 22-Sep-11 12:09 AM, Jeff Squyres wrote:
> On Sep 21, 2011, at 4:24 PM, Sébastien Boisvert wrote:
>
>>> What happens if you run 2 ibv_rc_pingpong's on each node? Or N
>>> ibv_rc_pingpongs?
>>
>> With 11 ibv_rc_pingpong's
>>
>> http://pastebin.com/85sPcA47
>>
>> Code to do that => https://gist
Hi Sébastien,
If I understand you correctly, you are running your application on two
different MPIs on two different clusters with two different IB vendors.
Could you make a comparison more "apples to apples"-ish?
For instance:
- run the same version of Open MPI on both clusters
- run the same
On 14-Sep-11 12:59 PM, Jeff Squyres wrote:
> On Sep 13, 2011, at 6:33 PM, kevin.buck...@ecs.vuw.ac.nz wrote:
>
>> there have been two runs of jobs that invoked the mpirun using these
>> OpenMPI parameter setting flags (basically, these mimic what I have
>> in the global config file)
>>
>> -mca btl
This means that you have some problem on that node,
and it's probably unrelated to Open MPI.
Bad cable? Bad port? FW/driver in some bad state?
Do other IB performance tests work OK on this node?
Try rebooting the node.
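For a quick point-to-point sanity check you can use ibv_rc_pingpong
(hostnames are placeholders):
  # on the suspect node:
  ibv_rc_pingpong
  # on any other node:
  ibv_rc_pingpong suspect-node
If that already fails or is slow, the problem is below Open MPI.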
-- YK
On 12-Sep-11 7:52 AM, Ahsan Ali wrote:
> Hello all
>
> I am getting fol
On 30-Aug-11 4:50 PM, Michael Shuey wrote:
> I'm using RoCE (or rather, attempting to) and need to select a
> non-default GID to get my traffic properly classified.
You probably saw it, but just making sure:
http://www.open-mpi.org/faq/?category=openfabrics#ompi-over-roce
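The FAQ entry covers it, but in short: if I remember the MCA parameter name
correctly (please double-check with ompi_info), the GID is picked with
btl_openib_gid_index, e.g.:
  mpirun --mca btl openib,sm,self --mca btl_openib_gid_index 1 ./app
where the index 1 is only an example - use the GID that matches your VLAN/priority.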
> Both 1.4.4rc2
> and 1.
Egor,
If updating OFED doesn't solve the problem (and I kinda have the
feeling that it does), you might want to try this mailing list
for IB interoperability questions:
linux-r...@vger.kernel.org
-- YK
On 26-Aug-11 4:42 PM, Shamis, Pavel wrote:
> You may try to update your OFED version. I think
> Application Performance Tools Group
> Computer Science and Math Division
> Oak Ridge National Laboratory
>
> On Aug 1, 2011, at 11:41 AM, Yevgeny Kliteynik wrote:
>
>> Hi,
>>
>> Please try running OMPI with XRC:
>>
>> m
Hi,
Please try running OMPI with XRC:
mpirun --mca btl openib... --mca btl_openib_receive_queues
X,128,256,192,128:X,2048,256,128,32:X,12288,256,128,32:X,65536,256,128,32 ...
XRC (eXtended Reliable Connection) decreases memory consumption
of Open MPI by decreasing the number of QPs per machine.
I
On 11-Jul-11 5:23 PM, Bill Johnstone wrote:
> Hi Yevgeny and list,
>
> - Original Message -
>
>> From: Yevgeny Kliteynik
>
>> I'll check the MCA_BTL_OPENIB_TRANSPORT_UNKNOWN thing and get back to you.
>
> Thank you.
That's interesting...
T
Hi Yiguang,
On 08-Jul-11 4:38 PM, ya...@adina.com wrote:
> Hi all,
>
> The message says :
>
> [[17549,1],0][btl_openib_component.c:3224:handle_wc] from
> gulftown to: gulftown error polling LP CQ with status LOCAL
> LENGTH ERROR status number 1 for wr_id 492359816 opcode
> 32767 vendor error 10
Hi Bill,
On 08-Jul-11 7:59 PM, Bill Johnstone wrote:
> Hello, and thanks for the reply.
>
>
>
> - Original Message -
>> From: Jeff Squyres
>> Sent: Thursday, July 7, 2011 5:14 PM
>> Subject: Re: [OMPI users] InfiniBand, different OpenFabrics transport types
>>
>> On Jun 28, 2011, at 1:4
Gretchen,
Could you please send a stack trace of the processes when it hangs (with padb/gdb)?
Does the same problem persist at a small scale (2-3 nodes)?
What is the minimal setup that reproduces the problem?
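If padb is not available, plain gdb on each suspicious rank also works, e.g.:
  gdb -p <pid-of-hung-rank>
  (gdb) thread apply all bt
  (gdb) detach
  (gdb) quit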
-- YK
>
> -- Forwarded message --
> From: *Gretchen* mailto:umassastroh..
Michael,
Could you try to run this again with the "--mca mpi_leave_pinned 0" parameter?
I suspect that this might be due to a message size problem - MPI
tries to do RDMA with a message bigger than what the HCA supports.
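Something along these lines (the rest of the command line is whatever you
normally use):
  mpirun --mca mpi_leave_pinned 0 -np 2 ./your_app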
-- YK
On 11-Apr-11 7:44 PM, Michael Di Domenico wrote:
> Here's a chunk of code that
You can explicitly specify the type of buffering
that you want with the setvbuf() C function.
It can be block-buffered, line-buffered or unbuffered.
Stdout is line-buffered by default when it is connected to a terminal,
and fully buffered otherwise (e.g. when it is redirected to a file or a pipe).
To make it unbuffered, you need something like this:
setvbuf(stdout, NULL, _IONBF, 0)
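(Call setvbuf() once, at the very beginning of main() and before anything is
printed to stdout - it has to be called before the stream is first used.)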
-- YK
On 30-Mar-11