OK, I was finally able to get on and run some OFED tests. It looks to me like I must have something configured wrong with the QLogic cards, but I have no idea what.
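
If it helps, the port settings on the two card types can be compared with something like the following (device names qib0 and mlx4_0 are just examples; use whatever ibstat reports on your nodes):

# list state, LID and rate for every HCA/port
ibstat
# compare link width/speed and active MTU between the two device types
ibv_devinfo -v -d qib0
ibv_devinfo -v -d mlx4_0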

Mellanox to QLogic:
ibv_rc_pingpong n15
  local address:  LID 0x0006, QPN 0x240049, PSN 0x87f83a, GID ::
  remote address: LID 0x000d, QPN 0x00b7cb, PSN 0xcc9dee, GID ::
8192000 bytes in 0.01 seconds = 4565.38 Mbit/sec
1000 iters in 0.01 seconds = 14.35 usec/iter

ibv_srq_pingpong n15
  local address:  LID 0x0006, QPN 0x280049, PSN 0xf83e06, GID ::
 ...
8192000 bytes in 0.01 seconds = 9829.91 Mbit/sec
1000 iters in 0.01 seconds = 6.67 usec/iter

ibv_uc_pingpong n15
  local address:  LID 0x0006, QPN 0x680049, PSN 0x7b33d2, GID ::
  remote address: LID 0x000d, QPN 0x00b7ed, PSN 0x7fafaa, GID ::
8192000 bytes in 0.02 seconds = 4080.19 Mbit/sec
1000 iters in 0.02 seconds = 16.06 usec/iter

QLogic to QLogic:

ibv_rc_pingpong n15
  local address:  LID 0x000b, QPN 0x00afb7, PSN 0x3f08df, GID ::
  remote address: LID 0x000d, QPN 0x00b7ef, PSN 0xd15096, GID ::
8192000 bytes in 0.02 seconds = 3223.13 Mbit/sec
1000 iters in 0.02 seconds = 20.33 usec/iter

ibv_srq_pingpong n15
  local address:  LID 0x000b, QPN 0x00afb9, PSN 0x9cdde3, GID ::
 ...
8192000 bytes in 0.01 seconds = 9018.30 Mbit/sec
1000 iters in 0.01 seconds = 7.27 usec/iter

ibv_uc_pingpong n15
  local address:  LID 0x000b, QPN 0x00afd9, PSN 0x98cfa0, GID ::
  remote address: LID 0x000d, QPN 0x00b811, PSN 0x0a0d6e, GID ::
8192000 bytes in 0.02 seconds = 3318.28 Mbit/sec
1000 iters in 0.02 seconds = 19.75 usec/iter

Mellanox to Mellanox:

ibv_rc_pingpong n5
  local address:  LID 0x0009, QPN 0x240049, PSN 0xd72119, GID ::
  remote address: LID 0x0006, QPN 0x6c0049, PSN 0xc1909e, GID ::
8192000 bytes in 0.01 seconds = 7121.93 Mbit/sec
1000 iters in 0.01 seconds = 9.20 usec/iter

ibv_srq_pingpong n5
  local address:  LID 0x0009, QPN 0x280049, PSN 0x78f4f7, GID ::
...
8192000 bytes in 0.00 seconds = 24619.08 Mbit/sec
1000 iters in 0.00 seconds = 2.66 usec/iter

ibv_uc_pingpong n5
  local address:  LID 0x0009, QPN 0x680049, PSN 0x4002ea, GID ::
  remote address: LID 0x0006, QPN 0x300049, PSN 0x29abf0, GID ::
8192000 bytes in 0.01 seconds = 7176.52 Mbit/sec
1000 iters in 0.01 seconds = 9.13 usec/iter


On 07/17/11 05:49, Jeff Squyres wrote:
Interesting.

Try with the native OFED benchmarks -- i.e., get MPI out of the way and see if 
the raw/native performance of the network between the devices reflects the same 
dichotomy.

(e.g., ibv_rc_pingpong)
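
For reference, the usual invocation is roughly the following -- start the server side first with no host argument, then point the client at it; the -d option to pin a particular HCA is only needed on nodes with more than one, and the device names here are placeholders:

# on the server node (e.g. n15), started first and left waiting
ibv_rc_pingpong -d qib0
# on the client node, pointing at the server
ibv_rc_pingpong -d mlx4_0 n15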


On Jul 15, 2011, at 7:58 PM, David Warren wrote:

All nodes are on OFED 1.4 and kernel 2.6.32 (that's what I can get to today).
qib to qib:

# OSU MPI Latency Test v3.3
# Size            Latency (us)
0                         0.29
1                         0.32
2                         0.31
4                         0.32
8                         0.32
16                        0.35
32                        0.35
64                        0.47
128                       0.47
256                       0.50
512                       0.53
1024                      0.66
2048                      0.88
4096                      1.24
8192                      1.89
16384                     3.94
32768                     5.94
65536                     9.79
131072                   18.93
262144                   37.36
524288                   71.90
1048576                 189.62
2097152                 478.55
4194304                1148.80

# OSU MPI Bandwidth Test v3.3
# Size        Bandwidth (MB/s)
1                         2.48
2                         5.00
4                        10.04
8                        20.02
16                       33.22
32                       67.32
64                      134.65
128                     260.30
256                     486.44
512                     860.77
1024                   1385.54
2048                   1940.68
4096                   2231.20
8192                   2343.30
16384                  2944.99
32768                  3213.77
65536                  3174.85
131072                 3220.07
262144                 3259.48
524288                 3277.05
1048576                3283.97
2097152                3288.91
4194304                3291.84

# OSU MPI Bi-Directional Bandwidth Test v3.3
# Size     Bi-Bandwidth (MB/s)
1                         3.10
2                         6.21
4                        13.08
8                        26.91
16                       41.00
32                       78.17
64                      161.13
128                     312.08
256                     588.18
512                     968.32
1024                   1683.42
2048                   2513.86
4096                   2948.11
8192                   2918.39
16384                  3370.28
32768                  3543.99
65536                  4159.99
131072                 4709.73
262144                 4733.31
524288                 4795.44
1048576                4753.69
2097152                4786.11
4194304                4779.40

mlx4 to mlx4:
# OSU MPI Latency Test v3.3
# Size            Latency (us)
0                         1.62
1                         1.66
2                         1.67
4                         1.66
8                         1.70
16                        1.71
32                        1.75
64                        1.91
128                       3.11
256                       3.32
512                       3.66
1024                      4.46
2048                      5.57
4096                      6.62
8192                      8.95
16384                    11.07
32768                    15.94
65536                    25.57
131072                   44.93
262144                   83.58
524288                  160.85
1048576                 315.47
2097152                 624.68
4194304                1247.17

# OSU MPI Bandwidth Test v3.3
# Size        Bandwidth (MB/s)
1                         1.80
2                         4.21
4                         8.79
8                        18.14
16                       35.79
32                       68.58
64                      132.72
128                     221.89
256                     399.62
512                     724.13
1024                   1267.36
2048                   1959.22
4096                   2354.26
8192                   2519.50
16384                  3225.44
32768                  3227.86
65536                  3350.76
131072                 3369.86
262144                 3378.76
524288                 3384.02
1048576                3386.60
2097152                3387.97
4194304                3388.66

# OSU MPI Bi-Directional Bandwidth Test v3.3
# Size     Bi-Bandwidth (MB/s)
1                         1.70
2                         3.86
4                        10.42
8                        20.99
16                       41.22
32                       79.17
64                      151.25
128                     277.64
256                     495.44
512                     843.44
1024                    162.53
2048                   2427.23
4096                   2989.63
8192                   3587.58
16384                  5391.08
32768                  6051.56
65536                  6314.33
131072                 6439.04
262144                 6506.51
524288                 6539.51
1048576                6558.34
2097152                6567.24
4194304                6555.76

mixed:
# OSU MPI Latency Test v3.3
# Size            Latency (us)
0                         3.81
1                         3.88
2                         3.86
4                         3.85
8                         3.92
16                        3.93
32                        3.93
64                        4.02
128                       4.60
256                       4.80
512                       5.14
1024                      5.94
2048                      7.26
4096                      8.50
8192                     10.98
16384                    19.92
32768                    26.35
65536                    39.93
131072                   64.45
262144                  106.93
524288                  191.89
1048576                 358.31
2097152                 694.25
4194304                1429.56

# OSU MPI Bandwidth Test v3.3
# Size        Bandwidth (MB/s)
1                         0.64
2                         1.39
4                         2.76
8                         5.58
16                       11.03
32                       22.17
64                       43.70
128                     100.49
256                     179.83
512                     305.87
1024                    544.68
2048                    838.22
4096                   1187.74
8192                   1542.07
16384                  1260.93
32768                  1708.54
65536                  2180.45
131072                 2482.28
262144                 2624.89
524288                 2680.55
1048576                2728.58
the run never gets past this point

# OSU MPI Bi-Directional Bandwidth Test v3.3
# Size     Bi-Bandwidth (MB/s)
1                         0.41
2                         0.83
4                         1.68
8                         3.37
16                        6.71
32                       13.37
64                       26.64
128                      63.47
256                     113.23
512                     202.92
1024                    362.48
2048                    578.53
4096                    830.31
8192                   1143.16
16384                  1303.02
32768                  1913.07
65536                  2463.83
131072                 2793.83
262144                 2918.32
524288                 2987.92
1048576                3033.31
the run never gets past this point



On 07/15/11 09:03, Jeff Squyres wrote:
I don't think too many people have done combined QLogic + Mellanox runs, so 
this probably isn't a well-explored space.

Can you run some microbenchmarks to see what kind of latency / bandwidth you're 
getting between nodes of the same type and nodes of different types?
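
For example, with the OSU suite already built, something along these lines (hostnames are placeholders; run one same-type pair and one mixed pair):

# two nodes with the same HCA type
mpirun -np 2 -host nodeA,nodeB ./osu_latency
mpirun -np 2 -host nodeA,nodeB ./osu_bw
# one Mellanox node plus one QLogic node
mpirun -np 2 -host nodeA,nodeC ./osu_latency
mpirun -np 2 -host nodeA,nodeC ./osu_bw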

On Jul 14, 2011, at 8:21 PM, David Warren wrote:


On my test runs (a WRF run just long enough to get beyond the spin-up influence):
On just 6 of the old mlx4 machines I get about 00:05:30 runtime.
On 3 mlx4 and 3 qib nodes I get an average of 00:06:20.
So the slowdown is about 11+%.
On a full run, 11% becomes a very long time. This has held for some longer tests as well, before I went to OFED 1.6.

On 07/14/11 05:55, Jeff Squyres wrote:

On Jul 13, 2011, at 7:46 PM, David Warren wrote:



I finally got access to the systems again (the original ones are part of our real-time system). I thought I would try one other test I had set up first. I went to OFED 1.6 and it started running with no errors. It must have been an OFED bug. Now I just have the speed problem. Does anyone have a way to make the mixture of mlx4 and QLogic cards work together without slowing down?


What do you mean by "slowing down"?


