RE: [openib-general] ip over ib throughtput

2005-01-06 Thread Stephen Poole
First, I love Hardware Reliability. 1QP per node, this might be fine for small clusters, but what about larger clusters, where I have an all-to-all communications pattern? What about say *IF* something like IB was ever designed into a BG/L

RE: [openib-general] OpenSM died a horrible death

2005-01-06 Thread Eitan Zahavi
Hi Shahaf, The asserts are in: osm_lid_mgr.c:968: CL_ASSERT( p_mgr->p_subn->sm_port_guid ); osm_lid_mgr.c:1011: CL_ASSERT( p_mgr->p_subn->sm_port_guid ); osm_mcast_mgr.c:1150: CL_ASSERT( port_guid ); osm_port.c:977: CL_ASSERT(

RE: [openib-general] OpenSM died a horrible death

2005-01-06 Thread Hal Rosenstock
On Thu, 2005-01-06 at 10:11, Eitan Zahavi wrote: The asserts are in: osm_lid_mgr.c:968: CL_ASSERT( p_mgr->p_subn->sm_port_guid ); osm_lid_mgr.c:1011: CL_ASSERT( p_mgr->p_subn->sm_port_guid ); osm_mcast_mgr.c:1150: CL_ASSERT( port_guid ); osm_port.c:977: CL_ASSERT( port_guid );

Re: [openib-general] Some Missing Features from mthca/user MAD access

2005-01-06 Thread Michael S. Tsirkin
Hello! Quoting r. Hal Rosenstock ([EMAIL PROTECTED]) Re: [openib-general] Some Missing Features from mthca/user MAD access: On Thu, 2005-01-06 at 05:53, Michael S. Tsirkin wrote: Hello, Roland! Quoting r. Roland Dreier ([EMAIL PROTECTED]) Re: [openib-general] Some Missing Features from

Re: [openib-general] Some Missing Features from mthca/user MAD access

2005-01-06 Thread Hal Rosenstock
On Thu, 2005-01-06 at 10:32, Michael S. Tsirkin wrote: Making sure the bit is cleared is indeed important. It is simple when SM exits properly. Hopefully something can occur at process cleanup time to ensure this does happen even in the case where the SM dies. Otherwise a special utility

Re: [openib-general] Some Missing Features from mthca/user MAD access

2005-01-06 Thread Roland Dreier
The simplest approach I think is, keep some file open, when it's closed (which happens automatically when the process dies) clean the is_sm bit. This does seem to be the simplest approach. However, there are two issues I'm still trying to figure out: - Where should the file be? Do we really

Re: [openib-general] Some Missing Features from mthca/user MAD access

2005-01-06 Thread Hal Rosenstock
On Thu, 2005-01-06 at 10:56, Roland Dreier wrote: I don't understand this. The (logical) port state shows everything the LEDs show: a state of INIT means one LED is on, ACTIVE means both are on. What if it doesn't get to INIT? Also, the justification for doing this in the kernel is that

Re: [openib-general] Some Missing Features from mthca/user MAD access

2005-01-06 Thread Michael S. Tsirkin
Hello! Quoting r. Hal Rosenstock ([EMAIL PROTECTED]) Re: [openib-general] Some Missing Features from mthca/user MAD access: On Thu, 2005-01-06 at 11:01, Roland Dreier wrote: The simplest approach I think is, keep some file open, when it's closed (which happens automatically when the

Re: [openib-general] Some Missing Features from mthca/user MAD access

2005-01-06 Thread Roland Dreier
I wonder if sysfs can be used for this somehow. Not as we're discussing, because all the file operations are already set by the sysfs code. However, is it so bad to make the existing cap_mask sysfs file writable and just say that userspace has to clean up if the SM exits uncleanly? - R.

[openib-general] suggestions for features

2005-01-06 Thread Sean Hefty
While working on the CM, I came up with the following list of features that could be useful. I don't have time currently to implement any of these properly, but they're probably worth discussing. * It would be nice to be able to take a received MAD and turn it around as a send. * When

RE: [openib-general] ip over ib throughtput

2005-01-06 Thread Michael Krause
At 04:43 AM 1/6/2005, Diego Crupnicoff wrote: I feel like we are talking about different things here: The ***IP*** MTU is relevant for IPoIB performance because it determines the number of times that you are going to be hit by the per-packet overhead of the ***host*** networking stack. My

Re: [openib-general] Some Missing Features from mthca/user MAD access

2005-01-06 Thread Michael S. Tsirkin
Hello! Quoting r. Roland Dreier ([EMAIL PROTECTED]) Re: [openib-general] Some Missing Features from mthca/user MAD access: I wonder if sysfs can be used for this somehow. Not as we're discussing, because all the file operations are already set by the sysfs code. I know, I was thinking

Re: [openib-general] ip over ib throughtput

2005-01-06 Thread Ronald G. Minnich
On Thu, 6 Jan 2005, Grant Grundler wrote: That's a limitation of linux. Linux drivers assume physically contiguous pages are available for anything that crosses a page boundary. KISS when it works but not robust. yeah, I know, freebsd never had this problem ... FWIW, I had the impression

Re: [openib-general] Some Missing Features from mthca/user MAD access

2005-01-06 Thread Hal Rosenstock
On Thu, 2005-01-06 at 12:38, Michael S. Tsirkin wrote: After consideration, I think the proper way is to add a reference count and clean is_sm when it falls to 0. Why is a reference count needed? (Just want to understand). This way running two opensms on the same HCA and two different HCAs in

[openib-general] Sockets Extensions Completed

2005-01-06 Thread Michael Krause
FYI. The specification can be found at: http://www.opengroup.org/bookstore/catalog/c050.htm Use of this new interface will enable Sockets based applications to fully exploit the performance of RDMA interconnects through the SDP wire protocol. This API also provides explicit memory

Re: [openib-general] Some Missing Features from mthca/user MAD access

2005-01-06 Thread Roland Dreier
I am trying to say it could be transparent. you could be running two sms and it could work more or less. So for example opensm hangs. If I kill it, it will clear the is_sm bit, *but* I don't want that. The way to do it could be to start a new one, then kill the old one. How can two SMs on the

Re: [openib-general] Re: mstflint failing on sparc64

2005-01-06 Thread Roland Dreier
I know. But where does lspci get the domain number? From sysfs -- lspci goes through the entries in /sys/bus/pci/devices. If you strace lspci on a modern distro, you can see it doesn't open anything in /proc. Cool, how do *they* look in /proc/bus/pci/devices? As you can see from show_device

Re: [openib-general] Some Missing Features from mthca/user MAD access

2005-01-06 Thread Roland Dreier
They don't have to work, for example I can CTRL-Z one of them and start another one. Unfortunately this won't work with the current MAD layer. The first SM will register an agent to receive SM class MADs, and the second SM will fail because the agent is already registered. - R.

Re: [openib-general] Some Missing Features from mthca/user MAD access

2005-01-06 Thread Sean Hefty
Roland Dreier wrote: They don't have to work, for example I can CTRL-Z one of them and start another one. Unfortunately this won't work with the current MAD layer. The first SM will register an agent to receive SM class MADs, and the second SM will fail because the agent is already registered.

[openib-general] Examples...

2005-01-06 Thread St. Clair, Timothy (GE Healthcare)
Are there any code examples + docs available for the additions to the 2.6 kernel updates? I am somewhat familiar with the VAPI interface from Mellanox, but I am uncertain what changes or modifications may be necessary given the new additions. Also, are you aware of any docs which help to

[openib-general] SRQ field in CM messages

2005-01-06 Thread Sean Hefty
I've been coding the CM messages, and just setting the SRQ field in them based on whether a QP has a SRQ. My guess is that this will work fine, but my question is does anyone know why the CM or remote QP cares about this at all? I want to make sure that I'm not missing something here. -

Re: [openib-general] Re: IPoIB Path Static Rate

2005-01-06 Thread Michael Krause
At 09:58 AM 12/18/2004, Hal Rosenstock wrote: On Sat, 2004-12-18 at 12:55, Roland Dreier wrote: Surely link width and/or speed can't change without the port state changing, can they? As I understand it, the link layer can't renegotiate this sort of thing without bringing the link down. In

RE: [openib-general] SRQ field in CM messages

2005-01-06 Thread Diego Crupnicoff
This bit was added to the CM protocol so that the remote side QP can distinguish between a SRQ and a TCA that does not generate e2e credits. Thanks, Diego -Original Message- From: Sean Hefty [mailto:[EMAIL PROTECTED]] Sent:

Re: [openib-general] SRQ field in CM messages

2005-01-06 Thread Sean Hefty
Diego Crupnicoff wrote: This bit was added to the CM protocol so that the remote side QP can distinguish between a SRQ and a TCA that does not generate e2e credits. Thanks, Diego thanks

Re: [openib-general] OpenSM died a horrible death

2005-01-06 Thread Tom Duffy
On Wed, 2005-01-05 at 20:03 -0500, Hal Rosenstock wrote: Do you know what was going on on the subnet at the time? Did an end node SA client request a PathRecord with a SGID of 0 but turn the component mask bit for SGID on? Running the subnet with opensm exposed a bug on the Solaris side that

Re: [openib-general] Re: mstflint failing on sparc64

2005-01-06 Thread Ronald G. Minnich
On Thu, 6 Jan 2005, Michael S. Tsirkin wrote: Well, I see regular 8100 there, where does lspci get another : ? It's a mystery. that's the pci domain stuff. Turns out on newer machines you can have multiple pci configuration domains. Oh joy :-) ron

Re: [openib-general] Re: mstflint failing on sparc64

2005-01-06 Thread Grant Grundler
On Thu, Jan 06, 2005 at 10:55:19AM -0800, Roland Dreier wrote: I know. But where does lspci get the domain number? From sysfs -- lspci goes through the entries in /sys/bus/pci/devices. If you strace lspci on a modern distro, you can see it doesn't open anything in /proc. That's an

RE: [openib-general] OpenSM died a horrible death

2005-01-06 Thread Tom Duffy
On Thu, 2005-01-06 at 20:05 +0200, shaharf wrote: Hi Tom, Are you able to reproduce this problem? If you are I would like you to reproduce it with full verbosity (-V). If you can't or the scenario is not consistent, please tell me too. It might point us in some other directions. Well, I

Re: [openib-general] Re: mstflint failing on sparc64

2005-01-06 Thread Michael S. Tsirkin
Hello! Quoting r. Roland Dreier ([EMAIL PROTECTED]) Re: [openib-general] Re: mstflint failing on sparc64: I know. But where does lspci get the domain number? From sysfs -- lspci goes through the entries in /sys/bus/pci/devices. If you strace lspci on a modern distro, you can see it doesn't

Re: [openib-general] Re: mstflint failing on sparc64

2005-01-06 Thread Michael S. Tsirkin
Hello! Quoting r. Michael S. Tsirkin ([EMAIL PROTECTED]) [openib-general] Re: mstflint failing on sparc64: tat:~# ./mstflint -d 81:00.0 q Bus error Interesting. Maybe mmap does not work as it should? Could you run it under gdb and do a backtrace? I also added sanity checks

Re: [openib-general] Re: mstflint failing on sparc64

2005-01-06 Thread Michael S. Tsirkin
Hello! Quoting r. Ronald G. Minnich (rminnich@lanl.gov) Re: [openib-general] Re: mstflint failing on sparc64: On Thu, 6 Jan 2005, Michael S. Tsirkin wrote: Well, I see regular 8100 there, where does lspci get another : ? It's a mystery. that's the pci domain stuff. Turns out on

Re: [openib-general] Some Missing Features from mthca/user MAD access

2005-01-06 Thread Hal Rosenstock
On Thu, 2005-01-06 at 16:44, Michael S. Tsirkin wrote: Well, I was thinking for things like failover it could be nice. I would think failover is more reliable with SMs on different machines but this is a conceivable scenario. I for one need to convince myself that the SM state machine works fine

[openib-general] Re: mstflint failing on sparc64

2005-01-06 Thread Tom Duffy
On Thu, 2005-01-06 at 20:28 +0200, Michael S. Tsirkin wrote: Crashes on access to mapped memory. Could you print mf->ptr and offset at that point? (gdb) print mf->ptr $1 = (void *) 0x70304000 (gdb) print offset $2 = 984060 Generally, do you happen to know if mmapping /dev/mem to userspace

Re: [openib-general] Sockets Extensions Completed

2005-01-06 Thread Roland Dreier
Michael> Use of this new interface will enable Sockets based applications to fully exploit the performance of RDMA interconnects through the SDP wire protocol. This API also provides explicit memory management taking some of the guesswork out of

Re: [openib-general] Re: mstflint failing on sparc64

2005-01-06 Thread Tom Duffy
On Thu, 2005-01-06 at 23:52 +0200, Michael S. Tsirkin wrote: Tom, if you can try it before the weekend I'll be thankful, I am working on Sundays, but I don't have a sparc. I took the plunge and tried to flash the firmware, and it took! tat:~# ./mstflint -d /proc/bus/pci/\:81/00.0 q Image

Re: [openib-general] Re: mstflint failing on sparc64

2005-01-06 Thread Grant Grundler
On Thu, Jan 06, 2005 at 03:26:13PM -0800, Roland Dreier wrote: Generally, do you happen to know if mmapping /dev/mem to userspace works on this architecture? I can't imagine that it would not. I will see if I can dig info up. The one thing that is weird on sparc64 is that the pci bus

[openib-general] [PATCH] ibstatus script quickfix

2005-01-06 Thread Tom Duffy
If certain fields do not exist on the node you are running ibstatus script on, like when Roland adds a new one and you haven't upgraded yet, have ibstatus behave better. Signed-off-by: Tom Duffy [EMAIL PROTECTED] Index: gen2/trunk/src/userspace/management/diags/host/scripts/ibstatus

Re: [openib-general] ip over ib throughtput

2005-01-06 Thread Grant Grundler
On Tue, Jan 04, 2005 at 05:11:11PM -0800, Roland Dreier wrote: Grant> I'll see how hard it would be to try it with tg3. My previous patch seems like it won't work (need MSGINT_MODE_ENABLE set too, according to http://www.ussg.iu.edu/hypermail/linux/kernel/0301.3/0123.html) I've applied

Re: [openib-general] [PATCH] ibstatus script quickfix

2005-01-06 Thread Hal Rosenstock
On Thu, 2005-01-06 at 19:22, Tom Duffy wrote: If certain fields do not exist on the node you are running ibstatus script on, like when Roland adds a new one and you haven't upgraded yet, have ibstatus behave better. Signed-off-by: Tom Duffy [EMAIL PROTECTED] Thanks. Applied. Patch is line