[openib-general] thanks and a question

2006-04-12 Thread Ronald G Minnich
I was working with someone and watching a 256-node bproc cluster boot friday. The openib folks have done a lot of very nice work. It booted quite well once we set hoq and slv to 17 in the voltaire switch. It was really snappy coming up. It was actually as fast to boot as a myrinet cluster,

Re: [openib-general] thanks and a question

2006-04-12 Thread Ronald G Minnich
Hal Rosenstock wrote: hoq is HOQLife. Is slv the switch LifeTimeValue ? I believe so. Does that have anything to do with those settings ? it would not work until hoq and slv were 17. Truly hanging ? yes, and it was the only real connection at that point, from the bproc daemon on the

Re: [openib-general] switch from svn to git

2006-04-07 Thread Ronald G Minnich
Bryan O'Sullivan wrote: On Fri, 2006-04-07 at 12:39 -0700, Sean Hefty wrote: I wanted to start a discussion about migrating the openib code repository from svn to git. I'm not very open to using git; it has a horrible user interface. I'd much prefer to see a switch to something cleaner,

Re: [openib-general] PathScale license

2006-01-05 Thread Ronald G Minnich
Bryan O'Sullivan wrote: On Thu, 2005-12-29 at 15:42 +0100, Christoph Hellwig wrote: PathScale's use of this language is not original. SGI has used, and perhaps originated, the additional language. XFS has been switched to a normal short GPL boilerplate exactly because this wording is not

[openib-general] simple rarp code for gen2

2006-01-04 Thread Ronald G Minnich
I have some rarp code for gen2. It does not work. Does anyone have a VERY simple example for RARP over ib? I have some code that is supposed to work; it appears not to work. thanks ron ___ openib-general mailing list openib-general@openib.org

Re: [openib-general] PathScale license

2005-12-31 Thread Ronald G Minnich
is there any chance that pathscale could reword that to be less confusing? It clearly caused a lot of confusion and worry for folks on this list. ron

Re: [openib-general] PathScale license

2005-12-24 Thread Ronald G. Minnich
Hi, The PathScale OpenIB license includes the following which is beyond the normal OpenIB license: * Patent licenses, if any, provided herein do not apply to * combinations of this program with other software, or any other * product whatsoever. ??? What the heck could this mean? This

Re: [openib-general] Re: Opensm - casting issues #2

2005-09-13 Thread Ronald G Minnich
On Tue, Sep 13, 2005 at 09:26:31AM -0700, Sean Hefty wrote: My understanding is that the labs, who control the OpenIB servers, refused to host any Windows related code, forcing it to have a separate repository. wow, that's news to me! Maybe I'm at the wrong lab! Anybody have a source for

Re: [openib-general] Re: Opensm - casting issues #2

2005-09-13 Thread Ronald G Minnich
Roland Dreier wrote: Actually I think the issue was somewhat different. Microsoft is so allergic to the GPL that they asked for the code to be in a physically separate repository. that makes much more sense, ah, well, not really, but it is easier to understand. I doubt the Labs would have

Re: [openib-general] Re: [PATCH 05/16] IB uverbs: core implementation

2005-06-29 Thread Ronald G. Minnich
On Tue, 28 Jun 2005, Greg KH wrote: On Tue, Jun 28, 2005 at 04:03:43PM -0700, Roland Dreier wrote: +++ linux/drivers/infiniband/core/uverbs_main.c 2005-06-28 15:20:04.363963991 -0700 @@ -0,0 +1,708 @@ +/* + * Copyright (c) 2005 Topspin Communications. All rights reserved. + *

[openib-general] Re: [PATCH] rdma_lat-09 and results

2005-06-23 Thread Ronald G. Minnich
On Fri, 24 Jun 2005, Michael S. Tsirkin wrote: I had this impression that I can have a .so not being present on the slave at boot, and then dlopen could pull it across the network with some custom protocol without going over NFS. not at present, at least on bproc. dlopen needs a path name.

[openib-general] Re: [PATCH] rdma_lat-09 and results

2005-06-23 Thread Ronald G. Minnich
On Fri, 24 Jun 2005, Michael S. Tsirkin wrote: I had this impression that I can have a .so not being present on the slave at boot, and then dlopen could pull it across the network with some custom protocol without going over NFS. And I was asking, if so, what other calls can do this

[openib-general] Re: [PATCH] rdma_lat-09 and results

2005-06-23 Thread Ronald G. Minnich
On Fri, 24 Jun 2005, Michael S. Tsirkin wrote: So, if you want to run without nfs (or such), you basically need to link the applications statically, is that right? the .so files you want have to be in /lib on the node, e.g. [EMAIL PROTECTED] ~]$ bpsh 0 ls /lib ld-2.3.3.so ld-linux.so.2

[openib-general] Re: performance counters in /sys

2005-05-23 Thread Ronald G. Minnich
On Mon, 23 May 2005, Michael S. Tsirkin wrote: I guess the thing that has me mystified about all this is I can certainly appreciate the potential 'goodness' of having 1 var/file for user oriented access but perhaps one of the better examples of why this is just a bad idea for

Re: [openib-general] performance counters in /sys

2005-05-20 Thread Ronald G. Minnich
On Fri, 20 May 2005, Grant Grundler wrote: Not entirely. One could fill unimplemented values with -1 or 0. or use s-expressions a la supermon. That's worked the best for us in widely varying environments. These fixed-format tables of the type that /proc delivers are painful. For
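
A minimal sketch of the idea in this message, assuming a flat record of named counters: each value is written as a labelled pair, so an unimplemented counter can simply be left out instead of being filled with -1 or 0. The record layout, the function names and the counter names (`rx_bytes`, `tx_bytes`, `rx_errors`) are hypothetical illustrations; this is not supermon's actual wire format.

```python
def to_sexpr(name, counters):
    """Render counters as (name (key value) ...), skipping any set to None."""
    fields = " ".join(
        f"({key} {value})" for key, value in counters.items() if value is not None
    )
    return f"({name} {fields})" if fields else f"({name})"


def from_sexpr(text):
    """Parse the flat form produced by to_sexpr back into (name, counters)."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    if len(tokens) < 3 or tokens[0] != "(" or tokens[-1] != ")":
        raise ValueError("not a parenthesised record")
    name, body = tokens[1], tokens[2:-1]
    if len(body) % 4 != 0:
        raise ValueError("malformed (key value) pairs")
    counters = {}
    for i in range(0, len(body), 4):
        open_paren, key, value, close_paren = body[i:i + 4]
        if open_paren != "(" or close_paren != ")":
            raise ValueError("malformed (key value) pair")
        counters[key] = int(value)
    return name, counters


# Hypothetical port record: tx_bytes is not implemented, so it is omitted.
record = to_sexpr("port1", {"rx_bytes": 10, "tx_bytes": None, "rx_errors": 0})
```

A reader of such a record looks values up by name, so adding or dropping a counter does not shift the position of the others the way it would in a fixed-format table.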

Re: [openib-general] Re: RDMA memory registration

2005-05-03 Thread Ronald G. Minnich
On 5/3/05, David Addison [EMAIL PROTECTED] wrote: as our recent IOPROC patch on lkml shows, it's not that invasive. There are just 24 hooks added to the Linux VM code paths - which we have been able to maintain outside the mainline tree for many years now. As these hooks only need to

Re: [openib-general] Re: RDMA memory registration

2005-04-29 Thread Ronald G. Minnich
On Fri, 29 Apr 2005, Bill Jordan wrote: I'm very confused at this point. Can you briefly explain how this works, or point me to a description? I don't see how you could do user level I/O without registering the memory with the hardware. I'm especially confused by the comment (may not have

RE: [openib-general] Re: RDMA memory registration

2005-04-29 Thread Ronald G. Minnich
On Fri, 29 Apr 2005, Rimmer, Todd wrote: But that implies the hardware has an MMU and it also puts an interrupt in the path per page sent. yes. it does. and it doesn't do per page sent, just per page that has no pte on the nic when received. ron

Re: [openib-general] Re: RDMA memory registration

2005-04-29 Thread Ronald G. Minnich
On Fri, 29 Apr 2005, Greg Lindahl wrote: It doesn't imply that there's an MMU, either. I know that Myricom uses a little lookup routine in software on their nic, which most people wouldn't call an MMU. I don't know what Mellanox does for this, they don't talk much about what's hardware and

Re: [openib-general] Re: RDMA memory registration

2005-04-29 Thread Ronald G. Minnich
On Fri, 29 Apr 2005, Caitlin Bestler wrote: One is that the RDMA hardware, however it is marketed, essentially needs to act as an MMU. That means that it has to be synchronized with normal MMU. The traditional sledge-hammer approach to ah ha! his RDMA mmu just crashed his mm

[openib-general] Link encap:UNSPEC

2005-03-31 Thread Ronald G. Minnich
is there a number that this means, i.e. is ifconfig saying I don't know this number so it is UNSPEC or is it a number that means NaN? thanks ron

Re: [openib-general] vstat error on bproc slave node (VAPI_EGEN)

2005-03-14 Thread Ronald G. Minnich
On Mon, 14 Mar 2005, gshipman wrote: I am attempting to configure our small cluster to use bproc and openib. Note I am using gen1 on kernel 2.6.6 patched with the clustermatic stuff, (should I be using gen2, is it stable for general use?). use gen2. I have tested it and it is ok. I have

RE: [openib-general] Question

2005-02-28 Thread Ronald G. Minnich
On Tue, 1 Mar 2005, Yaron Haviv wrote: Ron, I believe netdiscover uses direct route MADs So it can work also when the fabric is not fully initialized ok, that makes sense. So this brings up another question. ibnetdiscover is plenty fast, and opensm is plenty slow. What kind of messaging

RE: [openib-general] Question

2005-02-28 Thread Ronald G. Minnich
On Mon, 28 Feb 2005, Hal Rosenstock wrote: OpenSM is responsible for initializing the fabric (and needs to work with an uninitialized fabric). ?? You mean, if you have an initialized fabric, opensm can't work? Is there useful material on the web that explains this? I keep looking for

Re: [openib-general] Question

2005-02-28 Thread Ronald G. Minnich
On Mon, 28 Feb 2005, Hal Rosenstock wrote: In what state is the subnet stuck in ? how should I tell? Was this a dead switch or did it have redundant power supplies ? redundant; it was not dead. ron

RE: [openib-general] Question

2005-02-28 Thread Ronald G. Minnich
On Mon, 28 Feb 2005, Hal Rosenstock wrote: The opensm log preferably in verbose (-V) mode. It's a long log :-) Here is the point at which it goes to pieces. [1109360115:000609200][40BFF970] - osm_pkey_rcv_process: Got GetResp(PKey) block:1 port_num 1 with GUID = 0x2c90108d192e0 for parent

Re: [openib-general] Question

2005-02-28 Thread Ronald G. Minnich
On Mon, 28 Feb 2005, Hal Rosenstock wrote: opensm logs. Also, I would be curious to see what ibstat showed about all endport LIDs in the network. Are all the ports active that should be (plugged into subnet) ? They all look like this: CA 'mthca0': CA type: MT23108 Number

Re: [openib-general] Question

2005-02-28 Thread Ronald G. Minnich
On Mon, 28 Feb 2005, Hal Rosenstock wrote: Also, wasn't that the same failure as a while ago when one of those 96 port switches kept forwarding but didn't terminate MADs ? (Yes, I know you recycled everything which would seem to be inconsistent with this). yeah, something has gone south and

Re: [openib-general] Question

2005-02-28 Thread Ronald G. Minnich
ok, here you go. This is the first one that appears to fail. You can probably guess why :-) [1109632561:000646260][411FF970] - __osm_sm_mad_ctrl_process_get_resp: [ [1109632561:000646268][411FF970] - __osm_sm_mad_ctrl_update_wire_stats: [ [1109632561:000646277][411FF970] -

Re: [openib-general] question on opensm error

2005-02-15 Thread Ronald G. Minnich
On Tue, 15 Feb 2005, Hal Rosenstock wrote: ibstatus/ibstat can show the local port logical and physical port state. bluesteel:~ # ibstat CA 'mthca0': CA type: MT23108 Number of ports: 2 Firmware version: 3.3.2 Hardware version: a1 Node GUID:

[openib-general] question on opensm error

2005-02-14 Thread Ronald G. Minnich
formerly working opensm starts to get these: [1108414727:000284173][411FF970] - umad_receiver: send completed with error(method=1 attr=11) -- dropping. [1108414727:000384171][411FF970] - umad_receiver: send completed with error(method=1 attr=11) -- dropping. [1108414727:000484169][411FF970] -
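
The log lines quoted in this message appear to share one layout: two bracketed fields followed by " - " and the message text. A rough sketch of splitting such lines and counting the dropped sends, under that assumption. The field names (`seconds`, `subsecond`, `thread`) are guesses from the samples, not taken from OpenSM documentation, and the helper names are hypothetical.

```python
import re

# Layout inferred from the sample lines: [seconds:subsecond][hex id] - message
LOG_LINE = re.compile(r"\[(\d+):(\d+)\]\[([0-9A-Fa-f]+)\] - (.*)")


def parse_osm_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    seconds, subsecond, thread, message = match.groups()
    return {
        "seconds": int(seconds),
        "subsecond": int(subsecond),
        "thread": thread,
        "message": message,
    }


def count_dropped(lines):
    """Count parsed lines whose message reports a dropped send."""
    parsed = (parse_osm_line(line) for line in lines)
    return sum(1 for entry in parsed if entry and "dropping" in entry["message"])


sample = (
    "[1108414727:000284173][411FF970] - umad_receiver: "
    "send completed with error(method=1 attr=11) -- dropping."
)
entry = parse_osm_line(sample)
```

Lines that do not fit the assumed layout are skipped rather than treated as errors, since a verbose log mixes several kinds of output.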

Re: FW: [openib-general] Minutes from DAPL BOF at OpenIB Workshop

2005-02-10 Thread Ronald G. Minnich
On Thu, 10 Feb 2005, Christoph Hellwig wrote: The *DAPL API is already decided in a spec. If we change it, it will lose compliance. Who cares? Specs don't matter at all for kernel APIs. The kDAPL API as-is won't go in the kernel, and no amount of cosmetic cleanup can

Re: FW: [openib-general] Minutes from DAPL BOF at OpenIB Workshop

2005-02-10 Thread Ronald G. Minnich
On Thu, 10 Feb 2005, Grant Grundler wrote: Well, that works best IFF one has time and a clue what to write. oops, that whole 'you have to have a clue' thing just ruled me out. To a large extent, I am writing tongue-in-cheek about the 'burn the spec' idea. My concern is that we avoid a

RE: [openib-general] Re: [KJ] [RFC] TODO file cleanups

2005-01-21 Thread Ronald G. Minnich
On Fri, 21 Jan 2005, Woodruff, Robert J wrote: If it was the GPL license, then the code that is in kernel.org is the GPL-only fork. this keeps getting more and more interesting. For example, now that the code is in the kernel, is there any need to maintain the openib tree? that code in

Re: [openib-general] got ipoib up once but not twice :-)

2005-01-17 Thread Ronald G. Minnich
On Sat, 15 Jan 2005, Hal Rosenstock wrote: How many 96 port switches ? I'd be curious how long it does take to initialize this (as I do not have access to a large cluster). Also, right now I'm pretty sure things are being done without pipelining on so it is likely slower. More on this later.

[openib-general] got ipoib up once but not twice :-)

2005-01-14 Thread Ronald G. Minnich
OK, I had all of bluesteel up yesterday. It all just worked: insmod the right stuff on front end, i.e. ib_ipoib 53856 0 ib_sa 12564 1 ib_ipoib ib_umad 12224 5 ib_mthca 90976 9 ib_mad 29872 3 ib_sa,ib_umad,ib_mthca

Re: [openib-general] got ipoib up once but not twice :-)

2005-01-14 Thread Ronald G. Minnich
On Fri, 14 Jan 2005, Hal Rosenstock wrote: Are all the links active ? (What is your topology ?) It is a hierarchy of 96-port switches. Is there an openib command I can use to test state? Going to look at the 'blinken lights' is a headache due to the location of the cluster. Does IPoIB

Re: [openib-general] got ipoib up once but not twice :-)

2005-01-14 Thread Ronald G. Minnich
Hmm, it's back. I guess I was not patient enough. Not sure when it all got back. I will have to time it next time, I assume it won't take 6 hours each time :-) I'm working on making this 256-node cluster work over infiniband only, same as our myrinet clusters which are myrinet-only. ron

[openib-general] question

2005-01-12 Thread Ronald G. Minnich
I am ins'moding everything on a bproc master node, and I see ib0 when I'm done. Same on a slave node, and no ib0. What does the correct operation of all this depend on? I am hoping there is not some daemon required, just checking. Or is there some thing I might have gotten wrong? ron

Re: [openib-general] question

2005-01-12 Thread Ronald G. Minnich
life is now good, it was a script error (my mistake) for starting up the bproc nodes. I've got ipoib on my opteron cluster, with 96-port switches running in hierarchical mode, for the first time with openib. This is really great. Thanks again to this list and the people who wrote the code.

Re: [openib-general] question

2005-01-12 Thread Ronald G. Minnich
On Wed, 12 Jan 2005, Grant Grundler wrote: PCI 2.2 introduced MSI. ie 1998 or so. not sure when MSI-X was introduced. that's the problem with working with old PCI books. Darn it. ron

Re: [openib-general] question

2005-01-12 Thread Ronald G. Minnich
On Wed, 12 Jan 2005, Grant Grundler wrote: http://cmclab.rice.edu/projects/giganic/datasheets/PCI/SPECS/Pci22.pdf and http://www.singlix.org/trdos/PCI22.pdf yikes. That's cool. They used to charge for it, I think all my copies are bootleg. Wait, was the microphone on when I said that?

[openib-general] osm in openib

2005-01-10 Thread Ronald G. Minnich
[1105400337:000955212][95E128E0] - OpenSM Rev:openib-1.0.0 [1105400337:000956454][95E128E0] - osm_opensm_init: Forcing single threaded dispatcher. [1105400337:000957121][95E128E0] - osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x

Re: [openib-general] ip over ib throughtput

2005-01-06 Thread Ronald G. Minnich
On Thu, 6 Jan 2005, Grant Grundler wrote: That's a limitation of linux. Linux drivers assume physically contiguous pages are available for anything that crosses a page boundary. KISS when it works but not robust. yeah, I know, freebsd never had this problem ... FWIW, I had the impression

Re: [openib-general] Re: mstflint failing on sparc64

2005-01-06 Thread Ronald G. Minnich
On Thu, 6 Jan 2005, Michael S. Tsirkin wrote: Well, I see regular 8100 there, where does lspci get another : ? Its a mystery. that's the pci domain stuff. Turns out on newer machines you can have multiple pci configuration domains. Oh joy :-) ron
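
For reference, the extra colon comes from the domain prefix in the address form `domain:bus:device.function`, where every field is hexadecimal and the domain may be omitted. A small sketch that accepts both forms and treats a missing domain as 0; the function name and the sample address are illustrative only, not taken from the thread.

```python
import re

# [domain:]bus:device.function, all fields hexadecimal; the domain is optional.
PCI_ADDR = re.compile(
    r"^(?:([0-9a-fA-F]{4,}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def parse_pci_address(text):
    """Return (domain, bus, device, function) as integers."""
    match = PCI_ADDR.match(text.strip())
    if match is None:
        raise ValueError(f"not a PCI address: {text!r}")
    domain, bus, device, function = match.groups()
    # A missing domain is taken to mean domain 0.
    return (int(domain or "0", 16), int(bus, 16), int(device, 16), int(function, 16))


short_form = parse_pci_address("81:00.0")
long_form = parse_pci_address("0000:81:00.0")
```

A tool that splits the address on ":" and expects exactly two parts would misread the long form, which may be what the failing utility ran into; that is a guess, as the truncated thread does not say.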

Re: [openib-general] Re: tvflash HCA numbering with multiple HCAs

2005-01-03 Thread Ronald G. Minnich
On Mon, 3 Jan 2005, Michael S. Tsirkin wrote: Or use mstflint for flashing which already does exactly that. I have not used this, does it still require that you insmod three modules to work? ron

Re: [openib-general] announcement: mstflint flash burning package uploaded

2004-11-04 Thread Ronald G. Minnich
On Thu, 4 Nov 2004, Michael S. Tsirkin wrote: I have uploaded an mstflint flash burning package to openib.org. You can find it here: https://openib.org/svn/trunk/contrib/mellanox/mstflint/ neat. How does this differ from tvflash that Roland wrote? thanks ron

Re: [openib-general] how we DON'T want to make openib

2004-10-27 Thread Ronald G. Minnich
On Wed, 27 Oct 2004, Ronald G. Minnich wrote: I just noticed this tree from a VAPI make :-) sshdbashmakemakeshcat 2*[grep] well that didn't translate make make sh make make make (cat|grep) was the tree. ron