I was working with someone and watching a 256-node bproc cluster boot
Friday. The openib folks have done a lot of very nice work. It booted
quite well once we set hoq and slv to 17 in the voltaire switch. It was
really snappy coming up. It was actually as fast to boot as a myrinet
cluster,
Hal Rosenstock wrote:
hoq is HOQLife. Is slv the switch LifeTimeValue ?
I believe so.
Does that have anything to do with those settings ?
it would not work until hoq and slv were 17.
Truly hanging ?
yes, and it was the only real connection at that point, from the bproc
daemon on the
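For anyone wondering what a setting of 17 amounts to: as I recall the IBA spec, both PortInfo:HOQLife and SwitchInfo:LifeTimeValue are 5-bit fields that encode a timeout of 4.096 us * 2^value, with values above 19 meaning no timeout at all. The sketch below only does that arithmetic. The function name is made up, and the encoding is my assumption from memory of the spec, so check it before relying on the numbers.

```python
import math

BASE_US = 4.096  # assumed base unit in microseconds (IBA spec encoding, from memory)


def lifetime_seconds(value):
    """Convert an HOQLife/LifeTimeValue-style exponent into seconds.

    Assumption: timeout = 4.096 us * 2**value, and values above 19 mean
    the packet never times out.
    """
    if value < 0 or value > 31:
        raise ValueError("expected a 5-bit value (0..31)")
    if value > 19:
        return math.inf
    return BASE_US * (2 ** value) / 1e6


if __name__ == "__main__":
    for v in (17, 18, 19, 20):
        print(v, lifetime_seconds(v))
```

Under that assumption, 17 works out to a bit over half a second.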
Bryan O'Sullivan wrote:
On Fri, 2006-04-07 at 12:39 -0700, Sean Hefty wrote:
I wanted to start a discussion about migrating the openib code repository from
svn to git.
I'm not very open to using git; it has a horrible user interface. I'd
much prefer to see a switch to something cleaner,
Bryan O'Sullivan wrote:
On Thu, 2005-12-29 at 15:42 +0100, Christoph Hellwig wrote:
PathScale's use of this language is not original. SGI has used, and perhaps
originated, the additional language.
XFS has been switched to a normal short GPL boilerplate exactly because
this wording is not
I have some rarp code for gen2. It does not work. Does anyone have a
VERY simple example for RARP over ib? I have some code that is supposed
to work; it appears not to work.
thanks
ron
___
openib-general mailing list
openib-general@openib.org
is there any chance that pathscale could reword that to be less
confusing? It clearly caused a lot of confusion and worry for folks on
this list.
ron
Hi,
The PathScale OpenIB license includes the following which
is beyond the normal OpenIB license:
* Patent licenses, if any, provided herein do not apply to
* combinations of this program with other software, or any other
* product whatsoever.
??? What the heck could this mean? This
On Tue, Sep 13, 2005 at 09:26:31AM -0700, Sean Hefty wrote:
My understanding is that the labs, who control the OpenIB servers, refused
to host any Windows related code, forcing it to have a separate repository.
wow, that's news to me! Maybe I'm at the wrong lab!
Anybody have a source for
Roland Dreier wrote:
Actually I think the issue was somewhat different. Microsoft is so
allergic to the GPL that they asked for the code to be in a physically
separate repository.
that makes much more sense, ah, well, not really, but it is easier to
understand. I doubt the Labs would have
On Tue, 28 Jun 2005, Greg KH wrote:
On Tue, Jun 28, 2005 at 04:03:43PM -0700, Roland Dreier wrote:
+++ linux/drivers/infiniband/core/uverbs_main.c 2005-06-28 15:20:04.363963991 -0700
@@ -0,0 +1,708 @@
+/*
+ * Copyright (c) 2005 Topspin Communications. All rights reserved.
+ *
On Fri, 24 Jun 2005, Michael S. Tsirkin wrote:
I had this impression that I can have a .so not being present on the
slave at boot, and then dlopen could pull it across the network with
some custom protocol without going over NFS.
not at present, at least on bproc. dlopen needs a path name.
On Fri, 24 Jun 2005, Michael S. Tsirkin wrote:
I had this impression that I can have a .so not being present on the slave
at boot, and then dlopen could pull it across the network with some
custom protocol without going over NFS.
And I was asking, if so, what other calls can do this
On Fri, 24 Jun 2005, Michael S. Tsirkin wrote:
So, if you want to run without nfs (or such), you basically need to link
the applications statically, is that right?
the .so files you want have to be in /lib on the node, e.g.
[EMAIL PROTECTED] ~]$ bpsh 0 ls /lib
ld-2.3.3.so
ld-linux.so.2
On Mon, 23 May 2005, Michael S. Tsirkin wrote:
I guess the thing that has me mystified about all this is I can
certainly appreciate the potential 'goodness' of having 1 var/file for
user oriented access but perhaps one of the better examples of why this
is just a bad idea for
On Fri, 20 May 2005, Grant Grundler wrote:
Not entirely. One could fill unimplemented values with -1 or 0.
or use s-expressions a la supermon. That's worked the best for us in
widely varying environments. These fixed-format tables of the type
that /proc delivers are painful.
For
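To illustrate the s-expression point: with tagged values, a node that does not implement a counter just leaves it out, instead of padding a fixed column with -1 or 0. A minimal sketch follows. This is not supermon's actual format; the function names and the sample fields (node0, user, sys, iowait) are invented for illustration.

```python
def to_sexpr(obj):
    """Serialize nested lists/tuples of atoms into an s-expression string."""
    if isinstance(obj, (list, tuple)):
        return "(" + " ".join(to_sexpr(x) for x in obj) + ")"
    return str(obj)


def parse_sexpr(text):
    """Parse an s-expression string into nested lists; integer atoms become int."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume the closing ")"
            return items
        try:
            return int(tok)
        except ValueError:
            return tok

    result = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after s-expression")
    return result


def lookup(record, key, default=None):
    """Return the value tagged `key` in a parsed (name (key value) ...) record."""
    for item in record[1:]:
        if isinstance(item, list) and item and item[0] == key:
            return item[1]
    return default


if __name__ == "__main__":
    rec = parse_sexpr(to_sexpr(["node0", ["user", 12], ["sys", 3]]))
    # A counter the node never reported is simply absent.
    print(lookup(rec, "sys"), lookup(rec, "iowait"))
```

The reader asks for values by name, so adding or dropping a field on one architecture does not shift anything for the others.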
On 5/3/05, David Addison [EMAIL PROTECTED] wrote:
as our recent IOPROC patch on lkml shows, it's not that invasive. There
are just 24 hooks added to the Linux VM code paths - which we have been
able to maintain outside the mainline tree for many years now.
As these hooks only need to
On Fri, 29 Apr 2005, Bill Jordan wrote:
I'm very confused at this point. Can you briefly explain how this works,
or point me to a description? I don't see how you could do user level
I/O without registering the memory with the hardware. I'm especially
confused by the comment (may not have
On Fri, 29 Apr 2005, Rimmer, Todd wrote:
But that implies the hardware has an MMU and it also puts an interrupt
in the path per page sent.
yes. it does. and it doesn't do per page sent, just per page that has no
pte on the nic when received.
ron
On Fri, 29 Apr 2005, Greg Lindahl wrote:
It doesn't imply that there's an MMU, either. I know that Myricom uses a
little lookup routine in software on their nic, which most people
wouldn't call an MMU. I don't know what Mellanox does for this, they
don't talk much about what's hardware and
On Fri, 29 Apr 2005, Caitlin Bestler wrote:
One is that the RDMA hardware, however it is marketed, essentially
needs to act as an MMU. That means that it has to be synchronized
with normal MMU. The traditional sledge-hammer approach to
ah ha! his RDMA mmu just crashed his mm
is there a number that this means, i.e. is ifconfig saying I don't know
this number so it is UNSPEC or is it a number that means NaN?
thanks
ron
On Mon, 14 Mar 2005, gshipman wrote:
I am attempting to configure our small cluster to use bproc and openib.
Note I am using gen1 on kernel 2.6.6 patched with the clustermatic
stuff, (should I be using gen2, is it stable for general use?).
use gen2. I have tested it and it is ok.
I have
On Tue, 1 Mar 2005, Yaron Haviv wrote:
Ron, I believe netdiscover uses direct route MADs
So it can work also when the fabric is not fully initialized
ok, that makes sense.
So this brings up another question. ibnetdiscover is plenty fast, and
opensm is plenty slow. What kind of messaging
On Mon, 28 Feb 2005, Hal Rosenstock wrote:
OpenSM is responsible for initializing the fabric (and needs to work
with an uninitialized fabric).
??
You mean, if you have an initialized fabric, opensm can't work?
Is there useful material on the web that explains this? I keep looking for
On Mon, 28 Feb 2005, Hal Rosenstock wrote:
In what state is the subnet stuck in ?
how should I tell?
Was this a dead switch or did it have redundant power supplies ?
redundant; it was not dead.
ron
On Mon, 28 Feb 2005, Hal Rosenstock wrote:
The opensm log preferably in verbose (-V) mode.
It's a long log :-)
Here is the point at which it goes to pieces.
[1109360115:000609200][40BFF970] - osm_pkey_rcv_process: Got GetResp(PKey) block:1 port_num 1 with GUID = 0x2c90108d192e0 for parent
On Mon, 28 Feb 2005, Hal Rosenstock wrote:
opensm logs. Also, I would be curious to see what ibstat showed about
all endport LIDs in the network. Are all the ports active that should be
(plugged into subnet) ?
They all look like this:
CA 'mthca0':
CA type: MT23108
Number
On Mon, 28 Feb 2005, Hal Rosenstock wrote:
Also, wasn't that the same failure as a while ago when one of those 96
port switches kept forwarding but didn't terminate MADs ? (Yes, I know
you recycled everything which would seem to be inconsistent with this).
yeah, something has gone south and
ok, here you go. This is the first one that appears to fail.
You can probably guess why :-)
[1109632561:000646260][411FF970] - __osm_sm_mad_ctrl_process_get_resp: [
[1109632561:000646268][411FF970] - __osm_sm_mad_ctrl_update_wire_stats: [
[1109632561:000646277][411FF970] -
On Tue, 15 Feb 2005, Hal Rosenstock wrote:
ibstatus/ibstat can show the local port logical and physical port state.
bluesteel:~ # ibstat
CA 'mthca0':
CA type: MT23108
Number of ports: 2
Firmware version: 3.3.2
Hardware version: a1
Node GUID:
formerly working opensm starts to get these:
[1108414727:000284173][411FF970] - umad_receiver: send completed with error(method=1 attr=11) -- dropping.
[1108414727:000384171][411FF970] - umad_receiver: send completed with error(method=1 attr=11) -- dropping.
[1108414727:000484169][411FF970] -
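Since these logs get long, here is a rough sketch for picking the lines apart so repeated errors can be counted or filtered. The field meanings are my guesses from the lines quoted in this thread, not from opensm documentation: the first number looks like epoch seconds, the second is some sub-second counter (kept as a string, units not assumed), and the hex value looks like a thread id. The function name and dictionary keys are made up.

```python
import re

# Shape assumed from the quoted lines: [seconds:subsec][HEXID] - message
_LINE_RE = re.compile(r"^\[(\d+):(\d+)\]\[([0-9A-Fa-f]+)\] - (.*)$")


def parse_log_line(line):
    """Split one opensm-style log line into fields, or return None if it does not match."""
    m = _LINE_RE.match(line.strip())
    if m is None:
        return None
    seconds, subsec, thread, message = m.groups()
    # Most messages start with "routine_name: text"; keep the routine if present.
    routine, sep, rest = message.partition(": ")
    return {
        "seconds": int(seconds),
        "subsec": subsec,  # units not assumed, so left as text
        "thread": thread,
        "routine": routine if sep else None,
        "message": rest if sep else message,
    }


if __name__ == "__main__":
    sample = ("[1108414727:000284173][411FF970] - umad_receiver: "
              "send completed with error(method=1 attr=11) -- dropping.")
    print(parse_log_line(sample))
```

Wrapped continuation lines will not match and come back as None, so they need to be rejoined before parsing.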
On Thu, 10 Feb 2005, Christoph Hellwig wrote:
The *DAPL API is already decided in a spec. If we change it, it will
lose compliance.
Who cares? Specs don't matter at all for kernel APIs. The kDAPL API
as-is won't go in the kernel, and no amount of cosmetic cleanup
can
On Thu, 10 Feb 2005, Grant Grundler wrote:
Well, that works best IFF one has time and a clue what to write.
oops, that whole 'you have to have a clue' thing just ruled me out.
To a large extent, I am writing tongue-in-cheek about the 'burn the spec'
idea.
My concern is that we avoid a
On Fri, 21 Jan 2005, Woodruff, Robert J wrote:
If it was the GPL license, then the code that is in
kernel.org is the GPL-only fork.
this keeps getting more and more interesting. For example, now that the
code is in the kernel, is there any need to maintain the openib tree?
that code in
On Sat, 15 Jan 2005, Hal Rosenstock wrote:
How many 96 port switches ? I'd be curious how long it does take to
initialize this (as I do not have access to a large cluster). Also,
right now I'm pretty sure things are being done without pipelining on so
it is likely slower. More on this later.
OK, I had all of bluesteel up yesterday. It all just worked
insmod the right stuff on front end, i.e.
ib_ipoib 53856 0
ib_sa 12564 1 ib_ipoib
ib_umad 12224 5
ib_mthca 90976 9
ib_mad 29872 3 ib_sa,ib_umad,ib_mthca
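A quick way to check "the right stuff" is loaded on a node is to parse lsmod-style output and compare it with the modules you expect. The sketch below assumes the usual name, size, use-count, used-by columns; the function names are made up, and the required list is only whatever you decide to pass in.

```python
def parse_lsmod(text):
    """Parse lsmod-style text into {name: {"size", "used", "used_by"}}."""
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header line and anything that does not look like a module row.
        if len(fields) < 3 or not (fields[1].isdigit() and fields[2].isdigit()):
            continue
        used_by = []
        if len(fields) > 3:
            used_by = [name for name in fields[3].split(",") if name]
        modules[fields[0]] = {
            "size": int(fields[1]),
            "used": int(fields[2]),
            "used_by": used_by,
        }
    return modules


def missing_modules(text, required):
    """Return the required module names that do not appear in the lsmod text."""
    loaded = parse_lsmod(text)
    return [name for name in required if name not in loaded]


if __name__ == "__main__":
    sample = """\
ib_ipoib 53856 0
ib_sa 12564 1 ib_ipoib
ib_umad 12224 5
ib_mthca 90976 9
ib_mad 29872 3 ib_sa,ib_umad,ib_mthca
"""
    print(missing_modules(sample, ["ib_mthca", "ib_mad", "ib_sa", "ib_ipoib"]))
```

Running the same check on the master and on a slave node shows quickly whether a missing ib0 is just a module that never got loaded.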
On Fri, 14 Jan 2005, Hal Rosenstock wrote:
Are all the links active ? (What is your topology ?)
It is a hierarchy of 96-port switches. Is there an openib command I can
use to test state? Going to look at the 'blinken lights' is a headache due
to the location of the cluster.
Does IPoIB
Hmm, it's back. I guess I was not patient enough. Not sure when it all got
back. I will have to time it next time, I assume it won't take 6 hours
each time :-)
I'm working on making this 256-node cluster work over infiniband only,
same as our myrinet clusters which are myrinet-only.
ron
I am insmod'ing everything on a bproc master node, and I see ib0 when I'm
done.
Same on a slave node, and no ib0.
What does the correct operation of all this depend on? I am hoping there
is not some daemon required, just checking. Or is there some
thing I might have gotten wrong?
ron
life is now good, it was a script error (my mistake) for starting up the
bproc nodes.
I've got ipoib on my opteron cluster, with 96-port switches running in
hierarchical mode, for the first time with openib.
This is really great.
Thanks again to this list and the people who wrote the code.
On Wed, 12 Jan 2005, Grant Grundler wrote:
PCI 2.2 introduced MSI. ie 1998 or so.
not sure when MSI-X was introduced.
that's the problem with working with old PCI books. Darn it.
ron
On Wed, 12 Jan 2005, Grant Grundler wrote:
http://cmclab.rice.edu/projects/giganic/datasheets/PCI/SPECS/Pci22.pdf
and
http://www.singlix.org/trdos/PCI22.pdf
yikes. That's cool. They used to charge for it, I think all my copies are
bootleg.
Wait, was the microphone on when I said that?
[1105400337:000955212][95E128E0] - OpenSM Rev:openib-1.0.0
[1105400337:000956454][95E128E0] - osm_opensm_init: Forcing single threaded dispatcher.
[1105400337:000957121][95E128E0] - osm_report_notice: Reporting Generic Notice type:3 num:66 from LID:0x
On Thu, 6 Jan 2005, Grant Grundler wrote:
That's a limitation of linux. Linux drivers assume physically
contiguous pages are available for anything that crosses
a page boundary. KISS when it works but not robust.
yeah, I know, freebsd never had this problem ...
FWIW, I had the impression
On Thu, 6 Jan 2005, Michael S. Tsirkin wrote:
Well, I see regular 8100 there, where does lspci get another : ?
It's a mystery.
that's the pci domain stuff. Turns out on newer machines you can have
multiple pci configuration domains. Oh joy :-)
ron
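For reference, the full Linux form of a PCI address is domain:bus:device.function, all hex, and the domain part is what adds the extra colon. A small sketch that accepts both the short and the long form is below; the function name and the sample addresses are made up, and treating a missing domain as 0 is my assumption.

```python
import re

# [domain:]bus:device.function, hex digits; domain is optional in the short form.
_PCI_RE = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<dev>[0-9a-fA-F]{2})\."
    r"(?P<fn>[0-7])$"
)


def parse_pci_address(addr):
    """Return (domain, bus, device, function) as integers.

    Assumption: an address without a domain prefix belongs to domain 0.
    """
    m = _PCI_RE.match(addr.strip())
    if m is None:
        raise ValueError("not a PCI address: %r" % addr)
    domain = int(m.group("domain") or "0", 16)
    return (
        domain,
        int(m.group("bus"), 16),
        int(m.group("dev"), 16),
        int(m.group("fn")),
    )


if __name__ == "__main__":
    for sample in ("02:00.0", "0000:02:00.0", "0001:81:1f.7"):
        print(sample, parse_pci_address(sample))
```

Scripts that split the address on the first colon tend to break on multi-domain machines, which is why it helps to parse the whole thing in one go.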
On Mon, 3 Jan 2005, Michael S. Tsirkin wrote:
Or use mstflint for flashing which already does exactly that.
I have not used this, does it still require that you insmod three modules
to work?
ron
On Thu, 4 Nov 2004, Michael S. Tsirkin wrote:
I have uploaded an mstflint flash burning package to openib.org.
You can find it here: https://openib.org/svn/trunk/contrib/mellanox/mstflint/
neat. How does this differ from tvflash that Roland wrote?
thanks
ron
On Wed, 27 Oct 2004, Ronald G. Minnich wrote:
I just noticed this tree from a VAPI make :-)
sshdbashmakemakeshcat
2*[grep]
well that didn't translate
make make sh make make make (cat|grep)
was the tree.
ron