Hi Ashish,
On Mon, 2007-02-26 at 16:04, Batwara, Ashish wrote:
> Hi,
> I am trying to bring up opensm, but it is not letting me. When I look at
> /var/log/messages, I see that it comes UP for a moment and then
> goes down again. Look for " SUBNET UP " in the logs below. Does anyone
> know what t
> Looking at the log file, the problem appears to be related to:
>
> http://openib.org/pipermail/openib-general/2006-December/029962.html
This should be fixed in my rdma-dev tree.
The problem was that a patch got lost moving between svn and git; that patch caused
failed multicast requests to be retrie
On Fri, 2006-12-15 at 17:05, Woodruff, Robert J wrote:
> Hal wrote,
> >Any idea what filled up the log ? but that's a side issue.
>
> Yes we were getting a bunch of multicast errors, Sean is investigating
> this.
>
> >This has been discussed on the list before. This is one option which can
> >
Hal Rosenstock wrote:
> Yes, LID 5 is a switch LID and there is a port which is flapping. Bad
>cable ?
>
>
When this port is disconnected, OpenSM stops logging these
messages. It could have been a bad connection.
> The code reduces messages which are similar (approx. 128 traps).
> IGMP turned on where?
> Not sure what turns this on. I think IP multicast needs to be
> configured in the kernel. I don't think it is automatic although that
> might be the default config. Also, using IP multicast (via Sean's
> multicast code) likely causes IGMP to be used so the routers kn
Steve,
See ... embedded comments below.
-- Hal
From: Steve Wise [mailto:[EMAIL PROTECTED]
Sent: Thu 11/16/2006 4:51 PM
To: Hal Rosenstock
Cc: openib-general@openib.org
Subject: RE: [openib-general] opensm problem
On Thu, 2006-11-16 at 23:37 +0200, Hal Rosenstock wrote:
> Steve,
>
> Did you configure the kernel differently ? Is IGMP turned on somehow ?
> (I haven't run with Sean's multicast code.)
IGMP turned on where?
>
> BTW, as I mentioned, this can be solved on the client side equally as
> well a
Hi Venkat,
See embedded ... comments below.
-- Hal
From: Venkatesh Babu [mailto:[EMAIL PROTECTED]
Sent: Thu 11/16/2006 1:39 PM
To: Hal Rosenstock
Cc: openib-general@openib.org
Subject: Re: [openib-general] OpenSM log growing too big
Hal Rosenstock wrote
Sent: Thu 11/16/2006 4:32 PM
To: Hal Rosenstock
Cc: openib-general@openib.org
Subject: RE: [openib-general] opensm problem
On Thu, 2006-11-16 at 23:28 +0200, Hal Rosenstock wrote:
> Steve,
>
> Those messages mean that you are joining a MC group which is not
> alr
On Thu, 2006-11-16 at 23:28 +0200, Hal Rosenstock wrote:
> Steve,
>
> Those messages mean that you are joining a MC group which is not
> already created. The MGID of 0xff12401b : 0x0016
> is for 224.0.0.22. That is for IGMP on your IPoIB subnet. The group
> either needs to be
Steve,
Those messages mean that you are joining a MC group which is not already
created. The MGID of 0xff12401b : 0x0016 is for
224.0.0.22. That is for IGMP on your IPoIB subnet. The group either needs to be
preconfigured or the "first" joiner needs to create the group (wh
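For reference, the MGID-to-address correspondence described above follows the IPv4-over-IPoIB multicast layout in RFC 4391: a 0xFF prefix, a flags/scope byte, the 0x401B IPv4 signature, the P_Key, zero padding, and the low 28 bits of the IPv4 group address. The sketch below assumes link-local scope (0x2) and the default full-member P_Key 0xFFFF; the function names are hypothetical, not OpenSM or kernel APIs.

```c
#include <stdint.h>
#include <string.h>

/* Build the 16-byte IPoIB MGID for an IPv4 multicast group (host byte
 * order), following the RFC 4391 layout:
 *   FF | 1<scope> | 40 1B | P_Key | 6 zero bytes | low 28 bits of group */
void ipoib_ipv4_mgid(uint32_t group, uint8_t scope, uint16_t pkey,
                     uint8_t mgid[16])
{
    memset(mgid, 0, 16);
    mgid[0] = 0xFF;                               /* multicast prefix */
    mgid[1] = (uint8_t)(0x10 | (scope & 0x0F));   /* flags = 1, scope nibble */
    mgid[2] = 0x40;                               /* IPv4 signature 0x401B */
    mgid[3] = 0x1B;
    mgid[4] = (uint8_t)(pkey >> 8);               /* P_Key, big-endian */
    mgid[5] = (uint8_t)(pkey & 0xFF);
    /* bytes 6..11 stay zero */
    mgid[12] = (uint8_t)((group >> 24) & 0x0F);   /* drop the 1110 class bits */
    mgid[13] = (uint8_t)(group >> 16);
    mgid[14] = (uint8_t)(group >> 8);
    mgid[15] = (uint8_t)group;
}

/* Read 8 bytes big-endian: bytes 0..7 give the prefix half and
 * bytes 8..15 the interface-ID half, as shown split in the log. */
uint64_t mgid_half(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}
```

For 224.0.0.22 (0xE0000016) with scope 0x2 and P_Key 0xFFFF, this gives a prefix of 0xff12401bffff0000 and an interface ID of 0x16, which lines up with the abbreviated "0xff12401b : 0x0016" above.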
Hal Rosenstock wrote:
> Not sure what question you are asking exactly.
>
> Is it what those messages mean, the file getting large, or both ?
>
>
Both. The message looks like LID 5 is generating too many events. The
log file grows a few MB a second. Whatever the problem with the port, it
Not sure what question you are asking exactly.
Is it what those messages mean, the file getting large, or both ?
What options are you using on OpenSM startup ?
Also, any chance you can move forward on a more recent and better OpenSM ?
-- Hal
From: [EMA
On Thu, 2006-11-02 at 13:33, Viswanath Krishnamurthy wrote:
>
> When we run the opensm (OFED) release and a Topspin HCA is in the IB
> network, opensm crashes in umad_receiver with a NULL pointer exception.
> The transaction ID is zero in the MADs from the Topspin HCA on Windows.
> The crash seems to
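The thread does not show umad_receiver's code, so the sketch below is purely hypothetical: a generic receive path that looks up the outstanding request by transaction ID and drops the MAD when nothing matches (for example, an unexpected TID of zero), instead of dereferencing a NULL lookup result. The table, its size, and all function names are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_OUTSTANDING 8          /* hypothetical table size */

struct pending_req {               /* hypothetical bookkeeping record */
    uint64_t tid;                  /* TID we sent; 0 marks a free slot */
    int      context;              /* whatever the caller needs on completion */
};

static struct pending_req pending_table[MAX_OUTSTANDING];

/* Remember an outgoing request. Returns 0, or -1 if tid is 0 or the table is full. */
int track_request(uint64_t tid, int context)
{
    if (tid == 0)
        return -1;                 /* 0 marks a free slot, so never track it */
    for (size_t i = 0; i < MAX_OUTSTANDING; i++) {
        if (pending_table[i].tid == 0) {
            pending_table[i].tid = tid;
            pending_table[i].context = context;
            return 0;
        }
    }
    return -1;                     /* table full */
}

/* Return the matching request, or NULL if no request was sent with this TID. */
static struct pending_req *find_pending(uint64_t tid)
{
    if (tid == 0)                  /* never issued, so it can never match */
        return NULL;
    for (size_t i = 0; i < MAX_OUTSTANDING; i++)
        if (pending_table[i].tid == tid)
            return &pending_table[i];
    return NULL;
}

/* Returns the request's context, or -1 if the response must be dropped. */
int handle_response(uint64_t tid)
{
    struct pending_req *req = find_pending(tid);
    if (req == NULL)
        return -1;                 /* drop instead of dereferencing NULL */
    int ctx = req->context;
    req->tid = 0;                  /* free the slot */
    return ctx;
}
```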
On 10:33 Thu 02 Nov , Viswanath Krishnamurthy wrote:
> When we run the opensm (OFED) release and a Topspin HCA is in the IB network,
> opensm crashes in umad_receiver with a NULL pointer exception.
Do you have any logs, gdb backtrace or any other details?
Sasha
> The
> transaction ID is zero i
On 13:35 Tue 31 Oct , Hal Rosenstock wrote:
> The following OpenSM header files appear to be unused:
>
> 183 osm_errors.h
> 230 osm_ft_config_ctrl.h
> 291 osm_mcast_config_ctrl.h
> 289 osm_pi_config_ctrl.h
> 289 osm_pkey_config_ctrl.h
> 297 osm_sm_info_get_ctrl.h
> 290 osm_subnet
Hi,
On 04:21 Wed 27 Sep , [EMAIL PROTECTED] wrote:
> I'm trying to set up OpenSM on one of our boxes. I've installed the RPMs from
> ofed-1.0-sles10-rpms_i686.tar.gz and updated the firmware on our Mellanox
> card.
> When I try to start opensm I get the following error message:
> 'umad_open
Hi,
Do you have udev installed and configured ? You may want to refer to the wiki
(https://openib.org/tiki/tiki-index.php) for more troubleshooting info. There's
some info in the cheat sheet
(https://openib.org/tiki/tiki-index.php?page=Installation+Cheat+Sheet) which
may help.
-- Hal
Hi Michael,
On Tue, 2006-09-12 at 07:20, Michael Arndt wrote:
> Hi,
>
> in the osm/docs
Which doc ?
BTW, what version of OpenSM are you using ?
> it is mentioned that in the next release multiple HCA cards on
> the same host will be supported.
If I understand your question correctly, OpenIB Op
Quoting r. Doug Ledford <[EMAIL PROTECTED]>:
> What you just argued for is the opposite of both of those accepted and
> commonly used practices. Not only that, but since all the old code
> *could* be made to work with the new API using nothing more than a macro
> in a header file, to argue for an
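As a hedged illustration of the "macro in a header file" approach: if a new initializer gains an extra parameter, a macro can keep old call sites compiling unchanged. The names below (struct logger, log_init, log_init_v2, max_size) are hypothetical stand-ins, not the real osm_log API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical logger state; not OpenSM's osm_log_t. */
struct logger {
    int           flush;
    uint8_t       level;
    const char   *file;
    unsigned long max_size;        /* new parameter: 0 = unlimited */
};

/* New-style initializer with the extra parameter. */
int log_init_v2(struct logger *lg, int flush, uint8_t level,
                const char *file, unsigned long max_size)
{
    if (lg == NULL)
        return -1;
    lg->flush = flush;
    lg->level = level;
    lg->file = file;
    lg->max_size = max_size;
    return 0;
}

/* Old four-argument call sites map onto the new function, with a
 * default value supplied for the new argument. */
#define log_init(lg, flush, level, file) \
    log_init_v2((lg), (flush), (level), (file), 0UL)
```

A macro only preserves source compatibility: binaries already compiled against the old symbol still need that symbol in the library, which is the problem the symbol-versioning proposal in this thread addresses.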
On Thu, 2006-09-07 at 09:22 +0300, Michael S. Tsirkin wrote:
> Quoting r. Doug Ledford <[EMAIL PROTECTED]>:
> > Subject: Re: [openib-general] OpenSM/osm_log API: Use symbol versions rather
> > than polluting namespace
> >
> > On Wed, 2006-09-06 at 18:1
Quoting r. Doug Ledford <[EMAIL PROTECTED]>:
> Subject: Re: [openib-general] OpenSM/osm_log API: Use symbol versions rather
> than polluting namespace
>
> On Wed, 2006-09-06 at 18:16 +0300, Michael S. Tsirkin wrote:
> > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]&
On Wed, 2006-09-06 at 18:16 +0300, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > Subject: Re: OpenSM/osm_log API: Use symbol versions rather than polluting
> > namespace
> >
> > On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote:
> > > Quoting r. Hal Rosenstock
Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> Subject: Re: OpenSM/osm_log API: Use symbol versions rather than
> polluting namespace
>
> On Wed, 2006-09-06 at 12:10, Michael S. Tsirkin wrote:
> > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > > > > (which, as I understand it, is really only
On Wed, 2006-09-06 at 12:10, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > > > (which, as I understand it, is really only an issue because opensm can
> > > > log so much),
> > > > which is what this entire patch series was designed to
> > > > address. They are two
Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > > (which, as I understand it, is really only an issue because opensm can
> > > log so much),
> > > which is what this entire patch series was designed to
> > > address. They are two different problem spaces.
> >
> > So ... wouldn't it be better t
On Wed, 2006-09-06 at 11:27, Michael S. Tsirkin wrote:
> Quoting r. Doug Ledford <[EMAIL PROTECTED]>:
> > Subject: Re: OpenSM/osm_log API: Use symbol versions rather than polluting
> > namespace
> >
> > On Wed, 2006-09-06 at 10:14 -0400, Hal Rosenstock wrote:
> > > On Wed, 2006-09-06 at 09:42, Mi
Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > > It is an upward compatible change so is low risk.
> >
> > Not sure what you mean by upward compatible. This API change does not
> > seem to be backward compatible - won't it break building dependent
> > applications?
>
> We are talking about
On Wed, 2006-09-06 at 11:16, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > Subject: Re: OpenSM/osm_log API: Use symbol versions rather than polluting
> > namespace
> >
> > On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote:
> > > Quoting r. Hal Rosenstock <[EMA
Quoting r. Doug Ledford <[EMAIL PROTECTED]>:
> Subject: Re: OpenSM/osm_log API: Use symbol versions rather than polluting
> namespace
>
> On Wed, 2006-09-06 at 10:14 -0400, Hal Rosenstock wrote:
> > On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote:
>
> > > Nor is this feature uncontroversia
Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> Subject: Re: OpenSM/osm_log API: Use symbol versions rather than polluting
> namespace
>
> On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote:
> > Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > > Subject: OpenSM/osm_log API: Use symbol versi
On Wed, 2006-09-06 at 10:14 -0400, Hal Rosenstock wrote:
> On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote:
> > Nor is this feature uncontroversial. Would not support for log rotation
> > be better?
If you are just going to do log rotation, then no need to change opensm,
just add an appropr
On Wed, 2006-09-06 at 09:42, Michael S. Tsirkin wrote:
> Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> > Subject: OpenSM/osm_log API: Use symbol versions rather than polluting
> > namespace
> >
> > OpenSM/osm_log API: Rather than polluting the namespace with needless
> > symbols, use symbol ve
Quoting r. Hal Rosenstock <[EMAIL PROTECTED]>:
> Subject: OpenSM/osm_log API: Use symbol versions rather than polluting
> namespace
>
> OpenSM/osm_log API: Rather than polluting the namespace with needless
> symbols, use symbol versions and have a versioned osm_log_init rather
> than adding osm_l
2006 4:18 PM
> To: Leonid Arsh
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] OpenSM - guid2lid cache file questions
>
> Leonid,
>
> On Tue, 2006-09-05 at 09:13, Leonid Arsh wrote:
> > Thanks,
> >
> > On 05 Sep 2006 08:46:22 -0400, Hal Ros
Leonid,
On Tue, 2006-09-05 at 09:13, Leonid Arsh wrote:
> Thanks,
>
> On 05 Sep 2006 08:46:22 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > > I have a problem when OpenSM, being started, reads an out-of-date
> > > guid2lid file.
> > > OpenSM changes LIDs in this case.
> >
> > How do you k
Thanks,
On 05 Sep 2006 08:46:22 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> > I have a problem when OpenSM, being started, reads an out-of-date guid2lid
> > file.
> > OpenSM changes LIDs in this case.
>
> How do you know the file is "out of date" ?
>
Actually, the LIDs were assigned by ano
Hi Leonid,
On Tue, 2006-09-05 at 08:11, Leonid Arsh wrote:
> Hi Hal,
>
> Thank you for your reply.
>
> Probably I wasn't clear.
>
> I have a problem when OpenSM, being started, reads an out-of-date guid2lid
> file.
> OpenSM changes LIDs in this case.
How do you know the file is "out of date
Hi Hal,
Thank you for your reply.
Probably I wasn't clear.
I have a problem when OpenSM, being started, reads an out-of-date guid2lid file.
OpenSM changes LIDs in this case.
I don't want the LIDs to be changed.
As I understand it, the '-r' option, on the contrary, causes the SM to
reassign al
Hi Leonid,
On Tue, 2006-09-05 at 03:30, Leonid Arsh wrote:
> Hi list,
>
> I have a question regarding the guid2lid cache file.
>
> The file is read by OpenSM on the start up.
> OpenSM may reassign LIDs according to the LIDs saved in this file.
> It isn't always acceptable.
>
> Is it a ri
Hi,
On 15:02 Fri 25 Aug , Venkatesh Babu wrote:
>
> The document OpenSM_PKey_Mgr.txt under link
> https://openib.org/svn/gen2/trunk/src/userspace/management/osm/doc/OpenSM_PKey_Mgr.txt
>
> describes the roadmap for OpenSM partition management. It discusses a
> two-phase implementation.
>
>
On Mon, 2006-08-14 at 07:36, Dotan Barak wrote:
> Hi.
>
> I noticed that the behavior of the openSM was changed in the latest driver:
>
> in the past, every HCA was configured (by the FW) with 0x in the first
> entry.
> today,
Just as an FYI: I think that Anafas have this in the second entr
On Monday 14 August 2006 16:09, Sasha Khapyorsky wrote:
> >
> > Why doesn't the SM print that this file was found?
>
> Yes, some prints may be helpful. Do you mean just the log file, or would you prefer
> the message on stdout too?
I believe that most users don't look at the log file, so a message
On 15:36 Mon 14 Aug , Dotan Barak wrote:
> Thanks for the quick response.
>
> On Monday 14 August 2006 15:17, Sasha Khapyorsky wrote:
> > Hi Dotan,
> >
> > On 14:36 Mon 14 Aug , Dotan Barak wrote:
> > > Hi.
> > >
> > > I noticed that the behavior of the openSM was changed in the latest
Thanks for the quick response.
On Monday 14 August 2006 15:17, Sasha Khapyorsky wrote:
> Hi Dotan,
>
> On 14:36 Mon 14 Aug , Dotan Barak wrote:
> > Hi.
> >
> > I noticed that the behavior of the openSM was changed in the latest driver:
> >
> > in the past, every HCA was configured (by the F
Hi Dotan,
On 14:36 Mon 14 Aug , Dotan Barak wrote:
> Hi.
>
> I noticed that the behavior of the openSM was changed in the latest driver:
>
> in the past, every HCA was configured (by the FW) with 0x in the first
> entry.
> today, the PKey table is being configured by the openSM: the fir
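Background for the P_Key table discussion: a P_Key's top bit marks full (1) versus limited (0) membership and the low 15 bits identify the partition, so 0xFFFF is the default partition with full membership and 0x7FFF the same partition with limited membership. A minimal sketch of those rules; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u   /* top bit: 1 = full, 0 = limited member */
#define PKEY_BASE_MASK   0x7FFFu   /* low 15 bits: partition number */

static inline uint16_t pkey_base(uint16_t pkey)
{
    return (uint16_t)(pkey & PKEY_BASE_MASK);
}

static inline bool pkey_is_full(uint16_t pkey)
{
    return (pkey & PKEY_FULL_MEMBER) != 0;
}

/* Two ports can communicate only if they share a partition and at least
 * one of them is a full member (two limited members cannot talk). */
static inline bool pkeys_can_talk(uint16_t a, uint16_t b)
{
    return pkey_base(a) == pkey_base(b) && (pkey_is_full(a) || pkey_is_full(b));
}
```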
On Wed, 2006-07-12 at 21:45, Hal Rosenstock wrote:
[snip...]
> > I don't know if this is an HCA firmware issue, switch issue, or openSM
> > issue.
> > I don't think it's related to my changes or osmtest at this point.
>
> I'll see if I can reproduce this tomorrow.
I've followed your scenario
>> I don't know if this is an HCA firmware issue, switch issue, or openSM
>issue.
>> I don't think it's related to my changes or osmtest at this point.
>
>I'll see if I can reproduce this tomorrow.
>
>Also, can you send me the guid2lid files from the 3 SMs ?
I'll send this tomorrow. Before reloa
On Wed, 2006-07-12 at 18:36, Sean Hefty wrote:
> Hal Rosenstock wrote:
> > With the default sminfo_polling_timeout of 10 seconds and the default
> > polling_retry_number of 4, the total handoff time should be around 40
> > seconds. I just did that experiment with 2 SMs and saw that as well.
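The 40-second figure is simply the product of the two defaults. A trivial sketch, assuming the standby SM declares the master gone only after polling_retry_number consecutive SMInfo polls, spaced sminfo_polling_timeout apart, go unanswered; the function name is hypothetical.

```c
/* Rough handoff estimate: each missed SMInfo poll costs one polling
 * interval, and the standby waits for polling_retry_number misses. */
unsigned handoff_estimate_sec(unsigned sminfo_polling_timeout_sec,
                              unsigned polling_retry_number)
{
    return sminfo_polling_timeout_sec * polling_retry_number;
}
```

With the defaults quoted above, handoff_estimate_sec(10, 4) gives 40 seconds.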
>
> Oka
On Wed, 2006-07-12 at 09:13, yipeeyipee yipeeyipee wrote:
> --- Hal Rosenstock <[EMAIL PROTECTED]> wrote:
>
> [snip]
> Should this IS_SM bit in port attributes be supported
> in the switch hardware?
If you are running an SM on your switch, the IS_SM bit would be on for
port 0. Otherwise not.
> >
--- Hal Rosenstock <[EMAIL PROTECTED]> wrote:
[snip]
Should this IS_SM bit in port attributes be supported
in the switch hardware?
> Yes (I'm pretty sure). The user_mad API has not
> changed in quite some
> time now. What ABI version is 2.6.14 ?
I don't know where to check this.
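On Linux kernels with the ib_umad module loaded, the user_mad ABI version is usually exposed as a one-line text file at /sys/class/infiniband_mad/abi_version, so reading that file (for example with cat) answers the question. The helper below just reads an integer from such a file; the function name is hypothetical and the sysfs path is an assumption to verify on your kernel.

```c
#include <stdio.h>

/* Read a single integer (e.g. the user_mad ABI version) from a text file.
 * Returns the value, or -1 if the file cannot be opened or parsed. */
int read_abi_version(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int version = -1;
    if (fscanf(f, "%d", &version) != 1)
        version = -1;
    fclose(f);
    return version;
}
```

Typical use would be read_abi_version("/sys/class/infiniband_mad/abi_version").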
On Tue, 2006-07-11 at 09:27, yipee wrote:
> Hal Rosenstock voltaire.com> writes:
> [snip]
> > It's not the setting which is failing. You are likely not using an SM
> > which supports this (it is an enhanced capability defined in a 1.2
> > erratum). Are you running a recent OpenSM or something els
Hal Rosenstock voltaire.com> writes:
[snip]
> It's not the setting which is failing. You are likely not using an SM
> which supports this (it is an enhanced capability defined in a 1.2
> erratum). Are you running a recent OpenSM or something else ?
>
I'm running a 1.1 openSM on a 2.6.14 kernel.
On Tue, 2006-07-11 at 03:24, yipee wrote:
> Hi,
>
> On one of my IB setups I get the following error from openSM:
> osm_vendor_set_sm: ERR 5431: setting IS_SM capability mask failed; errno 2
>
> what's this IS_SM capability mask? what might cause its setting to fail?
It's not the setting which i
You probably have another SM already running on your machine.
The error means that OpenSM failed to set the local port IS_SM capability mask
bit (which says there is an SM running on that port).
If you do not have another SM running on the port, you should probably restart
the driver as the ref cou
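Some context for the ERR 5431 message: on Linux, errno 2 is ENOENT ("No such file or directory"), and IsSM is bit 1 (0x2) of the PortInfo CapabilityMask. With the Linux ib_umad driver, a process normally sets IsSM by opening the port's issm device (for example /dev/infiniband/issm0) and clears it by closing that descriptor; with O_NONBLOCK, a second opener gets EAGAIN while another process holds it. Whether the OpenSM 1.1 build discussed here uses that interface is not stated in the thread, so treat this as an assumption-laden diagnostic sketch; the function names are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define PORT_CAP_IS_SM (1u << 1)   /* PortInfo:CapabilityMask bit 1 (IsSM) */

/* Nonzero if a capability mask has the IsSM bit set. */
int port_has_sm(unsigned capmask)
{
    return (capmask & PORT_CAP_IS_SM) != 0;
}

/* Try to claim the SM role by opening an issm device node.
 * Returns an open fd (keep it open while the SM runs), or -errno:
 *   -ENOENT  the device node does not exist
 *   -EAGAIN  another process already holds it */
int claim_issm(const char *path)
{
    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -errno;
    return fd;
}

/* Closing the descriptor gives up the SM role again. */
void release_issm(int fd)
{
    if (fd >= 0)
        close(fd);
}
```

Under these assumptions, ENOENT from the open would point at a missing device node, while EAGAIN would fit the "another SM already running" explanation above.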
On Tue, 2006-06-13 at 12:56, Viswanath Krishnamurthy wrote:
> I am using the trunk. Should I be using 1.0 ?
No; I didn't check, but if memory serves me correctly, the trunk may
have some fixes toward this that 1.0 doesn't. I'm not 100% sure right now,
and since you are using the trunk, I'm not g
I am using the trunk. Should I be using 1.0 ?
-Viswa
On 13 Jun 2006 12:35:17 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> On Tue, 2006-06-13 at 12:21, Viswanath Krishnamurthy wrote:
> > Yes. I want to test the waters again and see if the issues went away.
> Are you using the trunk or 1.0 ?
> -- Hal
> > -V
On Tue, 2006-06-13 at 12:21, Viswanath Krishnamurthy wrote:
> Yes. I want to test the waters again and see if the issues went away.
Are you using the trunk or 1.0 ?
-- Hal
> -Viswa
>
>
> On 13 Jun 2006 06:15:34 -0400, Hal Rosenstock <[EMAIL PROTECTED]>
> wrote:
> Hi Viswa,
>
>
Yes. I want to test the waters again and see if the issues went away.
-Viswa
On 13 Jun 2006 06:15:34 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
> Hi Viswa,
> On Mon, 2006-06-12 at 23:16, Viswanath Krishnamurthy wrote:
> > There were some issues with opensm running with NPTL (thread
> > library). Has the
Hi Viswa,
On Mon, 2006-06-12 at 23:16, Viswanath Krishnamurthy wrote:
> There were some issues with opensm running with NPTL (thread
> library). Have the issues been resolved ?
There were some fixes to the signal handling which went in back in the
Feb/early March time frame. OpenSM should be bett
Hi Paul,
On Tue, 2006-05-30 at 11:06, Paul wrote:
> Hi All,
> I will be working on this as time permits this week.
> Unfortunately my employer is not crazy about giving out remote access,
> so I will have to be your hands on this. If you want me to do
> something just tell me what it is. I kn
Don,
On Tue, 2006-05-30 at 10:55, [EMAIL PROTECTED] wrote:
> Hal,
>
> With your patch to OpenSM, I think everything is ok on the local node.
That patch with one minor change (elimination of the CL_ASSERT) will be
part of the upcoming RC6.
> The remote node is definitely having some problems,
Hi All, I will be working on this as time permits this week. Unfortunately my employer is not crazy about giving out remote access, so I will have to be your hands on this. If you want me to do something, just tell me what it is. I know it's a pain; I have been there myself.
Regards.
On 5/30/06, [E
Hal,
With your patch to OpenSM, I think everything is ok
on the local node. The remote node is definitely having some problems,
resulting in not responding to the MAD packets. I have entered a
separate message on the problems with the "ib0" interface on
that machine.
>
> On Fri, 2006-05-26 at
Hi Paul,
On 12:14 Fri 26 May , Paul wrote:
> No, I figured all of that out, ppc64 was not supported/working in RC4.
> Either way, here is what I see with opensm:
>
> [EMAIL PROTECTED] ~]# /etc/init.d/opensmd start
> *** glibc detected *** realloc(): invalid next size: 0x100ab1e0 ***
Hi Paul,
On Sat, 2006-05-27 at 02:26, Paul wrote:
> Hi Hal,
> My lab is undergoing maintenance this weekend so I won't be able
> to get you any results till Tuesday; however, the results are readily
> reproducible. Everything is 64bit.
Unfortunately I don't have access to a PPC64 machine on whi
Hi Hal, My lab is undergoing maintenance this weekend so I won't be able to get you any results till Tuesday; however, the results are readily reproducible. Everything is 64bit.
Regards.
On 26 May 2006 12:46:01 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
Hi again Paul,
On Fri, 2006-05-26 at 12:
Don,
On Fri, 2006-05-26 at 20:59, Hal Rosenstock wrote:
> > What next, coach?
>
> Can you turn on madeye on the remote node and see what packets are
> received and sent ? Let me know if you need help with that. I think you
> said you were running OFED, right ?
I don't think madeye is part of OFE
Don,
On Fri, 2006-05-26 at 17:32, [EMAIL PROTECTED] wrote:
> Hal,
>
> I rebuilt the opensm executable with the patch you provided. The
> patch fixes (or avoids) the segmentation fault and opensm comes up and
> runs.
Thanks for trying this out.
> However, the link is still not becoming opera
Hal,
I rebuilt the opensm executable with
the patch you provided. The patch fixes (or avoids) the segmentation
fault and opensm comes up and runs. However, the link is still not
becoming operational. On the local side it goes to ARMED, and
on the remote side it goes to INIT. The osm.log s
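For reading this report: the logical port states in PortInfo are numbered 1 DOWN, 2 INIT, 3 ARMED, 4 ACTIVE (and 5 ACTIVE_DEFER), and on Linux each port's current state can usually be read from /sys/class/infiniband/<device>/ports/<n>/state as a string like "4: ACTIVE". A small sketch that parses that format; the enum and function names are hypothetical.

```c
#include <stdio.h>

/* Logical port states as numbered in PortInfo:PortState. */
enum port_state {
    PORT_NOP = 0,
    PORT_DOWN = 1,
    PORT_INIT = 2,
    PORT_ARMED = 3,
    PORT_ACTIVE = 4,
    PORT_ACTIVE_DEFER = 5
};

/* Parse a line such as "4: ACTIVE" (the sysfs state format).
 * Returns the numeric state, or -1 if the line is unparsable. */
int parse_port_state(const char *line)
{
    int state;
    if (sscanf(line, "%d:", &state) != 1)
        return -1;
    if (state < PORT_NOP || state > PORT_ACTIVE_DEFER)
        return -1;
    return state;
}
```

In these terms, the local port above sits at 3 (ARMED) and the remote one at 2 (INIT), so neither has reached 4 (ACTIVE).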
Hal,
>
> One more thing on the remote side, try:
>
> smpquery nodeinfo -D 0
>
Here is the smpquery on the remote (system "jatoba")
side
>
[jatoba] (ib) ib> smpquery nodeinfo -D 0
# Node info: DR path [0]
BaseVers:1
ClassVers:...1
Node
Don,
On Fri, 2006-05-26 at 14:35, [EMAIL PROTECTED] wrote:
> Hal,
>
> > Yes, that is very useful. I had been working on trying to come up with
> > what the problem was but this narrows it down to something I was
> > thinking might be going on.
> >
> > It looks like you are running back to bac
Hal,
> Yes, that is very useful. I had been working on trying to come up with
> what the problem was but this narrows it down to something I was
> thinking might be going on.
>
> It looks like you are running back to back HCAs, right ?
Yes, the HCAs are 4X DDR, connected back to back.
>
> It
Hi Don,
On Fri, 2006-05-26 at 13:34, [EMAIL PROTECTED] wrote:
> Hal,
>
> > Hi again Paul,
>
> Since your last message was addressed to Paul, and you said my problem
> was completely different, I don't know if a backtrace would help in my
> case, but here it is anyway, just in case. (See below.)
Hal,
> Hi again Paul,
Since your last message was addressed to Paul, and
you said my problem was completely different, I don't know if a backtrace
would help in my case, but here it is anyway, just in case. (See below.)
>
> Would you rebuild OpenSM with debug:
> ./configure --enable-debug && m
Hi again Paul,
On Fri, 2006-05-26 at 12:14, Paul wrote:
> No, I figured all of that out, ppc64 was not supported/working in RC4.
> Either way, here is what I see with opensm:
>
> [EMAIL PROTECTED] ~]# /etc/init.d/opensmd start
> *** glibc detected *** realloc(): invalid next size:
> 0x100
No, I figured all of that out, ppc64 was not supported/working in RC4. Either way, here is what I see with opensm:
[EMAIL PROTECTED] ~]# /etc/init.d/opensmd start
*** glibc detected *** realloc(): invalid next size: 0x100ab1e0 ***
/etc/init.d/opensmd: line 330: 7854 Donee
Hi Paul,
On Fri, 2006-05-26 at 11:35, Paul wrote:
> I am having a similar issue on my ppc64 systems. Take a look at the
> email I sent to the list last night. I have not been able to figure
> out much regarding why it's dying,
Are you referring to your mail on compile flags and then OpenMPI ? I sa
I am having a similar issue on my ppc64 systems. Take a look at the email I sent to the list last night. I have not been able to figure out much regarding why it's dying. I wonder if it might be tied to some other issues I am having.
On 5/26/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
I
l Message-
> > From: [EMAIL PROTECTED] [mailto:openib-general-
> > [EMAIL PROTECTED] On Behalf Of Sasha Khapyorsky
> > Sent: Wednesday, May 17, 2006 2:11 AM
> > To: Troy Benjegerdes
> > Cc: openib-general@openib.org
> > Subject: Re: [openib-general] opensm segfault
On Wed, May 17, 2006 at 09:10:11AM +0300, Eitan Zahavi wrote:
> cl_memcpy should have some debug capabilities on top of memcpy ...
> cl memory management provides means to track all memory allocations, etc.
There are a huge number of canned solutions that provide a way to debug
memory problems wit
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:openib-general-
> [EMAIL PROTECTED] On Behalf Of Sasha Khapyorsky
> Sent: Wednesday, May 17, 2006 2:11 AM
> To: Troy Benjegerdes
> Cc: openib-general@openib.org
> Subject: Re: [openib-general] opensm segfault?
Hi Troy,
On 14:41 Tue 16 May , Troy Benjegerdes wrote:
> I got this after an indeterminate amount of time running opensm..
Is this reproducible? Or is it a completely random failure?
> (gdb) bt
> #0 0x2b90b0dbebf3 in cl_memcpy (p_dest=0x2ac88850, p_src=0x0,
> count=64) at cl_m
On Tue, 2006-05-16 at 16:10, Roland Dreier wrote:
> Troy> And why the heck is "cl_memcpy" just a call to 'memcpy'
> Troy> anyway? This just seems like excessive unneeded abstraction.
>
> Hal> It's part of the component library, which is an OS
> Hal> abstraction layer.
>
> memcpy()
Troy> And why the heck is "cl_memcpy" just a call to 'memcpy'
Troy> anyway? This just seems like excessive unneeded abstraction.
Hal> It's part of the component library, which is an OS
Hal> abstraction layer.
memcpy() is specified by the ISO C standard, so it seems pretty silly
to
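The backtrace above shows cl_memcpy being handed p_src=0x0 with count=64. In ISO C, passing a null pointer to memcpy is undefined behavior, and a thin wrapper that simply forwards the call adds no protection. A hypothetical checked variant that reports the bad argument instead of faulting could look like this; it only makes the failure visible, and the real fix is still finding out why the source pointer is NULL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical checked copy: returns 0 on success, -1 on a bad pointer.
 * Zero-length copies return early, so memcpy never sees NULL arguments. */
int checked_copy(void *dest, const void *src, size_t count)
{
    if (count == 0)
        return 0;                  /* nothing to copy */
    if (dest == NULL || src == NULL)
        return -1;                 /* caller bug: report instead of crashing */
    memcpy(dest, src, count);
    return 0;
}
```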
Hi Troy,
On Tue, 2006-05-16 at 15:41, Troy Benjegerdes wrote:
> I got this after an indeterminate amount of time running opensm..
>
>
> (gdb) bt
> #0 0x2b90b0dbebf3 in cl_memcpy (p_dest=0x2ac88850, p_src=0x0,
^
This i
Hi Troy,
On Thu, 2006-04-13 at 15:35, Troy Benjegerdes wrote:
> We just moved a cluster over to the latest redhat release, and opensm
> seems to be having issues.
>
> This is running the redhat provided kernel and opensm packages
>
> [EMAIL PROTECTED] troy]# uname -r
> 2.6.9-34.ELsmp
> [EMAIL PR
Hi Owen,
On Fri, 2006-02-17 at 00:01, Owen Stampflee wrote:
> Of course, I need to get things working first, then we can deal with the
> 64-bit issues (gotta please the boss, and if shipping 32-bit binaries and
> both 32/64 bit libraries provides a working udapl, ipoib, and 32+64-bit
> mpi, I can m
Of course, I need to get things working first, then we can deal with the
64-bit issues (gotta please the boss, and if shipping 32-bit binaries and
both 32/64 bit libraries provides a working udapl, ipoib, and 32+64-bit
mpi, I can meet my deadline (Monday)). I'm suspecting some glibc issues
on our en
On Thu, 2006-02-16 at 20:43, Owen Stampflee wrote:
> A 32-bit build of 5411 gets the link to become active
Glad to hear this. That is what I would expect. I would like to confirm that
the tid patch is missing from the FC5 package, as well as get to the
bottom of the 64-bit issues if you have some ti
Hi again Owen,
On Thu, 2006-02-16 at 20:01, Owen Stampflee wrote:
> http://cvs.terraplex.com/~owen/osm.log
>
> I'm currently using the Fedora FC5 packages that have been rebuilt,
I'm not sure what svn revision the FC5 package corresponds to. Can you check the
following in osm/libvendor/osm_vendor_ibumad.
On 13:27 Thu 16 Feb , Owen Stampflee wrote:
>
> Commenting out the cl_log_event in osm_log results in this backtrace:
>
> (gdb) bt
> #0 0x0080b9719db0 in .__GI_raise () from /lib64/tls/libc.so.6
> #1 0x0080b971b89c in .__GI_abort () from /lib64/tls/libc.so.6
> #2 0x0080b974e860
A 32-bit build of 5411 gets the link to become active and ibv_rc_pingpong
works, but I can't bring up ipoib...
dmesg says this (tried both ib0 and ib1 to ensure the ports weren't swapped)
ADDRCONF(NETDEV_UP): ib0: link is not ready
ADDRCONF(NETDEV_UP): ib1: link is not ready
At least we're making progre
http://cvs.terraplex.com/~owen/osm.log
I'm currently using the Fedora FC5 packages that have been rebuilt; I'm
going to try the OpenIB 5411 source.
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-gener
Hi Owen,
On Thu, 2006-02-16 at 16:27, Owen Stampflee wrote:
> So, here is the back trace with no code modifications...
>
> 0x0080b9719db0 in .__GI_raise () from /lib64/tls/libc.so.6
> (gdb) bt
> #0 0x0080b9719db0 in .__GI_raise () from /lib64/tls/libc.so.6
> #1 0x0080b971b89c in .__
So, here is the back trace with no code modifications...
0x0080b9719db0 in .__GI_raise () from /lib64/tls/libc.so.6
(gdb) bt
#0 0x0080b9719db0 in .__GI_raise () from /lib64/tls/libc.so.6
#1 0x0080b971b89c in .__GI_abort () from /lib64/tls/libc.so.6
#2 0x0080b974e860 in .__libc_m
On Wed, 2006-02-15 at 13:47, Sasha Khapyorsky wrote:
> On 09:41 Wed 15 Feb , Owen Stampflee wrote:
> > This doesn't help much... at all... no new info to report.
> >
> > [EMAIL PROTECTED] ~]# rm /var/log/osm.log
> > [EMAIL PROTECTED] ~]# opensm -v
> > ---
On 09:41 Wed 15 Feb , Owen Stampflee wrote:
> This doesn't help much... at all... no new info to report.
>
> [EMAIL PROTECTED] ~]# rm /var/log/osm.log
> [EMAIL PROTECTED] ~]# opensm -v
> -
> OpenSM Rev:openib-1.1.0
> Command Line Arguments:
> Ver
This doesn't help much... at all... no new info to report.
[EMAIL PROTECTED] ~]# rm /var/log/osm.log
[EMAIL PROTECTED] ~]# opensm -v
-
OpenSM Rev:openib-1.1.0
Command Line Arguments:
Verbose option -v (log flags = 0x7)
Log File: /var/log/osm.log
---
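The "log flags = 0x7" line is a bitmask. The bit assignments below are assumed to match OpenSM's osm_log.h of this era (ERROR 0x01, INFO 0x02, VERBOSE 0x04, and so on); treat them as assumptions and check them against your tree. Under that assumption, the 0x7 set by -v decodes to ERROR|INFO|VERBOSE. The decoder itself is hypothetical.

```c
#include <string.h>

/* Assumed bit assignments (modeled on osm_log.h; verify locally). */
static const struct { unsigned bit; const char *name; } log_bits[] = {
    {0x01, "ERROR"}, {0x02, "INFO"},   {0x04, "VERBOSE"}, {0x08, "DEBUG"},
    {0x10, "FUNCS"}, {0x20, "FRAMES"}, {0x40, "ROUTING"}, {0x80, "SYS"},
};

/* Write the names of the set bits, separated by '|', into buf (len >= 1).
 * Output is truncated safely if buf is too small. */
void decode_log_flags(unsigned flags, char *buf, size_t len)
{
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof log_bits / sizeof log_bits[0]; i++) {
        if (flags & log_bits[i].bit) {
            if (buf[0] != '\0')
                strncat(buf, "|", len - strlen(buf) - 1);
            strncat(buf, log_bits[i].name, len - strlen(buf) - 1);
        }
    }
}
```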
Hi Owen,
On Wed, 2006-02-15 at 11:43, Owen Stampflee wrote:
> > Can you strace it and provide the output ? Thanks.
> >
> > -- Hal
> http://cvs.terraplex.com/~owen/opensm.strace
I can see the initial write to send a MAD here and it fails after that.
One more try: can you send an osm.log from ope