When we run the opensm (OFED) release and a Topspin HCA is in the IB network, opensm crashes in umad_receiver with a NULL pointer exception. The transaction ID is zero in the MADs from the Topspin HCA on Windows. The crashes seem to be random in umad_receiver.
HCA found:
hca_id=InfiniHost0
In the current communication manager (CM) implementation, how is a lost REP MAD handled? When the REP gets lost, cm_dup_req_handler gets called, which currently enters the default condition and does nothing. The client retries
the number of times it is configured to and fails. If the f
Is there a way to disable end-to-end flow control using any of the APIs?
Thanks,
-Viswa
___
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/list
I am using the trunk. Should I be using 1.0 ?
-Viswa
On 13 Jun 2006 12:35:17 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
On Tue, 2006-06-13 at 12:21, Viswanath Krishnamurthy wrote:
> Yes.. I want to test waters again and see if the issues went away.
Are you using the trunk or 1.
Yes.. I want to test the waters again and see if the issues went away.
-Viswa
On 13 Jun 2006 06:15:34 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
Hi Viswa,
On Mon, 2006-06-12 at 23:16, Viswanath Krishnamurthy wrote:
> There were some issues with opensm running with NPTL (thread
> libra
There were some issues with opensm running with NPTL (thread library). Have the issues been
resolved?
Regards,
Viswa
Works like a charm...
-Viswa
On 12 Apr 2006 21:32:33 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
On Wed, 2006-04-12 at 20:46, Hal Rosenstock wrote:
> On Wed, 2006-04-12 at 18:25, Viswanath Krishnamurthy wrote:
> > The RMPP version needs to be 1.
>
> Thanks. I'm not
The mad_register_agent function in the mad.c kernel file was checking rmpp_version.
This check was failing, and the failure was propagated to umad (through the ioctl).
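A minimal sketch of the kind of check described above, assuming that the only accepted RMPP version is 1 and that 0 means the agent does not use RMPP. The function name and the constant are hypothetical and are not taken from mad.c; the point is only that any other value is rejected with -EINVAL, which the caller then sees as a failed ioctl.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed for this sketch: the single RMPP version the stack accepts. */
#define SKETCH_RMPP_VERSION 1

/*
 * Hypothetical stand-in for the registration check.
 * Returns 0 when the requested RMPP version is acceptable and
 * -EINVAL otherwise. A version of 0 is treated as "no RMPP".
 */
static int sketch_check_rmpp_version(uint8_t rmpp_version)
{
    if (rmpp_version != 0 && rmpp_version != SKETCH_RMPP_VERSION)
        return -EINVAL;
    return 0;
}
```

Under these assumptions a caller that passes a stale or uninitialised version number fails registration, while passing 1 succeeds.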
On 12 Apr 2006 20:46:33 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
On Wed, 2006-04-12 at 18:25, Viswanath Krishnamurthy wrote:
>
The RMPP version needs to be 1.
[EMAIL PROTECTED] src]# svn diff ibping.c
Index: ibping.c
===
--- ibping.c (revision 6446)
+++ ibping.c (working copy)
@@ -336,7 +336,7 @@
exit(0);
}
- if (mad_regis
When I do an ibping I get an error (on a 32-bit machine)
Linux Kernel: 2.6.16
infiniband directory replaced with SVN6446
When I enable debug in umad.c, I get the following error. The ioctl call to the umad driver (umad device)
is failing.
return value for ioctl is -1, errno is -22 (EINVAL)
portid
My guess is that the bug is in the userspace library, since a kernel module
which uses the same APIs in kernel mode works fine. I will work on the
sample code and send it.
-Viswa
On 3/27/06, Roland Dreier <[EMAIL PROTECTED]> wrote:
Roland> Did this code work with mainline kernel 2.6.15? If so you
I tried using the openib userland libraries with the mainline 2.6.16 kernel but ran into
a strange problem. A userland application which uses the CM and verbs libraries, and which works
fine with earlier releases, stopped working with no error (in the APIs). When I put
the analyser on, I see the CM connect sequence is f
When the HCA receives back-to-back RDMA write followed by RDMA read requests, it generates
a coalesced ACK (implicit ACK for the RDMA write). Is there a configuration in the mthca driver which will
enable the HCA firmware to generate individual ACKs? I am trying to debug another issue and this will be hel
How does one pull out the correct userland libraries for the 2.6.16 kernel IB stack? Is it
to look at the SVN number in the driver code, and pull that version?
-Viswa
Has the mthca driver been tested on a non-MSI (interrupt) system? I seem to have a problem where
interrupts are not generated on a non-MSI system, with the following message
"NOP command failed to generate interrupt (IRQ 9), aborting."
BIOS or ACPI interrupt routing problem?
-Viswa
Does the OpenIB Gen2 stack umad/mad library support vendor-specific MAD extensions?
-Viswa
Roland,
I see the following when I use the latest mthca driver on a different HCA card
[ 193.882759] ib_mthca: Initializing :03:00.0
[ 193.887546] ib_mthca :03:00.0: Found bridge: :02:0c.0
[ 194.894937] ib_mthca :03:00.0: SYS_EN DDR error: syn=4, sock=0, sladdr=0, SPD source=DI
Hal,
Thanks.. works like a charm...
-Viswa
On 27 Sep 2005 16:13:01 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
On Tue, 2005-09-27 at 16:00, Viswanath Krishnamurthy wrote:
> Hal,
>
> I added a hack now to get around the problem. There needs to be a
> proper fix later..
Can yo
PROTECTED]> wrote:
On Tue, 2005-09-27 at 14:13, Viswanath Krishnamurthy wrote:
> I tracked down the issue to a bug in osm_lid_mgr.c
>
> function: __osm_lid_mgr_init_sweep(...)
>
> The bad hardware was returning an assigned LID of 0x. In this
> function there is a loop
> as foll
d 16 bit numbers, the condition
in the for loop never becomes false, and opensm is stuck in the loop. There are a couple of other places in that
function that need fixing too.
-Viswa
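To illustrate the failure mode (this is not the osm_lid_mgr.c code), here is a small sketch of how a for loop over a 16-bit counter can fail to terminate. The bounds 0xFFF0 and 0xFFFF used below are assumed values chosen only for the illustration, and all function names are hypothetical. When the upper bound is the largest 16-bit value, the counter wraps to 0 and "lid <= max" stays true.

```c
#include <stdint.h>

/*
 * Walks "for (lid = min; lid <= max; lid++)" with a 16-bit counter.
 * The iteration limit exists only so this sketch always terminates.
 * Returns the number of iterations, or -1 if the limit was exceeded.
 */
static long sketch_count_naive(uint16_t min_lid, uint16_t max_lid, long limit)
{
    long n = 0;
    uint16_t lid;

    for (lid = min_lid; lid <= max_lid; lid++) {
        if (++n > limit)
            return -1;  /* the counter wrapped and the loop never ended */
    }
    return n;
}

/*
 * Same range walked with a wider counter: lid can grow past max_lid,
 * so the loop condition eventually becomes false.
 */
static long sketch_count_wide(uint16_t min_lid, uint16_t max_lid)
{
    long n = 0;
    uint32_t lid;

    for (lid = min_lid; lid <= max_lid; lid++)
        n++;
    return n;
}
```

With the assumed bounds, the naive loop runs until the limit is hit, while the wide-counter version visits each value in the range exactly once. Widening the counter is one possible fix; validating the LID reported by the hardware before using it as a bound is another.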
On 9/27/05, Viswanath Krishnamurthy <[EMAIL PROTECTED]> wrote:
Log sent off-list...
-Viswa
On 9/27/05, Ei
Log sent off-list...
-Viswa
On 9/27/05, Eitan Zahavi <[EMAIL PROTECTED]> wrote:
Hi Viswa,
Please send a full /var/log/osm.log file of opensm -V .
You can send us a copy off the list if it is too big:
yael and eitan in @mellanox.co.il
EZ
Hal Rosenstock wrote:
> On Mon, 2005-09-26 at 19:57,
I have an exerciser in the IB network. The exerciser seems to be faulty/buggy. When opensm starts I do not
see the "SUBNET UP" message. It says "Entering MASTER" and waits there.
Any new node inserted in this state is not assigned a LID. Has anybody seen such behavior?
-Viswa
I ran into another opensm bug which caused opensm to stop functioning. This happened only once.
Here is the test case
1. Run opensm on Machine A
2. Run the following script on M/c B
a. Check ibstatus
b. Ping machine A
c. Run osmtest
d. reboot
The test case is to make sure opensm
rts up.
Viswa, with all that said, it is very possible you are experiencing a bug in OpenSM and we
want to encourage your effort finding those. With your, and others, help we will be able to
flush them out.
Thanks
Eitan
Hal Rosenstock wrote:
> On Fri, 2005-09-23 at 14:57, Hal Rosenstock wrote:
>
On 23 Sep 2005 13:49:31 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
Hi Viswa,
On Fri, 2005-09-23 at 13:43, Viswanath Krishnamurthy wrote:
> More information,
>
> The test case is as follows
>
> 1. Start opensm in verbose mode (-V)
> 2. Ping remote node
> 3. osmtest -f c
>
On 23 Sep 2005 13:59:28 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
Hi Viswa,
On Fri, 2005-09-23 at 13:55, Viswanath Krishnamurthy wrote:
> Is there an API or command to force an IB link to go down.
Not currently.
> This will be helpful in running tests on opensm.
Yes, I can und
Hal,
On 23 Sep 2005 14:04:00 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
Hi again Viswa,
On Fri, 2005-09-23 at 13:50, Viswanath Krishnamurthy wrote:
Good test. Hadn't tried this. I will try it and will recreate this.
> - 2 machines with a switch in between. One m/c running opensm.
Is there an API or command to force an IB link to go down? This will be helpful in running tests on opensm.
-Viswa
from the SA to the SA client (osmtest). That's where
this one is right now.
-- Hal
> Eitan
>
> Hal Rosenstock wrote:
> > Hi again Viswa,
> >
> > On Wed, 2005-09-21 at 21:00, Hal Rosenstock wrote:
> >
> >>Hi Viswa,
> >>
> >>On Wed, 2005-09-21 at 2
On 22 Sep 2005 18:44:44 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
Hi Viswa,
On Thu, 2005-09-22 at 15:55, Viswanath Krishnamurthy wrote:
> Here is the log of osmtest failure. This was seen 150 times out of
> 2500 iterations. The opensm SUBNET UP failure is tough to reproduce.
>
egister_service: ERR 0364: ib_query failed (IB_TIMEOUT).
Sep 21 17:51:04 779740 [B7F026C0] -> osmtest_run: ERR 00148: Service Flow failed (IB_TIMEOUT)
OSMTEST: TEST "All Validations" FAIL
-Viswa
On 22 Sep 2005 15:08:02 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
On Thu, 2005-
Hal,
On 22 Sep 2005 14:41:04 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
Hi Viswa,
On Thu, 2005-09-22 at 14:37, Viswanath Krishnamurthy wrote:
> Hi Hal,
>
> Sure will test it out. I see no issue in this fix. I have run the
> following test overnight
> in a script with yesterd
ROTECTED]> wrote:
Hi again Viswa,
On Wed, 2005-09-21 at 21:00, Hal Rosenstock wrote:
> Hi Viswa,
>
> On Wed, 2005-09-21 at 20:23, Viswanath Krishnamurthy wrote:
> > Currently opensm traps SIGINT. There was some discussion to remove it.
> > I have currently running some tests on op
Roland,
Thanks. Tested this out.. Works like a charm...
-Viswa
On 9/21/05, Roland Dreier <[EMAIL PROTECTED]> wrote:
Thanks very much for the excellent test case. The following patch
(already checked into svn and queued in git for merging into 2.6.14)
should fix things -- on my system, your test c
Hal,
Currently opensm traps SIGINT. There was some discussion about removing it. I am currently running some tests on opensm
by killing (SIGKILL) and restarting opensm. So far I have not found any resource leak issues. Is there a plan to remove that
signal handler? Ideally it should not exist.
-Viswa
The mthca state transition code allows this transition (RTS
--> RESET), but the mthca hardware/firmware does not allow it. It
allows RTS->ERR->RESET. I will post the code later to
reproduce this. I was trying to work around the CQ destroy memory
leak by caching QP entries and reusing them, but
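The RTS->ERR->RESET workaround can be sketched as a small helper that decides which states to step through before reusing a cached QP. The enum and function names are hypothetical and stand in for the real verbs QP states; the assumption that every state other than RTS can be reset directly is mine and is not confirmed by the thread.

```c
#include <stddef.h>

/* Hypothetical state names for this sketch; not the verbs enum. */
enum sketch_qp_state { SK_RESET, SK_INIT, SK_RTR, SK_RTS, SK_ERR };

/*
 * Fills `path` with the states to move through in order to reach RESET
 * and returns how many steps that is. RTS is routed through ERR, as in
 * the workaround; other states are assumed to reset directly.
 */
static size_t sketch_reset_path(enum sketch_qp_state cur,
                                enum sketch_qp_state path[2])
{
    size_t n = 0;

    if (cur == SK_RESET)
        return 0;              /* already in RESET, nothing to do */
    if (cur == SK_RTS)
        path[n++] = SK_ERR;    /* intermediate step the hardware accepts */
    path[n++] = SK_RESET;
    return n;
}
```

A caller would issue one modify-QP request per entry in the returned path, checking the status of each before moving to the next.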
When I try to modify QP state from RTS to RESET I get the following error
ib_mthca :05:00.0: Command 1e completed with status 0a
ib_mthca :05:00.0: modify QP 7 returned status 0a.
Is modifying QP state from RTS to RESET a valid state transition? (I guess so)
Is there anything else tha
I ran into this issue when using the kernel API to create CQs. In order to reproduce this problem, I wrote
a small kernel module which creates 4K CQs and destroys them. After running the test (8-10 times), I saw
a create_cq error with error -12 (ENOMEM).
I am attaching the test module source code
Just wanted to confirm kernel mthca also works fine..
Thanks Roland & Michael
-Viswa
On 9/13/05, Viswanath Krishnamurthy <[EMAIL PROTECTED]> wrote:
Thanks.. yes that was the problem...
The panic was happening when I was getting these errors and pressed Ctrl-C on
the server. This
Thanks.. yes that was the problem...
The panic was happening when I was getting these errors and pressed Ctrl-C on
the server. This may be an error path issue.
I am not seeing it now..
-Viswa
On 9/13/05, Roland Dreier <[EMAIL PROTECTED]> wrote:
Viswanath> When I ran the cmpost program whic
Roland,
I got the latest sources, built them along with the drivers.
Userland mthca
Your test application ran fine without any issue. (rctest)
When I ran the cmpost program which I sent you, I started getting errors from
the mthca library even for smaller number of connections (Earlie
I got the latest code from the repository to verify the mthca fixes and ran into this
strange configure error in libibcm
checking infiniband/at.h usability... yes
checking infiniband/at.h presence... yes
checking for infiniband/at.h... yes
checking for ANSI C header files... (cached) yes
checking for
Michael,
Thanks..
Roland,
Once you generate a kernel patch, I can test out both user and kernel mthca since I have the tests
ready..
-Viswa
On 9/13/05, Michael S. Tsirkin <[EMAIL PROTECTED]> wrote:
Quoting r. Roland Dreier <[EMAIL PROTECTED]>:
> Subject: strange mem-free bug (was: [openib-genera
Can I start testing the opensm 1.8 merge on gen2? What is the current status?
-Viswa
Here is ibv_devinfo output. It is InfiniHost_III_Lx0
]# ibv_devinfo
hca_id: mthca0
        fw_ver:                 1.0.1
        node_guid:              0002:c902:0040:0cfc
        sys_image_guid:         0002:c902:0040:0cff
        max_mr_size:            0xfff
Some more info..
This also happens at the kernel level. I have a small kernel module which does the echo
reply. After about 100-200 connections, I start to see the following message
ib_mthca :05:00.0: SQ 590473 full (8 head, 0 tail, 8 max, 0 nreq)
ib_mthca :05:00.0: SQ 590477 full (8 he
Somehow gmail ate away the main content of my mail..
Here it is..
I modified the cmpost program to have individual completion send/receive Qs. The cmpost
server acts like an echo server, echoing back anything it receives. The client program keeps sending
the packets.
The test works fine up
I modified the cmpost program to have individual completion send/receive Qs. The cmpost
server acts like an echo server, echoing back anything it receives. The client program keeps sending
the packets.
The test works fine up to around 600 connections. After 600 connections, I start to see ibv_post
See inline..
On 02 Sep 2005 17:04:42 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote:
On Fri, 2005-09-02 at 16:59, Viswanath Krishnamurthy wrote:
> Here is the setup..
Thanks. A couple more questions:
> #svn info
> Path: .
>
> URL: https://openib.org/svn/gen2/trunk
> Repository
, 2005-09-02 at 15:39, Viswanath Krishnamurthy wrote:
> The patch failed to fix the panic..
Can you describe your setup ? Did you just run ucmpost without an SM/SA
running or is it a different scenario ?
Thanks.
-- Hal
I am working on it. With the updated version of the code, it is slightly difficult to reproduce.
-Viswa
On 9/2/05, Roland Dreier <[EMAIL PROTECTED]> wrote:
Not really related to the ib_at oops, since I don't know that code.
But have you made any progress in being able to post the code to
reproduce the other
The patch failed to fix the panic..
subnetmgr5 login: ib_at: ib_dev_ats_op: dev (c0449800) ib0 already has pending op 2
Unable to handle kernel NULL pointer dereference at virtual address 0068
printing eip:
c02fee65
*pde = 365a7001
Oops: [#1]
SMP
Modules linked in: nfsd exportfs lockd au
--- Roland Dreier <[EMAIL PROTECTED]> wrote:
> viswanath> Here is new list of issues with
> uverbs
>
> Thanks for the reports.
>
> viswanath> I have attached the firmware
> version/svn info in the
> viswanath> attachment.
>
> In the future can you attach things as text/plain
> (or
I will try out this patch and let you know..
Hal Rosenstock wrote:
> Here's a patch for this. Let me know if it works. [I tried it out and it
> works for me.] If it does, the next question is how does the pointer get
> trashed.
I don't think that the pointer is getting trashed. The SA was not r
--- Sean Hefty <[EMAIL PROTECTED]> wrote:
> viswanath krishnamurthy wrote:
> > 1. ib_cm_destroy_id(cm_id)
> > hangs (does not return to the caller)
> > Is there a particular shutdown sequence
> > that needs to be followed ? Is there a
> trace/debug
>
I have attached the firmware version/svn info in the
attachment.
Here is new list of issues with uverbs
1. ib_cm_destroy_id(cm_id)
hangs (does not return to the caller)
Is there a particular shutdown sequence
that needs to be followed ? Is there a trace/debug
I can enable ?
2. libm
I have the latest openib code on 2.16 machine, when
I run the rc pingpong program I get the following
error (The first time it passed, but subsequent ones
got an error, I tried changing the iteration count to
a large number, 10 after the first time)
#dmesg
ib_mthca :05:00.0: Mapped page a
Still see the issue
1. I rebooted both the machines, started opensm, after LID assignment
killed opensm.
Next started the ucmpost client/server, killing it panics the system
-Viswa
Unable to handle kernel NULL pointer dereference at virtual address 0068
printing eip:
c02f2635
*pde = 366
I downloaded the latest openib gen2 stack and ran into a kernel panic when
I run the cmpost/ucmpost example. I modified the program to continuously
send and receive data in an infinite loop and killed the application
with ctrl-c.
The kernel panics pretty consistently.
I am currently running 2.6.12
nib is working now on gen2.
> but if you want you can look at mellanox IBGD 1.7.0
> from
> www.mellnaox.com follow the link "Download IB GOLD -
> 1.7.0"
> look for udapl code .
> The code is useing the user_cm IF
>
> Itamar
>
> > -----Original Messag
I looked further into the whole gen1 source tree.
There is no consumer of this useraccess_cm API
(ioctl). Are there any consumers of this API? Is it
supported?
Thanks,
Vish
--- viswanath krishnamurthy <[EMAIL PROTECTED]> wrote:
> Is there a sample code (examples) to use the gen
Is there sample code (examples) for using the gen1
stack user-level CM API (ioctls)? Any pointers are
appreciated.
Thanks,
Vish