[openib-general] opensm crash with topspin HCA

2006-11-02 Thread Viswanath Krishnamurthy
When we run the opensm (OFED) release with a Topspin HCA in the IB network, opensm crashes in umad_receiver with a NULL pointer exception. The transaction ID is zero in the MADs from the Topspin HCA on Windows. The crashes seem to be random in umad_receiver. HCA found: hca_id=InfiniHost0

[openib-general] CM and REP handling

2006-06-30 Thread Viswanath Krishnamurthy
In the current communication manager (CM) implementation, how is a lost REP MAD handled? When the REP gets lost, cm_dup_req_handler gets called, which currently enters the default condition and does nothing. The client retries the number of times it is configured to and fails. If the f

[openib-general] Disabling end-to-end flow control

2006-06-22 Thread Viswanath Krishnamurthy
Is there a way to disable end-to-end flow control using any of the APIs? Thanks, -Viswa ___ openib-general mailing list openib-general@openib.org http://openib.org/mailman/listinfo/openib-general To unsubscribe, please visit http://openib.org/mailman/list

Re: [openib-general] opensm and NPTL

2006-06-13 Thread Viswanath Krishnamurthy
I am using the trunk. Should I be using 1.0? -Viswa On 13 Jun 2006 12:35:17 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote: On Tue, 2006-06-13 at 12:21, Viswanath Krishnamurthy wrote: > Yes.. I want to test waters again and see if the issues went away. Are you using the trunk or 1.

Re: [openib-general] opensm and NPTL

2006-06-13 Thread Viswanath Krishnamurthy
Yes.. I want to test the waters again and see if the issues went away. -Viswa On 13 Jun 2006 06:15:34 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote: Hi Viswa, On Mon, 2006-06-12 at 23:16, Viswanath Krishnamurthy wrote: > There were some issues with opensm running with NPTL (thread > libra

[openib-general] opensm and NPTL

2006-06-12 Thread Viswanath Krishnamurthy
There were some issues with opensm running with NPTL (thread library). Have the issues been resolved? Regards, Viswa

Re: [openib-general] Fix for ibping

2006-04-13 Thread Viswanath Krishnamurthy
Works like a charm... -Viswa On 12 Apr 2006 21:32:33 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote: On Wed, 2006-04-12 at 20:46, Hal Rosenstock wrote:> On Wed, 2006-04-12 at 18:25, Viswanath Krishnamurthy wrote:> > The RMPP version needs to be 1.>> Thanks. I'm not

Re: [openib-general] Fix for ibping

2006-04-12 Thread Viswanath Krishnamurthy
The mad_register_agent function in the kernel file mad.c was checking for rmpp_version. This was failing and this failure was propagated to umad (through ioctl) On 12 Apr 2006 20:46:33 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote: On Wed, 2006-04-12 at 18:25, Viswanath Krishnamurthy wrote:>
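The failure path described in this message can be modeled in a few lines of C. This is only a sketch under stated assumptions, not the kernel code: the function names model_register_agent and model_register_ioctl are hypothetical, and the rule "only RMPP version 1 is accepted" is taken from the ibping fix in this thread rather than from the kernel source, whose actual check may be broader.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption from the thread: for ibping the RMPP version "needs to be 1". */
#define ASSUMED_RMPP_VERSION 1

/*
 * Hypothetical stand-in for the kernel-side parameter check.
 * Returns 0 on success or a negative errno value, kernel style.
 */
static int model_register_agent(uint8_t rmpp_version)
{
    if (rmpp_version != ASSUMED_RMPP_VERSION)
        return -EINVAL;
    return 0;
}

/*
 * Hypothetical stand-in for what user space sees through ioctl():
 * a negative kernel return becomes -1 with errno set to the positive code.
 */
static int model_register_ioctl(uint8_t rmpp_version)
{
    int ret = model_register_agent(rmpp_version);

    if (ret < 0) {
        errno = -ret;
        return -1;
    }
    return 0;
}
```

With this model, a caller passing any version other than 1 gets -1 from the ioctl-like wrapper with errno set to EINVAL, which matches the symptom reported for ibping on SVN 6446.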

[openib-general] Fix for ibping

2006-04-12 Thread Viswanath Krishnamurthy
The RMPP version needs to be 1. [EMAIL PROTECTED] src]# svn diff ibping.c Index: ibping.c === --- ibping.c    (revision 6446) +++ ibping.c    (working copy) @@ -336,7 +336,7 @@     exit(0);     } -   if (mad_regis

[openib-general] ibping broken in SVN 6446 ?

2006-04-12 Thread Viswanath Krishnamurthy
When I do an ibping I get an error (on a 32-bit machine). Linux kernel: 2.6.16, infiniband directory replaced with SVN 6446. When I enable debug in umad.c, I get the following error: the ioctl call to the umad driver (umad device) is failing. The return value for ioctl is -1, errno is -22 (EINVAL) portid

Re: [openib-general] Mainline 2.6.16 kernel with openib userland libraries

2006-03-27 Thread Viswanath Krishnamurthy
My guess is the bug is in the userspace library, since a kernel module which uses the same APIs in kernel mode works fine. I will work on the sample code and send it.. -Viswa On 3/27/06, Roland Dreier <[EMAIL PROTECTED]> wrote: Roland> Did this code work with mainline kernel 2.6.15? If so you

[openib-general] Mainline 2.6.16 kernel with openib userland libraries

2006-03-27 Thread Viswanath Krishnamurthy
I tried using the openib userland libraries with the mainline 2.6.16 kernel but ran into a strange problem. A userland application that uses the CM and verbs libraries, and works fine with earlier releases, stopped working with no error (in the APIs). When I put the analyser on, I see the CM connect sequence is f

[openib-general] mthca and coalesced ACK

2006-02-21 Thread Viswanath Krishnamurthy
When the HCA receives back-to-back RDMA write followed by RDMA read requests, it generates a coalesced ACK (implicit ACK for the RDMA write). Is there a configuration in the mthca driver which will enable the HCA firmware to generate individual ACKs? I am trying to debug another issue and this will be hel

[openib-general] Getting the right userspace libraries

2006-02-16 Thread Viswanath Krishnamurthy
How does one pull out the correct userland libraries for the 2.6.16 kernel IB stack? Is it to look at the SVN number in the driver code and pull that version? -Viswa

[openib-general] mthca and non-MSI system

2005-11-18 Thread Viswanath Krishnamurthy
Has the mthca driver been tested on a non-MSI (interrupt) system? I seem to have a problem where interrupts are not generated on a non-MSI system, with the following message: "NOP command failed to generate interrupt (IRQ 9), aborting." BIOS or ACPI interrupt routing problem? -Viswa

[openib-general] Vendor specific MAD support

2005-10-04 Thread Viswanath Krishnamurthy
Does the OpenIB Gen2 stack umad/mad library support vendor-specific MAD extensions? -Viswa

[openib-general] mthca error ?

2005-09-28 Thread Viswanath Krishnamurthy
Roland, I see the following when I use the latest mthca driver on a different HCA card [  193.882759] ib_mthca: Initializing :03:00.0 [  193.887546] ib_mthca :03:00.0: Found bridge: :02:0c.0 [  194.894937] ib_mthca :03:00.0: SYS_EN DDR error: syn=4, sock=0, sladdr=0, SPD source=DI

Re: [openib-general] opensm and faulty hardware

2005-09-27 Thread Viswanath Krishnamurthy
Hal, Thanks.. works like a charm... -Viswa On 27 Sep 2005 16:13:01 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote: On Tue, 2005-09-27 at 16:00, Viswanath Krishnamurthy wrote:> Hal,>> I added a hack now to get around the problem. There needs to be a> proper fix later..Can yo

Re: [openib-general] opensm and faulty hardware

2005-09-27 Thread Viswanath Krishnamurthy
PROTECTED]> wrote: On Tue, 2005-09-27 at 14:13, Viswanath Krishnamurthy wrote:> I tracked down the issue to a bug in osm_lid_mgr.c>> function:  __osm_lid_mgr_init_sweep(...)>> The bad hardware was returning an assigned LID of 0x. In this > function there is a loop> as foll

Re: [openib-general] opensm and faulty hardware

2005-09-27 Thread Viswanath Krishnamurthy
d 16 bit numbers, the condition in the for loop never becomes false, and opensm is stuck in the loop. There are a couple of other places in that function that need fixing too. -Viswa On 9/27/05, Viswanath Krishnamurthy <[EMAIL PROTECTED]> wrote: Log sent off-list... -Viswa On 9/27/05, Ei
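The hang described here is the usual fixed-width counter problem: a 16-bit loop variable compared with `<=` against the largest 16-bit value wraps to 0 instead of ever exceeding the bound. The sketch below is not opensm code. The function count_lids is hypothetical, and the bound 0xFFFF is an assumption chosen for illustration because it is the value that triggers the wrap.

```c
#include <stdint.h>

/*
 * Count the LIDs in [min_lid, max_lid] using a 16-bit loop variable.
 *
 * A plain `for (lid = min_lid; lid <= max_lid; lid++)` never exits when
 * max_lid is 0xFFFF, because lid wraps from 0xFFFF to 0 and the condition
 * stays true. Testing for the last element before the increment avoids
 * relying on a value that cannot be represented in 16 bits.
 */
static uint32_t count_lids(uint16_t min_lid, uint16_t max_lid)
{
    uint32_t visited = 0;
    uint16_t lid;

    if (min_lid > max_lid)
        return 0;               /* empty range */

    for (lid = min_lid; ; lid++) {
        visited++;
        if (lid == max_lid)
            break;              /* stop before lid can wrap past 0xFFFF */
    }
    return visited;
}
```

An alternative with the same effect is to widen the loop variable to 32 bits, so the comparison can become false; which of the two suits the real function is a judgment for whoever patches it.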

Re: [openib-general] opensm and faulty hardware

2005-09-27 Thread Viswanath Krishnamurthy
Log sent off-list... -Viswa On 9/27/05, Eitan Zahavi <[EMAIL PROTECTED]> wrote: Hi Viswa, Please send a full /var/log/osm.log file of opensm -V. You can send us a copy off the list if it is too big: yael and eitan in @mellanox.co.il EZ Hal Rosenstock wrote: > On Mon, 2005-09-26 at 19:57,

[openib-general] opensm and faulty hardware

2005-09-26 Thread Viswanath Krishnamurthy
I have an exerciser in the IB network. The exerciser seems to be faulty/buggy. When opensm starts I do not see the "SUBNET UP" message. It says "Entering MASTER" and waits there. Any new node inserted in this state is not assigned any LID. Has anybody seen such behavior? -Viswa

[openib-general] Another opensm bug ?

2005-09-26 Thread Viswanath Krishnamurthy
I ran into another opensm bug which caused opensm to stop functioning. This happened only once. Here is the test case: 1. Run opensm on machine A 2. Run the following script on machine B: a. Check ibstatus b. Ping machine A c. Run osmtest d. Reboot. The test case is to make sure opensm

Re: [openib-general] Re: Another opensm problem ?

2005-09-26 Thread Viswanath Krishnamurthy
rts up. Viswa, with all that said, it is very possible you are experiencing a bug in OpenSM and we want to encourage your effort finding those. With your, and others, help we will be able to flush them out. Thanks Eitan Hal Rosenstock wrote: > On Fri, 2005-09-23 at 14:57, Hal Rosenstock wrote: >

Re: [openib-general] Re: opensm and SIGINT

2005-09-23 Thread Viswanath Krishnamurthy
On 23 Sep 2005 13:49:31 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote: Hi Viswa,On Fri, 2005-09-23 at 13:43, Viswanath Krishnamurthy wrote:> More information,>> The test case is as follows>> 1. Start opensm in verbose mode (-V)> 2. Ping remote node > 3. osmtest -f c>

Re: [openib-general] Forcing IB link state down

2005-09-23 Thread Viswanath Krishnamurthy
On 23 Sep 2005 13:59:28 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote: Hi Viswa,On Fri, 2005-09-23 at 13:55, Viswanath Krishnamurthy wrote:> Is there an API or command to force an IB link to go down.Not currently.>  This will be helpful in running tests on opensm. Yes, I can und

[openib-general] Re: Another opensm problem ?

2005-09-23 Thread Viswanath Krishnamurthy
Hal, On 23 Sep 2005 14:04:00 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote: Hi again Viswa, On Fri, 2005-09-23 at 13:50, Viswanath Krishnamurthy wrote: Good test. Hadn't tried this. I will try it and will recreate this. > - 2 machines with a switch in between. One m/c running opensm.

[openib-general] Forcing IB link state down

2005-09-23 Thread Viswanath Krishnamurthy
Is there an API or command to force an IB link to go down? This will be helpful in running tests on opensm. -Viswa

Re: [openib-general] Re: opensm and SIGINT

2005-09-23 Thread Viswanath Krishnamurthy
from the SA to the SA client (osmtest). That's where this one is right now. -- Hal> Eitan>> Hal Rosenstock wrote: > > Hi again Viswa,> >> > On Wed, 2005-09-21 at 21:00, Hal Rosenstock wrote:> >> >>Hi Viswa,> >>> >>On Wed, 2005-09-21 at 2

Re: [openib-general] Re: opensm and SIGINT

2005-09-22 Thread Viswanath Krishnamurthy
On 22 Sep 2005 18:44:44 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote: Hi Viswa,On Thu, 2005-09-22 at 15:55, Viswanath Krishnamurthy wrote:> Here is the log of osmtest failure. This was seen 150 times out of> 2500 iterations. The opensm SUBNET UP failure is tough to reproduce. >

Re: [openib-general] Re: opensm and SIGINT

2005-09-22 Thread Viswanath Krishnamurthy
egister_service: ERR 0364: ib_query failed (IB_TIMEOUT). Sep 21 17:51:04 779740 [B7F026C0] -> osmtest_run: ERR 00148: Service Flow failed (IB_TIMEOUT) OSMTEST: TEST "All Validations" FAIL -Viswa On 22 Sep 2005 15:08:02 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote: On Thu, 2005-

Re: [openib-general] Re: opensm and SIGINT

2005-09-22 Thread Viswanath Krishnamurthy
Hal,On 22 Sep 2005 14:41:04 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote: Hi Viswa,On Thu, 2005-09-22 at 14:37, Viswanath Krishnamurthy wrote:> Hi Hal,>> Sure will test it out. I see no issue in this fix. I have run the> following test overnight> in a script with yesterd

Re: [openib-general] Re: opensm and SIGINT

2005-09-22 Thread Viswanath Krishnamurthy
ROTECTED]> wrote: Hi again Viswa,On Wed, 2005-09-21 at 21:00, Hal Rosenstock wrote:> Hi Viswa,>> On Wed, 2005-09-21 at 20:23, Viswanath Krishnamurthy wrote:> > Currently opensm traps SIGINT. There was some discussion to remove it. > > I have currently running some tests on op

Re: [openib-general] ib_create_cq memory leak?

2005-09-22 Thread Viswanath Krishnamurthy
Roland, Thanks. Tested this out.. Works like a charm... -Viswa On 9/21/05, Roland Dreier <[EMAIL PROTECTED]> wrote: Thanks very much for the excellent test case. The following patch (already checked into svn and queued in git for merging into 2.6.14) should fix things -- on my system, your test c

[openib-general] opensm and SIGINT

2005-09-21 Thread Viswanath Krishnamurthy
Hal, Currently opensm traps SIGINT. There was some discussion to remove it. I am currently running some tests on opensm by killing (SIGKILL) and restarting opensm. So far I have not found any resource leak issues. Is there a plan to remove that signal handler? Ideally it should not exist. -Viswa

Re: [openib-general] Modifying QP state error

2005-09-21 Thread Viswanath Krishnamurthy
The mthca state transition code allows this transition (RTS --> RESET), but the mthca hardware/firmware does not allow it. It allows RTS->ERR->RESET. I will post the code later to reproduce this. I was trying to work around the CQ destroy memory leak by caching QP entries and reusing them, but
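The workaround implied here (going through ERR on the way from RTS to RESET) can be written down as a small path-selection helper. This is a self-contained sketch, not driver code: the enum and the function reset_path are hypothetical, and the only rule encoded is the one reported in this message, namely that a direct RTS to RESET request fails on this hardware while RTS to ERR to RESET succeeds. Other starting states are assumed, without confirmation from the thread, to accept a direct move to RESET.

```c
#include <stddef.h>

/* Hypothetical local names for the QP states mentioned in the thread. */
enum qp_state { QP_RESET, QP_INIT, QP_RTR, QP_RTS, QP_ERR };

/*
 * Fill `path` with the states to request, in order, to bring a QP from
 * `current` to RESET. Returns the number of entries written (0, 1 or 2).
 */
static size_t reset_path(enum qp_state current, enum qp_state path[2])
{
    size_t n = 0;

    if (current == QP_RESET)
        return 0;               /* already in RESET, nothing to request */

    if (current == QP_RTS)
        path[n++] = QP_ERR;     /* reported: direct RTS -> RESET is rejected */

    path[n++] = QP_RESET;
    return n;
}
```

A caller would issue one modify-QP request per returned entry and stop at the first failure.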

[openib-general] Modifying QP state error

2005-09-21 Thread Viswanath Krishnamurthy
When I try to modify QP state from RTS to RESET I get the following error: ib_mthca :05:00.0: Command 1e completed with status 0a ib_mthca :05:00.0: modify QP 7 returned status 0a. Is modifying QP state from RTS to RESET a valid state transition? (I guess so) Is there anything else tha

[openib-general] ib_create_cq memory leak? (Resend)

2005-09-21 Thread Viswanath Krishnamurthy
I ran into this issue when using the kernel API to create CQs. In order to reproduce this problem, I wrote a small kernel module which creates 4K CQs and destroys them. After running the test (8-10 times), I saw a create_cq error with error -12 (ENOMEM). I am attaching the test module source c

[openib-general] ib_create_cq memory leak?

2005-09-21 Thread Viswanath Krishnamurthy
I ran into this issue when using the kernel API to create CQs. In order to reproduce this problem, I wrote a small kernel module which creates 4K CQs and destroys them. After running the test (8-10 times), I saw a create_cq error with error -12 (ENOMEM). I am attaching the test module source code

[openib-general] Re: [PATCH] libmthca: fix wqe post

2005-09-13 Thread Viswanath Krishnamurthy
Just wanted to confirm kernel mthca also works fine.. Thanks Roland & Michael -Viswa On 9/13/05, Viswanath Krishnamurthy <[EMAIL PROTECTED]> wrote: Thanks.. yes that was the problem... The panic was happening when I was getting these errors  and pressed Ctrl-C on the server. This

[openib-general] Re: [PATCH] libmthca: fix wqe post

2005-09-13 Thread Viswanath Krishnamurthy
Thanks.. yes that was the problem... The panic was happening when I was getting these errors  and pressed Ctrl-C on the server. This may be an error path issue. I am not seeing it now.. -Viswa On 9/13/05, Roland Dreier <[EMAIL PROTECTED]> wrote: Viswanath> When I ran the cmpost program whic

[openib-general] Re: [PATCH] libmthca: fix wqe post

2005-09-13 Thread Viswanath Krishnamurthy
Roland, I got the latest sources and built them along with the drivers. Userland mthca: your test application (rctest) ran fine without any issue. When I ran the cmpost program which I sent you, I started getting errors from the mthca library even for a smaller number of connections (Earlie

[openib-general] Strange configure error in libibcm

2005-09-13 Thread Viswanath Krishnamurthy
I got the latest code from the repository to verify the mthca fixes and ran into this strange configure error in libibcm: checking infiniband/at.h usability... yes checking infiniband/at.h presence... yes checking for infiniband/at.h... yes checking for ANSI C header files... (cached) yes checking for

[openib-general] Re: [PATCH] libmthca: fix wqe post (was Re: strange mem-free bug)

2005-09-13 Thread Viswanath Krishnamurthy
Michael, Thanks.. Roland, Once you generate a kernel patch, I can test out both user and kernel mthca since I have the tests ready.. -Viswa On 9/13/05, Michael S. Tsirkin <[EMAIL PROTECTED]> wrote: Quoting r. Roland Dreier <[EMAIL PROTECTED]>:> Subject: strange mem-free bug (was: [openib-genera

[openib-general] Status of opensm 1.8 merge

2005-09-12 Thread Viswanath Krishnamurthy
Can I start testing the opensm 1.8 merge on gen2? What is the current status? -Viswa

Re: [openib-general] completion Q overflow error/panic

2005-09-10 Thread Viswanath Krishnamurthy
Here is ibv_devinfo output. It is InfiniHost_III_Lx0 ]# ibv_devinfo hca_id: mthca0     fw_ver: 1.0.1     node_guid:  0002:c902:0040:0cfc     sys_image_guid: 0002:c902:0040:0cff     max_mr_size:    0xfff

Re: [openib-general] completion Q overflow error/panic

2005-09-09 Thread Viswanath Krishnamurthy
Some more info.. This also happens at the kernel level. I have a small kernel module which does the echo reply. After about 100-200 connections, I start to see the following message: ib_mthca :05:00.0: SQ 590473 full (8 head, 0 tail, 8 max, 0 nreq) ib_mthca :05:00.0: SQ 590477 full (8 he
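The four numbers in the "SQ ... full" message read naturally as a ring-buffer occupancy check. The sketch below is a model built only from the fields printed in the log. The struct wq_model and the function wq_would_overflow are hypothetical names, and the comparison is an assumption about how such a check is commonly written, not a copy of the mthca source.

```c
#include <stdint.h>

/* Hypothetical model of a work queue using the fields shown in the log. */
struct wq_model {
    uint32_t head;  /* count of work requests posted so far */
    uint32_t tail;  /* count of work requests completed and reclaimed */
    uint32_t max;   /* queue capacity */
};

/*
 * Return nonzero if the queue has no room for the next work request,
 * given that `nreq` requests were already accepted in the current call.
 */
static int wq_would_overflow(const struct wq_model *wq, uint32_t nreq)
{
    /* Unsigned subtraction stays correct even after the counters wrap. */
    uint32_t in_flight = wq->head - wq->tail;

    return in_flight + nreq >= wq->max;
}
```

Plugging in the logged values (8 head, 0 tail, 8 max, 0 nreq) gives 8 + 0 >= 8, so the queue is reported full on the very first request of the call. Under this reading tail never advanced, which would be consistent with send completions not being reclaimed, though the thread excerpt does not confirm the cause.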

[openib-general] completion Q overflow error/panic

2005-09-09 Thread Viswanath Krishnamurthy
Somehow gmail ate away the main content of my mail.. Here it is.. I modified the cmpost program to have individual completion send/receive Qs. The cmpost server acts like an echo server, echoing back anything it receives. The client program keeps sending the packets. The test works fine up

[openib-general] completion Q overflow error/panic

2005-09-09 Thread Viswanath Krishnamurthy
I modified the cmpost program to have individual completion send/receive Qs. The cmpost server acts like an echo server, echoing back anything it receives. The client program keeps sending the packets. The test works fine up to around 600 connections. After 600 connections, I start to see ibv_post

Re: [Fwd: Re: [openib-general] kernel oops]

2005-09-02 Thread Viswanath Krishnamurthy
See inline..On 02 Sep 2005 17:04:42 -0400, Hal Rosenstock <[EMAIL PROTECTED]> wrote: On Fri, 2005-09-02 at 16:59, Viswanath Krishnamurthy wrote:> Here is the setup..Thanks. A couple more questions:> #svn info> Path: .>> URL: https://openib.org/svn/gen2/trunk> Repository

Re: [Fwd: Re: [openib-general] kernel oops]

2005-09-02 Thread Viswanath Krishnamurthy
, 2005-09-02 at 15:39, Viswanath Krishnamurthy wrote: > The patch failed to fix the panic.. Can you describe your setup? Did you just run ucmpost without an SM/SA running or is it a different scenario? Thanks. -- Hal

Re: [Fwd: Re: [openib-general] kernel oops]

2005-09-02 Thread Viswanath Krishnamurthy
I am working on it. With the updated version of the code, it is slightly difficult to reproduce. -Viswa On 9/2/05, Roland Dreier <[EMAIL PROTECTED]> wrote: Not really related to the ib_at oops, since I don't know that code. But have you made any progress in being able to post the code to reproduce the other

Re: [Fwd: Re: [openib-general] kernel oops]

2005-09-02 Thread Viswanath Krishnamurthy
The patch failed to fix the panic.. subnetmgr5 login: ib_at: ib_dev_ats_op: dev (c0449800) ib0 already has pending op 2 Unable to handle kernel NULL pointer dereference at virtual address 0068  printing eip: c02fee65 *pde = 365a7001 Oops: [#1] SMP Modules linked in: nfsd exportfs lockd au

[openib-general] Re: List of issues in uverbs

2005-09-01 Thread viswanath krishnamurthy
--- Roland Dreier <[EMAIL PROTECTED]> wrote: > viswanath> Here is new list of issues with > uverbs > > Thanks for the reports. > > viswanath> I have attached the firmware > version/svn info in the > viswanath> attachment. > > In the future can you attach things as text/plain > (or

Re: [openib-general] kernel oops

2005-09-01 Thread Viswanath Krishnamurthy
I will try out this patch and let you know.. Hal Rosenstock wrote: > Here's a patch for this. Let me know if it works. [I tried it out and it > works for me.] If it does, the next question is how does the pointer get > trashed. I don't think that the pointer is getting trashed.  The SA was not r

Re: [openib-general] List of issues in uverbs

2005-08-31 Thread viswanath krishnamurthy
--- Sean Hefty <[EMAIL PROTECTED]> wrote: > viswanath krishnamurthy wrote: > > 1. ib_cm_destroy_id(cm_id) > > hangs (does return to the caller) > > Is there a particular shutdown sequence > > that needs to be followed ? Is there a > trace/debug &g

[openib-general] List of issues in uverbs

2005-08-31 Thread viswanath krishnamurthy
I have attached the firmware version/svn info in the attachment. Here is a new list of issues with uverbs: 1. ib_cm_destroy_id(cm_id) hangs (does not return to the caller). Is there a particular shutdown sequence that needs to be followed? Is there a trace/debug I can enable? 2. libm

[openib-general] rc ping pong error

2005-08-29 Thread viswanath krishnamurthy
I have the latest openib code on a 2.16 machine. When I run the rc pingpong program I get the following error (the first time it passed, but subsequent runs got an error; I tried changing the iteration count to a large number, 10 after the first time) #dmesg ib_mthca :05:00.0: Mapped page a

Re: [openib-general] kernel oops

2005-08-26 Thread Viswanath Krishnamurthy
Still see the issue 1. I rebooted both the machines, started opensm, after LID assignment killed opensm. Next started the ucmpost client/server, killing it panics the system -Viswa Unable to handle kernel NULL pointer dereference at virtual address 0068 printing eip: c02f2635 *pde = 366

[openib-general] kernel oops

2005-08-26 Thread Viswanath Krishnamurthy
I downloaded the latest openib gen2 stack and ran into a kernel panic when I run the cmpost/ucmpost example. I modified the program to continuously send and receive data in an infinite loop and killed the application with ctrl-c. The kernel panics pretty consistently. I am currently running 2.6.12

RE: [openib-general] Re: useraccess_cm sample client/server (gen1 )

2005-07-07 Thread viswanath krishnamurthy
nib is working now on gen2. > but if you want you can look at mellanox IBGD 1.7.0 > from > www.mellnaox.com follow the link "Download IB GOLD - > 1.7.0" > look for udapl code . > The code is useing the user_cm IF > > Itamar > > > -----Original Messag

[openib-general] Re: useraccess_cm sample client/server (gen1)

2005-07-06 Thread viswanath krishnamurthy
I looked further into the whole gen1 source tree. There is no consumer of this useraccess_cm API (ioctl). Are there any consumers of this API? Is it supported? Thanks, Vish --- viswanath krishnamurthy <[EMAIL PROTECTED]> wrote: > Is there a sample code (examples) to use the gen

[openib-general] useraccess_cm sample client/server (gen1)

2005-07-05 Thread viswanath krishnamurthy
Is there sample code (examples) for using the gen1 stack user-level CM API (ioctls)? Any pointers are appreciated. Thanks, Vish