Re: [PATCH] ipoib: Fix lockup of the tx queue

2010-03-16 Josh England
On Mon, Mar 15, 2010 at 11:33 PM, Eli Cohen wrote:
> On Mon, Mar 15, 2010 at 08:25:51AM -0800, Josh England wrote:
>> Everything has MT264328 ConnectX cards using the mlx4_ib driver.
>> Boot/file servers are using an HP OEM 2.7.000 firmware. Compute nodes
>> have cards using Sun OEM 2.6.200 FW.
>

Re: [PATCH] ipoib: Fix lockup of the tx queue

2010-03-15 Eli Cohen
On Mon, Mar 15, 2010 at 08:25:51AM -0800, Josh England wrote:
> Everything has MT264328 ConnectX cards using the mlx4_ib driver.
> Boot/file servers are using an HP OEM 2.7.000 firmware. Compute nodes
> have cards using Sun OEM 2.6.200 FW.
>
You probably mean MT26428? Anyway, do you still see th

Re: [PATCH] ipoib: Fix lockup of the tx queue

2010-03-15 Josh England
Everything has MT264328 ConnectX cards using the mlx4_ib driver. Boot/file servers are using an HP OEM 2.7.000 firmware. Compute nodes have cards using Sun OEM 2.6.200 FW.
-JE
On Sat, Mar 13, 2010 at 10:52 PM, Eli Cohen wrote:
> On Thu, Mar 11, 2010 at 01:38:38PM -0800, Roland Dreier wrote:
>>

Re: [PATCH] ipoib: Fix lockup of the tx queue

2010-03-13 Eli Cohen
On Thu, Mar 11, 2010 at 01:38:38PM -0800, Roland Dreier wrote:
>
> I do worry (as Moni mentioned) that this doesn't explain why you would
> get send failures in this case, but the patch itself is well-explained
> and looks "obviously correct" so I think we should apply it.
It could be a problem i

Re: [ewg] [PATCH] ipoib: Fix lockup of the tx queue

2010-03-11 Ralph Campbell
On Thu, 2010-03-11 at 13:52 -0800, Roland Dreier wrote:
> > Sorry, I was referring to my patch not Eli's.
>
> Heh, I never would have said anything about your patch was "obvious".
> I skimmed yours once but I do want to read it more carefully.
>
> Did you ever say what test case you are using to

Re: [ewg] [PATCH] ipoib: Fix lockup of the tx queue

2010-03-11 Roland Dreier
> Sorry, I was referring to my patch not Eli's.
Heh, I never would have said anything about your patch was "obvious". I skimmed yours once but I do want to read it more carefully.
Did you ever say what test case you are using to provoke the problem you're fixing?
--
Roland Dreier

Re: [ewg] [PATCH] ipoib: Fix lockup of the tx queue

2010-03-11 Ralph Campbell
Sorry, I was referring to my patch not Eli's.
On Thu, 2010-03-11 at 13:41 -0800, Ralph Campbell wrote:
> On Thu, 2010-03-11 at 13:38 -0800, Roland Dreier wrote:
> > good debugging, applied thanks.
> >
> > I do worry (as Moni mentioned) that this doesn't explain why you would
> > get send failures

Re: [ewg] [PATCH] ipoib: Fix lockup of the tx queue

2010-03-11 Ralph Campbell
On Thu, 2010-03-11 at 13:38 -0800, Roland Dreier wrote:
> good debugging, applied thanks.
>
> I do worry (as Moni mentioned) that this doesn't explain why you would
> get send failures in this case, but the patch itself is well-explained
> and looks "obviously correct" so I think we should apply i

Re: [PATCH] ipoib: Fix lockup of the tx queue

2010-03-11 Roland Dreier
good debugging, applied thanks. I do worry (as Moni mentioned) that this doesn't explain why you would get send failures in this case, but the patch itself is well-explained and looks "obviously correct" so I think we should apply it. -- Roland Dreier

[PATCH] ipoib: Fix lockup of the tx queue

2010-03-03 Eli Cohen
The ipoib UD QP reports send completions to priv->send_cq, which is generally unarmed; it only gets armed when the number of outstanding send requests (i.e. those for which a completion has not yet been polled) reaches the size of the tx queue. This arming (done using ib_req_notify_cq()) is done only in