Re: [Lustre-discuss] I/O error on clients

2010-07-20 Thread Peter Kitchener
Hi All, So I've applied the patch for bug 22897 to the 1.8.2 source and rebuilt the Lustre RPMs. So far our development team has reported an improvement in the frequency of the I/O errors, but still not enough for them to be 100% happy. In the meantime I've been double checking all the setti

Re: [Lustre-discuss] I/O error on clients

2010-07-20 Thread Bernd Schubert
On Tuesday, July 20, 2010, Christopher J. Morrone wrote: > On 07/07/2010 01:04 AM, Gabriele Paciucci wrote: > > Hi, > > the ptlrpc bug is a problem, but I don't find in Peter's logs any > > reference to an eviction caused by ptlrpc, only to a timeout > > during the communication between a

Re: [Lustre-discuss] I/O error on clients

2010-07-20 Thread Christopher J. Morrone
On 07/07/2010 01:04 AM, Gabriele Paciucci wrote: > Hi, > the ptlrpc bug is a problem, but I don't find in Peter's logs any > reference to an eviction caused by ptlrpc, only to a timeout > during the communication between an OST and the client. But Peter could > make a downgrade to 1.8.1.1 t

Re: [Lustre-discuss] I/O error on clients

2010-07-14 Thread Andreas Dilger
On 2010-07-13, at 18:56, Peter Kitchener wrote: > Is there any sort of ETA on 1.8.4? The scheduled release date is July 31. >> See bug 22897 for a description of the bug. But the fix is a simple >> one-liner in bug 22786, attachment 29866. The fix will first appear in >> lustre 1.8.4. I woul

Re: [Lustre-discuss] I/O error on clients

2010-07-14 Thread Peter Jones
Yes, it is scheduled for release by the end of the month. Peter Kitchener wrote: > > Is there any sort of ETA on 1.8.4?

Re: [Lustre-discuss] I/O error on clients

2010-07-13 Thread Peter Kitchener
Hi Everyone, Is there any sort of ETA on 1.8.4? Also, See bug 22897 for a description of the bug. But the fix is a simple one-liner in bug 22786, attachment 29866. The fix will first appear in lustre 1.8.4. I would highly recommend to anyone using 1.8.2 or 1.8.3 that they add that patch. I have att

Re: [Lustre-discuss] I/O error on clients

2010-07-07 Thread Gabriele Paciucci
Hi, the ptlrpc bug is a problem, but I don't find in Peter's logs any reference to an eviction caused by ptlrpc, only to a timeout during the communication between an OST and the client. Peter could downgrade to 1.8.1.1, which does not suffer from the problem. My action plan could be

Re: [Lustre-discuss] I/O error on clients

2010-07-06 Thread Peter Kitchener
Hi Chris, See bug 22897 for a description of the bug. But the fix is a simple one-liner in bug 22786, attachment 29866. The fix will first appear in lustre 1.8.4. I would highly recommend to anyone using 1.8.2 or 1.8.3 that they add that patch. How would I safely do that when I've installed lustr

Re: [Lustre-discuss] I/O error on clients

2010-07-06 Thread Christopher J. Morrone
On 07/05/2010 11:19 PM, Peter Kitchener wrote: > Hi all, > > I have been troubleshooting a strange problem that is occurring with our > Lustre setup. Under high loads our developers are complaining that various > processes they run will error out with I/O error. > > Our setup is small 1 MDS and 2

Re: [Lustre-discuss] I/O error on clients

2010-07-06 Thread Peter Kitchener
Hi All, The NICs we're using are Broadcom Corporation NetXtreme II BCM57711 10-Gigabit PCIe. They're dual-port SFP+ cards, connected to a Dell PowerConnect 8024F. Currently we're using the open source driver bnx2x as provided in the kernel; there don't appear to be any dropped packets on the clien

Re: [Lustre-discuss] I/O error on clients

2010-07-06 Thread Wojciech Turek
The source of the I/O error is the eviction of the client by the OSS server, which was not able to reclaim a lock from that client within the specified timeout window (100s). OSS: == Jul 6 15:10:17 helium kernel: LustreError: 6708:0:(ldlm_lockd.c:305: waiting_locks_callback()) ### lock callback timer ex
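
To check how often this happens, the OSS syslog can be scanned for the same message. The following is only a rough sketch, not a Lustre tool: the function name, the regular expression, and the second sample line are made up for illustration, and it assumes the syslog lines have the same layout as the one quoted above (which is cut off in this archive view).

```python
import re

# Assumption: lines look like the one quoted in the thread, i.e.
# "<Mon> <day> <time> <host> kernel: LustreError: ... lock callback timer ..."
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+kernel:\s+LustreError:.*lock callback timer"
)


def find_lock_callback_timeouts(lines):
    """Return (timestamp, host) for each line reporting a lock callback timer event."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            hits.append((f"{m['month']} {m['day']} {m['time']}", m["host"]))
    return hits


sample = [
    # Copied from the message above, truncated exactly as it appears there.
    "Jul 6 15:10:17 helium kernel: LustreError: 6708:0:(ldlm_lockd.c:305: "
    "waiting_locks_callback()) ### lock callback timer ex",
    # Hypothetical unrelated line, only here to show that it is skipped.
    "Jul 6 15:10:18 helium kernel: some unrelated message",
]

for stamp, host in find_lock_callback_timeouts(sample):
    print(stamp, host)
```

Running it over the real log (for example by passing an open file object instead of `sample`) would show whether the timeouts cluster around the high-load periods the developers are reporting.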

Re: [Lustre-discuss] I/O error on clients

2010-07-06 Thread Gabriele Paciucci
Hi Peter, which 10GbE card do you have? I solved a similar problem with a NetXen card (HP Blade Mezzanine Card) by using the proprietary nx_nic driver instead of the "open source" driver. In any case, the problem is that your users are saturating the network between client and OST! On 07/06/2010 08:19