Re: [Lustre-discuss] problem reading HDF files on 1.8.5 filesystem

2011-05-04 Thread Christopher Walker
Hi Larry, Everything below is with the filesystem mounted with localflock. This does indeed look a lot like the bug referred to by David Dillow (thanks!) Chris On 5/4/11 10:05 PM, Larry wrote: > try mounting the lustre filesystem with -o flock or -o localflock > > On Thu, May 5, 2011 at 4:47 A

Re: [Lustre-discuss] windows native client

2011-05-04 Thread Mag Gam
Hooray to Oracle! Not. On Wed, Apr 20, 2011 at 10:57 AM, Colin Faber wrote: > This port is no longer available from Oracle. > > -cf > > > On 04/20/2011 07:30 AM, hua zhou wrote: >> >> Hi, >> >> From the website: >> (http://wiki.lustre.org/index.php/Windows_Native_Client#Download_and_S...) >> >>

Re: [Lustre-discuss] problem reading HDF files on 1.8.5 filesystem

2011-05-04 Thread Larry
try mounting the lustre filesystem with -o flock or -o localflock On Thu, May 5, 2011 at 4:47 AM, Christopher Walker wrote: > Hello, > > We have a user who is trying to post-process HDF files in R.  Her script > goes through a number (~2500) of files in a directory, opening and > reading the cont
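Larry's suggestion depends on whether the client mount accepts flock() calls at all. Below is a minimal Python sketch that checks this for a given directory. It assumes that on a Lustre client mounted without -o flock or -o localflock, flock() fails with ENOSYS. The function name flock_supported is hypothetical and is not part of any Lustre tool. A passing check under localflock only shows that node-local locking works. It does not show that locks are coherent across clients.

```python
import errno
import fcntl
import os
import sys
import tempfile


def flock_supported(directory: str) -> bool:
    """Return True if flock() succeeds on a scratch file created in `directory`.

    Assumption: a Lustre client mounted without -o flock / -o localflock
    rejects flock() with ENOSYS; that case is reported as False.
    Any other OSError is re-raised unchanged.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        try:
            # Non-blocking exclusive lock on a private scratch file.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno == errno.ENOSYS:
                return False
            raise
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        # Always clean up the scratch file, whatever the outcome.
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    # Usage: python check_flock.py /mnt/lustre/some/dir [...]
    for target in sys.argv[1:]:
        status = "supported" if flock_supported(target) else "not supported (ENOSYS)"
        print(f"{target}: flock {status}")
```

Running it against a directory on the Lustre mount, both before and after remounting with localflock, shows whether the option took effect on that client.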

Re: [Lustre-discuss] Lustre HA Experiences

2011-05-04 Thread Jason Rappleye
On May 4, 2011, at 10:05 AM, Charles Taylor wrote: > > We are dipping our toes into the waters of Lustre HA using > pacemaker. We have 16 7.2 TB OSTs across 4 OSSs (4 OSTs each). > The four OSSs are broken out into two dual-active pairs running Lustre > 1.8.5. Mostly, the water is

Re: [Lustre-discuss] problem reading HDF files on 1.8.5 filesystem

2011-05-04 Thread David Dillow
On Wed, 2011-05-04 at 16:47 -0400, Christopher Walker wrote: > Hello, > > We have a user who is trying to post-process HDF files in R. Her script > goes through a number (~2500) of files in a directory, opening and > reading the contents. This usually goes fine, but occasionally the > script

[Lustre-discuss] problem reading HDF files on 1.8.5 filesystem

2011-05-04 Thread Christopher Walker
Hello, We have a user who is trying to post-process HDF files in R. Her script goes through a number (~2500) of files in a directory, opening and reading the contents. This usually goes fine, but occasionally the script dies with: HDF5-DIAG: Error detected in HDF5 (1.9.4) thread 46944713368

Re: [Lustre-discuss] Lustre HA Experiences

2011-05-04 Thread Justin Miller
We're investigating Pacemaker HA setup here too, so I'm interested in your findings, and I hope I can help a little here. 1. So it seems like totem is not responding on some, but still running on others, if they take the initiative to stonith. I would investigate bumping up or adding some paramet

[Lustre-discuss] Lustre HA Experiences

2011-05-04 Thread Charles Taylor
We are dipping our toes into the waters of Lustre HA using pacemaker. We have 16 7.2 TB OSTs across 4 OSSs (4 OSTs each). The four OSSs are broken out into two dual-active pairs running Lustre 1.8.5. Mostly, the water is fine but we've encountered a few surprises. 1. An 8-client

Re: [Lustre-discuss] Too many client eviction

2011-05-04 Thread Johann Lombardi
On Wed, May 04, 2011 at 04:05:56PM +0200, DEGREMONT Aurelien wrote: > >> if client/server do not re-send their RPC. > >> > > To be clear, clients go through a disconnect/reconnect cycle and eventually > > resend RPCs. > > > I'm not sure I understand clearly what happens there. I just mean

Re: [Lustre-discuss] Too many client eviction

2011-05-04 Thread DEGREMONT Aurelien
Johann Lombardi wrote: > On Wed, May 04, 2011 at 01:37:14PM +0200, DEGREMONT Aurelien wrote: > >>> I assume that the 25315s is from a bug >>> > BTW, do you see this problem with both extent & inodebits locks? > Yes both. But more often on MDS. >> How can I track those dropped RPCs o

Re: [Lustre-discuss] Too many client eviction

2011-05-04 Thread Johann Lombardi
On Wed, May 04, 2011 at 01:37:14PM +0200, DEGREMONT Aurelien wrote: > > I assume that the 25315s is from a bug BTW, do you see this problem with both extent & inodebits locks? > (fixed in 1.8.5 I think, not sure if it was ported to 2.x) that calculated > the wrong time when printing this error m

Re: [Lustre-discuss] Too many client eviction

2011-05-04 Thread DEGREMONT Aurelien
Hello, Andreas Dilger wrote: > On May 3, 2011, at 13:41, Nathan Rutman wrote: > >> On May 3, 2011, at 10:09 AM, DEGREMONT Aurelien wrote: >> >>> Correct me if I'm wrong, but when I'm looking at Lustre manual, it said >>> that client is adapting its timeout, but not the server. I'm under

Re: [Lustre-discuss] Too many client eviction

2011-05-04 Thread DEGREMONT Aurelien
Nathan Rutman wrote: > On May 3, 2011, at 10:09 AM, DEGREMONT Aurelien wrote: > > Server and client cooperate together for the adaptive timeouts. OK, they cooperate, and the client will change its timeout through this cooperation, but will also do the same? If yes, obd_timeout and ldlm_timeout are