Re: [Lustre-discuss] Can Lustre be built on old IBM pSeries platform?

2008-03-04 Thread Andreas Dilger
On Mar 03, 2008 02:19 -0800, gas5x1 wrote: > I'm trying to access my Lustre (1.6.4) filesystem from an old IBM > pSeries cluster (the Linux is SLES9 for PPC64). I've built and started > the patched kernel 2.6.18 and tried to compile the latest Lustre > tarball, 1.6.4.2. It does configure, but dies

Re: [Lustre-discuss] files/directories are temporarily unavailable on patchless clients

2008-03-04 Thread Andreas Dilger
On Mar 05, 2008 01:19 +0100, Harald van Pee wrote: > On Wednesday 05 March 2008 01:06 am, Andreas Dilger wrote: > > On Mar 04, 2008 19:52 +0100, Harald van Pee wrote: > > > I have updated all clients to patched version 1.6.1, the servers still > > > are 1.6.0.1. No lustre related error message o

[Lustre-discuss] Lustre on IBM pSeries PPC64

2008-03-04 Thread Grigory Shamov
Dear Lustre Developers, Could you please advise whether it is possible to build a Lustre client for IBM pSeries PPC64 and, if so, how to build it from source? I'm using SLES9 for PPC. Thank you -- WBR, Grigory Shamov

[Lustre-discuss] Installing Lustre on PowerPC (IBM pSeries)

2008-03-04 Thread gas5x1
Dear All, Could you please advise me how, if at all possible, to install Lustre on IBM PPC64? I already have a Lustre 1.6 installation working for Intel i386 and AMD Opteron nodes, and now would like to access it from IBM clients. Thank you very much in advance! -- Grigory Shamov Kazan Science

[Lustre-discuss] Can Lustre be built on old IBM pSeries platform?

2008-03-04 Thread gas5x1
Dear All, I'm trying to access my Lustre (1.6.4) filesystem from an old IBM pSeries cluster (the Linux is SLES9 for PPC64). I've built and started the patched kernel 2.6.18 and tried to compile the latest Lustre tarball, 1.6.4.2. It does configure, but dies at compile time with the following: ==

Re: [Lustre-discuss] Cannot send after transport endpoint shutdown (-108)

2008-03-04 Thread Craig Prescott
Hi Aaron; As Charlie mentioned, we have 400 clients and a timeout value of 1000 is "enough" for us. How many clients do you have? If it is more than 400, or the ratio of your o2ib/tcp clients is not like ours (80/20), you may need a bigger value. Also, we have observed that occasionally we se

Re: [Lustre-discuss] files/directories are temporarily unavailable on patchless clients

2008-03-04 Thread Harald van Pee
On Wednesday 05 March 2008 01:06 am, Andreas Dilger wrote: > On Mar 04, 2008 19:52 +0100, Harald van Pee wrote: > > I have updated all clients to patched version 1.6.1, the servers still > > are 1.6.0.1. No lustre related error message occurred since (2 weeks). > > > > I think it's reasonable (nece

Re: [Lustre-discuss] files/directories are temporarily unavailable on patchless clients

2008-03-04 Thread Andreas Dilger
On Mar 04, 2008 19:52 +0100, Harald van Pee wrote: > I have updated all clients to patched version 1.6.1, the servers still are > 1.6.0.1. No lustre related error message occurred since (2 weeks). > > I think it's reasonable (necessary?) to e2fsck all osts and the mdt? > The mdt resides on an drb

[Lustre-discuss] New Linux HPC Software Stack email list

2008-03-04 Thread Linda Bebernes
The Linux HPC Software Stack is an integrated software stack for Linux-based HPC solutions based on Sun HPC hardware. You are invited to join a conversation with the Linux HPC Software Stack development team and others interested in this topic by signing up for the following email list: [EMAIL

Re: [Lustre-discuss] Cannot send after transport endpoint shutdown (-108)

2008-03-04 Thread Aaron Knister
I made this change and clients are still being evicted. This is very frustrating. It happens over tcp and infiniband. My timeout is 1000. Does anybody know why the clients don't reconnect? On Mar 4, 2008, at 3:55 PM, Aaron S. Knister wrote: I think I tried that before and it didn't help, but I wi

Re: [Lustre-discuss] Cannot send after transport endpoint shutdown (-108)

2008-03-04 Thread Aaron S. Knister
I think I tried that before and it didn't help, but I will try it again. Thanks for the suggestion. -Aaron - Original Message - From: "Charles Taylor" <[EMAIL PROTECTED]> To: "Aaron S. Knister" <[EMAIL PROTECTED]> Cc: "lustre-discuss" <[EMAIL PROTECTED]>, "Thomas Wakefield" <[EMAIL

Re: [Lustre-discuss] Cannot send after transport endpoint shutdown (-108)

2008-03-04 Thread Charles Taylor
We've seen this before as well. Our experience is that the obd_timeout is far too small for large clusters (ours is 400+ nodes) and the only way we avoid these errors is by setting it to 1000 which seems high to us but appears to work and puts an end to the transport endpoint shutdown

Re: [Lustre-discuss] Root file system access though telnet

2008-03-04 Thread Aaron S. Knister
Try the linux commands "ls" "cd". They'll let you list files and change directories. - Original Message - From: "ashok bharat bayana" <[EMAIL PROTECTED]> To: lustre-discuss@lists.lustre.org, lustre-discuss@lists.lustre.org Sent: Monday, March 3, 2008 1:37:02 AM GMT -05:00 US/Canada Ea

[Lustre-discuss] Cannot send after transport endpoint shutdown (-108)

2008-03-04 Thread Aaron S. Knister
This morning I've had both my infiniband and tcp lustre clients hiccup. They are evicted from the server, presumably as a result of their high load and consequent timeouts. My question is: why don't the clients reconnect? The infiniband and tcp clients both give the following message when I type

Re: [Lustre-discuss] OST balancing (1.6) question

2008-03-04 Thread Andreas Dilger
On Mar 04, 2008 09:18 +, Daire Byrne wrote: > We have a mix of 1.4 and 1.6 clients and so when we recently expanded > one of our filesystems... > My question is if we have 1.6 clients will they try to balance OST space > no matter what or do they have to mount the filesystem using the "mgs" >

[Lustre-discuss] upgraded REHEL5.1 problem with lustre rpm's

2008-03-04 Thread Jonathan Meagher
I've just upgraded the kernel on my RedHat machine to 2.6.18-53.1.13.el5. How do I migrate over the lustre 1.6.4.2 rpm's that I had installed with the old kernel? I tried to erase and re-install them but that didn't work. Thanks Jonathan

Re: [Lustre-discuss] files/directories are temporarily unavailable on patchless clients

2008-03-04 Thread Harald van Pee
Hi, I have updated all clients to patched version 1.6.1, the servers still are 1.6.0.1. No lustre related error message occurred since (2 weeks). I think it's reasonable (necessary?) to e2fsck all osts and the mdt? The mdt resides on a drbd device configured as failover. I now have the followin

Re: [Lustre-discuss] Latest RHEL kernel: won't make mgs module, and ko2iblnd not built for right OFED modules

2008-03-04 Thread Chris Worley
On Fri, Feb 29, 2008 at 1:49 PM, Canon, Richard Shane <[EMAIL PROTECTED]> wrote: > > Chris, > > Try using /usr/local/ofed/current/src/ofa_kernel/ instead of the version > specific one. We were seeing Oops when compiling against the version > specific tree. I've been meaning to post to the lis

Re: [Lustre-discuss] ko2iblnd panics in kiblnd_map_tx_descs

2008-03-04 Thread Canon, Richard Shane
Chris, Which headers did you point lustre to? Try using the non-versioned one (i.e. ofa_kernel, not ofa_kernel-1.2.5.5). Shane - Original Message - From: [EMAIL PROTECTED] <[EMAIL PROTECTED]> To: lustre-discuss@lists.lustre.org Sent: Tue Mar 04 11:06:48 2008 Subject: [Lustre-discuss] ko

[Lustre-discuss] ko2iblnd panics in kiblnd_map_tx_descs

2008-03-04 Thread Chris Worley
I'm trying to port Lustre 1.6.4.2 to OFED 1.2.5.5 w/ the RHEL kernel 2.6.9.67.0.4. ksocklnd-based mounts work fine, but when I try to mount over IB, I get a panic in ko2iblnd in the transmit descriptor mapping routine: general protection fault: [1] SMP CPU 1 Modules linked in: ko2iblnd(U) pt

Re: [Lustre-discuss] lustre and small files overhead

2008-03-04 Thread Joe Barjo
Andreas Dilger wrote: > Joe Barjo wrote: > >> Turning off debugging made it much better. >> It went from 1m54 down to 25 seconds, but still 85% of system processing... >> I really think you should turn off debugging by default, or make it appear >> as a BIG warning me

[Lustre-discuss] OST balancing (1.6) question

2008-03-04 Thread Daire Byrne
Hi, We have a mix of 1.4 and 1.6 clients and so when we recently expanded one of our filesystems we had to downgrade to 1.4 on the servers and do a write_conf to record the new config. After upgrading the servers back to 1.6 both 1.4 and 1.6 clients could see the new OSS fine. My question is if