[Lustre-discuss] OST crash recovery problem

2008-08-13 Thread Heiko Schroeter
Hello again, any idea what can be done in such a case? Regards, Heiko Hello, after a crash (hardware failure) of an OST with two Lustre partitions, one partition (/dev/sdb) cannot be remounted after restart. The second partition (/dev/sdc) mounts fine. What needs to be done in such a case? I

Re: [Lustre-discuss] OST disconnect messages on OSS

2008-08-13 Thread Alex Lee
> clip off one of the OSS: > > Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: 137-5: UUID > 'lfs-OST0004_UUID' is not available for connect(no target) I don't see any hardware issues. Could this be caused by extremely high load? I'm seeing up to 300 load on a dual processor quad
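To judge whether messages like the one quoted above cluster on a particular OST, one option is to tally them per target UUID. The following is a minimal sketch, not an official Lustre tool: the regular expression is modelled only on the single log line quoted in this message, and the function name `count_unavailable_targets` is a hypothetical helper introduced for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the one log line quoted in the message above;
# other Lustre versions may word this message differently (assumption).
PATTERN = re.compile(
    r"LustreError: 137-5: UUID '([^']+)' is not available for connect"
)


def count_unavailable_targets(lines):
    """Count 137-5 'not available for connect' messages per target UUID."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The sample is the quoted syslog line, rejoined onto a single line.
sample = [
    "Aug 13 17:26:48 lustre-oss-0-1 kernel: LustreError: 137-5: UUID "
    "'lfs-OST0004_UUID' is not available for connect(no target)",
]

for uuid, hits in count_unavailable_targets(sample).items():
    print(uuid, hits)
```

In practice the `lines` argument could be an open file handle on the OSS syslog, since the function only iterates over lines.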

Re: [Lustre-discuss] mv_sata patch

2008-08-13 Thread Frank Leers
I just realized that I may not have answered your question, and I'm not sure if the patch is in the source posted at sun.com or not. If not, it is in the bug as an attachment - https://bugzilla.lustre.org/show_bug.cgi?id=14040 -frank On Aug 13, 2008, at 1:07 PM, Frank Leers wrote: On Aug 13

Re: [Lustre-discuss] mv_sata patch

2008-08-13 Thread Frank Leers
On Aug 13, 2008, at 12:38 PM, Brock Palen wrote: Is the cache patch for mv_sata noted in the sun paper on the x4500 available? Or has it been rolled into the source distributed by sun? What source are you referring to? It can be had here http://www.sun.com/servers/x64/x4500/downloads.jsp T

[Lustre-discuss] mv_sata patch

2008-08-13 Thread Brock Palen
Is the cache patch for mv_sata noted in the sun paper on the x4500 available? Or has it been rolled into the source distributed by sun? Trying to avoid data loss. Brock Palen www.umich.edu/~brockp Center for Advanced Computing [EMAIL PROTECTED] (734)936-1985

[Lustre-discuss] OST disconnect messages on OSS

2008-08-13 Thread Alex Lee
The disks are DDN S2A 9550. I didn't see any issues on the hardware side, and my IO tests, which used all of the OSTs, finished fine. Could long IO wait cause an error like that? I've seen the OSS load hit 300 on an IO stress test. -Alex

Re: [Lustre-discuss] OST disconnect messages on OSS

2008-08-13 Thread Brian J. Murrell
On Wed, 2008-08-13 at 22:39 +0900, Alex Lee wrote: > I have a system that's been spitting out OST disconnect messages under > heavy load. I'm guessing the OST eventually reconnects. > I want to say this happens when the OSS is extremely overloaded but I > did notice this happening even under light

[Lustre-discuss] OST disconnect messages on OSS

2008-08-13 Thread Alex Lee
I have a system that's been spitting out OST disconnect messages under heavy load. I'm guessing the OST eventually reconnects. I want to say this happens when the OSS is extremely overloaded, but I did notice this happening even under light load. Only the OSS seems to spit out any error messages.

Re: [Lustre-discuss] SAN, shared storage, iscsi using lustre?

2008-08-13 Thread Brian J. Murrell
On Wed, 2008-08-13 at 11:55 +0300, Alex wrote: > Hello Brian, Hi. > Thanks for your prompt reply... See my comments inline.. NP. > I have a cluster with 3 servers (let's say they are web servers > for simplicity). All web servers are serving the same content from a shared > storage volume mount

[Lustre-discuss] OST crash recovery problem

2008-08-13 Thread Heiko Schroeter
Hello, after a crash (hardware failure) of an OST with two Lustre partitions, one partition (/dev/sdb) cannot be remounted after restart. The second partition (/dev/sdc) mounts fine. What needs to be done in such a case? I tried to move the mountpoint because of the "file exists" message but tha

Re: [Lustre-discuss] SAN, shared storage, iscsi using lustre?

2008-08-13 Thread Alex
Hello Brian, Thanks for your prompt reply... See my comments inline.. > Right. So you have 8 iscsi disks. Yes, let's simplify our test environment. - I have 2 LVS routers (one active router and one backup router used for failover) to balance connections through our servers located behind them. -