Re: [Lustre-discuss] software raid

2011-03-28 Thread Lundgren, Andrew
I have done both SW and HW RAID with OSTs and MDTs. As part of your choice, look into what happens when you have to replace a failed disk in a SW configuration. My negatives for SW RAID are all management at this point. When you pull a bad disk out of a Linux box (/dev/sde for example)

Re: [Lustre-discuss] software raid

2011-03-28 Thread Lundgren, Andrew
it. -Original Message- From: Oleg Drokin [mailto:gr...@whamcloud.com] Sent: Monday, March 28, 2011 3:57 PM To: Lundgren, Andrew Cc: Brian O'Connor; lustre-discuss@lists.lustre.org Subject: Re: [Lustre-discuss] software raid Hello! On Mar 28, 2011, at 4:43 PM, Lundgren, Andrew wrote: When you

Re: [Lustre-discuss] controlling which eth interface lustre uses

2010-10-21 Thread Lundgren, Andrew
Just as an FYI, you can set most of the bonding options in the ifcfg-bond0 file, e.g.: BONDING_OPTS="arp_ip_target=10.248.58.254 arp_interval=500 mode=active-backup primary=eth0" Then your modprobe.conf only needs: alias bond0 bonding -Original Message- From:
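For readers who want to see the layout described above, here is a minimal sketch of the ifcfg-bond0 file. Only the BONDING_OPTS value and the modprobe.conf alias come from the message; the DEVICE line and the file path in the comment are assumptions based on the usual RHEL 5 network-scripts layout, and a real file would also carry the address settings for the bond.

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0 (path assumed, RHEL 5 convention)
# The file is sourced by the network scripts, so each line is a shell assignment.
DEVICE=bond0

# Values taken from the message. The quotes are required because the
# option list contains spaces.
BONDING_OPTS="arp_ip_target=10.248.58.254 arp_interval=500 mode=active-backup primary=eth0"

# With the options above, /etc/modprobe.conf only needs this line
# (shown as a comment because it is not shell syntax):
#   alias bond0 bonding
```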

[Lustre-discuss] 1.8.4 runs on 5.5 and 2.0.0 runs on 5.4?

2010-08-27 Thread Lundgren, Andrew
Are the release notes correct where they state that Lustre 1.8.4 runs on RHEL 5.5 and Lustre 2.0.0 runs on RHEL 5.4? Does that mean that there is no upgrade path from 1.8.4 to 2.0.0? -- Andrew ___ Lustre-discuss mailing list

Re: [Lustre-discuss] Adding OST to online Lustre with quota

2010-07-08 Thread Lundgren, Andrew
Can you explain how to do this, or provide sample files that we can use as well as where to place them? -Original Message- It should be possible to automatically create these quota files the first time that a new OST is mounted, since we know at that point that the filesystem is empty

Re: [Lustre-discuss] Group descriptors corrupted

2010-05-20 Thread Lundgren, Andrew
I just went through something similar. When your fsck completes you may be left with things moved to your lost+found. If that happens, you can mount the file system using -t ldiskfs and run ll_recover_lost_found_objs against the lost+found directory. -- Andrew -Original Message-
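The recovery steps described above can be sketched roughly as follows. The device and mount point are hypothetical placeholders, and the script defaults to a dry run that only prints the commands; the -d option of ll_recover_lost_found_objs names the lost+found directory to scan. Treat this as an outline to adapt, not a tested procedure for any particular system.

```shell
#!/bin/sh
# Hypothetical OST device and mount point; substitute your own.
OST_DEV=${OST_DEV:-/dev/sdX}
OST_MNT=${OST_MNT:-/mnt/ost}

# With DRY_RUN=1 (the default) commands are printed rather than executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover_lost_found() {
    # Mount the target directly as ldiskfs, not as a Lustre target.
    run mount -t ldiskfs "$OST_DEV" "$OST_MNT"
    # Move objects that fsck placed in lost+found back where they belong.
    run ll_recover_lost_found_objs -d "$OST_MNT/lost+found"
    run umount "$OST_MNT"
}

recover_lost_found
```

Set DRY_RUN=0 only after checking the printed commands against your own devices.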

Re: [Lustre-discuss] Lost Files - How to remove from MDT

2010-04-19 Thread Lundgren, Andrew
I was also going to recommend the unlink. We have had to do this as well; the unlink worked for us. It did need to be run with privileges for the file. (root in our case.) -- Andrew -Original Message- From: lustre-discuss-boun...@lists.lustre.org

Re: [Lustre-discuss] Permanently delete OST

2010-01-29 Thread Lundgren, Andrew
I went back with our guys internally and reviewed the --perm parameter. It turns out that is a parameter that was put into our wrapper scripts to simplify the process. We are doing it today, but Brian is correct that the perm parameter doesn't exist in the Lustre software. If needed, I can

Re: [Lustre-discuss] Permanently delete OST

2010-01-26 Thread Lundgren, Andrew
On Mon, 2010-01-25 at 11:38 -0700, Lundgren, Andrew wrote: Level 3 requested this feature be developed in 1.6. After Sun did some work for us, the following is the procedure that we have set up for usage. IIRC the functionality was enabled in 1.6.7. We have tested it in 1.8.0. Do you

Re: [Lustre-discuss] Permanently delete OST

2010-01-25 Thread Lundgren, Andrew
: Friday, January 22, 2010 2:53 PM To: lustre-discuss@lists.lustre.org Subject: Re: [Lustre-discuss] Permanently delete OST On Fri, 2010-01-22 at 14:45 -0700, Lundgren, Andrew wrote: I might be mistaken, but I thought this exact feature was added back in 1.6.x Which feature exactly

Re: [Lustre-discuss] Permanently delete OST

2010-01-22 Thread Lundgren, Andrew
I might be mistaken, but I thought this exact feature was added back in 1.6.x -Original Message- From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Brian J. Murrell Sent: Friday, January 22, 2010 2:18 PM To:

[Lustre-discuss] Oracle Linux instead of SUSE for 2.0?

2009-12-03 Thread Lundgren, Andrew
I noticed in the announcement that SUSE isn't supported as a server in this release. Oracle's distro seems to have taken SUSE's place. Is that going to be the case going forward? -- Andrew

Re: [Lustre-discuss] Memory (?) problem with 1.8.1

2009-10-13 Thread Lundgren, Andrew
This sounds very much like a problem we saw before we changed the lru_size to a fixed size from dynamic. -- Andrew -Original Message- From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of David Simas Sent: Monday, October 12, 2009

Re: [Lustre-discuss] Is there a way to set lru_size and have it stick?

2009-10-13 Thread Lundgren, Andrew
Thanks Bernd. For a work around, we are doing a cron every 5 minutes for now to force it down after unmount/remounts. -- Andrew -Original Message- From: Bernd Schubert [mailto:bs_li...@aakef.fastmail.fm] Sent: Tuesday, October 13, 2009 4:15 AM To: Andreas Dilger Cc: Lundgren, Andrew

Re: [Lustre-discuss] Is there a way to set lru_size and have it stick?

2009-10-12 Thread Lundgren, Andrew
specifying the wrong setting? -Original Message- From: Bernd Schubert [mailto:bs_li...@aakef.fastmail.fm] Sent: Monday, October 12, 2009 11:21 AM To: lustre-discuss@lists.lustre.org Cc: Andreas Dilger; Lundgren, Andrew Subject: Re: [Lustre-discuss] Is there a way to set lru_size and have

[Lustre-discuss] Is there a way to set lru_size and have it stick?

2009-10-08 Thread Lundgren, Andrew
Is there a way to set the lru_size to a fixed value and have it stay that way across mounts? I know it can be set using: $ lctl set_param ldlm.namespaces.*osc*.lru_size=$((NR_CPU*100)) But that isn't retained across a reboot. Thank you. -- Andrew
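As a small illustration of the command quoted above, the sketch below builds the same lctl invocation with NR_CPU filled in. The function name lru_size_cmd is hypothetical, and reading the CPU count from getconf is an assumption, since the message does not say where NR_CPU comes from. The function only prints the command, so it can be inspected before being run on a client.

```shell
#!/bin/sh
# Print the lctl command that pins lru_size to NR_CPU * 100.
# The CPU count may be passed as $1; otherwise it is read from getconf
# (an assumption, the original message leaves NR_CPU undefined).
lru_size_cmd() {
    nr_cpu=${1:-$(getconf _NPROCESSORS_ONLN)}
    # The glob is kept literal here; lctl expands the pattern itself.
    echo "lctl set_param ldlm.namespaces.*osc*.lru_size=$((nr_cpu * 100))"
}

lru_size_cmd
```

Because the setting is lost on remount, the printed command would have to be reapplied after each mount, for example from a periodic job.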

[Lustre-discuss] 1.8.0 Loosing connection to the MDT for several minutes and then recovering.

2009-10-06 Thread Lundgren, Andrew
We have a few 1.8.0 clusters running. We have seen multiple instances now where the clients lose connectivity to the MDT. The MDS logs indicate that there is some sort of problem on the MDT. The following is a typical output: Oct 6 02:56:08 mint1502 kernel: LustreError:

Re: [Lustre-discuss] 1.8.0 Loosing connection to the MDT for several minutes and then recovering.

2009-10-06 Thread Lundgren, Andrew
Oh man, that should have read LOSING! From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Lundgren, Andrew Sent: Tuesday, October 06, 2009 11:14 AM To: lustre-discuss Subject: [Lustre-discuss] 1.8.0 Loosing connection to the MDT for several

Re: [Lustre-discuss] Lustre error

2009-09-24 Thread Lundgren, Andrew
I don’t believe that you need to install the client RPMs when you install the server. I wouldn’t force install them. The client will mount with just the server RPMs. We have seen cases when clients will cause you to have to reboot a machine because of bugs or other issues. When that

Re: [Lustre-discuss] WARNING: data corruption issue found in 1.8.x releases

2009-09-09 Thread Lundgren, Andrew
Does this need to be run on EACH OSS? Is there a central way to do it on the MDS? You recommend disabling the read and the write as the settings indicate or just the read as the text indicates? -Original Message- A patch is under testing and will be included in 1.8.1.1. Until 1.8.1.1

Re: [Lustre-discuss] failover software - heartbeat

2009-07-13 Thread Lundgren, Andrew
It is very difficult to find relevant documentation for heartbeat 1/2. I just finished configuring a heartbeat system and would not recommend it because of the documentation. (They seem to have removed portions of the heartbeat documentation from the site.) Pacemaker is not a simple solution

Re: [Lustre-discuss] failover software - heartbeat

2009-07-13 Thread Lundgren, Andrew
. -- Andrew -Original Message- From: Jim Garlick [mailto:garl...@llnl.gov] Sent: Monday, July 13, 2009 2:21 PM To: Lundgren, Andrew Cc: Carlos Santana; lustre-discuss@lists.lustre.org Subject: Re: [Lustre-discuss] failover software - heartbeat We recently put heartbeat v1 in production

Re: [Lustre-discuss] failover software - heartbeat

2009-07-13 Thread Lundgren, Andrew
Are you doing anything if the network fails to one mds? How about if your fiber path fails? -Original Message- From: Jim Garlick [mailto:garl...@llnl.gov] Sent: Monday, July 13, 2009 2:39 PM To: Lundgren, Andrew Cc: Carlos Santana; lustre-discuss@lists.lustre.org Subject: Re

Re: [Lustre-discuss] lustre using wrong network

2009-06-19 Thread Lundgren, Andrew
Look at doing a --writeconf with tunefs.lustre on all of your OSTs and MDTs. Then remount them with the correct settings. -Original Message- From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss- boun...@lists.lustre.org] On Behalf Of Michael Di Domenico Sent:
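The suggestion above could look something like the following sketch, which prints one tunefs.lustre --writeconf command per target. The device paths are hypothetical placeholders and the helper name writeconf_cmds is made up for illustration. The sketch assumes every client and target has already been unmounted, and that the MDT is listed first so it is also the first to be remounted afterwards.

```shell
#!/bin/sh
# Print a writeconf command for each target device given as an argument.
# Nothing is executed; review the output before running it by hand.
writeconf_cmds() {
    for dev in "$@"; do
        echo "tunefs.lustre --writeconf $dev"
    done
}

# Hypothetical devices: the MDT first, then the OSTs.
writeconf_cmds /dev/mdt_dev /dev/ost0_dev /dev/ost1_dev
```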

[Lustre-discuss] Anyone using heartbeat and pingd to detect failed networks?

2009-06-18 Thread Lundgren, Andrew
I am trying to configure pingd in heartbeat 2.1.3 (comes with RHEL 5) to shift my MDS to its peer when the MDS is no longer able to ping its gateway. I haven't been able to get the syntax in the XML correct to get it to shift when pingd detects the problem. Anyone else using it? -- Andrew

[Lustre-discuss] Newest Diagnostics?

2009-04-13 Thread Lundgren, Andrew
What is the current version of the lustre-diagnostics rpm? The newest one I have been able to find is lustre-diagnostics-1.4-cfs1.noarch.rpm from here: http://downloads.lustre.org/public/tools/lustre-diagnostics/ Is there a new one or is that deprecated? -- Andrew

Re: [Lustre-discuss] OSS Cache Size for read optimization

2009-04-03 Thread Lundgren, Andrew
The parameter is called dirty, is that write cache, or is it read-write? Current Lustre does not cache on OSTs at all. All IO is direct. Future Lustre releases will provide an OST cache. For now, you can increase the amount of data cached on clients, which might help a little. Client

Re: [Lustre-discuss] Lustre 1.6.7 kernel panics when umounting MDTs

2009-04-02 Thread Lundgren, Andrew
Has this problem only been seen in 1.6.7? We are currently running on 1.6.4.3. -Original Message- This happened a day after upgrading to Lustre 1.6.7. We have since downgraded our servers to 1.6.6.

Re: [Lustre-discuss] inode out of bounds issue in 1.6.7 (bug 18695)

2009-04-02 Thread Lundgren, Andrew
Will this be dropped as a 1.6.7.1 release or a later 1.6.8 release? -Original Message- From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss- boun...@lists.lustre.org] On Behalf Of Johann Lombardi Sent: Thursday, April 02, 2009 3:13 PM To:

[Lustre-discuss] Log creation/deletion of files?

2009-01-15 Thread Lundgren, Andrew
Is there a way to enable logging of UID and host for creation/deletion of files/directories within the cluster? -- Andrew

Re: [Lustre-discuss] Log creation/deletion of files?

2009-01-15 Thread Lundgren, Andrew
/deletion of files? On Thu, 2009-01-15 at 18:41 -0700, Lundgren, Andrew wrote: Is there a way to enable logging of UID and host for creation/deletion of files/directories within the cluster? The feature you are looking for is called audit logs. It used to exist on code branch for the Hendrix

[Lustre-discuss] Quiescent ost addition

2008-08-25 Thread Lundgren, Andrew
Having read through the Lustre manual, it appears that to add additional OSTs to a cluster there should be no I/O. (Page 17.) I would like to confirm that this means no reads as well as no writes? Do we need to disconnect all of our clients? (Not really possible to make sure that no one on does

Re: [Lustre-discuss] lustre client 1.6.5.1 hangs

2008-07-10 Thread Lundgren, Andrew
We are experiencing the same problem with 1.6.4.2. We thought it was the statahead problems. After turning off the statahead code, we experienced the same problem again. I had hoped going to 1.6.5 would resolve the issue. If you open a bug, would you mind sending the bug number to the list?

Re: [Lustre-discuss] lctl problem

2008-06-02 Thread Lundgren, Andrew
Is there a place where the network options are thoroughly discussed? I have read through the manual and not found a lot on it. I would like to understand exactly what the different options and behaviors are. Thanks! -- Andrew -Original Message- From: [EMAIL PROTECTED]

Re: [Lustre-discuss] Performance Drop creating big files

2008-05-30 Thread Lundgren, Andrew
Are you suggesting he use machines with 40G of RAM to work with 10G files? We have many 800-900G files... I am not sure that is a realistic number. -- Andrew -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Kumaran Rajaram Sent: Friday, May 30,

Re: [Lustre-discuss] Performance Drop creating big files

2008-05-30 Thread Lundgren, Andrew
I get it. Thanks! -Original Message- From: Kumaran Rajaram [mailto:[EMAIL PROTECTED] Sent: Friday, May 30, 2008 3:07 PM To: Lundgren, Andrew Cc: Roger Spellman; lustre-discuss@lists.lustre.org Subject: RE: [Lustre-discuss] Performance Drop creating big files I meant use file

[Lustre-discuss] clients getting disconnected from MGS and OSS servers

2008-05-07 Thread Lundgren, Andrew
We seem to be having a problem with Lustre 1.6.4.3 and clients getting disconnected. We currently have a situation where a box that just does maintenance work on the cluster (du/stats other work) has some directories it cannot enter. (The shell just hangs and doesn't time out.) An lfs check

Re: [Lustre-discuss] how to replace a bad OST.

2008-03-17 Thread Lundgren, Andrew
unmounting MDT/MGS but im not sure . Cheers . - Original Message - From: Lundgren, Andrew To: '[EMAIL PROTECTED]' Sent: Monday, March 17, 2008 7:29 PM Subject: [Lustre-discuss] how to replace a bad OST. I am trying to learn how to replace

[Lustre-discuss] lustre cross IP network routing

2008-03-07 Thread Lundgren, Andrew
Is there any restriction that lustre nodes on TCP must be on the same IP subnets? Is there anything special that needs to be done to make a client on one network see an MGS/OSSes on another network? Thanks! -- Andrew

Re: [Lustre-discuss] lustre cross IP network routing

2008-03-07 Thread Lundgren, Andrew
Thanks! I believe it is a router ACL issue. But I wanted to make sure this wasn't an issue. From: Klaus Steden [mailto:[EMAIL PROTECTED] Sent: Friday, March 07, 2008 3:51 PM To: Lundgren, Andrew; '[EMAIL PROTECTED]' Subject: Re: [Lustre-discuss] lustre cross IP

Re: [Lustre-discuss] e2fsprogs version recommended for Lustre 1.6.4.2

2008-01-25 Thread Lundgren, Andrew
I got them yesterday. Thank you. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Friday, January 25, 2008 12:57 AM To: Lundgren, Andrew Cc: Sébastien Buisson; [EMAIL PROTECTED] Subject: RE: [Lustre-discuss] e2fsprogs version recommended for Lustre