Re: [Lustre-discuss] Virtualization and Lustre

2010-05-21 Thread Sebastian Reitenbach
Hi, On Friday 21 May 2010 05:03:20 am Tyler Hawes wrote: Has there been any testing or conclusions regarding the use of virtualization and Lustre, or is this even possible considering how Lustre is coded? I've gotten used to the idea of virtualization for all our other servers, where it is

Re: [Lustre-discuss] Future of lustre 1.8.3+(Debian and SLES/RH kernels)

2010-05-21 Thread Ramiro Alba Queipo
Hi Andreas, On Thu, 2010-05-20 at 14:45 -0600, Andreas Dilger wrote: On Thu, 2010-05-20 at 10:16 -0600, Andreas Dilger wrote: The SLES11 kernel is at 2.6.27 so it could be usable for this. Also, I Ok, I am getting

Re: [Lustre-discuss] Best way to recover an OST

2010-05-21 Thread Andreas Dilger
On 2010-05-20, at 20:25, Mervini, Joseph A wrote: We encountered a multi-disk failure on one of our mdadm RAID6 8+2 OSTs. 2 drives failed in the array within the space of a couple of hours and were replaced. I guess the need for +3 parity is closer than we think... Fortunately I am able

Re: [Lustre-discuss] Future of lustre 1.8.3+

2010-05-21 Thread Christopher Huhn
Hi Ramiro, Ramiro Alba Queipo wrote: On Thu, 2010-05-20 at 10:16 -0600, Andreas Dilger wrote: The SLES11 kernel is at 2.6.27 so it could be usable for this. Also, I Ok, I am getting http://downloads.lustre.org/public/kernels/sles11/linux-2.6.27.39-0.3.1.tar.bz2 but, please. Where

Re: [Lustre-discuss] MGS Nids

2010-05-21 Thread leen smit
OK. I started from scratch, using your kind replies as a guideline. Yet there is still no failover when bringing down the first MGS. Below are the steps I've taken to set up; hopefully someone here can spot my error. I got rid of keepalived and drbd (was this wise? or should I keep this for the MGS/MDT

Re: [Lustre-discuss] MGS Nids

2010-05-21 Thread Gabriele Paciucci
Hi, be careful with LVM: you should export and import the volume when you try to mount it from one machine to another. Please refer to: http://kbase.redhat.com/faq/docs/DOC-4124 On 05/21/2010 11:57 AM, leen smit wrote: OK. I started from scratch, using your kind replies as a guideline.

Re: [Lustre-discuss] Future of lustre 1.8.3+

2010-05-21 Thread Ramiro Alba Queipo
Hi Christopher: I've had a look at the config file and found a lot of differences from the one available in the Lustre source. In particular I saw: # CONFIG_RESOURCE_COUNTERS is not set CONFIG_SYSFS_DEPRECATED=y CONFIG_SYSFS_DEPRECATED_V2=y which is not going to work with the udev version on Ubuntu

Re: [Lustre-discuss] MGS Nids

2010-05-21 Thread leen smit
Wouldn't it be easier then to use drbd on the MGS disk, so you don't have to move the LVM volume over to a new node? On 05/21/2010 12:14 PM, Gabriele Paciucci wrote: Hi, be careful with LVM: you should export and import the volume when you try to mount it from one machine to another. Please

[Lustre-discuss] Recovery Problem

2010-05-21 Thread Stefano Elmopi
Hi, I realized that the server time differed greatly across machines; there was at least a few hours of difference. I am running tests and had not been paying attention to time synchronization, but now I have aligned the time on all servers and configured the ntpd service, and the problem

Re: [Lustre-discuss] Recovery Problem

2010-05-21 Thread Johann Lombardi
On Fri, May 21, 2010 at 01:49:41PM +0200, Stefano Elmopi wrote: I realized that the server time differed greatly across machines; there was at least a few hours of difference. I am running tests and had not been paying attention to time synchronization, but now I have aligned the time on all

Re: [Lustre-discuss] Future of lustre 1.8.3+/Debian support

2010-05-21 Thread Christopher Huhn
Hi Ramiro, Ramiro Alba Queipo wrote: By the way are you using Debian and Lustre? If positive, what is your feeling? Right now we are running Lustre 1.6.7.2 on Debian Etch. The current size is net 1 PB (100+ OSS servers, gross 1.6 PB) and roughly 500 number crunchers running Debian Etch

Re: [Lustre-discuss] Modifying Lustre network (good practices)

2010-05-21 Thread Olivier Hargoaa
Johann Lombardi wrote: Hi Olivier, On Thu, May 20, 2010 at 07:12:45PM +0200, Olivier Hargoaa wrote: You couldn't have known, but we already ran the LNET self-test unsuccessfully. I posted the results in my reply to Brian. ok. To get back to your original question: Currently Lustre network is

Re: [Lustre-discuss] Future of lustre 1.8.3+/Debian support

2010-05-21 Thread Andreas Dilger
On 2010-05-21, at 6:34, Christopher Huhn c.h...@gsi.de wrote: What worries us is that the Lustre server patches do not appear to progress towards integration into the mainline kernel but rather away from it, which makes porting to Debian (and up-to-date kernels in general) more and more

Re: [Lustre-discuss] Recovery Problem

2010-05-21 Thread Andreas Dilger
On 2010-05-21, at 5:49, Stefano Elmopi stefano.elm...@sociale.it wrote: I realized that the server time differed greatly across machines; there was at least a few hours of difference. I am running tests and had not been paying attention to time synchronization, but now I have aligned the

[Lustre-discuss] sync_journal

2010-05-21 Thread Kit Westneat
Hello, I was wondering if Oracle has a recommended way to set the sync_journal parameter. I haven't been able to figure out how to set it with conf_param, so I'm assuming it's one of the special proc files that doesn't get a conf_param equivalent. Right now, running a cron job to set it seems

Re: [Lustre-discuss] Clients getting incorrect network information for one of two MDT servers (active/passive)

2010-05-21 Thread McKee, Shawn
Hi Everyone, I never got any reply or suggestions from this one. We are still having the issue. Summarizing: the clients get the wrong address for the MDS when our LMD01 node is running the service. If LMD02 (the active/passive HA partner to LMD01) runs as the MDS things work. Some

Re: [Lustre-discuss] Clients getting incorrect network information for one of two MDT servers (active/passive)

2010-05-21 Thread Daniel Kobras
Hi! On Fri, May 21, 2010 at 11:54:56AM -0400, McKee, Shawn wrote: Parameters: mgsnode=10.10.1@tcp,192.41.230@tcp1,141.211.101@tcp2 failover.node=10.10.1...@tcp,192.41.230...@tcp1 Notice there is no reference to 192.41.230...@tcp anywhere here. Lustre MDS and OSS nodes

Re: [Lustre-discuss] Clients getting incorrect network information for one of two MDT servers (active/passive)

2010-05-21 Thread McKee, Shawn
Thanks Daniel, That indeed seems to be our problem. We are following the process documented in section 4.3.12 of the Lustre 1.8 Operations Manual and that should fix us up. Many thanks for the solution, Shawn -Original Message- From: lustre-discuss-boun...@lists.lustre.org

Re: [Lustre-discuss] sync_journal

2010-05-21 Thread Andreas Dilger
On 2010-05-21, at 09:09, Kit Westneat wrote: I was wondering if Oracle has a recommended way to set the sync_journal parameter. I haven't been able to figure out how to set it with conf_param, so I'm assuming it's one of the special proc files that doesn't get a conf_param equivalent. Right

Re: [Lustre-discuss] sync_journal

2010-05-21 Thread Kit Westneat
I'm surprised you can't set this via conf_param. What parameter name did you try? Also, in bugzilla Nathan has patches to change conf_param syntax to match the {get,set,list}_param syntax so that it is easier to set permanent tunables. I'd think (without having tried it) that: lctl

Re: [Lustre-discuss] sync_journal

2010-05-21 Thread Andreas Dilger
On 2010-05-21, at 12:35, Kit Westneat wrote: I'm surprised you can't set this via conf_param. What parameter name did you try? Also, in bugzilla Nathan has patches to change conf_param syntax to match the {get,set,list}_param syntax so that it is easier to set permanent tunables. I'd