Re: [Lustre-discuss] Recommended failover software for Lustre

2012-07-16 Thread Cliff White
Thanks, we've created http://jira.whamcloud.com/browse/LUDOC-69 to track the fixes to the manual. cliffw On Mon, Jul 16, 2012 at 4:23 AM, Christopher J.Walker c.j.wal...@qmul.ac.uk wrote: The configuring failover section in the Whamcloud release of the Lustre manual seems rather out of date:

Re: [Lustre-discuss] Seeking LNET router recomendations

2012-03-30 Thread Cliff White
DuPont WA 98327 bob.ha...@intel.com mallick.arigap...@intel.com From: Cliff White [mailto:cli...@whamcloud.com] Sent: Wednesday, March 21, 2012 9:28 AM To: Hayes, Bob Cc: lustre-discuss@lists.lustre.org Subject: Re: [Lustre-discuss] Seeking LNET

Re: [Lustre-discuss] Seeking LNET router recomendations

2012-03-21 Thread Cliff White
- An OSS really can't be a router, an OSS is an endpoint. Topologically, it shouldn't work, you should re-think network layout. - routing does place a load on the system, nodes doing routing should be dedicated to routing. - Load depends on traffic, basically you would have two hardware network

Re: [Lustre-discuss] Seeking LNET router recomendations

2012-03-21 Thread Cliff White
Or to put it another way, if your OSS systems can already 'see' both IB and IPoIB networks the most cost effective, high performance solution would be to add the necessary interface and put your MDS/MGS on both networks also. No need for routers, no performance impact. cliffw On Tue, Mar 13,

Re: [Lustre-discuss] Lustre 1.8.7 - Setup prototype in Research field - STUCK !

2012-02-03 Thread Cliff White
You should download from the Whamcloud download site, for a start: http://downloads.whamcloud.com/public/lustre/ Typically, the Lustre server does nothing but run Lustre. For that reason there is generally little risk from using our current version on the server platforms. If your clients require

Re: [Lustre-discuss] two multi-homed cluster

2011-12-16 Thread Cliff White
You can do this, simply define networks for both devices. Assuming ib0, and eth0, you would have options lnet networks=tcp0(eth0),o2ib0(ib0) The IB clients will mount using a @o2ib0 NID, and the ethernet clients will mount using @tcp0 NIDs. Since you are explicitly specifying the network, the hop
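A minimal sketch of the multi-homed setup described above. The helper name lnet_networks_line, the modprobe file path, the 192.0.2.x addresses, the filesystem name "testfs" and the mount point are assumptions for illustration; only the networks= string itself comes from the message.

```shell
#!/bin/sh
# Build the "options lnet" line for a multi-homed node.
# Arguments are network(interface) pairs, e.g. "tcp0(eth0)" "o2ib0(ib0)".
lnet_networks_line() {
    joined=""
    for pair in "$@"; do
        if [ -z "$joined" ]; then
            joined="$pair"
        else
            joined="$joined,$pair"
        fi
    done
    printf 'options lnet networks=%s\n' "$joined"
}

# Typical use (the file path is an assumption; it varies by distribution):
#   lnet_networks_line "tcp0(eth0)" "o2ib0(ib0)" >> /etc/modprobe.d/lustre.conf
#
# Clients then pick the network through the NID in the mount command.
# Addresses, "testfs" and "/mnt/testfs" are hypothetical:
#   mount -t lustre 192.0.2.10@o2ib0:/testfs /mnt/testfs   # InfiniBand client
#   mount -t lustre 192.0.2.20@tcp0:/testfs /mnt/testfs    # Ethernet client
```

Because each client names its network explicitly in the NID, no routing between the two networks is involved.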

Re: [Lustre-discuss] Log Files Skipped nn previous similar messages

2011-11-23 Thread Cliff White
It means errors have occurred which are duplicates of the displayed message, we limit messages in that case to reduce system log traffic. If a different error occurs that message will be displayed. cliffw On Wed, Nov 23, 2011 at 11:41 AM, Lucia M. Walle lucia.wa...@cornell.edu wrote: Hello, I'm

Re: [Lustre-discuss] EXTERNAL: Re: Unable to write to the Lustre File System as any user except root

2011-10-12 Thread Cliff White
If you are not using LDAP, etc, then the user's information must be in the MDS's password files. Users must be known to the MDS. cliffw On Wed, Oct 12, 2011 at 6:37 AM, Barberi, Carl E carl.e.barb...@lmco.com wrote: Thank you Kilian. However, I was just able to perform Lustre operations as

Re: [Lustre-discuss] quilt messages on CentOS 5.7 x86_64

2011-09-28 Thread Cliff White
Sadly, it is not so much unsafe to continue as impossible - the patches that failed were reverted, so you don't have a properly patched source. At this point, you would have to walk through the Lustre patches and fix each place where there is a FAIL. The fixes themselves are usually trivial, but

Re: [Lustre-discuss] Hotspots

2011-09-21 Thread Cliff White
Well, this is why Lustre uses striping - however if your file is very small, it will be located on one stripe only, and at that point it's limited by hardware. In current Lustre (1.8.6-wc1) you can enable caching on the OSS which may help. cliffw On Wed, Sep 21, 2011 at 1:16 PM, Michael Di

Re: [Lustre-discuss] how to add force_over_8tb to MDS

2011-07-14 Thread Cliff White
--writeconf will erase parameters set via lctl conf_param, and will erase pools definitions. It will also allow you to set rather silly parameters that can prevent your filesystem from starting, such as incorrect server NIDs or incorrect failover NIDs. For this reason (and from a history of

Re: [Lustre-discuss] how to add force_over_8tb to MDS

2011-07-14 Thread Cliff White
This error message you are seeing is what Andreas was talking about - you must use the ext4-based version, as you will not need any option with your size LUNs. The 'must use force_over_8tb' error is the key here, you most certainly want/need the *.ext4.rpm versions of stuff. cliffw On Thu, Jul

Re: [Lustre-discuss] Fwd: Lustre performance issue (obdfilter_survey

2011-07-06 Thread Cliff White
The case=network part of obdfilter_survey has really been replaced by lnet_selftest. I don't think it's been maintained in a while. It would be best to repeat the network-only test with lnet_selftest, this is likely an issue with the script. cliffw On Wed, Jul 6, 2011 at 1:04 PM, lior amar
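A rough sketch of a network-only test with lnet_selftest, along the lines suggested above. It assumes the lst utility is installed and the lnet_selftest module is loaded on every node involved; the function name, group names, batch name and NIDs are hypothetical.

```shell
#!/bin/sh
# Run a short bulk-read test between two groups of nodes with lnet_selftest.
# $1 = client NID(s), $2 = server NID(s), $3 = seconds to sample (default 30).
lnet_bulk_read_test() {
    clients=$1
    servers=$2
    duration=${3:-30}

    export LST_SESSION=$$            # lst finds its session through this variable
    lst new_session read_test
    lst add_group clients "$clients"
    lst add_group servers "$servers"
    lst add_batch bulk_read
    lst add_test --batch bulk_read --from clients --to servers brw read size=1M
    lst run bulk_read

    lst stat servers &               # prints throughput until it is killed
    stat_pid=$!
    sleep "$duration"
    kill "$stat_pid" 2>/dev/null || true
    wait "$stat_pid" 2>/dev/null || true

    lst end_session
}

# Example with hypothetical NIDs:
#   modprobe lnet_selftest
#   lnet_bulk_read_test "192.0.2.21@tcp" "192.0.2.11@tcp" 30
```

This exercises LNET only, so the result does not depend on disks or on the obdfilter layer.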

Re: [Lustre-discuss] Need help

2011-07-01 Thread Cliff White
Did you also install the correct e2fsprogs? cliffw On Fri, Jul 1, 2011 at 5:45 PM, Mervini, Joseph A jame...@sandia.gov wrote: Hi, I just upgraded our servers from RHEL 5.4 - RHEL 5.5 and went from lustre 1.8.3 to 1.8.5. Now when I try to mount the OSTs I'm getting: [root@aoss1 ~]# mount

Re: [Lustre-discuss] What exactly is punch statistic?

2011-06-16 Thread Cliff White
It is called when truncating a file - afaik it is showing you the number of truncates, more or less. cliffw On Thu, Jun 16, 2011 at 10:52 AM, Mervini, Joseph A jame...@sandia.gov wrote: Hi, I have been covertly trying for a long time to find out what punch means as far as lustre llobdstat

Re: [Lustre-discuss] Enabling mds failover after filesystem creation

2011-06-14 Thread Cliff White
It depends - are you using a combined MGS/MDS? If so, you will have to update the mgsnid on all servers to reflect the failover node, plus change the client mount string to show the failover node. otherwise, it's the same procedure as with an OST. cliffw On Tue, Jun 14, 2011 at 12:06 PM, Jeff

Re: [Lustre-discuss] Enabling mds failover after filesystem creation

2011-06-14 Thread Cliff White
between the two nodes. On 6/14/11 12:12 PM, Cliff White wrote: It depends - are you using a combined MGS/MDS? If so, you will have to update the mgsnid on all servers to reflect the failover node, plus change the client mount string to show the failover node. otherwise, it's the same

Re: [Lustre-discuss] Has anyone built 1.8.5 on Centos 5.6?

2011-06-01 Thread Cliff White
Actually, we have 2.6.18-238 in testing for Lustre 1.8.6 release, builds fine, you can get RPMS/SRPMS here: http://newbuild.whamcloud.com/job/lustre-b1_8/lastSuccessfulBuild/ including the server kernels. or source from our git repo cliffw On Wed, Jun 1, 2011 at 1:16 AM, Joe Landman

Re: [Lustre-discuss] LNET routing question

2011-04-04 Thread Cliff White
On Mon, Apr 4, 2011 at 1:32 PM, David Noriega tsk...@my.utsa.edu wrote: Reading up on LNET routing and have a question. Currently have nothing special going on, simply specified tcp0(bond0) on the OSSs and MDS. Same for all the clients as well, we have an internal network for our cluster,

Re: [Lustre-discuss] MDT extremely slow after restart

2011-04-03 Thread Cliff White
What is the underlying disk, did that hardware/RAID config change when you switched hardware? The 'still busy' message is a bug, may be fixed in 1.8.5 cliffw On Sat, Apr 2, 2011 at 1:01 AM, Thomas Roth t.r...@gsi.de wrote: Hi all, we are suffering from a severe metadata performance

Re: [Lustre-discuss] Migrating so all MGS/MDTs on same node

2011-03-31 Thread Cliff White
You can't mount two MGS on the same node. The MGS NID has to be unique. On Thu, Mar 31, 2011 at 2:22 AM, Andrus, Brian Contractor bdand...@nps.edu wrote: All, We have a system that was grown from 3 separate lustre filesystems and I would like to set it up so they share mgs/mdt services from

Re: [Lustre-discuss] Optimal stratgy for OST distribution

2011-03-31 Thread Cliff White
No, the algorithm is not purely random, it is weighted on QOS, space and a few other things. When a stripe is chosen on one OSS, we add a penalty to the other OSTs on that OSS to prevent IO bunching on one OSS. cliffw On Thu, Mar 31, 2011 at 1:59 PM, Jeremy Filizetti jeremy.filize...@gmail.com

Re: [Lustre-discuss] software raid

2011-03-24 Thread Cliff White
Historically, Linux software RAID had multiple issues, we did not advise using it. Those issues afaik were fixed long ago, and we changed the advice. Sun/Oracle sold a product that was based on software RAID - there are no unique issues using soft RAID with Lustre. Performance/reliability is a

Re: [Lustre-discuss] Details on the LNET-Selftests

2011-03-23 Thread Cliff White
Sadly, as far as I am aware, no. cliffw On Wed, Mar 23, 2011 at 10:56 AM, Alvaro Aguilera s2506...@inf.tu-dresden.de wrote: Hello, I've read the manual section about the selftest-Module and wonder if someone here can point me to more detailed information about it. For example some kind

Re: [Lustre-discuss] MDS hangs with OFED

2011-03-17 Thread Cliff White
Unfortunately, we've had lots of reports of IB instability. It does appear to happen quite a bit, and generally is not a Lustre problem at all. - Check all mechanical connections, cables, etc. - replace if need be - many issues have been cable-related. - Check firmware versions of all IB cards,

Re: [Lustre-discuss] OST problem

2011-03-04 Thread Cliff White
For clarity, Lustre does not replicate data. If you add an OST, it is unique. If you wish to do failover, this requires shared storage between two nodes. We do not replicate storage. If you wish to increase the size of your filesystem, you can add OSTs. cliffw On Fri, Mar 4, 2011 at 7:16 AM,

Re: [Lustre-discuss] Help! Newbie trying to set up Lustre network

2011-02-22 Thread Cliff White
Run 'lctl list_nids' on the client also. Then you can # lctl ping other nid from both server and client to verify your LNET is functioning. Also, use tunefs.lustre --print on your MDS/MGT and OST devices to verify that mgsnid is set correctly there. cliffw On Tue, Feb 22, 2011 at 11:10 AM,
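The LNET check described above could be wrapped roughly as follows. The function name and the NIDs in the example are hypothetical; lctl list_nids and lctl ping are the commands named in the message.

```shell
#!/bin/sh
# Show this node's NIDs, then ping each peer NID given as an argument.
# Returns non-zero if any peer cannot be reached over LNET.
check_lnet_peers() {
    echo "Local NIDs:"
    lctl list_nids || return 1

    failed=0
    for nid in "$@"; do
        if lctl ping "$nid" >/dev/null 2>&1; then
            echo "ok   $nid"
        else
            echo "FAIL $nid"
            failed=1
        fi
    done
    return "$failed"
}

# Example, run on both server and client (NIDs are hypothetical):
#   check_lnet_peers 192.0.2.11@tcp 192.0.2.12@tcp
```

Running it from both ends matters, since a ping that succeeds in one direction only usually points at a NID or interface mismatch.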

Re: [Lustre-discuss] Is this setup possible

2011-02-17 Thread Cliff White
All the nodes have to run the same network type, so they can talk to one another. If client is running Infiniband, server must also run Infiniband, in most cases. See the Lustre Manual for information on Lustre Routing. Clients and server can run different versions of Lustre. You need to run a

Re: [Lustre-discuss] Recovery from Hardware Failure

2011-02-07 Thread Cliff White
You should not have to do the lfsck if the initial fsck's come back clean. cliffw On Mon, Feb 7, 2011 at 1:16 PM, Joe Digilio jgd-lus...@metajoe.com wrote: Last week we experienced a major hardware failure (disk controller) that brought down our system hard. Now that I have the replacement

Re: [Lustre-discuss] Fwd: question about routing between subnets

2011-01-25 Thread Cliff White
The MGS can be behind an lnet router. So, provided you have LNET routing set up between A and B, the MGS should be okay with only a subnet A address, clients on B with proper routing configuration should be fine. The address on net C is thus immaterial. cliffw On Tue, Jan 25, 2011 at 6:51 AM,

Re: [Lustre-discuss] MDT raid parameters, multiple MGSes

2011-01-21 Thread Cliff White
On Fri, Jan 21, 2011 at 3:43 AM, Thomas Roth t.r...@gsi.de wrote: Hi all, we have gotten new MDS hardware, and I've got two questions: What are the recommendations for the RAID configuration and formatting options? I was following the recent discussion about these aspects on an OST: chunk

Re: [Lustre-discuss] finding performance issues

2010-12-10 Thread Cliff White
On 12/10/2010 11:42 AM, Brock Palen wrote: We have an lustre 1.6.x filesystem, 1.6 has been dead for well over a year. End Of Life. 4 OSS, 3 x4500 and 1 ddn s2a6620 Each oss has 4 1gig interfaces bonded, or 1 10gig interface. I have a user who is running a few hundred serial jobs that are

Re: [Lustre-discuss] Determining /proc/fs/lustre/llite subdirectoy

2010-12-08 Thread Cliff White
On 12/08/2010 10:42 AM, James Robnett wrote: Our clients have 2 or 3 different lustre filesystems mounted. We're using Lustre 1.8.4 on RHEL 5.5. On clients I'd like to be able to toggle via a script the extents monitoring in /proc/fs/lustre/llite/lustre-XX/extents_stats When you have

Re: [Lustre-discuss] Getting around a Catch-22

2010-12-07 Thread Cliff White
On 12/07/2010 06:51 AM, Bob Ball wrote: We have 6 OSS, each with at least 8 OST. It sometimes happens that I need to do maintenance on an OST, so to avoid hanging processes on the client machines, I use lctl to disable access to that OST on active client machines. So, now, it may happen

Re: [Lustre-discuss] manual OST failover for maintenance work?

2010-12-07 Thread Cliff White
On 12/06/2010 09:57 AM, Adeyemi Adesanya wrote: Hi. We have pairs of OSS nodes hooked up to shared storage arrays containing OSTs but we have not enabled any failover settings yet. Now we need to perform maintenance work on an OSS and we would like to minimize Lustre downtime. Can I use

Re: [Lustre-discuss] client modules not loading during boot

2010-09-08 Thread Cliff White
The mount command will automatically load the modules on the client. cliffw On 09/03/2010 11:56 AM, Ronald K Long wrote: We have installed lustre 1.8.2 and 1.8.4 client on Red hat 5. The lustre modules are not loading during boot. In order to get the lustre file system to mount we have to

Re: [Lustre-discuss] Configuration question

2010-08-19 Thread Cliff White
On 08/19/2010 10:59 AM, David Noriega wrote: I'm curious about the underlying framework of lustre in regards to failover. When creating the filesystems, one can provide --failnode=x.x@tcp0 and even for the OSTs you can provide two nids for the MDS/MGS. What do these options tell lustre

Re: [Lustre-discuss] Virtualization and Lustre

2010-06-10 Thread Cliff White
On 05/20/2010 08:03 PM, Tyler Hawes wrote: Has there been any testing or conclusions regarding the use of virtualization and Lustre, or is this even possible considering how Lustre is coded? I've gotten used to the idea of virtualization for all our other servers, where it is great to know

Re: [Lustre-discuss] Newbie w/issues

2010-04-28 Thread Cliff White
Brian Andrus wrote: Ok, I inherited a lustre filesystem used on a cluster. I am seeing an issue where on the frontend, I see all of /work On nodes, however, I only see SOME of the user's directories. That's rather odd. The directory structure is all on the MDS, so it's usually either all

Re: [Lustre-discuss] mgt backup

2010-04-01 Thread Cliff White
John White wrote: I just wanted to confirm that the backup/restore procedure for MDTs apply equally to MGTs. Can someone please confirm? Actually, the only things kept on a dedicated MGT are config logs. Should be a trivial task to backup. mount as ldiskfs and make a quick tarball. You don't
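A sketch of the "mount as ldiskfs and make a quick tarball" approach. It assumes a dedicated MGT whose MGS is stopped, so the device is not currently mounted as type lustre; the function name, device and paths are hypothetical.

```shell
#!/bin/sh
# Archive the contents of a dedicated MGT.
# $1 = MGT block device, $2 = empty mount point, $3 = output tarball.
backup_mgt() {
    dev=$1
    mnt=$2
    out=$3

    mount -t ldiskfs "$dev" "$mnt" || return 1
    tar czf "$out" -C "$mnt" .
    status=$?
    umount "$mnt"
    return "$status"
}

# Example with a hypothetical device and paths:
#   backup_mgt /dev/sdb1 /mnt/mgt_backup /root/mgt-backup.tgz
```

The tarball should be written outside the mount point so that it does not end up inside the filesystem being archived.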

Re: [Lustre-discuss] filter_grant_incoming()) LBUG in 1.8.1.1

2010-03-26 Thread Cliff White
Scott Barber wrote: Background: MDS and OSTs are all running CentOS 5.4 / x86_64 / 2.6.18-128.7.1.el5_lustre.1.8.1.1 2 types of clients - CentOS 5.4 / x86_64 / 2.6.18-128.7.1.el5_lustre.1.8.1.1 - Ubuntu 8.04.1 / i686 / 2.6.22.19 patchless A few days ago one of the OSSs hit an LBUG. The

Re: [Lustre-discuss] How to force client-oss communication over IB when the MDS has only ethernet?

2010-03-26 Thread Cliff White
Tero Hakala wrote: Hi, our MDS is temporarily missing IB connection and has only eth available. However, OSS and clients have both IB and eth. At the moment, it seems that all the traffic between clients-OSS goes also through the slow eth connection. Is it possible to force them to use

Re: [Lustre-discuss] programmatic access to parameters

2010-03-25 Thread Cliff White
burlen wrote: System limits are sometimes provided in a header, I wasn't sure if Lustre adopted that approach. The llapi_* functions are great, I see how to set the stripe count and size. I wasn't sure if there was also a function to query about the configuration, eg number of OST's

Re: [Lustre-discuss] add new network links to existing OSS

2010-03-24 Thread Cliff White
Jake Maul wrote: Greetings, We've got a small Lustre network set up, and have rather suddenly run into a bottleneck with a single gigabit link to certain OSS's. http://wiki.lustre.org/manual/LustreManual18_HTML/Bonding.html#50638966_pgfId-1289000 Based on that page in the manual, it

Re: [Lustre-discuss] Lustre Monitoring Tools

2010-01-06 Thread Cliff White
Jagga Soorma wrote: Hi Guys, I would like to monitor the performance and usage of my Lustre filesystem and was wondering what are the commonly used monitoring tools for this? Cacti? Nagios? Any input would be greatly appreciated. Regards, -Simran LLNL's LMT tool is very good. It's

Re: [Lustre-discuss] Lustre Monitoring Tools

2010-01-06 Thread Cliff White
release has been tested with Lustre 1.6.6. So, yup, seems a bit old. But might be worth looking into. cliffw jab -Original Message- From: lustre-discuss-boun...@lists.lustre.org [mailto:lustre-discuss-boun...@lists.lustre.org] On Behalf Of Cliff White Sent: Wednesday, January 06

Re: [Lustre-discuss] Lustre and iSCSI

2009-07-31 Thread Cliff White
David Pratt wrote: Hi. I am exploring possibilities for pooled storage for virtual machines. Lustre looks quite interesting for both tolerance and speed. I have a couple of basic questions: 1) Can Lustre present an iSCSI target Lustre doesn't present target, we use targets, and we should

Re: [Lustre-discuss] failover software - heartbeat

2009-07-14 Thread Cliff White
Lundgren, Andrew wrote: It is very difficult to find relevant documentation for heartbeat 1/2. I just finished configuring a heartbeat system and would not recommend it because of the documentation. (They seem to have removed portions the heartbeat documentation from the site.)

Re: [Lustre-discuss] failover software - heartbeat

2009-07-14 Thread Cliff White
Jim Garlick wrote: Hi, OK I have posted it to https://bugzilla.lustre.org/show_bug.cgi?id=20165 20165: scripts for heartbeat v1 integration I added example config files from our test cluster. Probably best to redirect questions/comments/criticisms to the bug and I'll respond there.

Re: [Lustre-discuss] Lustre DRBD failover time

2009-07-14 Thread Cliff White
tao.a...@nokia.com wrote: Hi, all, I am evaluating Lustre with DRBD failover, and experiencing about 2 minutes in OSS failover time to switch to the secondary node. Has anyone had a similar observation (so that we can conclude this should be expected), or if there is some

Re: [Lustre-discuss] Out of Memory on MDS

2009-06-23 Thread Cliff White
Roger Spellman wrote: I have an MDS that is crashing with out-of-memory. Prior to the crash, I started collecting /proc/slabinfo. I see that ldlm_locks is up to 4,500,000, and each one is 512 bytes, for a total of 2.2GB, which is more than half my RAM. Is there a way to limit
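The memory figure quoted in the post can be checked with shell arithmetic. Both inputs are taken from the post; the variable names are illustrative.

```shell
#!/bin/sh
# Memory held by ldlm_locks slab objects: object count times object size.
locks=4500000        # objects reported in /proc/slabinfo (from the post)
bytes_per_lock=512   # object size in bytes (from the post)

total_bytes=$((locks * bytes_per_lock))
total_mib=$((total_bytes / 1048576))

echo "ldlm_locks: $total_bytes bytes (~$total_mib MiB)"
```

The product is 2,304,000,000 bytes, about 2.1 GiB, which is in line with the roughly 2.2GB quoted.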

Re: [Lustre-discuss] a simple question

2009-06-19 Thread Cliff White
Onane wrote: Hello, After installing lustre, how can I test it quickly if it is installed correctly ? # modprobe -v lustre If the modules load without error this is good. # lctl network up # lctl list_nids This shows you that LNET can run. Beyond that, create a filesystem. Examples at
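The three commands above can be collected into one check. This is only a sketch: the function name and the messages are made up, and it must be run as root on a node with the Lustre modules installed.

```shell
#!/bin/sh
# Quick post-install check: load the modules, start LNET, list local NIDs.
lustre_smoke_test() {
    modprobe -v lustre || { echo "module load failed" >&2; return 1; }
    lctl network up    || { echo "LNET did not start" >&2; return 1; }
    nids=$(lctl list_nids)
    [ -n "$nids" ]     || { echo "no NIDs configured" >&2; return 1; }
    echo "LNET is up with NIDs:"
    echo "$nids"
}

# Example:
#   lustre_smoke_test && echo "basic install looks sane"
```

A clean run only shows that the modules and LNET work; creating and mounting a small filesystem is still the real test.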

Re: [Lustre-discuss] Lustre installation and configuration problems

2009-06-17 Thread Cliff White
is an output from strace for mount: http://www.heypasteit.com/clip/8WT Any further debugging hints? Thanks, CS. On 6/16/09, Cliff White cliff.wh...@sun.com wrote: Carlos Santana wrote: The '$ modprobe -l lustre*' did not show any module on a patchless client. modprobe -v returns 'FATAL: Module

Re: [Lustre-discuss] missing ost's?

2009-06-17 Thread Cliff White
Michael Di Domenico wrote: On Tue, Jun 16, 2009 at 8:25 PM, Michael Di Domenico mdidomeni...@gmail.com wrote: I have a small lustre test cluster with eight OST's running. The servers were shut off over the weekend, upon turning them back on and trying to startup lustre I seem to have lost my

Re: [Lustre-discuss] Lustre installation and configuration problems

2009-06-17 Thread Cliff White
-discuss] Lustre installation and configuration problems To: Carlos Santana neu...@gmail.com Cc: Cliff White cliff.wh...@sun.com, lustre-discuss@lists.lustre.org Date: Wednesday, June 17, 2009, 1:08 PM Carlos - The installation procedures for Lustre 1.6 and 1.8 are the same. The manual's

Re: [Lustre-discuss] Lustre installation and configuration problems

2009-06-17 Thread Cliff White
And here is an output from strace for mount: http://www.heypasteit.com/clip/8WT Any further debugging hints? Thanks, CS. On 6/16/09, Cliff White cliff.wh...@sun.com wrote: Carlos Santana wrote: The '$ modprobe -l lustre*' did not show any module on a patchless client. modprobe -v returns 'FATAL

Re: [Lustre-discuss] Lustre installation and configuration problems

2009-06-16 Thread Cliff White
Carlos Santana wrote: Thanks Kevin.. Please read: http://manual.lustre.org/manual/LustreManual16_HTML/ConfiguringLustre.html#50401328_pgfId-1289529 Those instructions are identical for 1.6 and 1.8. For current lustre, only two commands are used for configuration. mkfs.lustre and mount.

Re: [Lustre-discuss] Lustre installation and configuration problems

2009-06-16 Thread Cliff White
, it's a network/name misconfiguration. Run 'tunefs.lustre --print' on your servers, and verify that mgsnode= is correct. cliffw Thanks, CS. On Tue, Jun 16, 2009 at 12:16 PM, Cliff White cliff.wh...@sun.com wrote: Carlos Santana wrote
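One way to pull out the mgsnode= setting for inspection, as a sketch. The function name and the device in the example are hypothetical, and the parsing assumes that tunefs.lustre --print lists its parameters as space-separated key=value pairs.

```shell
#!/bin/sh
# Print the mgsnode= value(s) recorded on a Lustre target device.
# $1 = target block device.
show_mgsnode() {
    tunefs.lustre --print "$1" 2>/dev/null |
        grep -o 'mgsnode=[^ ]*' |
        sort -u
}

# Example with a hypothetical device:
#   show_mgsnode /dev/sdb1
```

The value printed on each server should match a NID that 'lctl list_nids' reports on the MGS node.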

Re: [Lustre-discuss] Lustre installation and configuration problems

2009-06-16 Thread Cliff White
the modprobe. cliffw --- --- I tried lustre_rmmod and depmod commands and it did not return any error messages. Any further clues? Reinstall patchless client again? - CS. On Tue, Jun 16, 2009 at 1:32 PM, Cliff White cliff.wh...@sun.com wrote: Carlos

Re: [Lustre-discuss] Lustre 2.0* CMD doc/info?

2009-06-08 Thread Cliff White
Tom.Wang wrote: Hi CMD evaluation will be available on lustre 2.0 alpha-5.0. There is no more information yet except the Wiki. And btw, we always welcome any contributions to lustre documentation. Use the wiki, and/or submit a bug against the Documentation product on bugzilla.lustre.org.

Re: [Lustre-discuss] ext3 tuning on a lustre filesystem

2009-04-28 Thread Cliff White
Nick Jennings wrote: Hello Everyone, I was wondering if certain ext3 tweaks can be applied to a lustre filesystem? Things like: - reclaiming some of the reserved 5% space for non-root filesystems This one gets done quite often - 5% is a

Re: [Lustre-discuss] Random access is not improving

2009-04-06 Thread Cliff White
set...@gmail.com wrote: Does Lustre increase random access performance? I would like to know this because I have a large random access file (a hash table). I have striped this file across multiple OSTs. The file is 24 gigabytes, and the stripe size was 1gig across 10 OSTs. I also tried a

Re: [Lustre-discuss] OSS Cache Size for read optimization

2009-04-03 Thread Cliff White
Jordan Mendler wrote: Hi all, I deployed Lustre on some legacy hardware and as a result my (4) OSS's each have 32GB of RAM. Our workflow is such that we are frequently rereading the same 15GB indexes over and over again from Lustre (they are striped across all OSS's) by all nodes on our

Re: [Lustre-discuss] Adding OSTs problem

2009-03-16 Thread Cliff White
Mag Gam wrote: I have added 2 volumes onto my existing filesystem. mkfs.lustre --fsname lfs002 --ost --mgsnode=mg...@tcp /dev/lustrevg/lv03 mkfs.lustre --fsname lfs002 --ost --mgsnode=mg...@tcp /dev/lustrevg/lv04 I even managed to mount up the OSTS (each are 2TB) However, on the clients

Re: [Lustre-discuss] Error using mkfs.lustre

2009-03-03 Thread Cliff White
Rayentray Tappa wrote: On Tue, 2009-03-03 at 08:46 -0800, Evan Felix wrote: Yes, while you are installing lustre you should probably have /sbin/ and /usr/sbin in your path. ok, so i'll add them Check to see if /usr/sbin/mkfs.ext2 exists. i checked and there's no such thing as

Re: [Lustre-discuss] About MDS failover

2009-01-15 Thread Cliff White
Jeffrey Alan Bennett wrote: Hi, What software are people using for MDS failover? I have been using Heartbeat from Linux-HA but I am not absolutely happy with its performance. Is there anything better out there? Are you using heartbeat V1 or V2? I would like to hear more about the

Re: [Lustre-discuss] LBUG ASSERTION(lock-l_resource != NULL) failed

2009-01-14 Thread Cliff White
Brock Palen wrote: I am having servers LBUG on a regular basis, Clients are running 1.6.6 patchless on RHEL4, servers are running RHEL4 with 1.6.5.1 RPM's from the download page. All connection is over Ethernet, Servers are x4600's. This looks like bug 16496, which is fixed in 1.6.6.

Re: [Lustre-discuss] LBUG ASSERTION(lock-l_resource != NULL) failed

2009-01-14 Thread Cliff White
for the insight! Brock Palen www.umich.edu/~brockp Center for Advanced Computing bro...@umich.edu (734)936-1985 On Jan 14, 2009, at 7:27 PM, Cliff White wrote: Brock Palen wrote: I am having servers LBUG on a regular basis, Clients are running 1.6.6 patchless on RHEL4, servers

Re: [Lustre-discuss] Lustre NOT HEALTHY

2009-01-13 Thread Cliff White
Brock Palen wrote: How common is it for servers to go NOT HEALTHY? I feel it is happening much more often than it should be with us. A few times a month. It should not happen at all, in the normal case. It indicates a problem. If this happens, we reboot the servers. Should we do

Re: [Lustre-discuss] writeconf needed for 1.6.6?

2009-01-12 Thread Cliff White
Roger Spellman wrote: Is writeconf needed to upgrade a filesystem from 1.6.5 to 1.6.6? If so, is this run just on the MGS and MDT, or also on the OSTs? If you are not changing any configuration or NIDS, no writeconf is needed when upgrading Lustre from one point release to the next. Major

Re: [Lustre-discuss] What do clients run on?

2009-01-12 Thread Cliff White
Arden Wiebe wrote: I've read it a zillion times but can't seem to find it again. Can a client run on the same server as a MGS, MDT or OSS? Is a dedicated client machine necessary? You can run all of Lustre (clients and all servers) on one node, but this is not supported for production

Re: [Lustre-discuss] What do clients run on?

2009-01-12 Thread Cliff White
Arden Wiebe wrote: Okay I'll rephrase the question? Given a limited deployment can I mount the client on the MDT, MGS or OSS? Is the best choice to build a dedicated client? If you care about performance at all, a dedicated client is always best. While you can run client/MDS somewhat

Re: [Lustre-discuss] what are the meanings of collectl output for lustre

2009-01-12 Thread Cliff White
xiangyong ouyang wrote: hi all, I'm running collectl to get profiling information about lustre. I'm using lustre 2.6.18-53.1.13.el5_lustre.1.6.4.3smp, and collectl V3.1.1-5 (zlib:1.42,HiRes:1.86) Basically I want to see the metadata information on both client and MDS. On client

Re: [Lustre-discuss] MGS and MDT on Failover Pair

2008-09-12 Thread Cliff White
Brian J. Murrell wrote: On Wed, 2008-09-10 at 16:23 -0400, Roger Spellman wrote: I am building a system with a redundant MDS, that is two MDS sharing a set of disks, one being Active, the other Standby. If I put the MGS and MDS on the same system, it appears that they must be on the same

Re: [Lustre-discuss] beta lustre

2008-09-09 Thread Cliff White
Papp Tamas wrote: hi All, Where can I download beta version of lustre? Depends on what you mean. Current Lustre is always available from the Sun Download site. http://www.sun.com/software/products/lustre/get.jsp (free, of course) We have pre-released versions available via CVS.

Re: [Lustre-discuss] Typical IB timeout? Or something more?

2008-09-09 Thread Cliff White
Alex Lee wrote: I have been seeing something that looks like IB timeout errors lately after upgrading to 1.6.5.1 using the supplied ofed kernel drivers. From what I can tell there haven't been any real network issues that were apparent. Are these errors just typical if the network is busy? Could

Re: [Lustre-discuss] simulations

2008-08-08 Thread Cliff White
/KnowledgeBase.html#50544717_84403 The .pdf version I think has more details. cliffw TIA On Thu, Aug 7, 2008 at 10:59 AM, Cliff White [EMAIL PROTECTED] wrote: Mag Gam wrote: We do a lot of fluid simulations at my university, but on a similar note I would like to know what the Lustre

Re: [Lustre-discuss] MDS

2008-08-07 Thread Cliff White
Cliff White wrote: Mag Gam wrote: Also, what is the best way to test the backup? Other than really remove my MGS and restore it. Is there a better way to test this? If you really care about the backups, you need to be brave. If you can't remove the MDS and restore it, then something

Re: [Lustre-discuss] simulations

2008-08-07 Thread Cliff White
Mag Gam wrote: We do a lot of fluid simulations at my university, but on a similar note I would like to know what the Lustre experts will do in particular simulated scenarios... The environment is this: 30 Servers (All Linux) 1000+ Clients (All Linux) 30 Servers 1 MDS 30 OSTs each

Re: [Lustre-discuss] Lustre health check

2008-07-15 Thread Cliff White
Mag Gam wrote: We are planning to deploy lustre on a large scale at my university, and we were wondering if there are any health check utilities available for OST, OSS, MDS and MDT. I know there is a SNMP module available, but I prefer a solid front end with SNMP as the backend. So what tools

Re: [Lustre-discuss] Gluster then DRBD now Lustre?

2008-06-16 Thread Cliff White
[EMAIL PROTECTED] wrote: On Mon, 16 Jun 2008, Kilian CAVALOTTI wrote: On Monday 16 June 2008 11:40:40 am Andreas Dilger wrote: NYC == New York City? What is SJC? SJC == San Jose, California That's why I thought, but if so, the following part loses me: This is working in a test setup,

Re: [Lustre-discuss] Size of MDT, used space

2008-05-13 Thread Cliff White
Thomas Roth wrote: Hi all, I'm still in trouble with numbers: the available, used and necessary space on my MDT: According to lfs df, I have now filled my file system with 115.3 TB. All of these files are sized 5 MB. That should be roughly 24 million files. For the MDT, lfs df reports
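The file-count estimate in the post can be reproduced as below. This is a sketch that reads "TB" and "MB" as binary units (TiB and MiB), which is the interpretation that gives roughly 24 million; the variable names are illustrative.

```shell
#!/bin/sh
# Estimate the file count from the space used and a uniform file size.
used_tenths_tib=1153   # 115.3 TiB, held in tenths to stay in integers
file_mib=5             # every file is 5 MiB

used_mib=$((used_tenths_tib * 1048576 / 10))   # 1 TiB = 1048576 MiB
files=$((used_mib / file_mib))

echo "approximately $files files"
```

With decimal units instead (115.3e12 bytes over 5e6 bytes) the estimate drops to about 23 million, so the unit convention accounts for about a million files either way.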

Re: [Lustre-discuss] Lustre Downloads

2008-02-14 Thread Cliff White
Canon, Richard Shane wrote: I see that the download site has been moved and integrated into the Sun site. It looks like this broke a few things. For one, I can’t get to any of the 1.4 releases. Can this get fixed? I'll see what can be done. cliffw Thanks, --Shane

Re: [Lustre-discuss] Lustre Downloads

2008-02-14 Thread Cliff White
Canon, Richard Shane wrote: I see that the download site has been moved and integrated into the Sun site. It looks like this broke a few things. For one, I can’t get to any of the 1.4 releases. Can this get fixed? It looks like some links were recently mis-moved. Should be fixed

Re: [Lustre-discuss] Lustre Downloads

2008-02-14 Thread Cliff White
Cliff White wrote: Canon, Richard Shane wrote: I see that the download site has been moved and integrated into the Sun site. It looks like this broke a few things. For one, I can’t get to any of the 1.4 releases. Can this get fixed? It looks like some links were recently mis-moved

Re: [Lustre-discuss] Help with lustre 1.6.4

2008-02-12 Thread Cliff White
Ali Algarrous wrote: My name is Ali Algarrous and I'm doing research on Lustre file systems. I'm running Debian on my machine and after a long time I was able to install Lustre on my machine. I was able to format devices and mount clients and servers easily but I had to reinstall the