Re: [lustre-discuss] High MDS load

2020-05-28 Thread Carlson, Timothy S
Since some mailers don't like attachments, I'll just paste in the script we use here. I call the script with ./parse.sh | sort -k3 -n. You just need to change out the name of your MDT in two places.

#!/bin/bash
set -e
SLEEP=10

stats_clear() {
    cd "$1"
    echo clear >clear
}
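The archive truncates the script here. The rest of it (the stats_print helper and the clear/sleep/print loop) would look something like the hedged reconstruction below; the /proc path and the scratch-MDT0000 name are placeholders for your MDT (the "two places" mentioned above), and the awk body is a guess at what produces the third column that sort -k3 -n keys on:

stats_print() {
    local dir=$1
    local nid=$(basename "$dir")
    # skip the snapshot_time header line, then sum the per-operation
    # request counts in this client's stats file
    local total=$(awk 'NR > 1 { ops += $2 } END { print ops+0 }' "$dir/stats")
    echo "client $nid $total"
}

# clear all per-export stats, let them accumulate, then print per client
stats_clear /proc/fs/lustre/mdt/scratch-MDT0000/exports
sleep $SLEEP
for dir in /proc/fs/lustre/mdt/scratch-MDT0000/exports/*/; do
    stats_print "$dir"
done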

Re: [lustre-discuss] [EXTERNAL] Re: Lustre Timeouts/Filesystem Hanging

2019-10-29 Thread Carlson, Timothy S
stats_print "$dir"
done

From: Moreno Diego (ID SIS)
Subject: Re: [lustre-discuss] [EXTERNAL] Re: Lustre Timeouts/Filesystem Hanging
Hi Louis, If you don’t hav

Re: [lustre-discuss] Lustre Timeouts/Filesystem Hanging

2019-10-28 Thread Carlson, Timothy S
In my experience, this is almost always related to some code doing really bad I/O. Let's say you have a 1000 rank MPI code doing open/read 4k/close on a few specific files on that OST. That will make for a bad day. The other place you can see this, and this isn't your case, is when ZFS
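As a concrete (and deliberately abusive) illustration of the pattern described above, here is a sketch in shell; the path is a placeholder and the background subshells stand in for MPI ranks. Don't run this against a production file system:

FILE=/lustre/scratch/shared_input   # placeholder path
for rank in $(seq 1 64); do         # stand-in for MPI ranks
    (
        for i in $(seq 0 999); do
            # each dd invocation is its own open / 4k read / close
            dd if="$FILE" of=/dev/null bs=4k count=1 skip=$i 2>/dev/null
        done
    ) &
done
wait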

Re: [lustre-discuss] Is it a good practice to use big OST?

2019-10-08 Thread Carlson, Timothy S
I've been running 100-200 TB OSTs making up small petabyte file systems for the last 4 or 5 years with no pain, on Lustre 2.5.x through the current generation. Plenty of ZFS rebuilds when I ran across sets of bad disks, and those went fine.

Re: [lustre-discuss] ZFS tuning for MDT/MGS

2019-03-13 Thread Carlson, Timothy S
+1 on options zfs zfs_prefetch_disable=1. Might not be as critical now, but that was a must-have on Lustre 2.5.x. Tim
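For reference, a minimal sketch of where that option lives; the file name follows the usual modprobe convention, and the setting takes effect when the zfs module loads:

# /etc/modprobe.d/zfs.conf
options zfs zfs_prefetch_disable=1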

Re: [lustre-discuss] Rebooting storage nodes while jobs are running?

2019-02-27 Thread Carlson, Timothy S
I will say YMMV. I've rebooted storage nodes and have had mixed results, where we land in one of three buckets:
1) Codes breeze through and have just been stuck in D state while OSSes reboot
2) RPCs get stuck somewhere and when the OSS comes back I eventually have to force an abort_recovery
3)
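The abort_recovery in bucket 2 is a one-liner; a hedged sketch, with the device name a placeholder and the commands run on the server hosting the target:

# check how recovery is progressing first
lctl get_param obdfilter.*.recovery_status
# then, if stuck clients will never reconnect, give up on recovery
lctl --device lustre-OST0004 abort_recovery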

Re: [lustre-discuss] lustre for home directories

2018-04-25 Thread Carlson, Timothy S
I would work on fixing your NFS server before moving to Lustre. That being said, I have no idea how big an installation you have: how many nodes you have as NFS clients, how much data you are talking about moving around, etc. As others will point out, even with improvements in Lustre

Re: [lustre-discuss] Lustre as /home directory

2018-02-16 Thread Carlson, Timothy S
I'll just add +1 to this thread. /home on NFS for software builds, small files, and lots of metadata operations; Lustre for the rest. Users will do the wrong thing even after education. Tim

Re: [lustre-discuss] Designing a new Lustre system

2017-12-21 Thread Carlson, Timothy S
Isilon is truly an enterprise solution. We have one (about a dozen bricks' worth) and use it for home directories on our supercomputers, and it allows easy access via CIFS to users on Windows/Mac. It is highly configurable, with “smart pools” and policies to move data around based on

Re: [lustre-discuss] Does lustre 2.10 client support 2.5 server ?

2017-11-07 Thread Carlson, Timothy S
FWIW, we have successfully been running 2.9 clients (RHEL 7.3) with 2.5.3 servers (RHEL 6.6) at a small scale: about 40 OSSes and dozens of 2.9 clients, with hundreds of 2.5.3 clients mixed in. Tim

Re: [lustre-discuss] problems accessing files as non-root user.

2016-12-12 Thread Carlson, Timothy S
Does your new MDS server have all the UIDs of these people in /etc/passwd? Tim
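A quick way to check (a sketch; jdoe and mds1 are placeholder names) is to confirm the MDS resolves the same identity the clients do:

getent passwd jdoe            # on a client
ssh mds1 getent passwd jdoe   # on the MDS: UID/GID should match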

Re: [lustre-discuss] Odd problem with new OSTs not being used

2016-09-01 Thread Carlson, Timothy S
Looks like I will be upgrading to 2.5.4 soon, as I really need to be able to deactivate OSTs and have the algorithm on the MDS still be able to choose new OSTs to write to. Tim

[lustre-discuss] Odd problem with new OSTs not being used

2016-09-01 Thread Carlson, Timothy S
Running Lustre 2.5.3(ish) backed with ZFS. We’ve added a few OSTs and they show as being “UP” but aren’t taking any data:

[root@lzfs01a ~]# lctl dl
0 UP osd-zfs MGS-osd MGS-osd_UUID 5
1 UP mgs MGS MGS 1085
2 UP mgc MGC172.17.210.11@o2ib9 77cf08da-86a4-7824-1878-84b540993c6d 5
3 UP
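One hedged way to test whether a specific new OST will actually take objects is to pin a file's stripe to it by index; the index and path here are placeholders:

lfs setstripe -i 12 -c 1 /lustre/testfile   # force the first stripe onto OST index 12
dd if=/dev/zero of=/lustre/testfile bs=1M count=16
lfs getstripe /lustre/testfile              # verify which OST holds the object
lfs df -h                                   # per-OST capacity and usage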

Re: [lustre-discuss] wildly inaccurate file size

2016-06-30 Thread Carlson, Timothy S
Is this a ZFS-backed Lustre with compression? If so, that is not at all surprising if it is a compressible file. I have a 1G file of zeros that shows up as 512 bytes:

[root@pic-admin03 tim]# ls -sh 1G
512 1G
[root@pic-admin03 tim]# ls -l 1G
-rw-r--r-- 1 tim users 1073741824 Dec 2 2015
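This is easy to reproduce on any compression-enabled ZFS dataset; a sketch (run wherever the dataset is mounted):

dd if=/dev/zero of=1G bs=1M count=1024   # 1 GiB of zeros
ls -l 1G    # apparent size: 1073741824 bytes
ls -sh 1G   # allocated size: nearly nothing, the zeros compress away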

[lustre-discuss] ZFS backed OSS out of memory

2016-06-23 Thread Carlson, Timothy S
Folks, I've done my fair share of googling and have run across some good information on ZFS-backed Lustre tuning, including this: http://lustre.ornl.gov/ecosystem-2016/documents/tutorials/Stearman-LLNL-ZFS.pdf and various discussions around how to limit (or not) the ARC and clear it if needed.
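For the record, the usual ARC cap is a module parameter; a hedged sketch, where the 32 GiB value is a placeholder sized to leave headroom for OSS threads:

# /etc/modprobe.d/zfs.conf
options zfs zfs_arc_max=34359738368   # 32 GiB

# verify on a running system:
cat /sys/module/zfs/parameters/zfs_arc_max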

Re: [Lustre-discuss] lustre on debian

2013-11-25 Thread Carlson, Timothy S
Lustre is not (yet) part of the mainstream kernel, so you are not going to find Lustre by digging through the Linux kernel build process. Hence the link below from Thomas pointing at some Lustre packages. Tim

Re: [Lustre-discuss] Can't increase effective client read cache

2013-09-26 Thread Carlson, Timothy S

[Lustre-discuss] Can't increase effective client read cache

2013-09-24 Thread Carlson, Timothy S
I've got an odd situation that I can't seem to fix. My setup is Lustre 1.8.8-wc1 clients on RHEL 6 talking to 1.8.6 servers on RHEL 5. My compute nodes have 64 GB of memory and I have a use case where an application has very low memory usage and needs to access a few thousand files in Lustre
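The client-side knobs involved are llite parameters; a hedged sketch of inspecting and raising the data-cache limit, with the 32768 MB value a placeholder:

lctl get_param llite.*.max_cached_mb      # current cap on cached file data
lctl set_param llite.*.max_cached_mb=32768
lctl get_param llite.*.max_read_ahead_mb  # readahead ceiling, often tuned alongside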

Re: [Lustre-discuss] Lustre buffer cache causes large system overhead.

2013-08-22 Thread Carlson, Timothy S
FWIW, we have seen the same issues with Lustre 1.8.x and a slightly older RHEL 6 kernel. We do the echo as part of our slurm prolog/epilog scripts. Not a fix, but a workaround before/after jobs run. No swap activity, but a very large buffer cache in use. Tim
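The post doesn't show the echo itself, but the standard cache-dropping epilog looks something like this sketch:

#!/bin/bash
# slurm epilog: flush dirty pages, then drop the page/buffer cache
sync
echo 3 > /proc/sys/vm/drop_caches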

Re: [Lustre-discuss] Anybody have a client running on a 2.6.37 or later kernel?

2011-10-23 Thread Carlson, Timothy S
way too much in the past day to boot back into working kernels. :) Tim

Re: [Lustre-discuss] Anybody have a client running on a 2.6.37 or later kernel?

2011-10-23 Thread Carlson, Timothy S
to be working so far. Thanks Tim

[Lustre-discuss] Anybody have a client running on a 2.6.37 or later kernel?

2011-10-21 Thread Carlson, Timothy S
Folks, I've got a need to run a 2.6.37 or later kernel on client machines in order to properly support AMD Interlagos CPUs. My other option is to switch from RHEL 5.x to RHEL 6.x and use the whamcloud 1.8.6-wc1 patchless client (the latest RHEL 6 kernel also supports Interlagos). But I would

Re: [Lustre-discuss] Anybody actually using Flash (Fusion IO specifically) for meta data?

2011-05-19 Thread Carlson, Timothy S
On May 19, 2011, at 10:28, Kevin Van Maren wrote:
Dardo D Kleiner - CONTRACTOR wrote:
As for putting the entire filesystem on flash, sure that would be pretty nifty, but expensive. Not being able to do failover, with storage on internal PCIe cards, is a downside. [Andreas added this

[Lustre-discuss] Anybody actually using Flash (Fusion IO specifically) for meta data?

2011-05-16 Thread Carlson, Timothy S
Folks, I know that flash-based technology gets talked about from time to time on the list, but I was wondering if anybody has actually implemented FusionIO devices for metadata. The last thread I can find on the mailing list that relates to this topic dates from 3 years ago. The software