Re: [lustre-discuss] [EXTERNAL] Re: lnet instability over infiniband when running el9 + connextX-3 hardware

2024-06-24 Thread Kurt Strosahl via lustre-discuss
LID:247 GID:fe80::fc6a:1c03:60:47c0 Jun 21 08:12:03 570054 [725C7640] 0x02 -> do_sweep: w/r, Kurt From: John Hearns Sent: Saturday, June 22, 2024 2:54 AM To: Kurt Strosahl Cc: lustre-discuss@lists.lustre.org ; sci...@jlab.org Subject: [EXTERNAL] Re: [lustr

[lustre-discuss] lnet instability over infiniband when running el9 + connextX-3 hardware

2024-06-21 Thread Kurt Strosahl via lustre-discuss
Good Morning, We've been experiencing a fairly nasty issue with our clients following our move to Alma 9. It seems to occur randomly (a few days to over a week), the clients with connectX-3 cards start getting lnet network errors and seeing moving hangs on random osts spread across our

[lustre-discuss] File size discrepancy on lustre

2023-09-15 Thread Kurt Strosahl via lustre-discuss
Important addendum... after a cp of the impacted files the du size shrinks to match the ls and du --apparent-size w/r, Kurt J. Strosahl (he/him) System Administrator: Lustre, HPC Scientific Computing Group, Thomas Jefferson National Accelerator Facility

[lustre-discuss] File size discrepancy on lustre

2023-09-15 Thread Kurt Strosahl via lustre-discuss
Good Morning, We have encountered a very odd issue, where files are being created that show as double the size under du than they do using ls or du --apparent-size. Under ls we see 119G ~> ls -lh \ >

[lustre-discuss] OSTs per OSS question

2023-02-27 Thread Kurt Strosahl via lustre-discuss
Good Morning, I'm in the early phases of a new lustre file system design. One thing we are observing on our lustre file system from 2019 (running 2.12.9-1) is that the oss systems report a lot of time spent in i/o wait. We are configured with two oss systems connected via sas cables to a

Re: [lustre-discuss] [EXTERNAL] lustre-discuss Digest, Vol 192, Issue 7

2022-03-11 Thread Kurt Strosahl via lustre-discuss
What version were you upgrading from? From: lustre-discuss on behalf of lustre-discuss-requ...@lists.lustre.org Sent: Thursday, March 10, 2022 9:49 AM To: lustre-discuss@lists.lustre.org Subject: [EXTERNAL] lustre-discuss Digest, Vol 192, Issue 7 Send

[lustre-discuss] Rollback after upgrade of luster

2022-03-08 Thread Kurt Strosahl via lustre-discuss
Good Morning, I'm planning on updating a lustre 2.12.1-1 file system to 2.12.8_6. Are there any disk format changes in there that would prevent a rollback if we encounter a serious issue? w/r, Kurt J. Strosahl (he/him) System Administrator: Lustre, HPC Scientific Computing Group, Thomas

Re: [lustre-discuss] Upgrading lustre servers

2022-02-23 Thread Kurt Strosahl via lustre-discuss
We only have a single, combined, mdt-mds... Would that impact it? From: Spitz, Cory James Sent: Wednesday, February 23, 2022 9:38 AM To: Patrick Farrell ; lustre-discuss@lists.lustre.org ; Kurt Strosahl Subject: [EXTERNAL] Re: Upgrading lustre servers Kurt

Re: [lustre-discuss] [EXTERNAL] Re: Disabling max creates and migrating data doesn't seem to be reducing the usage on an OST

2021-02-17 Thread Kurt Strosahl
We are running 2.12.1, which we've been running for more than a year now. A reboot of the OST cleaned it up. From: Nathan Dauchy - NOAA Affiliate Sent: Wednesday, February 17, 2021 4:36 PM To: Kurt Strosahl ; lustre-discuss@lists.lustre.org Subject: [EXTERNAL

[lustre-discuss] Disabling max creates and migrating data doesn't seem to be reducing the usage on an OST

2021-02-11 Thread Kurt Strosahl
Good Morning, One of the OSTs in a lustre file system I manage is showing a higher usage. I attempted to stop writes by setting the max_create_count to zero and then moving data off it but that doesn't seem to be working. > lfs df | grep OST:15lustre19-OST000f_UUID 71145018368 62653382656

[lustre-discuss] A question about processors for an MDS

2021-01-26 Thread Kurt Strosahl
Hello, I'm in the planning phase of a new lustre file system, and while going over MDS hardware I saw that AMD had some pretty good chips (price wise) compared to Intel. Has anyone had experiences with AMD based MDS systems, are they any different? I've always done Intel based ones in

[lustre-discuss] Lnet ping issue

2020-07-27 Thread Kurt Strosahl
Good Afternoon, I'm experiencing an odd issue with one of my lustre clients. The system seems to be having an issue talking to one of the oss systems. When it reboots it is somehow mounting lustre twice. attempts to use lctl ping from the client to the OSS return the following error: ~]

Re: [lustre-discuss] [EXTERNAL] Re: oss servers crashing

2020-07-15 Thread Kurt Strosahl
] From: Alex Zarochentsev Sent: Wednesday, July 15, 2020 11:20 AM To: Kurt Strosahl Cc: lustre-discuss@lists.lustre.org Subject: [EXTERNAL] Re: [lustre-discuss] oss servers crashing Hello! On Wed, Jul 15, 2020 at 5:28 PM Kurt Strosahl mailto:stros...@jlab.org>> wrote: Good M

[lustre-discuss] oss servers crashing

2020-07-15 Thread Kurt Strosahl
Good Morning, Yesterday one of our lustre file servers rebooted several times. the crash dump showed: [14333982.153989] Pid: 381367, comm: ll_ost_io01_076 3.10.0-957.10.1.el7_lustre.x86_64 #1 SMP Tue Apr 30 22:18:15 UTC 2019 [14333982.153989] Kernel panic - not syncing: LBUG

Re: [lustre-discuss] Files hanging on lustre clients

2020-03-31 Thread Kurt Strosahl
I can't tell, any commands I run against the files in question hang indefinitely. It seems very suspicious though. From: Mohr Jr, Richard Frank Sent: Tuesday, March 31, 2020 3:41 PM To: Kurt Strosahl Cc: lustre-discuss@lists.lustre.org ; sci...@jlab.org

[lustre-discuss] Files hanging on lustre clients

2020-03-31 Thread Kurt Strosahl
Hello, I'm tracking a very vexing issue. Somehow users are creating files that cause attempts to examine or manipulate the files to hang. They can't be removed, they can't be examined, even an ls command will hang. an strace on an ls command run against some of these files produced the

Re: [lustre-discuss] Lustre rpm install creating a file that breaks lustre

2019-10-02 Thread Kurt Strosahl
To: Kurt Strosahl ; lustre-discuss@lists.lustre.org Subject: [EXTERNAL] Re: [lustre-discuss] Lustre rpm install creating a file that breaks lustre Might be best to open a ticket for this. What was the nature of the failure? Chris Horn From: lustre-discuss on behalf of Kurt Strosahl Date

[lustre-discuss] Lustre rpm install creating a file that breaks lustre

2019-10-02 Thread Kurt Strosahl
Good Afternoon, While getting lustre 2.10.8 running on a RHEL 7.7 system I found that the RPM install was putting a file in /etc/modprobe.d that was preventing lnet from starting properly. the file is ko2iblnd.conf, which contains the following... alias ko2iblnd-opa ko2iblnd options

[lustre-discuss] A question about lctl lfsck

2019-07-03 Thread Kurt Strosahl
Good Afternoon, Hopefully a simple question... If I run lctl lfsck_start is there a place where I can get a list of what it did? w/r, Kurt J. Strosahl System Administrator: Lustre, HPC Scientific Computing Group, Thomas Jefferson National Accelerator Facility

Re: [lustre-discuss] lfs df showing a "D" after an ost

2019-06-27 Thread Kurt Strosahl
From: Kurt Strosahl Sent: Thursday, June 27, 2019 1:23 PM To: lustre-discuss@lists.lustre.org Subject: lfs df showing a "D" after an ost Good Afternoon, I've got a new 2.12 lustre file system up, and when I run a lfs df it shows a D next to one of the

[lustre-discuss] lnet.service reporting failure on start

2019-06-24 Thread Kurt Strosahl
Good Afternoon, I recently stood up a new lustre client, when I run systemctl start lnet.service systemd reports a failure. However an lnetctl net show displays all the right information and I can reach other nodes with lctl ping. Further the lustre file system mounts properly. Looking

[lustre-discuss] a question about max_create_count

2019-06-19 Thread Kurt Strosahl
Good Afternoon, I'm in the process of testing a new lustre file system at 2.12, and in going through the documentation I saw that to stop writes from going to the system we now set the max_create_count to 0 on the mdt. I looked at that value on my system and the default seems to be

Re: [lustre-discuss] lustre-discuss Digest, Vol 158, Issue 10

2019-05-09 Thread Kurt Strosahl
Presently I'm experimenting with the following: /etc/zfs/vdev_id.conf alias ost01d1shasl00 /dev/disk/by-id/dm-uuid-mpath-35000cca26b825d6c alias ost01d2shasl01 /dev/disk/by-id/dm-uuid-mpath-35000cca26b860178 alias ost01d3shasl02 /dev/disk/by-id/dm-uuid-mpath-35000cca26c1e2cb4 alias

Re: [lustre-discuss] ZFS and multipathing for OSTs

2019-04-26 Thread Kurt Strosahl
Sent: Friday, April 26, 2019 6:28 AM To: Kurt Strosahl Cc: lustre-discuss@lists.lustre.org Subject: Re: [lustre-discuss] ZFS and multipathing for OSTs Disk replacement with multipathd + zfs is somewhat not convenient. step1: mark offline the disk you should replace with zpool command step2: remove

[lustre-discuss] ZFS and multipathing for OSTs

2019-04-25 Thread Kurt Strosahl
Good Afternoon, As part of a new lustre deployment I've now got two disk shelves connected redundantly to two servers. Since each disk has two paths to the server I'd like to use multipathing for both redundancy and improved performance. I haven't found examples or discussion about such

[lustre-discuss] Problem with OPA fabric leading to unexpected lustre behaviour

2019-04-15 Thread Kurt Strosahl
Good Morning, I'm presently working on an issue with my OPA network that seems to be having an unusual impact on lustre. What happens is that when one of the nodes on the OPA fabric reboots it sometimes has trouble reaching one of the four lnet routers that we have set up. This isn't,

Re: [lustre-discuss] Issue updating lustre from 2.10.6 to 2.10.7

2019-04-12 Thread Kurt Strosahl
Thanks, that was exactly what I needed to dig out the bad modules! From: Jeff Johnson Sent: Friday, April 12, 2019 12:00 PM To: Kurt Strosahl Cc: lustre-discuss@lists.lustre.org Subject: Re: [lustre-discuss] Issue updating lustre from 2.10.6 to 2.10.7 Kurt, I

[lustre-discuss] Issue updating lustre from 2.10.6 to 2.10.7

2019-04-12 Thread Kurt Strosahl
Good Morning, I've encountered an issue updating lustre from 2.10.6 to 2.10.7 on my metadata system. I installed the updated RPMs using whamcloud's yum repository, but when I run modprobe -v lustre I get the following errors: insmod

[lustre-discuss] Tools for backing up a ZFS MDT

2019-03-25 Thread Kurt Strosahl
Good Afternoon, I've been working on a new lustre file system, now with ZFS on the MDT (my current lustre file system uses ldiskfs). One of the reasons for this was the ability to use ZFS snapshots to backup the MDT, and I'm wondering if anyone has experience with zfs backup tools like

[lustre-discuss] ZFS tuning for MDT/MGS

2019-03-13 Thread Kurt Strosahl
Good Afternoon, I'm reviewing the zfs parameters for a new metadata system and I was looking to see if anyone had examples (good or bad) of zfs parameters? I'm assuming that the MDT won't benefit from a recordsize of 1MB, and I've already set the ashift to 12. I'm using an MDT/MGS made

[lustre-discuss] systemd lnet start script has route config commented out

2019-02-22 Thread Kurt Strosahl
Good Morning, I'm in the process of setting up a new lustre file system, one that will need to reach some nodes through a set of lnet routers. Adding the routes by hand doesn't work but adding them to the lnet_routes.conf file doesn't get picked up on reboot. I looked at the file and

[lustre-discuss] Questions about using multiple MDT spaces in DNE

2018-12-11 Thread Kurt Strosahl
Good Morning, I have some questions about multiple mdts and striping. As I read the documentation it sounds like once I create a directory and assign it to a mdt then all the files in that directory will be written to that mdt. In the case of striping across mds files will be striped

[lustre-discuss] Laying the groundwork for a new lustre file system

2018-11-09 Thread Kurt Strosahl
All, I asked this question a month or so back, but thought I'd ask it again to see if anyone else had insight. I'm in the process of designing a new lustre file system to replace an existing 2.3PB lustre file system. We have over a thousand clients connecting to lustre over mostly QDR

[lustre-discuss] new lustre setup, questions about the mgt and mdt.

2018-10-23 Thread Kurt Strosahl
Good Afternoon, I'm in the planning stages of a new lustre file system, and I had some questions. 1) I've seen it said that ZFS is a good choice for the lustre mdt, as long as it is zfs 0.7.x. Has anyone had that experience? 2) What about the MGT, that seems like it wouldn't have any

[lustre-discuss] lustre 2.10.5 or 2.11.0

2018-10-17 Thread Kurt Strosahl
Good Afternoon, I'm in the early planning stages of a lustre upgrade. We are going to be moving from 2.5 to either 2.10 or 2.11, possibly by standing up a new lustre file system alongside the existing one and migrating the data over. I'm wondering if anyone has had specific experiences

Re: [lustre-discuss] Lustre traffic slow on OPA fabric network

2018-07-12 Thread Kurt Strosahl
do except for the map_on_demand (which our system defaults to 256). w/r, Kurt - Original Message - From: "Robin Humble" To: "Kurt Strosahl" Cc: lustre-discuss@lists.lustre.org Sent: Tuesday, July 10, 2018 5:03:30 AM Subject: Re: [lustre-discuss] Lustre traffi

[lustre-discuss] Lustre traffic slow on OPA fabric network

2018-07-03 Thread Kurt Strosahl
Good Afternoon, I've been seeing a great deal of slowness from clients on an OPA network accessing lustre through lnet routers. The nodes take very long to complete things like lfs df, and show lots of dropped / reestablished connections. The OSS systems show this as well, and

Re: [lustre-discuss] luster 2.10.3 lnetctl configurations not persisting through reboot

2018-04-17 Thread Kurt Strosahl
re wiki: http://wiki.lustre.org/Dynamic_LNet_Configuration_and_lnetctl Alex. On 4/17/18, 3:37 PM, "lustre-discuss on behalf of Kurt Strosahl" <lustre-discuss-boun...@lists.lustre.org on behalf of stros...@jlab.org> wrote: I configured an lnet router today with luster 2.10.3 as the lustr

[lustre-discuss] luster 2.10.3 lnetctl configurations not persisting through reboot

2018-04-17 Thread Kurt Strosahl
I configured an lnet router today with lustre 2.10.3 as the lustre software. I then configured the lnet router using the following lnetctl commands: lnetctl lnet configure lnetctl net add --net o2ib0 --if ib1 lnetctl net add --net o2ib1 --if ib0 lnetctl set routing 1 When I rebooted the

[lustre-discuss] lnet rounter compatibility with different versions of Luster

2018-04-11 Thread Kurt Strosahl
Good Afternoon, I'm presently in the process of upgrading the OS on my lnet routers. I was wondering if there were any interoperability issues between 2.5 (clients and servers) and 2.10 (the version of lustre I'm looking to run on the routers). w/r, Kurt J. Strosahl System Administrator:

[lustre-discuss] Issue running IOR

2016-09-02 Thread Kurt Strosahl
Good Afternoon, I'd asked earlier about what other bandwidth tests could be run since I'd encountered an issue with IOR, and it was suggested that I ask here about the problem I'm having with IOR. The setup is as follows... We have a cluster of Intel KNL nodes that communicate over an

[lustre-discuss] benchmarking the lustre file system

2016-08-24 Thread Kurt Strosahl
Good Morning, I've recently encountered an issue with IOR and was wondering if there were any other file system benchmarking tools that can be used to test reads / writes to a lustre file system from multiple nodes. w/r, Kurt

Re: [lustre-discuss] Possible bug in file size reporting

2016-04-22 Thread Kurt Strosahl
Please ignore... it looks like the user was creating sparse files. ll -alsh total 12M 878K -rw-r--r-- 1 4.1E Apr 22 01:48 file1 878K -rw-r--r-- 1 6.8E Apr 22 01:32 file2 582K -rw-r--r-- 1 544P Apr 21 15:34 file3 w/r, Kurt - Original Message - From: "Kurt Strosahl" <stro
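The sparse-file explanation above can be reproduced on any POSIX file system. The following is a minimal sketch, not taken from the thread: it assumes Python 3 on Linux or another Unix, and the helper names make_sparse_file and size_report are made up for illustration. It compares the apparent size that ls reports (st_size) with the space actually allocated, which is what du reports (st_blocks, counted in 512-byte units).

```python
import os
import tempfile


def make_sparse_file(path, apparent_size):
    """Create a file whose apparent size is apparent_size bytes without writing data."""
    with open(path, "wb") as f:
        # Extending a file with truncate() leaves a hole on file systems
        # that support sparse files, so few or no blocks are allocated.
        f.truncate(apparent_size)


def size_report(path):
    """Return (apparent_bytes, allocated_bytes) for path."""
    st = os.stat(path)
    # st_size is the length shown by ls -l; st_blocks is in 512-byte units
    # and reflects the space du would count.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "sparse.bin")
        make_sparse_file(path, 64 * 1024 * 1024)  # 64 MiB apparent size
        apparent, allocated = size_report(path)
        print(f"apparent size (ls):  {apparent} bytes")
        print(f"allocated size (du): {allocated} bytes")
```

A large gap between the two numbers, as in the ll -alsh listing above, points to a sparse file rather than a size-reporting bug.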

[lustre-discuss] Possible bug in file size reporting

2016-04-22 Thread Kurt Strosahl
Good Morning, We just came across an interesting issue... three files were created last night that ls reports as having a size in the exabytes (though du shows as being the proper size). ll total 12198 -rw-r--r-- 1 4647714819775210200 Apr 22 01:48 file1 -rw-r--r-- 1 7810701297118485272

[lustre-discuss] Lustreerror: Error -2 syncing data on lock cancel

2016-03-22 Thread Kurt Strosahl
Good Morning, Recently in my test environment I've seen the following error on the oss: Error -2 syncing data on lock cancel. At the time there was only one client mounting the test lustre file system, and the only process running was a compilation of gcc, so there was virtually no activity

[lustre-discuss] Follow up to OST not connecting to secondary mgs on initial startup

2016-03-03 Thread Kurt Strosahl
After some digging it looks like this may be related to https://jira.hpdd.intel.com/browse/LU-3829 Respectfully, Kurt J. Strosahl System Administrator Scientific Computing Group, Thomas Jefferson National Accelerator Facility

[lustre-discuss] OST not connecting to secondary mgs on initial startup

2016-03-03 Thread Kurt Strosahl
Good Morning, I'm trying to add a new ost to a lustre file system, one in which the mgs is presently on its failover node. It looks like the ost does not failover and connect to the mgs. Lustre: Lustre: Build Version: 2.5.3-RC1--PRISTINE-2.6.32-431.23.3.el6_lustre.x86_64 Lustre:

[lustre-discuss] The difference between failnode and servicenode

2016-03-01 Thread Kurt Strosahl
Hello, I'm curious as to the difference between the --failnode and --servicenode options used by mkfs.lustre. On an OST these options are used for ost failover, not mgs failover (which is done through the --mgsnode parameter), but what is the difference between a failnode and a service

[lustre-discuss] qos_threshold_rr not working after mds failover

2016-02-17 Thread Kurt Strosahl
Good Morning, I've recently encountered an issue with the qos_threshold_rr. About a month ago we removed an OST from our system, and the qos_threshold_rr kept working, however we had to do a failover of our mds... and from that point on the system stopped allocating files using the

Re: [lustre-discuss] Inactivated ost still showing up on the mds

2016-02-02 Thread Kurt Strosahl
essage - From: "Sean Brisbane" <sean.brisb...@physics.ox.ac.uk> To: "Kurt Strosahl" <stros...@jlab.org>, "aik" <a...@fnal.gov> Cc: "<lustre-discuss@lists.lustre.org>" <lustre-discuss@lists.lustre.org> Sent: Tuesday, February 2,

[lustre-discuss] Large directory Feature in lustre 2.5.3

2016-01-28 Thread Kurt Strosahl
Hello, Recently this message appeared in the metadata system on our lustre 2.5.3 file system. Is there a way to enable this feature in 2.5.3? Jan 28 10:53:56 scmds14a kernel: LDISKFS-fs warning (device dm-1): ldiskfs_dx_add_entry: Directory (ino: 111675137) index full, reach max htree

Re: [lustre-discuss] Inactivated ost still showing up on the mds

2016-01-26 Thread Kurt Strosahl
Kulyavtsev To: Kurt Strosahl <stros...@jlab.org> Cc: Alexander I Kulyavtsev , <lustre-discuss@lists.lustre.org> <lustre-discuss@lists.lustre.org> Sent: Tue, 26 Jan 2016 13:23:20 -0500 (EST) Subject: Re: [lustre-discuss] Inactivated ost still showing up on the mds Hi Kurt, probab

Re: [lustre-discuss] Inactivated ost still showing up on the mds

2016-01-22 Thread Kurt Strosahl
the lifetime of a file system (due to system retirement or hardware failure). w/r, Kurt - Original Message - From: "Kurt Strosahl" <stros...@jlab.org> To: "Sean Brisbane" <sean.brisb...@physics.ox.ac.uk> Cc: "Chris Hunter" <chunt...@gmail.com>, lustr

Re: [lustre-discuss] Inactivated ost still showing up on the mds

2016-01-22 Thread Kurt Strosahl
- Original Message - From: "Sean Brisbane" <sean.brisb...@physics.ox.ac.uk> To: "Kurt Strosahl" <stros...@jlab.org>, "Chris Hunter" <chunt...@gmail.com> Cc: lustre-discuss@lists.lustre.org Sent: Friday, January 22, 2016 4:33:41 AM Subject: RE: Inactivated

Re: [lustre-discuss] Inactivated ost still showing up on the mds

2016-01-21 Thread Kurt Strosahl
Message - From: "Chris Hunter" <chunt...@gmail.com> To: lustre-discuss@lists.lustre.org Cc: "Kurt Strosahl" <stros...@jlab.org> Sent: Thursday, January 21, 2016 12:50:03 PM Subject: [lustre-discuss] Inactivated ost still showing up on the mds Hi Kurt, For referen

Re: [lustre-discuss] Inactivated ost still showing up on the mds

2016-01-21 Thread Kurt Strosahl
Is that in lustre 2.5.3? The lctl --device=xx deactivate is what sets an ost to read-only mode, it doesn't permanently disable it in the system... or have I missed something? w/r, Kurt - Original Message - From: "Chris Hunter" <chunt...@gmail.com> To: "

[lustre-discuss] Inactivated ost still showing up on the mds

2016-01-20 Thread Kurt Strosahl
All, Continuing the issues that I reported yesterday... I found that by unlinking lost files that I was able to stop the below error from occurring, this gives me hope that systems will stop crashing once all the lost files are scrubbed. LustreError:

[lustre-discuss] strange lustre issues following removal of an OST

2016-01-19 Thread Kurt Strosahl
All, On Monday morning we had to remove an OST due to the failure of the underlying zpool. I set the lazystatfs option on the mds, and everything seemed to be ok. However now we, after rebooting a node, are seeing the below errors: Jan 19 16:59:42 ifarm1401 kernel: LustreError:

Re: [lustre-discuss] strange lustre issues following removal of an OST

2016-01-19 Thread Kurt Strosahl

Re: [lustre-discuss] zfs and luster 2.5.3.90

2016-01-15 Thread Kurt Strosahl
om> wrote: > On 2016/01/12, 14:21, "lustre-discuss on behalf of Kurt Strosahl" > <lustre-discuss-boun...@lists.lustre.org on behalf of stros...@jlab.org> > wrote: > > >Hello, > > > >What is the highest version of zfs supported by lustre 2.5.3.90? >

[lustre-discuss] Lustre oss failover with ZFS

2015-10-27 Thread Kurt Strosahl
Hello, I'm working on setting up a pair of oss systems that should be able to take over for each other should one of them fail (they both have access to the same storage arrays). I'm using lustre 2.5.3 with zfs as the back end on a pair of CentOS 6.5 systems. Respectfully, Kurt J.

Re: [lustre-discuss] lustre client server interoperability

2015-08-11 Thread Kurt Strosahl
So is there a stable client for centos 7 that is backwards compatible with 2.5.3? w/r, Kurt - Original Message - From: Patrick Farrell p...@cray.com To: Kurt Strosahl stros...@jlab.org, lustre-discuss@lists.lustre.org Sent: Monday, August 10, 2015 4:24:15 PM Subject: RE: lustre client

[lustre-discuss] lustre client server interoperability

2015-08-10 Thread Kurt Strosahl
Hello, Is the 2.7 lustre client compatible with lustre 2.5.3 servers? I'm running a 2.5.3 system lustre system, but have been asked by a few people about upgrading some of our clients to CentOS 7 (which appears to need a 2.7 or greater client). w/r, Kurt J. Strosahl System Administrator

Re: [lustre-discuss] lustre-discuss Digest, Vol 112, Issue 30

2015-07-29 Thread Kurt Strosahl
Hi Massimo, This sounds exactly like the issue I encountered over a month ago with my lustre 2.5.3 system. The quick solution I found was to set the qos_threshold_rr to 100% (so flat round robin, not weighted). However that causes a problem where osts would go over 90% while others were

[lustre-discuss] Consequences of reducing the number of service threads (lustre 2.5.3)

2015-07-22 Thread Kurt Strosahl
Hello, Several of my oss systems have been experiencing exceptionally high load, and I was looking at reducing the number of service threads to mitigate this... but I was wondering if that would reduce the systems overall performance. I was also wondering if adding more ram to the system

Re: [lustre-discuss] A quick question about reusing osts (lustre 2.5.3)

2015-07-21 Thread Kurt Strosahl
or massive corruption). This avoids having the target try to register as a new target with the MGS. w/r, Kurt - Original Message - From: Ben Evans bev...@cray.com To: Kurt Strosahl stros...@jlab.org, lustre-discuss@lists.lustre.org Sent: Tuesday, July 21, 2015 10:17:47 AM Subject

[lustre-discuss] A quick question about reusing osts (lustre 2.5.3)

2015-07-21 Thread Kurt Strosahl
Hello, I had a quick question about recreating osts... If I drain all the files off an ost can I just reformat it and have it added back into lustre, in essence reusing the same index? The server wouldn't change. Or would I have to preserve its configuration files? w/r, Kurt

Re: [lustre-discuss] lustre 2.5.3 ost not draining

2015-07-14 Thread Kurt Strosahl
fragmentation? w/r, Kurt - Original Message - From: aik a...@fnal.gov To: Kurt Strosahl stros...@jlab.org Cc: aik a...@fnal.gov, Sean Brisbane sean.brisb...@physics.ox.ac.uk, lustre-discuss@lists.lustre.org Sent: Monday, July 13, 2015 8:20:31 PM Subject: Re: [lustre-discuss] lustre 2.5.3

Re: [lustre-discuss] lustre 2.5.3 ost not draining

2015-07-12 Thread Kurt Strosahl
- From: Sean Brisbane sean.brisb...@physics.ox.ac.uk To: Kurt Strosahl stros...@jlab.org Cc: Shawn Hall shawn.h...@nag.com, lustre-discuss@lists.lustre.org Sent: Sunday, July 12, 2015 5:13:07 AM Subject: RE: [lustre-discuss] lustre 2.5.3 ost not draining Hi Kurt, I was following the recommendation

Re: [lustre-discuss] lustre 2.5.3 ost not draining

2015-07-11 Thread Kurt Strosahl
Thanks, I'll have to see if I can run this test myself. Did you notice if the inactive status persisted through the unmount/remount? w/r, Kurt - Original Message - From: Sean Brisbane sean.brisb...@physics.ox.ac.uk To: Kurt Strosahl stros...@jlab.org, Shawn Hall shawn.h...@nag.com

[lustre-discuss] lustre 2.5.3 ost not draining

2015-07-10 Thread Kurt Strosahl
Good Morning, Earlier in the week some of the OSTs in the lustre 2.5.3 system I'm managing hit 97% full. I set them to read only, and yesterday I started an lfs_migrate on one of them, hoping to drain it down overnight. This morning, after the system spent about 12 hours moving files off

Re: [lustre-discuss] lustre 2.5.3 ost not draining

2015-07-10 Thread Kurt Strosahl
Brisbane sean.brisb...@physics.ox.ac.uk To: Kurt Strosahl stros...@jlab.org Cc: Patrick Farrell p...@cray.com, lustre-discuss@lists.lustre.org lustre-discuss@lists.lustre.org Sent: Friday, July 10, 2015 11:04:27 AM Subject: RE: [lustre-discuss] lustre 2.5.3 ost not draining Dear Kurt, Apologies. After

Re: [lustre-discuss] lustre 2.5.3 ost not draining

2015-07-10 Thread Kurt Strosahl
Will that let deletes happen against it? w/r, Kurt - Original Message - From: aik a...@fnal.gov To: Kurt Strosahl stros...@jlab.org Cc: aik a...@fnal.gov, Sean Brisbane sean.brisb...@physics.ox.ac.uk, lustre-discuss@lists.lustre.org Sent: Friday, July 10, 2015 11:52:00 AM Subject: Re

Re: [lustre-discuss] lustre 2.5.3 ost not draining

2015-07-10 Thread Kurt Strosahl
Unfortunately... due to https://jira.hpdd.intel.com/browse/LU-5778, I'm stuck at just a flat round robin. - Original Message - From: Shawn Hall shawn.h...@nag.com To: Kurt Strosahl stros...@jlab.org, Sean Brisbane sean.brisb...@physics.ox.ac.uk Cc: lustre-discuss@lists.lustre.org Sent

Re: [lustre-discuss] lustre 2.5.3 ost not draining

2015-07-10 Thread Kurt Strosahl
it offline again using the above command to prevent it from reaching 100% w/r, Kurt - Original Message - From: Patrick Farrell p...@cray.com To: Kurt Strosahl stros...@jlab.org, lustre-discuss@lists.lustre.org Sent: Friday, July 10, 2015 9:00:38 AM Subject: RE: lustre 2.5.3 ost

Re: [lustre-discuss] lustre 2.5.3 ost not draining

2015-07-10 Thread Kurt Strosahl
I'd have the advantage, as my moves would be targeted directly to the ost, while the other writes would just land where ever they could. w/r, Kurt - Original Message - From: Shawn Hall shawn.h...@nag.com To: Kurt Strosahl stros...@jlab.org, Sean Brisbane sean.brisb...@physics.ox.ac.uk Cc

Re: [lustre-discuss] lustre 2.5.3 ost not draining

2015-07-10 Thread Kurt Strosahl
Thanks, but I don't see that command in the lfs documentation. w/r, Kurt - Original Message - From: Shawn Hall shawn.h...@nag.com To: Kurt Strosahl stros...@jlab.org Cc: Sean Brisbane sean.brisb...@physics.ox.ac.uk, lustre-discuss@lists.lustre.org Sent: Friday, July 10, 2015 3:27:31

Re: [lustre-discuss] lustre 2.5.3 ost not draining

2015-07-10 Thread Kurt Strosahl
The problem there is that I cannot afford to leave it some number of days... it is at 97% full, so new writes are going to it faster than it can clean itself off. w/r, Kurt - Original Message - From: Sean Brisbane sean.brisb...@physics.ox.ac.uk To: Patrick Farrell p...@cray.com, Kurt

[lustre-discuss] lustre 2.5.4 release?

2015-06-28 Thread Kurt Strosahl
Hello? I'm running lustre 2.5.3, and was wondering when the release would move to lustre 2.5.4? The rpms from https://downloads.hpdd.intel.com/public/lustre/latest-maintenance-release/el6/server/RPMS/x86_64/ are almost a year old. respectfully, Kurt J. Strosahl System Administrator

[lustre-discuss] lustre 2.5.3 changelog question

2015-06-18 Thread Kurt Strosahl
Good Morning, A quick question regarding the changelog... Let's say that I have a directory A with some files in it. I rename the directory using the mv command. Will the changelog only record the change in the directory, or will the changelog record a change for all the files within the

[lustre-discuss] writes not going to new osts on new oss (lustre 2.5.3)

2015-06-17 Thread Kurt Strosahl
Good Morning, An odd issue has come up with my new lustre 2.5.3 system. I added a new oss yesterday with three osts, but the mds doesn't seem to want to write to them. I've forced it to do so by setting stripes on test directories and writing data there. I've also (since the lustre

Re: [lustre-discuss] writes not going to new osts on new oss (lustre 2.5.3)

2015-06-17 Thread Kurt Strosahl
%, after echoing 100 into it the value went to (predictably) 100%. w/r, Kurt - Original Message - From: Kurt Strosahl stros...@jlab.org To: lustre-discuss@lists.lustre.org Sent: Wednesday, June 17, 2015 7:22:04 AM Subject: writes not going to new osts on new oss (lustre 2.5.3) Good Morning

[lustre-discuss] lustre2 cannot set ost to read only

2015-06-15 Thread Kurt Strosahl
Hello, I've been following the procedure for setting a lustre ost to read only, but it doesn't seem to be working. I want to set this ost to read only (as seen on the mds): lctl dl | grep OST000e 23 UP osp lustre2-OST000e-osc-MDT lustre2-MDT-mdtlov_UUID 5 lctl --device 23

Re: [lustre-discuss] lustre2 cannot set ost to read only

2015-06-15 Thread Kurt Strosahl
An update, Even though the MDT isn't showing the ost as inactive the ost isn't getting new writes. Further the high load on the system that prompted me to set the osts on it offline has gone away. So the command certainly seems to be working, it is just that the lctl command isn't showing

[lustre-discuss] Subject: Re: lustre2 cannot set ost to read only

2015-06-15 Thread Kurt Strosahl
Thanks, At first I was a bit concerned that the ost wasn't set to read only, but after I observed it for a bit I realized that it was just not being reported as read only. I agree that it is quite annoying... as, under lustre 1.8 I had the mds occasionally set osts to read only (discovered

Re: [lustre-discuss] upgrading zfs back end file system lustre doesn't mount anymore

2015-06-09 Thread Kurt Strosahl
. But I'm fairly confident that the data will be safe. w/r, Kurt - Original Message - From: Mohr rm...@utk.edu To: Kurt Strosahl stros...@jlab.org Cc: lustre-discuss@lists.lustre.org Sent: Monday, June 8, 2015 4:31:38 PM Subject: Re: [lustre-discuss] upgrading zfs back end file system

[lustre-discuss] upgrading zfs back end file system lustre doesn't mount anymore

2015-06-08 Thread Kurt Strosahl
Hello, I recently encountered an issue with zfs on linux that required me to upgrade to a newer version. Now I'm trying to patch my oss systems. I've got a new oss (has never had its osts mounted) that I'm using to test the upgrade process, but now that I've installed the new zfs modules

Re: [lustre-discuss] upgrading zfs back end file system lustre doesn't mount anymore

2015-06-08 Thread Kurt Strosahl
Strosahl stros...@jlab.org Cc: lustre-discuss@lists.lustre.org Sent: Monday, June 8, 2015 4:13:39 PM Subject: Re: [lustre-discuss] upgrading zfs back end file system lustre doesn't mount anymore On Jun 8, 2015, at 3:41 PM, Kurt Strosahl stros...@jlab.org wrote: snip osd_zfs

[lustre-discuss] lustre 2.5.3 and lctl abort_recovery

2015-06-07 Thread Kurt Strosahl
Hello, We had a lustre oss that had to be rebooted, and one of its osts has now been in recovery mode for almost 12 hours. It is working through its list of clients to recover... but while poking around I noticed the lctl abort_recovery command. Can this command be run on clients, or does

Re: [lustre-discuss] lustre 2.5.3 and lctl abort_recovery

2015-06-07 Thread Kurt Strosahl
. Respectfully, Kurt - Original Message - From: Murshid Azman murshid.az...@gmail.com To: Kurt Strosahl stros...@jlab.org, lustre-discuss@lists.lustre.org Sent: Sunday, June 7, 2015 7:16:52 AM Subject: Re: [lustre-discuss] lustre 2.5.3 and lctl abort_recovery On the oss. lctl --device ost

[lustre-discuss] lustre 2.5.3 changelog mask question

2015-05-29 Thread Kurt Strosahl
Good Morning, I'm trying to set the changelog mask on a lustre 2.5.3 system and it seems that the documentation isn't 100% accurate. According to the documentation found at https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.pdf one of the values is

Re: [lustre-discuss] lustre 2.5.3 changelog mask question

2015-05-29 Thread Kurt Strosahl
OK, Good to know. But someone over at Intel needs to update that documentation. w/r, Kurt - Original Message - From: Olaf P. Faaland faala...@llnl.gov To: Kurt Strosahl stros...@jlab.org, lustre-discuss@lists.lustre.org Sent: Friday, May 29, 2015 11:04:17 AM Subject: Re: [lustre

Re: [lustre-discuss] Issue with lnet showing network down lustre 2.5.3

2015-05-28 Thread Kurt Strosahl
Hi, The problems persisted through a reboot. w/r, Kurt - Original Message - From: Mohr rm...@utk.edu To: Kurt Strosahl stros...@jlab.org Cc: lustre-discuss@lists.lustre.org Sent: Thursday, May 28, 2015 10:00:27 AM Subject: Re: [lustre-discuss] Issue with lnet showing network down lustre

Re: [lustre-discuss] Issue with lnet showing network down lustre 2.5.3

2015-05-28 Thread Kurt Strosahl
- Original Message - From: Edward Wahl ew...@osc.edu To: Kurt Strosahl stros...@jlab.org, lustre-discuss@lists.lustre.org Sent: Thursday, May 28, 2015 10:40:48 AM Subject: RE: Issue with lnet showing network down lustre 2.5.3 Were you loading both lustre and lnet with a modprobe/insmod? Try just

Re: [lustre-discuss] disecting changelog lustre2.5.3

2015-05-27 Thread Kurt Strosahl
Thanks, that works perfectly: sudo lfs fid2path /lustre2 0x21d41:0x19f7:0x0 - Original Message - From: Patrick Farrell p...@cray.com To: Kurt Strosahl stros...@jlab.org, lustre-discuss@lists.lustre.org Sent: Wednesday, May 27, 2015 3:51:31 PM Subject: RE: disecting changelog

[lustre-discuss] disecting changelog lustre2.5.3

2015-05-27 Thread Kurt Strosahl
Hello, I activated the changelog and allowed it to collect some data, and I'm wondering how to interpret part of it. 7270731 01CREAT 19:27:48.153432113 2015.05.27 0x0 t=[0x21d41:0x19f7:0x0] p=[0x213a1:0x9:0x0] flow.xml.14088 Is there a way to tie the target and parent fids back to

Re: [lustre-discuss] Exporting a lustre mounted directory via nfs

2015-05-20 Thread Kurt Strosahl
Unfortunately we are stuck with the 1.8 client as long as the lustre 1.8 system exists (we are transitioning, so both systems have to be mounted during the data migration). Kurt J. Strosahl System Administrator Scientific Computing Group, Thomas Jefferson National Accelerator Facility

[lustre-discuss] Exporting a lustre mounted directory via nfs

2015-05-20 Thread Kurt Strosahl
Good Afternoon, I'm attempting to use nfs to export a lustre mount point on a client box (essentially acting as a gateway for systems that don't have the lustre client). I've mounted lustre, and added it to the nfs exports file (it shows up as exported) but when I try to mount the nfs

Re: [lustre-discuss] Exporting a lustre mounted directory via nfs

2015-05-20 Thread Kurt Strosahl
Sorry, left off some important info... the system is rhel5, with client 1.8.7... the lustre file system is 2.5.3 (the system already exports a 1.8.9 lustre file system). w/r, Kurt - Original Message - From: Kurt Strosahl stros...@jlab.org To: lustre-discuss@lists.lustre.org Sent

Re: [lustre-discuss] lustre issue with OST setting to read-only mode as soon as writes are attempted. using Lustre 1.8.8

2015-05-11 Thread Kurt Strosahl
: Colin Faber cfa...@gmail.com To: Kurt Strosahl stros...@jlab.org Cc: lustre-discuss@lists.lustre.org Sent: Thursday, May 7, 2015 5:05:06 PM Subject: Re: [lustre-discuss] lustre issue with OST setting to read-only mode as soon as writes are attempted. using Lustre 1.8.8 Hi Kurt, What's e2fsck
