Re: [lustre-discuss] lnetctl fails to recreate exact config when importing exported lnet.conf

2020-09-04 Thread Mohr Jr, Richard Frank
> On Sep 4, 2020, at 11:26 AM, Angelos Ching wrote: > If I don't add the "Lnet router + Server" peers manually as multi-rail enabled peers before route add, a non-multi-rail peer with only a TCP NID would be added by the route add command for the "Lnet router + Server" (as seen in

Re: [lustre-discuss] lnetctl fails to recreate exact config when importing exported lnet.conf

2020-09-04 Thread Mohr Jr, Richard Frank
> On Sep 4, 2020, at 12:11 AM, Angelos Ching wrote: > All steps below were carried out on the Lustre client: > 1. Restart the lnet service with an empty /etc/lnet.conf > 2. lnetctl net add: TCP network using Ethernet > 3. lnetctl peer add: 2 peers with "Lnet router + server"@o2ib,tcp NIDs The
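
A minimal sketch of the configuration sequence being described (NIDs and interface names are placeholders, not taken from this thread); the point is to declare the router/server as a multi-rail peer before adding the route, so the route does not create a TCP-only peer entry:

  lnetctl net add --net tcp --if eth0                                # client-side TCP network
  lnetctl peer add --prim_nid 10.10.0.1@o2ib --nid 192.168.1.1@tcp   # router+server declared as a multi-rail peer
  lnetctl route add --net o2ib --gateway 192.168.1.1@tcp             # the route now reuses that peer
  lnetctl export > /etc/lnet.conf                                    # export for comparison on re-import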

Re: [lustre-discuss] systemd lnet/rdma conflict

2020-08-04 Thread Mohr Jr, Richard Frank
> On Jul 17, 2020, at 2:20 PM, Mohr Jr, Richard Frank wrote: > > > >> On Jul 17, 2020, at 1:41 PM, Andreas Dilger wrote: >> >> >> Rick, >> would you be able to put this in the form of a patch against >> lustre/scripts/systemd/lnet.serv

Re: [lustre-discuss] systemd lnet/rdma conflict

2020-07-17 Thread Mohr Jr, Richard Frank
be interested in getting verification from Chris (or someone else) that it works just to make sure this isn’t something that is only working for me. Rick > > >> On Jul 16, 2020, at 2:34 PM, Mohr Jr, Richard Frank wrote: >>> On Jul 16, 2020, at 2:46 PM, Christopher B

Re: [lustre-discuss] systemd lnet/rdma conflict

2020-07-16 Thread Mohr Jr, Richard Frank
> On Jul 16, 2020, at 2:46 PM, Christopher Benjamin Coffey wrote: > I'm trying to get lustre and rdma set up on an el8 system. I can't get systemd to shut the two services, lnet and rdma, down correctly without hanging the system. I've tried many things in the rdma.service,
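
One way to express a stop ordering between the two units, sketched here as an assumption rather than the fix that was eventually posted (unit name follows the rdma.service mentioned above); systemd stops units in the reverse of their start order, so an After= dependency makes lnet stop before the RDMA stack is torn down:

  mkdir -p /etc/systemd/system/lnet.service.d
  cat > /etc/systemd/system/lnet.service.d/order.conf <<'EOF'
  [Unit]
  # Start lnet after rdma, which also means stop lnet before rdma at shutdown
  After=rdma.service
  Wants=rdma.service
  EOF
  systemctl daemon-reload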

Re: [lustre-discuss] OST Mount Error

2020-06-01 Thread Mohr Jr, Richard Frank
It looks like the writeconf flag is set on the ost you are trying to mount. Did you completely replace the ost with a newly formatted ost? Or did you set the writeconf flag on the existing ost? The writeconf flag is an indicator for lustre to regenerate configuration logs, but it needs to
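
For reference, a generic writeconf sequence (device names and mount points are placeholders); the flag makes the MGS regenerate that target's configuration log on the next mount, and the MGS/MDT must be remounted before the OSTs:

  umount /mnt/ost0                         # unmount clients and all targets first
  tunefs.lustre --writeconf /dev/sdb       # mark the target for config log regeneration
  mount -t lustre /dev/mgt_dev /mnt/mgt    # remount the MGS (and MDT) first
  mount -t lustre /dev/sdb /mnt/ost0       # then the OSTs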

Re: [lustre-discuss] Group and Project quota enforcement semantics

2020-04-23 Thread Mohr Jr, Richard Frank
> On Apr 23, 2020, at 4:11 PM, Adesanya, Adeyemi wrote: > > The Red Hat documentation suggests that project and group quotas in XFS are > “mutually exclusive”: > https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/storage_administration_guide/xfsquota > > It would

Re: [lustre-discuss] confused about mdt space

2020-04-01 Thread Mohr Jr, Richard Frank
> On Apr 1, 2020, at 10:07 AM, Mohr Jr, Richard Frank wrote: >> On Apr 1, 2020, at 3:55 AM, 肖正刚 wrote: >> For "the recent lustre versions use a 1KB inode size by default and the default format options create 1 inode for every 2.5

Re: [lustre-discuss] confused about mdt space

2020-04-01 Thread Mohr Jr, Richard Frank
> On Apr 1, 2020, at 3:55 AM, 肖正刚 wrote: > For "the recent lustre versions use a 1KB inode size by default and the default format options create 1 inode for every 2.5 KB of MDT space": I checked that the inode size is 1KB in my online systems and, as you said, about 40~41% of mdt

Re: [lustre-discuss] Files hanging on lustre clients

2020-03-31 Thread Mohr Jr, Richard Frank
> On Mar 31, 2020, at 3:43 PM, Kurt Strosahl wrote: > > I can't tell, any commands I run against the files in question hang > indefinitely. It seems very suspicious though. The fact that the same OST appeared in error messages on two different clients made me think the problem might be

Re: [lustre-discuss] Files hanging on lustre clients

2020-03-31 Thread Mohr Jr, Richard Frank
> On Mar 31, 2020, at 2:36 PM, Kurt Strosahl wrote: > > an strace on an ls command run against some of these files produced the > following: > getxattr("/volatile/halld/home/haoli/RunPeriod-2017-01/analysis/ver36_Mar27/log/030408/stdout.030408_124.out", > "system.posix_acl_default", NULL, 0)

Re: [lustre-discuss] confused about mdt space

2020-03-31 Thread Mohr Jr, Richard Frank
> On Mar 30, 2020, at 10:56 PM, 肖正刚 wrote: > Hello, I have some questions about metadata space. > 1) I have ten 960GB SAS SSDs for the mdt; after doing raid10 we have 4.7TB free. After formatting it as an mdt we only have 2.6TB free, so where did the 2.1TB go? > 2) for the 2.6TB

Re: [lustre-discuss] old Lustre 2.8.0 panic'ing continously

2020-03-05 Thread Mohr Jr, Richard Frank
> On Mar 5, 2020, at 2:48 AM, Torsten Harenberg wrote: > [QUOTA WARNING] Usage inconsistent for ID 2901: actual (757747712, 217) != expected (664182784, 215) I assume you are running ldiskfs as the backend? If so, have you tried regenerating the quota info for the OST? I believe the
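
One commonly used way to rebuild quota accounting on an ldiskfs target, offered as an assumption about what "regenerating the quota info" would look like rather than a quote from this thread (device name is a placeholder):

  umount /mnt/ost0                 # the target must be offline
  tune2fs -O ^quota /dev/sdb       # drop the quota feature and its accounting files
  tune2fs -O quota /dev/sdb        # re-enable it so the usage tables are rebuilt
  e2fsck -fp /dev/sdb              # let e2fsck verify before remounting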

Re: [lustre-discuss] enable quota enforcement on the fly?

2020-02-18 Thread Mohr Jr, Richard Frank
> On Feb 17, 2020, at 2:42 PM, Liam Forbes wrote: > > We recently noticed we apparently did not enable group quota enforcement > early last year during the most recent rebuild of our Lustre filesystem. Is > it possible to do so on the fly, or is it better/required for the filesystem > to be
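
Quota enforcement in Lustre 2.4 and later can be switched on at runtime from the MGS without reformatting; a hedged sketch (the fsname "lustre" is a placeholder, and the underlying accounting should be verified first):

  lctl conf_param lustre.quota.ost=ug        # enforce user+group block quotas on OSTs
  lctl conf_param lustre.quota.mdt=ug        # enforce user+group inode quotas on MDTs
  lctl get_param osd-*.*.quota_slave.info    # run on a server to confirm enforcement is enabled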

Re: [lustre-discuss] 8TiB LDISKFS MDT

2019-10-15 Thread Mohr Jr, Richard Frank
> On Oct 15, 2019, at 9:52 AM, Tamas Kazinczy wrote: > With defaults (1024 for inode size and 2560 for inode ratio) I get only 4.8T usable space. With those values, an inode is created for every 2560 bytes of MDT space. Since the inode is 1024 bytes, that leaves (2560 - 1024) = 1536
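
The arithmetic behind that number, restated with the values given above:

  # inode ratio 2560 bytes, inode size 1024 bytes:
  #   space left per inode  = 2560 - 1024 = 1536 bytes
  #   usable fraction       = 1536 / 2560 = 0.60
  #   8 TiB MDT             -> ~0.60 * 8 TiB = ~4.8 TiB reported as usable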

Re: [lustre-discuss] changing inode size on MDT

2019-10-02 Thread Mohr Jr, Richard Frank
> On Oct 2, 2019, at 3:45 PM, Hebenstreit, Michael wrote: > and I'd like to use --mkfsoptions='-i 1024' to have more inodes in the MDT. > We already ran out of inodes on that FS (probably due to a ZFS bug in an early IEEL version) - so I'd like to increase #inodes if possible I don’t

Re: [lustre-discuss] changing inode size on MDT

2019-10-02 Thread Mohr Jr, Richard Frank
> On Oct 2, 2019, at 1:08 PM, Hebenstreit, Michael wrote: > Could anyone point out to me what the downside of having an inode size of 1k on the MDT would be (compared to the 4k default)? Are you talking about the inode size, or the “-i” option to mkfs.lustre (which actually controls
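
To illustrate the distinction being asked about (a hypothetical format line, not one from this thread): mke2fs's -I sets the on-disk inode size, while -i sets the bytes-per-inode ratio, i.e. how many inodes get created:

  mkfs.lustre --mdt --fsname=testfs --index=0 --mgsnode=mgs@o2ib \
      --mkfsoptions="-I 1024 -i 2048" /dev/mapper/mdt_lun
  # -I 1024 : each inode occupies 1 KB on disk
  # -i 2048 : create one inode per 2 KB of MDT space (this controls the inode count)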

Re: [lustre-discuss] [Urgent] Multiple issues in Lustre 2.12.2.

2019-08-27 Thread Mohr Jr, Richard Frank
> On Aug 27, 2019, at 3:35 AM, Udai Sharma wrote: > > > Hello Team, > I am facing multiple issues when I configure Lustre in clustered environment > with multiple OST in HA-LVM and one MGS and MDT server each. > > Issues: > 1. OST00* are going to INACTIVE state if the corresponding disk in

Re: [lustre-discuss] Cannot mount from Lustre from client any longer

2019-06-27 Thread Mohr Jr, Richard Frank
> On Jun 27, 2019, at 8:16 AM, Miguel Santos Novoa wrote: > For the last couple of weeks we have been adding and removing OSTs, and we were also doing tests with a client using Lustre version 2.12, which seems to be our main hypothesis for the problem. We are not sure what is causing

Re: [lustre-discuss] Stop writes for users

2019-05-14 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On May 13, 2019, at 6:51 PM, Fernando Pérez wrote: > Is there a way to stop file writes for all users or for groups without using quotas? > We have a lustre filesystem with corrupted quotas and I need to stop writes for all users (or for some users). There are ways to

Re: [lustre-discuss] PFL not working on 2.10 client

2019-05-01 Thread Mohr Jr, Richard Frank (Rick Mohr)
2.10.6 or 2.10.7 to fix this problem. > > Cheers, Andreas > > On Apr 22, 2019, at 15:15, Mohr Jr, Richard Frank wrote: >> >> >> I was trying to play around with some PFL layout today, and I ran into an >> issue. I have a file system running Lustre 2.10.6 and a cl

[lustre-discuss] PFL not working on 2.10 client

2019-04-22 Thread Mohr Jr, Richard Frank (Rick Mohr)
I was trying to play around with some PFL layout today, and I ran into an issue. I have a file system running Lustre 2.10.6 and a client with 2.10.0 installed. I created a PFL with this command: [rfmohr@sip-login1 rfmohr]$ lfs setstripe -E 4M -c 2 -E 100M -c 4 comp_file It did not return

Re: [lustre-discuss] lfsck repair quota

2019-04-17 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Apr 17, 2019, at 4:32 AM, Fernando Perez wrote: > I tried to run e2fsck on the mdt three years ago and the logs show a lot of messages like these: >> Unattached inode 26977505 >> Connect to /lost+found? yes >> Inode 26977505 ref count is 2, should be 1. Fix? yes > In fact

Re: [lustre-discuss] unable to install lustre clients on Centos 7.6 with MLNX_OFED_LINUX-4.5-1.0.1.0

2019-04-16 Thread Mohr Jr, Richard Frank (Rick Mohr)
Which RPMs did you download? The ones from the /public/lustre/lustre-2.10.7 directory, or the ones from /public/lustre/lustre-2.10.7-ib? The former are built with support for in-kernel IB, and the latter are for MOFED. If you downloaded the latter, did you install MOFED yourself, or did you

Re: [lustre-discuss] lfsck repair quota

2019-04-16 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Apr 16, 2019, at 10:24 AM, Fernando Pérez wrote: > According to the lustre wiki I thought that lfsck could repair corrupted quotas: > http://wiki.lustre.org/Lustre_Quota_Troubleshooting Keep in mind that page is a few years old, but I assume they were referring to LFSCK Phase

Re: [lustre-discuss] lfsck repair quota

2019-04-16 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Apr 15, 2019, at 10:54 AM, Fernando Perez wrote: > Could anyone confirm for me that the correct way to repair wrong quotas in an ldiskfs mdt is lctl lfsck_start -t layout -A? As far as I know, lfsck doesn’t repair quota info. It only fixes internal consistency within Lustre. Whenever I

Re: [lustre-discuss] inodes not adding up

2019-04-15 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Apr 13, 2019, at 4:57 AM, Youssef Eldakar wrote: > > For one Lustre filesystem, inode count in the summary is notably less than > what the individual OST inode counts would add up to: The first thing to understand is that every Lustre file will consume one inode on the MDT, and this
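
A quick way to see this in practice is to compare the per-target and summary inode counts; the summary is capped by the MDT, since every Lustre file consumes an MDT inode regardless of how many OST objects it has:

  lfs df -i /mnt/lustre    # per-MDT/OST inode totals plus the filesystem summary line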

Re: [lustre-discuss] how to erase lustre filesystem

2019-04-04 Thread Mohr Jr, Richard Frank (Rick Mohr)
You might need to clarify what you mean by “erase” the file system. The procedure in the manual is intended for reformatting MDTs/OSTs that had previously been formatted for lustre. I don’t think it actually erases data in the sense of overwriting existing data with zeros (or something

Re: [lustre-discuss] Tools for backing up a ZFS MDT

2019-03-29 Thread Mohr Jr, Richard Frank (Rick Mohr)
This presentation from LUG 2017 might be useful for you: http://cdn.opensfs.org/wp-content/uploads/2017/06/Wed06-CroweTom-lug17-ost_data_migration_using_ZFS.pdf It shows how ZFS send/receive can be used to migrate data between OSTs. I used it as a reference when I worked with another admin to
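
A generic sketch of the snapshot/send/receive pattern from that presentation (pool, dataset, and host names are placeholders):

  zfs snapshot oldpool/ost0@migrate
  zfs send -R oldpool/ost0@migrate | ssh new-oss zfs receive -F newpool/ost0
  # verify the received dataset, then import/mount it on the new server and
  # retire the source only after Lustre mounts it cleanly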

[lustre-discuss] Using lfs migrate to move files between MDTs

2019-03-29 Thread Mohr Jr, Richard Frank (Rick Mohr)
I have been playing a little bit with DNE today, and I had a question about some odd behavior I saw regarding inode counts. My Lustre 2.10.6 file system has 2 MDTs. I created a directory (which by default resides on MDT0) and then created 10 files in that directory: [root@sip-mgmt2 test]#
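
For context, the usual way a directory ends up on the second MDT under DNE1, sketched with placeholder paths:

  lfs mkdir -i 1 /mnt/lustre/test/on_mdt1     # directory inode allocated on MDT0001
  lfs getdirstripe /mnt/lustre/test/on_mdt1   # confirm which MDT owns the directory
  lfs df -i /mnt/lustre                       # watch where the new files' inodes land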

Re: [lustre-discuss] Error with project quotas on 2.10.6

2019-03-26 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Mar 20, 2019, at 1:24 PM, Peter Jones wrote: > > If it's not in the manual then it should be. Could you please open an LUDOC > ticket to track getting this corrected if need be? Done. https://jira.whamcloud.com/browse/LUDOC-435 -- Rick Mohr Senior HPC System Administrator National

Re: [lustre-discuss] Error with project quotas on 2.10.6

2019-03-18 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Mar 18, 2019, at 5:31 PM, Peter Jones wrote: > > You need the patched kernel for that feature I suppose that should be documented in the manual somewhere. I thought project quota support was determined based on ldiskfs vs zfs, and not patched vs unpatched. -- Rick Mohr Senior HPC

[lustre-discuss] Error with project quotas on 2.10.6

2019-03-18 Thread Mohr Jr, Richard Frank (Rick Mohr)
I just recently installed a new Lustre 2.10.6 file system using the RPMS from /public/lustre/lustre-2.10.6-ib/MOFED-4.5-1.0.1.0/el7.6.1810/patchless-ldiskfs-server. (I had already built and installed MOFED-4.5-1.0.1.0, and I installed e2fsprogs-1.44.5.wc1-0.el7). I was able to format the MDT

Re: [lustre-discuss] Migrating files doesn't free space on the OST

2019-01-17 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jan 17, 2019, at 2:38 PM, Jason Williams wrote: > > - I just looked for lfsck but I don't seem to have it. We are running 2.10.4 > so I don't know what version that appeared in. lfsck is handled as a subcommand for lctl. http://doc.lustre.org/lustre_manual.xhtml#dbdoclet.lfsckadmin

Re: [lustre-discuss] index is already in use problem

2019-01-16 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jan 16, 2019, at 4:18 AM, Jae-Hyuck Kwak wrote: > > How can I force --writeconf option? It seems that mkfs.lustre doesn't support > --writeconf option. You will need to use the tunefs.lustre command to do a writeconf. -- Rick Mohr Senior HPC System Administrator National Institute for

Re: [lustre-discuss] Odd client behavior with mixed Lustre versions

2019-01-11 Thread Mohr Jr, Richard Frank (Rick Mohr)
Is it possible you have some incompatible ko2iblnd module parameters between the 2.8 servers and the 2.10 clients? If there was something causing LNet issues, that could possibly explain some of the symptoms you are seeing. -- Rick Mohr Senior HPC System Administrator National Institute for

Re: [lustre-discuss] Full OST, any way of avoiding it without hanging?

2019-01-07 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jan 7, 2019, at 2:09 PM, Jason Williams wrote: > > One last question, How safe is lfs_migrate? The man page on the installation > says it's UNSAFE for possibly in-use files. The lustre manual doesn't have > the same warning and says something about it being a bit more integrated with

Re: [lustre-discuss] Full OST, any way of avoiding it without hanging?

2019-01-07 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jan 7, 2019, at 12:53 PM, Jason Williams wrote: > > As I have gone through the testing, I think you may be right. I think I > disabled the OST in a slightly different way and that caused issues. > > Do you happen to know where I could find out a bit more about what the "lctl >
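
One way to make an OST effectively read-only for new files without deactivating it (a sketch with placeholder fsname/indices, not necessarily the exact command referenced above); existing files stay readable and deletable while new writes land on other OSTs:

  # Run on the MDS: stop precreating objects on OST0000 so no new files are placed there
  lctl set_param osp.lustre-OST0000-osc-MDT0000.max_create_count=0
  # Restore normal allocation later (20000 is the usual default)
  lctl set_param osp.lustre-OST0000-osc-MDT0000.max_create_count=20000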

Re: [lustre-discuss] Full OST, any way of avoiding it without hanging?

2019-01-07 Thread Mohr Jr, Richard Frank (Rick Mohr)

Re: [lustre-discuss] Full OST, any way of avoiding it without hanging?

2019-01-06 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jan 5, 2019, at 9:49 PM, Jason Williams wrote: > > I have looked around the internet and found you can disable an OST, but when > I have tried that, any writes (including deletes) to the OST hang the clients > indefinitely. Does anyone know a way to make an OST basically "read-only" >

Re: [lustre-discuss] Usage for lfs setstripe -o ost_indices

2018-11-09 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Nov 9, 2018, at 11:28 AM, Mohr Jr, Richard Frank (Rick Mohr) > wrote: > > >> On Nov 8, 2018, at 11:44 AM, Ms. Megan Larko wrote: >> >> I have been attempting this command on a directory on a Lustre-2.10.4 >> storage from a Lustre 2.10.1 client a

Re: [lustre-discuss] Usage for lfs setstripe -o ost_indices

2018-11-09 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Nov 8, 2018, at 11:44 AM, Ms. Megan Larko wrote: > > I have been attempting this command on a directory on a Lustre-2.10.4 storage > from a Lustre 2.10.1 client and I fail with the following message: > > lfs setstripe -c 4 -S 1m -o 1,2-4 custTest/ > error on ioctl 0x4008669a for

Re: [lustre-discuss] migrating MDS to different infrastructure

2018-10-29 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 29, 2018, at 1:12 AM, Riccardo Veraldi wrote: > it is time for me to move my MDS to a different HW infrastructure. > So I was wondering if the following procedure can work. > I have mds1 (old mds) and mds2 (new mds). On the old mds I have a zfs MGS partition and a zfs MDT

Re: [lustre-discuss] lustre 2.10.5 or 2.11.0

2018-10-19 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 17, 2018, at 7:30 PM, Riccardo Veraldi wrote: > anyway, especially regarding the OSSes you may eventually need some ZFS module parameter optimizations for vdev_write and vdev_read max, increasing those values above the default. You may also disable ZIL, change the

Re: [lustre-discuss] LU-11465 OSS/MDS deadlock in 2.10.5

2018-10-19 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 19, 2018, at 10:42 AM, Marion Hakanson wrote: > > Thanks for the feedback. You're both confirming what we've learned so far, > that we had to unmount all the clients (which required rebooting most of > them), then reboot all the storage servers, to get things unstuck until the >

Re: [lustre-discuss] Multihoming Lustre server

2018-10-16 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 16, 2018, at 7:04 AM, Mark Roper wrote: > > I have successfully set up a Lustre filesystem that is multi-homed on two > different TCP NIDs, using the following configuration. > Mount MGS & MDT > >sudo lnetctl lnet configure >sudo lnetctl net del --net tcp >sudo lnetctl

Re: [lustre-discuss] Experience with resizing MDT

2018-09-20 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Sep 19, 2018, at 8:09 PM, Colin Faber wrote: > > Why wouldn't you use DNE? I am considering it as an option, but there appear to be some potential drawbacks. If I use DNE1, then I have to manually create directories on specific MDTs. I will need to monitor MDT usage and make

[lustre-discuss] Experience with resizing MDT

2018-09-19 Thread Mohr Jr, Richard Frank (Rick Mohr)
Has anyone had recent experience resizing a ldiskfs-backed MDT using the resize2fs tool? We may be purchasing a small lustre file system in the near future with the expectation that it could grow considerably over time. Since we don’t have a clear idea of how many inodes we might need in the
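
The operation under discussion would look roughly like this (purely illustrative, with a placeholder device; back up the MDT first and grow the underlying LUN before resizing):

  umount /mnt/mdt
  e2fsck -f /dev/mapper/mdt_lun        # resize2fs requires a clean filesystem
  resize2fs /dev/mapper/mdt_lun        # grow to fill the enlarged device
  mount -t lustre /dev/mapper/mdt_lun /mnt/mdt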

Re: [lustre-discuss] lustre client not able to lctl ping or mount

2018-09-04 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Sep 4, 2018, at 12:12 PM, Pak Lui wrote: > > I have tried "map_on_demand=16" to the "/etc/modprobe.d/ko2iblnd.conf" that > was suggested. Also tried "map_on_demand=0" as suggested here: > http://wiki.lustre.org/Optimizing_o2iblnd_Performance > > /etc/modprobe.d/ko2iblnd.conf > alias

Re: [lustre-discuss] migrating MDS to different infrastructure

2018-08-23 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 22, 2018, at 8:10 PM, Riccardo Veraldi > wrote: > > On 8/22/18 3:13 PM, Mohr Jr, Richard Frank (Rick Mohr) wrote: >>> On Aug 22, 2018, at 3:31 PM, Riccardo Veraldi >>> wrote: >>> I would like to migrate this virtual machine to another infras

Re: [lustre-discuss] migrating MDS to different infrastructure

2018-08-22 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 22, 2018, at 3:31 PM, Riccardo Veraldi wrote: > I would like to migrate this virtual machine to another infrastructure. It is not simple because the other infrastructure is vmware. > What is the best way to migrate those partitions without incurring any corruption of data

Re: [lustre-discuss] Lustre Size Variation after formatiing

2018-08-20 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 20, 2018, at 2:58 AM, ANS wrote: > 1) CentOS 7.4 > 2) Lustre version 2.11 > 3) MDT LUN size is 6.5 TB (RAID 10) and after formatting using lustre we are getting the size as 3.9 TB, whereas when formatted using XFS it shows the expected size. For Lustre 2.10 and up, the default inode size is

Re: [lustre-discuss] Lustre 2.10.4 failover

2018-08-13 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 13, 2018, at 2:25 PM, David Cohen wrote: > the fstab line I use for mounting the Lustre filesystem: > oss03@tcp:oss01@tcp:/fsname /storage lustre flock,user_xattr,defaults 0 0 OK. That looks correct. > the mds is also configured for failover (unsuccessfully):

Re: [lustre-discuss] Upgrading ZFS version for Lustre

2018-07-27 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jul 27, 2018, at 1:56 PM, Andreas Dilger wrote: > >> On Jul 27, 2018, at 10:24, Mohr Jr, Richard Frank (Rick Mohr) >> wrote: >> >> I am working on upgrading some Lustre servers. The servers currently run >> lustre 2.8.0 with zfs 0.6.5, and I am

[lustre-discuss] Upgrading ZFS version for Lustre

2018-07-27 Thread Mohr Jr, Richard Frank (Rick Mohr)
I am working on upgrading some Lustre servers. The servers currently run lustre 2.8.0 with zfs 0.6.5, and I am looking to upgrade to lustre 2.10.4 with zfs 0.7.9. I was looking at the manual, and I did not see anything in there that mentioned special steps when changing ZFS versions. Do I

Re: [lustre-discuss] lctl ping node28@o2ib report Input/output error

2018-06-28 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jun 27, 2018, at 4:44 PM, Mohr Jr, Richard Frank (Rick Mohr) wrote: >> On Jun 27, 2018, at 3:12 AM, yu sun wrote: >> client: >> root@ml-gpu-ser200.nmg01:~$ mount -t lustre node28@o2ib1:node29@o2ib1:/project /mnt/lustre_data

Re: [lustre-discuss] lctl ping node28@o2ib report Input/output error

2018-06-27 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jun 27, 2018, at 3:12 AM, yu sun wrote: > > client: > root@ml-gpu-ser200.nmg01:~$ mount -t lustre > node28@o2ib1:node29@o2ib1:/project /mnt/lustre_data > mount.lustre: mount node28@o2ib1:node29@o2ib1:/project at /mnt/lustre_data > failed: Input/output error > Is the MGS running? >

Re: [lustre-discuss] lctl ping node28@o2ib report Input/output error

2018-06-26 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jun 27, 2018, at 12:52 AM, yu sun wrote: > > I have create file /etc/modprobe.d/lustre.conf with content on all mdt ost > and client: > root@ml-gpu-ser200.nmg01:~$ cat /etc/modprobe.d/lustre.conf > options lnet networks="o2ib1(eth3.2)" > and I exec command line : lnetctl lnet configure

Re: [lustre-discuss] Lustre on native ZFS encryption

2018-05-02 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On May 2, 2018, at 9:59 AM, Mark Miller wrote: > > Since I have the Lustre source code, I can start looking through it to see if > I can find where the Lustre mount system call may be getting hung up. I have > no idea... but it feels like the Lustre mount may be trying to

Re: [lustre-discuss] varying sequential read performance.

2018-04-05 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Apr 5, 2018, at 11:31 AM, John Bauer wrote: > I don't have access to the OSS so I can't report on the Lustre settings. I think the client-side max cached is 50% of memory. Looking at your cache graph, that looks about right. > After speaking with Doug Petesch

Re: [lustre-discuss] varying sequential read performance.

2018-04-05 Thread Mohr Jr, Richard Frank (Rick Mohr)
John, I had a couple of thoughts (though not sure if they are directly relevant to your performance issue): 1) Do you know what caching settings are applied on the lustre servers? This could have an impact on performance, especially if your tests are being run while others are doing IO on

Re: [lustre-discuss] Adding a servicenode (failnode) to existing OSTs

2018-04-04 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Apr 3, 2018, at 1:14 PM, Steve Barnet wrote: > > There appear to be a > couple ways that this could be done: > > a) Add the service nodes: > tunefs.lustre --servicenode=nid,nid /dev/ > > b) Add a failover node: > tunefs.lustre --param="failover.node= /dev/

[lustre-discuss] Question about lctl changelog_deregister

2018-01-26 Thread Mohr Jr, Richard Frank (Rick Mohr)
I have started playing around with Lustre changelogs, and I have noticed a behavior with the “lctl changelog_deregister” command that I don’t understand. I tried running a little test by enabling changelogs on my MDS server: [root@server ~]# lctl --device orhydra-MDT changelog_register
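
For reference, the register/consume/clear/deregister cycle looks roughly like this (fs and MDT names are placeholders; each registration returns an id such as cl1):

  lctl --device testfs-MDT0000 changelog_register        # prints the new user id, e.g. cl1
  lfs changelog testfs-MDT0000                           # read pending records
  lfs changelog_clear testfs-MDT0000 cl1 0               # acknowledge everything read so far
  lctl --device testfs-MDT0000 changelog_deregister cl1  # remove the consumer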

Re: [lustre-discuss] Designing a new Lustre system

2017-12-20 Thread Mohr Jr, Richard Frank (Rick Mohr)
My $0.02 below. > On Dec 20, 2017, at 11:21 AM, E.S. Rosenberg > wrote: > > 1. After my recent experience with failover I wondered is there any reason > not to set all machines that are within reasonable cable range as potential > failover nodes so that in the

Re: [lustre-discuss] Lustre compilation error

2017-11-30 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Nov 29, 2017, at 8:35 PM, Dilger, Andreas wrote: > > Would you be able to open a ticket for this, and possibly submit a patch to > fix the build? I can certainly open a ticket, but I’m afraid I don’t know what needs to be fixed so I can’t provide a patch. --

Re: [lustre-discuss] Lustre compilation error

2017-11-29 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 18, 2017, at 9:44 AM, parag_k wrote: > > > I got the source from github. > > My configure line is- > > ./configure --disable-client > --with-kernel-source-header=/usr/src/kernels/3.10.0-514.el7.x86_64/ > --with-o2ib=/usr/src/ofa_kernel/default/ > Are you

Re: [lustre-discuss] mdt mounting error

2017-11-01 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Nov 1, 2017, at 7:18 AM, Parag Khuraswar wrote: > > For mgt – > mkfs.lustre --servicenode=10.2.1.204@o2ib --servicenode=10.2.1.205@o2ib --mgs > /dev/mapper/mpathc > > For mdt > mkfs.lustre --fsname=home --mgsnode=10.2.1.204@o2ib --mgsnode=10.2.1.205@o2ib >

Re: [lustre-discuss] 1 MDS and 1 OSS

2017-10-30 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 30, 2017, at 4:46 PM, Brian Andrus wrote: > > Someone please correct me if I am wrong, but that seems a bit large of an > MDT. Of course drives these days are pretty good sized, so the extra is > probably very inexpensive. That probably depends on what the

Re: [lustre-discuss] Lustre routing help needed

2017-10-30 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 30, 2017, at 8:47 AM, Kevin M. Hildebrand wrote: > > All of the hosts (client, server, router) have the following in ko2iblnd.conf: > > alias ko2iblnd-opa ko2iblnd > options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 > concurrent_sends=256 ntx=2048

Re: [lustre-discuss] Linux users are not able to access lustre folders

2017-10-20 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 20, 2017, at 11:37 AM, Ravi Bhat wrote: > > Thanks, I have created user (luser6) in client as well as in lustre servers. > But I get the same error as > No directory /home/luser6 > Logging in with home="/". > > But now I can cd /home/luser6 manually and create

Re: [lustre-discuss] Linux users are not able to access lustre folders

2017-10-20 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 20, 2017, at 10:50 AM, Ravi Konila wrote: > > Can you please guide me how do I do it, I mean install NIS on servers and > clients? > Is it mandatory to setup NIS? > NIS is not mandatory. You just need a way to ensure that user accounts are visible to the

[lustre-discuss] OSTs remounting read-only after ldiskfs journal error

2017-10-19 Thread Mohr Jr, Richard Frank (Rick Mohr)
Recently, I ran into an issue where several of the OSTs on my Lustre file system went read-only. When I checked the logs, I saw messages like these for several OSTs: Oct 6 23:27:11 haven-oss2 kernel: LDISKFS-fs: ldiskfs_getblk:834: aborting transaction: error 28 in

Re: [lustre-discuss] Lustre poor performance

2017-08-23 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 22, 2017, at 7:14 PM, Riccardo Veraldi wrote: > On 8/22/17 9:22 AM, Mannthey, Keith wrote: >> Younot expected. > yes they are automatically used on my Mellanox and the script ko2iblnd-probe seems to not be working properly. The ko2iblnd-probe

Re: [lustre-discuss] Lustre Quotas

2017-08-03 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 3, 2017, at 11:48 AM, Jackson, Gary L. wrote: > Are quotas well supported and robust on Lustre? As far as I know, they are. But I mainly use quotas for reporting purposes. I have not had much experience with enforcing quota limits in Lustre. > What

Re: [lustre-discuss] Spiking OSS load?

2017-08-03 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 1, 2017, at 3:07 PM, Jason Williams wrote: > 1) Is 512 threads a reasonable setting or should it be lower? Since your servers have enough memory to support 512 threads, then it is probably reasonable. If your server load is ~100, that probably means most of

Re: [lustre-discuss] New Lustre Installation

2017-05-22 Thread Mohr Jr, Richard Frank (Rick Mohr)
You might want to start by looking at these online tutorials: http://lustre.ornl.gov/lustre101-courses/ -- Rick Mohr Senior HPC System Administrator National Institute for Computational Sciences http://www.nics.tennessee.edu > On May 21, 2017, at 6:19 AM, Ravi Konila

Re: [lustre-discuss] Lustre 2.8.0 - MDT/MGT failing to mount

2017-05-04 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On May 4, 2017, at 11:03 AM, Steve Barnet <bar...@icecube.wisc.edu> wrote: > > On 5/4/17 10:01 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote: >> Did you try doing a writeconf to regenerate the config logs for the file >> system? > > > Not yet, but quick en

Re: [lustre-discuss] building of lustre-client fails

2017-05-04 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On May 3, 2017, at 10:56 PM, Riccardo Veraldi > wrote: > > I am building lustre-client from src rpm on RHL73. > > it fails with this error during the install process: > > + echo /etc/init.d/lnet > + echo /etc/init.d/lsvcgss > + find

Re: [lustre-discuss] operation ldlm_queue failed with -11

2017-05-03 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On May 3, 2017, at 12:23 PM, Patrick Farrell wrote: > > That reasoning is sound, but this is a special case. -11 (-EAGAIN) on > ldlm_enqueue is generally OK... > > LU-8658 explains the situation (it's POSIX flocks), so I'm going to reference > that rather than repeat it

Re: [lustre-discuss] operation ldlm_queue failed with -11

2017-05-03 Thread Mohr Jr, Richard Frank (Rick Mohr)
I think that -11 is EAGAIN, but I don’t know how to interpret what that means in the context of Lustre locking. I assume these messages are from the clients and the changing “x” portion is just the fact that each client has a different identifier. So if you have multiple clients

Re: [lustre-discuss] client fails to mount

2017-04-24 Thread Mohr Jr, Richard Frank (Rick Mohr)
This might be a long shot, but have you checked for possible firewall rules that might be causing the issue? I’m wondering if there is a chance that some rules were added after the nodes were up to allow Lustre access, and when a node got rebooted, it lost the rules. -- Rick Mohr Senior HPC

Re: [lustre-discuss] Lustre [2.8.0] flock Functionality

2017-03-28 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Mar 28, 2017, at 1:49 PM, DeWitt, Chad wrote: > > We've encountered several programs that require flock, so we are now > investigating enabling flock functionality. However, the Lustre manual > includes a passage in regards to flocks which gives us pause: > >

Re: [lustre-discuss] Odd quota behavior with Lustre/ZFS

2017-02-16 Thread Mohr Jr, Richard Frank (Rick Mohr)
Yes, in lustre 2.5.3 after doing chgrp for large subtree. IIRC, for three > groups; counts were small different "negative" numbers, not 21. > I can get more details tomorrow. > > Alex > >> On Feb 9, 2017, at 5:14 PM, Mohr Jr, Richard Frank (Rick Mohr) >> <r

Re: [lustre-discuss] Virtual servers

2017-02-16 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Feb 16, 2017, at 9:56 AM, Jon Tegner wrote: > I have three (physical) machines, and each one has a virtual machine on it (KVM). On one of the virtual machines there is an MDS and on two of them there are OSSes installed. > All systems use CentOS-7.3 and Lustre

[lustre-discuss] Odd quota behavior with Lustre/ZFS

2017-02-09 Thread Mohr Jr, Richard Frank (Rick Mohr)
I recently set up a Lustre 2.8 file system that uses ZFS for the backend storage (both on the MDT and OSTs). When I was doing some testing, I noticed that the output from lfs quota seemed odd. While the quota information for the amount of used space seemed correct, the info on the number of

Re: [lustre-discuss] Lustre Client hanging on mount

2017-01-12 Thread Mohr Jr, Richard Frank (Rick Mohr)
I noticed that you appear to have formatted the MDT with the file system name “mgsZFS” while the OST was formatted with the file system name “ossZFS”. The same name needs to be used on all MDTs/OSTs in the same file system. Until that is fixed, your file system won’t work properly. -- Rick

Re: [lustre-discuss] MGS failover problem

2017-01-11 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jan 11, 2017, at 12:39 PM, Vicker, Darby (JSC-EG311) wrote: >>> Getting failover right over multiple separate networks can be a real hair-pulling experience. >> Darby: Do you have the option of (at least temporarily) running the file system with

Re: [lustre-discuss] MGS failover problem

2017-01-11 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jan 11, 2017, at 11:58 AM, Ben Evans wrote: > > Getting failover right over multiple separate networks can be a real > hair-pulling experience. Darby: Do you have the option of (at least temporarily) running the file system with only Infiniband configured? If you could

Re: [lustre-discuss] Lustre with Hadoop (Hortonworks Data Platform)

2017-01-09 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jan 9, 2017, at 4:21 AM, Markham Benjamin wrote: > I was wondering about the use cases for running Lustre with Hadoop. One key thing about HDFS is that it runs on commodity hardware. Unless I’m being misinformed, Lustre doesn’t exactly run on commodity hardware. I

Re: [lustre-discuss] MGS failover problem

2017-01-09 Thread Mohr Jr, Richard Frank (Rick Mohr)
Have you tried performing a writeconf to regenerate the lustre config log files? This can sometimes fix the problem by making sure that everything is consistent. (A writeconf is often required when making NID or failover changes.) I think you could also use that opportunity to correct your

Re: [lustre-discuss] Round robin allocation (in general and in buggy 2.5.3)

2016-12-20 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Dec 20, 2016, at 10:48 AM, Jessica Otey wrote: > > qos_threshold_rr > > This setting controls how much consideration should be given to QoS in > allocation > The higher this number, the more QOS is taken into consideration. > When set to 100%, Lustre ignores the QoS

Re: [lustre-discuss] [UNTRUSTED] Re: Check clients connected?

2016-12-15 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Dec 15, 2016, at 9:30 AM, Phill Harvey-Smith wrote: > On 15/12/2016 14:21, Hanley, Jesse A. wrote: >> I forgot: You should also be able to use lshowmount. > Hmm, that works on the old server, but I can't find the command on the new centos 7.2 server,

Re: [lustre-discuss] Lustre newbie problems: formatting disk or partition with Lustre filesystem fails

2016-11-28 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Nov 28, 2016, at 9:58 AM, Stefano Turolla wrote: > thanks for the quick reply, I am maybe doing the wrong thing. What I am trying to achieve is to have a Lustre volume shared among the nodes, and the 30TB is the size of the existing storage. >

Re: [lustre-discuss] Quick ZFS pool question?

2016-10-13 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 13, 2016, at 12:32 PM, E.S. Rosenberg wrote: > I thought ZFS was only recommended for OSTs and not for MDTs/MGS? ZFS usually has lower metadata performance for the MDT than ldiskfs, which is why some people recommend ZFS only for the OSTs. However,

Re: [lustre-discuss] Still having problems Lustre 2.8 Centos 7.2

2016-09-28 Thread Mohr Jr, Richard Frank (Rick Mohr)
Did you check to make sure there are no firewalls running that could be blocking traffic? -- Rick Mohr Senior HPC System Administrator National Institute for Computational Sciences http://www.nics.tennessee.edu > On Sep 27, 2016, at 10:12 AM, Phill Harvey-Smith >

Re: [lustre-discuss] More problems setting things up....

2016-09-21 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Sep 21, 2016, at 5:08 AM, Phill Harvey-Smith wrote: > Sep 21 09:44:29 oric kernel: osd_zfs: disagrees about version of symbol dsl_prop_register > Sep 21 09:44:29 oric kernel: osd_zfs: Unknown symbol dsl_prop_register (err -22) > Sep 21 09:44:29 oric

Re: [lustre-discuss] Mount lustre client with MDS/MGS backup

2016-09-14 Thread Mohr Jr, Richard Frank (Rick Mohr)

Re: [lustre-discuss] Mount lustre client with MDS/MGS backup

2016-08-31 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 31, 2016, at 8:12 AM, Pardo Diaz, Alfonso wrote: > I mount my clients with: mount -t lustre mds1@o2ib:mds2@o2ib:/fs /mnt/fs > 1) When both MDSes are OK I can mount without problems > 2) If MDS1 is down and my clients have lustre mounted, they use MDS2

Re: [lustre-discuss] Does an updated version exist?

2016-08-16 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 16, 2016, at 6:55 AM, E.S. Rosenberg wrote: > I just found this paper: http://wiki.lustre.org/images/d/da/Understanding_Lustre_Filesystem_Internals.pdf > It looks interesting but it deals with lustre 1.6 so I am not sure how relevant it still

Re: [lustre-discuss] proper procedure after MDT kernel panic

2016-08-11 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 11, 2016, at 5:42 AM, E.S. Rosenberg wrote: > Our MDT suffered a kernel panic (which I will post separately); the OSSs stayed alive but the MDT was out for some time while nodes still tried to interact with lustre. > So I have several
