Re: [lustre-discuss] Stop writes for users

2019-05-14 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On May 13, 2019, at 6:51 PM, Fernando Pérez wrote: > > Is there a way to stop file writes for all users or for groups without using > quotas? > > We have a lustre filesystem with corrupted quotas and I need to stop > writes for all users (or for some users). There are ways to deactivate
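A minimal sketch of one such approach, with the file system name and OST index hypothetical: setting max_create_count to 0 on the MDS stops new object allocation on an OST, which blocks new file creation there while existing objects remain readable and writable.

    # On the MDS, repeat for each OST to block (fsname/indices are placeholders):
    lctl set_param osp.fsname-OST0000-osc-MDT0000.max_create_count=0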

Re: [lustre-discuss] PFL not working on 2.10 client

2019-05-01 Thread Mohr Jr, Richard Frank (Rick Mohr)
I don’t think we need to have PFL working immediately, and since we have plans to upgrade the client at some point, I will just wait and see what happens after the upgrade. -- Rick Mohr Senior HPC System Administrator National Institute for Computational Sciences http://www.nics.tennessee.edu

[lustre-discuss] PFL not working on 2.10 client

2019-04-22 Thread Mohr Jr, Richard Frank (Rick Mohr)
I was trying to play around with some PFL layouts today, and I ran into an issue. I have a file system running Lustre 2.10.6 and a client with 2.10.0 installed. I created a PFL file with this command: [rfmohr@sip-login1 rfmohr]$ lfs setstripe -E 4M -c 2 -E 100M -c 4 comp_file It did not return any
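On a client that supports PFL, the resulting component layout can be inspected with lfs getstripe -- a quick sketch using the file created above:

    lfs getstripe comp_file    # lists each component's extent range, stripe count, and objects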

Re: [lustre-discuss] lfsck repair quota

2019-04-17 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Apr 17, 2019, at 4:32 AM, Fernando Perez wrote: > > I tried to run the e2fsck in the mdt three years ago and the logs show a lot > of these kinds of messages: > >> Unattached inode 26977505 >> Connect to /lost+found? yes >> Inode 26977505 ref count is 2, should be 1. Fix? yes > > In fact

Re: [lustre-discuss] unable to install lustre clients on Centos 7.6 with MLNX_OFED_LINUX-4.5-1.0.1.0

2019-04-16 Thread Mohr Jr, Richard Frank (Rick Mohr)
Which RPMs did you download? The ones from the /public/lustre/lustre-2.10.7 directory, or the ones from /public/lustre/lustre-2.10.7-ib? The former are built with support for in-kernel IB, and the latter are for MOFED. If you downloaded the latter, did you install MOFED yourself, or did you

Re: [lustre-discuss] lfsck repair quota

2019-04-16 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Apr 16, 2019, at 10:24 AM, Fernando Pérez wrote: > > According to the lustre wiki I thought that the lfsck could repair corrupted > quotas: > > http://wiki.lustre.org/Lustre_Quota_Troubleshooting Keep in mind that page is a few years old, but I assume they were referring to LFSCK Phase 2

Re: [lustre-discuss] lfsck repair quota

2019-04-16 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Apr 15, 2019, at 10:54 AM, Fernando Perez wrote: > > Could anyone confirm that the correct way to repair wrong quotas in an > ldiskfs MDT is lctl lfsck_start -t layout -A? As far as I know, lfsck doesn’t repair quota info. It only fixes internal consistency within Lustre. Whenever I h

Re: [lustre-discuss] inodes not adding up

2019-04-15 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Apr 13, 2019, at 4:57 AM, Youssef Eldakar wrote: > > For one Lustre filesystem, inode count in the summary is notably less than > what the individual OST inode counts would add up to: The first thing to understand is that every Lustre file will consume one inode on the MDT, and this ino

Re: [lustre-discuss] how to erase lustre filesystem

2019-04-04 Thread Mohr Jr, Richard Frank (Rick Mohr)
You might need to clarify what you mean by “erase” the file system. The procedure in the manual is intended for reformatting MDTs/OSTs that had previously been formatted for lustre. I don’t think it actually erases data in the sense of overwriting existing data with zeros (or something simil

Re: [lustre-discuss] Tools for backing up a ZFS MDT

2019-03-29 Thread Mohr Jr, Richard Frank (Rick Mohr)
This presentation from LUG 2017 might be useful for you: http://cdn.opensfs.org/wp-content/uploads/2017/06/Wed06-CroweTom-lug17-ost_data_migration_using_ZFS.pdf It shows how ZFS send/receive can be used to migrate data between OSTs. I used it as a reference when I worked with another admin to
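A minimal sketch of the mechanism from that talk, with pool, dataset, and host names hypothetical (the source target should be unmounted before the final send so the stream is consistent):

    zfs snapshot ostpool/ost0@migrate
    zfs send -R ostpool/ost0@migrate | ssh new-oss zfs receive -F newpool/ost0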

[lustre-discuss] Using lfs migrate to move files between MDTs

2019-03-29 Thread Mohr Jr, Richard Frank (Rick Mohr)
I have been playing a little bit with DNE today, and I had a question about some odd behavior I saw regarding inode counts. My Lustre 2.10.6 file system has 2 MDTs. I created a directory (which by default resides on MDT0) and then created 10 files in that directory: [root@sip-mgmt2 test]# lf
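For context, remote directories are what place new files on a second MDT -- a sketch, path hypothetical:

    lfs mkdir -i 1 /lustre/dir_on_mdt1     # create a directory on MDT0001
    lfs getdirstripe /lustre/dir_on_mdt1   # confirm which MDT holds it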

Re: [lustre-discuss] Error with project quotas on 2.10.6

2019-03-26 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Mar 20, 2019, at 1:24 PM, Peter Jones wrote: > > If it's not in the manual then it should be. Could you please open an LUDOC > ticket to track getting this corrected if need be? Done. https://jira.whamcloud.com/browse/LUDOC-435 -- Rick Mohr Senior HPC System Administrator National Inst

Re: [lustre-discuss] Error with project quotas on 2.10.6

2019-03-18 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Mar 18, 2019, at 5:31 PM, Peter Jones wrote: > > You need the patched kernel for that feature I suppose that should be documented in the manual somewhere. I thought project quota support was determined based on ldiskfs vs zfs, and not patched vs unpatched. -- Rick Mohr Senior HPC Syst

[lustre-discuss] Error with project quotas on 2.10.6

2019-03-18 Thread Mohr Jr, Richard Frank (Rick Mohr)
I just recently installed a new Lustre 2.10.6 file system using the RPMS from /public/lustre/lustre-2.10.6-ib/MOFED-4.5-1.0.1.0/el7.6.1810/patchless-ldiskfs-server. (I had already built and installed MOFED-4.5-1.0.1.0, and I installed e2fsprogs-1.44.5.wc1-0.el7). I was able to format the MDT a

Re: [lustre-discuss] Migrating files doesn't free space on the OST

2019-01-17 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jan 17, 2019, at 2:38 PM, Jason Williams wrote: > > - I just looked for lfsck but I don't seem to have it. We are running 2.10.4 > so I don't know what version that appeared in. lfsck is handled as a subcommand for lctl. http://doc.lustre.org/lustre_manual.xhtml#dbdoclet.lfsckadmin --
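A minimal sketch of the lctl usage, with the file system name hypothetical:

    lctl lfsck_start -M fsname-MDT0000 -t layout
    lctl get_param mdd.fsname-MDT0000.lfsck_layout   # check status/progress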

Re: [lustre-discuss] index is already in use problem

2019-01-16 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jan 16, 2019, at 4:18 AM, Jae-Hyuck Kwak wrote: > > How can I force --writeconf option? It seems that mkfs.lustre doesn't support > --writeconf option. You will need to use the tunefs.lustre command to do a writeconf. -- Rick Mohr Senior HPC System Administrator National Institute for C
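A sketch of the usual writeconf procedure (device path hypothetical): unmount all targets first, run tunefs.lustre on each, then remount in MGS, MDT, OST order.

    tunefs.lustre --writeconf /dev/sdX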

Re: [lustre-discuss] Odd client behavior with mixed Lustre versions

2019-01-11 Thread Mohr Jr, Richard Frank (Rick Mohr)
Is it possible you have some incompatible ko2iblnd module parameters between the 2.8 servers and the 2.10 clients? If there was something causing LNet issues, that could possibly explain some of the symptoms you are seeing. -- Rick Mohr Senior HPC System Administrator National Institute for Com

Re: [lustre-discuss] Full OST, any way of avoiding it without hanging?

2019-01-07 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jan 7, 2019, at 2:09 PM, Jason Williams wrote: > > One last question, How safe is lfs_migrate? The man page on the installation > says it's UNSAFE for possibly in-use files. The lustre manual doesn't have > the same warning and says something about it being a bit more integrated with

Re: [lustre-discuss] Full OST, any way of avoiding it without hanging?

2019-01-07 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jan 7, 2019, at 12:53 PM, Jason Williams wrote: > > As I have gone through the testing, I think you may be right. I think I > disabled the OST in a slightly different way and that caused issues. > > Do you happen to know where I could find out a bit more about what the "lctl > set_para

Re: [lustre-discuss] Full OST, any way of avoiding it without hanging?

2019-01-07 Thread Mohr Jr, Richard Frank (Rick Mohr)
#3. > > > -- > Jason Williams > Assistant Director > Systems and Data Center Operations. > Maryland Advanced Research Computing Center (MARCC) > Johns Hopkins University > jas...@jhu.edu > > > From: lustre-discuss on behalf of > Jason Williams > Se

Re: [lustre-discuss] Full OST, any way of avoiding it without hanging?

2019-01-06 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jan 5, 2019, at 9:49 PM, Jason Williams wrote: > > I have looked around the internet and found you can disable an OST, but when > I have tried that, any writes (including deletes) to the OST hang the clients > indefinitely. Does anyone know a way to make an OST basically "read-only" >
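One approach that avoids hanging clients -- a sketch, with the file system name and OST index hypothetical -- is to stop new object allocation on the full OST from the MDS instead of deactivating it outright; reads, writes to existing objects, and deletes keep working:

    lctl set_param osp.fsname-OST0002-osc-MDT0000.max_create_count=0
    # later, to resume allocation (20000 is the usual default):
    lctl set_param osp.fsname-OST0002-osc-MDT0000.max_create_count=20000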

Re: [lustre-discuss] Usage for lfs setstripe -o ost_indices

2018-11-09 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Nov 9, 2018, at 11:28 AM, Mohr Jr, Richard Frank (Rick Mohr) > wrote: > > >> On Nov 8, 2018, at 11:44 AM, Ms. Megan Larko wrote: >> >> I have been attempting this command on a directory on a Lustre-2.10.4 >> storage from a Lustre 2.10.1 client a

Re: [lustre-discuss] Usage for lfs setstripe -o ost_indices

2018-11-09 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Nov 8, 2018, at 11:44 AM, Ms. Megan Larko wrote: > > I have been attempting this command on a directory on a Lustre-2.10.4 storage > from a Lustre 2.10.1 client and I fail with the following message: > > lfs setstripe -c 4 -S 1m -o 1,2-4 custTest/ > error on ioctl 0x4008669a for 'custTest
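For comparison, a sketch of the equivalent command with the OST indices written out explicitly (directory name from the quoted post; whether the older 2.10.1 client mishandles the 2-4 range syntax is an open question here):

    lfs setstripe -c 4 -S 1m -o 1,2,3,4 custTest/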

Re: [lustre-discuss] migrating MDS to different infrastructure

2018-10-29 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 29, 2018, at 1:12 AM, Riccardo Veraldi > wrote: > > it is time for me to move my MDS to a different HW infrastructure. > So I was wondering if the following procedure can work. > I have mds1 (old mds) and mds2 (new mds). On the old mds I have a zfs MGS > partition and a zfs MDT partiti

Re: [lustre-discuss] lustre 2.10.5 or 2.11.0

2018-10-19 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 17, 2018, at 7:30 PM, Riccardo Veraldi > wrote: > > anyway especially regarding the OSSes you may eventually need some ZFS module > parameters optimizations regarding vdev_write and vdev_read max to increase > those values higher than default. You may also disable ZIL, change the >

Re: [lustre-discuss] LU-11465 OSS/MDS deadlock in 2.10.5

2018-10-19 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 19, 2018, at 10:42 AM, Marion Hakanson wrote: > > Thanks for the feedback. You're both confirming what we've learned so far, > that we had to unmount all the clients (which required rebooting most of > them), then reboot all the storage servers, to get things unstuck until the > pr

Re: [lustre-discuss] Multihoming Lustre server

2018-10-16 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 16, 2018, at 7:04 AM, Mark Roper wrote: > > I have successfully set up a Lustre filesystem that is multi-homed on two > different TCP NIDs, using the following configuration. > Mount MGS & MDT > >sudo lnetctl lnet configure >sudo lnetctl net del --net tcp >sudo lnetctl net
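A minimal sketch of configuring two TCP NIDs with lnetctl, interface names hypothetical:

    lnetctl lnet configure
    lnetctl net add --net tcp --if eth0
    lnetctl net add --net tcp1 --if eth1
    lnetctl net show     # verify both NIDs are present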

Re: [lustre-discuss] Experience with resizing MDT

2018-09-20 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Sep 19, 2018, at 8:09 PM, Colin Faber wrote: > > Why wouldn't you use DNE? I am considering it as an option, but there appear to be some potential drawbacks. If I use DNE1, then I have to manually create directories on specific MDTs. I will need to monitor MDT usage and make adjustment

[lustre-discuss] Experience with resizing MDT

2018-09-19 Thread Mohr Jr, Richard Frank (Rick Mohr)
Has anyone had recent experience resizing a ldiskfs-backed MDT using the resize2fs tool? We may be purchasing a small lustre file system in the near future with the expectation that it could grow considerably over time. Since we don’t have a clear idea of how many inodes we might need in the f

Re: [lustre-discuss] Lustre Filesystem mounted but having "Input/output error" on df command

2018-09-10 Thread Mohr Jr, Richard Frank (Rick Mohr)
Those are the kind of symptoms you would see if the client is able to connect to the MDS server but not to an OSS server. Certain operations (mount, cd, ls) will work if the MDS server is reachable, even if one or more OSS servers are not reachable. But other operations (“ls -la”, df) require

Re: [lustre-discuss] lustre client not able to lctl ping or mount

2018-09-04 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Sep 4, 2018, at 12:12 PM, Pak Lui wrote: > > I have tried "map_on_demand=16" to the "/etc/modprobe.d/ko2iblnd.conf" that > was suggested. Also tried "map_on_demand=0" as suggested here: > http://wiki.lustre.org/Optimizing_o2iblnd_Performance > > /etc/modprobe.d/ko2iblnd.conf > alias ko2i

Re: [lustre-discuss] migrating MDS to different infrastructure

2018-08-23 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 22, 2018, at 8:10 PM, Riccardo Veraldi > wrote: > > On 8/22/18 3:13 PM, Mohr Jr, Richard Frank (Rick Mohr) wrote: >>> On Aug 22, 2018, at 3:31 PM, Riccardo Veraldi >>> wrote: >>> I would like to migrate this virtual machine to another infras

Re: [lustre-discuss] migrating MDS to different infrastructure

2018-08-22 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 22, 2018, at 3:31 PM, Riccardo Veraldi > wrote: > I would like to migrate this virtual machine to another infrastructure. it is > not simple because the other infrastructure is vmware. > what is the best way to migrate those partitions without incurring any > corruption of data?

Re: [lustre-discuss] Lustre Size Variation after formatiing

2018-08-20 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 20, 2018, at 2:58 AM, ANS wrote: > > 1) CentOS 7.4 > 2) Lustre version 2.11 > 3) MDT LUN Size is 6.5 TB (RAID 10) and after formatting using lustre we are > getting the size as 3.9 TB; when formatted using XFS it shows the accurate size. For Lustre 2.10 and up, the default inode size is 1KB
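The arithmetic is consistent with that: assuming the mkfs default for recent Lustre versions of one 1 KB inode preallocated per 2560 bytes of MDT space, the inode tables consume 1024/2560 = 40% of the device, leaving 6.5 TB x 0.6 ~= 3.9 TB visible to df.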

Re: [lustre-discuss] Lustre 2.10.4 failover

2018-08-13 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 13, 2018, at 2:25 PM, David Cohen > wrote: > > the fstab line I use for mounting the Lustre filesystem: > > oss03@tcp:oss01@tcp:/fsname /storage lustre flock,user_xattr,defaults > 0 0 OK. That looks correct. > the mds is also configured for failover (unsuccessfully) : >

Re: [lustre-discuss] Lustre 2.10.4 failover

2018-08-13 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 13, 2018, at 7:14 AM, David Cohen > wrote: > > I installed a new 2.10.4 Lustre file system. > Running MDS and OSS on the same servers. > Failover wasn't configured at format time. > I'm trying to configure failover node with tunefs without success. > tunefs.lustre --writeconf --erase-
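A sketch of the kind of command involved, with NIDs and device hypothetical; note that --erase-params wipes the existing settings, so the MGS NIDs must be respecified along with the service nodes:

    umount /mnt/ost0    # the target must be unmounted first
    tunefs.lustre --writeconf --erase-params \
        --mgsnode=mds01@tcp --mgsnode=mds02@tcp \
        --servicenode=oss01@tcp --servicenode=oss03@tcp /dev/sdX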

Re: [lustre-discuss] Upgrading ZFS version for Lustre

2018-07-27 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jul 27, 2018, at 1:56 PM, Andreas Dilger wrote: > >> On Jul 27, 2018, at 10:24, Mohr Jr, Richard Frank (Rick Mohr) >> wrote: >> >> I am working on upgrading some Lustre servers. The servers currently run >> lustre 2.8.0 with zfs 0.6.5, and I am

[lustre-discuss] Upgrading ZFS version for Lustre

2018-07-27 Thread Mohr Jr, Richard Frank (Rick Mohr)
I am working on upgrading some Lustre servers. The servers currently run lustre 2.8.0 with zfs 0.6.5, and I am looking to upgrade to lustre 2.10.4 with zfs 0.7.9. I was looking at the manual, and I did not see anything in there that mentioned special steps when changing ZFS versions. Do I ne

Re: [lustre-discuss] lctl ping node28@o2ib report Input/output error

2018-06-28 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jun 27, 2018, at 4:44 PM, Mohr Jr, Richard Frank (Rick Mohr) > wrote: > > >> On Jun 27, 2018, at 3:12 AM, yu sun wrote: >> >> client: >> root@ml-gpu-ser200.nmg01:~$ mount -t lustre >> node28@o2ib1:node29@o2ib1:/project /mnt/lustre_data >&g

Re: [lustre-discuss] lctl ping node28@o2ib report Input/output error

2018-06-27 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jun 27, 2018, at 3:12 AM, yu sun wrote: > > client: > root@ml-gpu-ser200.nmg01:~$ mount -t lustre > node28@o2ib1:node29@o2ib1:/project /mnt/lustre_data > mount.lustre: mount node28@o2ib1:node29@o2ib1:/project at /mnt/lustre_data > failed: Input/output error > Is the MGS running? > root@m

Re: [lustre-discuss] lctl ping node28@o2ib report Input/output error

2018-06-26 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jun 27, 2018, at 12:52 AM, yu sun wrote: > > I have create file /etc/modprobe.d/lustre.conf with content on all mdt ost > and client: > root@ml-gpu-ser200.nmg01:~$ cat /etc/modprobe.d/lustre.conf > options lnet networks="o2ib1(eth3.2)" > and I exec command line : lnetctl lnet configure --

Re: [lustre-discuss] Lustre on native ZFS encryption

2018-05-02 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On May 2, 2018, at 10:37 AM, Mohr Jr, Richard Frank (Rick Mohr) > wrote: > > >> On May 2, 2018, at 9:59 AM, Mark Miller wrote: >> >> Since I have the Lustre source code, I can start looking through it to see >> if I can find where the Lustre mount sy

Re: [lustre-discuss] Lustre on native ZFS encryption

2018-05-02 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On May 2, 2018, at 9:59 AM, Mark Miller wrote: > > Since I have the Lustre source code, I can start looking through it to see if > I can find where the Lustre mount system call may be getting hung up. I have > no idea... but it feels like the Lustre mount may be trying to read something >

Re: [lustre-discuss] varying sequential read performance.

2018-04-05 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Apr 5, 2018, at 11:31 AM, John Bauer wrote: > > I don't have access to the OSS so I can't report on the Lustre settings. I > think the client side max cached is 50% of memory. Looking at your cache graph, that looks about right. > After speaking with Doug Petesch of Cray, I thought I wou

Re: [lustre-discuss] varying sequential read performance.

2018-04-05 Thread Mohr Jr, Richard Frank (Rick Mohr)
John, I had a couple of thoughts (though not sure if they are directly relevant to your performance issue): 1) Do you know what caching settings are applied on the lustre servers? This could have an impact on performance, especially if your tests are being run while others are doing IO on the

Re: [lustre-discuss] Adding a servicenode (failnode) to existing OSTs

2018-04-04 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Apr 3, 2018, at 1:14 PM, Steve Barnet wrote: > > There appear to be a > couple ways that this could be done: > > a) Add the service nodes: > tunefs.lustre --servicenode=nid,nid /dev/ > > b) Add a failover node: > tunefs.lustre --param="failover.node= /dev/ The first one is the prefer
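A sketch of the preferred form with concrete placeholders (NIDs and device hypothetical); with --servicenode, both nodes are declared as equal service nodes, with no primary/failover distinction:

    tunefs.lustre --servicenode=10.0.0.1@o2ib --servicenode=10.0.0.2@o2ib /dev/sdX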

[lustre-discuss] Question about lctl changelog_deregister

2018-01-26 Thread Mohr Jr, Richard Frank (Rick Mohr)
I have started playing around with Lustre changelogs, and I have noticed a behavior with the “lctl changelog_deregister” command that I don’t understand. I tried running a little test by enabling changelogs on my MDS server: [root@server ~]# lctl --device orhydra-MDT changelog_register orhy
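For reference, a minimal sketch of the register/consume/deregister cycle, with the file system name and reader id hypothetical:

    lctl --device fsname-MDT0000 changelog_register      # prints a reader id, e.g. cl1
    lfs changelog fsname-MDT0000                         # dump pending records
    lctl --device fsname-MDT0000 changelog_deregister cl1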

Re: [lustre-discuss] Designing a new Lustre system

2017-12-20 Thread Mohr Jr, Richard Frank (Rick Mohr)
My $0.02 below. > On Dec 20, 2017, at 11:21 AM, E.S. Rosenberg > wrote: > > 1. After my recent experience with failover I wondered is there any reason > not to set all machines that are within reasonable cable range as potential > failover nodes so that in the very unlikely event of both mach

Re: [lustre-discuss] Lustre compilation error

2017-11-30 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Nov 29, 2017, at 8:35 PM, Dilger, Andreas wrote: > > Would you be able to open a ticket for this, and possibly submit a patch to > fix the build? I can certainly open a ticket, but I’m afraid I don’t know what needs to be fixed so I can’t provide a patch. -- Rick Mohr Senior HPC System

Re: [lustre-discuss] Lustre compilation error

2017-11-29 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 18, 2017, at 9:44 AM, parag_k wrote: > > > I got the source from github. > > My configure line is- > > ./configure --disable-client > --with-kernel-source-header=/usr/src/kernels/3.10.0-514.el7.x86_64/ > --with-o2ib=/usr/src/ofa_kernel/default/ > Are you still running into this i

Re: [lustre-discuss] mdt mounting error

2017-11-01 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Nov 1, 2017, at 7:18 AM, Parag Khuraswar wrote: > > For mgt – > mkfs.lustre --servicenode=10.2.1.204@o2ib --servicenode=10.2.1.205@o2ib --mgs > /dev/mapper/mpathc > > For mdt > mkfs.lustre --fsname=home --mgsnode=10.2.1.204@o2ib --mgsnode=10.2.1.205@o2ib > --servicenode=10.2.1.204@o2ib

Re: [lustre-discuss] 1 MDS and 1 OSS

2017-10-30 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 30, 2017, at 4:46 PM, Brian Andrus wrote: > > Someone please correct me if I am wrong, but that seems a bit large of an > MDT. Of course drives these days are pretty good sized, so the extra is > probably very inexpensive. That probably depends on what the primary usage will be. If

Re: [lustre-discuss] Lustre routing help needed

2017-10-30 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 30, 2017, at 8:47 AM, Kevin M. Hildebrand wrote: > > All of the hosts (client, server, router) have the following in ko2iblnd.conf: > > alias ko2iblnd-opa ko2iblnd > options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024 > concurrent_sends=256 ntx=2048 map_on_demand=32

Re: [lustre-discuss] Linux users are not able to access lustre folders

2017-10-20 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 20, 2017, at 11:37 AM, Ravi Bhat wrote: > > Thanks, I have created user (luser6) in client as well as in lustre servers. > But I get the same error as > No directory /home/luser6 > Logging in with home="/". > > But now I can cd /home/luser6 manually and create files or folders. Are

Re: [lustre-discuss] Linux users are not able to access lustre folders

2017-10-20 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 20, 2017, at 10:50 AM, Ravi Konila wrote: > > Can you please guide me how do I do it, I mean install NIS on servers and > clients? > Is it mandatory to setup NIS? > NIS is not mandatory. You just need a way to ensure that user accounts are visible to the lustre servers. You could

[lustre-discuss] OSTs remounting read-only after ldiskfs journal error

2017-10-19 Thread Mohr Jr, Richard Frank (Rick Mohr)
Recently, I ran into an issue where several of the OSTs on my Lustre file system went read-only. When I checked the logs, I saw messages like these for several OSTs: Oct 6 23:27:11 haven-oss2 kernel: LDISKFS-fs: ldiskfs_getblk:834: aborting transaction: error 28 in __ldiskfs_handle_dirty_meta

Re: [lustre-discuss] Lustre poor performance

2017-08-23 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 22, 2017, at 7:14 PM, Riccardo Veraldi > wrote: > > On 8/22/17 9:22 AM, Mannthey, Keith wrote: >> Younot expected. >> > yes they are automatically used on my Mellanox and the script ko2iblnd-probe > seems like not working properly. The ko2iblnd-probe script looks in /sys/class/infin

Re: [lustre-discuss] Lustre Quotas

2017-08-03 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 3, 2017, at 11:48 AM, Jackson, Gary L. > wrote: > > Are quotas well supported and robust on Lustre? As far as I know, they are. But I mainly use quotas for reporting purposes. I have not had much experience with enforcing quota limits in Lustre. > What is the performance impact,

Re: [lustre-discuss] Spiking OSS load?

2017-08-03 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 1, 2017, at 3:07 PM, Jason Williams wrote: > 1) Is 512 threads a reasonable setting or should it be lower? Since your servers have enough memory to support 512 threads, then it is probably reasonable. If your server load is ~100, that probably means most of those threads are si

Re: [lustre-discuss] New Lustre Installation

2017-05-22 Thread Mohr Jr, Richard Frank (Rick Mohr)
You might want to start by looking at these online tutorials: http://lustre.ornl.gov/lustre101-courses/ -- Rick Mohr Senior HPC System Administrator National Institute for Computational Sciences http://www.nics.tennessee.edu > On May 21, 2017, at 6:19 AM, Ravi Konila wrote: > > Hi There, >

Re: [lustre-discuss] Lustre 2.8.0 - MDT/MGT failing to mount

2017-05-04 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On May 4, 2017, at 11:03 AM, Steve Barnet wrote: > > On 5/4/17 10:01 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote: >> Did you try doing a writeconf to regenerate the config logs for the file >> system? > > > Not yet, but quick enough to try. Do this for the

Re: [lustre-discuss] Lustre 2.8.0 - MDT/MGT failing to mount

2017-05-04 Thread Mohr Jr, Richard Frank (Rick Mohr)
Did you try doing a writeconf to regenerate the config logs for the file system? -- Rick Mohr Senior HPC System Administrator National Institute for Computational Sciences http://www.nics.tennessee.edu > On May 4, 2017, at 10:03 AM, Steve Barnet wrote: > > Hi all, > > This is Lustre 2.8.0 co

Re: [lustre-discuss] building of lustre-client fails

2017-05-04 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On May 3, 2017, at 10:56 PM, Riccardo Veraldi > wrote: > > I am building lustre-client from src rpm on RHL73. > > it fails with this error during the install process: > > + echo /etc/init.d/lnet > + echo /etc/init.d/lsvcgss > + find /root/rpmbuild/BUILDROOT/lustre-client-2.9.0-1.el7.x86_64

Re: [lustre-discuss] operation ldlm_queue failed with -11

2017-05-03 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On May 3, 2017, at 12:23 PM, Patrick Farrell wrote: > > That reasoning is sound, but this is a special case. -11 (-EAGAIN) on > ldlm_enqueue is generally OK... > > LU-8658 explains the situation (it's POSIX flocks), so I'm going to reference > that rather than repeat it here. > > https://

Re: [lustre-discuss] operation ldlm_queue failed with -11

2017-05-03 Thread Mohr Jr, Richard Frank (Rick Mohr)
I think that -11 is EAGAIN, but I don’t know how to interpret what that means in the context of Lustre locking. I assume these messages are from the clients and the changing “x” portion is just the fact that each client has a different identifier. So if you have multiple clients complainin

Re: [lustre-discuss] client fails to mount

2017-04-24 Thread Mohr Jr, Richard Frank (Rick Mohr)
This might be a long shot, but have you checked for possible firewall rules that might be causing the issue? I’m wondering if there is a chance that some rules were added after the nodes were up to allow Lustre access, and when a node got rebooted, it lost the rules. -- Rick Mohr Senior HPC Sy

Re: [lustre-discuss] Lustre [2.8.0] flock Functionality

2017-03-28 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Mar 28, 2017, at 1:49 PM, DeWitt, Chad wrote: > > We've encountered several programs that require flock, so we are now > investigating enabling flock functionality. However, the Lustre manual > includes a passage in regards to flocks which gives us pause: > > "Warning > This mode affect

Re: [lustre-discuss] Odd quota behavior with Lustre/ZFS

2017-02-16 Thread Mohr Jr, Richard Frank (Rick Mohr)
ter doing chgrp for large subtree. IIRC, for three > groups; counts were small different "negative" numbers, not 21. > I can get more details tomorrow. > > Alex > >> On Feb 9, 2017, at 5:14 PM, Mohr Jr, Richard Frank (Rick Mohr) >> wrote: >> >

Re: [lustre-discuss] Virtual servers

2017-02-16 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Feb 16, 2017, at 9:56 AM, Jon Tegner wrote: > > I have three (physical) machines, and each one has a virtual machine on it > (KVM). On one of the virtual machines there is an MDS and on two of them > there are OSS:es installed. > > All systems use CentOS-7.3 and Lustre 2.9.0, and I mou

[lustre-discuss] Odd quota behavior with Lustre/ZFS

2017-02-09 Thread Mohr Jr, Richard Frank (Rick Mohr)
I recently set up a Lustre 2.8 file system that uses ZFS for the backend storage (both on the MDT and OSTs). When I was doing some testing, I noticed that the output from lfs quota seemed odd. While the quota information for the amount of used space seemed correct, the info on the number of fi

Re: [lustre-discuss] Lustre Client hanging on mount

2017-01-13 Thread Mohr Jr, Richard Frank (Rick Mohr)
t on this. > > -Original Message----- > From: Mohr Jr, Richard Frank (Rick Mohr) [mailto:rm...@utk.edu] > Sent: Thursday, January 12, 2017 10:51 AM > To: Jeff Slapp > Cc: lustre-discuss@lists.lustre.org > Subject: Re: [lustre-discuss] Lustre Client hanging on mount > > I noticed t

Re: [lustre-discuss] Lustre Client hanging on mount

2017-01-12 Thread Mohr Jr, Richard Frank (Rick Mohr)
I noticed that you appear to have formatted the MDT with the file system name “mgsZFS” while the OST was formatted with the file system name “ossZFS”. The same name needs to be used on all MDTs/OSTs in the same file system. Until that is fixed, your file system won’t work properly. -- Rick Mo
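A sketch of consistent formatting, with names and devices hypothetical -- the --fsname value must match on every target in the file system:

    mkfs.lustre --fsname=demo --mgs --mdt --index=0 /dev/mdt_dev
    mkfs.lustre --fsname=demo --ost --index=0 --mgsnode=mds@tcp /dev/ost_dev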

Re: [lustre-discuss] MGS failover problem

2017-01-11 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jan 11, 2017, at 12:39 PM, Vicker, Darby (JSC-EG311) > wrote: > >>> Getting failover right over multiple separate networks can be a real >>> hair-pulling experience. >> >> Darby: Do you have the option of (at least temporarily) running the file >> system with only Infiniband configured?

Re: [lustre-discuss] MGS failover problem

2017-01-11 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jan 11, 2017, at 11:58 AM, Ben Evans wrote: > > Getting failover right over multiple separate networks can be a real > hair-pulling experience. Darby: Do you have the option of (at least temporarily) running the file system with only Infiniband configured? If you could set up the file sy

Re: [lustre-discuss] Lustre with Hadoop (Hortonworks Data Platform)

2017-01-09 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jan 9, 2017, at 4:21 AM, Markham Benjamin wrote: > > I was wondering about the use cases of using Lustre with Hadoop. One key thing about > HDFS is that it runs on commodity hardware. Unless I’m being misinformed, > Lustre doesn’t exactly run on commodity hardware. I don’t think you want to use

Re: [lustre-discuss] MGS failover problem

2017-01-09 Thread Mohr Jr, Richard Frank (Rick Mohr)
Have you tried performing a writeconf to regenerate the lustre config log files? This can sometimes fix the problem by making sure that everything is consistent. (A writeconf is often required when making NID or failover changes.) I think you could also use that opportunity to correct your -

Re: [lustre-discuss] Round robin allocation (in general and in buggy 2.5.3)

2016-12-20 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Dec 20, 2016, at 10:48 AM, Jessica Otey wrote: > > qos_threshold_rr > > This setting controls how much consideration should be given to QoS in > allocation > The higher this number, the more QOS is taken into consideration. > When set to 100%, Lustre ignores the QoS variable and hits all

Re: [lustre-discuss] [UNTRUSTED] Re: Check clients connected?

2016-12-15 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Dec 15, 2016, at 9:30 AM, Phill Harvey-Smith > wrote: > > On 15/12/2016 14:21, Hanley, Jesse A. wrote: >> I forgot: You should also be able to use lshowmount. > > Humm that works on the old server, but can't find the command on the new > centos 7.2 server, which I installed from RPMs I s

Re: [lustre-discuss] Lustre newbie problems: formatting disk or partition with Lustre filesystem fails

2016-11-28 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Nov 28, 2016, at 9:58 AM, Stefano Turolla > wrote: > > thanks for the quick reply, I am maybe doing the wrong thing. What I am > trying to achieve is to have a Lustre volume to be shared among the > nodes, and the 30TB is the size of existing storage. > > Should I create a separate (and m

Re: [lustre-discuss] Quick ZFS pool question?

2016-10-13 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 13, 2016, at 12:32 PM, E.S. Rosenberg > wrote: > > I thought ZFS was only recommended for OSTs and not for MDTs/MGS? ZFS usually has lower metadata performance for MDT than using ldiskfs which is why some people recommend ZFS only for the OSTs. However, ZFS has features (like snaps

Re: [lustre-discuss] Still having problems Lustre 2.8 Centos 7.2

2016-09-28 Thread Mohr Jr, Richard Frank (Rick Mohr)
Did you check to make sure there are no firewalls running that could be blocking traffic? -- Rick Mohr Senior HPC System Administrator National Institute for Computational Sciences http://www.nics.tennessee.edu > On Sep 27, 2016, at 10:12 AM, Phill Harvey-Smith > wrote: > > Hi all > > I'm st

Re: [lustre-discuss] More problems setting things up....

2016-09-21 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Sep 21, 2016, at 5:08 AM, Phill Harvey-Smith > wrote: > > Sep 21 09:44:29 oric kernel: osd_zfs: disagrees about version of symbol > dsl_prop_register > Sep 21 09:44:29 oric kernel: osd_zfs: Unknown symbol dsl_prop_register (err > -22) > Sep 21 09:44:29 oric kernel: osd_zfs: disagrees abo

Re: [lustre-discuss] Mount lustre client with MDS/MGS backup

2016-09-20 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Sep 19, 2016, at 2:40 AM, Pardo Diaz, Alfonso > wrote: > > I'm still having the same problem in my system. My clients are stuck on the > primary MDS, which is down, and they don't use the backup (service MDS), but > only when trying to connect there the first time. > As I said in previous messa

Re: [lustre-discuss] Mount lustre client with MDS/MGS backup

2016-09-14 Thread Mohr Jr, Richard Frank (Rick Mohr)
/ Sola nº 1; 10200 Trujillo, ESPAÑA > Tel: +34 927 65 93 17 Fax: +34 927 32 32 37 > > > > > De: Ben Evans [bev...@cray.com] > Enviado el: jueves, 01 de septiembre de 2016 15:25 > Para: Pardo Diaz, Alfonso; Mohr Jr, Richard Frank (Rick Mohr) >

Re: [lustre-discuss] Mount lustre client with MDS/MGS backup

2016-08-31 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 31, 2016, at 8:12 AM, Pardo Diaz, Alfonso > wrote: > > I mount my clients: mount -t lustre mds1@o2ib:mds2@o2ib:/fs /mnt/fs > > 1) When both MDS are OK I can mount without problems > 2) If the MDS1 is down and my clients have lustre mounted, they use MDS2 > without problems > 3) If th

Re: [lustre-discuss] Does an updated version exist?

2016-08-16 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 16, 2016, at 6:55 AM, E.S. Rosenberg > wrote: > > I just found this paper: > http://wiki.lustre.org/images/d/da/Understanding_Lustre_Filesystem_Internals.pdf > > It looks interesting but it deals with lustre 1.6 so I am not sure how > relevant it still is… Some of the information

Re: [lustre-discuss] proper procedure after MDT kernel panic

2016-08-11 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 11, 2016, at 5:42 AM, E.S. Rosenberg > wrote: > > Our MDT suffered a kernel panic (which I will post separately), the OSSs > stayed alive but the MDT was out for some time while nodes still tried to > interact with lustre. > > So I have several questions: > a. what happens to proces

Re: [lustre-discuss] tune2fs being blocked by MMP

2016-08-04 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 4, 2016, at 9:39 AM, Gibbins, Faye wrote: > > Yes it is mounted. But that's not always a problem. We have a test lustre > cluster where it's mounted and the tune2fs works fine. But it fails in > production. > > Production have failover turned on for the OSTs. Something absent on that

Re: [lustre-discuss] poor performance on reading small files

2016-08-03 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 3, 2016, at 1:30 PM, Ben Evans wrote: > > I thought read caching was disabled by default, as the kernel's default > handling of pages was better. You might be right. It has been a while since I have set up a Lustre file system from scratch, and I haven’t done so for newer versions

Re: [lustre-discuss] poor performance on reading small files

2016-08-03 Thread Mohr Jr, Richard Frank (Rick Mohr)
Do you have the Lustre read caching feature enabled? I think it should be on by default, but you might want to check. If the files are only 20 KB, then I would think the Lustre OSS nodes could keep them in memory most of the time to speed up access (unless of course this is a metadata bottlene
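A sketch of checking and enabling the OSS-side caches (run on the OSS nodes; parameter names as in the 2.x obdfilter tunables):

    lctl get_param obdfilter.*.read_cache_enable
    lctl set_param obdfilter.*.read_cache_enable=1
    lctl set_param obdfilter.*.writethrough_cache_enable=1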

Re: [lustre-discuss] tune2fs being blocked by MMP

2016-08-02 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Aug 2, 2016, at 10:38 AM, Gibbins, Faye wrote: > > > tune2fs: MMP: device currently active while trying to open > /dev/mapper/scratch--1--5-scratch_3 > > MMP error info: last update: Tue Aug 2 15:34:09 2016 > > node: edi-vf-1-5.ad.cirrus.com device: dm-19 > > 0 edi-vf-1-5:~# Is the d

Re: [lustre-discuss] ​luster client mount issues

2016-08-01 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jul 28, 2016, at 9:54 PM, sohamm wrote: > > Client is configured for IB interface. So it looks like there might be something wrong with the LNet config on the client then. Based on the output from “lctl ping” that you ran from the server, the client only reported a NID on the tcp netwo

Re: [lustre-discuss] ​luster client mount issues

2016-07-28 Thread Mohr Jr, Richard Frank (Rick Mohr)
Is the client supposed to have an IB interface configured, or is it just supposed to mount over ethernet? -- Rick Mohr Senior HPC System Administrator National Institute for Computational Sciences http://www.nics.tennessee.edu > On Jul 20, 2016, at 2:09 PM, sohamm wrote: > > Hi > > Any guid

Re: [lustre-discuss] lnet router lustre rpm compatibility

2016-06-20 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Jun 20, 2016, at 5:00 PM, Jessica Otey wrote: > > All, > I am in the process of preparing to upgrade a production lustre system > running 1.8.9 to 2.4.3. > This current system has 2 lnet routers. > Our plan is to perform the upgrade in 2 stages: > 1) Upgrade the MDS and OSSes to 2.4.3, lea

Re: [lustre-discuss] stripe count recommendation, and proposal for auto-stripe tool

2016-05-19 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On May 19, 2016, at 12:46 PM, Nathan Dauchy - NOAA Affiliate > wrote: > > Thanks for pointing out the approach of trying to keep a single file from > using too much space on an OST. It looks like the Log2(size_in_GB) method I > proposed works well up to a point, but breaks down in the capa

Re: [lustre-discuss] stripe count recommendation, and proposal for auto-stripe tool

2016-05-18 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On May 18, 2016, at 1:22 PM, Nathan Dauchy - NOAA Affiliate > wrote: > > Since there is the "increased overhead" of striping, and weather applications > do unfortunately write MANY tiny files, we usually keep the filesystem > default stripe count at 1. Unfortunately, there are several user

Re: [lustre-discuss] Lustre filesystem suddenly not allowing *new* mounts, but exciting mounts continue working.

2016-05-17 Thread Mohr Jr, Richard Frank (Rick Mohr)
Have you tried doing a writeconf to regenerate the config logs? -- Rick Mohr Senior HPC System Administrator National Institute for Computational Sciences http://www.nics.tennessee.edu > On May 17, 2016, at 12:08 PM, Randall Radmer wrote: > > We've been working with lustre systems for a few ye

Re: [lustre-discuss] MDS crashing: unable to handle kernel paging request at 00000000deadbeef (iam_container_init+0x18/0x70)

2016-04-13 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Apr 13, 2016, at 2:53 PM, Mark Hahn wrote: > thanks, we'll be trying the LU-5726 patch and cpu_npartitions things. > it's quite a long thread - do I understand correctly that periodic > vm.drop_caches=1 can postpone the issue? Not really. I was periodically dropping the caches as a way to

Re: [lustre-discuss] MDS crashing: unable to handle kernel paging request at 00000000deadbeef (iam_container_init+0x18/0x70)

2016-04-13 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Apr 13, 2016, at 8:02 AM, Tommi T wrote: > > We had to use lustre-2.5.3.90 on the MDS servers because of memory leak. > > https://jira.hpdd.intel.com/browse/LU-5726 Mark, If you don’t have the patch for LU-5726, then you should definitely try to get that one. If nothing else, reading t

Re: [lustre-discuss] MDS crashing: unable to handle kernel paging request at 00000000deadbeef (iam_container_init+0x18/0x70)

2016-04-13 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Apr 12, 2016, at 6:46 PM, Mark Hahn wrote: > > all our existing Lustre MDSes run happily with vm.zone_reclaim_mode=0, > and making this one consistent appears to have resolved a problem > (in which one family of lustre kernel threads would appear to spin, > "perf top" showing nearly all tim
