Re: [lustre-discuss] lfsck repair quota

2019-04-17 Thread Martin Hecht
Dear Fernando, I'm not sure if those files contribute to the quota, but I would assume that the ones on the OSTs consume disk quota and the ones on the MDT consume inode quota. As long as they are in the lost+found directory they are not visible to the users, but they may contain data which

Re: [lustre-discuss] lfsck repair quota

2019-04-16 Thread Martin Hecht
Are there a lot of inodes moved to lost+found by the fsck, which contribute to the occupied quota now? - Original Message - From: Fernando Pérez To: lustre-discuss@lists.lustre.org Sent: Tue, 16 Apr 2019 16:24:13 +0200 (CEST) Subject: Re: [lustre-discuss] lfsck repair quota Thank

Re: [lustre-discuss] Command line tool to monitor Lustre I/O ?

2018-12-21 Thread Martin Hecht
Hello Roland, there is a nice collection of lustre monitoring tools on the lustre wiki: http://wiki.lustre.org/Lustre_Monitoring_and_Statistics_Guide which also contains a couple of references. One of them is lltop, which has already been mentioned a couple of times and that's what came to my

Re: [lustre-discuss] ko2iblnd optimizations for EDR

2018-11-08 Thread Martin Hecht
On 11/7/18 9:44 PM, Riccardo Veraldi wrote: > Anyway I was wondering if something different is needed for mlx5 and > what are the suggested values in that case? > > Anyone has experience with mlx5 LNET performance tunings? Hi Riccardo, We have recently integrated mlx5 nodes into our fabric,

Re: [lustre-discuss] building lustre 2.11.50 on CentOS 7.4

2018-04-10 Thread Martin Hecht
:55 PM, Martin Hecht wrote: > Hi, > > I'm trying to build lustre 2.11 from source, with ldiskfs on CentOS 7.4. > > patching the kernel for ldiskfs worked fine, I have installed and booted > the patched kernel as well as the devel-rpm,  but when I run `make rpms` > it exits with

[lustre-discuss] building lustre 2.11.50 on CentOS 7.4

2018-04-09 Thread Martin Hecht
Hi, I'm trying to build lustre 2.11 from source, with ldiskfs on CentOS 7.4. patching the kernel for ldiskfs worked fine, I have installed and booted the patched kernel as well as the devel-rpm,  but when I run `make rpms` it exits with the following errors: Processing files:

Re: [lustre-discuss] Mixed size OST's

2018-03-16 Thread Martin Hecht
On 03/15/2018 04:48 PM, Steve Thompson wrote: > If I go with one OST per system (one zpool comprising 8 x 6 RAIDZ2 > vdevs), I will have a lustre f/s comprised of two 60 TB OST's and two > 192 TB OST's (minus RAIDZ2 overhead). This is obviously a big mismatch > between OST sizes. Depending on how

Re: [lustre-discuss] Fwd: FW: mdt mounting error

2017-11-09 Thread Martin Hecht
Hi Parag, can you lctl ping 10.2.1.204@o2ib from the mgs node and from the mds now? I have seen on the list that you were able to load the modules, but well, if lnet is not working on the ib this might be the reason for the errors you are seeing. Regards, Martin On 11/08/2017 09:15 AM, Parag

Re: [lustre-discuss] ldiskfsprogs

2017-10-30 Thread Martin Hecht
Hi Parag, please reply to the list or keep it in cc at least On 10/30/2017 01:21 PM, Parag Khuraswar wrote: > Hi Martin, > > The problem got resolved. > But I am not able to see ib in 'lctl list_nids' output > My lnet.conf file entry is 'options lnet networks=o2ib(ib0)' This file is > not

Re: [lustre-discuss] ldiskfsprogs

2017-10-30 Thread Martin Hecht
Hi, On 10/30/2017 09:56 AM, Parag Khuraswar wrote: > Hi, > > I am installing lustre cloned from github. Hmm... there are a few lustre related repositories on github. I would prefer the upstream Lustre git repository managed by Intel git://git.hpdd.intel.com unless you are interested in specific

Re: [lustre-discuss] Lustre [2.8.0] flock Functionality

2017-03-29 Thread Martin Hecht
Hello, we use the flock mount option on all our lustre systems (currently some 2.5 versions) and are not aware of any issues due to that. If your applications run on a single node (or require locks only locally) you could also try localflock. localflock has less performance impact than the

Re: [lustre-discuss] many 'ksym' packages required

2016-12-20 Thread Martin Hecht
I have seen this, too, on SL6: the build went smoothly, but installation failed. A few months before 2.9 was tagged on master, the build and install went smoothly. I'm not using zfs, by the way. Unfortunately, I haven't yet found the time to investigate this more deeply. Cheers, Martin On 12/20/2016

Re: [lustre-discuss] Mounting Lustre over IB-to-Ethernet gateway

2016-08-02 Thread Martin Hecht
Hi Kevin, I think your proposed lnet config line is correct and it would add tcp0. If you add a new lnet on the servers you have to reload the lnet module, which implies that you have to restart lustre (you don't have to reboot if unloading the modules works smoothly, i.e. unmounting all targets,

Re: [lustre-discuss] Analog of ll_recover_lost_found_objs for MDS

2016-07-27 Thread Martin Hecht
Hi James, I'm not aware of a ready-to-use tool, but if you have captured the output of e2fsck you can use that as a basis for a script that puts the files back to their original location. e2fsck usually prints out the full path and the inode numbers of the files/directories which it moves to

Re: [lustre-discuss] luster client mount issues

2016-07-21 Thread Martin Hecht
Hi, I think your client doesn't have the o2ib lnet (it should appear in the output of the lctl ping, even if you ping on the tcp lnet). In your /etc/modprobe.d/lustre.conf o2ib is associated with the ib0 interface, but your /var/log/messages talks about ib1. If it is a dual port card where just

Re: [lustre-discuss] rpmbuild error with lustre-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.x86_64.src.rpm

2016-07-05 Thread Martin Hecht
c file. While heartbeat is one option for HA on > servers, it definitely should not be required. Could you please file a Jira > ticket with details. > > Cheers, Andreas > >> On Jun 29, 2016, at 11:36, Martin Hecht <he...@hlrs.de> wrote: >> >> Hello, >> >

Re: [lustre-discuss] rpmbuild error with lustre-2.8.0-3.10.0_327.3.1.el7_lustre.x86_64.x86_64.src.rpm

2016-06-29 Thread Martin Hecht
Hello, I have just seen that you managed to mount with a different kernel, but let me come back to this error when building your own rpms for a specific kernel. Regardless of whether you use it or not, I believe on lustre servers you need to have heartbeat installed nowadays. This is not installed by

Re: [lustre-discuss] Apache via NFS via Lustre

2016-03-09 Thread Martin Hecht
I think whether the apache uid and gid need to be known on the mds depends on whether you have configured mdt.group_upcall or not. If not, the group memberships are checked on the lustre client against its /etc/group (or ldap if that's configured). On 03/09/2016 06:59 AM, Philippe

Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)

2015-11-05 Thread Martin Hecht
Hi, comments inline... On 11/04/2015 01:34 PM, Patrick Farrell wrote: > Our observation at the time was that lfsck did not add the fid to the .. > dentry unless there was already space in the appropriate location. Ok, I might have been wrong in this point and some manual mv by the users was

Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)

2015-11-04 Thread Martin Hecht
On 11/04/2015 03:23 AM, Patrick Farrell wrote: > PAF: Remember, the specific conditions are pretty tight. Created under 1.8, > not empty (if it's empty, the .. dentry is not misplaced when moved) but also > non-htree, then moved with dirdata enabled, and then grown to this larger > size. How

Re: [lustre-discuss] recovery MDT ".." directory entries (LU-5626)

2015-11-02 Thread Martin Hecht
Hi Chris and Patrick, I was sick last week, so I only found this conversation today, sorry. On 10/27/2015 05:06 PM, Patrick Farrell wrote: > If you read LU-5626 carefully, there's an explanation of the exact nature of > the damage, and having that should let you make partial recoveries

Re: [lustre-discuss] Lustre 2.5.3 - OST unable to connect to MGS

2015-10-09 Thread Martin Hecht
Hi, you can use ll_recover_lost_found_objs to recover the files in lost+found to their original location. I think this should be the first step. Also these messages look a bit scary to me: Oct 7 13:02:04 OSS50 kernel: LustreError: 0-0: Trying to start OBD Lustre-OST003b_UUID using the wrong

Re: [lustre-discuss] Remove failnode parameter

2015-09-28 Thread Martin Hecht
p 24, 2015 at 1:43 AM, Martin Hecht <he...@hlrs.de> wrote: > >> On 09/23/2015 02:38 AM, Exec Unerd wrote: >>> I made a typo when setting failnode/servicenode parameters, but I can't >>> figure out how to remove the failnode parameter entirely >>>

Re: [lustre-discuss] Multiple MGS interfaces config

2015-09-28 Thread Martin Hecht
2ib0:/testfs /mnt/testfs > both: mount -v -t lustre 172.16.10.1@o2ib0,192.168.10.1@tcp0:/testfs I think here it should be a colon between the two MGS nids: mount -v -t lustre 172.16.10.1@o2ib0:192.168.10.1@tcp0:/testfs > /mnt/testfs > > Everything should be happy? > > O

Re: [lustre-discuss] Multiple MGS interfaces config

2015-09-24 Thread Martin Hecht
On 09/23/2015 02:39 AM, Exec Unerd wrote: > My environment has both TCP and IB clients, so my Lustre config has to > accommodate both, but I'm having a hard time figuring out the proper syntax > for it. Theoretically, I should be able to use comma-separated interfaces > in the mgsnode parameter

Re: [lustre-discuss] Remove failnode parameter

2015-09-24 Thread Martin Hecht
On 09/23/2015 02:38 AM, Exec Unerd wrote: > I made a typo when setting failnode/servicenode parameters, but I can't > figure out how to remove the failnode parameter entirely > > I can change the failnode NIDs, but I can't figure out how to completely > remove "failnode" from the system. > > Does

Re: [lustre-discuss] Multiple MGS interfaces config

2015-09-24 Thread Martin Hecht
On 09/24/2015 05:33 PM, Chris Hunter wrote: > [...] >>2. What's the best way to trace the TCP client interactions to see >> where >>it's breaking down? > If lnet is running on the client, you can try "lctl ping" > eg) lctl ping 172.16.10.1@o2ib > > I believe a lustre mount uses ipoib for

Re: [lustre-discuss] 1.8 client on 3.13.0 kernel

2015-09-14 Thread Martin Hecht
chris hunter > >> On 9/10/15 11:17 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote: >>> Lewis, >>> >>> I did an upgrade from Lustre 1.8.6 to 2.4.3 on our servers, and for the >>> most part things went pretty good. I’ll chime in on a couple of Martin’s >

Re: [lustre-discuss] 1.8 client on 3.13.0 kernel

2015-09-11 Thread Martin Hecht
tre 1.8.6 to 2.4.3 on our servers, and for >> the most part things went pretty good. I’ll chime in on a couple of >> Martin’s points and mention a few other things. >> >>> On Sep 10, 2015, at 9:30 AM, Martin Hecht <he...@hlrs.de> wrote: >>> >>> In any case th

Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?

2015-09-11 Thread Martin Hecht
On 09/11/2015 05:23 AM, Dilger, Andreas wrote: > On 2015/09/10, 6:54 PM, "Chris Hunter" wrote: > >> We experienced file corruption on several OSTs. We proceeded through >> recovery using e2fsck & ll_recover_lost_found_obj tools. >> Following these steps, e2fsck came out

Re: [lustre-discuss] 1.8 client on 3.13.0 kernel

2015-09-10 Thread Martin Hecht
Hi Lewis, it's difficult to tell how much data loss was actually related to the lustre upgrade itself. We have upgraded 6 file systems and we had to do it more or less in one shot, because at that time they were using a common MGS server. All servers of one file system must be on the same level

Re: [lustre-discuss] 1.8 client on 3.13.0 kernel

2015-09-09 Thread Martin Hecht
Hi Lewis, Yes, for lustre 2.x you have to "upgrade" the OS, which basically means a reinstall of a CentOS 6.x (because there is no upgrade path across major releases), then install the lustre packages and the lustre-patched kernel, and then the pain begins. We had a lot of trouble when we upgraded

Re: [lustre-discuss] refresh file layout error

2015-09-04 Thread Martin Hecht
On 09/03/2015 07:22 AM, E.S. Rosenberg wrote: > On Wed, Sep 2, 2015 at 8:47 PM, Wahl, Edward wrote: > >> That would be my guess here. Any chance this is across NFS? Seen that a >> great deal with this error, it used to cause crashes. >> > Strictly speaking it is not, but it may

Re: [lustre-discuss] Convert a disk from lustre to ext4

2015-09-04 Thread Martin Hecht
Maybe it's too late anyway, but I have found this thread in my unread mail: On 09/01/2015 06:38 PM, Colin Faber wrote: > If you're just looking to reformat the drive, then just reformat the drive: > > http://linux.die.net/man/8/mkfs.ext4 It's still unclear what he actually did. Maybe he

Re: [lustre-discuss] [HPDD-discuss] possible to read orphan ost objects on live filesystem?

2015-09-03 Thread Martin Hecht
Hi Chris, On 09/02/2015 07:18 AM, Chris Hunter wrote: > Hi Andreas > > On 09/01/2015 07:22 PM, Dilger, Andreas wrote: >> On 2015/09/01, 7:59 AM, "lustre-discuss on behalf of Chris Hunter" >> > chris.hun...@yale.edu> wrote: >> >>> Hi Andreas,

Re: [lustre-discuss] quota only in- but not decreasing after upgrading to Lustre 2.5.3

2015-07-28 Thread Martin Hecht
Hi, it might help to disable quota using tune2fs and re-enable it again on the ext2 level on all devices, see LU-3861. (BTW you don't need the e2fsprogs mentioned in the bug, there was an official release last year in September). You have to stop lustre for the tune2fs run and it takes some

Re: [lustre-discuss] trouble mounting after a tunefs

2015-06-12 Thread Martin Hecht
Hi John, on the Parameters line the different nodes should not be separated by :. Each node should be specified by a separate mgsnode=... or failover.node=... statement. I'm not sure if separating the two interfaces of each node by , is correct here, or if this should be split again into two

Re: [lustre-discuss] Exporting a lustre mounted directory via nfs

2015-05-22 Thread Martin Hecht
it with the derp option. thanks, Kurt - Original Message - From: Martin Hecht To: Kurt Strosahl Sent: Thursday, May 21, 2015 12:51:41 PM Subject: Re: [lustre-discuss] Exporting a lustre mounted directory via nfs Hi Kurt, some time ago we had a client re-exporting a lustre 1.8.x

Re: [lustre-discuss] Size difference between du and quota

2015-05-21 Thread Martin Hecht
Hi, a few more things which may play a role: - as you are suspecting, the difference of used blocks vs. used bytes might be the reason, especially if there are many very small files, but there are more possible causes: - some tools use 2^10 bytes and some others use 1000 bytes as kb which might
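
The two effects named in this snippet (whole blocks vs. actual bytes for many small files, and 2^10 vs. 1000 bytes per "kb") can be illustrated with a small sketch. This is not taken from the thread: the function names and the 4096-byte block size are assumptions chosen only for illustration, and real allocation on a Lustre backend is more involved.

```python
def unit_gap(power: int) -> float:
    """Relative gap between binary (1024**power) and decimal (1000**power) units."""
    return 1024**power / 1000**power - 1


def allocated_bytes(size: int, block_size: int = 4096) -> int:
    """Bytes occupied when a file is rounded up to whole blocks.

    block_size=4096 is an assumed value; this is a simplified model.
    """
    return -(-size // block_size) * block_size  # integer ceiling division


# Hypothetical workload: 1000 files of 100 bytes each.
sizes = [100] * 1000
apparent = sum(sizes)                               # bytes of file content
allocated = sum(allocated_bytes(s) for s in sizes)  # bytes of whole blocks

print(f"apparent bytes:  {apparent}")
print(f"allocated bytes: {allocated}")
print(f"kB vs KiB gap:   {unit_gap(1):.1%}")
print(f"GB vs GiB gap:   {unit_gap(3):.1%}")
```

In this model the many-small-files case dominates: the block-rounded total is far larger than the sum of file sizes, while the unit convention alone only accounts for a few percent.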

Re: [Lustre-discuss] [HPDD-discuss] Recovering a failed OST

2014-05-28 Thread Martin Hecht
Hi bob, just to make sure: You already followed: http://wiki.lustre.org/index.php/Handling_File_System_Errors, especially the steps for e2fsck linked there? If you did *not yet* do any write operation to the damaged OST, you might want to back up the whole OST first, using dd for instance (if