Re: [lustre-discuss] NRS TBF by UID and congestion

2021-10-18 Thread Moreno Diego (ID SIS)
Stephane > On Oct 14, 2021, at 12:33 PM, Moreno Diego (ID SIS) wrote: > > Hi Lustre friends, > > I'm wondering if someone has experience setting NRS TBF (by UID) on the OSTs (ost_io and ost service) in order to avoid congestion of the filesystem

[lustre-discuss] NRS TBF by UID and congestion

2021-10-14 Thread Moreno Diego (ID SIS)
Hi Lustre friends, I'm wondering if someone has experience setting NRS TBF (by UID) on the OSTs (ost_io and ost service) in order to avoid congestion of the filesystem IOPS or bandwidth. All my attempts over the last months have failed miserably, producing something that doesn’t look like QoS when th

Re: [lustre-discuss] Elegant way to dump quota/usage database?

2021-02-12 Thread Moreno Diego (ID SIS)
Hi Steve, If you have access to the servers you could aggregate the information given by: lctl get_param osd-ldiskfs.*.quota_slave_dt.acct_{user,group,project} The command will basically give you for each Lustre device the information stored on the inode quota for user, group or project quotas.
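A minimal sketch of the aggregation step, assuming the output of the command above has been captured to text and that each accounting record looks like `- id: N` followed by `usage: { inodes: N, kbytes: N }`. That record layout, the `aggregate_usage` helper and the sample device names are assumptions for illustration and are not taken from the original post:

```python
import re
from collections import defaultdict

# Assumed layout of one accounting record (not shown in the original post):
#   - id:      1000
#     usage:   { inodes:  12, kbytes:  2048 }
RECORD = re.compile(
    r"-\s*id:\s*(\d+)\s*usage:\s*\{\s*inodes:\s*(\d+),\s*kbytes:\s*(\d+)\s*\}"
)


def aggregate_usage(text):
    """Sum inodes and kbytes per ID over every device present in `text`."""
    totals = defaultdict(lambda: {"inodes": 0, "kbytes": 0})
    for ident, inodes, kbytes in RECORD.findall(text):
        totals[int(ident)]["inodes"] += int(inodes)
        totals[int(ident)]["kbytes"] += int(kbytes)
    return dict(totals)


# Hypothetical sample: two devices reporting user accounting.
SAMPLE = """\
osd-ldiskfs.fs1-OST0000.quota_slave_dt.acct_user=
usr_accounting:
- id:      0
  usage:   { inodes:  10, kbytes:  100 }
- id:      1000
  usage:   { inodes:  5, kbytes:  2048 }
osd-ldiskfs.fs1-OST0001.quota_slave_dt.acct_user=
usr_accounting:
- id:      1000
  usage:   { inodes:  7, kbytes:  4096 }
"""

if __name__ == "__main__":
    for ident, usage in sorted(aggregate_usage(SAMPLE).items()):
        print(ident, usage["inodes"], usage["kbytes"])
```

User, group and project IDs share the same number space, so the text for acct_user, acct_group and acct_project should be fed to the helper separately rather than in one pass.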

Re: [lustre-discuss] Do old clients ever go away?

2020-06-05 Thread Moreno Diego (ID SIS)
I don't see a way to clear the exports on the MGS side, so it seems every single NID that ever connected to the system stays listed there. You can however clear this on the MDSes/OSSes: [root@mds01 ~]# ls /proc/fs/lustre/mdt/fs1-MDT0001/exports/ | wc -l 5182 [root@mds01 ~]# echo 1 > /proc/fs/lustre/md

Re: [lustre-discuss] Slow mount on clients

2020-02-03 Thread Moreno Diego (ID SIS)
Not sure if it's your case but the order of MGS' NIDs when mounting matters: [root@my-ms-01xx-yy ~]# time mount -t lustre 10.210.1.101@tcp:10.210.1.102@tcp:/fs2 /scratch real 0m0.215s user 0m0.007s sys 0m0.059s [root@my-ms-01xx-yy ~]# time mount -t lustre 10.210.1.102@tcp:10.210.1.10

Re: [lustre-discuss] Nodemap, ssk and mutiple fileset from one client

2020-01-06 Thread Moreno Diego (ID SIS)
I’m not sure about the SSK limitations but I know for sure that you can have multiple filesets belonging to the same filesystem on a client. As you already said, you’ll basically need to have one LNET per fileset (o2ib0, o2ib1, o2ib2), then mount each fileset with the option ‘-o network=’. I ga

Re: [lustre-discuss] LDLM locks not expiring/cancelling

2020-01-06 Thread Moreno Diego (ID SIS)
Hi Steve, I was having a similar problem in the past months where the MDS servers would go OOM because of SlabUnreclaim. The root cause has not yet been found but we stopped seeing this the day we disabled the NRS TBF (QoS) for any LDLM service (just in case you have it enabled). Just in case y

Re: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)

2019-12-13 Thread Moreno Diego (ID SIS)
performance: lctl list_param -R llite | grep max_read_ahead From: Pinkesh Valdria Date: Friday, 13 December 2019 at 17:33 To: "Moreno Diego (ID SIS)" , "lustre-discuss@lists.lustre.org" Subject: Re: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB

Re: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)

2019-12-13 Thread Moreno Diego (ID SIS)
, Diego From: Pinkesh Valdria Date: Wednesday, 11 December 2019 at 17:46 To: "Moreno Diego (ID SIS)" , "lustre-discuss@lists.lustre.org" Subject: Re: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC) I was not able to find those parameters on my clien

Re: [lustre-discuss] Degraded read performance with Large Bulk IO (16MB RPC)

2019-12-10 Thread Moreno Diego (ID SIS)
With that kind of read performance degradation I would immediately think of llite’s max_read_ahead parameters on the client. Specifically these two: max_read_ahead_mb: total amount of MB allocated for read ahead, usually quite low for bandwidth benchmarking purposes and when there’re several fi

Re: [lustre-discuss] Lnet Self Test

2019-12-04 Thread Moreno Diego (ID SIS)
I recently did some work on 40Gb and 100Gb ethernet interfaces and these are a few of the things that helped me during lnet_selftest: * On lnet: credits set to higher than the default (e.g: 1024 or more), peer_credits to 128 at least for network testing (it’s just 8 by default which is goo

Re: [lustre-discuss] one ost down

2019-11-15 Thread Moreno Diego (ID SIS)
Hi Einar, As for the OST in bad shape, if you have not cleared the bad blocks on the storage system you’ll keep having IO errors when your server tries to access these blocks, that’s kind of a protection mechanism and lots of IO errors might give you many issues. The procedure to clean them up

Re: [lustre-discuss] [EXTERNAL] Re: Lustre Timeouts/Filesystem Hanging

2019-10-29 Thread Moreno Diego (ID SIS)
Hi Louis, If you don’t have any particular monitoring on the servers (Prometheus, Ganglia, etc.) you could also use sar (sysstat) or a similar tool to confirm that the CPU is waiting on IO. You can also check device saturation with sar or iostat. For instance: avg-cpu: %user %nice %system %iowait %steal

Re: [lustre-discuss] RV: Lustre quota issues

2019-07-11 Thread Moreno Diego (ID SIS)
Hi Thomas, I think one way to get reliable quota values might be to reduce a couple of tunables: - osc.*.max_dirty_mb : In this case you reduce the amount of non-committed data on the client cache and thus the potential for quota inconsistencies. I've been recently having quota issues on a fil

Re: [lustre-discuss] LFS Quota

2019-01-09 Thread Moreno Diego (ID SIS)
Hi ANS, About the soft limits and not receiving any warning or notification when the soft quota is reached, this would be the expected behavior. The soft quota is used together with the grace period to give some “extra” time to the user to remove inodes/blocks, as per the Lustre Operations Manu

Re: [lustre-discuss] space usage is not limited when using project quota

2019-01-07 Thread Moreno Diego (ID SIS)
Hi Zhang Di, Hope it’s not too late to jump into this one 😊 You’re only providing the quota settings on MDT0 but did you also enable project quotas on the OSTs? oss1$> lctl get_param osd-*.*.quota_slave.info | grep space space acct: ugp space acct: ugp space acct: ugp space acct:

[lustre-discuss] New accounts in Jira?

2018-06-27 Thread Moreno Diego (ID SIS)
Hello, It doesn’t seem possible to create new accounts on https://jira.whamcloud.com/ unless I’m missing something obvious… On the login screen it says “Not a member? To request an account, please contact your JIRA administrators