Re: [lustre-discuss] [EXTERNAL] [BULK] Files created in append mode don't obey directory default stripe count

2024-04-29 Thread Simon Guilbault
This is the expected behaviour. In the original implementation of PFL, when a file was opened in append mode, the lock from 0 to EOF initialized all stripes of the PFL file. We have a PFL layout on our system with 1 stripe up to 1 GB, which then increases to 4 and then 32 stripes when the file was
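For illustration, a PFL layout of that shape can be created with lfs setstripe composite extents; the 64G boundary and the directory path below are assumptions, only the 1-stripe/1 GB first extent and the 4- and 32-stripe counts come from the description above:

  # sketch of a similar PFL layout: 1 stripe up to 1 GB, then 4 stripes, then 32 stripes to EOF
  lfs setstripe -E 1G -c 1 -E 64G -c 4 -E -1 -c 32 /lustre/project/dir
  # inspect the default layout that new files in the directory will inherit
  lfs getstripe -d /lustre/project/dir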

Re: [lustre-discuss] o2ib on RoCE

2023-06-16 Thread Simon Guilbault
We are not using RoCE in production, but a few years ago we tested it with 25Gb/s cards. From what I recall, RDMA was working as expected using the MOFED stack; running the LNet benchmark used a few cores with TCP, while the same benchmark with RoCE used almost no CPU. The only change I had
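As a rough sketch of that kind of LNet benchmark (not taken from the original message; the NIDs below are placeholders), lnet_selftest can drive bulk traffic between two nodes while CPU usage is observed:

  modprobe lnet_selftest                  # on both nodes
  export LST_SESSION=$$
  lst new_session roce_test
  lst add_group servers 10.0.0.1@o2ib     # placeholder NID
  lst add_group clients 10.0.0.2@o2ib     # placeholder NID
  lst add_batch bulk
  lst add_test --batch bulk --from clients --to servers brw write size=1M
  lst run bulk
  lst stat clients servers                # throughput; watch CPU with top in parallel
  lst end_session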

Re: [lustre-discuss] Lustre 2.15.1 change log

2022-09-30 Thread Simon Guilbault
Hi, the grant_shrink bug was fixed in 2.15.0 according to this JIRA: https://jira.whamcloud.com/browse/LU-14124 On Fri, Sep 30, 2022 at 3:59 AM Tung-Han Hsieh < thhs...@twcp1.phys.ntu.edu.tw> wrote: > Dear Peter, > > Thank you very much for your prompt reply. > > Actually we just encountered OST

Re: [lustre-discuss] ZFS wobble

2022-04-28 Thread Simon Guilbault
Hi, start a ZFS scrub on your pool; this will verify that all of the content is intact, since the short resilver triggered when re-adding a dead disk to a pool does not check everything, only what changed on the pool while that disk was gone. I sadly often see that kind of error on my personal NAS due to some bad
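A minimal example, with a placeholder pool name:

  zpool scrub ostpool0        # start the scrub
  zpool status -v ostpool0    # shows scrub progress and any checksum errors found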

Re: [lustre-discuss] MDT hanging

2021-03-09 Thread Simon Guilbault via lustre-discuss
Hi, one failure that the ZFS Pacemaker resource agent does not seem to detect is when MMP fails due to a problem with the SAS bus. We added a short script, running as a systemd daemon, to trigger a failover when this happens. The other check in this script uses NHC, mostly to check if t
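The script itself is not attached here; the following is only a hypothetical sketch of the idea (watch for MMP problems, run NHC, and put the node into standby so Pacemaker fails the resources over), with the grep pattern and the pcs/nhc usage being assumptions about the environment:

  #!/bin/bash
  # hypothetical watchdog loop, not the original script
  while true; do
      # assumed kernel log pattern for a ZFS MMP failure
      if dmesg | grep -qi "mmp.*fail"; then
          pcs node standby "$(hostname)"   # let Pacemaker move the resources away
      fi
      # NHC exits non-zero when a health check fails
      if ! nhc; then
          pcs node standby "$(hostname)"
      fi
      sleep 60
  done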

Re: [lustre-discuss] getting list of files and folders without OST/OSS

2020-11-12 Thread Simon Guilbault
Hi, you can mount your MDT directly in ldiskfs mode (if using ldiskfs) instead of Lustre; the complete metadata tree will be visible in the ROOT directory inside the mount. On Thu, Nov 12, 2020 at 2:41 AM Zeeshan Ali Shah wrote: > Dear All, is it possible to get list of files and directories f
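A minimal sketch with a placeholder device path (the read-only mount option is an extra precaution, not from the original message):

  mount -t ldiskfs -o ro /dev/mapper/mdt0 /mnt/mdt   # device path is a placeholder
  ls /mnt/mdt/ROOT                                   # the Lustre namespace is under ROOT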

Re: [lustre-discuss] Hidden QoS in Lustre ?

2020-11-02 Thread Simon Guilbault
cur_grant_bytes=797097 > > > > > > The value 797097 seems to be the minimum. When it dropped to 1062795, > > the time of cp dramatically increased from around 1 sec to 1 min. In > > addition, during the test, the cluster is completely idling. And it > > is obviou

Re: [lustre-discuss] Hidden QoS in Lustre ?

2020-10-29 Thread Simon Guilbault
namically adjusted > ? > > Thank you very much for your comment in advance. > > Best Regards, > > T.H.Hsieh > > On Wed, Oct 28, 2020 at 02:00:21PM -0400, Simon Guilbault wrote: > > Hi, we had a similar performance problem on our login/DTNs node a few > > months a

Re: [lustre-discuss] Hidden QoS in Lustre ?

2020-10-28 Thread Simon Guilbault
Hi, we had a similar performance problem on our login/DTN nodes a few months ago; the problem was that the grant size was shrinking and getting stuck under 1 MB. Once under 1 MB, the client had to send every request to the OST using sync IO. Check the output of the following command: lctl get_param o
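Presumably, based on the cur_grant_bytes values quoted elsewhere in this thread, the parameter in question is the per-OSC grant:

  # on the client: grant currently available for each OSC (values stuck well under ~1 MB are the symptom)
  lctl get_param osc.*.cur_grant_bytes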

Re: [lustre-discuss] status of HSM copytools?

2020-08-24 Thread Simon Guilbault
Hi, the Compute Canada S3/Ceph copytool was never deployed in production since S3-backed storage was never installed. We moved to a tape-based system with TSM instead and have been running that copytool on 3 different sites for the past year: https://github.com/guilbaults/ct_tsm On Sun, Aug 23, 20

Re: [lustre-discuss] Lustre client on RedHat 7.8

2020-05-14 Thread Simon Guilbault
Hi, you can try these patches and check their respective Jira entries:
git cherry-pick --strategy=recursive -X theirs 97823e65efd85bb2db325232584d65646be5644f # for centos 7.8 LU-13347
git cherry-pick --strategy=recursive -X theirs 851ba18ee0424a3a4bf27d54d0c1af20eaf04ed6 # for centos 7.8 LU-13347
gi

[lustre-discuss] Shrinking grant with 2.12 clients

2020-03-30 Thread Simon Guilbault
Hi, we seem to be hitting a performance issue with Lustre clients 2.12.2 and 2.12.3. Over time, the grant size of the OSC shrinks below 1 MB and does not grow back. This lowers the performance of the client to a few MB/s, even down to kB/s for some OSTs. This does not seem to happen

Re: [lustre-discuss] Assistance Compiling Lustre 2.12.2 with ZFS 0.8.1

2019-08-07 Thread Simon Guilbault
If it can be of any help, here is the script I use in a throwaway OpenStack VM to build Lustre+ZFS without LDISKFS on a vanilla CentOS kernel:
yum install -y epel-release
yum install -y wget git
yum install -y asciidoc audit-libs-devel automake bc binutils-devel bison device-mapper-devel elfutils
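A hedged sketch of the usual follow-on steps for a ZFS-only build (the branch, ZFS packaging and configure flags are assumptions, not the remainder of the original script):

  # after installing the matching ZFS/SPL -devel packages
  git clone git://git.whamcloud.com/fs/lustre-release.git
  cd lustre-release
  git checkout 2.12.2
  sh autogen.sh
  ./configure --disable-ldiskfs --with-zfs
  make rpms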

Re: [lustre-discuss] compile lustre client for ARM64

2018-05-30 Thread Simon Guilbault
Hi, I had the same error message with ./configure, but it was for an x86 server with some old QLogic card. The relevant information for finding the error was in config.log. In my case it was a macro problem (IS_ENABLED) in source/arch/x86/include/asm/pgtable.h:504 that was added for the Meltdown/Spectr

Re: [lustre-discuss] 1 MDS and 1 OSS

2017-10-30 Thread Simon Guilbault
Hi, if everything is connected with SAS JBODs and controllers, you could probably run 1 OST on each server and get better performance that way. With both servers reaching the same SAS drives, you could also have failover in case one server fails. You can forget about failover if you are u

Re: [lustre-discuss] Does Lustre support RoCE?

2017-05-11 Thread Simon Guilbault
Hi, your lnet.conf looks fine. I tested LNet with RoCE v2 a while back with a pair of servers using ConnectX-4 cards with a single 25Gb interface, and RDMA was working with CentOS 7.3, stock RHEL OFED and Lustre 2.9. The only setting I had to use in Lustre's config was this one: options lnet networks=
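For illustration only, such an option line typically binds an o2ib network to the RoCE Ethernet interface; the interface name below is a placeholder, not the value from the original message:

  # /etc/modprobe.d/lustre.conf
  options lnet networks="o2ib0(ens1f0)"    # placeholder interface name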