[lustre-discuss] Migrating to new OSTs

2024-06-18 Thread Sid Young via lustre-discuss
to a new OST? But I also need steps on adding the new OSTs to the MDS (I have /home and /lustre as 2 pools). Sid Young Translational Research Institute
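
A hedged sketch of the usual add-and-drain sequence (device names, index and NIDs below are placeholders, not from the thread): a new OST registers itself with the MGS/MDS the first time it is mounted, and existing data is moved off the old OST with lfs_migrate from a client.

    # format and mount the new OST against the existing MGS
    mkfs.lustre --fsname=lustre --ost --index=6 --mgsnode=<mgs-nid> <new-device>
    mount -t lustre <new-device> /mnt/ost6

    # on the MDS: stop new objects landing on the OST being emptied
    lctl set_param osp.lustre-OST0000*.max_create_count=0

    # on a client: move existing file data off the old OST
    lfs find /lustre --ost lustre-OST0000_UUID | lfs_migrate -y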

[lustre-discuss] tunefs.lustre safe way to get config

2023-02-23 Thread Sid Young via lustre-discuss
--print mdthome/home tunefs.lustre --print mdtlustre/lustre Sid Young
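
For reference, --print (an alias for --dryrun) only reads and reports the on-disk configuration and writes nothing, so it is safe on a live target; a sketch using the ZFS dataset names from the post:

    tunefs.lustre --print mdthome/home      # shows fsname, index, flags, failover NIDs
    tunefs.lustre --print mdtlustre/lustre  # nothing is written in --print mode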

[lustre-discuss] Lustre crash and now lockup on ls -la /lustre

2023-02-22 Thread Sid Young via lustre-discuss
specific checks I can make. Sid Young
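
A hedged first-pass checklist for a hang like this (a sketch, not a full runbook):

    lfs check servers            # from a client: ping every MDT and OST
    lctl dl                      # device list and setup state on each node
    lctl get_param health_check  # on servers: should report "healthy"
    dmesg | tail -50             # look for LustreError, evictions, timeouts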

[lustre-discuss] upgrade 2.12.6 to 2.12.7 - no lnet after reboot - SOLVED

2021-11-10 Thread Sid Young via lustre-discuss
# And all good, everything mounts and works first go as expected :) Sid Young Translational Research Institute Brisbane

[lustre-discuss] upgrade 2.12.6 to 2.12.7 - no lnet after reboot?

2021-11-07 Thread Sid Young via lustre-discuss
: # ls -la /usr/lib/modules
drwxr-xr-x. 3 root root 4096 Mar 18  2021 3.10.0-1160.2.1.el7.x86_64
drwxr-xr-x  3 root root 4096 Nov  8 10:32 3.10.0-1160.25.1.el7.x86_64
drwxr-xr-x. 7 root root 4096 Nov  8 11:02 3.10.0-1160.el7.x86_64
# Anyone upgraded this way? Any obvious gotchas I've missed? Sid
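
The listing above points at the usual cause: the Lustre/LNet modules were not rebuilt or weak-linked for the newly booted kernel. A hedged sketch of verifying that before digging deeper:

    uname -r                                    # the kernel that actually booted
    modinfo lnet | grep -E 'filename|vermagic'  # does the module match that kernel?
    modprobe -v lnet && lnetctl lnet configure  # load and bring up LNet by hand
    lnetctl net show                            # confirm the NIDs are back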

Re: [lustre-discuss] OST "D" status - only 1 OSS mounting

2021-11-01 Thread Sid Young via lustre-discuss
and its associated OSTs? Sid Young On Mon, Nov 1, 2021 at 2:11 PM Andreas Dilger wrote: > The "D" status means the OST is marked in "Degraded" mode, see the > lfs-df(1) man page. The "lfs check osts" is only checking the client > connection to the OS
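
For reference, the "D" flag in lfs df mirrors the obdfilter "degraded" parameter on the OSS, normally set while a RAID rebuild is in flight; a sketch (target name illustrative):

    lfs df                                              # 'D' appears next to degraded OSTs
    lctl get_param obdfilter.*.degraded                 # on the OSS: 1 = degraded
    lctl set_param obdfilter.lustre-OST0004.degraded=0  # clear once the rebuild completes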

[lustre-discuss] OST "D" status - only 1 OSS mounting

2021-10-31 Thread Sid Young via lustre-discuss
... only shows 1 OST..
10.140.93.42@o2ib:/home  48T  48T  414G  100% /home
Where should I look? Sid Young
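
Plain df only reports what the client mount returns as a whole; per-target numbers come from lfs df, which makes a missing or inactive OST obvious (a sketch):

    lfs df -h /home                  # one line per MDT/OST, plus a summary
    lctl get_param lov.*.target_obd  # each OST's index, UUID and ACTIVE/INACTIVE state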

[lustre-discuss] df shows wrong size of lustre file system (on all nodes).

2021-10-18 Thread Sid Young via lustre-discuss
pool is reporting as online and a scrub returns 0 errors. Sid Young

[lustre-discuss] Best ways to backup a Lustre file system?

2021-10-16 Thread Sid Young via lustre-discuss
G'Day all, Apart from rsync'ing all the data on a mounted lustre filesystem to another server, what backup systems are people using to backup Lustre? Sid Young M: 0458 396300 W: https://off-grid-engineering.com
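
One Lustre-aware alternative to a full rsync scan is lustre_rsync, which replays MDT changelogs into a target tree instead of walking the namespace; a minimal sketch (paths and the changelog user name are illustrative):

    # one-time: register a changelog consumer on the MDT (prints e.g. cl1)
    lctl --device lustre-MDT0000 changelog_register

    # replicate, resumable via the status log
    lustre_rsync --source=/lustre --target=/backup/lustre \
                 --mdt=lustre-MDT0000 --user=cl1 \
                 --statuslog=/var/tmp/lustre_rsync.log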

[lustre-discuss] /home remounted and running for 6 hours

2021-10-13 Thread Sid Young via lustre-discuss
MDT / OST etc? Sid Young

Re: [lustre-discuss] Lustre /home lockup - more info

2021-10-11 Thread Sid Young via lustre-discuss
is still working and all disks physically report as OK in the ILO of the two OSS servers... When the scrub finishes later today I will unmount and remount the 4 OSTs and see if the remount changes the status... updates in about 8 hours. Sid Young On Tue, Oct 12, 2021 at 8:18 AM Sid Young wrote

[lustre-discuss] Lustre /home lockup - how to check

2021-10-11 Thread Sid Young via lustre-discuss
My key issue is why /home locks solid when you try to use it but /lustre is OK. The backend is ZFS used to manage the disks presented from the HP D8000 JBOD I'm at

[lustre-discuss] Tools to check a lustre

2021-10-11 Thread Sid Young via lustre-discuss
hit /home... Sid Young

[lustre-discuss] eviction timeout

2021-10-10 Thread Sid Young via lustre-discuss
to be a 3-minute timeout, is it possible to shorten this and even not log this message? Sid Young
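
Hedged pointer: most RPC and eviction waits derive from the filesystem-wide "timeout" parameter (default 100 seconds), so a roughly three-minute wait is plausibly that timeout plus retries; it can be lowered persistently from the MGS, at the cost of spurious evictions if set too low:

    lctl get_param timeout        # current obd timeout in seconds
    lctl set_param -P timeout=50  # run on the MGS: persistent, pushed to all nodes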

[lustre-discuss] Missing OSTs from 1 node only

2021-10-07 Thread Sid Young via lustre-discuss
-OST0004_UUID       51098511360  11505326080  39593183232  23% /lustre[OST:4]
lustre-OST0005_UUID 51098429440   9272059904  41826367488  19% /lustre[OST:5]
filesystem_summary: 153295455232 31252696064 122042753024  21% /lustre
[root@n04 ~]# Sid Young Translational Research Institute

[lustre-discuss] Converting MGS to ZFS - HA Config Question

2021-05-27 Thread Sid Young via lustre-discuss
errors [root@hpc-mds-02# Sid Young

Re: [lustre-discuss] lustre-discuss Digest, Vol 181, Issue 22

2021-04-29 Thread Sid Young via lustre-discuss
3 things: Can you send your /etc/lnet.conf file? Can you also send /etc/modprobe.d/lnet.conf, and does a systemctl restart lnet produce an error? Sid
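
For anyone comparing notes, the two files configure different layers; a hedged example of each (interface names are illustrative, and the YAML shape varies slightly between Lustre versions):

    # /etc/modprobe.d/lnet.conf -- module options read when the lnet module loads
    options lnet networks="o2ib0(ib0),tcp0(eth0)"

    # /etc/lnet.conf -- YAML that lnet.service feeds to "lnetctl import"
    net:
        - net type: tcp0
          local NI(s):
            - interfaces:
                  0: eth0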

Re: [lustre-discuss] lustre-discuss Digest, Vol 180, Issue 23

2021-03-23 Thread Sid Young via lustre-discuss
> > LNET on the failover node will be operational as it's a separate service, > you can check it as shown below and do a "lnetctl net show":
> [root@hpc-mds-02 ~]# systemctl status lnet
> ● lnet.service - lnet management
>   Loaded: loaded (/usr/lib/systemd/system/lnet.service; disabled; vendor

[lustre-discuss] LVM support

2021-03-08 Thread Sid Young via lustre-discuss
likely managing the file system of each node in your cluster, what impact does an LVM LV have as an OST? Sid Young

[lustre-discuss] Performance over 100G ethernet

2021-03-08 Thread Sid Young via lustre-discuss
with some performance benchmarks and config examples that would be much appreciated. Thanks Sid Young
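
For raw link numbers independent of the filesystem, LNet self-test is the usual tool; a minimal bulk-read sketch between two hypothetical NIDs:

    modprobe lnet_selftest
    export LST_SESSION=$$
    lst new_session perf
    lst add_group servers 10.140.93.42@tcp
    lst add_group clients 10.140.93.41@tcp
    lst add_batch bulk
    lst add_test --batch bulk --from clients --to servers brw read size=1M
    lst run bulk
    lst stat clients      # sample throughput until interrupted
    lst end_session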

[lustre-discuss] Solved - OSS Crash

2021-03-03 Thread Sid Young via lustre-discuss
and this morning it's all still up and running :) Thanks everyone for your suggestions. Next challenge: RoCE over 100G ConnectX5 cards :) Sid Young

[lustre-discuss] OSS node crash/high CPU latency when deleting 100's of empty test files

2021-03-02 Thread Sid Young via lustre-discuss
of lustre 2.12.6). We use ZFS. YMMV. -- Karsten Weiss Sid Young

[lustre-discuss] OSS crashes - could be LU-14341

2021-03-02 Thread Sid Young via lustre-discuss
G'Day all, Is 2.12.6 supported on CentOS 7.9? After more investigation, I believe this is the issue I am seeing: https://jira.whamcloud.com/browse/LU-14341 If there is a patch release built for 7.9 I am really happy to test it, as it's easy to reproduce and crash the OSSs. Sid Young

[lustre-discuss] OSS Nodes crashing (and an MDS crash as well)

2021-03-02 Thread Sid Young via lustre-discuss
o try 2.13? https://downloads.whamcloud.com/public/lustre/lustre-2.13.0/el7/patchless-ldiskfs-server/RPMS/x86_64/ Or build a fresh instance on a clean build of the OS? Thoughts? Sid Young

[lustre-discuss] OSS node crash/high CPU latency when deleting 100's of empty test files

2021-03-01 Thread Sid Young via lustre-discuss
do that, does anyone use the Mellanox ConnectX5 cards on their Lustre storage nodes with Ethernet only and if so, which driver are you using and on which OS... Thanks in advance! Sid Young

[lustre-discuss] servicenode /failnode

2021-02-25 Thread Sid Young via lustre-discuss
rvers that can manage this particular OST (the HA pair)? Sid Young
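
A sketch of recording both HA nodes at format time with --servicenode (NIDs, index and pool/dataset names are placeholders); tunefs.lustre --print then shows what the target currently advertises:

    mkfs.lustre --fsname=lustre --ost --index=4 --backfstype=zfs \
        --mgsnode=10.0.0.1@tcp \
        --servicenode=10.0.0.2@tcp \
        --servicenode=10.0.0.3@tcp \
        ostpool/ost4
    tunefs.lustre --print ostpool/ost4   # verify the failover/service NIDs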

Re: [lustre-discuss] lustre-discuss Digest, Vol 179, Issue 20

2021-02-23 Thread Sid Young via lustre-discuss
is not needed, just the /etc/modprobe.d/lnet.conf and the /etc/lnet.conf. Sid Young

[lustre-discuss] need to always manually add network after reboot

2021-02-22 Thread Sid Young via lustre-discuss
it's still wrong. Any help appreciated :) Sid Young
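
The usual fix is to snapshot the hand-built state into /etc/lnet.conf so lnet.service re-imports it at boot; a hedged sketch (network/interface names illustrative):

    lnetctl net add --net tcp0 --if eth0  # whatever is currently added by hand
    lnetctl export > /etc/lnet.conf       # dump the running config as YAML
    systemctl enable --now lnet           # the unit re-imports it on every boot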

[lustre-discuss] MDS using D3710 DAS - partially Solved

2021-02-18 Thread Sid Young
OSS config several times as I optimise the installation while running under Pacemaker and have been able to mount /lustre and /home on the Compute nodes so this new system is 50% of the way there :) Sid Young

[lustre-discuss] lfs check now working

2021-02-18 Thread Sid Young
After some experiments and recreating the two filesystems I now have lfs check mds etc. working from the HPC clients :) Sorry to waste bandwidth. Sid

[lustre-discuss] cant check MDS?

2021-02-18 Thread Sid Young
2.4T 1% /mdt-home
[root@hpc-mds-02 ~]# lfs check mds
lfs check: cannot find mounted Lustre filesystem: No such device
[root@hpc-mds-02 ~]#
What am I doing wrong? Sid Young
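
For reference, lfs check works through a mounted Lustre *client*, so it fails on an MDS that only has its local MDT (ldiskfs/ZFS) mounts; a sketch of running it where a client mount exists (NID illustrative):

    mount -t lustre 10.0.0.1@tcp:/home /mnt/home  # a client mount, even on the MDS itself
    lfs check mds                                 # now resolves against the mounted fs
    lfs check servers                             # MDTs and OSTs in one pass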

Re: [lustre-discuss] MGS IP in a HA cluster

2021-02-18 Thread Sid Young
Thanks for the clarification. :) Sid Young M: 0458 396300 W: https://off-grid-engineering.com W: https://z900collector.wordpress.com/restoration/ (personal) W: https://sidyoung.com/ On Thu, Feb 18, 2021 at 4:35 PM Indivar Nair wrote: > Hi Sid, > > 1. > -- You don't need a

[lustre-discuss] MGS IP in a HA cluster

2021-02-17 Thread Sid Young
by using the IPs of both the MDS servers (assume a dual-MDS HA cluster here)? And, if I have a 100G Ethernet network (for RoCE) for Lustre usage and a 10G network for server access, is the MGS IP based around the 100G network or my 10G network? Any help appreciated :) Sid Young
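
On the client side an HA MGS pair is written as colon-separated NIDs in the mount source, and those NIDs live on whichever network LNet itself runs over (the 100G/RoCE fabric here, assuming o2ib); a sketch with hypothetical addresses:

    mount -t lustre 10.140.93.40@o2ib:10.140.93.41@o2ib:/home /home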

Re: [lustre-discuss] MDS using D3710 DAS

2021-02-14 Thread Sid Young
ools (depending on the > usage) and haven't seen any problems of this sort. Are you using ldiskfs? > > - Chris > > > On Fri, Feb 12, 2021 at 03:14:58PM +1000, Sid Young wrote: > >G'day all, > >Is anyone using an HPE D3710 with two HPE DL380/385 servers in a MDS

[lustre-discuss] MDS using D3710 DAS

2021-02-11 Thread Sid Young
at the same time. :( Sid Young

[lustre-discuss] Metrics Gathering into ELK stack

2020-12-09 Thread Sid Young
strings of metrics I can push into more bespoke monitoring solutions... I am more interested in I/O metrics from the Lustre side of things, as I can gather Disk/CPU/memory metrics with Metricbeat as needed already in the legacy HPC. Sid Young W: https://off-grid-engineering.com
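
The raw counters are exposed through lctl and are straightforward to scrape into a shipper; a hedged sketch of the usual sources:

    lctl get_param obdfilter.*.stats       # per-OST read/write ops and bytes (on the OSS)
    lctl get_param mdt.*.md_stats          # metadata op counts (on the MDS)
    lctl set_param jobid_var=procname_uid  # tag I/O per process/uid...
    lctl get_param obdfilter.*.job_stats   # ...then read the per-job breakdown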

[lustre-discuss] Lustre via 100G Ethernet or Infiniband

2020-09-15 Thread Sid Young
With the growth of 100G Ethernet, is it better to connect a Lustre file server via EDR 100G InfiniBand or 100G Ethernet for a 32-node HPC cluster running a typical life sciences - Genomics workload? Thoughts anyone? Sid Young

[lustre-discuss] RA's found

2020-07-13 Thread Sid Young
Please ignore my last email, I discovered I had the resource agent RPM but had not installed it. Sid Young W: https://off-grid-engineering.com W: https://z900collector.wordpress.com/restoration/ (personal) W: https://sidyoung.com/

[lustre-discuss] Pacemaker resource Agents

2020-07-13 Thread Sid Young
in advance! Sid Young W: https://off-grid-engineering.com W: https://z900collector.wordpress.com/restoration/ (personal) W: https://sidyoung.com/
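
With the lustre-resource-agents package in place, each target is typically a Pacemaker resource of type ocf:lustre:Lustre; a minimal pcs sketch (resource name, dataset and mountpoint are illustrative):

    pcs resource create home-MDT0000 ocf:lustre:Lustre \
        target=mdthome/home mountpoint=/mnt/mdt-home
    pcs constraint location home-MDT0000 prefers hpc-mds-01=100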

[lustre-discuss] SOLVED - new client locks up on ls /lustre

2020-07-09 Thread Sid Young
SOLVED - Rebuilt the MDT and OST disks, changed /etc/fstab to have rw flag set explicitly and rebooted everything. Clients now mount and OSTs come up as active when I run "lfs check servers". Sid Young
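
For the record, a client fstab line with the rw flag explicit, matching the fix described (NID and fsname as seen in the related thread below, otherwise illustrative):

    10.140.95.65@tcp:/lustre  /lustre  lustre  rw,_netdev  0 0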

[lustre-discuss] new install client locks up on ls /lustre

2020-07-08 Thread Sid Young
:
[    7.998649] Lustre: Lustre: Build Version: 2.12.5
[    8.016113] LNet: Added LNI 10.140.95.65@tcp [8/256/0/180]
[    8.016214] LNet: Accept secure, port 988
[   10.992285] Lustre: Mounted lustre-client
Any pointer where to look? /var/log/messages shows no errors. Sid Young