Dear All,
I can contribute a few simple scripts to coordinate the start / stop of the
whole Lustre file system. Everyone is welcome to use or modify them to fit
your own system. Sorry that I did not prepare a complete document for these
scripts; here I only mention some relevant usage.
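As a rough illustration of what such start/stop coordination looks like, here is a minimal sketch in dry-run form. All device paths, mount points, and the exact ordering details are assumptions for illustration, not the actual contributed scripts:

```shell
#!/bin/sh
# Minimal sketch of ordered Lustre start/stop (hypothetical layout).
# Start order: MGS/MDT first, then OSTs; stop in the reverse order.

MDT_DEV=/dev/sda7                 # assumed combined MGS+MDT device
MDT_MNT=/cfs/chome_mdt
OST_DEVS="/dev/sdb2 /dev/sdc1"    # assumed OST devices
OST_MNT_BASE=/cfs/ost

DRYRUN=1   # this sketch only prints the commands; set to 0 on a real system

run() {
    if [ "$DRYRUN" = 1 ]; then echo "$@"; else "$@"; fi
}

start() {
    run mount -t lustre "$MDT_DEV" "$MDT_MNT"
    i=0
    for dev in $OST_DEVS; do
        run mount -t lustre "$dev" "$OST_MNT_BASE$i"
        i=$((i + 1))
    done
}

stop() {
    # Unmount the OSTs first, then the MGS/MDT.
    i=0
    for dev in $OST_DEVS; do
        run umount "$OST_MNT_BASE$i"
        i=$((i + 1))
    done
    run umount "$MDT_MNT"
}

start
```

On a real deployment the client mounts would be coordinated the same way on each node, and `DRYRUN=0` would actually execute the commands.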
dge). More recent MOFED also comes
> bundled with xpmem which is supposed to accelerate intra-node
> communications (zero copy mechanism) way better than CMA or KNEM (and xpmem
> is supported by both OpenMPI and mpich).
>
> --
> *From:* Tung-Han Hsieh
> *Sent:* September 26, 2023
cult and there is no mention of using
> "tunefs.lustre
> --writeconf" for this kind of update.
>
>
> Or am I missing something ?
>
>
> Thanks in advance for providing more tips for this kind of update.
>
>
> Martin Audet
> --
>
can try
> that to see if it fixes your issue.
>
> --Rick
>
> On 9/23/23, 2:22 PM, "lustre-discuss on behalf of Tung-Han Hsieh via
> lustre-discuss" <lustre-discuss-boun...@lists.lustre.org> on behalf of
> lustre-discuss@lists.lustre.org
Dear All,
Today we tried to upgrade our Lustre file system from version 2.12.6 to 2.15.3.
But afterwards, we could not mount the MDT successfully. Our MDT uses an
ldiskfs backend. The upgrade procedure was:
1. Install the new version of e2fsprogs-1.47.0
2. Install Lustre-2.15.3
3. After reboot, run:
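The command in step 3 is cut off in the archive. As a hedged sketch only (the device path and mount point are assumptions borrowed from elsewhere in the thread, not the author's actual command), a post-upgrade remount of an ldiskfs MDT typically looks like:

```shell
# Hypothetical post-upgrade check and remount of the MDT.
tunefs.lustre /dev/sda7                    # print current parameters as a sanity check
mount -t lustre /dev/sda7 /cfs/chome_mdt   # remount the upgraded MDT
```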
Dear Redl, Robert,
On Wed, Oct 26, 2022 at 02:37:12PM +, Redl, Robert wrote:
> Dear Etienne,
>
> thanks a lot for the detailed explanation! I will try out the patch at the
> next opportunity.
>
> @Tung-Han Hsieh: I think the issue that indices of old OST remain un
Hello,
Just an experience to share. If we follow the correct procedure
to permanently remove an OST, the index of that OST still exists
in MDT. The only way to remove that OST index from MDT is to run
"tunefs.lustre --writeconf" on the MDT (and also on all OSTs). That
requires temporarily shutting down the
Dear All,
Occasionally, we need to explicitly move some data out of a specific
OST. For example, the device of that OST should be replaced, or the
OST is almost full and we want to balance the amount of data across
the OSTs. In these cases we usually run:
lfs find --obd
to get the
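The `lfs find --obd` output described here is typically fed to `lfs_migrate` to drain the OST. A minimal sketch, assuming a mount point of `/cfs/chome` and the OST name `chome-OST0008` (both hypothetical):

```shell
# List regular files with objects on the target OST and migrate them to other OSTs.
lfs find /cfs/chome --obd chome-OST0008_UUID -type f | lfs_migrate -y
```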
y is used by mkfs.lustre
>
> As workaround you can install e2fsprogs from a rpm.
> Here is a link to el8 version for instance
>
> https://downloads.whamcloud.com/public/e2fsprogs/latest/el8/RPMS/x86_64/
>
> Best regards,
> Artem Blagodarenko
>
> > On 12 Oct 2022
Dear All,
We manually compiled and installed Lustre-2.15.0 and 2.15.1,
with backend file system ldiskfs. We found that it cannot mount
the MGT/MDT partition at all (we formatted MGT and MDT in the
same partition).
The symptom is following. After installation, we partitioned and
formatted the
> > On Fri, Sep 30, 2022 at 3:59 AM Tung-Han Hsieh <
> > thhs...@twcp1.phys.ntu.edu.tw> wrote:
> >
> >> Dear Peter,
> >>
> >> Thank you very much for your prompt reply.
> >>
> >> Actually we just encountered OST file
ntime is there a specific piece of
> information that you wanted to know? The list of fixed issues is also
> available in JIRA for example -
> https://jira.whamcloud.com/secure/ReleaseNote.jspa?projectId=1=15891
>
>
> On 2022-09-30, 12:09 AM, "lustre-discuss
Dear All,
Could anyone point out where to find the Lustre-2.15.1 change log ?
This URL found in Lustre website is invalid:
https://wiki.lustre.org/Lustre_2.15.1_Changelog
Thank you very much.
T.H.Hsieh
Dear All,
We tried to compile Lustre-2.12.6, 2.12.8, and 2.14.0 with Linux
kernel 5.10.114 and 5.15.38, the newest releases of the long-term
series of Linux kernel in Linux Kernel Archives:
https://www.kernel.org/
but all failed at the configure stage, when running this command:
./configure
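For reference, when building against a manually installed kernel, the configure invocation usually points at the kernel source tree explicitly; a sketch (the paths are assumptions):

```shell
# Hypothetical configure line for a hand-built kernel tree.
./configure --with-linux=/usr/src/linux-5.15.38 \
            --with-linux-obj=/usr/src/linux-5.15.38
```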
Dear All,
We encounter a problem to mount a damaged OST partition, as described
below.
The OST partition suffered serious hard disk damage, which was sent to
a data rescue company to try to recover the data as much as possible.
After that, we run
tunefs.lustre --writeconf /dev/
to
15, 2022 at 06:31:32PM +0800, Tung-Han Hsieh wrote:
> Dear All,
>
> We encounter a problem to mount a damaged OST partition, as described
> below.
>
> The OST partition suffered serious hard disk damage, which was sent to
> a data rescue company to try to recover the data as m
Dear All,
We encounter a problem to mount a damaged OST partition, as described
below.
The OST partition suffered serious hard disk damage, which was sent to
a data rescue company to try to recover the data as much as possible.
After that, we run
tunefs.lustre --writeconf /dev/
to
Dear All,
We have an existing OST with ZFS backend which occupies the full size
(63 TB) of a large storage. Now this OST is already 56% full.
The machine which holds this OST does not have other OSTs. But for a
long time, we have found that the load on this machine is unreasonably
high
rocess
> > directly
> > > > inside a lctl command "del_ost".
> > > >
> > > > This process could be applied live, the changes will take effect only
> > > > after whole system remount (when MGS configuration is read by
> > > > clien
ustre with --replace
> >option, and set the index to the old OST index (e.g., 0x8):
> >
> >
> > 6.5. Reactivate the old OST index:
> >
> >lctl set_param osc.chome-OST0008-osc-MDT.active=1
> >
> > 7. Mount the new OST (run in th
> whether
> > > it can be recovered back or not.
> > >
> > > So probably I missed another step. Between step 6 and 7, we need to
> > > reactivate the old OST before mounting the new OST ?
> > >
> > > 6. Prepare the new OST for rep
ase the new OST for accepting new objects:
lctl set_param osc.chome-OST0008-osc-MDT.max_create_count=2
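Collecting the replacement steps quoted in this thread into one place, a sketch of the full cycle might look like the following (the file system name and index are carried over from the quoted steps; the device and the `max_create_count` value of 20000, the usual default, are assumptions):

```shell
# Hypothetical OST replacement cycle for index 0x8.
lctl set_param osc.chome-OST0008-osc-MDT0000.active=0   # deactivate the old OST
mkfs.lustre --ost --replace --index=0x8 --fsname=chome \
            --mgsnode=mgs@tcp /dev/sdX                  # format the replacement
lctl set_param osc.chome-OST0008-osc-MDT0000.active=1   # reactivate the index
lctl set_param osc.chome-OST0008-osc-MDT0000.max_create_count=20000  # allow new objects
```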
Cheers,
T.H.Hsieh
> On 05/03/2021 11:48, Tung-Han Hsieh via lustre-discuss wrote:
> > Dear Hans,
> >
> > Thank you very much. Replacing the OST is new to
ced.
>
> Hope you migrated your data away from the OST also. Otherwise you would
> have lost it.
>
> Cheers,
> Hans Henrik
>
> On 03.03.2021 11.22, Tung-Han Hsieh via lustre-discuss wrote:
> > Dear All,
> >
> > Here is a question about how to remove an O
Dear All,
Here is a question about how to remove an OST completely without
restarting the Lustre file system. Our Lustre version is 2.12.6.
We did the following steps to remove the OST:
1. Lock the OST (e.g., chome-OST0008) such that it will not create
new files (run in the MDT server):
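The locking command itself is truncated here; based on the rest of the thread, the usual forms are either capping object creation or deactivating the OSC on the MDS (the names below are assumptions):

```shell
# Stop the MDT from allocating new objects on chome-OST0008 (hypothetical names).
lctl set_param osc.chome-OST0008-osc-MDT0000.max_create_count=0
# Or mark the OST permanently inactive:
lctl conf_param chome-OST0008.osc.active=0
```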
the "mv" problem in 2.12.6. But so far I cannot
> figure out what has been missed.
>
> The way to cure this problem is simple. We only need to rename the directories
> created in 1.8.8, i.e.,
>
> mv A A.tmp
> mv A.tmp A
> mv B B.tmp
>
ments is very welcome.
Best Regards,
T.H.Hsieh
On Fri, Feb 19, 2021 at 03:53:41AM +0800, Tung-Han Hsieh wrote:
> Dear All,
>
> Recently we found a strange problem of the upgraded Lustre file system.
>
> We have several very old Lustre file systems with version 1.8.8. We first
Dear All,
Recently we found a strange problem of the upgraded Lustre file system.
We have several very old Lustre file systems with version 1.8.8. We first
upgraded them to 2.10.6. It seems ok. Then we upgraded them to 2.12.6.
Now we encounter a problem of moving files from directory A to
Dear All,
I hope that this is the right place to post our feature request
for the Lustre file system.
Currently, a Lustre file system server with ldiskfs backend has to be
compiled against specific Linux kernel versions from RedHat, SuSE,
or other distributions which we haven't tried. But this is very
Dear All,
Recently we read from this mailing list and learned that currently
Lustre file system MDT with ZFS backend has poor performance. Too
bad that we did not notice this before deploying this configuration
to many of our cluster systems. Now there is already too much
data to
Dear Serg,
Many years ago, when I was using Lustre-1.8.X, I used to suffer the
same nightmare as you do now. The following procedure saved me, but
I am not sure whether it works for you or not.
1. umount all the clients, umount OST.
2. mount OST as ldiskfs:
mount -t ldiskfs /dev/ /mnt
3.
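The remaining steps are cut off in the archive. A hedged sketch of a common continuation when recovering through a direct ldiskfs mount (device and paths are hypothetical):

```shell
# Hypothetical continuation: inspect or back up the object tree, then remount as Lustre.
mount -t ldiskfs /dev/sdb1 /mnt
ls /mnt/O            # object directories live under O/
umount /mnt
mount -t lustre /dev/sdb1 /cfs/ost
```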
Dear Nguyen,
Usually the upgrade procedure is the following:
1. Shutdown the Lustre file system completely.
(umount all the clients and servers)
2. In all the clients and servers, install the new version of Lustre
software. If your servers are Lustre with ldiskfs backend, please
Hello,
It is OK. We have a cluster with Lustre-2.5.3 installed in
the Lustre servers, and the clients with Lustre 2.5.3, 2.10.7,
and 2.12.5 mounting the Lustre-2.5.3 servers. So far there are
no problems.
Cheers,
T.H.Hsieh
On Mon, Nov 30, 2020 at 03:48:07PM +0700, Nguyen Viet Cuong wrote:
> Hi
eed LU-12759 (fixed in 2.12.4)
> > on your clients since there was a bug on older clients and that setting was
> > not working correctly.
> >
> > On Mon, Nov 2, 2020 at 12:38 AM Tung-Han Hsieh <
> > thhs...@twcp1.phys.ntu.edu.tw> wrote:
> >
> >> Dear
g what's the meaning of this setting in MGS node.
Thank you very much.
T.H.Hsieh
On Fri, Oct 30, 2020 at 01:37:01PM +0800, Tung-Han Hsieh wrote:
> Dear Simon,
>
> Thank you very much for your useful information. Now we are arranging
> the system maintenance date in order to upg
; shrinking seems to be useful when the OSTs are running out of free space.
>
> On Wed, Oct 28, 2020 at 11:47 PM Tung-Han Hsieh <
> thhs...@twcp1.phys.ntu.edu.tw> wrote:
>
> > Dear Simon,
> >
> > Thank you very much for your hint. Yes, you are right. We compared
shrinking and was getting
> stuck under 1MB. Once under 1MB, the client had to send every request to
> the OST using sync IO.
>
> Check the output of the following command:
> lctl get_param osc.*.cur_grant_bytes
>
> On Wed, Oct 28, 2020 at 12:08 AM Tung-Han Hsieh <
> th
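The `cur_grant_bytes` check suggested above can be wrapped in a small filter that flags clients whose grant has shrunk below the 1MB threshold mentioned in the quote. The parameter name and threshold come from the quote; the wrapper itself is only an illustration:

```shell
# Flag any "name=bytes" line whose grant is below 1 MiB.
check_grants() {
    while IFS='=' read -r name bytes; do
        [ "$bytes" -lt 1048576 ] && echo "LOW GRANT: $name = $bytes"
    done
    return 0
}

# On a live client:  lctl get_param osc.*.cur_grant_bytes | check_grants
# Simulated input, since no Lustre mount is assumed here:
printf 'osc.a.cur_grant_bytes=8388608\nosc.b.cur_grant_bytes=524288\n' | check_grants
# prints: LOW GRANT: osc.b.cur_grant_bytes = 524288
```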
t 08, 2020 at 01:32:53PM -0600, Andreas Dilger wrote:
> On Oct 8, 2020, at 10:37 AM, Tung-Han Hsieh
> wrote:
> >
> > Dear All,
> >
> > In the past months, we encountered several times of Lustre I/O abnormally
> > slowing down. It is quite mysterious that ther
Dear All,
I have a question about partitioning OSS with ZFS backend, where OSS
has a very large storage attached.
We have a lustre file system with two OSS. Each OSS has a storage
attached:
$ ssh fs2 df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root        59G  8.4G   48G  16% /
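One common layout for a single very large storage device is to create several ZFS datasets in one pool and format each as a separate OST; a sketch (the pool layout, device names, fsname, and MGS node below are all assumptions):

```shell
# Hypothetical: carve one large pool into two ZFS-backed OSTs instead of one.
zpool create ost_pool raidz2 sdb sdc sdd sde sdf sdg
mkfs.lustre --ost --backfstype=zfs --index=0 --fsname=chome \
            --mgsnode=mgs@tcp ost_pool/ost0
mkfs.lustre --ost --backfstype=zfs --index=1 --fsname=chome \
            --mgsnode=mgs@tcp ost_pool/ost1
```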
Dear All,
In the past months, we have encountered several episodes of Lustre I/O
slowing down abnormally. It is quite mysterious: there seems to be no
problem with the network hardware, nor with Lustre itself, since there
are no error messages at all on the MDT/OST/client sides.
Recently we probably found a way to
Dear All,
We have a cluster running Lustre-2.12.4. We occasionally encounter
serious I/O slowdowns, so I am asking how to fix this problem.
There are one MDT server and one OST server. Since our operating system
is Debian-9.12, we installed Lustre by compiling from the source codes:
-
ole Lustre file system ?
Any comments are very welcome. Thanks in advance.
Best Regards,
T.H.Hsieh
On Wed, Sep 09, 2020 at 01:45:10PM -0600, Andreas Dilger wrote:
> On Sep 8, 2020, at 9:13 PM, Tung-Han Hsieh
> wrote:
> >
> > I would like to ask whether Lustre file system has
; you can use Lustre with ZFS backend and compression enabled. That has
> the effect you are looking for and works very well.
>
> Cheers,
> Robert
>
> Am 09.09.20 um 05:13 schrieb Tung-Han Hsieh:
> > Dear All,
> >
> > I would like to ask whether Lustre
Dear All,
I would like to ask whether Lustre file system has implemented the
function to optimize for large sparse data files ?
For example, for a 3GB data file in which more than 80% of the bytes are
zero, can the Lustre file system optimize the storage so that it does not
actually consume the whole 3GB of disk space?
I know
Dear All,
We have a lustre file system version 2.10.7. Its layout is the following:
1. MGS+MDT (ldiskfs): chome-MDT: /dev/sda7
2. OST (ZFS): chome-OST0004: chome_ost/ost (41% occupied)
They are located in different machines. Now due to some problem of the
existing lustre system
e the correct record for the OST manually? We would
like to try whether this problem can be fixed in this way.
Thanks very much.
T.H.Hsieh
On Sat, Aug 08, 2020 at 05:36:39PM +0800, Tung-Han Hsieh wrote:
> Dear All,
>
> We found additional error message in dmesg of MDT server:
>
> L
not sure whether it caused the "orph_cleanup_ch" process in the MDT
to run indefinitely. Is there any way to fix it?
(It has now been running for more than 6.5 hours, and is still running.)
Thanks very much.
T.H.Hsieh
On Sat, Aug 08, 2020 at 03:44:18PM +0800, Tung-Han Hsieh wrote:
> Dear All,
Dear All,
We have a running Lustre file system with version 2.10.7. The MDT
server runs Linux kernel 3.0.101, and MDT is using ldiskfs backend
with patched Linux kernel.
Today our MDT server crashed and needed a cold reboot. In other words,
the Lustre MDT was not cleanly unmounted before the reboot.
Dear All,
One of our Lustre OST servers continuously shows the following
error messages in dmesg:
==
LNet: Service thread pid 51988 was inactive for 200.44s. Watchdog stack traces
are limited to 3 per 300 seconds,
t workaround we found was to disable discovery on 2.12 clients:
>
> # lnetctl set discovery 0
>
> Cheers,
> Hans Henrik
>
> On 04.07.2020 09.26, Tung-Han Hsieh wrote:
> > Dear All,
> >
> > We have Lustre servers (MDS, OSS) with Lustre-2.10.7 installed,
Dear All,
We have a Lustre server with MGS/MDT/OST in the same node:
(IP: 192.168.60.151)
/dev/sda7   75374372     881520   69427728   2% /cfs/chome_mdt (MGS+MDT)
/dev/sdb2 4842583272 4232667824  365744340  93% /cfs/chome_ost2 (OST1)
/dev/sdc1 4841978664  806638400 3791199640  18%
Dear All,
We have Lustre servers (MDS, OSS) with Lustre-2.10.7 installed, with
both tcp and o2ib interfaces:
[ 193.016516] Lustre: Lustre: Build Version: 2.10.7
[ 193.486408] LNet: Added LNI 192.168.62.151@o2ib [8/256/0/180]
[ 193.538200] LNet: Added LNI 192.168.60.151@tcp [8/256/0/180]
[
Greetings,
Recently we acquired a new storage device. It has dual RAID controllers
with two fibre connections to the file server, which map the LUNs of
the storage to the server:
# lsscsi -g
[5:0:0:0]  disk  IFT  DS 1000 Series  661J  /dev/sdb  /dev/sg4
[6:0:0:0]  disk  IFT  DS 1000
and good OSTs. This seems quite unnecessary. Is
there a way to remove the logs of only the bad OST if I know its ID ?
Thanks in advance.
T.H.Hsieh
On Mon, Apr 15, 2019 at 04:32:57PM +0800, Tung-Han Hsieh wrote:
> Dear All,
>
> We are facing a serious problem after a mistake of doing Lustre
Dear All,
We are facing a serious problem after a mistake made during Lustre
(1.8.8) maintenance.
We had a bad OST and want to remove it. So we went to MDS and run
lctl conf_param foo-OST.osc.active=0
After doing this, in the MDS there are still logs residing in /proc/fs/lustre/osc/
Dear All,
Our system was recently upgraded to lustre-2.10.6. We are doing the
data migration from some almost full OSTs to a newly installed file
server. But the file system often freezes for about 30 seconds,
and then returns to normal (this may happen several times within 5 minutes).
Our
,
T.H.Hsieh
On Sat, Mar 16, 2019 at 10:24:55AM +0800, Tung-Han Hsieh wrote:
> Dear YangSheng,
>
> Sorry for replying late, and thank you very much for your suggestion.
> Here is the follow up of our tests.
>
> We followed your suggestion. In this file:
>
> ldiskfs/ke
> You can port ldiskfs
> patch(ldiskfs//kernel_patches/patches/rhel6.3/ext4-export-64bit-name-hash.patch)
> to fix this issue. This patch has been landed to RHEL. But looks like Debian
> not.
>
> Thanks,
> YangSheng
>
> > On 13 Mar 2019, at 12:36 AM, Tung-Han Hsieh wrote:
> We also have prebuilt 2.10.6 packages for many platforms:
> https://downloads.whamcloud.com/public/lustre/
>
>
> - Patrick
>
>
>
> ________
> From: lustre-discuss on behalf of
> Tung-Han Hsieh
> Sent: Tuesday, March 12, 2019 9:22:57
Dear All,
I am trying to compile lustre-2.10.6 from source code. During
compilation, there are undefined symbols:
==
CC [M]
/home/thhsieh/lustre/L-2.10.6/lustre-2.10.6/lustre/osd-ldiskfs/osd_handler.o
In file included
Ts. But since they
are blocked, the whole file system hangs.
Could anyone give us suggestions how to solve it ?
Best Regards,
T.H.Hsieh
On Sun, Mar 03, 2019 at 06:00:17PM +0800, Tung-Han Hsieh wrote:
> Dear All,
>
> We have a problem of data migration from one OST to another.
Dear All,
We have a problem of data migration from one OST to another.
We have installed Lustre-2.5.3 on the MDS and OSS servers, and Lustre-2.8
on the clients. We want to migrate some data from one OST to another in
order to re-balance the data occupation among OSTs. In the beginning we
follow
Dear All,
I am asking whether it is possible to mount lustre when some of the
OSTs are missing.
The situation is following. We have some old systems running Lustre
version 1.8.8. One of the file servers, which contains several OSTs,
broke down and needs some time to repair. In the meantime, we are
have made a wrong speculation about
my problem. It is not related to the version of kmod or udev.
I should apologize for making wrong statements before a more careful
investigation.
Best Regards,
T.H.Hsieh
On Tue, Sep 25, 2018 at 06:00:24PM +0800, Tung-Han Hsieh wrote:
> Hello,
>
> I jus
der system was due to kmod
or udev version.
Could anyone confirm my speculation ?
Thanks very much.
T.H.Hsieh
On Tue, Sep 25, 2018 at 05:33:01PM +0800, Tung-Han Hsieh wrote:
> Dear Andreas,
>
> Thank you very much for your kind reply.
>
> When I run "modprobe lustre", dm
leased
Could anyone suggest how to debug ?
Thanks very much.
T.H.Hsieh
On Tue, Sep 25, 2018 at 12:14:00AM +0800, Tung-Han Hsieh wrote:
> Dear Nathaniel,
>
> Thank you very much for your kind reply. Indeed I modified the
> lustre-2.10.5 codes:
>
> lustre/osd-zfs/osd_obj
t;
> There’s a ticket for adding ZFS 0.7.11 support to lustre:
> https://jira.whamcloud.com/browse/LU-11393
>
> It has patches for master (pre-2.12) and a separate patch for 2.10.
>
> —
> Nathaniel Clark mailto:ncl...@whamcloud.com>>
> Senior Engineer
> Wha
Dear All,
I am trying to install Lustre version 2.10.5 with ZFS-0.7.11
from source code. After compilation and installation, I tried
to load the "lustre" module, but encountered the following
error:
# modprobe lustre
could not load module 'lustre': no such device
My procedure of installation is
Dear All,
When backing up the MDT of the lustre filesystem, I saw that
one can use "tar" with the --sparse option to avoid mistakenly
archiving data full of zeros. My question is: is there any
requirement on the "tar" version for this function?
For example, I saw GNU tar version 1.20 also
Dear All,
I am asking about a strange problem of the Lustre filesystem that we
have encountered since many years ago.
We are using Lustre 1.8.7 (we set up the Lustre filesystem many years
ago, with more than 50TB of data storage, and our system is busy with
scientific computation almost all the
Dear All,
I have a question about data file migration from one OST to another
for Lustre 2.5.
Suppose that OST is going to be removed. In the past (I mean,
Lustre-1.8.7), I can do the migration smoothly via the following
procedure:
1. In the MDT server, stop writing new files to the OST:
Dear Dilger,
On Thu, Mar 12, 2015 at 05:42:33AM +, Dilger, Andreas wrote:
On 2015/03/11, 12:57 AM, Tung-Han Hsieh thhs...@twcp1.phys.ntu.edu.tw
wrote:
But here we encounter a problem. Since the facilities of the data
rescue company is Windows based, they told me that the most they
can
Regards,
T.H.Hsieh
On Fri, Feb 13, 2015 at 06:09:46PM +, Dilger, Andreas wrote:
On 2015/02/12, 2:47 AM, Tung-Han Hsieh thhs...@twcp1.phys.ntu.edu.tw
wrote:
We have a Lustre filesystem (version 1.8.8) in our lab, which
contains about 20 OSTs and only one MDT. Recently one
of our
Dear All,
We have a Lustre filesystem (version 1.8.8) in our lab, which
contains about 20 OSTs and only one MDT. Recently one
of our file servers crashed seriously. It contains 4 OSTs
stored on RAID 5. We sent the file server to a data rescue company.
They told me that they can rescue all