Hi,
Are you mixing OS versions? You may want to align the parameters of the ko2iblnd kernel module (especially the "map_on_demand" value) on both the servers and the clients. The default seems to differ among major kernel versions, and that could cause issues. Also double-check your firewall, if present.
Best regards,
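One way to compare the ko2iblnd settings mentioned above is to read them from sysfs on each node and diff the results. The following is only a minimal sketch: it assumes the module is loaded and that its parameters are exposed under /sys/module/ko2iblnd/parameters (the usual location for kernel module parameters). The helper names read_module_params and diff_params are hypothetical and not part of Lustre.

```python
from pathlib import Path


def read_module_params(module, sysfs_root="/sys/module"):
    """Return {parameter: value} for a loaded kernel module.

    Returns an empty dict if the module is not loaded (no parameters
    directory). The sysfs_root argument exists only to make testing easy.
    """
    param_dir = Path(sysfs_root) / module / "parameters"
    if not param_dir.is_dir():
        return {}
    params = {}
    for entry in sorted(param_dir.iterdir()):
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters may not be readable without root.
            params[entry.name] = None
    return params


def diff_params(a, b):
    """Return {name: (value_in_a, value_in_b)} for parameters that differ."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


if __name__ == "__main__":
    # Print the local values; run on a server and a client, then compare.
    for name, value in read_module_params("ko2iblnd").items():
        print(f"{name}={value}")
```

Running this on a server and on a failing client and comparing the two outputs (or feeding both dicts to diff_params) would show whether map_on_demand or any other parameter differs. Persistent values are typically set with an options line in a file under /etc/modprobe.d, and which value is appropriate depends on the setup.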
Have you tried tcp pings on the IP addresses associated with the IB interfaces?
--Rick
On 6/20/23, 12:11 PM, "lustre-discuss on behalf of Youssef Eldakar via
lustre-discuss" mailto:lustre-discuss-boun...@lists.lustre.org> on behalf of
lustre-discuss@lists.lustre.org
In a cluster having ~100 Lustre clients (compute nodes) connected together
with the MDS and OSS over Intel True Scale InfiniBand (discontinued
product), we started seeing certain nodes failing to mount the Lustre file
system and giving I/O error on LNET (lctl) ping even though an ibping test
to
Sorry, typo in the version number - the version we are actually running is
2.12.6
From: Jon Marshall
Sent: 20 June 2023 16:18
To: lustre-discuss@lists.lustre.org
Subject: No space left on device MDT DoM but not full nor run out of inodes
Hi,
We've been running lustre 2.15.1 in production for over a year and recently
decided to enable PFL with DoM on our filesystem. Things have been fine up
until last week, when users started reporting issues copying files,
specifically "No space left on device". The MDT is running ldiskfs as