> On May 13, 2019, at 6:51 PM, Fernando Pérez wrote:
>
> Is there a way to stop file writes for all users or for groups without using
> quotas?
>
> We have a lustre filesystem with corrupted quotas and I need to stop
> writes for all users (or for some users).
There are ways to deactivate
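For example, marking the OST connections inactive on the MDS prevents any new objects (and therefore any new files) from being created on them; it does not block writes to files a client already has open. Roughly, with the fsname and index below only placeholders:

  lctl dl | grep osc                                     # on the MDS, list the OST connections
  lctl --device testfs-OST0000-osc-MDT0000 deactivate    # stop new object allocation on OST0000
  lctl --device testfs-OST0000-osc-MDT0000 activate      # undo it later

Repeating the deactivate for every OST effectively stops new writes filesystem-wide.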
I don’t think we need to have PFL working immediately, and since we have plans
to upgrade the client at some point, I will just wait and see what happens
after the upgrade.
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
I was trying to play around with some PFL layout today, and I ran into an
issue. I have a file system running Lustre 2.10.6 and a client with 2.10.0
installed. I created a PFL with this command:
[rfmohr@sip-login1 rfmohr]$ lfs setstripe -E 4M -c 2 -E 100M -c 4 comp_file
It did not return any
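For reference, the layout that actually got created can be checked with lfs getstripe:

  lfs getstripe comp_file                     # shows each component's extent range, stripe count, and objects
  lfs getstripe --component-count comp_file   # shows just the number of components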
> On Apr 17, 2019, at 4:32 AM, Fernando Perez wrote:
>
> I tried to run e2fsck on the MDT three years ago and the logs show a lot
> of messages like this:
>
>> Unattached inode 26977505
>> Connect to /lost+found? yes
>> Inode 26977505 ref count is 2, should be 1. Fix? yes
>
> In fact
Which RPMs did you download? The ones from the /public/lustre/lustre-2.10.7
directory, or the ones from /public/lustre/lustre-2.10.7-ib? The former are
built with support for in-kernel IB, and the latter are for MOFED. If you
downloaded the latter, did you install MOFED yourself, or did you
> On Apr 16, 2019, at 10:24 AM, Fernando Pérez wrote:
>
> According to the lustre wiki I thought that lfsck could repair corrupted
> quotas:
>
> http://wiki.lustre.org/Lustre_Quota_Troubleshooting
Keep in mind that page is a few years old, but I assume they were referring to
LFSCK Phase 2
> On Apr 15, 2019, at 10:54 AM, Fernando Perez wrote:
>
> Could anyone confirm that the correct way to repair wrong quotas on an
> ldiskfs MDT is lctl lfsck_start -t layout -A?
As far as I know, lfsck doesn’t repair quota info. It only fixes internal
consistency within Lustre.
Whenever I h
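For the quota data itself, my understanding is that on an ldiskfs target the accounting is rebuilt by e2fsck, so the usual recovery is to take the target offline and fsck it (the device name below is just a placeholder):

  umount /mnt/mdt                      # stop the target first
  e2fsck -f /dev/mapper/mdt_device     # with the quota feature enabled, e2fsck recomputes usage
  mount -t lustre /dev/mapper/mdt_device /mnt/mdt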
> On Apr 13, 2019, at 4:57 AM, Youssef Eldakar wrote:
>
> For one Lustre filesystem, inode count in the summary is notably less than
> what the individual OST inode counts would add up to:
The first thing to understand is that every Lustre file will consume one inode
on the MDT, and this ino
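The per-target and aggregate inode numbers are easiest to compare with:

  lfs df -i /mnt/lustre     # inodes used/free for each MDT and OST, plus the filesystem summary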
You might need to clarify what you mean by “erase” the file system. The
procedure in the manual is intended for reformatting MDTs/OSTs that had
previously been formatted for Lustre. I don’t think it actually erases data in
the sense of overwriting existing data with zeros (or something simil
This presentation from LUG 2017 might be useful for you:
http://cdn.opensfs.org/wp-content/uploads/2017/06/Wed06-CroweTom-lug17-ost_data_migration_using_ZFS.pdf
It shows how ZFS send/receive can be used to migrate data between OSTs. I
used it as a reference when I worked with another admin to
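The basic pattern (pool and dataset names below are only placeholders) looks something like:

  zfs snapshot oldpool/ost0@migrate                                      # snapshot the source OST dataset
  zfs send oldpool/ost0@migrate | ssh new-oss zfs receive newpool/ost0   # copy it to the new server
  zfs snapshot oldpool/ost0@migrate2                                     # later, catch up the changes
  zfs send -i @migrate oldpool/ost0@migrate2 | ssh new-oss zfs receive newpool/ost0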
I have been playing a little bit with DNE today, and I had a question about
some odd behavior I saw regarding inode counts. My Lustre 2.10.6 file system
has 2 MDTs. I created a directory (which by default resides on MDT0) and then
created 10 files in that directory:
[root@sip-mgmt2 test]# lf
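For anyone trying this, directories can also be placed on a specific MDT explicitly; the index and paths below are just examples:

  lfs mkdir -i 1 /mnt/lustre/test-mdt1      # create the directory on MDT0001
  lfs getdirstripe /mnt/lustre/test-mdt1    # confirm which MDT holds it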
> On Mar 20, 2019, at 1:24 PM, Peter Jones wrote:
>
> If it's not in the manual then it should be. Could you please open an LUDOC
> ticket to track getting this corrected if need be?
Done.
https://jira.whamcloud.com/browse/LUDOC-435
--
Rick Mohr
Senior HPC System Administrator
National Inst
> On Mar 18, 2019, at 5:31 PM, Peter Jones wrote:
>
> You need the patched kernel for that feature
I suppose that should be documented in the manual somewhere. I thought project
quota support was determined based on ldiskfs vs zfs, and not patched vs
unpatched.
--
Rick Mohr
Senior HPC Syst
I just recently installed a new Lustre 2.10.6 file system using the RPMS from
/public/lustre/lustre-2.10.6-ib/MOFED-4.5-1.0.1.0/el7.6.1810/patchless-ldiskfs-server.
(I had already built and installed MOFED-4.5-1.0.1.0, and I installed
e2fsprogs-1.44.5.wc1-0.el7). I was able to format the MDT a
> On Jan 17, 2019, at 2:38 PM, Jason Williams wrote:
>
> - I just looked for lfsck but I don't seem to have it. We are running 2.10.4
> so I don't know what version that appeared in.
lfsck is handled as a subcommand for lctl.
http://doc.lustre.org/lustre_manual.xhtml#dbdoclet.lfsckadmin
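For example (the MDT name below is just a placeholder):

  lctl lfsck_start -M testfs-MDT0000 -t layout         # start a layout scan on that MDT
  lctl get_param -n mdd.testfs-MDT0000.lfsck_layout    # check its status/progress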
--
> On Jan 16, 2019, at 4:18 AM, Jae-Hyuck Kwak wrote:
>
> How can I force --writeconf option? It seems that mkfs.lustre doesn't support
> --writeconf option.
You will need to use the tunefs.lustre command to do a writeconf.
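The rough sequence (device paths below are placeholders) is to stop everything, rewrite the config logs on every target, and then bring things back in order:

  # unmount all clients, then unmount the OSTs, MDT(s), and MGS
  tunefs.lustre --writeconf /dev/mgs_device     # run on the MGT
  tunefs.lustre --writeconf /dev/mdt_device     # run on each MDT
  tunefs.lustre --writeconf /dev/ost_device     # run on each OST
  # remount the MGS first, then the MDT(s), then the OSTs, then the clients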
--
Rick Mohr
Senior HPC System Administrator
National Institute for C
Is it possible you have some incompatible ko2iblnd module parameters between
the 2.8 servers and the 2.10 clients? If there was something causing LNet
issues, that could possibly explain some of the symptoms you are seeing.
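A quick way to compare is to dump the loaded module parameters on a server and a client side by side, for example:

  for p in peer_credits peer_credits_hiw concurrent_sends map_on_demand; do
      echo "$p = $(cat /sys/module/ko2iblnd/parameters/$p)"
  done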
--
Rick Mohr
Senior HPC System Administrator
National Institute for Com
> On Jan 7, 2019, at 2:09 PM, Jason Williams wrote:
>
> One last question, How safe is lfs_migrate? The man page on the installation
> says it's UNSAFE for possibly in-use files. The lustre manual doesn't have
> the same warning and says something about it being a bit more integrated with
> On Jan 7, 2019, at 12:53 PM, Jason Williams wrote:
>
> As I have gone through the testing, I think you may be right. I think I
> disabled the OST in a slightly different way and that caused issues.
>
> Do you happen to know where I could find out a bit more about what the "lctl
> set_para
#3.
>
>
> --
> Jason Williams
> Assistant Director
> Systems and Data Center Operations.
> Maryland Advanced Research Computing Center (MARCC)
> Johns Hopkins University
> jas...@jhu.edu
>
>
> From: lustre-discuss on behalf of
> Jason Williams
> Se
> On Jan 5, 2019, at 9:49 PM, Jason Williams wrote:
>
> I have looked around the internet and found you can disable an OST, but when
> I have tried that, any writes (including deletes) to the OST hang the clients
> indefinitely. Does anyone know a way to make an OST basically "read-only"
>
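One approach that should get close to that (it is the same mechanism used when draining an OST before removal, if I remember correctly) is to stop new object allocation for that OST on the MDS while leaving existing data readable; the names below are only examples:

  lctl set_param osp.testfs-OST0003-osc-MDT0000.max_create_count=0       # run on the MDS
  lctl set_param osp.testfs-OST0003-osc-MDT0000.max_create_count=20000   # restore creation later (20000 is the usual default, I believe)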
> On Nov 9, 2018, at 11:28 AM, Mohr Jr, Richard Frank (Rick Mohr)
> wrote:
>
>
>> On Nov 8, 2018, at 11:44 AM, Ms. Megan Larko wrote:
>>
>> I have been attempting this command on a directory on a Lustre-2.10.4
>> storage from a Lustre 2.10.1 client a
> On Nov 8, 2018, at 11:44 AM, Ms. Megan Larko wrote:
>
> I have been attempting this command on a directory on a Lustre-2.10.4 storage
> from a Lustre 2.10.1 client and I fail with the following message:
> > lfs setstripe -c 4 -S 1m -o 1,2-4 custTest/
> error on ioctl 0x4008669a for 'custTest
> On Oct 29, 2018, at 1:12 AM, Riccardo Veraldi
> wrote:
>
> it is time for me to move my MDS to a different HW infrastructure.
> So I was wondering if the following procedure can work.
> I have mds1 (old mds) and mds2 (new mds). On the old mds I have a zfs MGS
> partition and a zfs MDT partiti
> On Oct 17, 2018, at 7:30 PM, Riccardo Veraldi
> wrote:
>
> anyway especially regarding the OSSes you may eventually need some ZFS module
> parameter optimizations regarding vdev_write and vdev_read max, to increase
> those values above the default. You may also disable ZIL, change the
>
> On Oct 19, 2018, at 10:42 AM, Marion Hakanson wrote:
>
> Thanks for the feedback. You're both confirming what we've learned so far,
> that we had to unmount all the clients (which required rebooting most of
> them), then reboot all the storage servers, to get things unstuck until the
> pr
> On Oct 16, 2018, at 7:04 AM, Mark Roper wrote:
>
> I have successfully set up a Lustre filesystem that is multi-homed on two
> different TCP NIDs, using the following configuration.
> Mount MGS & MDT
>
>sudo lnetctl lnet configure
>sudo lnetctl net del --net tcp
>sudo lnetctl net
> On Sep 19, 2018, at 8:09 PM, Colin Faber wrote:
>
> Why wouldn't you use DNE?
I am considering it as an option, but there appear to be some potential
drawbacks.
If I use DNE1, then I have to manually create directories on specific MDTs. I
will need to monitor MDT usage and make adjustment
Has anyone had recent experience resizing a ldiskfs-backed MDT using the
resize2fs tool? We may be purchasing a small lustre file system in the near
future with the expectation that it could grow considerably over time. Since
we don’t have a clear idea of how many inodes we might need in the f
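The sequence I had in mind (after growing the underlying LUN; the device name is a placeholder) is roughly:

  umount /mnt/mdt                     # take the MDT offline
  e2fsck -f /dev/mapper/mdt_device    # resize2fs wants a clean filesystem
  resize2fs /dev/mapper/mdt_device    # grow ldiskfs to fill the enlarged device
  mount -t lustre /dev/mapper/mdt_device /mnt/mdt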
Those are the kind of symptoms you would see if the client is able to connect
to the MDS server but not to an OSS server. Certain operations (mount, cd, ls)
will work if the MDS server is reachable, even if one or more OSS servers are
not reachable. But other operations (“ls -la”, df) require
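A quick way to check from the client is something like:

  lfs check servers           # shows which MDT/OST connections are active from this client
  lctl ping 10.0.0.12@o2ib    # test LNet reachability of a specific OSS NID (the NID here is just an example)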
> On Sep 4, 2018, at 12:12 PM, Pak Lui wrote:
>
> I have tried "map_on_demand=16" to the "/etc/modprobe.d/ko2iblnd.conf" that
> was suggested. Also tried "map_on_demand=0" as suggested here:
> http://wiki.lustre.org/Optimizing_o2iblnd_Performance
>
> /etc/modprobe.d/ko2iblnd.conf
> alias ko2i
> On Aug 22, 2018, at 8:10 PM, Riccardo Veraldi
> wrote:
>
> On 8/22/18 3:13 PM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
>>> On Aug 22, 2018, at 3:31 PM, Riccardo Veraldi
>>> wrote:
>>> I would like to migrate this virtual machine to another infras
> On Aug 22, 2018, at 3:31 PM, Riccardo Veraldi
> wrote:
> I would like to migrate this virtual machine to another infrastructure. It is
> not simple because the other infrastructure is VMware.
> What is the best way to migrate those partitions without incurring any
> corruption of data?
> On Aug 20, 2018, at 2:58 AM, ANS wrote:
>
> 1) CentOS 7.4
> 2) Lustre version 2.11
> 3) MDT LUN size is 6.5 TB (RAID 10) and after formatting using lustre we are
> getting the size as 3.9 TB, whereas formatting with XFS shows the full size.
For Lustre 2.10 and up, the default inode size is 1KB
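You can see how much of the device went to inode tables on the formatted MDT with something like (the device path is a placeholder):

  dumpe2fs -h /dev/mapper/mdt_device | grep -E 'Inode count|Inode size|Block count'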
> On Aug 13, 2018, at 2:25 PM, David Cohen
> wrote:
>
> the fstab line I use for mounting the Lustre filesystem:
>
> oss03@tcp:oss01@tcp:/fsname /storage lustre flock,user_xattr,defaults 0 0
OK. That looks correct.
> the mds is also configured for failover (unsuccessfully):
>
> On Aug 13, 2018, at 7:14 AM, David Cohen
> wrote:
>
> I installed a new 2.10.4 Lustre file system.
> Running MDS and OSS on the same servers.
> Failover wasn't configured at format time.
> I'm trying to configure failover node with tunefs without success.
> tunefs.lustre --writeconf --erase-
> On Jul 27, 2018, at 1:56 PM, Andreas Dilger wrote:
>
>> On Jul 27, 2018, at 10:24, Mohr Jr, Richard Frank (Rick Mohr)
>> wrote:
>>
>> I am working on upgrading some Lustre servers. The servers currently run
>> lustre 2.8.0 with zfs 0.6.5, and I am
I am working on upgrading some Lustre servers. The servers currently run
lustre 2.8.0 with zfs 0.6.5, and I am looking to upgrade to lustre 2.10.4 with
zfs 0.7.9. I was looking at the manual, and I did not see anything in there
that mentioned special steps when changing ZFS versions. Do I ne
> On Jun 27, 2018, at 4:44 PM, Mohr Jr, Richard Frank (Rick Mohr)
> wrote:
>
>
>> On Jun 27, 2018, at 3:12 AM, yu sun wrote:
>>
>> client:
>> root@ml-gpu-ser200.nmg01:~$ mount -t lustre
>> node28@o2ib1:node29@o2ib1:/project /mnt/lustre_data
>&g
> On Jun 27, 2018, at 3:12 AM, yu sun wrote:
>
> client:
> root@ml-gpu-ser200.nmg01:~$ mount -t lustre
> node28@o2ib1:node29@o2ib1:/project /mnt/lustre_data
> mount.lustre: mount node28@o2ib1:node29@o2ib1:/project at /mnt/lustre_data
> failed: Input/output error
> Is the MGS running?
> root@m
> On Jun 27, 2018, at 12:52 AM, yu sun wrote:
>
> I have create file /etc/modprobe.d/lustre.conf with content on all mdt ost
> and client:
> root@ml-gpu-ser200.nmg01:~$ cat /etc/modprobe.d/lustre.conf
> options lnet networks="o2ib1(eth3.2)"
> and I exec command line : lnetctl lnet configure --
> On May 2, 2018, at 10:37 AM, Mohr Jr, Richard Frank (Rick Mohr)
> wrote:
>
>
>> On May 2, 2018, at 9:59 AM, Mark Miller wrote:
>>
>> Since I have the Lustre source code, I can start looking through it to see
>> if I can find where the Lustre mount sy
> On May 2, 2018, at 9:59 AM, Mark Miller wrote:
>
> Since I have the Lustre source code, I can start looking through it to see if
> I can find where the Lustre mount system call may be getting hung up. I have
> no idea... but it feels like the Lustre mount may be trying to read something
>
> On Apr 5, 2018, at 11:31 AM, John Bauer wrote:
>
> I don't have access to the OSS so I can't report on the Lustre settings. I
> think the client side max cached is 50% of memory.
Looking at your cache graph, that looks about right.
> After speaking with Doug Petesch of Cray, I thought I wou
John,
I had a couple of thoughts (though not sure if they are directly relevant to
your performance issue):
1) Do you know what caching settings are applied on the lustre servers? This
could have an impact on performance, especially if your tests are being run
while others are doing IO on the
> On Apr 3, 2018, at 1:14 PM, Steve Barnet wrote:
>
> There appear to be a
> couple ways that this could be done:
>
> a) Add the service nodes:
> tunefs.lustre --servicenode=nid,nid /dev/
>
> b) Add a failover node:
> tunefs.lustre --param="failover.node= /dev/
The first one is the prefer
I have started playing around with Lustre changelogs, and I have noticed a
behavior with the “lctl changelog_deregister” command that I don’t understand.
I tried running a little test by enabling changelogs on my MDS server:
[root@server ~]# lctl --device orhydra-MDT changelog_register
orhy
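For anyone following along, the full cycle I was testing looks roughly like this (the fsname and reader id are just examples):

  lctl --device testfs-MDT0000 changelog_register          # returns a reader id such as cl1
  lfs changelog testfs-MDT0000                              # dump the pending records
  lfs changelog_clear testfs-MDT0000 cl1 0                  # acknowledge everything read so far
  lctl --device testfs-MDT0000 changelog_deregister cl1     # remove the reader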
My $0.02 below.
> On Dec 20, 2017, at 11:21 AM, E.S. Rosenberg
> wrote:
>
> 1. After my recent experience with failover I wondered is there any reason
> not to set all machines that are within reasonable cable range as potential
> failover nodes so that in the very unlikely event of both mach
> On Nov 29, 2017, at 8:35 PM, Dilger, Andreas wrote:
>
> Would you be able to open a ticket for this, and possibly submit a patch to
> fix the build?
I can certainly open a ticket, but I’m afraid I don’t know what needs to be
fixed so I can’t provide a patch.
--
Rick Mohr
Senior HPC System
> On Oct 18, 2017, at 9:44 AM, parag_k wrote:
>
>
> I got the source from github.
>
> My configure line is-
>
> ./configure --disable-client
> --with-kernel-source-header=/usr/src/kernels/3.10.0-514.el7.x86_64/
> --with-o2ib=/usr/src/ofa_kernel/default/
>
Are you still running into this i
> On Nov 1, 2017, at 7:18 AM, Parag Khuraswar wrote:
>
> For mgt –
> mkfs.lustre --servicenode=10.2.1.204@o2ib --servicenode=10.2.1.205@o2ib --mgs
> /dev/mapper/mpathc
>
> For mdt
> mkfs.lustre --fsname=home --mgsnode=10.2.1.204@o2ib --mgsnode=10.2.1.205@o2ib
> --servicenode=10.2.1.204@o2ib
> On Oct 30, 2017, at 4:46 PM, Brian Andrus wrote:
>
> Someone please correct me if I am wrong, but that seems a bit large of an
> MDT. Of course drives these days are pretty good sized, so the extra is
> probably very inexpensive.
That probably depends on what the primary usage will be. If
> On Oct 30, 2017, at 8:47 AM, Kevin M. Hildebrand wrote:
>
> All of the hosts (client, server, router) have the following in ko2iblnd.conf:
>
> alias ko2iblnd-opa ko2iblnd
> options ko2iblnd-opa peer_credits=128 peer_credits_hiw=64 credits=1024
> concurrent_sends=256 ntx=2048 map_on_demand=32
> On Oct 20, 2017, at 11:37 AM, Ravi Bhat wrote:
>
> Thanks, I have created the user (luser6) on the client as well as on the lustre
> servers. But I still get the same error:
> No directory /home/luser6
> Logging in with home="/".
>
> But now I can cd /home/luser6 manually and create files or folders.
Are
> On Oct 20, 2017, at 10:50 AM, Ravi Konila wrote:
>
> Can you please guide me how do I do it, I mean install NIS on servers and
> clients?
> Is it mandatory to setup NIS?
>
NIS is not mandatory. You just need a way to ensure that user accounts are
visible to the lustre servers. You could
Recently, I ran into an issue where several of the OSTs on my Lustre file
system went read-only. When I checked the logs, I saw messages like these for
several OSTs:
Oct 6 23:27:11 haven-oss2 kernel: LDISKFS-fs: ldiskfs_getblk:834: aborting
transaction: error 28 in __ldiskfs_handle_dirty_meta
> On Aug 22, 2017, at 7:14 PM, Riccardo Veraldi
> wrote:
>
> On 8/22/17 9:22 AM, Mannthey, Keith wrote:
>> Younot expected.
>>
> yes, they are automatically used on my Mellanox, and the ko2iblnd-probe script
> does not seem to be working properly.
The ko2iblnd-probe script looks in /sys/class/infin
> On Aug 3, 2017, at 11:48 AM, Jackson, Gary L.
> wrote:
>
> Are quotas well supported and robust on Lustre?
As far as I know, they are. But I mainly use quotas for reporting purposes. I
have not had much experience with enforcing quota limits in Lustre.
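For reporting, I just pull the numbers with something like (the user, group, and mount point are examples):

  lfs quota -u someuser /mnt/lustre     # per-user usage and limits
  lfs quota -g somegroup /mnt/lustre    # per-group usage and limits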
> What is the performance impact,
> On Aug 1, 2017, at 3:07 PM, Jason Williams wrote:
> 1) Is 512 threads a reasonable setting or should it be lower?
Since your servers have enough memory to support 512 threads, then it is
probably reasonable. If your server load is ~100, that probably means most of
those threads are si
You might want to start by looking at these online tutorials:
http://lustre.ornl.gov/lustre101-courses/
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
> On May 21, 2017, at 6:19 AM, Ravi Konila wrote:
>
> Hi There,
>
> On May 4, 2017, at 11:03 AM, Steve Barnet wrote:
>
> On 5/4/17 10:01 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote:
>> Did you try doing a writeconf to regenerate the config logs for the file
>> system?
>
>
> Not yet, but quick enough to try. Do this for the
Did you try doing a writeconf to regenerate the config logs for the file system?
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
> On May 4, 2017, at 10:03 AM, Steve Barnet wrote:
>
> Hi all,
>
> This is Lustre 2.8.0 co
> On May 3, 2017, at 10:56 PM, Riccardo Veraldi
> wrote:
>
> I am building lustre-client from the src rpm on RHEL 7.3.
>
> it fails with this error during the install process:
>
> + echo /etc/init.d/lnet
> + echo /etc/init.d/lsvcgss
> + find /root/rpmbuild/BUILDROOT/lustre-client-2.9.0-1.el7.x86_64
> On May 3, 2017, at 12:23 PM, Patrick Farrell wrote:
>
> That reasoning is sound, but this is a special case. -11 (-EAGAIN) on
> ldlm_enqueue is generally OK...
>
> LU-8658 explains the situation (it's POSIX flocks), so I'm going to reference
> that rather than repeat it here.
>
> https://
I think that -11 is EAGAIN, but I don’t know how to interpret what that means
in the context of Lustre locking. I assume these messages are from the clients
and the changing “x” portion is just the fact that each client has a
different identifier. So if you have multiple clients complainin
This might be a long shot, but have you checked for possible firewall rules
that might be causing the issue? I’m wondering if there is a chance that some
rules were added after the nodes were up to allow Lustre access, and when a
node got rebooted, it lost the rules.
--
Rick Mohr
Senior HPC Sy
> On Mar 28, 2017, at 1:49 PM, DeWitt, Chad wrote:
>
> We've encountered several programs that require flock, so we are now
> investigating enabling flock functionality. However, the Lustre manual
> includes a passage in regards to flocks which gives us pause:
>
> "Warning
> This mode affect
ter doing chgrp for large subtree. IIRC, for three
> groups; counts were small different "negative" numbers, not 21.
> I can get more details tomorrow.
>
> Alex
>
>> On Feb 9, 2017, at 5:14 PM, Mohr Jr, Richard Frank (Rick Mohr)
>> wrote:
>>
>
> On Feb 16, 2017, at 9:56 AM, Jon Tegner wrote:
>
> I have three (physical) machines, and each one has a virtual machine on it
> (KVM). On one of the virtual machines there is an MDS, and on two of them
> there are OSSes installed.
>
> All systems use CentOS 7.3 and Lustre 2.9.0, and I mou
I recently set up a Lustre 2.8 file system that uses ZFS for the backend
storage (both on the MDT and OSTs). When I was doing some testing, I noticed
that the output from lfs quota seemed odd. While the quota information for the
amount of used space seemed correct, the info on the number of fi
t on this.
>
> -----Original Message-----
> From: Mohr Jr, Richard Frank (Rick Mohr) [mailto:rm...@utk.edu]
> Sent: Thursday, January 12, 2017 10:51 AM
> To: Jeff Slapp
> Cc: lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] Lustre Client hanging on mount
>
> I noticed t
I noticed that you appear to have formatted the MDT with the file system name
“mgsZFS” while the OST was formatted with the file system name “ossZFS”. The
same name needs to be used on all MDTs/OSTs in the same file system. Until
that is fixed, your file system won’t work properly.
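In other words, every target should be formatted with the same --fsname; the name, pools, and NID below are just examples:

  mkfs.lustre --fsname=zfsfs --backfstype=zfs --mgs --mdt --index=0 mdtpool/mdt0
  mkfs.lustre --fsname=zfsfs --backfstype=zfs --ost --index=0 --mgsnode=mds1@tcp ostpool/ost0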
--
Rick Mo
> On Jan 11, 2017, at 12:39 PM, Vicker, Darby (JSC-EG311)
> wrote:
>
>>> Getting failover right over multiple separate networks can be a real
>>> hair-pulling experience.
>>
>> Darby: Do you have the option of (at least temporarily) running the file
>> system with only Infiniband configured?
> On Jan 11, 2017, at 11:58 AM, Ben Evans wrote:
>
> Getting failover right over multiple separate networks can be a real
> hair-pulling experience.
Darby: Do you have the option of (at least temporarily) running the file system
with only Infiniband configured? If you could set up the file sy
> On Jan 9, 2017, at 4:21 AM, Markham Benjamin wrote:
>
> I was wondering about the use cases for Lustre on Hadoop. One key thing about
> HDFS is that it runs on commodity hardware. Unless I’m being misinformed,
> Lustre doesn’t exactly run on commodity hardware.
I don’t think you want to use
Have you tried performing a writeconf to regenerate the lustre config log
files? This can sometimes fix the problem by making sure that everything is
consistent. (A writeconf is often required when making NID or failover
changes.) I think you could also use that opportunity to correct your
-
> On Dec 20, 2016, at 10:48 AM, Jessica Otey wrote:
>
> qos_threshold_rr
>
> This setting controls how much consideration should be given to QoS in
> allocation
> The higher this number, the more QOS is taken into consideration.
> When set to 100%, Lustre ignores the QoS variable and hits all
> On Dec 15, 2016, at 9:30 AM, Phill Harvey-Smith
> wrote:
>
> On 15/12/2016 14:21, Hanley, Jesse A. wrote:
>> I forgot: You should also be able to use lshowmount.
>
> Hmm, that works on the old server, but I can't find the command on the new
> centos 7.2 server, which I installed from RPMs I s
> On Nov 28, 2016, at 9:58 AM, Stefano Turolla
> wrote:
>
> thanks for the quick reply, I am maybe doing the wrong thing. What I am
> trying to achieve is to have a Lustre volume to be shared among the
> nodes, and the 30TB is the size of existing storage.
>
> Should I create a separate (and m
> On Oct 13, 2016, at 12:32 PM, E.S. Rosenberg
> wrote:
>
> I thought ZFS was only recommended for OSTs and not for MDTs/MGS?
ZFS usually has lower metadata performance for MDT than using ldiskfs which is
why some people recommend ZFS only for the OSTs. However, ZFS has features
(like snaps
Did you check to make sure there are no firewalls running that could be
blocking traffic?
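For LNet over TCP, the usual thing to check is port 988 on the servers, for example:

  iptables -L -n | grep 988      # look for rules touching the LNet acceptor port
  systemctl status firewalld     # or just check whether a firewall service is running at all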
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
> On Sep 27, 2016, at 10:12 AM, Phill Harvey-Smith
> wrote:
>
> Hi all
>
> I'm st
> On Sep 21, 2016, at 5:08 AM, Phill Harvey-Smith
> wrote:
>
> Sep 21 09:44:29 oric kernel: osd_zfs: disagrees about version of symbol
> dsl_prop_register
> Sep 21 09:44:29 oric kernel: osd_zfs: Unknown symbol dsl_prop_register (err
> -22)
> Sep 21 09:44:29 oric kernel: osd_zfs: disagrees abo
> On Sep 19, 2016, at 2:40 AM, Pardo Diaz, Alfonso
> wrote:
>
> I am still having the same problem on my system. My clients get stuck on the
> primary MDS, which is down, and they do not use the backup (service MDS), but
> only when they try to connect there for the first time.
> As I said in a previous messa
> From: Ben Evans [bev...@cray.com]
> Sent: Thursday, September 1, 2016, 15:25
> To: Pardo Diaz, Alfonso; Mohr Jr, Richard Frank (Rick Mohr)
>
> On Aug 31, 2016, at 8:12 AM, Pardo Diaz, Alfonso
> wrote:
>
> I mount my clients: mount -t lustre mds1@o2ib:mds2@o2ib:/fs /mnt/fs
>
> 1) When both MDS are OK I can mount without problems
> 2) If the MDS1 is down and my clients have lustre mounted, they use MDS2
> without problems
> 3) If th
> On Aug 16, 2016, at 6:55 AM, E.S. Rosenberg
> wrote:
>
> I just found this paper:
> http://wiki.lustre.org/images/d/da/Understanding_Lustre_Filesystem_Internals.pdf
>
> It looks interesting but it deals with lustre 1.6 so I am not sure how
> relevant it still is…..
Some of the information
> On Aug 11, 2016, at 5:42 AM, E.S. Rosenberg
> wrote:
>
> Our MDT suffered a kernel panic (which I will post separately), the OSSs
> stayed alive but the MDT was out for some time while nodes still tried to
> interact with lustre.
>
> So I have several questions:
> a. what happens to proces
> On Aug 4, 2016, at 9:39 AM, Gibbins, Faye wrote:
>
> Yes it is mounted. But that's not always a problem. We have a test lustre
> cluster where it's mounted and the tune2fs works fine. But it fails in
> production.
>
> Production have failover turned on for the OSTs. Something absent on that
> On Aug 3, 2016, at 1:30 PM, Ben Evans wrote:
>
> I thought read caching was disabled by default, as the kernel's default
> handling of pages was better.
You might be right. It has been a while since I have set up a Lustre file
system from scratch, and I haven’t done so for newer versions
Do you have the Lustre read caching feature enabled? I think it should be on
by default, but you might want to check. If the files are only 20 KB, then I
would think the Lustre OSS nodes could keep them in memory most of the time to
speed up access (unless of course this is a metadata bottlene
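On the OSS side, the relevant settings can be checked with something like:

  lctl get_param obdfilter.*.read_cache_enable
  lctl get_param obdfilter.*.writethrough_cache_enable
  lctl get_param obdfilter.*.readcache_max_filesize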
> On Aug 2, 2016, at 10:38 AM, Gibbins, Faye wrote:
>
>
> tune2fs: MMP: device currently active while trying to open
> /dev/mapper/scratch--1--5-scratch_3
>
> MMP error info: last update: Tue Aug 2 15:34:09 2016
>
> node: edi-vf-1-5.ad.cirrus.com device: dm-19
>
> 0 edi-vf-1-5:~#
Is the d
> On Jul 28, 2016, at 9:54 PM, sohamm wrote:
>
> Client is configured for IB interface.
So it looks like there might be something wrong with the LNet config on the
client then. Based on the output from “lctl ping” that you ran from the
server, the client only reported a NID on the tcp netwo
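On the client, the configured NIDs are easy to check directly, for example:

  lctl list_nids      # should include an o2ib NID if the IB interface is configured for LNet
  lnetctl net show    # on newer releases, shows the full LNet configuration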
Is the client supposed to have an IB interface configured, or is it just
supposed to mount over ethernet?
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
> On Jul 20, 2016, at 2:09 PM, sohamm wrote:
>
> Hi
>
> Any guid
> On Jun 20, 2016, at 5:00 PM, Jessica Otey wrote:
>
> All,
> I am in the process of preparing to upgrade a production lustre system
> running 1.8.9 to 2.4.3.
> This current system has 2 lnet routers.
> Our plan is to perform the upgrade in 2 stages:
> 1) Upgrade the MDS and OSSes to 2.4.3, lea
> On May 19, 2016, at 12:46 PM, Nathan Dauchy - NOAA Affiliate
> wrote:
>
> Thanks for pointing out the approach of trying to keep a single file from
> using too much space on an OST. It looks like the Log2(size_in_GB) method I
> proposed works well up to a point, but breaks down in the capa
> On May 18, 2016, at 1:22 PM, Nathan Dauchy - NOAA Affiliate
> wrote:
>
> Since there is the "increased overhead" of striping, and weather applications
> do unfortunately write MANY tiny files, we usually keep the filesystem
> default stripe count at 1. Unfortunately, there are several user
Have you tried doing a writeconf to regenerate the config logs?
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
> On May 17, 2016, at 12:08 PM, Randall Radmer wrote:
>
> We've been working with lustre systems for a few ye
> On Apr 13, 2016, at 2:53 PM, Mark Hahn wrote:
> thanks, we'll be trying the LU-5726 patch and cpu_npartitions things.
> it's quite a long thread - do I understand correctly that periodic
> vm.drop_caches=1 can postpone the issue?
Not really. I was periodically dropping the caches as a way to
> On Apr 13, 2016, at 8:02 AM, Tommi T wrote:
>
> We had to use lustre-2.5.3.90 on the MDS servers because of memory leak.
>
> https://jira.hpdd.intel.com/browse/LU-5726
Mark,
If you don’t have the patch for LU-5726, then you should definitely try to get
that one. If nothing else, reading t
> On Apr 12, 2016, at 6:46 PM, Mark Hahn wrote:
>
> all our existing Lustre MDSes run happily with vm.zone_reclaim_mode=0,
> and making this one consistent appears to have resolved a problem
> (in which one family of lustre kernel threads would appear to spin,
> "perf top" showing nearly all tim