Re: [lustre-discuss] billions of 50k files

2017-11-29 Thread Brian Andrus
Andreas, Thanks for responding. Right now, I am looking at using ZFS and an SSD/NVMe for the journal disk. I suggested mirroring, but they aren't too keen on losing 50% of their purchased storage. This particular system will likely not be scaled up at a future date. It seems like the 2.11

Re: [lustre-discuss] billions of 50k files

2017-11-29 Thread Dilger, Andreas
On Nov 29, 2017, at 15:31, Brian Andrus wrote: > > All, > > I have always seen lustre as a good solution for large files and not the best > for many small files. > Recently, I have seen a request for a small lustre system (2 OSSes, 1 MDS) > that would be for billions of
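
For a rough sense of scale in this thread, here is a back-of-envelope sketch of the raw data volume. It assumes "50k" means 50 KiB and works per one billion files; the function names are made up for illustration, and the figures ignore filesystem metadata, RAID/ZFS overhead, and block-size rounding.

```python
KIB = 1024
TIB = 1024 ** 4


def total_bytes(file_count: int, file_size_bytes: int) -> int:
    """Raw payload size for file_count files of identical size."""
    return file_count * file_size_bytes


def usable_after_mirroring(raw_bytes: int) -> int:
    """Two-way mirroring stores every block twice, so half the raw space is usable."""
    return raw_bytes // 2


if __name__ == "__main__":
    files = 1_000_000_000   # assumption: one billion files as the unit
    size = 50 * KIB         # assumption: "50k" read as 50 KiB
    data = total_bytes(files, size)
    print(f"payload per billion files: {data / TIB:.1f} TiB")
    print(f"raw disk needed if mirrored: {2 * data / TIB:.1f} TiB")
```

The object count (one MDT inode plus at least one OST object per file) is likely to matter as much as the byte count for a workload like this, which the sketch does not model.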

Re: [lustre-discuss] Lustre compilation error

2017-11-29 Thread Dilger, Andreas
Rick, Would you be able to open a ticket for this, and possibly submit a patch to fix the build? Cheers, Andreas On Nov 29, 2017, at 14:18, Mohr Jr, Richard Frank (Rick Mohr) > wrote: On Oct 18, 2017, at 9:44 AM, parag_k

Re: [lustre-discuss] Lustre 2.10.1 + RHEL7 Page Allocation Failures

2017-11-29 Thread Dilger, Andreas
In particular, see the patch https://review.whamcloud.com/30164 LU-10133 o2iblnd: fall back to vmalloc for mlx4/mlx5 If a large QP is allocated with kmalloc(), but fails due to memory fragmentation, fall back to vmalloc() to handle the allocation. This is done in the upstream kernel, but was

Re: [lustre-discuss] weird issue w. lnet routers

2017-11-29 Thread John Casu
Thanks guys for all your help. Looks like the issue is fundamentally poor performance across 100GbE, where I'm only getting ~50Gb/s using iperf. I believe the MTU is set correctly across all my systems. Using ConnectX-4 in 100GbE mode. Thanks again, -john On 11/28/17 9:03 PM, Colin Faber

Re: [lustre-discuss] Failure migrating OSTs in KVM lustre 2.7.0 testbed

2017-11-29 Thread Scott Wood
Once I've had one fail a migration between hosts, it stays failed. I've waited a bit and tried again, and it fails to mount with the same errors (or messages). I am then only able to remount it on the host that originally had it mounted. Once that has been done, it's happy and, the next time

Re: [lustre-discuss] Failure migrating OSTs in KVM lustre 2.7.0 testbed

2017-11-29 Thread Brian Andrus
Ok. So when you say 'occasionally' does that mean if you try the command again, it works? If so, I'm wondering if you are doing it before the timeout period has expired, so lustre is still expecting the OST to be on the original OSS. That is, it is still in a window where "maybe it will come

Re: [lustre-discuss] Failure migrating OSTs in KVM lustre 2.7.0 testbed

2017-11-29 Thread Scott Wood
Heh. Fair question, and yes. You had to rule it out though. fakemds1 and fakemds2 have /mnt/MGT and /mnt/MDT. fakeoss1 and fakeoss2 have /mnt/OST{0..3}. fakeoss3 and fakeoss4 have /mnt/OST{3..7}. Also to clarify, every command in my previous email that has " at " was actually the at

Re: [lustre-discuss] Failure migrating OSTs in KVM lustre 2.7.0 testbed

2017-11-29 Thread Brian Andrus
I know it may be obvious, but did you 'mkdir /mnt/OST7'? Brian Andrus On 11/29/2017 3:09 PM, Scott Wood wrote: [root@fakeoss4 ~]# mount /mnt/OST7 mount.lustre: increased /sys/block/vde/queue/max_sectors_kb from 1024 to 2147483647 mount.lustre: mount /dev/vde at /mnt/OST7 failed: No such file

Re: [lustre-discuss] billions of 50k files

2017-11-29 Thread E.S. Rosenberg
Maybe this would be where multiple MDS + small files on MDS would shine? My 1 millionth of a bitcoin, Eli On Thu, Nov 30, 2017 at 12:31 AM, Brian Andrus wrote: > All, > > I have always seen lustre as a good solution for large files and not the > best for many small files. >

[lustre-discuss] Failure migrating OSTs in KVM lustre 2.7.0 testbed

2017-11-29 Thread Scott Wood
Hi folks, In an effort to replicate a production environment to do a test upgrade, I've created a six-server KVM testbed on a CentOS 7.4 host with CentOS 6 guests. I have four OSSs and two MDSs. I have qcow2 virtual disks visible to the servers in pairs. Each OSS has two OSTs and can also

Re: [lustre-discuss] Lustre compilation error

2017-11-29 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Oct 18, 2017, at 9:44 AM, parag_k wrote: > > > I got the source from github. > > My configure line is- > > ./configure --disable-client > --with-kernel-source-header=/usr/src/kernels/3.10.0-514.el7.x86_64/ > --with-o2ib=/usr/src/ofa_kernel/default/ > Are you

Re: [lustre-discuss] Announce: Lustre Systems Administration Guide

2017-11-29 Thread Arman Khalatyan
Hello, I am looking for some simple routing examples on ib0 to tcp. All examples in the documentation are based on OPA or Mellanox. Found some inconsistency in the manual routing part: http://wiki.lustre.org/LNet_Router_Config_Guide section: ARP flux issue for MR node. The Ethernet part is missing in

Re: [lustre-discuss] Lustre 2.10.1 + RHEL7 Page Allocation Failures

2017-11-29 Thread Jones, Peter A
Ah yes. One more thing – I believe that this has been addressed in the upcoming RHEL 7.5, so that might be another option for you to consider. On 2017-11-29, 5:47 AM, "lustre-discuss on behalf of Charles A Taylor"

Re: [lustre-discuss] Lustre 2.10.1 + RHEL7 Page Allocation Failures

2017-11-29 Thread Charles A Taylor
Thank you, Peter. I figured that would be the response but wanted to ask. We were hoping to get away from maintaining a MOFED build but it looks like that may not be the way to go. And you are correct about the JIRA ticket. I misspoke. It was the associated RH kernel bug that was

Re: [lustre-discuss] Lustre 2.10.1 + RHEL7 Page Allocation Failures

2017-11-29 Thread Jones, Peter A
Charles, That ticket is completely open so you do have access to everything. As I understand it the options are to either use the latest MOFED update rather than relying on the in-kernel OFED (which I believe is the advice usually provided by Mellanox anyway) or else apply the kernel patch

Re: [lustre-discuss] Announce: Lustre Systems Administration Guide

2017-11-29 Thread Shawn Hall
Andreas, We’ll bring the idea up on today’s OpenSFS board call. If the community has recommendations on what this might look like (preferred capabilities or suggestions for Q&A/forum software, or a pointer to existing hosted Q&A platforms like Stack Overflow), please let me know. Shawn On

Re: [lustre-discuss] Recompiling client from the source doesnot contain lnetctl

2017-11-29 Thread Arman Khalatyan
Even in the extracted source code, lnetctl does not compile. Running make in the utils folder produces wirecheck, lst and routerstat, but not lnetctl. After running "make lnetctl" in the utils folder /tmp/lustre-2.10.2_RC1/lnet/utils it produces the executable. On Wed, Nov 29, 2017 at

[lustre-discuss] Lustre 2.10.1 + RHEL7 Lock Callback Timer Expired

2017-11-29 Thread Charles A Taylor
We have a genomics pipeline app (supernova) that fails consistently due to the client being evicted on the OSSs with a “lock callback timer expired”. I doubled “ldlm_enqueue_min” across the cluster but then the timer simply expired after 200s rather than 100s so I don’t think that is the

Re: [lustre-discuss] Recompiling client from the source doesnot contain lnetctl

2017-11-29 Thread Arman Khalatyan
Hi Andreas, I just checked the yaml-devel it is installed: yum list installed | grep yaml libyaml.x86_64 0.1.4-11.el7_0 @base libyaml-devel.x86_64 0.1.4-11.el7_0 @base and still no success: rpm -qpl

[lustre-discuss] Lustre 2.10.1 + RHEL7 Page Allocation Failures

2017-11-29 Thread Charles A Taylor
Hi All, We recently upgraded from Lustre 2.5.3.90 on EL6 to 2.10.1 on EL7 (details below) but have hit what looks like LU-10133 (order 8 page allocation failures). We don’t have access to look at the JIRA ticket in more detail but from what we can tell the fix is to change from vmalloc()