Andreas,
Thanks for responding.
Right now, I am looking at using ZFS and an SSD/NVMe for the journal
disk. I suggested mirroring, but they aren't too keen on losing 50% of
their purchased storage.
This particular system will likely not be scaled up at a future date.
It seems like the 2.11 m
On Nov 29, 2017, at 15:31, Brian Andrus wrote:
>
> All,
>
> I have always seen lustre as a good solution for large files and not the best
> for many small files.
> Recently, I have seen a request for a small lustre system (2 OSSes, 1 MDS)
> that would be for billions of files that average 50k-
Rick,
Would you be able to open a ticket for this, and possibly submit a patch to fix
the build?
Cheers, Andreas
On Nov 29, 2017, at 14:18, Mohr Jr, Richard Frank (Rick Mohr) <rm...@utk.edu> wrote:
On Oct 18, 2017, at 9:44 AM, parag_k <para...@citilindia.com> wrote:
I got the
In particular, see the patch https://review.whamcloud.com/30164
LU-10133 o2iblnd: fall back to vmalloc for mlx4/mlx5
If a large QP is allocated with kmalloc(), but fails due to memory
fragmentation, fall back to vmalloc() to handle the allocation.
This is done in the upstream kernel, but was only
thanks guys for all your help.
Looks like the issue is fundamentally poor performance across 100GbE, where I'm
only getting ~50Gb/s using iperf. I believe the MTU is set correctly across all my
systems.
Using ConnectX-4 in 100GbE mode.
thanks again,
-john
On 11/28/17 9:03 PM, Colin Faber wro
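A quick way to narrow down a number like that (hostname and interface name below are placeholders): confirm the MTU actually in effect and compare a single iperf stream against several in parallel, since one TCP stream often cannot fill a 100GbE link by itself.
  ip link show enp94s0f0 | grep mtu    # interface name is a placeholder
  iperf -c oss01 -t 30                 # single stream
  iperf -c oss01 -t 30 -P 8            # eight parallel streams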
Once I've had one fail a migration between hosts, it stays failed. I've waited
a bit and tried again, and it fails to mount with the same errors (or
messages). I am then only able to remount it on the host that originally had
it mounted. Once that has been done, it's happy, and the next time
Ok. So when you say 'occasionally', does that mean if you try the command
again, it works?
If so, I'm wondering if you are doing it before the timeout period has
expired, so lustre is still expecting the OST to be on the original OSS.
That is, it is still in a window where "maybe it will come b
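If it is that window, a couple of read-only checks can confirm it (standard lctl parameters; the wildcard matches whatever OSTs the OSS currently holds):
  lctl get_param timeout                       # base obd timeout, in seconds
  lctl get_param obdfilter.*.recovery_status   # recovery state of the OSTs on this OSS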
Heh. Fair question, and yes. You had to rule it out though. fakemds1 and
fakemds2 have /mnt/MGT and /mnt/MDT. fakeoss1 and fakeoss2 have
/mnt/OST{0..3}. fakeoss3 and fakeoss4 have /mnt/OST{3..7}. Also to clarify,
every command in my previous email that has " at " was actually the at symbol
I know it may be obvious, but did you 'mkdir /mnt/OST7'?
Brian Andrus
On 11/29/2017 3:09 PM, Scott Wood wrote:
[root@fakeoss4 ~]# mount /mnt/OST7
mount.lustre: increased /sys/block/vde/queue/max_sectors_kb from 1024
to 2147483647
mount.lustre: mount /dev/vde at /mnt/OST7 failed: No such file
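A minimal sketch of that check, using the device and mount point from the quoted error (mounting directly rather than via fstab is just one way to do it):
  mkdir -p /mnt/OST7
  mount -t lustre /dev/vde /mnt/OST7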
Maybe this would be where multiple MDS + small files on MDS would shine?
My 1 millionth of a bitcoin,
Eli
On Thu, Nov 30, 2017 at 12:31 AM, Brian Andrus wrote:
> All,
>
> I have always seen lustre as a good solution for large files and not the
> best for many small files.
> Recently, I have seen
Hi folks,
In an effort to replicate a production environment to do a test upgrade, I've
created a six-server KVM testbed on a CentOS 7.4 host with CentOS 6 guests.
I have four OSSes and two MDSs. I have qcow2 virtual disks visible to the
servers in pairs. Each OSS has two OSTs and can also
All,
I have always seen lustre as a good solution for large files and not the
best for many small files.
Recently, I have seen a request for a small lustre system (2 OSSes, 1
MDS) that would be for billions of files that average 50k-100k.
It seems to me that, for this to be 'of worth', the bl
> On Oct 18, 2017, at 9:44 AM, parag_k wrote:
>
>
> I got the source from github.
>
> My configure line is-
>
> ./configure --disable-client
> --with-kernel-source-header=/usr/src/kernels/3.10.0-514.el7.x86_64/
> --with-o2ib=/usr/src/ofa_kernel/default/
>
Are you still running into this i
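For reference, a build from a source checkout like the one quoted above usually follows roughly this flow (the configure flags are the ones from the original mail; this is the standard sequence, not a guaranteed fix for the reported failure):
  sh autogen.sh
  ./configure --disable-client \
      --with-kernel-source-header=/usr/src/kernels/3.10.0-514.el7.x86_64/ \
      --with-o2ib=/usr/src/ofa_kernel/default/
  make rpms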
Hello,
I am looking for some simple examples of routing from ib0 to tcp.
All the examples in the documentation are based on OPA or Mellanox.
I found some inconsistencies in the manual routing part:
http://wiki.lustre.org/LNet_Router_Config_Guide
section: ARP flux issue for MR node
The Ethernet part is missing in
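In case a concrete ib0-to-tcp example helps while the wiki page is being fixed, the classic modprobe-style configuration looks roughly like the following (interface names and the router's two addresses are placeholders; on 2.10+ the same thing can also be expressed with lnetctl/YAML):
  # on the LNet router (has both interfaces)
  options lnet networks="o2ib0(ib0),tcp0(eth0)" forwarding=enabled
  # on the InfiniBand-side nodes (10.10.0.1 = router's ib0 address)
  options lnet networks="o2ib0(ib0)" routes="tcp0 10.10.0.1@o2ib0"
  # on the Ethernet-side nodes (192.168.0.1 = router's eth0 address)
  options lnet networks="tcp0(eth0)" routes="o2ib0 192.168.0.1@tcp0"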
Ah yes. One more thing – I believe that this has been addressed in the upcoming
RHEL 7.5, so that might be another option for you to consider.
On 2017-11-29, 5:47 AM, "lustre-discuss on behalf of Charles A Taylor"
<lustre-discuss-boun...@lists.lustre.org on behalf of chas...@ufl.edu>
Thank you, Peter. I figured that would be the response but wanted to ask. We
were hoping to get away from maintaining a MOFED build but it looks like that
may not be the way to go.
And you are correct about the JIRA ticket. I misspoke. It was the associated
RH kernel bug that was “private”,
Charles
That ticket is completely open so you do have access to everything. As I
understand it, the options are to either use the latest MOFED update rather than
relying on the in-kernel OFED (which I believe is the advice usually provided
by Mellanox anyway) or else apply the kernel patch Andre
Andreas,
We’ll bring the idea up on today’s OpenSFS board call. If the community has
recommendations on what this might look like (preferred capabilities or
suggestions for Q&A/forum software, or a pointer to existing hosted Q&A
platforms like Stack Overflow), please let me know.
Shawn
On 11
Even in the extracted source code, lnetctl does not compile.
Running make in the utils folder produces wirecheck, lst, and
routerstat, but not lnetctl.
After running "make lnetctl" in the utils folder
(/tmp/lustre-2.10.2_RC1/lnet/utils),
it produces the executable.
On Wed, Nov 29, 2017 at 1
We have a genomics pipeline app (supernova) that fails consistently due to the
client being evicted on the OSSs with a “lock callback timer expired”. I
doubled “ldlm_enqueue_min” across the cluster, but then the timer simply expired
after 200s rather than 100s, so I don’t think that is the answe
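When chasing these, the matching kernel-log messages on both ends usually identify the lock and the client involved, e.g.:
  dmesg | grep -i "lock callback timer expired"   # on the OSS doing the evicting
  dmesg | grep -i evict                           # on the client that was evicted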
Hi Andreas,
I just checked and yaml-devel is installed:
yum list installed | grep yaml
libyaml.x86_64 0.1.4-11.el7_0 @base
libyaml-devel.x86_64 0.1.4-11.el7_0 @base
and still no success:
rpm -qpl rpmbuild/RPMS/x86_64/*
Hi All,
We recently upgraded from Lustre 2.5.3.90 on EL6 to 2.10.1 on EL7 (details
below) but have hit what looks like LU-10133 (order 8 page allocation failures).
We don’t have access to look at the JIRA ticket in more detail, but from what we
can tell the fix is to change from vmalloc() t
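For anyone checking whether they are hitting the same thing, the tell-tale sign is an order:8 page allocation failure in the kernel log, typically with mlx4/mlx5 or o2iblnd in the backtrace:
  dmesg | grep -A5 "page allocation failure"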