(Re-sending my response to the list)
Yes, I believe that there are cases when problems on a remote node can be
interpreted as local failures.
From: "nathan.dau...@noaa.gov"
Date: Sunday, March 8, 2020 at 3:56 AM
To: Chris Horn , "lustre-discuss@lists.lustre.org"
Cc: "
l).
When LNet is selecting the local and remote interfaces to use for a PUT or GET,
it considers the health value of each interface. Healthier interfaces are
preferred.
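
To illustrate that preference, here is a minimal sketch in Python. It is not LNet's actual code: the NIDs, the health numbers and the pick_healthiest helper are made up for illustration, and the real selection logic may weigh more than health alone.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    nid: str     # hypothetical NID, for illustration only
    health: int  # assumed scale: higher means healthier


def pick_healthiest(interfaces):
    """Return the interface with the highest health value.

    On a tie, max() keeps the first candidate in the list.
    """
    return max(interfaces, key=lambda ni: ni.health)


# Two hypothetical local interfaces; the second has lost some health.
local = [Interface("10.0.0.1@o2ib", 1000), Interface("10.0.0.2@o2ib", 750)]
print(pick_healthiest(local).nid)
```

An interface that has seen failures would carry a lower health value here and so lose out to its healthier neighbour.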
Chris Horn
On 3/9/20, 4:22 AM, "Degremont, Aurelien" wrote:
What's the impact of being in recovery m
could certainly lead to message timeouts, which
would in turn result in interfaces being placed into recovery mode.
Chris Horn
On 3/6/20, 8:59 AM, "lustre-discuss on behalf of Michael Di Domenico"
wrote:
along the aforementioned error i also see these at the same time
lus
Anything in dmesg? We need to know _why_ the network failed to start.
Chris Horn
From: Kurt Strosahl
Date: Wednesday, October 2, 2019 at 1:55 PM
To: Chris Horn , "lustre-discuss@lists.lustre.org"
Subject: Re: [lustre-discuss] Lustre rpm install creating a file that breaks
lustre
Might be best to open a ticket for this. What was the nature of the failure?
Chris Horn
From: lustre-discuss on behalf of
Kurt Strosahl
Date: Wednesday, October 2, 2019 at 1:30 PM
To: "lustre-discuss@lists.lustre.org"
Subject: [lustre-discuss] Lustre rpm install creating a file t
.13.0-101 (Ubuntu 14.04)
4.4.0-85.108(Ubuntu 14.04.5 LTS)
4.4.0-131 (Ubuntu 16.04)
4.15.0-32 (Ubuntu 18.04)
Chris Horn
On 8/30/18, 2:34 PM, "lustre-discuss on behalf of Andreas Dilger"
wrote:
On Aug 30, 2018, at 13:28, E.S
or down (alive or dead)
states of routers.
Chris Horn
From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of
Makia Minich <ma...@systemfabricworks.com>
Date: Wednesday, May 9, 2018 at 8:51 AM
To: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lus
.
At least, this was true before multi-rail. I’m not sure if that has changed
things w.r.t. route selection.
Chris Horn
From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of
Preeti Malakar <malakar.pre...@gmail.com>
Date: Friday, January 26, 2018 at 10:28 AM
To: "
Is the MGS actually on tcp or is it on o2ib? Can you “lctl ping” the MGS LNet
nid from the client where you’re trying to mount?
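
As a rough aid for the first question, the network type can be read off the NID itself, since a NID has the form address@network. The sketch below is Python with a hypothetical helper name and example NIDs; it only splits the string and does not run "lctl ping".

```python
def nid_network_type(nid):
    """Split a NID such as '192.168.1.2@o2ib1' and return its network type.

    The network name is the part after the last '@'. Trailing digits are
    the network number, so 'o2ib1' gives 'o2ib' and 'tcp0' gives 'tcp'.
    """
    address, sep, network = nid.rpartition("@")
    if not sep or not address or not network:
        raise ValueError(f"not a NID: {nid!r}")
    return network.rstrip("0123456789")


# Hypothetical NIDs, for illustration only.
for nid in ("10.0.0.5@tcp", "192.168.1.2@o2ib1"):
    print(nid, "->", nid_network_type(nid))
```

If the MGS NID turns out to be on o2ib while the client mounts with a tcp NID, that mismatch would be the first thing to look at.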
Chris Horn
From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of
Christopher Johnston <chjoh...@gmail.com>
Date: Friday, November 1
I would need more information to help you. Maybe provide the complete terminal
output of your build. Everything from getting the source to running ‘make rpms’.
Chris Horn
From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of
parag_k <para...@citilindia.com>
It would be helpful if you provided more context. How did you acquire the
source? What was your configure line? Is there a set of build instructions that
you are following?
Chris Horn
From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of
Parag Khuraswar
for
> the compute nodes. So I'm looking for another solution.
AFAIK, this is your only option short of developing your own code for this
situation (which would be cool!).
Chris Horn
On 10/16/17, 7:06 AM, "LOPEZ, ALEXANDRE" <alexandre.lo...@atos.net> wrote:
Chris,
I think the only way to do this today is to assign the clients in each “islet”
a unique LNet. What problems did that cause for you (besides the administrative
headache?)
Chris Horn
On 10/13/17, 9:51 AM, "lustre-discuss on behalf of LOPEZ, ALEXANDRE"
<lustre-discuss-boun...@lis
…” should pull in
any necessary modules. If that isn’t happening maybe you just need to run
depmod.
Chris Horn
From: Ravi Konila <ravibh...@gmail.com>
Reply-To: Ravi Konila <ravibh...@gmail.com>
Date: Friday, October 13, 2017 at 9:29 AM
To: Chris Horn <ho...@cray.com>, 'Lust
failures between the
evicted client and the server hosting demo-OST0002?
Chris Horn
On 10/13/17, 1:44 PM, "lustre-discuss on behalf of John Casu"
<lustre-discuss-boun...@lists.lustre.org on behalf of j...@chiraldynamics.com>
wrote:
client, server = 2.8.0, connected via 40GbE
https://jira.hpdd.intel.com/browse/LU-10119
I’ll push a patch
Chris Horn
On 10/13/17, 10:18 AM, "Dilger, Andreas" <andreas.dil...@intel.com> wrote:
Could you please file a Jira ticket (and possibly a patch) to fix this, so
it isn't forgotten.
Cheers, Andreas
script and try to restart lnet with systemctl.
Chris Horn
On 10/12/17, 3:39 PM, "lustre-discuss on behalf of David Rackley"
<lustre-discuss-boun...@lists.lustre.org on behalf of rack...@jlab.org> wrote:
Greetings,
I have built lustre-2.10.1_13_g2ee62fb on 3.10.0-69
The pre-built rpms are most likely compiled against the in-kernel IB drivers.
If you’re using the MOFED drivers you’ll need to recompile Lustre. The
instructions here may help you out http://wiki.lustre.org/Compiling_Lustre
Chris Horn
From: Ravi Konila <ravibh...@gmail.com>
Reply-To
Are you compiling Lustre yourself or using pre-built rpms?
Chris Horn
From: Ravi Konila <ravibh...@gmail.com>
Reply-To: Ravi Konila <ravibh...@gmail.com>
Date: Thursday, October 12, 2017 at 11:40 AM
To: Chris Horn <ho...@cray.com>, Parag Khuraswar <para...@citilindi
dmesg output should provide more information about the “Invalid argument” error
that you are seeing, but my guess would be that Lustre was compiled against a
different IB stack than what you have installed.
Chris Horn
From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on
It is not recommended to run MOFED-4.0 with Lustre. Only 4.1 or higher.
Chris Horn
On 10/12/17, 8:57 AM, "lustre-discuss on behalf of Peter Kjellström"
<lustre-discuss-boun...@lists.lustre.org on behalf of c...@nsc.liu.se> wrote:
On Thu, 12 Oct 2017 18:27:34 +0530
"
nid: 192.168.1.2@o2ib
> # Multi-Rail: True
> # peer ni:
> # - nid: 192.168.1.2@o2ib
> # - nid: 192.168.2.2@o2ib
> # - primary nid: 172.16.1.1@o2ib1
> # Multi-Rail: True
> # peer ni:
> # - nid: 172.16.1.1@o2ib1
> #
The ko2iblnd-opa settings are tuned specifically for Intel OmniPath. Take a
look at the /usr/sbin/ko2iblnd-probe script to see how OPA hardware is detected
and the “ko2iblnd-opa” settings get used.
Chris Horn
From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of
Ri
+to+the+Lustre+Manual+source
Chris Horn
From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of
"Gibbins, Faye" <faye.gibb...@cirrus.com>
Date: Tuesday, June 6, 2017 at 6:41 AM
To: "lustre-discuss@lists.lustre.org" <lustre-discuss@lists.lustre.or
That work was done in LU-5614 and some related tickets. I think the one for
removing the kernel version from the package name was LU-7643.
Chris Horn
On 4/21/17, 12:58 PM, "lustre-discuss on behalf of Michael Di Domenico"
<lustre-discuss-boun...@lists.lustre.org on behalf
lnet networks=o2ib2(ib0)
Nodes on Cluster D: options lnet networks=o2ib3(ib0)”
Again, that’s just a guess on how these things are typically configured. You’ll
want to check if that is actually the case for your clusters.
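
One way to check is to pull the network name and interface out of each node's modprobe options line. A minimal Python sketch follows; the parse_lnet_networks helper is hypothetical and handles only the simple single-network form shown above, not comma-separated lists or multiple interfaces.

```python
import re

# Matches e.g. 'options lnet networks=o2ib2(ib0)' (optionally quoted value).
_NETWORKS_RE = re.compile(r'^options\s+lnet\s+.*?networks="?(\w+)\((\w+)\)"?')


def parse_lnet_networks(line):
    """Return (network, interface) from a simple lnet options line, or None."""
    m = _NETWORKS_RE.match(line.strip())
    if m is None:
        return None
    return m.group(1), m.group(2)


print(parse_lnet_networks("options lnet networks=o2ib2(ib0)"))
print(parse_lnet_networks("options lnet networks=o2ib3(ib0)"))
```

Running this over the options line from a node in each cluster would show whether the clusters really sit on distinct LNet networks.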
Chris Horn
On Jun 12, 2015, at 2:37 AM, Thrash Er mingorrubi
errno 16 is EBUSY (device or resource busy) and errno 114 is EALREADY
(Operation already in progress).
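
If you want to look these up yourself, Python's standard errno module will do it. Note the numbers are platform-specific; the sketch below assumes Linux, where 16 and 114 map as described above.

```python
import errno
import os

# Look up the symbolic name and message for each errno value.
# Assumes Linux numbering; other platforms may number EALREADY differently.
for code in (16, 114):
    name = errno.errorcode.get(code, "unknown")
    print(code, name, os.strerror(code))
```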
Chris Horn
On Feb 15, 2012, at 10:52 AM, Marina Cacciagrano wrote:
Hello,
On all the nodes of a lustre 1.8.2 , I often see messages similar to the
following in /var/log/syslog:
LustreError
Hard to say what's going on without additional context. The first message
relates to an MGS_CONNECT rpc (o250), the second message relates to an
MDS_CONNECT rpc (o38). I would suspect network issues.
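
For reading such messages, a small Python sketch that picks the opcode out of a log line and names it. Only the two opcodes mentioned above are in the table, the opcode_name helper is hypothetical, and the sample lines are made up to resemble the "oNNN->target" part of a request dump; real messages will differ.

```python
import re

# Only the two opcodes discussed above; anything else is reported as unknown.
OPCODES = {250: "MGS_CONNECT", 38: "MDS_CONNECT"}


def opcode_name(line):
    """Find an 'oNNN->' token in a log line and return the RPC name."""
    m = re.search(r"\bo(\d+)->", line)
    if m is None:
        return None
    code = int(m.group(1))
    return OPCODES.get(code, f"unknown opcode {code}")


# Made-up sample lines, for illustration only.
print(opcode_name("req@0 x1/t0 o250->MGS@10.0.0.1@tcp:26/25"))
print(opcode_name("req@0 x2/t0 o38->example-MDT0000_UUID@10.0.0.1@tcp:12/10"))
```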
Chris Horn
On Feb 15, 2012, at 12:46 PM, Marina Cacciagrano wrote:
Thanks!
Maybe that means
FYI, there is some work being done to clean up obdfilter-survey. See
https://bugzilla.lustre.org/show_bug.cgi?id=24490
If there was a script issue you might try the patch from that bug to see if you
can reproduce.
Chris Horn
On Jul 6, 2011, at 3