We're in the same boat... gpfs.snap hangs when the cluster/node is
unresponsive, but they don't know how to give us a root cause without one. Very
frustrating.
-Original Message-
From: gpfsug-discuss-boun...@spectrumscale.org
[mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf
Hi Bob,
I have the impression the biggest impact is to metadata-type operations
rather than throughput but don't quote me on that because I have very
little data to back it up. In the process of testing the upgrade from GPFS
3.5 to 4.1 we ran fio on some 1000 nodes against an FS in our test
On Tue, 07 Mar 2017 21:17:35 +, Bryan Banister said:
> Just depends on how your problem is detected… is it in a log? Is it found by
> running a command (e.g. mm*)? Is it discovered in `ps` output? Is your
> scheduler failing jobs?
I think the problem here is that if you have a sudden
mmfsd crash - IBM says, “we need a trace to debug the issue”.
Sigh
Bob Oesterlin
Sr Principal Storage Engineer, Nuance
From: on behalf of Bryan Banister
Reply-To: gpfsug main discussion list
Just depends on how your problem is detected… is it in a log? Is it found by
running a command (e.g. mm*)? Is it discovered in `ps` output? Is your
scheduler failing jobs?
We have ongoing monitoring of most of these types of problem detection points
and an automated process to capture a
I’ve been told that V7000 Unified nodes (GPFS under the covers) run with
tracing enabled all the time, but I agree with you, Bryan, on the potential
impacts. But when you must catch a trace for a problem that occurs once every
few weeks – how else would I do it?
Bob Oesterlin
Sr Principal
The performance impact can be quite significant depending on what you are
tracing. We even have monitoring that looks for long-running traces, and the
recommended action is to “kill with impunity!!”
I believe IBM recommends never running clusters with continuous tracing.
-Bryan
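
For anyone wanting a similar check, here is a rough sketch of the "look for long-running traces" idea. It is not Bryan's actual monitoring. It assumes the input looks like `ps -eo pid,etimes,args` output (PID, elapsed seconds, command line), and the process name `example-trace-daemon`, the sample data, and the one-hour threshold are all made up for illustration. It only reports offenders; what to do with them is left to the site.

```python
import re
from typing import List, Tuple


def find_long_running(ps_output: str, pattern: str,
                      max_seconds: int) -> List[Tuple[int, int, str]]:
    """Return (pid, elapsed_seconds, command) for every process whose
    command matches `pattern` and that has run longer than `max_seconds`."""
    hits = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # pid, elapsed, rest of command line
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue  # skip the header row and anything malformed
        pid, elapsed, cmd = int(parts[0]), int(parts[1]), parts[2]
        if elapsed > max_seconds and re.search(pattern, cmd):
            hits.append((pid, elapsed, cmd))
    return hits


# Made-up sample input; "example-trace-daemon" is a hypothetical name,
# substitute whatever your trace processes show up as in `ps`.
SAMPLE = """\
  PID ELAPSED COMMAND
 1201   86400 mmfsd
 3310    7300 example-trace-daemon --buffer 256M
 3344     120 example-trace-daemon --buffer 256M
"""

if __name__ == "__main__":
    for pid, elapsed, cmd in find_long_running(SAMPLE, r"example-trace-daemon", 3600):
        print(f"trace pid {pid} has been running for {elapsed}s: {cmd}")
```

Taking the `ps` text as an argument instead of calling `ps` inside the function keeps the check easy to test against canned output.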
From:
I’m considering enabling trace on all nodes all the time, doing something like
this:
mmtracectl --set --trace=def --trace-recycle=global \
    --tracedev-write-mode=overwrite --tracedev-overwrite-buffer-size=256M
mmtracectl --start
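
If this were rolled out from a script, the two commands above could be assembled roughly as below. This is only a dry-run sketch: it prints the command lines and never executes them. The option names and values are copied from the commands above; the helper names (`build_trace_commands`, `render`) are my own and not part of any GPFS tooling.

```python
import shlex
from typing import Dict, List

# Option values copied verbatim from the commands quoted above.
# They are the settings under discussion, not a recommendation.
TRACE_OPTIONS: Dict[str, str] = {
    "--trace": "def",
    "--trace-recycle": "global",
    "--tracedev-write-mode": "overwrite",
    "--tracedev-overwrite-buffer-size": "256M",
}


def build_trace_commands(options: Dict[str, str]) -> List[List[str]]:
    """Return the argv lists for the '--set' call followed by '--start'."""
    set_cmd = ["mmtracectl", "--set"] + [f"{k}={v}" for k, v in options.items()]
    start_cmd = ["mmtracectl", "--start"]
    return [set_cmd, start_cmd]


def render(cmd: List[str]) -> str:
    """Render an argv list as a shell-safe single line (for logging or review)."""
    return shlex.join(cmd)


if __name__ == "__main__":
    # Dry run only: print what would be executed.
    for cmd in build_trace_commands(TRACE_OPTIONS):
        print(render(cmd))
```

Keeping the commands as argv lists means they can later be handed to `subprocess.run` without going through a shell, once the performance question is settled.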
My questions are:
- What is the performance penalty of leaving this
Hello all
Is this necessary any more?
numastat -p mmfsd
seems to spread it out without it.
Thanks
Matt