Re: [gpfsug-discuss] Potential problems - leaving trace enabled in over-write mode?

2017-03-07 Thread Sobey, Richard A
We're in the same boat...gpfs snap hangs when the cluster / node is unresponsive but they don't know how to give us a root cause without one. Very frustrating.

Re: [gpfsug-discuss] Potential problems - leaving trace enabled in over-write mode?

2017-03-07 Thread Aaron Knister
Hi Bob, I have the impression the biggest impact is to metadata-type operations rather than throughput, but don't quote me on that because I have very little data to back it up. In the process of testing upgrading from GPFS 3.5 to 4.1 we ran fio on some 1000 nodes against an FS in our test

Re: [gpfsug-discuss] Potential problems - leaving trace enabled in over-write mode?

2017-03-07 Thread valdis . kletnieks
On Tue, 07 Mar 2017 21:17:35 +, Bryan Banister said:
> Just depends on how your problem is detected… is it in a log? Is it found by
> running a command (e.g. mm*)? Is it discovered in `ps` output? Is your
> scheduler failing jobs?
I think the problem here is that if you have a sudden

Re: [gpfsug-discuss] Potential problems - leaving trace enabled in over-write mode?

2017-03-07 Thread Oesterlin, Robert
mmfsd crash - IBM says, “we need a trace to debug the issue”. Sigh Bob Oesterlin Sr Principal Storage Engineer, Nuance

Re: [gpfsug-discuss] Potential problems - leaving trace enabled in over-write mode?

2017-03-07 Thread Bryan Banister
Just depends on how your problem is detected… is it in a log? Is it found by running a command (e.g. mm*)? Is it discovered in `ps` output? Is your scheduler failing jobs? We have ongoing monitoring of most of these types of problem detection points and an automated process to capture a

Re: [gpfsug-discuss] Potential problems - leaving trace enabled in over-write mode?

2017-03-07 Thread Oesterlin, Robert
I’ve been told that V7000 unified nodes (GPFS under the covers) run with tracing enabled all the time... but I agree with you Bryan on the potential impacts. But when you must catch a trace for a problem that occurs once every few weeks – how else would I do it? Bob Oesterlin Sr Principal

Re: [gpfsug-discuss] Potential problems - leaving trace enabled in over-write mode?

2017-03-07 Thread Bryan Banister
The performance impact can be quite significant depending on what you are tracing. We even have monitoring that looks for long-running traces and the recommended action is to “kill with impunity!!” I believe IBM recommends never running clusters with continuous tracing. -Bryan

[gpfsug-discuss] Potential problems - leaving trace enabled in over-write mode?

2017-03-07 Thread Oesterlin, Robert
I’m considering enabling trace on all nodes all the time, doing something like this:
mmtracectl --set --trace=def --trace-recycle=global --tracedev-write-mode=overwrite --tracedev-overwrite-buffer-size=256M
mmtracectl --start
My questions are:
- What is the performance penalty of leaving this

[gpfsug-discuss] numaMemoryInterleave=yes

2017-03-07 Thread Matt Weil
Hello all Is this necessary any more? numastat -p mmfsd seems to spread it out without it. Thanks Matt