Re: [lustre-discuss] Updating mgsnode IP command completes successfully, but old IP remains

2021-12-06 Thread Nathan Dauchy - NOAA Affiliate via lustre-discuss
Ricardo, Your --mgsnode specification with all commas implies that you have four NIDs on a single host. But the rest of your writeup indicates two hosts. From the Lustre manual, "13.12. Specifying NIDs and Failover": > Where multiple NIDs are specified separated by commas (for example, > 10.67
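
To illustrate the distinction drawn above, here is a minimal sketch (not part of the original message) of how a NID list reads under the manual's convention: commas separate NIDs that belong to the same host, while colons separate different hosts in a failover group. The function name parse_mgsnode and the 192.0.2.x addresses are hypothetical, and the sketch assumes IPv4-style NIDs that contain no colons themselves.

```python
def parse_mgsnode(spec):
    """Split a NID list into hosts, each host being a list of its NIDs.

    Assumed convention (per the Lustre manual section quoted above):
    ':' separates different hosts, ',' separates NIDs of one host.
    """
    hosts = []
    for host_part in spec.split(":"):
        nids = [nid.strip() for nid in host_part.split(",") if nid.strip()]
        if nids:
            hosts.append(nids)
    return hosts


# Hypothetical addresses, for illustration only.
all_commas = "192.0.2.1@tcp,192.0.2.2@o2ib,192.0.2.3@tcp,192.0.2.4@o2ib"
with_colon = "192.0.2.1@tcp,192.0.2.2@o2ib:192.0.2.3@tcp,192.0.2.4@o2ib"

print(len(parse_mgsnode(all_commas)))  # one host carrying four NIDs
print(len(parse_mgsnode(with_colon)))  # two hosts carrying two NIDs each
```

Read this way, an all-comma list describes a single host, which matches the mismatch pointed out in the reply.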

Re: [lustre-discuss] trimming flash-based external journal device

2021-08-05 Thread Nathan Dauchy - NOAA Affiliate via lustre-discuss
On Thu, Aug 5, 2021 at 3:23 PM Andreas Dilger wrote: > On Aug 5, 2021, at 13:29, Nathan Dauchy wrote: > > Andreas, thanks as always for your insight. Comments inline... > > On Thu, Aug 5, 2021 at 10:48 AM Andreas Dilger > wrote: > >> On Aug 5, 2021, at 09:28, Nathan Dauchy via lustre-discuss <

Re: [lustre-discuss] trimming flash-based external journal device

2021-08-05 Thread Nathan Dauchy - NOAA Affiliate via lustre-discuss
Andreas, thanks as always for your insight. Comments inline... On Thu, Aug 5, 2021 at 10:48 AM Andreas Dilger wrote: > On Aug 5, 2021, at 09:28, Nathan Dauchy via lustre-discuss < > lustre-discuss@lists.lustre.org> wrote: > > Question: Is it possible that a flash journal device on an ext4 > fi

[lustre-discuss] trimming flash-based external journal device

2021-08-05 Thread Nathan Dauchy - NOAA Affiliate via lustre-discuss
Greetings ext4 and flash storage experts! Motivation: We have ldiskfs OSTs that are primarily HDDs and use a Flash device for an external journal device. Recent IOR benchmarks showed that write performance dropped (suddenly?) to about 25% of the original baseline, yet read performance remains fi

Re: [lustre-discuss] Disabling max creates and migrating data doesn't seem to be reducing the usage on an OST

2021-02-17 Thread Nathan Dauchy - NOAA Affiliate via lustre-discuss
On Tue, Feb 16, 2021 at 9:46 AM Kurt Strosahl wrote: > During a maintenance window today I rebooted the OSS that OST had been > mounted on, after it came up the usage dropped significantly > Kurt, That behavior you describe sounds a bit like issues reported on this list a while back: Data migr

Re: [lustre-discuss] Improving file create performance with larger create_count

2021-01-08 Thread Nathan Dauchy - NOAA Affiliate
Andreas, thanks for the insight and advice. Followup details inline below... On Thu, Jan 7, 2021 at 9:56 PM Andreas Dilger wrote: > On Jan 7, 2021, at 08:54, Nathan Dauchy - NOAA Affiliate wrote: > > I am looking for assistance on how to improve file create rate, as > measured

[lustre-discuss] Improving file create performance with larger create_count

2021-01-07 Thread Nathan Dauchy - NOAA Affiliate
Greetings Lustre Experts! I am looking for assistance on how to improve file create rate, as measured with MDtest. In particular, this is for filesystems with (4) MDTs that use progressive file layouts (PFL) to place the first part of each file on one of the (2) Flash OSTs, with the remainder of

Re: [lustre-discuss] More issues with cur_grant_bytes

2020-12-09 Thread Nathan Dauchy - NOAA Affiliate
On Tue, Dec 8, 2020 at 11:46 AM Kevin M. Hildebrand wrote: > We appear to be tripping over the same issues reported recently by > Tung-Han Hsieh and Simon Guilbault, namely that cur_grant_bytes is being > reduced to a very small value and causing abysmal performance. > I'm curious if anyone enco

Re: [lustre-discuss] Hidden QoS in Lustre ?

2020-11-11 Thread Nathan Dauchy - NOAA Affiliate
Simon, Tung-Han, You may also want to watch these tickets that seem to be related to the issue you describe: https://jira.whamcloud.com/browse/LU-14124 https://jira.whamcloud.com/browse/LU-14125 -Nathan On Mon, Nov 2, 2020 at 7:18 AM Simon Guilbault < simon.guilba...@calculquebec.ca> wrote: >

Re: [lustre-discuss] DF bug with lustre 2.12.4

2020-02-20 Thread Nathan Dauchy - NOAA Affiliate
On Thu, Feb 20, 2020 at 11:47 AM Konzem, Kevin P < kkon...@contractor.usgs.gov> wrote: > test this by running 'while [ true ];do /bin/df -TP /performance;done' on > two sessions on the same client. As soon as I start the second while loop, > the outputs go from: > Filesystem Type

Re: [lustre-discuss] MDT deadlocks LU-10697

2019-11-13 Thread Nathan Dauchy - NOAA Affiliate
On Wed, Nov 13, 2019 at 4:28 AM Thomas Roth wrote: > Hi all, > > we keep hitting LU-10697, which makes the users' experience quite painful. > There was a related issue in Lustre 2.12/2.13 which is also unresolved - > can't find the LU- at the moment. > > Thomas, Perhaps you are looking for LU-12

Re: [lustre-discuss] A question about lctl lfsck

2019-07-04 Thread Nathan Dauchy - NOAA Affiliate
On Wed, Jul 3, 2019 at 2:15 PM Kurt Strosahl wrote: > > Hopefully a simple question... If I run lctl lfsck_start is there a place > where I can get a list of what it did? > > Kurt, As far as I know, this is still an open feature request... https://jira.whamcloud.com/browse/LU-5202 (LFSCK 5: LFSC

Re: [lustre-discuss] Lustre client memory and MemoryAvailable

2019-04-24 Thread Nathan Dauchy - NOAA Affiliate
On Mon, Apr 15, 2019 at 9:18 PM Jacek Tomaka wrote: > > >signal_cache should have one entry for each process (or thread-group). > > That is what i thought as well, looking at the kernel source, allocations > from > signal_cache happen only during fork. > > I was recently chasing an issue with cli

Re: [lustre-discuss] stripe count recommendation, and proposal for auto-stripe tool

2016-06-09 Thread Nathan Dauchy - NOAA Affiliate
0552/ ? Thanks, Nathan On Wed, May 18, 2016 at 1:30 PM, Dilger, Andreas wrote: > > Cheers, Andreas > -- > Andreas Dilger > Lustre Principal Architect > Intel High Performance Data Division > > On 2016/05/18, 11:22, "Nathan Dauchy - NOAA Affiliate" < > nath

Re: [lustre-discuss] stripe count recommendation, and proposal for auto-stripe tool

2016-05-26 Thread Nathan Dauchy - NOAA Affiliate
Andreas, Thanks very much for your comments... On Wed, May 18, 2016 at 1:30 PM, Dilger, Andreas wrote: > > On 2016/05/18, 11:22, "Nathan Dauchy - NOAA Affiliate" < > nathan.dau...@noaa.gov> wrote: > > I'm looking for your experience and perhaps some

Re: [lustre-discuss] stripe count recommendation, and proposal for auto-stripe tool

2016-05-19 Thread Nathan Dauchy - NOAA Affiliate
On Wed, May 18, 2016 at 2:04 PM, Mohr Jr, Richard Frank (Rick Mohr) < rm...@utk.edu> wrote: > > 2) Use some sort of formula (like ORNL’s “file_size/100GB” or even your > log function) > > Since I mainly care about striping for large files and I want the stripe > count to increase as file size grow
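
As a rough illustration of the "file_size/100GB" style of formula mentioned above, the following sketch (not from the thread) derives a stripe count from a file size. The function name, the rounding up, the minimum of one stripe, the optional cap, and treating "GB" as GiB are all assumptions made for the example, not details given in the message.

```python
GIB = 1024 ** 3  # assumption: "100GB" is read as 100 GiB


def stripe_count_linear(file_size_bytes, bytes_per_stripe=100 * GIB, max_stripes=None):
    """Return one stripe per `bytes_per_stripe` of file size.

    Rounds up, never returns less than 1, and optionally caps the result
    (for example at the number of available OSTs). All of these choices
    are illustrative assumptions.
    """
    # Integer ceiling division avoids floating-point rounding on large sizes.
    count = (file_size_bytes + bytes_per_stripe - 1) // bytes_per_stripe
    count = max(1, count)
    if max_stripes is not None:
        count = min(count, max_stripes)
    return count


print(stripe_count_linear(250 * GIB))                     # 250 GiB file
print(stripe_count_linear(10_000 * GIB, max_stripes=16))  # capped example
```

A value computed like this could then be handed to lfs setstripe -c when the file is created; whether a linear or logarithmic formula fits better depends on the access pattern, as the thread discusses.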

Re: [lustre-discuss] stripe count recommendation, and proposal for auto-stripe tool

2016-05-19 Thread Nathan Dauchy - NOAA Affiliate
to understand the application access pattern. I can't see another way to > do that goal justice. (The Lustre ADIO in the MPI I/O library does this, > partly by controlling the I/O pattern through I/O aggregation for > collective I/Os.) > > So I think your tool can definitely

[lustre-discuss] stripe count recommendation, and proposal for auto-stripe tool

2016-05-18 Thread Nathan Dauchy - NOAA Affiliate
Greetings All, I'm looking for your experience and perhaps some lively discussion regarding "best practices" for choosing a file stripe count. The Lustre manual has good tips on "Choosing a Stripe Size", and in practice the default 1M rarely causes problems on our systems. Stripe Count on the oth

Re: [lustre-discuss] lshowmount equivalent?

2015-12-14 Thread Nathan Dauchy - NOAA Affiliate
On Mon, Dec 14, 2015 at 9:08 AM, Scott Nolin wrote: > > On 12/14/2015 12:43 AM, Dilger, Andreas wrote: > ... > >> Is this a tool that you are using? IIRC, there wasn't a particular reason >> that it was removed, except that when we asked LLNL (the authors) they >> said they were no longer using