Re: [lustre-discuss] LMT 3.2 - MDT display

2016-04-13 Thread Crowe, Tom
Thanks for the explanation Olaf. We updated LMT on a 2.7 file system, which only has a single MDT, so that explains why nothing “new” was showing up in the [ mdt list ] section. Thanks, Tom Crowe On Apr 13, 2016, at 5:40 PM, Faaland, Olaf P. <faala...@llnl.gov> wrote: Reposting to the

Re: [lustre-discuss] LMT 3.2 - MDT display

2016-04-13 Thread Faaland, Olaf P.
Reposting to the correct mailing list. To: Crowe, Tom; lustre-de...@lists.lustre.org Subject: Re: [lustre-devel] LMT 3.2 - MDT display Hi Tim, It sounds like maybe you see the summary and the OST list, but not the MDT list. The ltop display has 3 different compon

Re: [lustre-discuss] [lustre-devel] LMT 3.2 - MDT display

2016-04-13 Thread Christopher J. Morrone
This is more of a lustre-discuss topic, so I'm taking my reply there. Can you describe how you installed lmt? Most likely there is some problem with the cerebro setup, and the data isn't getting to the node on which you are running ltop. There are some instructions here: https://github.com/LLNL

Re: [lustre-discuss] MDS crashing: unable to handle kernel paging request at 00000000deadbeef (iam_container_init+0x18/0x70)

2016-04-13 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Apr 13, 2016, at 2:53 PM, Mark Hahn wrote: > thanks, we'll be trying the LU-5726 patch and cpu_npartitions things. > it's quite a long thread - do I understand correctly that periodic > vm.drop_caches=1 can postpone the issue? Not really. I was periodically dropping the caches as a way to

Re: [lustre-discuss] MDS crashing: unable to handle kernel paging request at 00000000deadbeef (iam_container_init+0x18/0x70)

2016-04-13 Thread Mark Hahn
We had to use lustre-2.5.3.90 on the MDS servers because of memory leak. https://jira.hpdd.intel.com/browse/LU-5726 Mark, If you don’t have the patch for LU-5726, then you should definitely try to get that one. If nothing else, reading through the bug report might be useful. It details som

Re: [lustre-discuss] MDS crashing: unable to handle kernel paging request at 00000000deadbeef (iam_container_init+0x18/0x70)

2016-04-13 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Apr 13, 2016, at 8:02 AM, Tommi T wrote: > > We had to use lustre-2.5.3.90 on the MDS servers because of memory leak. > > https://jira.hpdd.intel.com/browse/LU-5726 Mark, If you don’t have the patch for LU-5726, then you should definitely try to get that one. If nothing else, reading t

Re: [lustre-discuss] MDS crashing: unable to handle kernel paging request at 00000000deadbeef (iam_container_init+0x18/0x70)

2016-04-13 Thread Mohr Jr, Richard Frank (Rick Mohr)
> On Apr 12, 2016, at 6:46 PM, Mark Hahn wrote: > > all our existing Lustre MDSes run happily with vm.zone_reclaim_mode=0, > and making this one consistent appears to have resolved a problem > (in which one family of lustre kernel threads would appear to spin, > "perf top" showing nearly all tim
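A minimal sketch of checking whether an MDS matches the vm.zone_reclaim_mode=0 setting described above, assuming a Linux host that exposes the sysctl at /proc/sys/vm/zone_reclaim_mode. The helper names zone_reclaim_mode and matches_fleet are hypothetical, for illustration only; this is not part of Lustre or LMT.

```python
from pathlib import Path

# Where Linux exposes the vm.zone_reclaim_mode sysctl (assumed present on NUMA kernels).
ZONE_RECLAIM_PATH = Path("/proc/sys/vm/zone_reclaim_mode")


def zone_reclaim_mode(path: Path = ZONE_RECLAIM_PATH) -> int:
    """Hypothetical helper: read the current vm.zone_reclaim_mode value as an int."""
    return int(path.read_text().strip())


def matches_fleet(value: int, expected: int = 0) -> bool:
    """Hypothetical helper: True if this node uses the same value as the other MDSes."""
    return value == expected


if __name__ == "__main__":
    if ZONE_RECLAIM_PATH.exists():
        current = zone_reclaim_mode()
        status = "OK" if matches_fleet(current) else "MISMATCH"
        print(f"vm.zone_reclaim_mode={current} ({status})")
    else:
        print("vm.zone_reclaim_mode is not exposed on this kernel")
```

On a mismatched node, `sysctl -w vm.zone_reclaim_mode=0` changes the running value; adding the same setting to /etc/sysctl.conf keeps it across reboots.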

Re: [lustre-discuss] MDS crashing: unable to handle kernel paging request at 00000000deadbeef (iam_container_init+0x18/0x70)

2016-04-13 Thread Robin Humble
Hi Mark, On Tue, Apr 12, 2016 at 04:49:10PM -0400, Mark Hahn wrote: >One of our MDSs is crashing with the following: > >BUG: unable to handle kernel paging request at deadbeef >IP: [] iam_container_init+0x18/0x70 [osd_ldiskfs] >PGD 0 >Oops: 0002 [#1] SMP > >The MDS is running 2.5.3-RC1--PR

Re: [lustre-discuss] MDS crashing: unable to handle kernel paging request at 00000000deadbeef (iam_container_init+0x18/0x70)

2016-04-13 Thread Tommi T
Hi, We had to use lustre-2.5.3.90 on the MDS servers because of memory leak. https://jira.hpdd.intel.com/browse/LU-5726 I'm not sure if it's related but worthwhile to check. BR, Tommi - Original Message - From: Mark Hahn To: lustre-discuss@lists.lustre.org Sent: Tuesday, April 12, 2