Re: [lustre-discuss] MGTMDT device getting full

2015-10-16 Thread Torsten Harenberg
Thanks Chris for both of your mails. One of my users is already heavily deleting files. Of course metadata usage goes down now, but it is too early to conclude whether that will recover a substantial amount of space. On 16.10.2015 at 19:50, Christopher J. Morrone wrote: > Oh. Doh. I missed the low Inode

Re: [lustre-discuss] MGTMDT device getting full

2015-10-16 Thread Christopher J. Morrone
Oh. Doh. I missed the low Inode usage that you listed in your first email. Ben was right, that kind of does point to some runaway log files like changelogs. Perhaps you enabled them by accident? Do you have any kind of HSM? It would be good to check the changelog_users regardless. If
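A quick way to run that check, as a sketch assuming an MDT target named lustre-MDT0000 (substitute your own fsname/target):

  # List registered changelog consumers and the current record index
  lctl get_param mdd.lustre-MDT0000.changelog_users

A consumer whose record number lags far behind the current index is a likely stalled reader.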

[lustre-discuss] OSS Panics in ptlrpc_prep_bulk_page

2015-10-16 Thread Exec Unerd
We have a smallish cluster -- a few thousand cores on the client side; four OSSs on the Lustre server side. Under otherwise normal operations, some of the clients will stop being able to find some of the OSTs. When this happens, the OSSs start seeing an escalating error count. As more clients
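For reference, one way to see this from an affected client (a sketch using standard Lustre client tools, not taken from the original report):

  # Check that the client can reach every MDT/OST
  lfs check servers
  # Show per-target import status, including connection state and failover NIDs
  lctl get_param osc.*.import

A target stuck in any state other than FULL confirms that the client has lost contact with that OST.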

Re: [lustre-discuss] OSS Panics in ptlrpc_prep_bulk_page

2015-10-16 Thread Patrick Farrell
Good afternoon, I think you've got to unwind this a bit. You've got a massive number of communication errors - I'd start there and try to analyze those. You've also got nodes trying to reach the failover partners of some of your OSTs - Are the OSSes dying? (That could cause the
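One way to start that analysis, as a sketch with a placeholder NID (use your OSS's actual NID):

  # Test basic LNet reachability of an OSS from an affected client
  lctl ping 10.0.0.5@o2ib
  # Recent Lustre error messages on either side
  dmesg | grep -i lustre

If lctl ping fails intermittently, the problem is below Lustre, in LNet or the fabric itself.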

Re: [lustre-discuss] Large-scale UID/GID changes via lfs

2015-10-16 Thread Dilger, Andreas
Since you need to change the ownership on both the MDT and all OSTs (to maintain quota correctness) there isn't much benefit to using the lfs tools for this. Cheers, Andreas > On Oct 14, 2015, at 20:53, Ms. Megan Larko wrote: > > Hello, > > I have been able to
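In other words, an ordinary recursive chown (or a find piped to chown) already updates the ownership of both the MDT inode and the OST objects. A sketch with placeholder IDs and paths:

  # Remap one UID across a Lustre tree; mount point and IDs are examples
  lfs find /mnt/lustre -uid 1001 -print0 | xargs -0 -r chown 2001
  # Or, for a subtree owned by a single user:
  chown -R newuser:newgroup /mnt/lustre/project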

[lustre-discuss] MGTMDT device getting full

2015-10-16 Thread Torsten Harenberg
Dear all, I just noticed that the metadata device of our ~120 TB Lustre system is getting full - see monitoring plot attached. However, the amount of data stored is more or less the same (usually around ~70% of its capacity). I am trying to understand what that means. Can a full MGTMDT be
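A useful first check here, as a sketch runnable from any client:

  # Block usage per target: is the MDT itself filling up?
  lfs df -h
  # Inode usage per target: many files, or few files using lots of space?
  lfs df -i

High block usage with low inode usage on the MDT suggests a few large internal files (logs, changelogs) rather than normal metadata growth.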

Re: [lustre-discuss] MGTMDT device getting full

2015-10-16 Thread Ben Evans
Looks like you've got some really large changelogs built up. Did you have robin hood, or some other consumer running at some point that has since stalled? -Ben Evans On 10/16/15, 7:24 AM, "lustre-discuss on behalf of Torsten Harenberg"
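If a stalled consumer does turn out to be the cause, the backlog can be released once it is safe to do so. A sketch, with the target name and consumer id (cl1) as placeholders taken from the changelog_users output:

  # Acknowledge all records for consumer cl1 (an endrec of 0 means
  # "up to the current last record")
  lfs changelog_clear lustre-MDT0000 cl1 0
  # Or deregister the consumer entirely so the records can be purged
  lctl --device lustre-MDT0000 changelog_deregister cl1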

Re: [lustre-discuss] MGTMDT device getting full

2015-10-16 Thread Torsten Harenberg
On 16.10.2015 at 16:01, Ben Evans wrote: > Looks like you've got some really large changelogs built up. Did you have > robin hood, or some other consumer running at some point that has since > stalled? Don't think so, as I have never heard of "Robin Hood" in the context of Lustre. The setup is

Re: [lustre-discuss] MGTMDT device getting full

2015-10-16 Thread Dilger, Andreas
On 2015/10/16, 12:03, "lustre-discuss on behalf of Torsten Harenberg" wrote: >Thanks Chris for both of your mails. > >One of my users is already heavily deleting files. Of course metadata >usage goes down now, but

Re: [lustre-discuss] OSS Panics in ptlrpc_prep_bulk_page

2015-10-16 Thread Stearman, Marc
I agree with Patrick. Funny thing about network-based file systems is that they tend not to work when the network is failing. Are you seeing any errors on your IB fabric? Any errors from the subnet manager? -Marc D. Marc Stearman Lustre Operations Lead stearm...@llnl.gov Office:
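For anyone unsure where to look, the standard InfiniBand diagnostics are a reasonable starting point (a sketch; tool availability depends on your OFED stack):

  # Sweep the fabric and report ports with non-zero error counters
  ibqueryerrors
  # Dump the local HCA port's performance and error counters
  perfquery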

Re: [lustre-discuss] MGTMDT device getting full

2015-10-16 Thread Christopher J. Morrone
Hi Torsten, There is no reason to suspect that space usage on the MDT will be the same as the average space usage on the OSTs. Your MDT is storing the metadata about _all_ of the files in your Lustre filesystem. You can think of this metadata as a whole bunch of zero-length files with some
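As a rough back-of-the-envelope illustration (the per-file cost is an assumption, not a measured value; ldiskfs MDTs are commonly provisioned at a few KB per inode):

  # ~50 M files at ~4 KB of MDT space each (inode + xattrs):
  #   50,000,000 x 4 KB ≈ 200 GB of MDT capacity

so MDT usage tracks file count, not file size, and need not follow OST usage at all.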