Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2018-04-04 Thread Daniel Gryniewicz
Okay, thanks. That confirms to me that we need to do something else. I'll start to look into this ASAP. Daniel On 04/04/2018 12:37 PM, Pradeep wrote: Hi Daniel, I tried increasing lanes to 1023. The usage looks better, but still over the limit: $2 = {entries_hiwat = 10, entries_used

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2018-04-04 Thread Pradeep
Hi Daniel, I tried increasing lanes to 1023. The usage looks better, but still over the limit: $2 = {entries_hiwat = 10, entries_used = 299838, chunks_hiwat = 10, chunks_used = 1235, fds_system_imposed = 1048576, fds_hard_limit = 1038090, fds_hiwat = 943718, fds_lowat = 524288,

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2018-04-03 Thread Matt Benjamin
Hi Frank, On Tue, Apr 3, 2018 at 11:33 AM, Frank Filz wrote: > Thanks for the explanation. You are observing in practice something I > considered in theory... > > I like the idea of demoting entries when entries > entries_hiwat, Matt, > Daniel, do you see any negative

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2018-04-03 Thread Frank Filz
> One thing I tried is to not allocate new entries and return NFS4ERR_DELAY in > the > hope that the increased refcnt at LRU is temporary. This worked for some time; > but then I hit a case where I see all the entries at the LRU of L1 have a > refcnt of 2 > and the subsequent entries have a

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2018-04-03 Thread Pradeep
Hi Daniel, Sure, I will try that. One thing I tried is to not allocate new entries and return NFS4ERR_DELAY in the hope that the increased refcnt at LRU is temporary. This worked for some time; but then I hit a case where I see all the entries at the LRU of L1 have a refcnt of 2 and the subsequent

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2018-04-03 Thread Daniel Gryniewicz
So, the way this is supposed to work is that getting a ref when the ref is 1 is always an LRU_REQ_INITIAL ref, so that moves it to the MRU. At that point, further refs don't move it around in the queue, just increment the refcount. This should be the case, because mdcache_new_entry() and
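The promotion rule described above can be pictured with a toy model. This is only a sketch in Python, not Ganesha code: `ToyLane`, `insert`, `get_ref` and `put_ref` are hypothetical names, and the single ordered queue stands in for one LRU lane. It assumes that an entry with refcount 1 is held only by the hash table, so the next ref taken on it is the "initial" one and moves it to the MRU end; later refs only bump the count.

```python
from collections import OrderedDict


class ToyLane:
    """Toy stand-in for one LRU lane: first key is LRU, last key is MRU."""

    def __init__(self):
        self.queue = OrderedDict()  # key -> refcount

    def insert(self, key):
        # Hypothetical starting state: only the hash table holds a ref.
        self.queue[key] = 1

    def get_ref(self, key):
        # A ref taken while the count is 1 is treated as the initial ref
        # and moves the entry to the MRU end of the queue.
        if self.queue[key] == 1:
            self.queue.move_to_end(key)
        # Every ref, initial or not, increments the count.
        self.queue[key] += 1

    def put_ref(self, key):
        self.queue[key] -= 1

    def order(self):
        return list(self.queue)


lane = ToyLane()
for name in ("a", "b", "c"):
    lane.insert(name)

lane.get_ref("a")  # initial ref: "a" moves to the MRU end
lane.get_ref("a")  # further ref: count goes up, position unchanged
```

In this model, an entry that keeps an extra ref never returns to refcount 1, so it is never promoted again and drifts toward the LRU end while still being unreclaimable.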

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-14 Thread Pradeep
On Fri, Aug 11, 2017 at 8:52 AM, Daniel Gryniewicz wrote: > Right, this is reaping. I was thinking it was the lane thread. Reaping > only looks at the single LRU of each queue. We should probably look at > some small number of each lane, like 2 or 3. > This is the lane

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-11 Thread Daniel Gryniewicz
On 08/11/2017 12:10 PM, Frank Filz wrote: Right, this is reaping. I was thinking it was the lane thread. Reaping only looks at the single LRU of each queue. We should probably look at some small number of each lane, like 2 or 3. Frank, this, in combination with the PIN lane, is probably the

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-11 Thread Frank Filz
> Right, this is reaping. I was thinking it was the lane thread. Reaping only > looks at the single LRU of each queue. We should probably look at some > small number of each lane, like 2 or 3. > > Frank, this, in combination with the PIN lane, is probably the issue. Yea, that would be a

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-11 Thread Daniel Gryniewicz
Right, this is reaping. I was thinking it was the lane thread. Reaping only looks at the single LRU of each queue. We should probably look at some small number of each lane, like 2 or 3. Frank, this, in combination with the PIN lane, is probably the issue. Daniel On 08/11/2017 11:21 AM,
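The difference between inspecting only the LRU head and inspecting "2 or 3" entries per lane can be sketched as follows. This is a hypothetical Python model, not the Ganesha reaper: `reap` and its `depth` parameter are made-up names, each lane is a plain list ordered from LRU to MRU, and an entry counts as reclaimable when its refcount is exactly 1.

```python
def reap(lanes, depth=1):
    """Return the key of the first reclaimable entry, or None.

    lanes: list of lanes, each a list of (key, refcnt) ordered LRU -> MRU.
    depth: how many entries from the LRU end of each lane are examined.
    """
    for lane in lanes:
        for i, (key, refcnt) in enumerate(lane[:depth]):
            if refcnt == 1:
                lane.pop(i)  # remove the reclaimed entry from its lane
                return key
    return None


# Every lane has a busy entry (refcnt 2) sitting at its LRU head.
lanes = [[("a", 2), ("b", 1)], [("c", 2), ("d", 1)]]

head_only = reap(lanes, depth=1)  # finds nothing: every head is busy
deeper = reap(lanes, depth=2)     # finds "b" just behind the head
```

With `depth=1` a single busy entry at the head of each lane is enough to make every reap attempt fail, so the caller allocates a new entry and the cache grows past its high water mark.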

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-11 Thread Matt Benjamin
It's not supposed to, as presently defined, right (scan resistance)? Matt On Fri, Aug 11, 2017 at 11:48 AM, Daniel Gryniewicz wrote: > On 08/11/2017 09:21 AM, Frank Filz wrote: >>> >>> That seems overkill to me. How many strategies would we support (and >>> test)? >>> >>> Part

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-11 Thread Daniel Gryniewicz
On 08/11/2017 09:21 AM, Frank Filz wrote: That seems overkill to me. How many strategies would we support (and test)? Part of the problem is that we've drastically changed how FDs are handled. We need to rethink how LRU should work in that context, I think. I wonder also if taking pinning

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-11 Thread Pradeep
Hi Daniel, I'm testing with 2.5.1. I haven't changed those parameters. Those parameters only take effect once you are in lru_run_lane(), right? Since the FDs are lower than low-watermark, it never calls lru_run_lane(). Thanks, Pradeep On Fri, Aug 11, 2017 at 5:43 AM, Daniel Gryniewicz

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-11 Thread Matt Benjamin
initially, just a couple--but the strategizing step forces an internal api to develop. Matt On Fri, Aug 11, 2017 at 8:49 AM, Daniel Gryniewicz wrote: > That seems overkill to me. How many strategies would we support (and test)? > > Part of the problem is that we've drastically

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-11 Thread Daniel Gryniewicz
Have you set Reaper_Work? Have you changed LRU_N_Q_LANES? (and which version of Ganesha?) Daniel On 08/10/2017 07:12 PM, Pradeep wrote: Debugged this a little more. It appears that the entries that can be reaped are not at the LRU position (head) of the L1 queue. So those can be free'd

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-10 Thread Matt Benjamin
I think the particular thresholds of opens and inode count are interacting in a way we'd like to change. I think it might make sense to delegate the various decision points to maybe a vector of strategy functions, letting more varied approaches compete? Matt On Thu, Aug 10, 2017 at 7:12 PM,

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-10 Thread Pradeep
Debugged this a little more. It appears that the entries that can be reaped are not at the LRU position (head) of the L1 queue. So those can be free'd later by lru_run(). I don't see it happening either for some reason. (gdb) p LRU[1].L1 $29 = {q = {next = 0x7fb459e71960, prev = 0x7fb3ec3c0d30},

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-07 Thread William Allen Simpson
On 8/7/17 9:42 AM, Frank Filz wrote: It never has been. In cache_inode, a pin-ref kept it from being reaped, now any ref beyond 1 keeps it. Guess we need to do something about that... We need to put limits on state somewhere, that would take care of it mostly. We could still have some files

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-07 Thread Frank Filz
> It never has been. In cache_inode, a pin-ref kept it from being reaped, now > any ref beyond 1 keeps it. Guess we need to do something about that... We need to put limits on state somewhere, that would take care of it mostly. We could still have some files in excess of high water mark due to

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-07 Thread Daniel Gryniewicz
It never has been. In cache_inode, a pin-ref kept it from being reaped, now any ref beyond 1 keeps it. On Fri, Aug 4, 2017 at 1:31 PM, Frank Filz wrote: >> I'm hitting a case where mdcache keeps growing well beyond the high water >> mark. Here is a snapshot of the

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-04 Thread Daniel Gryniewicz
When you create a new entry, you get back a ref with it, that you are expected to release when you're done. In addition, the hash table has a ref, so the initial ref of an entry is 2. Otherwise, you'd have to create it, and immediately get_ref(), which is still racy. Daniel On 08/04/2017
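The refcount convention described here can be illustrated with a small model. This is a sketch only, written in Python rather than taken from Ganesha: `ToyCache`, `new_entry`, `put_ref` and `reclaimable` are hypothetical names. It assumes one ref is owned by the hash table and one is handed to the caller, so a fresh entry starts at 2 and becomes a reclaim candidate only after the caller releases its ref.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    key: str
    refcnt: int = 0


class ToyCache:
    def __init__(self):
        self.table = {}

    def new_entry(self, key):
        entry = Entry(key)
        entry.refcnt += 1        # ref owned by the hash table
        self.table[key] = entry
        entry.refcnt += 1        # ref handed back to the caller
        return entry

    def put_ref(self, entry):
        entry.refcnt -= 1

    def reclaimable(self, entry):
        # Only the hash table's ref remains.
        return entry.refcnt == 1


cache = ToyCache()
entry = cache.new_entry("file-1")   # refcnt starts at 2
busy = cache.reclaimable(entry)     # False while the caller holds its ref
cache.put_ref(entry)                # caller is done with the entry
idle = cache.reclaimable(entry)     # True: only the table ref is left
```

Handing the caller's ref back together with the new entry avoids a window in which the entry sits in the table at refcount 1 and could be reclaimed before the caller takes its own ref.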

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-04 Thread Pradeep
My mistake. As you both correctly pointed out, refcnt needs to be 1 for reclaim. It is initialized with 2. So some must be doing an unref()/put() to make it 1. On 8/4/17, Daniel Gryniewicz wrote: > On 08/04/2017 01:14 PM, Pradeep wrote: >> Hello, >> >> I'm hitting a case where

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-04 Thread Daniel Gryniewicz
On 08/04/2017 01:14 PM, Pradeep wrote: Hello, I'm hitting a case where mdcache keeps growing well beyond the high water mark. Here is a snapshot of the lru_state: $1 = {entries_hiwat = 10, entries_used = 2306063, chunks_hiwat = 10, chunks_used = 16462, It has grown to 2.3 million

Re: [Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-04 Thread Frank Filz
> I'm hitting a case where mdcache keeps growing well beyond the high water > mark. Here is a snapshot of the lru_state: > > $1 = {entries_hiwat = 10, entries_used = 2306063, chunks_hiwat = 10, > chunks_used = 16462, > > It has grown to 2.3 million entries and each entry is ~1.6K. > > I

[Nfs-ganesha-devel] mdcache growing beyond limits.

2017-08-04 Thread Pradeep
Hello, I'm hitting a case where mdcache keeps growing well beyond the high water mark. Here is a snapshot of the lru_state: $1 = {entries_hiwat = 10, entries_used = 2306063, chunks_hiwat = 10, chunks_used = 16462, It has grown to 2.3 million entries and each entry is ~1.6K. I looked at