Tomas,
Apologies for the delayed response...
Tomas Ă–gren wrote:
Interesting! So it is not the ARC that is consuming too much memory; it is some other piece (not necessarily belonging to ZFS) that is causing the crunch. The other possibility is that the ARC ate up too much memory and caused a near-crunch situation, and the kmem subsystem hit back and forced the ARC to free up its buffers (hence the no_grow flag being set). So the ARC could be oscillating between heavy caching and then purging its caches.

You might want to keep track of these values (the ARC size and the no_grow flag) and see how they change over a period of time. This would help us understand the pattern.
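A quick way to sample these is a loop over mdb -- a sketch only, assuming an OpenSolaris build where the global `arc` structure and its `size`/`no_grow` members are visible to the debugger (member names may differ between builds):

```
# Run as root; print the ARC counters every 10 seconds.
while true; do
    date
    echo "arc::print -d size p c c_max no_grow" | mdb -k
    sleep 10
done
```

Logging that output to a file over a day or two should make any oscillation pattern visible.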
I would guess it grows after boot until it hits some max and then stays there... but I can check it out.
No, that is not true. It shrinks when there is memory pressure; the values of 'c' and 'p' are adjusted accordingly. And if we know it is the ARC that is causing the crunch, we could manually change the value of c_max to a comfortable level, and that would limit the size of the ARC.
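For example (hypothetical values, and this pokes live kernel memory, so only on a test box -- the c_max member name is taken from the discussion here, and the change does not persist across reboots):

```
# Find the address of c_max, then cap the ARC at 1 GB (0x40000000 bytes).
echo "arc::print -a c_max" | mdb -k
echo "<address-from-above>/Z 0x40000000" | mdb -kw
```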
But in the ZFS world, DNLC is part of the ARC, right?
Not really... ZFS uses the regular DNLC for lookup optimization. However, the metadata/data is cached in the ARC.
My original question was how to get rid of "data cache", but keep
"metadata cache" (such as DNLC)...
This is a good question. AFAIK the ARC does not really differentiate between metadata and data, so I am not sure whether we can control it. However, as I mentioned above, ZFS still uses the DNLC for name caching.
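You can watch the DNLC independently of the ARC with the standard kstats, e.g.:

```
# DNLC hit/miss counters (module: unix, name: dnlcstats)
kstat -n dnlcstats
# Or the one-line summary of the lookup hit rate:
vmstat -s | grep 'name lookups'
```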
However, I would suggest that you try it out on a non-production machine first.

By default, c_max is set to 75% of physmem, and that is the hard limit. 'c' is the soft limit, and the ARC will try to grow up to 'c'. The value of 'c' is adjusted when there is a need to cache more, but it will never exceed c_max.
Regarding the huge number of reads, I am sure you have already tried
disabling the VDEV prefetch.
If not, it is worth a try.
That was part of my original question, how? :)
Apologies :-) I was digging around in the code, and I find that zfs_vdev_cache_bshift is the one that controls the amount that is read. Currently it is set to 16, so we should be able to modify this and reduce the prefetch.

However, I will have to double-check with more people and get back to you.
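If it does turn out to be safe to tune, the /etc/system entry would look something like this (hypothetical until confirmed, requires a reboot, and again please try it on a non-production machine first):

```
* Reduce the vdev cache read size from 2^16 (64 KB) to 2^13 (8 KB).
set zfs:zfs_vdev_cache_bshift = 13
```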
Thanks and regards,
Sanjeev.
--
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel: x27521 +91 80 669 27521
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss