Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Tomas, Apologies for delayed response... Tomas Ögren wrote: Interesting ! So, it is not the ARC which is consuming too much memory. It is some other piece (not sure if it belongs to ZFS) which is causing the crunch... Or the other possibility is that the ARC ate up too much and caused a near-crunch situation, and the kmem system hit back and caused the ARC to free up its buffers (hence the no_grow flag being enabled). So, it (ARC) could be oscillating between large caching and then purging the caches. You might want to keep track of these values (ARC size and no_grow flag) and see how they change over a period of time. This would help us understand the pattern. I would guess it grows after boot until it hits some max and then stays there.. but I can check it out.. No, that is not true. It shrinks when there is memory pressure. The values of 'c' and 'p' are adjusted accordingly. And if we know it is the ARC which is causing the crunch, we could manually change the value of c_max to a comfortable value and that would limit the size of the ARC. But in the ZFS world, DNLC is part of the ARC, right? Not really... ZFS uses the regular DNLC for lookup optimization. However, the metadata/data is cached in the ARC. My original question was how to get rid of data cache, but keep metadata cache (such as DNLC)... This is a good question. AFAIK the ARC does not really differentiate between metadata and data. So, I am not sure if we can control it. However, as I mentioned above, ZFS still uses the DNLC caching. However, I would suggest that you try it out on a non-production machine first. By default, c_max is set to 75% of physmem and that is the hard limit. c is the soft limit and the ARC would try and grow up to 'c'. The value of c is adjusted when there is a need to cache more, but it will never exceed c_max. Regarding the huge number of reads, I am sure you have already tried disabling the VDEV prefetch. If not, it is worth a try. That was part of my original question, how? 
:) Apologies :-) I was digging around the code and I find that zfs_vdev_cache_bshift is the one which controls the amount that is read. Currently it is set to 16. So, we should be able to modify this and reduce the prefetch. However, I will have to double-check with more people and get back to you. Thanks and regards, Sanjeev. -- Solaris Revenue Products Engineering, India Engineering Center, Sun Microsystems India Pvt Ltd. Tel: x27521 +91 80 669 27521 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
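If that tunable works the way its name suggests, the arithmetic is simple. Here is a quick sketch — with the assumption (mine, not stated in the thread) that the vdev cache rounds small reads up to 1 << bshift bytes:

```python
# Assumption (hypothetical model, not verified against the ZFS source):
# zfs_vdev_cache_bshift is a power-of-two shift, so small reads get
# inflated to 1 << bshift bytes by the vdev cache.
zfs_vdev_cache_bshift = 16
inflated_read = 1 << zfs_vdev_cache_bshift
print(inflated_read)  # 65536, i.e. 64 KB per small read
```

Under that assumption, dropping bshift from 16 to, say, 13 would cut each inflated read from 64 KB to 8 KB, which would be the point of lowering it.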
Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
On 13 November, 2006 - Eric Kustarz sent me these 2,4K bytes: Tomas Ögren wrote: On 13 November, 2006 - Sanjeev Bagewadi sent me these 7,1K bytes: Regarding the huge number of reads, I am sure you have already tried disabling the VDEV prefetch. If not, it is worth a try. That was part of my original question, how? :) On recent bits, you can set 'zfs_vdev_cache_max' to 1 to disable the vdev cache. On earlier versions (snv_48), I did something similar with ztune.sh[0], adding cache_size which I set to 0 (instead of 10M). This helped quite a lot, but there seems to be one more level of prefetching.. Example:

                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
ftp          1.67T  2.15T  1.26K     23  40.9M   890K
  raidz2     1.37T   551G    674     10  22.3M   399K
    c4t0d0       -      -    210      3  3.19M  80.4K
    c4t1d0       -      -    211      3  3.19M  80.4K
    c4t2d0       -      -    211      3  3.19M  80.4K
    c5t0d0       -      -    210      3  3.19M  80.4K
    c5t1d0       -      -    242      4  3.19M  80.4K
    c5t2d0       -      -    211      3  3.19M  80.4K
    c5t3d0       -      -    211      3  3.19M  80.4K
  raidz2      305G  1.61T    614     12  18.6M   491K
    c4t3d0       -      -    222      5  2.66M  99.1K
    c4t4d0       -      -    223      5  2.66M  99.1K
    c4t5d0       -      -    224      5  2.66M  99.1K
    c4t8d0       -      -    190      5  2.66M  99.1K
    c5t4d0       -      -    190      5  2.66M  99.1K
    c5t5d0       -      -    226      5  2.66M  99.1K
    c5t8d0       -      -    225      5  2.66M  99.1K
-----------  -----  -----  -----  -----  -----  -----

Before this fix, the 'read bandwidth' for disks in the first raidz2 added up to way more than the raidz2 itself.. now it adds up correctly, but some other readahead causes a 1-10x factor too much, mostly hovering around 2-3x.. before it was hovering around 8-10x.. [0]: http://blogs.sun.com/roch/resource/ztune.sh /Tomas -- Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se
Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
On 13 November, 2006 - Sanjeev Bagewadi sent me these 7,1K bytes: Tomas, comments inline...

arc::print struct arc
struct arc {
    anon = ARC_anon
    mru = ARC_mru
    mru_ghost = ARC_mru_ghost
    mfu = ARC_mfu
    mfu_ghost = ARC_mfu_ghost
    size = 0x6f7a400
    p = 0x5d9bd5a
    c = 0x5f6375a
    c_min = 0x400
    c_max = 0x2e82a000
    hits = 0x40e0a15
    misses = 0x1cec4a4
    deleted = 0x1b0ba0d
    skipped = 0x24ea64e13
    hash_elements = 0x179d
    hash_elements_max = 0x60bb
    hash_collisions = 0x8dca3a
    hash_chains = 0x391
    hash_chain_max = 0x8
    no_grow = 0x1
}

So, about 100MB and a memory crunch.. Interesting ! So, it is not the ARC which is consuming too much memory. It is some other piece (not sure if it belongs to ZFS) which is causing the crunch... Or the other possibility is that the ARC ate up too much and caused a near-crunch situation, and the kmem system hit back and caused the ARC to free up its buffers (hence the no_grow flag being enabled). So, it (ARC) could be oscillating between large caching and then purging the caches. You might want to keep track of these values (ARC size and no_grow flag) and see how they change over a period of time. This would help us understand the pattern. I would guess it grows after boot until it hits some max and then stays there.. but I can check it out.. And if we know it is the ARC which is causing the crunch, we could manually change the value of c_max to a comfortable value and that would limit the size of the ARC. But in the ZFS world, DNLC is part of the ARC, right? My original question was how to get rid of data cache, but keep metadata cache (such as DNLC)... However, I would suggest that you try it out on a non-production machine first. By default, c_max is set to 75% of physmem and that is the hard limit. c is the soft limit and the ARC would try and grow up to 'c'. The value of c is adjusted when there is a need to cache more, but it will never exceed c_max. Regarding the huge number of reads, I am sure you have already tried disabling the VDEV prefetch. If not, it is worth a try. 
That was part of my original question, how? :) /Tomas
Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Tomas Ögren writes: On 13 November, 2006 - Sanjeev Bagewadi sent me these 7,1K bytes: [...] Regarding the huge number of reads, I am sure you have already tried disabling the VDEV prefetch. If not, it is worth a try. That was part of my original question, how? :) /Tomas

Under memory pressure the arc will shrink and it will also shrink the dnlc by 3%. arc_reduce_dnlc_percent = 3 You could try to tune that number. -r
Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Tomas Ögren wrote: On 13 November, 2006 - Sanjeev Bagewadi sent me these 7,1K bytes: [...] Regarding the huge number of reads, I am sure you have already tried disabling the VDEV prefetch. If not, it is worth a try. That was part of my original question, how? :) /Tomas

On recent bits, you can set 'zfs_vdev_cache_max' to 1 to disable the vdev cache. eric
Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Tomas, comments inline... Tomas Ögren wrote: On 10 November, 2006 - Sanjeev Bagewadi sent me these 3,5K bytes: 1. DNLC-through-ZFS doesn't seem to listen to ncsize. The filesystem currently has ~550k inodes and large portions of it is frequently looked over with rsync (over nfs). mdb said ncsize was about 68k and vmstat -s said we had a hitrate of ~30%, so I set ncsize to 600k and rebooted.. Didn't seem to change much, still seeing hitrates at about the same and manual find(1) doesn't seem to be that cached (according to vmstat and dnlcsnoop.d). When booting, the following message came up, not sure if it matters or not: NOTICE: setting nrnode to max value of 351642 NOTICE: setting nrnode to max value of 235577 Is there a separate ZFS-DNLC knob to adjust for this? Wild guess is that it has its own implementation which is integrated with the rest of the ZFS cache which throws out metadata cache in favour of data cache.. or something.. Current memory usage (for some values of usage ;):

# echo ::memstat|mdb -k
Page Summary            Pages     MB   %Tot
Kernel                  95584    746    75%
Anon                    20868    163    16%
Exec and libs            1703     13     1%
Page cache               1007      7     1%
Free (cachelist)           97      0     0%
Free (freelist)          7745     60     6%
Total                  127004    992
Physical               125192    978

/Tomas This memory usage shows nearly all of memory consumed by the kernel and probably by ZFS. ZFS can't add any more DNLC entries due to lack of memory without purging others. This can be seen from the number of dnlc_nentries being way less than ncsize. I don't know if there's a DMU or ARC bug to reduce the memory footprint of their internal structures for situations like this, but we are aware of the issue. Can you please check the zio buffers and the arc status? Here is how you can do it: - Start mdb: ie. mdb -k ::kmem_cache - In the output generated above check the amount consumed by the zio_buf_*, arc_buf_t and arc_buf_hdr_t. 
ADDR             NAME             FLAG  CFLAG  BUFSIZE  BUFTOTL
030002640a08     zio_buf_512      0200     02      512   102675
030002640c88     zio_buf_1024     0200     02     1024       48
030002640f08     zio_buf_1536     0200     02     1536       70
030002641188     zio_buf_2048     0200     02     2048       16
030002641408     zio_buf_2560     0200     02     2560        9
030002641688     zio_buf_3072     0200     02     3072       16
030002641908     zio_buf_3584     0200     02     3584       18
030002641b88     zio_buf_4096     0200     02     4096       12
030002668008     zio_buf_5120     0200     02     5120       32
030002668288     zio_buf_6144     0200     02     6144        8
030002668508     zio_buf_7168     0200     02     7168     1032
030002668788     zio_buf_8192     0200     02     8192        8
030002668a08     zio_buf_10240    0200     02    10240        8
030002668c88     zio_buf_12288    0200     02    12288        4
030002668f08     zio_buf_14336    0200     02    14336      468
030002669188     zio_buf_16384    0200     02    16384     3326
030002669408     zio_buf_20480    0200     02    20480       16
030002669688     zio_buf_24576    0200     02    24576        3
030002669908     zio_buf_28672    0200     02    28672       12
030002669b88     zio_buf_32768    0200     02    32768     1935
03000266c008     zio_buf_40960    0200     02    40960       13
03000266c288     zio_buf_49152    0200     02    49152        9
03000266c508     zio_buf_57344    0200     02    57344        7
03000266c788     zio_buf_65536    0200     02    65536     3272
03000266ca08     zio_buf_73728    0200     02    73728       10
03000266cc88     zio_buf_81920    0200     02    81920        7
03000266cf08     zio_buf_90112    0200     02    90112        5
03000266d188     zio_buf_98304    0200     02    98304        7
03000266d408     zio_buf_106496   0200     02   106496       12
03000266d688     zio_buf_114688   0200     02   114688        6
03000266d908     zio_buf_122880   0200     02   122880        5
03000266db88     zio_buf_131072   0200     02   131072       92
030002670508     arc_buf_hdr_t    0000     00      128    11970
030002670788     arc_buf_t        0000     00       40     7308

- Dump the values of arc: arc::print struct arc
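To turn that output into a number, you can multiply BUFSIZE by BUFTOTL per cache and sum. A rough sketch (hedged: it uses only a few of the larger rows from the table, and assumes BUFTOTL approximates the buffers currently held, which overstates things if many buffers are allocated but free):

```python
# (bufsize, buftotl) pairs taken from a few of the larger rows above.
rows = [
    (512, 102675),
    (16384, 3326),
    (32768, 1935),
    (65536, 3272),
    (131072, 92),
]
# Rough estimate: bytes held by these kmem caches.
total_bytes = sum(bufsize * buftotl for bufsize, buftotl in rows)
print(total_bytes, total_bytes // (1024 * 1024))  # 396961280 bytes, ~378 MB
```

Even this subset accounts for several hundred MB, which is consistent in direction with the ~746 MB the ::memstat output attributes to the kernel.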
Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Comments inline... Neil Perrin wrote: [...] This memory usage shows nearly all of memory consumed by the kernel and probably by ZFS. ZFS can't add any more DNLC entries due to lack of memory without purging others. This can be seen from the number of dnlc_nentries being way less than ncsize. I don't know if there's a DMU or ARC bug to reduce the memory footprint of their internal structures for situations like this, but we are aware of the issue. Can you please check the zio buffers and the arc status? Here is how you can do it: - Start mdb: ie. mdb -k ::kmem_cache - In the output generated above check the amount consumed by the zio_buf_*, arc_buf_t and arc_buf_hdr_t. - Dump the values of arc: arc::print struct arc - This should give you something like the output below. 
-- snip --
arc::print struct arc
struct arc {
    anon = ARC_anon
    mru = ARC_mru
    mru_ghost = ARC_mru_ghost
    mfu = ARC_mfu
    mfu_ghost = ARC_mfu_ghost
    size = 0x3e2          -- tells you the current memory consumed by the ARC buffers (including the memory consumed for the data cached, i.e. zio_buf_*)
    p = 0x1d06a06
    c = 0x400
    c_min = 0x400
    c_max = 0x2f9aa800
    hits = 0x2fd2
    misses = 0xd1c
    deleted = 0x296
    skipped = 0
    hash_elements = 0xa85
    hash_elements_max = 0xcc0
    hash_collisions = 0x173
    hash_chains = 0xbe
    hash_chain_max = 0x2
    no_grow = 0           -- this would be set to 1 if we have a memory crunch
}
-- snip --

And as Neil pointed out, we would probably need some way of limiting the ARC consumption. Regards, Sanjeev.
Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
On 10 November, 2006 - Sanjeev Bagewadi sent me these 3,5K bytes: Comments inline... [...] Can you please check the zio buffers and the arc status? Here is how you can do it: - Start mdb: ie. mdb -k ::kmem_cache - In the output generated above check the amount consumed by the zio_buf_*, arc_buf_t and arc_buf_hdr_t. [...]
[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Hello. We're currently using a Sun Blade1000 (2x750MHz, 1G ram, 2x160MB/s mpt scsi buses, skge GigE network) as a NFS backend with ZFS for distribution of free software like Debian (cdimage.debian.org, ftp.se.debian.org) and have run into some performance issues. We are running SX snv_48 and have run with a raidz2 with 7x300G for a while now, just added another 7x300G raidz2 today but I'll stick to old information so far. Tried Sol10u2 before, but nfs writes killed every bit of performance, snv_48 works much better in that regard. Working data set is about 1.2TB over ~550k inodes right now. Backend serves data to 2-4 linux frontends running Apache (with local raid0 mod_disk_cache), rsync (looking through entire debian trees every now and then) and vsftp (not used much). There are (at least?) two types of performance issues we've run into.. 1. DNLC-through-ZFS doesn't seem to listen to ncsize. The filesystem currently has ~550k inodes and large portions of it is frequently looked over with rsync (over nfs). mdb said ncsize was about 68k and vmstat -s said we had a hitrate of ~30%, so I set ncsize to 600k and rebooted.. Didn't seem to change much, still seeing hitrates at about the same and manual find(1) doesn't seem to be that cached (according to vmstat and dnlcsnoop.d). When booting, the following message came up, not sure if it matters or not: NOTICE: setting nrnode to max value of 351642 NOTICE: setting nrnode to max value of 235577 Is there a separate ZFS-DNLC knob to adjust for this? Wild guess is that it has its own implementation which is integrated with the rest of the ZFS cache which throws out metadata cache in favour of data cache.. or something.. 2. Readahead or something is killing all signs of performance Since there can be pretty many requests in the air at the same time, we're having issues with readahead.. 
Some regular numbers are 7x13MB/s being read from disk according to 'iostat -xnzm 5' and 'zpool iostat -v 5', and maybe 5MB/s is being sent back over the network.. This means that about 20x more is read from disk than actually being used. When testing single streams, the readahead helps and data isn't thrown away.. but when a bazilion nfs requests come at once, too much is being read by zfs compared to what was actually requested/being delivered. I saw some stuff about zfs_prefetch_disable in current (unreleased) code, will this help us perhaps? I've read about two layers of prefetch, one per vdev and one per disk.. Since the current working set is about 1.2TB, 1GB memory in the server and lots of one-shot file requests nature, we'd like to disable as much readahead and data cache as possible (since the chance of a positive data cache hit is very low).. Keeping dnlc stuff in memory would help though. Some URLs: zfs_prefetch_disable being integrated: http://dlc.sun.com/osol/on/downloads/current/on-changelog-20061103.html zfs_prefetch_disable itself: http://src.opensolaris.org/source/search?q=zfs_prefetch_disable&defs=&refs=&path=&hist= Soft Track Buffer / Prefetch: http://blogs.sun.com/roch/entry/the_dynamics_of_zfs As far as I've been able to tell using mdb, this is already lowered in b48? http://blogs.sun.com/roch/entry/tuning_the_knobs Suggestions, ideas etc? /Tomas
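The "about 20x" figure above is just the ratio of aggregate disk reads to bytes served; as a sketch of the arithmetic with the numbers quoted:

```python
# Read amplification from the figures in the mail above.
disks, per_disk_mb_s = 7, 13   # 7x13MB/s read from disk (iostat)
served_mb_s = 5                # ~5MB/s actually sent over the network
amplification = disks * per_disk_mb_s / served_mb_s
print(amplification)  # 18.2 -> roughly the "about 20x" quoted above
```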
Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Tomas Ögren wrote on 11/09/06 09:59: 1. DNLC-through-ZFS doesn't seem to listen to ncsize. The filesystem currently has ~550k inodes and large portions of it is frequently looked over with rsync (over nfs). mdb said ncsize was about 68k and vmstat -s said we had a hitrate of ~30%, so I set ncsize to 600k and rebooted.. Didn't seem to change much, still seeing hitrates at about the same and manual find(1) doesn't seem to be that cached (according to vmstat and dnlcsnoop.d). When booting, the following message came up, not sure if it matters or not: NOTICE: setting nrnode to max value of 351642 NOTICE: setting nrnode to max value of 235577 Is there a separate ZFS-DNLC knob to adjust for this? Wild guess is that it has its own implementation which is integrated with the rest of the ZFS cache which throws out metadata cache in favour of data cache.. or something.. A more complete and useful set of dnlc statistics can be obtained via kstat -n dnlcstats. As well as the soft limit on dnlc entries (ncsize), the current number of cached entries is also useful: echo ncsize/D | mdb -k echo dnlc_nentries/D | mdb -k nfs does have a maximum number of rnodes which is calculated from the memory available. It doesn't look like nrnode_max can be overridden. Having said that I actually think your problem is lack of memory. For each ZFS vnode held by the DNLC it uses a *lot* more memory than say UFS. Consequently it has to purge dnlc entries and I suspect with only 1GB that the ZFS ARC doesn't allow many dnlc entries. I don't know if that number is maintained anywhere, for you to check. Mark? Neil.
Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Neil Perrin wrote: [...] Having said that I actually think your problem is lack of memory. For each ZFS vnode held by the DNLC it uses a *lot* more memory than say UFS. Consequently it has to purge dnlc entries and I suspect with only 1GB that the ZFS ARC doesn't allow many dnlc entries. I don't know if that number is maintained anywhere, for you to check. Mark? Neil. If the ARC detects low memory (via arc_reclaim_needed()), then we call arc_kmem_reap_now() and subsequently dnlc_reduce_cache() - which reduces the # of dnlc entries by 3% (ARC_REDUCE_DNLC_PERCENT). So yeah, dnlc_nentries would be really interesting to see (especially if it's << ncsize). 
eric
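As a toy model of that behaviour (an illustration only, not the kernel code): each reclaim pass trims 3% of the current entries, so sustained memory pressure shrinks the DNLC geometrically rather than all at once.

```python
ARC_REDUCE_DNLC_PERCENT = 3  # the 3% figure from the mail above

def reduce_dnlc(nentries: int, passes: int) -> int:
    """Apply the 3% trim repeatedly (simplified integer model)."""
    for _ in range(passes):
        nentries -= nentries * ARC_REDUCE_DNLC_PERCENT // 100
    return nentries

print(reduce_dnlc(600_000, 1))    # 582000 after a single reclaim pass
print(reduce_dnlc(600_000, 100))  # sustained pressure leaves far fewer
```

Which is consistent with Tomas seeing dnlc_nentries far below the ncsize he configured.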
Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
On 09 November, 2006 - Neil Perrin sent me these 1,6K bytes: Tomas Ögren wrote on 11/09/06 09:59: [...] A more complete and useful set of dnlc statistics can be obtained via kstat -n dnlcstats. As well as the soft limit on dnlc entries (ncsize), the current number of cached entries is also useful: This is after ~28h uptime:

module: unix                          instance: 0
name:   dnlcstats                     class:    misc
        crtime                        47.5600948
        dir_add_abort                 0
        dir_add_max                   0
        dir_add_no_memory             0
        dir_cached_current            4
        dir_cached_total              107
        dir_entries_cached_current    4321
        dir_fini_purge                0
        dir_hits                      11000
        dir_misses                    172814
        dir_reclaim_any               25
        dir_reclaim_last              16
        dir_remove_entry_fail         0
        dir_remove_space_fail         0
        dir_start_no_memory           0
        dir_update_fail               0
        double_enters                 234918
        enters                        59193543
        hits                          36690843
        misses                        59384436
        negative_cache_hits           1366345
        pick_free                     0
        pick_heuristic                57069023
        pick_last                     2035111
        purge_all                     1
        purge_fs1                     0
        purge_total_entries           3748
        purge_vfs                     187
        purge_vp                      95
        snaptime                      99177.711093

vmstat -s: 96080561 total name lookups (cache hits 38%)

echo ncsize/D | mdb -k
ncsize: 60
echo dnlc_nentries/D | mdb -k
dnlc_nentries: 19230

Not quite the same.. 
nfs does have a maximum number of rnodes which is calculated from the memory available. It doesn't look like nrnode_max can be overridden. rnode seems to take 472 bytes according to my test program.. which is a bit more than the 64 bytes per dnlc entry in ncsize docs.. Having said that I actually think your problem is lack of memory. For each ZFS vnode held by the DNLC it uses a *lot* more memory than say UFS. Consequently it has to purge dnlc entries and I suspect with only 1GB that the ZFS ARC doesn't allow many dnlc entries. I don't know if that number is maintained anywhere, for you to check. Mark? Current memory usage (for some values of usage ;):

# echo ::memstat|mdb -k
Page Summary            Pages     MB   %Tot
Kernel                  95584    746    75%
Anon                    20868    163    16%
Exec and libs            1703     13     1%
Page cache               1007      7     1%
Free (cachelist)           97      0     0%
Free (freelist)          7745     60     6%
Total                  127004    992
Physical               125192    978

/Tomas
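For what it's worth, the 38% that vmstat reports can be reproduced from the hits/misses counters in the kstat output above:

```python
# DNLC hit rate from the dnlcstats counters quoted earlier in this mail.
hits, misses = 36690843, 59384436
hit_rate = hits / (hits + misses)
print(f"{hit_rate:.1%}")  # 38.2% -> matches "cache hits 38%" from vmstat -s
```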
Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
eric kustarz wrote:

If the ARC detects low memory (via arc_reclaim_needed()), then we call arc_kmem_reap_now() and subsequently dnlc_reduce_cache() - which reduces the # of DNLC entries by 3% (ARC_REDUCE_DNLC_PERCENT). So yeah, dnlc_nentries would be really interesting to see (especially if it's well below ncsize).

The version of statit that we're using is still attached to ancient 32-bit counters that /are/ overflowing on our runs. I'm fixing this at the moment and I'll send around a new binary this afternoon.

blw
Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
On 09 November, 2006 - Tomas Ögren sent me these 4,4K bytes:

On 09 November, 2006 - Neil Perrin sent me these 1,6K bytes:

nfs does have a maximum number of rnodes, which is calculated from the memory available. It doesn't look like nrnode_max can be overridden. An rnode seems to take 472 bytes according to my test program.. which is a bit more than the 64 bytes per DNLC entry in the ncsize docs..

But wait a minute.. I'm not interested in being an NFS client.. this is a server.. so wasting ~100MB on NFS client stuff that will never be used isn't that great.. Setting it to something really low and rebooting now..

/Tomas
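That "~100MB" estimate checks out against the numbers earlier in the thread: the second boot message auto-set nrnode to 235577, and each rnode measured 472 bytes:

```shell
# Estimate worst-case NFS-client rnode cache memory on this server.
nrnode=235577      # from "NOTICE: setting nrnode to max value of 235577"
rnode_bytes=472    # from Tomas's test program
awk -v n="$nrnode" -v b="$rnode_bytes" \
    'BEGIN { printf "%d rnodes x %d bytes = %.0f MB\n", n, b, n * b / 1048576 }'
```

which gives about 106 MB. (The usual place to pin it down would be an nfs-module setting in /etc/system, but as noted above it's not clear nrnode_max can actually be overridden - try it on a non-production box first.)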
Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Brian Wong wrote:

The version of statit that we're using is still attached to ancient 32-bit counters that /are/ overflowing on our runs. I'm fixing this at the moment and I'll send around a new binary this afternoon.

Spencer and I just fixed some statit bugs (such as getting it not to core dump on a Thumper)... he has the changes, so I'd sync up with him (I'm not sure if they're the same bugs, though).

eric
Re[2]: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
Hello Tomas,

Thursday, November 9, 2006, 9:47:17 PM, you wrote:

TÖ On 09 November, 2006 - Neil Perrin sent me these 1,6K bytes:
TÖ Current memory usage (for some values of usage ;):
TÖ # echo ::memstat | mdb -k
TÖ Page Summary                Pages      MB    %Tot
TÖ Kernel                      95584      746     75%
TÖ Anon                        20868      163     16%
TÖ Exec and libs                1703       13      1%
TÖ Page cache                   1007        7      1%
TÖ Free (cachelist)               97        0      0%
TÖ Free (freelist)              7745       60      6%
TÖ Total                      127004      992
TÖ Physical                   125192      978

Well, when I raised ncsize on an NFS server I got a memory pressure problem. Leaving ncsize at the default solved the problem.

--
Best regards,
Robert                          mailto:[EMAIL PROTECTED]
                                http://milek.blogspot.com
Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48
On 11/09/06 13:47, Tomas Ögren wrote:

[snip: DNLC question and full kstat -n dnlcstats output, quoted in full in Tomas's earlier message; the key figures follow]

vmstat -s: 96080561 total name lookups (cache hits 38%)

# echo ncsize/D | mdb -k
ncsize:         600000
# echo dnlc_nentries/D | mdb -k
dnlc_nentries:  19230

Not quite the same..
Having said that, I actually think your problem is lack of memory. For each ZFS vnode held by the DNLC, ZFS uses a *lot* more memory than, say, UFS.

Current memory usage (for some values of usage ;):

# echo ::memstat | mdb -k
Page Summary                Pages      MB    %Tot
Kernel                      95584      746     75%
Anon                        20868      163     16%
Exec and libs                1703       13      1%
Page cache                   1007        7      1%
Free (cachelist)               97        0      0%
Free (freelist)              7745       60      6%
Total                      127004      992
Physical                   125192      978

/Tomas

This memory usage shows nearly all of memory consumed by the kernel, probably mostly by ZFS. ZFS can't add any more DNLC entries, due to lack of memory, without purging others. This can be seen from dnlc_nentries being way less than ncsize. I don't know if there's a DMU or ARC bug filed to reduce the memory footprint of their internal structures for situations like this, but we are aware of the issue.

Neil.
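The gap Neil describes is easy to quantify from the numbers in the thread (ncsize as Tomas set it, dnlc_nentries as read back via mdb):

```shell
# How much of the configured DNLC is actually populated under memory pressure?
ncsize=600000        # Tomas's /etc/system setting (600k)
dnlc_nentries=19230  # observed via: echo dnlc_nentries/D | mdb -k
awk -v c="$ncsize" -v n="$dnlc_nentries" \
    'BEGIN { printf "DNLC population: %d of %d entries (%.1f%%)\n", n, c, 100 * n / c }'
```

Only about 3% of the configured cache is populated - consistent with Neil's point that memory pressure, not ncsize, is the limiting factor here.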