Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48

2006-11-15 Thread Sanjeev Bagewadi

Tomas,

Apologies for delayed response...

Tomas Ögren wrote:


Interesting! So, it is not the ARC which is consuming too much memory.
It is some other piece (not sure if it belongs to ZFS) which is causing
the crunch...

Or the other possibility is that the ARC ate up too much and caused a
near-crunch situation, and the kmem subsystem hit back and caused the
ARC to free up its buffers (hence the no_grow flag being enabled).
So, it (the ARC) could be oscillating between large caching and then
purging the caches.


You might want to keep track of these values (ARC size and no_grow flag)
and see how they change over a period of time. This would help us
understand the pattern.

I would guess it grows after boot until it hits some max and then stays
there.. but I can check it out..
 

No, that is not true. It shrinks when there is memory pressure. The
values of 'c' and 'p' are adjusted accordingly.
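
For example, something along these lines (an untested sketch, run as
root; the ::print member list may need tweaking on your build) would
sample the interesting fields once a minute:

    while true; do
        echo "arc::print struct arc size c p no_grow" | mdb -k
        sleep 60
    done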

And if we know it is the ARC which is causing the crunch, we could
manually change the value of c_max to a comfortable value and that
would limit the size of the ARC.



But in the ZFS world, DNLC is part of the ARC, right?
 

Not really... ZFS uses the regular DNLC for lookup optimization.
However, the metadata/data is cached in the ARC.


My original question was how to get rid of data cache, but keep
metadata cache (such as DNLC)...
 

This is a good question. AFAIK the ARC does not really differentiate
between metadata and data, so I am not sure if we can control it.
However, as I mentioned above, ZFS still uses the DNLC caching.


 


However, I would suggest
that you try it out on a non-production machine first.

By default, c_max is set to 75% of physmem and that is the hard limit.
c is the soft limit, and the ARC will try to grow up to 'c'. The value
of c is adjusted when there is a need to cache more, but it will never
exceed c_max.
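
If you do want to experiment with that (on a non-production box first,
as I said), one way -- just a sketch, and the 256MB figure below is only
an example -- is to locate c_max with mdb and write a smaller value into
it on a live kernel:

    # print the address of the c_max member
    echo "arc::print -a struct arc c_max" | mdb -k

    # write a new 64-bit value at that address (example: 0x10000000 = 256MB)
    echo "<address printed above>/Z 0x10000000" | mdb -kw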

Regarding the huge number of reads, I am sure you have already tried
disabling the VDEV prefetch. If not, it is worth a try.



That was part of my original question, how? :)
 

Apologies :-) I was digging around the code, and I find that
zfs_vdev_cache_bshift is the one which controls the amount that is
read. Currently it is set to 16 (a shift, i.e. 64K reads), so we should
be able to modify this and reduce the prefetch.
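
For example (again, only a sketch -- please verify on a test machine
before touching production), the current value can be read and lowered
with mdb; 13 would mean 8K reads instead of 64K:

    echo zfs_vdev_cache_bshift/D | mdb -k
    echo "zfs_vdev_cache_bshift/W 0t13" | mdb -kw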

However, I will have to double check with more people and get back to you.

Thanks and regards,
Sanjeev.


/Tomas
 




--
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel:x27521 +91 80 669 27521 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48

2006-11-14 Thread Tomas Ögren
On 13 November, 2006 - Eric Kustarz sent me these 2,4K bytes:

 Tomas Ögren wrote:
 On 13 November, 2006 - Sanjeev Bagewadi sent me these 7,1K bytes:
 Regarding the huge number of reads, I am sure you have already tried 
 disabling the VDEV prefetch.
 If not, it is worth a try.
 That was part of my original question, how? :)
 
 On recent bits, you can set 'zfs_vdev_cache_max' to 1 to disable the 
 vdev cache.

On earlier versions (snv_48), I did something similar with ztune.sh[0],
adding a cache_size entry which I set to 0 (instead of 10M).

This helped quite a lot, but there seems to be one more level of
prefetching..

Example:
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
ftp         1.67T  2.15T  1.26K     23  40.9M   890K
  raidz2    1.37T   551G    674     10  22.3M   399K
    c4t0d0      -      -    210      3  3.19M  80.4K
    c4t1d0      -      -    211      3  3.19M  80.4K
    c4t2d0      -      -    211      3  3.19M  80.4K
    c5t0d0      -      -    210      3  3.19M  80.4K
    c5t1d0      -      -    242      4  3.19M  80.4K
    c5t2d0      -      -    211      3  3.19M  80.4K
    c5t3d0      -      -    211      3  3.19M  80.4K
  raidz2     305G  1.61T    614     12  18.6M   491K
    c4t3d0      -      -    222      5  2.66M  99.1K
    c4t4d0      -      -    223      5  2.66M  99.1K
    c4t5d0      -      -    224      5  2.66M  99.1K
    c4t8d0      -      -    190      5  2.66M  99.1K
    c5t4d0      -      -    190      5  2.66M  99.1K
    c5t5d0      -      -    226      5  2.66M  99.1K
    c5t8d0      -      -    225      5  2.66M  99.1K
----------  -----  -----  -----  -----  -----  -----

Before this fix, the 'read bandwidth' for the disks in the first raidz2
added up to way more than the raidz2 itself.. now it adds up correctly,
but some other readahead still causes a 1-10x factor too much, mostly
hovering around 2-3x.. before it was hovering around 8-10x..

[0]:
http://blogs.sun.com/roch/resource/ztune.sh

/Tomas
-- 
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48

2006-11-13 Thread Tomas Ögren
On 13 November, 2006 - Sanjeev Bagewadi sent me these 7,1K bytes:

 Tomas,
 
 comments inline...
 
 
 arc::print struct arc   

 
 {
anon = ARC_anon
mru = ARC_mru
mru_ghost = ARC_mru_ghost
mfu = ARC_mfu
mfu_ghost = ARC_mfu_ghost
size = 0x6f7a400
p = 0x5d9bd5a
c = 0x5f6375a
c_min = 0x400
c_max = 0x2e82a000
hits = 0x40e0a15
misses = 0x1cec4a4
deleted = 0x1b0ba0d
skipped = 0x24ea64e13
hash_elements = 0x179d
hash_elements_max = 0x60bb
hash_collisions = 0x8dca3a
hash_chains = 0x391
hash_chain_max = 0x8
no_grow = 0x1
 }
 
 So, about 100MB and a memory crunch..
  
 
 Interesting! So, it is not the ARC which is consuming too much memory.
 It is some other piece (not sure if it belongs to ZFS) which is causing 
 the crunch...
 
 Or the other possibility is that the ARC ate up too much and caused a
 near-crunch situation, and the kmem subsystem hit back and caused the
 ARC to free up its buffers (hence the no_grow flag being enabled).
 So, it (the ARC) could be oscillating between large caching and then
 purging the caches.
 
 You might want to keep track of these values (ARC size and no_grow flag) 
 and see how they
 change over a period of time. This would help us understand the pattern.

I would guess it grows after boot until it hits some max and then stays
there.. but I can check it out..

 And if we know it is the ARC which is causing the crunch, we could
 manually change the value of c_max to a comfortable value and that
 would limit the size of the ARC.

But in the ZFS world, DNLC is part of the ARC, right?
My original question was how to get rid of data cache, but keep
metadata cache (such as DNLC)...

 However, I would suggest
 that you try it out on a non-production machine first.
 
 By default, c_max is set to 75% of physmem and that is the hard limit.
 c is the soft limit, and the ARC will try to grow up to 'c'. The value
 of c is adjusted when there is a need to cache more, but it will never
 exceed c_max.
 
 Regarding the huge number of reads, I am sure you have already tried 
 disabling the VDEV prefetch.
 If not, it is worth a try.

That was part of my original question, how? :)

/Tomas
-- 
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48

2006-11-13 Thread Roch - PAE


Tomas Ögren writes:
  On 13 November, 2006 - Sanjeev Bagewadi sent me these 7,1K bytes:
  
   Tomas,
   
   comments inline...
   
   
   arc::print struct arc   
  
   
   {
  anon = ARC_anon
  mru = ARC_mru
  mru_ghost = ARC_mru_ghost
  mfu = ARC_mfu
  mfu_ghost = ARC_mfu_ghost
  size = 0x6f7a400
  p = 0x5d9bd5a
  c = 0x5f6375a
  c_min = 0x400
  c_max = 0x2e82a000
  hits = 0x40e0a15
  misses = 0x1cec4a4
  deleted = 0x1b0ba0d
  skipped = 0x24ea64e13
  hash_elements = 0x179d
  hash_elements_max = 0x60bb
  hash_collisions = 0x8dca3a
  hash_chains = 0x391
  hash_chain_max = 0x8
  no_grow = 0x1
   }
   
   So, about 100MB and a memory crunch..

   
   Interesting! So, it is not the ARC which is consuming too much memory.
   It is some other piece (not sure if it belongs to ZFS) which is causing 
   the crunch...
   
   Or the other possibility is that the ARC ate up too much and caused a
   near-crunch situation, and the kmem subsystem hit back and caused the
   ARC to free up its buffers (hence the no_grow flag being enabled).
   So, it (the ARC) could be oscillating between large caching and then
   purging the caches.
   
   You might want to keep track of these values (ARC size and no_grow flag) 
   and see how they
   change over a period of time. This would help us understand the pattern.
  
  I would guess it grows after boot until it hits some max and then stays
  there.. but I can check it out..
  
   And if we know it is the ARC which is causing the crunch, we could
   manually change the value of c_max to a comfortable value and that
   would limit the size of the ARC.
  
  But in the ZFS world, DNLC is part of the ARC, right?
  My original question was how to get rid of data cache, but keep
  metadata cache (such as DNLC)...
  
   However, I would suggest
   that you try it out on a non-production machine first.
   
   By default, c_max is set to 75% of physmem and that is the hard limit.
   c is the soft limit, and the ARC will try to grow up to 'c'. The value
   of c is adjusted when there is a need to cache more, but it will never
   exceed c_max.
   
   Regarding the huge number of reads, I am sure you have already tried 
   disabling the VDEV prefetch.
   If not, it is worth a try.
  
  That was part of my original question, how? :)
  
  /Tomas
  -- 
  Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
  |- Student at Computing Science, University of Umeå
  `- Sysadmin at {cs,acc}.umu.se
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Under memory pressure the ARC will shrink, and it will also
shrink the DNLC by 3%:

arc_reduce_dnlc_percent = 3

You could try to tune that number.
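
For example (just a sketch), to see the current value and drop it so
that each reclaim purges fewer dnlc entries:

    echo arc_reduce_dnlc_percent/D | mdb -k
    echo "arc_reduce_dnlc_percent/W 0t1" | mdb -kw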

-r

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48

2006-11-13 Thread Eric Kustarz

Tomas Ögren wrote:

On 13 November, 2006 - Sanjeev Bagewadi sent me these 7,1K bytes:



Tomas,

comments inline...



arc::print struct arc   
 



{
 anon = ARC_anon
 mru = ARC_mru
 mru_ghost = ARC_mru_ghost
 mfu = ARC_mfu
 mfu_ghost = ARC_mfu_ghost
 size = 0x6f7a400
 p = 0x5d9bd5a
 c = 0x5f6375a
 c_min = 0x400
 c_max = 0x2e82a000
 hits = 0x40e0a15
 misses = 0x1cec4a4
 deleted = 0x1b0ba0d
 skipped = 0x24ea64e13
 hash_elements = 0x179d
 hash_elements_max = 0x60bb
 hash_collisions = 0x8dca3a
 hash_chains = 0x391
 hash_chain_max = 0x8
 no_grow = 0x1
}

So, about 100MB and a memory crunch..




Interesting! So, it is not the ARC which is consuming too much memory.
It is some other piece (not sure if it belongs to ZFS) which is causing 
the crunch...


Or the other possibility is that the ARC ate up too much and caused a
near-crunch situation, and the kmem subsystem hit back and caused the
ARC to free up its buffers (hence the no_grow flag being enabled).
So, it (the ARC) could be oscillating between large caching and then
purging the caches.


You might want to keep track of these values (ARC size and no_grow flag) 
and see how they

change over a period of time. This would help us understand the pattern.



I would guess it grows after boot until it hits some max and then stays
there.. but I can check it out..


And if we know it is the ARC which is causing the crunch, we could
manually change the value of c_max to a comfortable value and that
would limit the size of the ARC.



But in the ZFS world, DNLC is part of the ARC, right?
My original question was how to get rid of data cache, but keep
metadata cache (such as DNLC)...



However, I would suggest
that you try it out on a non-production machine first.

By default, c_max is set to 75% of physmem and that is the hard limit.
c is the soft limit, and the ARC will try to grow up to 'c'. The value
of c is adjusted when there is a need to cache more, but it will never
exceed c_max.

Regarding the huge number of reads, I am sure you have already tried 
disabling the VDEV prefetch.

If not, it is worth a try.



That was part of my original question, how? :)

/Tomas


On recent bits, you can set 'zfs_vdev_cache_max' to 1 to disable the 
vdev cache.
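
Roughly like this (a sketch -- double-check the module/variable name on
your build), either persistently in /etc/system or on a live kernel:

    set zfs:zfs_vdev_cache_max = 1            (/etc/system, needs a reboot)

    echo "zfs_vdev_cache_max/W 1" | mdb -kw   (live kernel)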


eric

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48

2006-11-12 Thread Sanjeev Bagewadi

Tomas,

comments inline...


Tomas Ögren wrote:


On 10 November, 2006 - Sanjeev Bagewadi sent me these 3,5K bytes:

 


1. DNLC-through-ZFS doesn't seem to listen to ncsize.

The filesystem currently has ~550k inodes and large portions of it is
frequently looked over with rsync (over nfs). mdb said ncsize was 
about

68k and vmstat -s  said we had a hitrate of ~30%, so I set ncsize to
600k and rebooted.. Didn't seem to change much, still seeing 
hitrates at

about the same and manual find(1) doesn't seem to be that cached
(according to vmstat and dnlcsnoop.d).
When booting, the following message came up, not sure if it matters 
or not:

NOTICE: setting nrnode to max value of 351642
NOTICE: setting nrnode to max value of 235577

Is there a separate ZFS-DNLC knob to adjust for this? Wild guess is 
that

it has its own implementation which is integrated with the rest of the
ZFS cache which throws out metadata cache in favour of data cache.. or
something.. 
   


Current memory usage (for some values of usage ;):
# echo ::memstat|mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                      95584               746   75%
Anon                        20868               163   16%
Exec and libs                1703                13    1%
Page cache                   1007                 7    1%
Free (cachelist)               97                 0    0%
Free (freelist)              7745                60    6%

Total                      127004               992
Physical                   125192               978


/Tomas


   


This memory usage shows nearly all of memory consumed by the kernel
and probably by ZFS. ZFS can't add any more DNLC entries due to lack of
memory without purging others. This can be seen from the number of
dnlc_nentries being way less than ncsize.
I don't know if there's a DMU or ARC bug to reduce the memory footprint
of their internal structures for situations like this, but we are
aware of the issue.
 


Can you please check the zio buffers and the arc status ?

Here is how you can do it :
- Start mdb : ie. mdb -k

   


::kmem_cache
 

- In the output generated above check the amount consumed by the 
zio_buf_*, arc_buf_t and

arc_buf_hdr_t.
   



ADDR NAME  FLAG  CFLAG  BUFSIZE  BUFTOTL

030002640a08 zio_buf_512    02  512   102675
030002640c88 zio_buf_1024  0200 02 1024   48
030002640f08 zio_buf_1536  0200 02 1536   70
030002641188 zio_buf_2048  0200 02 2048   16
030002641408 zio_buf_2560  0200 02 25609
030002641688 zio_buf_3072  0200 02 3072   16
030002641908 zio_buf_3584  0200 02 3584   18
030002641b88 zio_buf_4096  0200 02 4096   12
030002668008 zio_buf_5120  0200 02 5120   32
030002668288 zio_buf_6144  0200 02 61448
030002668508 zio_buf_7168  0200 02 7168 1032
030002668788 zio_buf_8192  0200 02 81928
030002668a08 zio_buf_10240 0200 02102408
030002668c88 zio_buf_12288 0200 02122884
030002668f08 zio_buf_14336 0200 0214336  468
030002669188 zio_buf_16384 0200 0216384 3326
030002669408 zio_buf_20480 0200 0220480   16
030002669688 zio_buf_24576 0200 02245763
030002669908 zio_buf_28672 0200 0228672   12
030002669b88 zio_buf_32768 0200 0232768 1935
03000266c008 zio_buf_40960 0200 0240960   13
03000266c288 zio_buf_49152 0200 02491529
03000266c508 zio_buf_57344 0200 02573447
03000266c788 zio_buf_65536 0200 0265536 3272
03000266ca08 zio_buf_73728 0200 0273728   10
03000266cc88 zio_buf_81920 0200 02819207
03000266cf08 zio_buf_90112 0200 02901125
03000266d188 zio_buf_98304 0200 02983047
03000266d408 zio_buf_1064960200 02   106496   12
03000266d688 zio_buf_1146880200 02   1146886
03000266d908 zio_buf_1228800200 02   1228805
03000266db88 zio_buf_1310720200 02   131072   92

030002670508 arc_buf_hdr_t  00  12811970
030002670788 arc_buf_t  00   40 7308

 


- Dump the values of arc

   


arc::print struct arc
 



 

arc::print struct arc   
   


{
   

Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48

2006-11-10 Thread Sanjeev Bagewadi

Comments inline...

Neil Perrin wrote:


1. DNLC-through-ZFS doesn't seem to listen to ncsize.

The filesystem currently has ~550k inodes and large portions of it is
frequently looked over with rsync (over nfs). mdb said ncsize was 
about

68k and vmstat -s  said we had a hitrate of ~30%, so I set ncsize to
600k and rebooted.. Didn't seem to change much, still seeing 
hitrates at

about the same and manual find(1) doesn't seem to be that cached
(according to vmstat and dnlcsnoop.d).
When booting, the following message came up, not sure if it matters 
or not:

NOTICE: setting nrnode to max value of 351642
NOTICE: setting nrnode to max value of 235577

Is there a separate ZFS-DNLC knob to adjust for this? Wild guess is 
that

it has its own implementation which is integrated with the rest of the
ZFS cache which throws out metadata cache in favour of data cache.. or
something.. 



Current memory usage (for some values of usage ;):
# echo ::memstat|mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                      95584               746   75%
Anon                        20868               163   16%
Exec and libs                1703                13    1%
Page cache                   1007                 7    1%
Free (cachelist)               97                 0    0%
Free (freelist)              7745                60    6%

Total                      127004               992
Physical                   125192               978


/Tomas
 


This memory usage shows nearly all of memory consumed by the kernel
and probably by ZFS.  ZFS can't add any more DNLC entries due to lack of
memory without purging others. This can be seen from  the number of
dnlc_nentries being way less than ncsize.
I don't know if there's a DMU or ARC bug to reduce the memory footprint
of their internal structures for situations like this, but we are 
aware of the

issue.


Can you please check the zio buffers and the arc status ?

Here is how you can do it :
- Start mdb : ie. mdb -k

 ::kmem_cache

- In the output generated above check the amount consumed by the 
zio_buf_*, arc_buf_t and

 arc_buf_hdr_t.

- Dump the values of arc

 arc::print struct arc

- This should give you some like below.
-- snip--
 arc::print struct arc
{
   anon = ARC_anon
   mru = ARC_mru
   mru_ghost = ARC_mru_ghost
   mfu = ARC_mfu
   mfu_ghost = ARC_mfu_ghost
   size = 0x3e2   -- tells you the current memory consumed by the
                     ARC buffers (including the memory consumed for
                     the cached data, i.e. the zio_buf_* buffers)

   p = 0x1d06a06
   c = 0x400
   c_min = 0x400
   c_max = 0x2f9aa800
   hits = 0x2fd2
   misses = 0xd1c
   deleted = 0x296
   skipped = 0
   hash_elements = 0xa85
   hash_elements_max = 0xcc0
   hash_collisions = 0x173
   hash_chains = 0xbe
   hash_chain_max = 0x2
   no_grow = 0   -- This would be set to 1 if we have a memory crunch

}
-- snip --

And as Neil pointed out, we would probably need some way of limiting the 
ARC consumption.


Regards,
Sanjeev.



Neil.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




--
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel:x27521 +91 80 669 27521 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48

2006-11-10 Thread Tomas Ögren
On 10 November, 2006 - Sanjeev Bagewadi sent me these 3,5K bytes:

 Comments in line...
 
 Neil Perrin wrote:
 
 1. DNLC-through-ZFS doesn't seem to listen to ncsize.
 
 The filesystem currently has ~550k inodes and large portions of it is
 frequently looked over with rsync (over nfs). mdb said ncsize was 
 about
 68k and vmstat -s  said we had a hitrate of ~30%, so I set ncsize to
 600k and rebooted.. Didn't seem to change much, still seeing 
 hitrates at
 about the same and manual find(1) doesn't seem to be that cached
 (according to vmstat and dnlcsnoop.d).
 When booting, the following message came up, not sure if it matters 
 or not:
 NOTICE: setting nrnode to max value of 351642
 NOTICE: setting nrnode to max value of 235577
 
 Is there a separate ZFS-DNLC knob to adjust for this? Wild guess is 
 that
 it has its own implementation which is integrated with the rest of the
 ZFS cache which throws out metadata cache in favour of data cache.. or
 something.. 
 
 Current memory usage (for some values of usage ;):
 # echo ::memstat|mdb -k
 Page SummaryPagesMB  %Tot
      
 Kernel  95584   746   75%
 Anon20868   163   16%
 Exec and libs1703131%
 Page cache   1007 71%
 Free (cachelist)   97 00%
 Free (freelist)  7745606%
 
 Total  127004   992
 Physical   125192   978
 
 
 /Tomas
  
 
 This memory usage shows nearly all of memory consumed by the kernel
 and probably by ZFS.  ZFS can't add any more DNLC entries due to lack of
 memory without purging others. This can be seen from  the number of
 dnlc_nentries being way less than ncsize.
 I don't know if there's a DMU or ARC bug to reduce the memory footprint
 of their internal structures for situations like this, but we are 
 aware of the
 issue.
 
 Can you please check the zio buffers and the arc status ?
 
 Here is how you can do it :
 - Start mdb : ie. mdb -k
 
  ::kmem_cache
 
 - In the output generated above check the amount consumed by the 
 zio_buf_*, arc_buf_t and
  arc_buf_hdr_t.

ADDR NAME  FLAG  CFLAG  BUFSIZE  BUFTOTL

030002640a08 zio_buf_512    02  512   102675
030002640c88 zio_buf_1024  0200 02 1024   48
030002640f08 zio_buf_1536  0200 02 1536   70
030002641188 zio_buf_2048  0200 02 2048   16
030002641408 zio_buf_2560  0200 02 25609
030002641688 zio_buf_3072  0200 02 3072   16
030002641908 zio_buf_3584  0200 02 3584   18
030002641b88 zio_buf_4096  0200 02 4096   12
030002668008 zio_buf_5120  0200 02 5120   32
030002668288 zio_buf_6144  0200 02 61448
030002668508 zio_buf_7168  0200 02 7168 1032
030002668788 zio_buf_8192  0200 02 81928
030002668a08 zio_buf_10240 0200 02102408
030002668c88 zio_buf_12288 0200 02122884
030002668f08 zio_buf_14336 0200 0214336  468
030002669188 zio_buf_16384 0200 0216384 3326
030002669408 zio_buf_20480 0200 0220480   16
030002669688 zio_buf_24576 0200 02245763
030002669908 zio_buf_28672 0200 0228672   12
030002669b88 zio_buf_32768 0200 0232768 1935
03000266c008 zio_buf_40960 0200 0240960   13
03000266c288 zio_buf_49152 0200 02491529
03000266c508 zio_buf_57344 0200 02573447
03000266c788 zio_buf_65536 0200 0265536 3272
03000266ca08 zio_buf_73728 0200 0273728   10
03000266cc88 zio_buf_81920 0200 02819207
03000266cf08 zio_buf_90112 0200 02901125
03000266d188 zio_buf_98304 0200 02983047
03000266d408 zio_buf_1064960200 02   106496   12
03000266d688 zio_buf_1146880200 02   1146886
03000266d908 zio_buf_1228800200 02   1228805
03000266db88 zio_buf_1310720200 02   131072   92

030002670508 arc_buf_hdr_t  00  12811970
030002670788 arc_buf_t  00   40 7308

 - Dump the values of arc
 
  arc::print struct arc

 arc::print struct arc   
{
anon = ARC_anon
mru = 

[zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48

2006-11-09 Thread Tomas Ögren
Hello.

We're currently using a Sun Blade 1000 (2x750MHz, 1G RAM, 2x160MB/s mpt
scsi buses, skge GigE network) as an NFS backend with ZFS for
distribution of free software like Debian (cdimage.debian.org,
ftp.se.debian.org) and have run into some performance issues.

We are running SX snv_48 and have run with a raidz2 with 7x300G for a
while now, just added another 7x300G raidz2 today but I'll stick to old
information so far. Tried Sol10u2 before, but nfs writes killed every
bit of performance, snv_48 works much better in that regard.

Working data set is about 1.2TB over ~550k inodes right now. Backend
serves data to 2-4 linux frontends running Apache (with local raid0
mod_disk_cache), rsync (looking through entire debian trees every now
and then) and vsftp (not used much).

There are (at least?) two types of performance issues we've run into..

1. DNLC-through-ZFS doesn't seem to listen to ncsize.

The filesystem currently has ~550k inodes and large portions of it are
frequently looked over with rsync (over nfs). mdb said ncsize was about
68k and vmstat -s  said we had a hitrate of ~30%, so I set ncsize to
600k and rebooted.. Didn't seem to change much, still seeing hitrates at
about the same and manual find(1) doesn't seem to be that cached
(according to vmstat and dnlcsnoop.d).
When booting, the following message came up, not sure if it matters or not:
NOTICE: setting nrnode to max value of 351642
NOTICE: setting nrnode to max value of 235577

Is there a separate ZFS-DNLC knob to adjust for this? Wild guess is that
it has its own implementation which is integrated with the rest of the
ZFS cache which throws out metadata cache in favour of data cache.. or
something..


2. Readahead or something is killing all signs of performance

Since there can be pretty many requests in the air at the same time,
we're having issues with readahead..
Some regular numbers are 7x13MB/s being read from disk according to
'iostat -xnzm 5' and 'zpool iostat -v 5', and maybe 5MB/s is being sent
back over the network.. This means that about 20x more is read from disk
than actually being used. When testing single streams, the readahead
helps and data isn't thrown away.. but when a bazilion nfs requests come
at once, too much is being read by zfs compared to what was actually
requested/being delivered.

I saw some stuff about zfs_prefetch_disable in current (unreleased)
code, will this help us perhaps? I've read about two layers of prefetch,
one per vdev and one per disk.. Since the current working set is about
1.2TB, there is 1GB memory in the server and file requests are mostly
one-shot, we'd like to disable as much readahead and data cache as
possible (since the chance of a positive data cache hit is very low)..
Keeping dnlc stuff in memory would help though.
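
(If we get onto bits that have that knob, I guess flipping it would look
roughly like the following -- just a guess on my part, not something I
can test on snv_48.)

    set zfs:zfs_prefetch_disable = 1            (in /etc/system + reboot)
    echo "zfs_prefetch_disable/W 1" | mdb -kw   (on a live kernel)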


Some URLs:

zfs_prefetch_disable being integrated:
http://dlc.sun.com/osol/on/downloads/current/on-changelog-20061103.html

zfs_prefetch_disable itself:
http://src.opensolaris.org/source/search?q=zfs_prefetch_disable&defs=&refs=&path=&hist=

Soft Track Buffer / Prefetch:
http://blogs.sun.com/roch/entry/the_dynamics_of_zfs

As far as I've been able to tell using mdb, this is already lowered in b48?
http://blogs.sun.com/roch/entry/tuning_the_knobs


Suggestions, ideas etc?


/Tomas
-- 
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48

2006-11-09 Thread Neil Perrin



Tomas Ögren wrote On 11/09/06 09:59,:


1. DNLC-through-ZFS doesn't seem to listen to ncsize.

The filesystem currently has ~550k inodes and large portions of it is
frequently looked over with rsync (over nfs). mdb said ncsize was about
68k and vmstat -s  said we had a hitrate of ~30%, so I set ncsize to
600k and rebooted.. Didn't seem to change much, still seeing hitrates at
about the same and manual find(1) doesn't seem to be that cached
(according to vmstat and dnlcsnoop.d).
When booting, the following message came up, not sure if it matters or not:
NOTICE: setting nrnode to max value of 351642
NOTICE: setting nrnode to max value of 235577

Is there a separate ZFS-DNLC knob to adjust for this? Wild guess is that
it has its own implementation which is integrated with the rest of the
ZFS cache which throws out metadata cache in favour of data cache.. or
something..


A more complete and useful set of dnlc statistics can be obtained via
kstat -n dnlcstats. As well as the soft limit on dnlc entries (ncsize),
the current number of cached entries is also useful:

echo ncsize/D | mdb -k
echo dnlc_nentries/D | mdb -k
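
For instance, to pull just the hit/miss counters and the current entry
count together (assuming the usual unix:0:dnlcstats kstat naming):

    kstat -p unix:0:dnlcstats:hits unix:0:dnlcstats:misses
    echo dnlc_nentries/D | mdb -k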

nfs does have a maximum number of rnodes which is calculated from the
memory available. It doesn't look like nrnode_max can be overridden.

Having said that I actually think your problem is lack of memory.
For each ZFS vnode held by the DNLC it uses a *lot* more memory
than say UFS. Consequently it has to purge dnlc entries and I
suspect with only 1GB that the ZFS ARC doesn't allow many dnlc entries.
I don't know if that number is maintained anywhere, for you to check.
Mark?

Neil.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48

2006-11-09 Thread eric kustarz

Neil Perrin wrote:



Tomas Ögren wrote On 11/09/06 09:59,:


1. DNLC-through-ZFS doesn't seem to listen to ncsize.

The filesystem currently has ~550k inodes and large portions of it is
frequently looked over with rsync (over nfs). mdb said ncsize was about
68k and vmstat -s  said we had a hitrate of ~30%, so I set ncsize to
600k and rebooted.. Didn't seem to change much, still seeing hitrates at
about the same and manual find(1) doesn't seem to be that cached
(according to vmstat and dnlcsnoop.d).
When booting, the following message came up, not sure if it matters or 
not:

NOTICE: setting nrnode to max value of 351642
NOTICE: setting nrnode to max value of 235577

Is there a separate ZFS-DNLC knob to adjust for this? Wild guess is that
it has its own implementation which is integrated with the rest of the
ZFS cache which throws out metadata cache in favour of data cache.. or
something..



A more complete and useful set of dnlc statistics can be obtained via
kstat -n dnlcstats. As well as the soft limit on dnlc entries (ncsize),
the current number of cached entries is also useful:

echo ncsize/D | mdb -k
echo dnlc_nentries/D | mdb -k

nfs does have a maximum number of rnodes which is calculated from the
memory available. It doesn't look like nrnode_max can be overridden.

Having said that I actually think your problem is lack of memory.
For each ZFS vnode held by the DNLC it uses a *lot* more memory
than say UFS. Consequently it has to purge dnlc entries and I
suspect with only 1GB that the ZFS ARC doesn't allow many dnlc entries.
I don't know if that number is maintained anywhere, for you to check.
Mark?

Neil.


If the ARC detects low memory (via arc_reclaim_needed()), then we call 
arc_kmem_reap_now() and subsequently dnlc_reduce_cache() - which reduces 
the # of dnlc entries by 3% (ARC_REDUCE_DNLC_PERCENT).


So yeah, dnlc_nentries would be really interesting to see (especially if 
it's < ncsize).
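
Something like this (a sketch; two separate echo | mdb invocations work
just as well) prints both side by side:

    echo "ncsize/D; dnlc_nentries/D" | mdb -k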


eric
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48

2006-11-09 Thread Tomas Ögren
On 09 November, 2006 - Neil Perrin sent me these 1,6K bytes:

 
 
 Tomas Ögren wrote On 11/09/06 09:59,:
 
 1. DNLC-through-ZFS doesn't seem to listen to ncsize.
 
 The filesystem currently has ~550k inodes and large portions of it is
 frequently looked over with rsync (over nfs). mdb said ncsize was about
 68k and vmstat -s  said we had a hitrate of ~30%, so I set ncsize to
 600k and rebooted.. Didn't seem to change much, still seeing hitrates at
 about the same and manual find(1) doesn't seem to be that cached
 (according to vmstat and dnlcsnoop.d).
 When booting, the following message came up, not sure if it matters or not:
 NOTICE: setting nrnode to max value of 351642
 NOTICE: setting nrnode to max value of 235577
 
 Is there a separate ZFS-DNLC knob to adjust for this? Wild guess is that
 it has its own implementation which is integrated with the rest of the
 ZFS cache which throws out metadata cache in favour of data cache.. or
 something..
 
 A more complete and useful set of dnlc statistic can be obtained via
 kstat -n dnlcstats. As well as soft the limit on dnlc entries (ncsize)
 the current number of cached entries is also useful:

This is after ~28h uptime:

module: unixinstance: 0
name:   dnlcstats   class:misc
crtime  47.5600948
dir_add_abort   0
dir_add_max 0
dir_add_no_memory   0
dir_cached_current  4
dir_cached_total107
dir_entries_cached_current  4321
dir_fini_purge  0
dir_hits11000
dir_misses  172814
dir_reclaim_any 25
dir_reclaim_last16
dir_remove_entry_fail   0
dir_remove_space_fail   0
dir_start_no_memory 0
dir_update_fail 0
double_enters   234918
enters  59193543
hits36690843
misses  59384436
negative_cache_hits 1366345
pick_free   0
pick_heuristic  57069023
pick_last   2035111
purge_all   1
purge_fs1   0
purge_total_entries 3748
purge_vfs   187
purge_vp95
snaptime99177.711093


vmstat -s:
 96080561 total name lookups (cache hits 38%)

 
 echo ncsize/D | mdb -k
 echo dnlc_nentries/D | mdb -k

ncsize: 60
dnlc_nentries:  19230

Not quite the same..

 nfs does have a maximum nmber of rnodes which is calculated from the
 memory available. It doesn't look like nrnode_max can be overridden.

rnode seems to take 472 bytes according to my test program.. which is a
bit more than the 64 bytes per dnlc entry in ncsize docs..
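
(A test program isn't strictly needed; the kmem caches show roughly the
same thing, e.g.:)

    echo ::kmem_cache | mdb -k | grep rnode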

 Having said that I actually think your problem is lack of memory.
 For each ZFS vnode held by the DNLC it uses a *lot* more memory
 than say UFS. Consequently it has to purge dnlc entries and I
 suspect with only 1GB that the ZFS ARC doesn't allow many dnlc entries.
 I don't know if that number is maintained anywhere, for you to check.
 Mark?

Current memory usage (for some values of usage ;):
# echo ::memstat|mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                      95584               746   75%
Anon                        20868               163   16%
Exec and libs                1703                13    1%
Page cache                   1007                 7    1%
Free (cachelist)               97                 0    0%
Free (freelist)              7745                60    6%

Total                      127004               992
Physical                   125192               978


/Tomas
-- 
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48

2006-11-09 Thread Brian Wong

eric kustarz wrote:


If the ARC detects low memory (via arc_reclaim_needed()), then we call 
arc_kmem_reap_now() and subsequently dnlc_reduce_cache() - which 
reduces the # of dnlc entries by 3% (ARC_REDUCE_DNLC_PERCENT).


So yeah, dnlc_nentries would be really interesting to see (especially 
if its  ncsize).
The version of statit that we're using is still attached to ancient 
32-bit counters that /are/ overflowing on our runs. I'm fixing this at 
the moment and I'll send around a new binary this afternoon.


blw
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48

2006-11-09 Thread Tomas Ögren
On 09 November, 2006 - Tomas Ögren sent me these 4,4K bytes:

 On 09 November, 2006 - Neil Perrin sent me these 1,6K bytes:
 
  nfs does have a maximum nmber of rnodes which is calculated from the
  memory available. It doesn't look like nrnode_max can be overridden.
 
 rnode seems to take 472 bytes according to my test program.. which is a
 bit more than the 64 bytes per dnlc entry in ncsize docs..

But wait a minute.. I'm not interested in being an NFS client.. this is
a server.. so wasting ~100MB on nfs client stuff that will never be used
isn't that great.. setting it to something really low and rebooting now..
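
(For reference, I'm doing that with an /etc/system entry along these
lines -- the module prefix and the value are my guess, so treat it as a
sketch:)

    set nfs:nrnode = 200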

/Tomas
-- 
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48

2006-11-09 Thread eric kustarz

Brian Wong wrote:

eric kustarz wrote:



If the ARC detects low memory (via arc_reclaim_needed()), then we call 
arc_kmem_reap_now() and subsequently dnlc_reduce_cache() - which 
reduces the # of dnlc entries by 3% (ARC_REDUCE_DNLC_PERCENT).


So yeah, dnlc_nentries would be really interesting to see (especially 
if its  ncsize).


The version of statit that we're using is still attached to ancient 
32-bit counters that /are/ overflowing on our runs. I'm fixing this at 
the moment and I'll send around a new binary this afternoon.


blw


Spencer and I just fixed some statit bugs (such as getting it to not 
core dump on a Thumper)... he has the changes, so I'd sync up with him (I'm 
not sure if they are the same bugs though).


eric

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48

2006-11-09 Thread Robert Milkowski
Hello Tomas,

Thursday, November 9, 2006, 9:47:17 PM, you wrote:

TÖ On 09 November, 2006 - Neil Perrin sent me these 1,6K bytes:


TÖ Current memory usage (for some values of usage ;):
TÖ # echo ::memstat|mdb -k
TÖ Page SummaryPagesMB  %Tot
TÖ      
TÖ Kernel  95584   746   75%
TÖ Anon20868   163   16%
TÖ Exec and libs1703131%
TÖ Page cache   1007 71%
TÖ Free (cachelist)   97 00%
TÖ Free (freelist)  7745606%

TÖ Total  127004   992
TÖ Physical   125192   978

Well, when I raised ncsize on an NFS server I got a memory pressure
problem. Leaving ncsize at the default solved the problem.



-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Some performance questions with ZFS/NFS/DNLC at snv_48

2006-11-09 Thread Neil Perrin



Tomas Ögren wrote On 11/09/06 13:47,:


On 09 November, 2006 - Neil Perrin sent me these 1,6K bytes:

 


Tomas Ögren wrote On 11/09/06 09:59,:

   


1. DNLC-through-ZFS doesn't seem to listen to ncsize.

The filesystem currently has ~550k inodes and large portions of it is
frequently looked over with rsync (over nfs). mdb said ncsize was about
68k and vmstat -s  said we had a hitrate of ~30%, so I set ncsize to
600k and rebooted.. Didn't seem to change much, still seeing hitrates at
about the same and manual find(1) doesn't seem to be that cached
(according to vmstat and dnlcsnoop.d).
When booting, the following message came up, not sure if it matters or not:
NOTICE: setting nrnode to max value of 351642
NOTICE: setting nrnode to max value of 235577

Is there a separate ZFS-DNLC knob to adjust for this? Wild guess is that
it has its own implementation which is integrated with the rest of the
ZFS cache which throws out metadata cache in favour of data cache.. or
something..
 


A more complete and useful set of dnlc statistic can be obtained via
kstat -n dnlcstats. As well as soft the limit on dnlc entries (ncsize)
the current number of cached entries is also useful:
   



This is after ~28h uptime:

module: unixinstance: 0
name:   dnlcstats   class:misc
   crtime  47.5600948
   dir_add_abort   0
   dir_add_max 0
   dir_add_no_memory   0
   dir_cached_current  4
   dir_cached_total107
   dir_entries_cached_current  4321
   dir_fini_purge  0
   dir_hits11000
   dir_misses  172814
   dir_reclaim_any 25
   dir_reclaim_last16
   dir_remove_entry_fail   0
   dir_remove_space_fail   0
   dir_start_no_memory 0
   dir_update_fail 0
   double_enters   234918
   enters  59193543
   hits36690843
   misses  59384436
   negative_cache_hits 1366345
   pick_free   0
   pick_heuristic  57069023
   pick_last   2035111
   purge_all   1
   purge_fs1   0
   purge_total_entries 3748
   purge_vfs   187
   purge_vp95
   snaptime99177.711093


vmstat -s:
96080561 total name lookups (cache hits 38%)

 


echo ncsize/D | mdb -k
echo dnlc_nentries/D | mdb -k
   



ncsize: 60
dnlc_nentries:  19230

Not quite the same..

 


Having said that I actually think your problem is lack of memory.
For each ZFS vnode held by the DNLC it uses a *lot* more memory
than say UFS. Consequently it has to purge dnlc entries and I
suspect with only 1GB that the ZFS ARC doesn't allow many dnlc entries.
I don't know if that number is maintained anywhere, for you to check.
Mark?
   



Current memory usage (for some values of usage ;):
# echo ::memstat|mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                      95584               746   75%
Anon                        20868               163   16%
Exec and libs                1703                13    1%
Page cache                   1007                 7    1%
Free (cachelist)               97                 0    0%
Free (freelist)              7745                60    6%

Total                      127004               992
Physical                   125192               978


/Tomas
 


This memory usage shows nearly all of memory consumed by the kernel
and probably by ZFS. ZFS can't add any more DNLC entries due to lack of
memory without purging others. This can be seen from the number of
dnlc_nentries being way less than ncsize.
I don't know if there's a DMU or ARC bug to reduce the memory footprint
of their internal structures for situations like this, but we are aware 
of the issue.

Neil.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss