Re: [zfs-discuss] zfs fragmentation

2009-08-08 Thread Ed Spencer

On Fri, 2009-08-07 at 19:33, Richard Elling wrote:

 This is very unlikely to be a fragmentation problem. It is a  
 scalability problem
 and there may be something you can do about it in the short term.

You could be right.

Our test mail server uses the exact same design and hardware (SUN4V),
just in a smaller configuration (less memory and 4 x 25g san luns), and
it has a backup/copy throughput of 30GB/hour. Data used for testing
was copied from our production mail server.

  Adding another pool and copying all/some data over to it would only
  be a short term solution.
 
 I'll have to disagree.

What is the point of a filesystem that can grow to such a huge size and
not have functionality built in to optimize data layout?  Real world
implementations of filesystems that are intended to live for
years/decades need this functionality, don't they?

Our mail system works well; only the backup doesn't perform well.
All the features of ZFS that make reads perform well (prefetch, ARC)
have little effect.
 
We think backup is quite important. We do quite a few restores of months
old data. Snapshots help in the short term, but for longer term restores
we need to go to tape. 

Of course, as you can tell, I'm kinda stuck on this idea that file and
directory fragmentation is causing our issues with the backup. I don't
know how to analyze the pool to better understand the problem.

If we did chop the pool up into, let's say, 7 pools (one for each current
filesystem), then over time these 7 pools would grow and we would end up
with the same issues. That's why it seems to me to be a short term
solution.

If our issues with zfs are scalability issues, then you could say zfs is
not scalable. Is that true?
(It certainly is if the solution is to create more pools!)

-- 
Ed 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs fragmentation

2009-08-08 Thread Mattias Pantzare
  Adding another pool and copying all/some data over to it would only
  be a short term solution.

 I'll have to disagree.

 What is the point of a filesystem that can grow to such a huge size and
 not have functionality built in to optimize data layout?  Real world
 implementations of filesystems that are intended to live for
 years/decades need this functionality, don't they?

 Our mail system works well, only the backup doesn't perform well.
 All the features of ZFS that make reads perform well (prefetch, ARC)
 have little effect.

 We think backup is quite important. We do quite a few restores of months
 old data. Snapshots help in the short term, but for longer term restores
 we need to go to tape.

Your scalability problem may be in your backup solution.

The problem is not how many GB of data you have but the number of files.

It has been a while since I worked with Networker, so things may have changed.

If you are doing backups directly to tape you may have a buffering
problem. By simply staging backups on disk we got a lot faster
backups.

Have you configured Networker to do several simultaneous backups from
your pool? You can do that by having several zfs filesystems on the
same pool, or by telling Networker to do backups one directory level
down so that it thinks you have more file systems. And don't forget to
play with the parallelism settings in Networker.
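The "one directory level down" idea can be sketched with plain tar streams
to a staging disk; the directory names below are made up for the demo, and
real Networker savesets would take the place of the tar commands:

```shell
# Demo: one backup stream per top-level directory, all running concurrently.
# SRC stands in for the mail filesystem, STAGE for the backup staging disk.
SRC=$(mktemp -d); STAGE=$(mktemp -d)
mkdir -p "$SRC/user.a" "$SRC/user.b" "$SRC/user.c"
touch "$SRC/user.a/msg1" "$SRC/user.b/msg2" "$SRC/user.c/msg3"

for d in "$SRC"/*/; do
  name=$(basename "$d")
  tar cf "$STAGE/$name.tar" -C "$SRC" "$name" &   # one stream per directory
done
wait
ls "$STAGE"
```

With many small files, total throughput comes from the stream count, not
from any one stream going faster.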

This made a huge difference for us on VxFS.


Re: [zfs-discuss] zfs fragmentation

2009-08-08 Thread Bob Friesenhahn

On Sat, 8 Aug 2009, Ed Spencer wrote:


What is the point of a filesystem that can grow to such a huge size and
not have functionality built in to optimize data layout?  Real world
implementations of filesystems that are intended to live for
years/decades need this functionality, don't they?


Enterprise storage should work fine without needing to run a tool to 
optimize data layout or repair the filesystem.  Well designed software 
uses an approach which does not unravel through use.



Our mail system works well, only the backup doesn't perform well.
All the features of ZFS that make reads perform well (prefetch, ARC)
have little effect.


It is already known that ZFS prefetch is often not aggressive enough 
for bulk reads, and sometimes gets lost entirely.  I think that is the 
first issue to resolve in order to get your backups going faster.


Many of us here already tested our own systems and found that under 
some conditions ZFS was offering up only 30MB/second for bulk data 
reads regardless of how exotic our storage pool and hardware was.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] zfs fragmentation

2009-08-08 Thread Ed Spencer

On Sat, 2009-08-08 at 09:17, Bob Friesenhahn wrote:
 Many of us here already tested our own systems and found that under 
 some conditions ZFS was offering up only 30MB/second for bulk data 
 reads regardless of how exotic our storage pool and hardware was.

Just so we are using the same units of measurement: backup/copy
throughput on our development mail server is 8.5MB/sec. The people
running our backups would be overjoyed with that performance.

However, backup/copy throughput on our production mail server is 2.25
MB/sec.

The underlying disks are 15000 RPM 146GB FC drives.
Our performance may be hampered somewhat because the luns are on a
Network Appliance accessed via iSCSI, but not to the extent that we are
seeing, and it does not account for the throughput difference between
the development and production pools.

When I talk about fragmentation it's not in the normal sense. I'm not
talking about blocks in a file not being sequential. I'm talking about
files in a single directory that end up spread across the entire
filesystem/pool.

My problem right now is diagnosing the performance issues. I can't
address them without understanding the underlying cause. There is a
lack of tools to help in this area. There is also a lack of acceptance
that I'm actually having a problem with zfs. It's frustrating.

Does anyone know how to significantly increase the performance of a zfs
filesystem without causing any downtime to an Enterprise email system
used by 30,000 intolerant people, when you don't really know what is
causing the performance issues in the first place? (Yeah, it sucks to be
me!)

-- 
Ed 




Re: [zfs-discuss] zfs fragmentation

2009-08-08 Thread Ed Spencer

On Sat, 2009-08-08 at 08:14, Mattias Pantzare wrote:

 Your scalability problem may be in your backup solution.
We've eliminated the backup system as being involved with the
performance issues. 

The servers are Solaris 10 with the OS on UFS filesystems. (In zfs
terms, the pool is old/mature). Solaris has been patched to a fairly
current level.  

Copying data from the zfs filesystem to the local ufs filesystem shows
the same throughput as the backup system.

The test was simple. Create a test filesystem on the zfs pool. Restore
production email data to it. Reboot the server. Back up the data (29
minutes for 15.8 GB of data). Reboot the server. Copy the data from zfs
to ufs using a 'cp -pr ...' command, which also took 29 minutes.

And if anyone is interested, it only took 15 minutes to restore (write)
the 15.8GB of data over the network.

-- 
Ed 




Re: [zfs-discuss] zfs fragmentation

2009-08-08 Thread Mike Gerdts
On Sat, Aug 8, 2009 at 3:02 PM, Ed Spencer ed_spen...@umanitoba.ca wrote:

 On Sat, 2009-08-08 at 09:17, Bob Friesenhahn wrote:

 Enterprise storage should work fine without needing to run a tool to
 optimize data layout or repair the filesystem.  Well designed software
 uses an approach which does not unravel through use.

 Hmmm, this is counter to my understanding. I always thought that to
 optimize sequential read performance you must store the data according
 to how the device will read the data.

 Spinning rust reads data in a sequential fashion. In order to optimize
 read performance it has to be laid down that way.

 When reading files in a directory, the files need to be laid out on the
 physical device sequentially for optimal read performance.

 I'm probably not the person to argue this point though... Is there a
 DBA around?

The DBA's that I know use files that are at least hundreds of
megabytes in size.  Your problem is very different.

 Maybe my problems will go away once we move into the next generation of
 storage devices, SSD's! I'm starting to think that ZFS will really shine
 on SSD's.

Your problem seems to be related to cold reads in a pretty large data
set.  With SSD's (l2arc) you are likely to see a performance boost for
a larger set of recently read files, but my guess is that backups will
still be pretty slow.  There is likely more benefit in restore speed
with SSD's than there is in read speeds.  However, the NVRAM on the
NetApp that is backing your iSCSI LUNs is probably already giving you
most of this benefit (assuming low latency on network connections).

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] zfs fragmentation

2009-08-08 Thread Ed Spencer

On Sat, 2009-08-08 at 15:12, Mike Gerdts wrote:

 The DBA's that I know use files that are at least hundreds of
 megabytes in size.  Your problem is very different.
Yes, definitely. 

I'm relating records in a table to my small files because our email
system treats the filesystem as a database.

And in the back of my mind I'm also thinking that you have to
rebuild/repair the database once in a while to improve performance.

And in my case, since the filesystem is the database, I want to do that
to zfs! 

At least that's what I'm thinking, however, and I always come back to
this, I'm not certain what is causing my problem. I need certainty
before taking action on the production system.

-- 
Ed 




Re: [zfs-discuss] zfs fragmentation

2009-08-08 Thread Mike Gerdts
On Sat, Aug 8, 2009 at 3:25 PM, Ed Spencer ed_spen...@umanitoba.ca wrote:

 On Sat, 2009-08-08 at 15:12, Mike Gerdts wrote:

 The DBA's that I know use files that are at least hundreds of
 megabytes in size.  Your problem is very different.
 Yes, definitely.

 I'm relating records in a table to my small files because our email
 system treats the filesystem as a database.

Right... but ZFS doesn't understand your application.  The reason that
a file system would put files that are in the same directory in the
same general area on a disk is to minimize seek time.  I would argue
that seek time doesn't matter a whole lot here - at least from the
vantage point of ZFS.  The LUNs that you have presented from the filer
are probably RAID6 across many disks.  ZFS seems to be doing a 4-way
stripe (or are you mirroring or raidz?).  Assuming you are doing
something like a 7+2 RAID6 on the back end, the contents would be
spread across 36 drives.[1]  The trick to making this perform well is
to have 36 * N worker threads.  Mail is a great thing to keep those
spindles kinda busy while getting decent performance.  A small number
of sequential readers - particularly with small files where you can't
do a reasonable job with read-ahead - has little chance of keeping
that number of drives busy.

1. Or you might have 4 LUNs presented from one 4+1 RAID5 in which you
may be forcing more head movement because ZFS thinks it can speed
things up by striping data across the LUNs.

ZFS can recognize a database (or other application) doing a sequential
read on a large file.  While data located sequentially on disk can be
helpful for reads, this is much less important when the pool sits
across tens of disks.  This is because it has the ability to spread
the iops across lots of disks, potentially reading a heavily
fragmented file much faster than a purely sequential file.

In either case, your backup application is competing for iops (and
seeks) with other workload.  With the NetApp backend there are likely
other applications on the same aggregate that are forcing head
movement away from any data belonging to these LUNs.

 And in the back of my mind I'm also thinking that you have to
 rebuild/repair the database once in a while to improve performance.

Certainly.  Databases become fragmented and are reorganized to fix this.

 And in my case, since the filesystem is the database, I want to do that
 to zfs!

 At least that's what I'm thinking, however, and I always come back to
 this, I'm not certain what is causing my problem. I need certainty
 before taking action on the production system.

Most databases are written in such a way that they can be optimized
for sequential reads (table scans) and for backups, whether on raw
disk or on a file system.  The more advanced the database is, the more
likely it is to ask the file system to get out of its way and *not* do
anything fancy.

It seems that cyrus was optimized for operations that make sense for a
mail program (deliver messages, retrieve messages, delete messages)
and nothing else.  I would argue that any application that creates
lots of tiny files is not optimized for backing up using a small
number of streams.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] zfs fragmentation

2009-08-08 Thread Ed Spencer

On Sat, 2009-08-08 at 15:20, Bob Friesenhahn wrote:

 A SSD slog backed by a SAS 15K JBOD array should perform much better 
 than a big iSCSI LUN.

Now... yes. We implemented this pool years ago. I believe that, at the
time, the server would crash if a zfs drive failed. We decided to let
the netapp handle the disk redundancy. It's worked out well.

I've looked at those really nice Sun products adoringly. And a 7000
series appliance would also be a nice addition to our central NFS
service. Not to mention more cost effective than expanding our Network
Appliance (We have researchers who are quite hungry for storage and NFS
is always our first choice).

We now have quite an investment in the current implementation. It's
difficult to move away from. The netapp is quite a reliable product.

We are quite happy with zfs and our implementation. We just need to
address our backup performance and improve it just a little bit!

We were almost lynched this spring because we encountered some pretty
severe zfs bugs. We are still running the IDR named "A wad of ZFS bug
fixes for Solaris 10 Update 6". It took over a month to resolve the
issues.

I work at a University, and final exams and year end occur at the same
time. I don't recommend having email problems during this time! People
are intolerant of email problems.

I live in hope that a Netapp OS update, or a Solaris patch, or a zfs
patch, or an iscsi patch, or something will come along that improves
our performance just a bit so our backup people get off my back!

-- 
Ed 




Re: [zfs-discuss] zfs fragmentation

2009-08-08 Thread Ed Spencer

On Sat, 2009-08-08 at 15:05, Mike Gerdts wrote:
 On Sat, Aug 8, 2009 at 12:51 PM, Ed Spencer ed_spen...@umanitoba.ca wrote:
 
  On Sat, 2009-08-08 at 09:17, Bob Friesenhahn wrote:
  Many of us here already tested our own systems and found that under
  some conditions ZFS was offering up only 30MB/second for bulk data
  reads regardless of how exotic our storage pool and hardware was.
 
  Just so we are using the same units of measurements. Backup/copy
  throughput on our development mail server is 8.5MB/sec. The people
  running our backups would be over joyed with that performance.
 
  However backup/copy throughput on our production mail server is 2.25
  MB/sec.
 
  The underlying disk is 15000 RPM 146GB FC drives.
  Our performance may be hampered somewhat because the luns are on a
  Network Appliance accessed via iSCSI, but not to the extent that we are
  seeing, and it does not account for the throughput difference in the
  development and production pools.
 
 NetApp filers run WAFL - Write Anywhere File Layout.  Even if ZFS
 arranged everything perfectly (however that is defined) WAFL would
 undo its hard work.
 
 Since you are using iSCSI, I assume that you have disabled the Nagle
 algorithm and increased  tcp_xmit_hiwat and tcp_recv_hiwat.  If not,
 go do that now.
We've tried many different iscsi parameter changes on our development server:
Jumbo Frames
Disabling the Nagle algorithm
I'll double check next week on tcp_xmit_hiwat and tcp_recv_hiwat.

Nothing has made any real difference. 
We are only using about 5% of the bandwidth on our IP SAN.

We use two Cisco ethernet switches on the IP SAN. The iscsi initiators
use MPXIO in a round robin configuration.

  When I talk about fragmentation its not in the normal sense. I'm not
  talking about blocks in a file not being sequential. I'm talking about
  files in a single directory that end up spread across the entire
  filesytem/pool.
 
 It's tempting to think that if the files were in roughly the same area
 of the block device that ZFS sees that reading the files sequentially
 would at least trigger a read-ahead at the filer.  I suspect that even
 a moderate amount of file creation and deletion would cause the I/O
 pattern to be random enough (not purely sequential) that the back-end
 storage would not have a reasonable chance of recognizing it as a good
 time for read-ahead.  Further, since the backup application is
 probably in a loop of:
 
 while there are more files in the directory
     if next file mtime > last backup time
         open file
         read file contents, send to backup stream
         close file
     end if
 end while
 
 In other words, other I/O operations are interspersed between the
 sequential data reads, some files are likely to be skipped, and there
 is latency introduced by writing to the data stream.  I would be
 surprised to see any file system do intelligent read-ahead here.  In
 other words, lots of small file operations make backups and especially
 restores go slowly.  More backup and restore streams will almost
 certainly help.  Multiplex the streams so that you can keep your tapes
 moving at a constant speed.

We backup to disk first and then put to tape later.

 Do you have statistics on network utilization to ensure that you
 aren't stressing it?
 
 Have you looked at iostat data to be sure that you are seeing asvc_t +
 wsvc_t that supports the number of operations that you need to
 perform?  That is if asvc_t + wsvc_t for a device adds up to 10 ms, a
 workload that waits for the completion of one I/O before issuing the
 next will max out at 100 iops.  Presumably ZFS should hide some of
 this from you[1], but it does suggest that each backup stream would be
 limited to about 100 files per second[2].  This is because the read
 request for one file does not happen before the close of the previous
 file[3].  Since cyrus stores each message as a separate file, this
 suggests that 2.5 MB/s corresponds to average mail message size of 25
 KB.
 
 1. via metadata caching, read-ahead on file data reads, etc.
 2. Assuming wsvc_t + asvc_t = 10 ms
 3. Assuming that networker is about as smart as tar, zip, cpio, etc.
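The arithmetic in that paragraph can be checked quickly (10 ms per
serialized read and 2.5 MB/s are the figures assumed in the thread):

```shell
# ~10 ms (wsvc_t + asvc_t) per serialized read -> files/sec per stream
echo $((1000 / 10))

# at 2.5 MB/s and ~100 files/sec, the implied average message size in KB
awk 'BEGIN { printf "%.1f KB\n", 2.5 * 1024 / 100 }'
```

Which lands at roughly the 25 KB average message size quoted above.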

There is a backup of a single filesystem in the pool going on right now:
# zpool iostat 5 5
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
space       1.05T   965G     97     69  5.24M  2.71M
space       1.05T   965G    113     10  6.41M   996K
space       1.05T   965G    100    112  2.87M  1.81M
space       1.05T   965G    112      8  2.35M  35.9K
space       1.05T   965G    106      3  1.76M  55.1K

Here are examples:
iostat -xpn 5 5
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   17.1   29.2  746.7  317.1  0.0  0.6    0.0   12.5   0  27
c4t60A98000433469764E4A2D456A644A74d0
   25.0   11.9  991.9  277.0  0.0  0.6    0.0   16.1   0  36

Re: [zfs-discuss] Fwd: zfs fragmentation

2009-08-08 Thread Ed Spencer

On Sat, 2009-08-08 at 17:25, Mike Gerdts wrote:

 ndd -get /dev/tcp tcp_xmit_hiwat
 ndd -get /dev/tcp tcp_recv_hiwat
 grep tcp-nodelay /kernel/drv/iscsi.conf
# ndd -get /dev/tcp tcp_xmit_hiwat
2097152
# ndd -get /dev/tcp tcp_recv_hiwat
2097152
# grep tcp-nodelay /kernel/drv/iscsi.conf
#

 While backups are running (which is probably all the time given the
 backup rate)
 
 # look at service times
 iostat -xzn 10
Oh crap. Looks like there are no backup jobs running right now. It must
have just ended.
 # is networker cpu bound?
No. The server is barely tasked by either the email system or networker.
 prstat -mL
 Some indication of how many backup jobs run concurrently would
 probably help frame any future discussion.
I'll get more info on the backups next week when the full backups run.
 
-- 
Ed 




Re: [zfs-discuss] zfs fragmentation

2009-08-08 Thread Richard Elling


On Aug 8, 2009, at 5:02 AM, Ed Spencer wrote:



On Fri, 2009-08-07 at 19:33, Richard Elling wrote:


This is very unlikely to be a fragmentation problem. It is a
scalability problem
and there may be something you can do about it in the short term.


You could be right.

Our test mail server uses the exact same design and hardware (SUN4V),
just in a smaller configuration (less memory and 4 x 25g san luns), and
it has a backup/copy throughput of 30GB/hour. Data used for testing
was copied from our production mail server.


Adding another pool and copying all/some data over to it would only
be a short term solution.


I'll have to disagree.


What is the point of a filesystem that can grow to such a huge size and
not have functionality built in to optimize data layout?  Real world
implementations of filesystems that are intended to live for
years/decades need this functionality, don't they?

Our mail system works well, only the backup doesn't perform well.
All the features of ZFS that make reads perform well (prefetch, ARC)
have little effect.


The best workload is one that doesn't read from disk to begin with :-)
For workloads with millions of files (eg large-scale mail servers) you
will need to increase the size of the Directory Name Lookup Cache
(DNLC). By default, it is way too small for such workloads. If the
directory names are in cache, then they do not have to be read from
disk -- a big win.

You can see how well the DNLC is working by looking at the output of
"vmstat -s" and looking for the total name lookups. You can size the
DNLC by tuning the ncsize parameter, but it requires a reboot.  See the
Solaris Tunable Parameters Guide for details.
http://docs.sun.com/app/docs/doc/817-0404/chapter2-35?a=view
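As a compact reference, those checks look like the fragment below. It is
Solaris-only, so it is shown commented out; the ncsize value is a made-up
example, not a recommendation:

```shell
# Solaris-only fragment (shown, not run here):
#
#   vmstat -s | grep 'name lookups'    # DNLC hit rate appears in this line
#
# If the hit rate is low for a many-small-files workload, raise ncsize in
# /etc/system and reboot (example value only):
#
#   set ncsize = 1048576
```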

I'd like to revisit the backup problem, but that is much more
complicated and probably won't fit in a mail thread very easily (hence,
the white paper :-)
 -- richard



Re: [zfs-discuss] zfs fragmentation

2009-08-08 Thread Bob Friesenhahn

On Sat, 8 Aug 2009, Ed Spencer wrote:

    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   11.9   43.0  528.9 1972.8  0.0  2.1    0.0   38.9   0  31
c4t60A98000433469764E4A2D456A644A74d0
   17.0   19.6  496.9 1499.0  0.0  1.4    0.0   38.8   0  39
c4t60A98000433469764E4A2D456A696579d0
   14.0   30.0  670.2 1971.3  0.0  1.7    0.0   38.0   0  34
c4t60A98000433469764E4A476D2F664E4Fd0
   19.7   28.7  985.2 1647.6  0.0  1.6    0.0   32.5   0  37
c4t60A98000433469764E4A476D2F6B385Ad0


I have this in my /etc/system file:

* Set device I/O maximum concurrency
* 
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29

set zfs:zfs_vdev_max_pending = 5

This parameter may be worthwhile to look at to reduce your asvc_t. 
It seems that the default (35) is tuned for a true JBOD setup and not 
a SAN-hosted LUN.


As I recall, you can use the kernel debugger to set it while the 
system is running and immediately see differences in iostat output.
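The live change Bob mentions is typically done with mdb. The fragment
below follows the pattern in the ZFS Evil Tuning Guide he links; treat it
as a sketch, shown commented out since it only applies on Solaris:

```shell
# Solaris fragment (shown, not run here); W0t5 writes the decimal value 5:
#
#   echo 'zfs_vdev_max_pending/W0t5' | mdb -kw    # set the value live
#   echo 'zfs_vdev_max_pending/D'    | mdb -k     # read the current value
#   iostat -xzn 10                                # watch asvc_t for changes
```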


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] zfs fragmentation

2009-08-08 Thread Mattias Pantzare
On Sat, Aug 8, 2009 at 20:20, Ed Spencer ed_spen...@umanitoba.ca wrote:

 On Sat, 2009-08-08 at 08:14, Mattias Pantzare wrote:

 Your scalability problem may be in your backup solution.
 We've eliminated the backup system as being involved with the
 performance issues.

 The servers are Solaris 10 with the OS on UFS filesystems. (In zfs
 terms, the pool is old/mature). Solaris has been patched to a fairly
 current level.

 Copying data from the zfs filesystem to the local ufs filesystem enjoys
 the same throughput as the backup system.

 The test was simple. Create a test filesystem on the zfs pool. Restore
 production email data to it. Reboot the server. Backup the data (29
 minutes for a 15.8 gig of data). Reboot the server. Copy data from zfs
 to ufs using a 'cp -pr ...' command, which also took 29 minutes.

Yes, that was expected. What happens if you run two cp -pr commands at
the same time? I am guessing that two cps will take almost the same
time as one.

If you get twice the performance from two cps, then you will get twice
the performance from doing two backups in parallel.
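A throwaway version of that experiment; the mktemp paths are stand-ins
for the real zfs source and ufs target directories:

```shell
# SRC1/SRC2 stand in for two mail directories on the zfs pool, DST for
# the local ufs filesystem.
SRC1=$(mktemp -d); SRC2=$(mktemp -d); DST=$(mktemp -d)
touch "$SRC1/msg" "$SRC2/msg"

# single stream (prefix with `time` on the real system to measure)
cp -pr "$SRC1" "$DST/one"

# two concurrent streams (time this the same way and compare)
cp -pr "$SRC1" "$DST/a" & cp -pr "$SRC2" "$DST/b" & wait
```

If the two-stream run finishes in about the same wall-clock time as the
single-stream run, the workload is latency-bound per stream and more
parallel backup streams should scale throughput.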


Re: [zfs-discuss] zfs-discuss Digest, Vol 46, Issue 50

2009-08-08 Thread Allen Eastwood
Does DNLC even play a part in ZFS, or are the docs out of date?

"Defines the number of entries in the directory name look-up cache
(DNLC). This parameter is used by UFS and NFS to cache elements of path
names that have been resolved."

No mention of ZFS. I noticed that when discussing it with a customer of
mine.

The best workload is one that doesn't read from disk to begin with :-)
 For workloads with millions of files (eg large-scale mail servers) you
 will need to increase the size of the Directory Name Lookup Cache
 (DNLC). By default, it is way too small for such workloads. If the
 directory names are in cache, then they do not have to be read from
 disk -- a big win.

 You can see how well the DNLC is working by looking at the output of
 vmstat -s and look for the total name lookups. You can size DNLC
 by tuning the ncsize parameter, but it requires a reboot.  See the
 Solaris Tunable Parameters Guide for details.
 http://docs.sun.com/app/docs/doc/817-0404/chapter2-35?a=view



Re: [zfs-discuss] zfs-discuss Digest, Vol 46, Issue 50

2009-08-08 Thread Tim Haley

Allen Eastwood wrote:

Does DNLC even play a part in ZFS, or are the Docs out of date?

Defines the number of entries in the directory name look-up cache 
(DNLC). This parameter is used by UFS and NFS to cache elements of path 
names that have been resolved.


No mention of ZFS.  Noticed that when discussing that with a customer of 
mine.



The best workload is one that doesn't read from disk to begin with :-)
For workloads with millions of files (eg large-scale mail servers) you
will need to increase the size of the Directory Name Lookup Cache
(DNLC). By default, it is way too small for such workloads. If the
directory names are in cache, then they do not have to be read from
disk -- a big win.

You can see how well the DNLC is working by looking at the output of
vmstat -s and look for the total name lookups. You can size DNLC
by tuning the ncsize parameter, but it requires a reboot.  See the
Solaris Tunable Parameters Guide for details.
http://docs.sun.com/app/docs/doc/817-0404/chapter2-35?a=view

Yes, ZFS uses the DNLC as well.

-tim