Re: [zfs-discuss] slow ls or slow zfs
> On Mon, 29 Jun 2009, NightBird wrote:
> >
> > I checked the output of iostat. svc_t is between 5 and 50, depending
> > on when data is flushed to the disk (CIFS write pattern). %b is
> > between 10 and 50. %w is always 0.
> > Example:
> > device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
> > sd27     31.5  127.0  935.9  616.7  0.0 11.9   75.2   0  66
> > sd28      5.0    0.0  320.0    0.0  0.0  0.1   18.0   0   9
> >
> > This tells me disks are busy but I do not know what they are doing?
> > Are they spending time seeking, writing or reading?
>
> It looks like your sd27 is being pounded with write iops. It is close
> to its limit.
>
> Can you post complete iostat output? Since you have so many disks,
> (which may not always be involved in the same stripe) you may need to
> have iostat average over a long period of time such as 30 or 60
> seconds in order to see a less responsive disk. Disks could be less
> responsive for many reasons, including vibrations in their operating
> environment.
>
> Also see Jeff Bonwick's "diskqual.sh" as described at
> http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg15384.html
> which is good at helping to find pokey disks. A slightly modified
> version is included below.
>
> Bob

I will run the script when the server is idle as recommended and report back.
Here is the full iostat output (30sec). c9t40d0 seems to have a consistently
higher svc_t time.

    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    2.6    2.1   11.0  0.0  0.0    0.0    6.5   0   1 c8t0d0
    0.1    2.6    0.4   11.0  0.0  0.0    0.0    5.1   0   0 c8t1d0
    6.0    6.2  380.1  147.8  0.0  0.3    0.0   23.6   0  13 c9t8d0
    6.2    6.5  390.8  147.9  0.0  0.3    0.0   21.9   0  13 c9t9d0
    6.3    6.2  386.1  147.6  0.0  0.3    0.0   26.7   0  12 c9t10d0
    6.7    6.2  413.5  147.8  0.0  0.3    0.0   21.9   0  14 c9t11d0
    6.1    5.7  371.1  147.6  0.0  0.3    0.0   21.2   0  11 c9t12d0
    6.7    5.9  407.3  147.6  0.0  0.3    0.0   21.4   0  13 c9t13d0
    5.7    6.3  347.6  147.7  0.0  0.3    0.0   22.4   0  12 c9t14d0
    7.0    5.9  426.5  147.5  0.0  0.3    0.0   20.6   0  13 c9t15d0
    6.6    6.1  405.0  147.6  0.0  0.3    0.0   21.1   0  12 c9t16d0
    6.6    6.2  405.2  147.7  0.0  0.3    0.0   21.1   0  12 c9t17d0
    7.1    6.3  432.9  147.8  0.0  0.3    0.0   20.9   0  14 c9t18d0
    6.7    6.5  411.6  147.9  0.0  0.3    0.0   23.6   0  13 c9t19d0
    6.4    6.4  390.3  148.1  0.0  0.3    0.0   21.7   0  13 c9t20d0
    6.9    6.9  424.4  147.9  0.0  0.3    0.0   19.8   0  13 c9t21d0
    6.2    6.9  375.3  148.1  0.0  0.3    0.0   20.2   0  12 c9t22d0
    5.7    6.8  349.5  147.9  0.0  0.3    0.0   20.9   0  12 c9t23d0
    6.2    6.6  377.5  147.9  0.0  0.3    0.0   20.6   0  11 c9t24d0
    5.4    6.7  328.2  147.9  0.0  0.2    0.0   20.7   0  11 c9t25d0
    6.7    6.7  407.3  148.0  0.0  0.3    0.0   19.8   0  12 c9t26d0
    6.5    6.9  396.7  148.1  0.0  0.3    0.0   20.4   0  13 c9t27d0
    6.4    6.6  390.4  147.9  0.0  0.3    0.0   21.3   0  13 c9t28d0
    6.8    6.3  416.0  147.6  0.0  0.4    0.0   26.8   0  13 c9t29d0
    6.8    6.3  413.9  147.8  0.0  0.3    0.0   23.5   0  13 c9t30d0
    7.5   33.5  446.9  312.0  0.0  1.8    0.0   45.0   0  18 c9t31d0
    8.2   33.6  491.7  312.0  0.0  2.1    0.0   51.1   0  21 c9t32d0
    7.0   34.3  414.9  312.3  0.0  1.9    0.0   47.0   0  20 c9t33d0
    7.6   34.1  463.4  312.2  0.0  2.1    0.0   51.2   0  21 c9t34d0
    7.9   33.5  474.4  312.0  0.0  2.2    0.0   52.9   0  21 c9t35d0
    8.2   33.2  496.0  311.7  0.0  2.4    0.0   59.1   0  23 c9t36d0
    8.0   33.2  481.0  311.9  0.0  2.0    0.0   48.8   0  21 c9t37d0
    7.8   33.4  469.9  311.9  0.0  2.3    0.0   56.4   0  20 c9t38d0
    8.5   34.1  518.7  312.4  0.0  2.3    0.0   54.3   0  22 c9t39d0
    8.4   32.9  510.5  311.8  0.0  2.9    0.0   70.6   0  27 c9t40d0
    8.2   34.3  501.5  312.4  0.0  2.3    0.0   55.1   0  24 c9t41d0
    8.1   34.3  491.1  312.5  0.0  2.3    0.0   55.4   0  21 c9t42d0
    8.5   34.3  510.9  312.7  0.0  2.3    0.0   53.3   0  23 c9t43d0
    7.5   34.3  453.1  312.6  0.0  2.3    0.0   54.4   0  20 c9t44d0
    7.0   33.7  420.9  312.1  0.0  2.3    0.0   55.7   0  19 c9t45d0
    7.0   34.2  420.9  312.4  0.0  2.3    0.0   55.2   0  20 c9t46d0
    7.9   35.1  474.6  312.5  0.0  2.1    0.0   49.1   0  22 c9t47d0
    8.1   35.0  487.4  312.8  0.0  2.3    0.0   52.8   0  22 c9t48d0
    8.1   34.2  491.3  312.2  0.0  2.1    0.0   50.2   0  20 c9t49d0
    7.2   34.6  429.4  312.5  0.0  2.1    0.0   51.3   0  20 c9t50d0
    7.6   35.2  459.3  312.6  0.0  2.3    0.0   54.1   0  21 c9t51d0
    7.7   35.0  463.5  312.6  0.0  2.1    0.0   49.2   0  21 c9t52d0
    7.4   35.1  442.6  312.8  0.0  2.1    0.0   48.8
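As a rough illustration of the comparison being made by eye in the listing above, the following sketch prints the devices whose asvc_t stands out from their peers. It is not from the thread: the function name slow_disks and the threshold factor are assumptions, and it expects whitespace-separated rows in the 11-column layout shown (asvc_t in column 8, device in column 11). Rows from different pools should be fed separately, since the two groups of disks above carry different loads.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print devices whose asvc_t
# exceeds FACTOR times the mean asvc_t of the rows supplied on stdin.
# FACTOR defaults to 1.25, which is an arbitrary choice.
slow_disks() {
    awk -v factor="${1:-1.25}" '
        # Keep only data rows; the header has text in column 8.
        NF == 11 && $8 ~ /^[0-9.]+$/ { n++; dev[n] = $11; svc[n] = $8 + 0 }
        END {
            if (n == 0) exit
            for (i = 1; i <= n; i++) sum += svc[i]
            mean = sum / n
            for (i = 1; i <= n; i++)
                if (svc[i] > factor * mean)
                    printf "%s %.1f\n", dev[i], svc[i]
        }'
}

# Four rows copied from the listing above, all from the busier group.
slow_disks 1.15 <<'EOF'
    7.8   33.4  469.9  311.9  0.0  2.3    0.0   56.4   0  20 c9t38d0
    8.5   34.1  518.7  312.4  0.0  2.3    0.0   54.3   0  22 c9t39d0
    8.4   32.9  510.5  311.8  0.0  2.9    0.0   70.6   0  27 c9t40d0
    8.2   34.3  501.5  312.4  0.0  2.3    0.0   55.1   0  24 c9t41d0
EOF
```

With these four rows only c9t40d0 is reported. A single 30-second sample proves little, so the same check would need to hold over several intervals before blaming the disk.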
Re: [zfs-discuss] slow ls or slow zfs
On Mon, 29 Jun 2009, NightBird wrote:
>
> I checked the output of iostat. svc_t is between 5 and 50, depending
> on when data is flushed to the disk (CIFS write pattern). %b is
> between 10 and 50. %w is always 0.
> Example:
> device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
> sd27     31.5  127.0  935.9  616.7  0.0 11.9   75.2   0  66
> sd28      5.0    0.0  320.0    0.0  0.0  0.1   18.0   0   9
>
> This tells me disks are busy but I do not know what they are doing?
> Are they spending time seeking, writing or reading?

It looks like your sd27 is being pounded with write iops. It is close
to its limit.

Can you post complete iostat output? Since you have so many disks,
(which may not always be involved in the same stripe) you may need to
have iostat average over a long period of time such as 30 or 60
seconds in order to see a less responsive disk. Disks could be less
responsive for many reasons, including vibrations in their operating
environment.

Also see Jeff Bonwick's "diskqual.sh" as described at
http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg15384.html
which is good at helping to find pokey disks. A slightly modified
version is included below.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

#!/bin/ksh

# Date: Mon, 14 Apr 2008 15:49:41 -0700
# From: Jeff Bonwick
# To: Henrik Hjort
# Cc: zfs-discuss@opensolaris.org
# Subject: Re: [zfs-discuss] Performance of one single 'cp'
#
# No, that is definitely not expected.
#
# One thing that can hose you is having a single disk that performs
# really badly.  I've seen disks as slow as 5 MB/sec due to vibration,
# bad sectors, etc.  To see if you have such a disk, try my diskqual.sh
# script (below).  On my desktop system, which has 8 drives, I get:
#
# # ./diskqual.sh
# c1t0d0 65 MB/sec
# c1t1d0 63 MB/sec
# c2t0d0 59 MB/sec
# c2t1d0 63 MB/sec
# c3t0d0 60 MB/sec
# c3t1d0 57 MB/sec
# c4t0d0 61 MB/sec
# c4t1d0 61 MB/sec
#
# The diskqual test is non-destructive (it only does reads), but to
# get valid numbers you should run it on an otherwise idle system.

disks=`format [...] &1 |
	    nawk '$1 == "real" { printf("%.0f\n", 67.108864 / $2) }'
}

getspeed()
{
	# Best out of 6
	for iter in 1 2 3 4 5 6
	do
		getspeed1 $1
	done | sort -n | tail -2 | head -1
}

for disk in $disks
do
	echo $disk `getspeed $disk` MB/sec
done
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] slow ls or slow zfs
NightBird wrote:
>> On Fri, 26 Jun 2009, Richard Elling wrote:
>>
>>>> All the tools I have used show no IO problems. I think the problem is
>>>> memory but I am unsure on how to troubleshoot it.
>>>
>>> Look for latency, not bandwidth. iostat will show latency at the
>>> device level.
>>
>> Unfortunately, the effect may not be all that obvious since the disks
>> will only be driven as hard as the slowest disk and so the slowest
>> disk may not seem much slower.
>>
>> Bob
>
> I checked the output of iostat. svc_t is between 5 and 50, depending on
> when data is flushed to the disk (CIFS write pattern). %b is between 10
> and 50. %w is always 0.
> Example:
> device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
> sd27     31.5  127.0  935.9  616.7  0.0 11.9   75.2   0  66

This is a slow disk. Put your efforts here.

> sd28      5.0    0.0  320.0    0.0  0.0  0.1   18.0   0   9
>
> This tells me disks are busy but I do not know what they are doing?
> Are they spending time seeking, writing or reading?
>
> I also reviewed some ARC stats. Here is the output.
>
> ARC Efficency:
>   Cache Access Total:          199758875
>   Cache Hit Ratio:      74%    148652045   [Defined State for buffer]
>   Cache Miss Ratio:     25%    51106830    [Undefined State for Buffer]
>   REAL Hit Ratio:       73%    146091795   [MRU/MFU Hits Only]
>
>   Data Demand   Efficiency:    94%
>   Data Prefetch Efficiency:    15%
>
>   CACHE HITS BY CACHE LIST:
>     Anon:                       --%   Counter Rolled.

That is interesting... but only from a developer standpoint.

>     Most Recently Used:         22%   33843327 (mru)         [ Return Customer ]
>     Most Frequently Used:       75%   112248468 (mfu)        [ Frequent Customer ]
>     Most Recently Used Ghost:    3%   4833189 (mru_ghost)    [ Return Customer Evicted, Now Back ]
>     Most Frequently Used Ghost: 22%   33831706 (mfu_ghost)   [ Frequent Customer Evicted, Now Back ]
>
> It seems to me that with mfu_ghost at 22%, I may need a bigger ARC.
> Is ARC also designed to work with large memory footprints (128GB for
> example or higher)? Will it be as efficient?

Caching isn't your problem, though adding memory may hide the real problem
for a while. You need faster disk.

-- richard
Re: [zfs-discuss] slow ls or slow zfs
> On Fri, 26 Jun 2009, Richard Elling wrote:
>
> >> All the tools I have used show no IO problems. I think the problem is
> >> memory but I am unsure on how to troubleshoot it.
> >
> > Look for latency, not bandwidth. iostat will show latency at the
> > device level.
>
> Unfortunately, the effect may not be all that obvious since the disks
> will only be driven as hard as the slowest disk and so the slowest
> disk may not seem much slower.
>
> Bob

I checked the output of iostat. svc_t is between 5 and 50, depending on
when data is flushed to the disk (CIFS write pattern). %b is between 10
and 50. %w is always 0.
Example:
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd27     31.5  127.0  935.9  616.7  0.0 11.9   75.2   0  66
sd28      5.0    0.0  320.0    0.0  0.0  0.1   18.0   0   9

This tells me disks are busy but I do not know what they are doing?
Are they spending time seeking, writing or reading?

I also reviewed some ARC stats. Here is the output.

ARC Efficency:
  Cache Access Total:          199758875
  Cache Hit Ratio:      74%    148652045   [Defined State for buffer]
  Cache Miss Ratio:     25%    51106830    [Undefined State for Buffer]
  REAL Hit Ratio:       73%    146091795   [MRU/MFU Hits Only]

  Data Demand   Efficiency:    94%
  Data Prefetch Efficiency:    15%

  CACHE HITS BY CACHE LIST:
    Anon:                       --%   Counter Rolled.
    Most Recently Used:         22%   33843327 (mru)         [ Return Customer ]
    Most Frequently Used:       75%   112248468 (mfu)        [ Frequent Customer ]
    Most Recently Used Ghost:    3%   4833189 (mru_ghost)    [ Return Customer Evicted, Now Back ]
    Most Frequently Used Ghost: 22%   33831706 (mfu_ghost)   [ Frequent Customer Evicted, Now Back ]

It seems to me that with mfu_ghost at 22%, I may need a bigger ARC.
Is ARC also designed to work with large memory footprints (128GB for
example or higher)? Will it be as efficient?
--
This message posted from opensolaris.org
Re: [zfs-discuss] slow ls or slow zfs
On Fri, 26 Jun 2009, Richard Elling wrote:

>> All the tools I have used show no IO problems. I think the problem is
>> memory but I am unsure on how to troubleshoot it.
>
> Look for latency, not bandwidth. iostat will show latency at the
> device level.

Unfortunately, the effect may not be all that obvious since the disks
will only be driven as hard as the slowest disk and so the slowest
disk may not seem much slower.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Re: [zfs-discuss] slow ls or slow zfs
> As others have mentioned, it would be easier to take a stab at this if
> there is some more data to look at.
>
> Have you done any ZFS tuning? If so, please provide the /etc/system, adb,
> zfs etc info.
>
> Can you provide zpool status output?
>
> As far as checking ls performance, just to remove name service lookups
> from the possibilities, let's use the '-n' option instead of '-l'. I know
> you mentioned it was unlikely to be a problem, but the fewer variables
> the better.
>
> Can you characterize what your "ls -an" output looks like? Is it 100
> files or 100,000?
>
> How about some sample output like:
>
> for run in 1 2 3 4
> do
>     echo run $run
>     truss -c ls -an | wc -l
>     echo ""
>     echo
> done

zfs tuning: /etc/system
set swapfs_minfree=0x2
set zfs:zfs_txg_synctime=1

Here is the output (200 files in that folder):

truss -c ls -an | wc -l
203

syscall               seconds   calls  errors
_exit                    .000       1
read                     .000       1
write                    .000       3
open                     .000       8      3
close                    .000       6
time                     .004     610
brk                      .000      12
getpid                   .000       1
sysi86                   .000       1
ioctl                    .000       2      2
execve                   .000       1
fcntl                    .000       1
openat                   .000       1
getcontext               .000       1
setustack                .000       1
pathconf                 .003     203
mmap                     .000       7
mmapobj                  .000       4
getrlimit                .000       1
memcntl                  .000       6
sysconfig                .000       2
lwp_private              .000       1
acl                      .006     406
resolvepath              .000       6
getdents64               .315       2
stat64                   .003     209      1
lstat64                  .375     203
fstat64                  .000       4
                     --------  ------   ----
sys totals:              .710    1704      6
usr time:                .008
elapsed:               32.420

# zpool status
  pool: pool001
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        pool001      ONLINE       0     0     0
          raidz2     ONLINE       0     0     0
            c9t19d0  ONLINE       0     0     0
            c9t18d0  ONLINE       0     0     0
            c9t17d0  ONLINE       0     0     0
            c9t13d0  ONLINE       0     0     0
            c9t15d0  ONLINE       0     0     0
            c9t16d0  ONLINE       0     0     0
            c9t11d0  ONLINE       0     0     0
            c9t12d0  ONLINE       0     0     0
            c9t14d0  ONLINE       0     0     0
            c9t9d0   ONLINE       0     0     0
            c9t8d0   ONLINE       0     0     0
            c9t10d0  ONLINE       0     0     0
            c9t30d0  ONLINE       0     0     0
            c9t29d0  ONLINE       0     0     0
            c9t28d0  ONLINE       0     0     0
            c9t24d0  ONLINE       0     0     0
            c9t26d0  ONLINE       0     0     0
            c9t27d0  ONLINE       0     0     0
            c9t22d0  ONLINE       0     0     0
            c9t23d0  ONLINE       0     0     0
            c9t25d0  ONLINE       0     0     0
            c9t20d0  ONLINE       0     0     0
            c9t21d0  ONLINE       0     0     0
        spares
          c8t3d0     AVAIL
          c8t2d0     AVAIL

errors: No known data errors
[...]

We are running the zpool version that came with b111b for now and have not
decided if we want to upgrade to the version that comes with b117.

# vmstat -s
         0 swap ins
         0 swap outs
         0 pages swapped in
         0 pages swapped out
    729488 total address trans. faults taken
         2 page ins
         0 page outs
         2 pages paged in
         0 pages paged out
    264171 total reclaims
    264171 reclaims from free list
         0 micro (hat) faults
    729488 minor (as) faults
         2 major faults
    147603 copy-on-write faults
    219098 zero fill page faults
    589820 pages examined by the clock daemon
         0 revolutions of the clock hand
         0 pages freed by the clock daemon
      1765 forks
       703 vforks
      2478 execs
2267309301 cpu context switches
 671340139 device interrupts
   1036384 traps
  28907635 system calls
   2524120 total name lookups (cache hits 92%)
      8836 user cpu
  11632644 system cpu
  15777851 idle cpu
         0 wait cpu
Re: [zfs-discuss] slow ls or slow zfs
On Fri, 26 Jun 2009, NightBird wrote:

> Thanks Ian.
> I read the best practices and understand the IO limitation I have
> created for this vdev. My system is built to maximize capacity using
> large stripes, not performance.
> All the tools I have used show no IO problems.
> I think the problem is memory but I am unsure on how to troubleshoot it.

Perhaps someone else has answered your question.

The problem is not a shortage of I/O. The problem is that raidz and
raidz2 do synchronized reads and writes (in a stripe) and so all of
the disks in the stripe need to respond before the read or write can
return. If one disk is a bit slower than the rest, then everything
will be slower. With so many disks, the deck is stacked against you.
Raidz can not use an infinite stripe size so the number of disks used
for any given I/O is not all of the disks in the vdev, which may make
finding the slow disk a bit more difficult.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
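To put a number on the point about synchronized stripes, here is a deliberately simplified sketch. It assumes a stripe-wide I/O finishes only when the slowest member disk has answered, so its latency is the maximum of the per-disk latencies rather than their mean. This is a toy model, not a description of ZFS internals, and the function name stripe_latency is made up for the example.

```shell
#!/bin/sh
# Toy model (an assumption, not ZFS internals): the stripe waits for its
# slowest disk, so stripe latency = max of the per-disk latencies.
# Reads one latency in milliseconds per line on stdin.
stripe_latency() {
    awk '
        NF { n++; sum += $1; if (n == 1 || $1 + 0 > max) max = $1 + 0 }
        END { if (n) printf "mean %.1f ms, stripe %.1f ms\n", sum / n, max }'
}

# asvc_t values reported for c9t39d0, c9t40d0 and c9t41d0 in the iostat
# listing elsewhere in this thread.
printf '%s\n' 54.3 70.6 55.1 | stripe_latency
```

Under this model the three disks average 60.0 ms, yet every stripe that touches the 70.6 ms disk takes 70.6 ms, and the wider the vdev the more likely a given stripe includes its slowest member.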
Re: [zfs-discuss] slow ls or slow zfs
NightBird wrote:
> Thanks Ian.
> I read the best practices and understand the IO limitation I have
> created for this vdev. My system is built to maximize capacity using
> large stripes, not performance.
> All the tools I have used show no IO problems.
> I think the problem is memory but I am unsure on how to troubleshoot it.

Look for latency, not bandwidth. iostat will show latency at the
device level.

Other things that affect ls -la are name services and locale. Name services
because the user ids are numbers and are converted to user names via the
name service (these are cached in the name services cache daemon, so you
can look at the nscd hit rates with "nscd -g"). The locale matters because
the output is sorted, which is slower for locales which use unicode. This
implies that the more entries in the directory, and the longer the names
are with more common prefixes, the longer it takes to sort. I expect case
insensitive sorts (common for CIFS environments) also take longer to sort.
You could sort by a number instead, try "ls -c" or "ls -S".

ls looks at metadata, which is compressed and typically takes little space.
But it is also cached, which you can see by looking at the total name
lookups in "vmstat -s".

As others have pointed out, I think you will find that a 23-wide raidz,
raidz2, raid-5, or raid-6 configuration is not a recipe for performance.

-- richard
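One way to separate the effects listed above (sorting, locale, name-service lookups) is to time a few ls variants against the same directory. The sketch below is illustrative only: the function name compare_ls is made up, and it assumes a date that supports +%s, which not every older date implementation does. Whole-second resolution is coarse, but the delays reported in this thread are in the 3 to 20 second range. Later variants benefit from whatever the first run pulled into cache, so the order of the runs matters.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): time several ls variants on
# one directory to see which step dominates.
#   ls -f               unsorted listing, no per-file attribute lookups shown
#   ls -an              long listing with numeric ids (no name-service lookup)
#   ls -al              long listing with names resolved
#   env LC_ALL=C ls -al same, but sorted in the C locale
compare_ls() {
    dir=${1:-.}
    for variant in "ls -f" "ls -an" "ls -al" "env LC_ALL=C ls -al"; do
        start=$(date +%s)
        # Unquoted on purpose so the variant splits into command and flags.
        $variant "$dir" >/dev/null 2>&1
        end=$(date +%s)
        printf '%-22s %ds\n' "$variant" "$((end - start))"
    done
}

compare_ls .
```

If "ls -an" is fast and "ls -al" is slow, name services are the likely cost; if both are slow only on the first pass, the time is going to uncached metadata reads instead.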
Re: [zfs-discuss] slow ls or slow zfs
As others have mentioned, it would be easier to take a stab at this if
there is some more data to look at.

Have you done any ZFS tuning? If so, please provide the /etc/system, adb,
zfs etc info.

Can you provide zpool status output?

As far as checking ls performance, just to remove name service lookups from
the possibilities, let's use the '-n' option instead of '-l'. I know you
mentioned it was unlikely to be a problem, but the fewer variables the
better.

Can you characterize what your "ls -an" output looks like? Is it 100 files
or 100,000?

How about some sample output like:

for run in 1 2 3 4
do
    echo run $run
    truss -c ls -an | wc -l
    echo ""
    echo
done
Re: [zfs-discuss] slow ls or slow zfs
> NightBird wrote:
>> Hello,
>> We have a server with a couple of raid-z2 pools, each with 23x1TB disks.
>> This gives us 19TB of useable space on each pool. The server has 2 x quad
>> core cpu, 16GB RAM and is running b117. Average load is 4 and we use a
>> lot of CIFS.
>>
>> We notice ZFS is slow. Even a simple 'ls -al' can take 20sec. After
>> trying again, it's cached and therefore quick. We also noticed 'ls' is
>> relatively quick (~3secs).
>>
> Going back to your original question
>
> What is the data? Is it owned by one or many users? If the latter, the
> problem could be the time taken to fetch all the name service data.
>
> --
> Ian.

Data is compressed files between 300KB and 2,000KB. This is in an AD
environment with 20+ servers, all running under a single AD account. So the
OpenSolaris server sees one owner. We also have 3 or 4 idmap group mappings.
Re: [zfs-discuss] slow ls or slow zfs
NightBird wrote:
> Hello,
> We have a server with a couple of raid-z2 pools, each with 23x1TB disks.
> This gives us 19TB of useable space on each pool. The server has 2 x quad
> core cpu, 16GB RAM and is running b117. Average load is 4 and we use a
> lot of CIFS.
>
> We notice ZFS is slow. Even a simple 'ls -al' can take 20sec. After
> trying again, it's cached and therefore quick. We also noticed 'ls' is
> relatively quick (~3secs).

Going back to your original question

What is the data? Is it owned by one or many users? If the latter, the
problem could be the time taken to fetch all the name service data.

--
Ian.
Re: [zfs-discuss] slow ls or slow zfs
[Adding context]

>> Hi Scott,
>>
>> Why do you assume there is an IO problem?
>> I know my setup is unusual because of the large pool size. However, I
>> have not seen any evidence this is a problem for my workload.
>> prstat does not show any IO wait.
>
> The pool size isn't the issue, it's the large number of disks in each
> vdev. Read the suggested best practice links.

Thanks Ian.
I read the best practices and understand the IO limitation I have created
for this vdev. My system is built to maximize capacity using large stripes,
not performance.
All the tools I have used show no IO problems.
I think the problem is memory but I am unsure on how to troubleshoot it.
Re: [zfs-discuss] slow ls or slow zfs
Thanks Ian.
I read the best practices and understand the IO limitation I have created
for this vdev. My system is built to maximize capacity using large stripes,
not performance.
All the tools I have used show no IO problems.
I think the problem is memory but I am unsure on how to troubleshoot it.
Re: [zfs-discuss] slow ls or slow zfs
NightBird wrote:

[please keep enough context so your post makes sense to the mail list]

> Hi Scott,
>
> Why do you assume there is an IO problem?
> I know my setup is unusual because of the large pool size. However, I
> have not seen any evidence this is a problem for my workload.
> prstat does not show any IO wait.

The pool size isn't the issue, it's the large number of disks in each vdev.
Read the suggested best practice links.

--
Ian.
Re: [zfs-discuss] slow ls or slow zfs
Hi Scott,

Why do you assume there is an IO problem?
I know my setup is unusual because of the large pool size. However, I have
not seen any evidence this is a problem for my workload.
prstat does not show any IO wait.
Re: [zfs-discuss] slow ls or slow zfs
Hi,

When you have a lot of random read/writes, raidz/raidz2 can be fairly slow.
http://blogs.sun.com/roch/entry/when_to_and_not_to

The recommendation is to break the disks into smaller raidz/z2 stripes,
thereby improving IO.

From the ZFS Best Practices Guide:
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#RAID-Z_Configuration_Requirements_and_Recommendations

"The recommended number of disks per group is between 3 and 9. If you have
more disks, use multiple groups."

-Scott
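As an illustration of "use multiple groups", the sketch below builds (but does not run) a zpool create command that splits a list of disks into several raidz2 vdevs of a chosen width. The function name plan_pool and the pool name tank are hypothetical, and the disk names are simply the first twelve from the poster's pool. The helper does no validation: a leftover group that is too small for raidz2 would have to be handled by hand.

```shell
#!/bin/sh
# Hypothetical planner (not from the thread): print a "zpool create"
# command that groups the given disks into raidz2 vdevs of WIDTH disks.
# It only prints the command; nothing is created.
plan_pool() {
    pool=$1
    width=$2
    shift 2
    cmd="zpool create $pool"
    i=0
    for disk in "$@"; do
        # Start a new raidz2 group every WIDTH disks.
        if [ $((i % width)) -eq 0 ]; then
            cmd="$cmd raidz2"
        fi
        cmd="$cmd $disk"
        i=$((i + 1))
    done
    printf '%s\n' "$cmd"
}

# Twelve disks as two 6-wide raidz2 groups instead of one wide group.
plan_pool tank 6 c9t8d0 c9t9d0 c9t10d0 c9t11d0 c9t12d0 c9t13d0 \
                 c9t14d0 c9t15d0 c9t16d0 c9t17d0 c9t18d0 c9t19d0
```

Each additional raidz2 group costs two more disks of parity, which is the capacity trade-off the original poster was trying to avoid with a single 23-wide vdev.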
Re: [zfs-discuss] slow ls or slow zfs
On Fri, Jun 26 at 15:18, NightBird wrote:
> Hello,
> We have a server with a couple of raid-z2 pools, each with 23x1TB disks.
> This gives us 19TB of useable space on each pool. The server has 2 x quad
> core cpu, 16GB RAM and is running b117. Average load is 4 and we use a
> lot of CIFS.
>
> We notice ZFS is slow. Even a simple 'ls -al' can take 20sec. After
> trying again, it's cached and therefore quick. We also noticed 'ls' is
> relatively quick (~3secs).

As I understand it, each vdev gets roughly the performance of a single
raw disk, with slight performance penalties once you exceed a certain
size of 6-8 disks in a single vdev.

--eric

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org
[zfs-discuss] slow ls or slow zfs
Hello,
We have a server with a couple of raid-z2 pools, each with 23x1TB disks.
This gives us 19TB of useable space on each pool. The server has 2 x quad
core cpu, 16GB RAM and is running b117. Average load is 4 and we use a lot
of CIFS.

We notice ZFS is slow. Even a simple 'ls -al' can take 20sec. After trying
again, it's cached and therefore quick. We also noticed 'ls' is relatively
quick (~3secs).

How can I improve the response time?
How do I determine how much memory I need for ZFS caching?

Here are some stats:

> ::arc
hits                      = 44025797
misses                    = 8452650
[..]
p                         = 10646 MB
c                         = 11712 MB
c_min                     = 1918 MB
c_max                     = 15350 MB
size                      = 11712 MB
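For reference, the hits and misses counters above can be turned into an overall hit ratio with a one-line calculation. The helper name arc_hit_ratio is made up for this sketch; it only does the arithmetic hits / (hits + misses) and says nothing about which kinds of access are missing.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): overall ARC hit ratio in
# percent, computed from the hits and misses counters.
arc_hit_ratio() {
    awk -v h="$1" -v m="$2" 'BEGIN {
        if (h + m == 0) { print "n/a"; exit }
        printf "%.1f\n", 100 * h / (h + m)
    }'
}

# Counters from the ::arc output above.
arc_hit_ratio 44025797 8452650
```

For these counters the result is 83.9 percent. The counters are cumulative since boot, so the figure averages over the slow first 'ls -al' and the fast cached repeats alike.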