[zfs-discuss] set zfs:zfs_vdev_max_pending
We have a zpool made up of 4 512GB iSCSI LUNs located on a network appliance, and we are seeing poor read performance from the ZFS pool. The release of Solaris we are using is Solaris 10 10/09 s10s_u8wos_08a SPARC; the server itself is a T2000.

I was wondering how we can tell whether the zfs_vdev_max_pending setting is impeding read performance of the zfs pool? (The pool consists of lots of small files.) And if it is impeding read performance, how do we go about finding a new value for this parameter? Of course I may misunderstand this parameter entirely and would be quite happy for a proper explanation!

-- Ed
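For what it's worth, one way to experiment with the tunable is sketched below. This is untested here and follows the usual mdb / etc-system approach to ZFS tunables; the value 10 is an arbitrary starting point rather than a recommendation (the default at this vintage is 35, if memory serves).

  # echo zfs_vdev_max_pending/D | mdb -k       (read the current value from the live kernel)
  # echo zfs_vdev_max_pending/W0t10 | mdb -kw  (change it on the live kernel; 0t marks decimal)

and to persist a chosen value across reboots, add to /etc/system:

  set zfs:zfs_vdev_max_pending = 10

To see whether queue depth is actually the limit, watch 'iostat -xzn 10' while the slow reads run: if the actv column for the LUNs sits pinned at the cap with large asvc_t, the per-device queue is a likely suspect; if actv stays well below the cap, this tunable probably isn't what is throttling your reads.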
Re: [zfs-discuss] zfs fragmentation
I've come up with a better name for the concept of file and directory fragmentation: Filesystem Entropy. Over time, an active and volatile filesystem moves from an organized state to a disorganized state, resulting in backup difficulties. Here are some stats which illustrate the issue.

First the development mail server (jumbo frames, Nagle disabled, and tcp_xmit_hiwat/tcp_recv_hiwat set to 2097152):

Small file workload (copy from zfs on iscsi network to local ufs filesystem):

# zpool iostat 10
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
space       70.5G  29.0G      3      0   247K  59.7K
space       70.5G  29.0G    136      0  8.37M      0
space       70.5G  29.0G    115      0  6.31M      0
space       70.5G  29.0G    108      0  7.08M      0
space       70.5G  29.0G    105      0  3.72M      0
space       70.5G  29.0G    135      0  3.74M      0
space       70.5G  29.0G    155      0  6.09M      0
space       70.5G  29.0G    193      0  4.85M      0
space       70.5G  29.0G    142      0  5.73M      0
space       70.5G  29.0G    159      0  7.87M      0

Large file workload (cd and dvd iso's):

# zpool iostat 10
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
space       70.5G  29.0G      3      0   224K  59.8K
space       70.5G  29.0G    462      0  57.8M      0
space       70.5G  29.0G    427      0  53.5M      0
space       70.5G  29.0G    406      0  50.8M      0
space       70.5G  29.0G    430      0  53.8M      0
space       70.5G  29.0G    382      0  47.9M      0

The production mail server:

Mail system is running with 790 imap users logged in (low imap workload). Two backup streams are running. Not using jumbo frames, Nagle enabled, tcp_xmit_hiwat/tcp_recv_hiwat set to 2097152; we've never seen any effect from changing the iscsi transport parameters under this small file workload.

# zpool iostat 10
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
space       1.06T   955G     96     69  5.20M  2.69M
space       1.06T   955G    175    105  8.96M  2.22M
space       1.06T   955G    182     16  4.47M   546K
space       1.06T   955G    170     16  4.82M  1.85M
space       1.06T   955G    145    159  4.23M  3.19M
space       1.06T   955G    138     15  4.97M  92.7K
space       1.06T   955G    134     15  3.82M  1.71M
space       1.06T   955G    109    123  3.07M  3.08M
space       1.06T   955G    106     11  3.07M  1.34M
space       1.06T   955G    120     17  3.69M  1.74M

# prstat -mL
   PID USERNAME  USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
 12438 root       12 6.9 0.0 0.0 0.0 0.0  81 0.1 508  84  4K   0 save/1
 27399 cyrus      15 0.5 0.0 0.0 0.0 0.0  85 0.0  18  10 297   0 imapd/1
 20230 root      3.9 8.0 0.0 0.0 0.0 0.0  88 0.1 393  33  2K   0 save/1
 25913 root      0.5 3.3 0.0 0.0 0.0 0.0  96 0.0  22   2  1K   0 prstat/1
 20495 cyrus     1.1 0.2 0.0 0.0 0.5 0.0  98 0.0  14   3 191   0 imapd/1
  1051 cyrus     1.2 0.0 0.0 0.0 0.0 0.0  99 0.0  19   1  80   0 master/1
 24350 cyrus     0.5 0.5 0.0 0.0 1.4 0.0  98 0.0  57   1 484   0 lmtpd/1
 22645 cyrus     0.6 0.3 0.0 0.0 0.0 0.0  99 0.0  53   1 603   0 imapd/1
 24904 cyrus     0.3 0.4 0.0 0.0 0.0 0.0  99 0.0  66   0 863   0 imapd/1
 18139 cyrus     0.3 0.2 0.0 0.0 0.0 0.0  99 0.0  24   0 195   0 imapd/1
 21459 cyrus     0.2 0.3 0.0 0.0 0.0 0.0  99 0.0  54   0 635   0 imapd/1
 24891 cyrus     0.3 0.3 0.0 0.0 0.9 0.0  99 0.0  28   0 259   0 lmtpd/1
   388 root      0.2 0.3 0.0 0.0 0.0 0.0 100 0.0   1   1  48   0 in.routed/1
 21643 cyrus     0.2 0.3 0.0 0.0 0.2 0.0  99 0.0  49   7 540   0 imapd/1
 18684 cyrus     0.2 0.3 0.0 0.0 0.0 0.0 100 0.0  48   1 544   0 imapd/1
 25398 cyrus     0.2 0.2 0.0 0.0 0.0 0.0 100 0.0  47   0 466   0 pop3d/1
 23724 cyrus     0.2 0.2 0.0 0.0 0.0 0.0 100 0.0  47   0 540   0 imapd/1
 24909 cyrus     0.1 0.2 0.0 0.0 0.2 0.0  99 0.0  25   1 251   0 lmtpd/1
 16317 cyrus     0.2 0.2 0.0 0.0 0.0 0.0 100 0.0  37   1 495   0 imapd/1
 28243 cyrus     0.1 0.3 0.0 0.0 0.0 0.0 100 0.0  32   0 289   0 imapd/1
 20097 cyrus     0.1 0.2 0.0 0.0 0.3 0.0  99 0.0  26   5 253   0 lmtpd/1
Total: 893 processes, 1125 lwps, load averages: 1.14, 1.16, 1.16

-- Ed
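For anyone wanting to reproduce the comparison: the two workloads above are just recursive copies off the pool to local ufs while zpool iostat samples it, along the lines of the sketch below (the paths are placeholders, not the real ones):

  # zpool iostat space 10 &
  # cp -pr /space/mailspool /ufs-scratch/small     (small-file workload)
  # cp -pr /space/isos /ufs-scratch/large          (large-file workload)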
Re: [zfs-discuss] zfs fragmentation
On Tue, 2009-08-11 at 07:58, Alex Lam S.L. wrote:
> At a first glance, your production server's numbers are looking fairly
> similar to the small file workload results of your development server.
> I thought you were saying that the development server has faster performance?

The development server was running only one cp -pr command. The production mail server was running two concurrent backup jobs and of course the mail system, with each job having the same throughput as if it were a single job running. The single-threaded backup jobs do not conflict with each other over performance. If we ran 20 concurrent backup jobs, overall performance would scale up quite a bit; I would guess between 5 and 10 times the throughput. (I just read Mike's post and will do some 'concurrency' testing.)

Users are currently evenly distributed over 5 filesystems. (I previously mentioned 7, but it's really 5 filesystems for users and 1 for system data, totalling 6, plus one test filesystem.) We back up 2 filesystems on Tuesday, 2 on Thursday, and 2 on Saturday. We back up to disk and then clone to tape. Our backup people can only handle doing 2 filesystems per night.

Creating more filesystems to increase the parallelism of our backup is one solution, but it's a major redesign of the mail system. Adding a second server to halve the pool, and thereby halve the problem, is another solution (and we would also create more filesystems at the same time). Moving the pool to an FC SAN or a JBOD may also increase performance (fewer layers, since the appliance introduces several).

I suspect that if we rsync'd one of these filesystems to a second server/pool we would also see a performance increase equal to what we see on the development server. (I don't know how zfs send and receive work, so I don't know whether they would address this Filesystem Entropy and specifically reorganize the files and directories; see the sketch below.) However, when we created a testfs filesystem in the zfs pool on the production server and copied data to it, we saw the same performance as the other filesystems in the same pool.

We will have to do something to address the problem; a combination of what I just listed is our probable course of action. (Much testing will have to be done to make sure our solution addresses the problem, because we are not 100% sure what is causing the performance degradation.) I'm also dealing with Network Appliance to see if there is anything we can do at the filer end to increase performance, but I'm holding out little hope.

But please, don't miss the point I'm trying to make: ZFS would benefit from a utility or a background process that reorganizes files and directories in the pool to optimize performance, a utility to deal with Filesystem Entropy. Currently a zfs pool will live as long as the lifetime of the disks that it is on, without reorganization. That can be a long, long time. Not to mention that slowly expanding the pool over time contributes to the issue.

-- Ed
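For reference, since send/receive came up, the shape of it is below (a minimal sketch; the snapshot, pool, and host names are made up). The receive end writes everything out afresh, so receiving onto a new, mostly-empty pool lays the files and directories down anew; that might help the entropy problem on the destination, but it is not a defragmenter of the source pool, and I can't say how much locality the receive side actually restores.

  # zfs snapshot space/mail1@migrate
  # zfs send space/mail1@migrate | ssh newserver zfs receive newpool/mail1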
Re: [zfs-discuss] zfs fragmentation
Concurrency/parallelism testing. I have 6 different filesystems populated with email data on our mail development server. I rebooted the server before beginning the tests. The server is a T2000 (sun4v) machine, so it's ideally suited for this type of testing.

The test was to tar (to /dev/null) each of the filesystems: launch 1, gather stats, launch another, gather stats, etc. (The driver loop is sketched after the numbers.) The underlying storage system is a Network Appliance. Our only one. In production. Serving NFS, CIFS and iscsi. Other work the appliance is doing may affect these tests, and vice versa :) . No one seemed to notice I was running these tests. After 6 concurrent tar's running we are probably seeing benefits of the ARC. At certain points I included load averages and traffic stats for each of the iscsi ethernet interfaces that are configured with MPXIO. After the first 6 jobs, I launched duplicates of the 6. Then another 6, etc. At the end I included the zfs kernel statistics.

1 job:
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
space       70.5G  29.0G      0      0      0      0
space       70.5G  29.0G     19      0  1.04M      0
space       70.5G  29.0G    268      0  8.71M      0
space       70.5G  29.0G    196      0  11.3M      0
space       70.5G  29.0G    171      0  11.0M      0
space       70.5G  29.0G    182      0  5.01M      0
space       70.5G  29.0G    273      0  9.71M      0
space       70.5G  29.0G    292      0  8.91M      0
space       70.5G  29.0G    279      0  15.4M      0
space       70.5G  29.0G    219      0  11.3M      0
space       70.5G  29.0G    175      0  8.67M      0

2 jobs:
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
space       70.5G  29.0G    381      0  23.8M      0
space       70.5G  29.0G    422      0  28.0M      0
space       70.5G  29.0G    386      0  26.5M      0
space       70.5G  29.0G    380      0  22.9M      0
space       70.5G  29.0G    411      0  18.8M      0
space       70.5G  29.0G    393      0  20.7M      0
space       70.5G  29.0G    302      0  15.0M      0
space       70.5G  29.0G    267      0  15.6M      0
space       70.5G  29.0G    304      0  18.7M      0
space       70.5G  29.0G    534      0  19.7M      0
space       70.5G  29.0G    339      0  17.0M      0

3 jobs:
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
space       70.5G  29.0G    530      0  22.9M      0
space       70.5G  29.0G    428      0  16.3M      0
space       70.5G  29.0G    439      0  16.4M      0
space       70.5G  29.0G    511      0  22.1M      0
space       70.5G  29.0G    464      0  17.9M      0
space       70.5G  29.0G    371      0  12.1M      0
space       70.5G  29.0G    447      0  16.5M      0
space       70.5G  29.0G    379      0  15.5M      0

4 jobs:
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
space       70.5G  29.0G    434      0  22.0M      0
space       70.5G  29.0G    506      0  29.5M      0
space       70.5G  29.0G    424      0  21.3M      0
space       70.5G  29.0G    643      0  36.0M      0
space       70.5G  29.0G    688      0  31.1M      0
space       70.5G  29.0G    726      0  37.6M      0
space       70.5G  29.0G    652      0  24.8M      0
space       70.5G  29.0G    646      0  33.9M      0

5 jobs:
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
space       70.5G  29.0G    629      0  31.1M      0
space       70.5G  29.0G    774      0  45.8M      0
space       70.5G  29.0G    815      0  39.8M      0
space       70.5G  29.0G    895      0  44.4M      0
space       70.5G  29.0G    800      0  48.1M      0
space       70.5G  29.0G    857      0  51.8M      0
space       70.5G  29.0G    725      0  47.6M      0

6 jobs:
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
space       70.5G  29.0G    924      0  58.8M      0
space       70.5G  29.0G    767      0  51.8M      0
space       70.5G  29.0G    862      0  48.4M      0
space       70.5G  29.0G    977      0  43.9M      0
space       70.5G  29.0G    954      0  53.7M      0
space       70.5G  29.0G    903      0  48.3M      0

# uptime
2:19pm up 15 min(s), 2 users, load average: 1.44, 1.10, 0.67

26MB (1 minute average) on each iSCSI ethernet port.

12 jobs:
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
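The driver for this amounts to the loop below (a sketch; the six filesystem names are placeholders for the real ones). Launching the tar's one at a time and sampling in between gives the 1-6 job numbers; re-running the whole loop stacks another 6 readers for the 12-job round:

  # for fs in fs1 fs2 fs3 fs4 fs5 fs6
  > do
  >     (cd /space/$fs && tar cf /dev/null .) &
  > done
  # zpool iostat space 10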
Re: [zfs-discuss] zfs fragmentation
On Fri, 2009-08-07 at 19:33, Richard Elling wrote:
> This is very unlikely to be a fragmentation problem. It is a scalability
> problem and there may be something you can do about it in the short term.

You could be right. Our test mail server is the exact same design on the same hardware (sun4v), but in a smaller configuration (less memory and 4 x 25GB san luns), and it has a backup/copy throughput of 30GB/hour. The data used for testing was copied from our production mail server. Adding another pool and copying all or some data over to it would only be a short-term solution.

I'll have to disagree. What is the point of a filesystem that can grow to such a huge size without functionality built in to optimize data layout? Real-world implementations of filesystems that are intended to live for years or decades need this functionality, don't they?

Our mail system works well; only the backup doesn't perform well. All the features of ZFS that make reads perform well (prefetch, ARC) have little effect. We think backup is quite important. We do quite a few restores of months-old data. Snapshots help in the short term, but for longer-term restores we need to go to tape.

Of course, as you can tell, I'm kinda stuck on this idea that file and directory fragmentation is causing our issues with the backup. I don't know how to analyze the pool to better understand the problem.

If we did chop the pool up into, let's say, 7 pools (one for each current filesystem), then over time these 7 pools would grow and we would end up with the same issues. That's why it seems to me to be a short-term solution. If our issue with zfs is scalability, then you could say zfs is not scalable. Is that true? (It certainly is if the solution is to create more pools!)

-- Ed
Re: [zfs-discuss] zfs fragmentation
On Sat, 2009-08-08 at 09:17, Bob Friesenhahn wrote:
> Many of us here already tested our own systems and found that under some
> conditions ZFS was offering up only 30MB/second for bulk data reads
> regardless of how exotic our storage pool and hardware was.

Just so we are using the same units of measurement: backup/copy throughput on our development mail server is 8.5MB/sec. The people running our backups would be overjoyed with that performance. However, backup/copy throughput on our production mail server is 2.25MB/sec.

The underlying disk is 15000 RPM 146GB FC drives. Our performance may be hampered somewhat because the luns are on a Network Appliance accessed via iSCSI, but not to the extent that we are seeing, and it does not account for the throughput difference between the development and production pools.

When I talk about fragmentation it's not in the normal sense. I'm not talking about the blocks of a file not being sequential; I'm talking about files in a single directory that end up spread across the entire filesystem/pool.

My problem right now is diagnosing the performance issues. I can't address them without understanding the underlying cause. There is a lack of tools to help in this area. There is also a lack of acceptance that I'm actually having a problem with zfs. It's frustrating.

Anyone know how to significantly increase the performance of a zfs filesystem without causing any downtime to an Enterprise email system used by 30,000 intolerant people, when you don't really know what is causing the performance issues in the first place? (Yeah, it sucks to be me!)

-- Ed
Re: [zfs-discuss] zfs fragmentation
On Sat, 2009-08-08 at 08:14, Mattias Pantzare wrote:
> Your scalability problem may be in your backup solution.

We've eliminated the backup system as being involved in the performance issues. The servers are Solaris 10 with the OS on UFS filesystems. (In zfs terms, the pool is old/mature.) Solaris has been patched to a fairly current level. Copying data from the zfs filesystem to the local ufs filesystem sees the same throughput as the backup system.

The test was simple. Create a test filesystem on the zfs pool. Restore production email data to it. Reboot the server. Back up the data (29 minutes for 15.8GB of data). Reboot the server. Copy the data from zfs to ufs using a 'cp -pr ...' command, which also took 29 minutes.

And if anyone is interested, it only took 15 minutes to restore (write) the 15.8GB of data over the network.

-- Ed
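For concreteness, a test like that needs nothing more than timex and cp (the paths below are placeholders):

  # timex tar cf /dev/null /space/testfs      (a pure read pass, for comparison)
  # timex cp -pr /space/testfs /ufs-scratch   (the zfs-to-ufs copy)

And the arithmetic: 15.8GB in 29 minutes works out to roughly 9.3MB/sec for both the backup and the copy, while the 15-minute restore works out to roughly 18MB/sec, so the pool writes this data about twice as fast as it reads it back.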
Re: [zfs-discuss] zfs fragmentation
On Sat, 2009-08-08 at 15:12, Mike Gerdts wrote:
> The DBA's that I know use files that are at least hundreds of megabytes in
> size. Your problem is very different.

Yes, definitely. I'm relating records in a table to my small files because our email system treats the filesystem as a database. And in the back of my mind I'm also thinking that you have to rebuild/repair a database once in a while to improve performance; in my case, since the filesystem is the database, I want to do that to zfs!

At least that's what I'm thinking. However, and I always come back to this, I'm not certain what is causing my problem. I need certainty before taking action on the production system.

-- Ed
Re: [zfs-discuss] zfs fragmentation
On Sat, 2009-08-08 at 15:20, Bob Friesenhahn wrote:
> A SSD slog backed by a SAS 15K JBOD array should perform much better than
> a big iSCSI LUN.

Now, yes. But we implemented this pool years ago. I believe that, back then, the server would crash if a zfs drive failed, so we decided to let the netapp handle the disk redundancy. It's worked out well.

I've looked at those really nice Sun products adoringly. A 7000 series appliance would also be a nice addition to our central NFS service, not to mention more cost effective than expanding our Network Appliance. (We have researchers who are quite hungry for storage, and NFS is always our first choice.) But we now have quite an investment in the current implementation, and it's difficult to move away from. The netapp is quite a reliable product.

We are quite happy with zfs and our implementation. We just need to address our backup performance and improve it just a little bit!

We were almost lynched this spring because we encountered some pretty severe zfs bugs. We are still running the IDR named "A wad of ZFS bug fixes for Solaris 10 Update 6". It took over a month to resolve the issues. I work at a University, and final exams and year end occur at the same time. I don't recommend having email problems during this time! People are intolerant of email problems.

I live in hope that a Netapp OS update, or a solaris patch, or a zfs patch, or an iscsi patch, or something will come along that improves our performance just a bit so our backup people get off my back!

-- Ed
Re: [zfs-discuss] zfs fragmentation
On Sat, 2009-08-08 at 15:05, Mike Gerdts wrote:
> On Sat, Aug 8, 2009 at 12:51 PM, Ed Spencer <ed_spen...@umanitoba.ca> wrote:
>> Just so we are using the same units of measurement: backup/copy throughput
>> on our development mail server is 8.5MB/sec. However, backup/copy
>> throughput on our production mail server is 2.25MB/sec. The underlying
>> disk is 15000 RPM 146GB FC drives. Our performance may be hampered
>> somewhat because the luns are on a Network Appliance accessed via iSCSI,
>> but not to the extent that we are seeing, and it does not account for the
>> throughput difference between the development and production pools.
>
> NetApp filers run WAFL - Write Anywhere File Layout. Even if ZFS arranged
> everything perfectly (however that is defined) WAFL would undo its hard
> work. Since you are using iSCSI, I assume that you have disabled the Nagle
> algorithm and increased tcp_xmit_hiwat and tcp_recv_hiwat. If not, go do
> that now.

We've tried many different iscsi parameter changes on our development server: jumbo frames, disabling Nagle. I'll double check next week on tcp_xmit_hiwat and tcp_recv_hiwat. Nothing has made any real difference. We are only using about 5% of the bandwidth on our IP SAN. We use two cisco ethernet switches on the IP SAN, and the iscsi initiators use MPXIO in a round-robin configuration.

>> When I talk about fragmentation it's not in the normal sense. I'm not
>> talking about the blocks of a file not being sequential. I'm talking
>> about files in a single directory that end up spread across the entire
>> filesystem/pool.
>
> It's tempting to think that if the files were in roughly the same area of
> the block device that ZFS sees, then reading the files sequentially would
> at least trigger a read-ahead at the filer. I suspect that even a moderate
> amount of file creation and deletion would cause the I/O pattern to be
> random enough (not purely sequential) that the back-end storage would not
> have a reasonable chance of recognizing it as a good time for read-ahead.
> Further, the backup application is probably in a loop of:
>
>     while there are more files in the directory
>         if next file mtime > last backup time
>             open file
>             read file contents, send to backup stream
>             close file
>         end if
>     end while
>
> In other words, other I/O operations are interspersed between the
> sequential data reads, some files are likely to be skipped, and there is
> latency introduced by writing to the data stream. I would be surprised to
> see any file system do intelligent read-ahead here. In other words, lots
> of small file operations make backups, and especially restores, go slowly.
> More backup and restore streams will almost certainly help. Multiplex the
> streams so that you can keep your tapes moving at a constant speed.

We back up to disk first and then put to tape later.

> Do you have statistics on network utilization to ensure that you aren't
> stressing it? Have you looked at iostat data to be sure that you are
> seeing asvc_t + wsvc_t that supports the number of operations that you
> need to perform? That is, if asvc_t + wsvc_t for a device adds up to
> 10 ms, a workload that waits for the completion of one I/O before issuing
> the next will max out at 100 iops.
>
> Presumably ZFS should hide some of this from you[1], but it does suggest
> that each backup stream would be limited to about 100 files per second[2].
> This is because the read request for one file does not happen before the
> close of the previous file[3]. Since cyrus stores each message as a
> separate file, this suggests that 2.5 MB/s corresponds to an average mail
> message size of 25 KB.
>
> 1. via metadata caching, read-ahead on file data reads, etc.
> 2. Assuming wsvc_t + asvc_t = 10 ms
> 3. Assuming that networker is about as smart as tar, zip, cpio, etc.

There is a backup of a single filesystem in the pool going on right now:

# zpool iostat 5 5
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
space       1.05T   965G     97     69  5.24M  2.71M
space       1.05T   965G    113     10  6.41M   996K
space       1.05T   965G    100    112  2.87M  1.81M
space       1.05T   965G    112      8  2.35M  35.9K
space       1.05T   965G    106      3  1.76M  55.1K

And here are example service times:

# iostat -xpn 5 5
                            extended device statistics
   r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  17.1   29.2  746.7  317.1  0.0  0.6    0.0   12.5   0  27 c4t60A98000433469764E4A2D456A644A74d0
  25.0   11.9  991.9  277.0  0.0  0.6    0.0   16.1   0  36
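Mike's arithmetic is easy to check at a prompt:

  $ echo 'scale=1; 1/0.010' | bc         (a 10 ms round trip per file => 100 files/sec)
  $ echo 'scale=1; 2.5*1024/100' | bc    (2.5MB/sec over 100 files/sec => ~25.6KB per file)

And the asvc_t values of 12.5 and 16.1 ms above suggest the per-stream ceiling here is, if anything, somewhat below 100 files per second.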
Re: [zfs-discuss] Fwd: zfs fragmentation
On Sat, 2009-08-08 at 17:25, Mike Gerdts wrote:
> ndd -get /dev/tcp tcp_xmit_hiwat
> ndd -get /dev/tcp tcp_recv_hiwat
> grep tcp-nodelay /kernel/drv/iscsi.conf

# ndd -get /dev/tcp tcp_xmit_hiwat
2097152
# ndd -get /dev/tcp tcp_recv_hiwat
2097152
# grep tcp-nodelay /kernel/drv/iscsi.conf
#

> While backups are running (which is probably all the time given the
> backup rate), look at service times:
>   iostat -xzn 10

Oh crap. Looks like there are no backup jobs running right now. It must have just ended.

> Is networker cpu bound?
>   prstat -mL

No. The server is barely tasked by either the email system or networker.

> Some indication of how many backup jobs run concurrently would probably
> help frame any future discussion.

I'll get more info on the backups next week when the full backups run.

-- Ed
Re: [zfs-discuss] zfs fragmentation
Let me give a real-life example of what I believe is a fragmented zfs pool.

Currently the pool is 2 terabytes in size (55% used) and is made of 4 san luns (512GB each). The pool has never gotten close to being full. We increase the size of the pool by adding 2 512GB luns about once a year or so. The pool has been divided into 7 filesystems and is used for imap email data.

The email system (cyrus) has approximately 80,000 accounts, all located within the pool and evenly distributed between the filesystems. Each account has a directory associated with it; this directory is the user's inbox, and additional mail folders are subdirectories. Mail is stored as individual files. We receive mail at a rate of 0-20MB/second, every minute of every hour of every day of every week, etc. Users receive mail constantly over time; they read it and then either delete it or store it in a subdirectory/folder.

I imagine that my own mail (located in a single subdirectory structure) is spread over the entire pool because it has been received over time. I believe the data is highly fragmented from a file and directory perspective. The result is that backup throughput of a single filesystem in this pool is about 8GB/hour (we use EMC networker for backups). This is a problem.

There are no utilities available to evaluate this type of fragmentation, and no utilities to fix it. ZFS, from the mail system's perspective, works great: writes and random reads perform well. Backup is the problem, and not just because of small files, but because of small files scattered over the entire pool. Adding another pool and copying all or some data over to it would only be a short-term solution.

I believe zfs needs a feature that operates in the background and defrags the pool to optimize sequential reads of the file and directory structure.

Ed
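On the point that there are no utilities to evaluate this kind of fragmentation: one crude, read-only way to peek at the physical spread is zdb, which can dump a file's block pointers; the DVA column shows which vdev and offset each block landed on. zdb is an unsupported tool and the path below is made up, so treat this as a sketch only:

  # ls -i /space/fs1/user/e/edspencer/1234.
  (on zfs, the inode number ls reports is the object number)
  # zdb -ddddd space/fs1 <object-number>
  (look at the DVAs of the data blocks; messages in one mailbox directory
   whose blocks sit at wildly scattered offsets were written far apart in
   time, which is exactly the entropy pattern described above)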
Re: [zfs-discuss] Split responsibility for data with ZFS
I find this thread both interesting and disturbing. I'm fairly new to this list, so please excuse me if my comments/opinions are simplistic or just incorrect.

I think there's been too much FC SAN bashing, so let me change the example. What if you buy a 7000 series server (complete with zfs) and set up an IP SAN? You create a LUN and share it out to a Solaris 10 host. On the Solaris host you create a ZFS pool on that iscsi LUN. Now my understanding is that you will not be able to correct errors in the zpool on the Solaris 10 machine, because zfs on the Solaris 10 machine is not doing the raid.

Another example would be sharing out a lun to a vmware server, from your iscsi san or fc san, and creating Solaris 10 virtual machines with zfs booting. Another example would be Solaris 10 booting a zfs filesystem from a hardware-mirrored pair of drives.

These are standard implementations of machines in a datacenter, specifically ones I have installed. From following this thread, I now feel that if I have uncorrectable data errors on these zfs pools, there will be no way to easily repair the pool. I see no reason why, if I detect errors while scrubbing the zfs pool, I shouldn't be able to run a simple utility to fix the pool as I would a ufs filesystem, and then recover the corrupted files from tape.

I believe that for zfs to be used as a general purpose filesystem, it has to support these standard data center implementations; otherwise it will just become a specialized filesystem, like Netapp's WAFL, and there are a lot more servers than storage appliances in the datacenter.

I think this thread has put zfs in a negative light. I don't actually believe that I will experience many of these problems in an Enterprise class data center, but still, I don't look forward to having to deal with the consequences of encountering these types of problems. Maybe zfs is not ready to be considered a general purpose filesystem.

-- Ed Spencer
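One partial mitigation for the single-LUN case is worth noting (a sketch; the device name is made up). The copies property tells zfs to store extra replicas of each block, which gives the checksum layer something to repair from even when zfs itself has no mirror or raidz. It costs the extra space, only affects data written after it is set, and does not survive losing the whole LUN:

  # zpool create tank c4t0d0      (a single iscsi LUN, no zfs-level redundancy)
  # zfs set copies=2 tank

With copies=2, a scrub that finds a bad block can rewrite it from the second copy instead of just reporting an unrecoverable error.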
Re: [zfs-discuss] ZFS+NFS4 strange timestamps on file creation
Yes, I've seen them on nfs filesystems on Solaris 10 using a Netapp nfs server. Here's a link to a solution that I just implemented on a Solaris 10 server:

https://equoria.net/index.php/Value_too_large_for_defined_data_type

On Thu, 2008-12-04 at 15:31, Scott Williamson wrote:
> Has anyone seen files created on a linux client with negative or zero
> creation timestamps on zfs+nfs exported datasets?

-- 
Ed Spencer
UNIX System Administrator, Academic Computing and Networking
The University of Manitoba, Winnipeg, Manitoba, Canada R3T 2N2
EMail: [EMAIL PROTECTED]  Telephone: (204) 474-8311
http://home.cc.umanitoba.ca/~fastedy