[zfs-discuss] How to set zfs:zfs_recover=1 and aok=1 in GRUB at startup?
I can't edit my /etc/system file right now because the system is not booting. Is there a way to force these parameters on the Solaris kernel at boot time from GRUB? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to set zfs:zfs_recover=1 and aok=1 in GRUB at startup?
If I launch OpenSolaris with -kd I'm able to do this: aok/W 1 but if I type: zfs_recover/W 1 then I get an unknown symbol name error. Any idea how I could force these variables?
Re: [zfs-discuss] How to set zfs:zfs_recover=1 and aok=1 in GRUB at startup?
When I execute ::load zfs I get a kernel panic because of this @#$%* space_map_add problem.
Re: [zfs-discuss] resilver = defrag?
From: Haudy Kazemi [mailto:kaze0...@umn.edu]
> With regard to multiuser systems and how that negates the need to
> defragment, I think that is only partially true. As long as the files are
> defragmented enough so that each particular read request only requires one
> seek before it is time to service the next read request, further
> defragmentation may offer only marginal benefit. On the other ...

Here's a great way to quantify how much fragmentation is acceptable. Suppose you want to ensure at least 99% efficiency of the drive, i.e. at most 1% of its time wasted on seeking. Suppose you're talking about 7200rpm SATA drives, which sustain 500 Mbit/s transfer and have an average seek time of 8ms.

8ms is 1% of 800ms. In 800ms, the drive could read 400 Mbit of sequential data. That's 50 MB.

So as long as the fragments of your files are approximately 50 MB or larger, fragmentation has a negligible effect on performance: one seek per 50 MB read or written costs less than 1% of throughput.

For the heck of it, let's see how that would have computed with 15krpm SAS drives: sustained transfer 1 Gbit/s, average seek 3.5ms. 3.5ms is 1% of 350ms. In 350ms, the drive could read 350 Mbit (call it 43 MB). That's certainly in the same ballpark.
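The back-of-envelope calculation above can be sketched as a small shell function. The figures are the drive specs assumed in this post, not measurements; note that at 8 bits per byte the SATA case works out to 50 MB.

```shell
# Minimum fragment size such that one seek per fragment wastes at most
# 1% of the drive's time, given average seek time and sustained rate.
min_fragment_mb() {
    # $1 = average seek time (ms), $2 = sustained transfer (Mbit/s)
    awk -v seek_ms="$1" -v mbit_s="$2" 'BEGIN {
        read_ms = seek_ms / 0.01              # seek is 1% of this window
        bits    = mbit_s * 1e6 * read_ms / 1000
        printf "%.2f\n", bits / 8 / 1e6       # bits -> megabytes
    }'
}

min_fragment_mb 8 500      # 7200rpm SATA: prints 50.00
min_fragment_mb 3.5 1000   # 15krpm SAS:   prints 43.75
```

The 1% budget is arbitrary; halving it doubles the required fragment size, since the relationship is linear.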
Re: [zfs-discuss] resilver = defrag?
From: Richard Elling [mailto:rich...@nexenta.com]
>> With appropriate write caching and grouping or re-ordering of write
>> algorithms, it should be possible to minimize the amount of file
>> interleaving and fragmentation on write that takes place.
>
> To some degree, ZFS already does this. The dynamic block sizing tries to
> ensure that a file is written into the largest block[1]

Yes, but the block sizes in question are typically at most 128K. As computed in my email a minute ago, the fragment size needs to be on the order of tens of megabytes to effectively eliminate the performance loss from fragmentation.

> Also, ZFS has an intelligent prefetch algorithm that can hide some
> performance aspects of fragmentation on HDDs.

Unfortunately, prefetch can only hide fragmentation on systems that have idle disk time. Prefetch isn't going to help you if you actually need to transfer a whole file as fast as possible.
Re: [zfs-discuss] resilver = defrag?
Richard Elling wrote:
> Define fragmentation?

Maybe this is the wrong thread. I have noticed that an old pool can take 4 hours to scrub, with a large portion of the time spent reading from the pool disks at the rate of 150+ MB/s while zpool iostat reports a 2 MB/s read speed. My naive interpretation is that the data the scrub is looking for has become fragmented.

Should I refresh the pool by zfs sending it to another pool and then zfs receiving the data back again, the same scrub can take less than an hour, with zpool iostat reporting more sane throughput.

On an old pool which had lots of snapshots come and go, scrub throughput is awful. On that same data, refreshed via zfs send/receive, the throughput is much better. It would appear to me that this is an artifact of fragmentation, although I have nothing scientific on which to base this. Additional unscientific observations lead me to believe these same refreshed pools also perform better for non-scrub activities.
[zfs-discuss] dedicated ZIL/L2ARC
We are looking into the possibility of adding dedicated ZIL and/or L2ARC devices to our pool. We are looking at getting four 32GB Intel X25-E SSD drives. Would this be a good solution to slow write speeds? We are currently sharing out different slices of the pool to Windows servers using COMSTAR and fibre channel. We are currently getting around 300MB/sec performance with 70-100% disk busy.

OpenSolaris snv_134
Dual 3.2GHz quad-cores with hyperthreading
16GB RAM
Pool_1: 18 raidz2 groups with 5 drives apiece and 2 hot spares
Disks are around 30% full
No dedup
Re: [zfs-discuss] dedicated ZIL/L2ARC
On Tue, Sep 14, 2010 at 06:59:07AM -0700, Wolfraider wrote:
> We are looking into the possibility of adding dedicated ZIL and/or L2ARC
> devices to our pool. We are looking at getting four 32GB Intel X25-E SSD
> drives. Would this be a good solution to slow write speeds? [...]

It'll probably help. I'd get two X25-E's for the ZIL (and mirror them) and one or two of Intel's lower-end X25-M drives for the L2ARC. There are some SSD devices out there with a supercapacitor and significantly higher IOPS ratings than the X25-E that might be a better choice for a ZIL device, but the X25-E is a solid drive and we have many of them deployed as ZIL devices here.

Ray
Re: [zfs-discuss] How to set zfs:zfs_recover=1 and aok=1 in GRUB at startup?
Here is the solution (thanks to Gavin Maltby from the mdb forum). Boot with the -kd option to enter kmdb and type the following commands:

aok/W 1
::bp zfs`zfs_panic_recover
:c

Wait until it stops at the breakpoint, then type:

zfs_recover/W 1
:z
:c
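Once the system comes up again, the same two variables can be made persistent in /etc/system so the kmdb step isn't needed on every boot. A sketch (these are temporary recovery settings; remove them once the pool is repaired):

```
* /etc/system -- temporary ZFS recovery tunables; remove after repair
set aok=1
set zfs:zfs_recover=1
```

The `module:variable` form scopes the setting to the zfs module, which is why the bare `zfs_recover` symbol wasn't visible in kmdb before the module was loaded.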
Re: [zfs-discuss] dedicated ZIL/L2ARC
Cool, we can get the Intel X25-E's for around $300 apiece from HP with the sled. I don't see the X25-M available, so we will look at 4 of the X25-E's. Thanks :)
Re: [zfs-discuss] dedicated ZIL/L2ARC
On Tue, Sep 14, 2010 at 08:08:42AM -0700, Ray Van Dolson wrote:
> On Tue, Sep 14, 2010 at 06:59:07AM -0700, Wolfraider wrote:
> > We are looking into the possibility of adding dedicated ZIL and/or L2ARC
> > devices to our pool. We are looking at getting four 32GB Intel X25-E SSD
> > drives. Would this be a good solution to slow write speeds? [...]
>
> It'll probably help. I'd get two X25-E's for the ZIL (and mirror them) and
> one or two of Intel's lower-end X25-M drives for the L2ARC. [...]

I thought Intel SSDs didn't respect the CACHE FLUSH command and thus are subject to ZIL corruption if the server crashes or loses power?

-- Pasi
[zfs-discuss] What is the 1000 bit?
I recently created a test zpool (RAIDZ) on some iSCSI shares. I made a few test directories and files. When I do a listing, I see something I've never seen before:

[r...@hostname anewdir] # ls -la
total 6160
drwxr-xr-x   2 root     other          4 Sep 14 14:16 .
drwxr-xr-x   4 root     root           5 Sep 14 15:04 ..
-rw------T   1 root     other    2097152 Sep 14 14:16 barfile1
-rw------T   1 root     other    1048576 Sep 14 14:16 foofile1

I looked up the T bit in the man page for ls, and it says that T means "The 1000 bit is turned on, and execution is off (undefined bit-state)." Which is as clear as mud. I've googled around a lot but still can't find any real info about what this means. I've been doing unix for a long time and have never seen it. Can anyone explain, or at least tell me if I should worry? Thanks.
Re: [zfs-discuss] What is the 1000 bit?
On Tue, Sep 14, 2010 at 04:13:31PM -0400, Linder, Doug wrote:
> I looked up the T bit in the man page for ls, and it says that T means
> "The 1000 bit is turned on, and execution is off (undefined bit-state)."
> Which is as clear as mud. [...]

It's the sticky bit. Nowadays it's only useful on directories, and really it's generally only used with 777 permissions. The chmod(1) (man -M /usr/man chmod) and chmod(2) (man -s 2 chmod) manpages describe the sticky bit.

Nico --
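For anyone who wants to see the bit in action, a minimal sketch in a POSIX shell: ls reports the sticky bit in the others-execute column, capital T when execute is off and lowercase t when it is on.

```shell
# Demonstrate the sticky (01000) bit on a scratch file.
f=$(mktemp)
chmod 0644 "$f"             # plain file: -rw-r--r--
chmod +t "$f"               # turn on the 1000 (sticky) bit
ls -l "$f" | cut -c1-10     # -rw-r--r-T  (T: sticky on, execute off)
chmod o+x "$f"
ls -l "$f" | cut -c1-10     # -rw-r--r-t  (t: sticky on, execute on)
rm -f "$f"
```

The T in the listing above means the same thing: mode 01000 set with no execute bits, which is harmless on regular files.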
Re: [zfs-discuss] resilver = defrag?
The difference between multi-user thinking and single-user thinking is really quite dramatic in this area. I came up on the time-sharing side (PDP-8, PDP-11, DECSYSTEM-20); TOPS-20 didn't have any sort of disk defragmenter, and nobody thought one was particularly desirable, because the normal access pattern of a busy system was spread all across the disk packs anyway.

On a desktop workstation, it makes some sense to think about loading big executable files fast -- that's something the user is sitting there waiting for, and there's often nothing else going on at that exact moment. (There *could* be significant things happening in the background, but quite often there aren't.) Similarly, loading a big document (single-file book manuscript, bitmap image, or whatever) happens at a point where the user has requested it and is waiting for it right then, and there's mostly nothing else going on.

But on really shared disk space (either on a timesharing system, or a network file server serving a good-sized user base), the user is competing for disk activity (either bandwidth or IOPS, depending on the access pattern of the users). Generally you don't get to load your big DLL in one read -- and to the extent that you don't, it doesn't matter much how it's spread around the disk, because the head won't be in the same spot when you get your turn again.

-- David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/ Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/ Photos: http://dd-b.net/photography/gallery/ Dragaera: http://dragaera.info
[zfs-discuss] Unwanted filesystem mounting when using send/recv
I am looking at backing up my fileserver by replicating the filesystems onto an external disk using send/recv with something similar to:

zfs send ... myp...@snapshot | zfs recv -d backup

but have run into a bit of a gotcha with the mountpoint property:

- If I use zfs send -R ... then the mountpoint gets replicated and the backup gets mounted over the top of my real filesystems.
- If I skip the '-R' then none of the properties get backed up.

Is there some way to have zfs recv not automatically mount filesystems when it creates them?

-- Peter Jeremy
Re: [zfs-discuss] Unwanted filesystem mounting when using send/recv
On 09/15/10 12:56 PM, Peter Jeremy wrote:
> Is there some way to have zfs recv not automatically mount filesystems
> when it creates them?

Use -u with zfs receive.

-- Ian.
Re: [zfs-discuss] Unwanted filesystem mounting when using send/recv
On 2010/09/14 17:56, Peter Jeremy wrote:
> Is there some way to have zfs recv not automatically mount filesystems
> when it creates them?

zfs receive has a '-u' option to specify that no mount should be done. By the way, it might be a good idea not to specify a mountpoint at the sending site, which makes replication easier (one way is to give the topmost layer mountpoint=/ but canmount=off).

Cheers,
-- Xin LI delp...@delphij.net http://www.delphij.net/ FreeBSD - The Power to Serve! Live free or die
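Putting the thread's advice together, a hedged sketch of the whole backup step (the pool and snapshot names here are illustrative, not from the original posts):

```
# Replicate recursively, but keep the received copies unmounted (-u):
zfs snapshot -r mypool@backup1
zfs send -R mypool@backup1 | zfs receive -d -u backup
```

Note that -u only suppresses mounting at receive time; the replicated mountpoint properties still land on the backup pool and would apply on a later import, which is why keeping a neutral mountpoint (with canmount=off) on the source's top layer is worth considering.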
Re: [zfs-discuss] dedicated ZIL/L2ARC
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Wolfraider
> We are looking into the possibility of adding dedicated ZIL and/or L2ARC
> devices to our pool. We are looking at getting four 32GB Intel X25-E SSD
> drives. Would this be a good solution to slow write speeds?

If you have slow write speeds, a dedicated log device might help (log devices are for writes, not for reads). It sounds like your machine is an iSCSI target, in which case you're certainly doing a lot of sync writes and therefore hitting your ZIL hard, so it's all but certain that adding dedicated log devices will help.

One thing to be aware of: once you add a dedicated log, *all* of your sync writes will hit that log device. While a single SSD or pair of SSDs will have fast IOPS, they can easily become a new bottleneck with worse performance than what you had before. If you've got 80 spindle disks now and by any chance perform sequential sync writes, then a single pair of SSDs won't compete. I'd suggest adding several SSDs as log devices, with no mirroring. Perhaps one SSD for every raidz2 vdev, or every other, or every third, depending on what you can afford.

If you have slow reads, an L2ARC cache might help (cache devices are for reads, not writes).

> We are currently sharing out different slices of the pool to windows
> servers using comstar and fibrechannel. We are currently getting around
> 300MB/sec performance with 70-100% disk busy.

You may be facing some other problem, aside from just lacking cache/log devices. I suggest giving us some more detail here, such as: large sequential operations are good on raidz2, but random IO performs pretty poorly on raidz2. What sort of network are you using? I know you said COMSTAR and fibre channel, sharing slices to Windows ... I assume this means you're doing iSCSI, right? Dual 4Gbit links per server? You're getting 2.4 Gbit and you expect what?
You have a pool made up of 18 raidz2 vdevs with 5 drives each (capacity of 3 disks each) ... Is each vdev on its own bus? What type of bus is it? (Generally speaking, it is preferable to spread vdevs across buses instead of putting one vdev on one bus, for reliability.) How many disks, of what type, on each bus? What type of bus, at what speed? What are the usage characteristics, and how are you making your measurement?
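For reference, attaching the devices discussed in this thread is one zpool command each; a sketch with hypothetical device names (the cXtYdZ names are placeholders, not taken from the post):

```
# Mirrored dedicated log (slog) pair, e.g. two X25-E's:
zpool add Pool_1 log mirror c10t0d0 c10t1d0

# One or more L2ARC cache devices, e.g. X25-M's:
zpool add Pool_1 cache c10t2d0 c10t3d0

# Confirm the new vdev layout:
zpool status Pool_1
```

Log devices can be mirrored (a pool loses recent sync writes if an unmirrored slog dies at the wrong moment); cache devices cannot and need not be, since L2ARC contents are just a copy of pool data.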