[zfs-discuss] Any experience with an OCZ Z-Drive R2 with ZFS
After a rather fruitless, non-committal exchange with OCZ, I'd like to know if there is any experience in this community with the OCZ Z-Drive. In particular, is it possible (and worthwhile) to put the device in JBOD as opposed to RAID-0 mode? It is an entry-level Flashfire F20 "sort of" card; FYI, the controller card is an LSI SAS1068E. It would appear interesting, if feasible, to create a ZFS mirrored boot drive; the additional devices (mirrored or not) might conveniently serve as data or cache devices. Does anybody have one who has tested this, or would be willing to (even a first-generation card, for that matter)? Thanks in advance -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] CR# 6574286, remove slog device
Did the fix for 6733267 make it to 134a (2010.05)? It isn't marked fixed, and I couldn't find it anywhere in the changelogs. Does that mean we'll have to wait for 2010.11 (or whatever v+2 is named)? Thanks, Moshe -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS and Comstar iSCSI BLK size
On Mon, May 10, 2010 at 3:53 PM, Geoff Nordli wrote:
> Doesn't this alignment have more to do with aligning writes to the
> stripe/segment size of a traditional storage array? The articles I am

It is a lot like a stripe / segment size. If you want to think of it in those terms, you've got a segment of 512b (the iscsi block size) and a width of 16, giving you an 8k stripe size. Any write that is less than 8k will require a RMW cycle, and any write in multiples of 8k will do "full stripe" writes. If the write doesn't start on an 8k boundary, you risk having writes span multiple underlying zvol blocks.

There's an explanation of WD's "Advanced Format" at Anandtech that describes the problem with 4k physical sectors, here: http://www.anandtech.com/show/2888. Instead of sector, think zvol block though. When using a zvol, you've essentially got $volblocksize-sized physical sectors, but the initiator sees the 512b block size that the LUN is reporting. If you don't block align, you risk having a write straddle two zfs blocks.

There may be some benefit to using a 4k volblocksize, but you'll use more time and space on block checksums, etc., in your zpool. I think 8k is a reasonable trade-off.

> reading suggests creating a small unused partition to take up the space up
> to 127bytes (assuming 128byte segment), then create the real partition from
> the 128th sector going forward. I am not sure how this would happen with
> zfs.

If you're using the whole disk with zfs, you don't need to worry about it. If you're using fdisk partitions or slices, you need to be a little more careful.

I made an attempt to 4k block align the SSD that I'm using for a slog / L2ARC, which in theory should line up better with the device's erase boundary. While not really pertinent to this discussion, it gives some idea of how to do it.

You want the filesystem to start at a point where ( $offset * $sector_size * $sectors_per_cylinder ) % 4096 = 0. For most LBA drives, you've got 16065 sectors/cylinder and 512b sectors, giving 8 as the smallest offset that will align: ( 8 * 512 * 16065 ) % 4096 = 0.

First you have to look at fdisk (on an SMI-labeled disk) and realize that you're going to lose the first cylinder to the MBR. When you then create slices in format, it'll report one cylinder less than fdisk did, so remember to account for that in your offset.

For an iscsi LUN used by a VM, you should align its filesystem on a zvol block boundary. Windows Vista and Server 2008 use 240 heads & 63 sectors/track, so they are already 8k block aligned. Linux, Solaris, and BSD also let you specify the geometry used by fdisk, but I wasn't comfortable doing it with Solaris since you have to create a geometry file first.

For my 30GB OCZ Vertex:

bh...@basestar:~$ pfexec fdisk -W - /dev/rdsk/c1t0d0p0
* /dev/rdsk/c1t0d0p0 default fdisk table
* Dimensions:
*    512 bytes/sector
*     63 sectors/track
*    255 tracks/cylinder
*   3892 cylinders
[..]
* Id   Act  Bhead  Bsect  Bcyl  Ehead  Esect  Ecyl   Rsect  Numsect
  191  128  0      1      1     254    63     1023   16065  62508915

bh...@basestar:~$ pfexec prtvtoc /dev/rdsk/c1t0d0p0
* /dev/rdsk/c1t0d0p0 partition map
*
* Dimensions:
*     512 bytes/sector
*      63 sectors/track
*     255 tracks/cylinder
*   16065 sectors/cylinder
*    3891 cylinders
*    3889 accessible cylinders
*
* Flags:
*   1: unmountable
*  10: read-only
*
* Unallocated space:
*       First     Sector    Last
*       Sector    Count     Sector
*       0         112455    112454
*       62428590  48195     62476784
*
*                           First     Sector    Last
* Partition  Tag  Flags     Sector    Count     Sector    Mount Directory
       0      4    00       112455    2056320   2168774
       1      4    01       2168775   60243750  62412524
       2      5    01       0         62508915  62508914
       8      1    01       0         16065     16064

-B
-- Brandon High : bh...@freaks.com
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
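As an aside on the alignment arithmetic above, here is a minimal sketch (not from the original post) of finding the smallest cylinder offset whose starting byte is 4k-aligned, assuming the typical 16065 sectors/cylinder and 512-byte sectors:

#!/bin/ksh
# Find the smallest slice-start offset, in cylinders, that falls on a 4k boundary.
sectors_per_cyl=16065
sector_size=512
offset=1
while [ $(( offset * sectors_per_cyl * sector_size % 4096 )) -ne 0 ]; do
    offset=$(( offset + 1 ))
done
echo "smallest 4k-aligned cylinder offset: $offset"    # prints 8 for this geometry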
Re: [zfs-discuss] ZFS and Comstar iSCSI BLK size
>-Original Message- >From: Brandon High [mailto:bh...@freaks.com] >Sent: Monday, May 10, 2010 3:12 PM > >On Mon, May 10, 2010 at 1:53 PM, Geoff Nordli wrote: >> You are right, I didn't look at that property, and instead I was >> focused on the record size property. > >zvols don't have a recordsize - That's a property of filesystem datasets, not >volumes. Awesome, that makes things a lot clearer now :) > >> When I look at the stmfadm llift-lu -v it shows me the block size >> of "512". I am running NexentaCore 3.0 (b134+) . I wonder if the >> default size has changed with different versions. > >I see what you're referring to. The iscsi block size, which is what the LUN reports >to initiator as it's block size, vs. the block size written to disk. So in essence this is the disk "sector" size, again makes sense. Are people actually changing this value? > > >> As long as you are using a multiple of the file system block size, >> then alignment shouldn't be a problem with iscsi based zvols. When >> using a zvol comstar stores the metadata in a zvol object; instead of >> the first part of the volume. > >There can be an "off by one" error which will cause small writes to span blocks. If >the data is not block aligned, then a 4k write causes two read/modify/writes (on >zfs two blocks have to be read then written and block pointers updated) whereas >an aligned write will not require the existing data to be read. This is assuming that >the zvol block size = VM fs block size = 4k. In the case where the zvol block size is >a multiple of the VM fs block size (eg 4k VM fs, 8k zvol), then writing one fs block >will alway require a read for an aligned filesystem, but could require two for an >unaligned fs if the VM fs block spans two zvol blocks. > >There's been a lot of discussion about this lately with the introduction of WD's 4k >sector drives, since they have a 512b sector emulation mode. > Doesn't this alignment have more to do with aligning writes to the stripe/segment size of a traditional storage array? The articles I am reading suggests creating a small unused partition to take up the space up to 127bytes (assuming 128byte segment), then create the real partition from the 128th sector going forward. I am not sure how this would happen with zfs. Thanks for clearing up my misconceptions. Geoff ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
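If someone did want the LUN to advertise a different logical block size, COMSTAR lets it be set when the logical unit is created. A rough sketch follows, with placeholder pool and volume names, assuming the blk property of stmfadm create-lu behaves as documented:

# Create an 8k-volblocksize zvol and export it as a LU that reports
# 4096-byte blocks to initiators instead of the default 512.
zfs create -V 50G -o volblocksize=8k tank/vmvol
stmfadm create-lu -p blk=4096 /dev/zvol/rdsk/tank/vmvol
stmfadm list-lu -v        # the "Block Size" field should now show 4096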
Re: [zfs-discuss] ZFS and Comstar iSCSI BLK size
On Mon, May 10, 2010 at 1:53 PM, Geoff Nordli wrote:
> You are right, I didn't look at that property, and instead I was focused on
> the record size property.

zvols don't have a recordsize - that's a property of filesystem datasets, not volumes.

> When I look at the stmfadm list-lu -v it shows me the block size of
> "512". I am running NexentaCore 3.0 (b134+). I wonder if the default size
> has changed with different versions.

I see what you're referring to: the iscsi block size, which is what the LUN reports to the initiator as its block size, vs. the block size written to disk. Remember that up until very recently, most drives used 512 byte blocks. Most OSes expect a 512b block and make certain assumptions based on that, which is probably why it's the default.

>> Ensuring that your VM is block-aligned to 4k (or the guest OS's block
>> size) boundaries will help performance and dedup as well.
>
> This is where I am probably the most confused; I need to get it straightened
> out in my mind. I thought dedup and compression were done at the record level.

It's at the record level for filesystems, and at the block level for zvols.

> As long as you are using a multiple of the file system block size, then
> alignment shouldn't be a problem with iscsi based zvols. When using a zvol
> comstar stores the metadata in a zvol object; instead of the first part of
> the volume.

There can be an "off by one" error which will cause small writes to span blocks. If the data is not block aligned, then a 4k write causes two read/modify/writes (on zfs, two blocks have to be read, then written, and block pointers updated), whereas an aligned write will not require the existing data to be read. This is assuming that the zvol block size = VM fs block size = 4k. In the case where the zvol block size is a multiple of the VM fs block size (e.g. 4k VM fs, 8k zvol), then writing one fs block will always require a read for an aligned filesystem, but could require two for an unaligned fs if the VM fs block spans two zvol blocks.

There's been a lot of discussion about this lately with the introduction of WD's 4k sector drives, since they have a 512b sector emulation mode.

> What is the relationship between iscsi blk size and zvol block size?

There is none. The iscsi block size is what the target LUN reports to initiators. volblocksize is what size chunks are written to the pool.

> What is the relationship between zvol block size and zvol record size?

They are never both present on a dataset. volblocksize is only for volumes, recordsize is only for filesystems. Both control the size of the unit of data written to the pool. This unit of data is what the checksum is calculated on, and what the compression and dedup are performed on.

-B
-- Brandon High : bh...@freaks.com
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
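To see the distinction in practice, a quick sketch (pool and dataset names are only examples):

# volblocksize applies to volumes and is fixed at creation time:
zfs create -V 20G -o volblocksize=8k tank/vol1
zfs get volblocksize tank/vol1

# recordsize applies to filesystems and can be changed later
# (it only affects newly written files):
zfs create -o recordsize=128k tank/fs1
zfs set recordsize=16k tank/fs1
zfs get recordsize tank/fs1

# Asking a volume for recordsize (or a filesystem for volblocksize) just
# returns "-", since that property does not exist on that dataset type.
zfs get recordsize tank/vol1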
Re: [zfs-discuss] osol monitoring question
On 10/05/2010 16:52, Peter Tribble wrote: For zfs, zpool iostat has some utility, but I find fsstat to be pretty useful. iostat, zpool iostat, fsstat - all of them are very useful and allow you to monitor I/O on different levels. And of course dtrace io, fsinfo and syscall providers are very useful at times. -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
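For example, a couple of illustrative one-liners along those lines:

# Count physical I/Os by the process that issued them:
dtrace -n 'io:::start { @[execname] = count(); }'

# Count VFS-level operations by operation name:
dtrace -n 'fsinfo::: { @[probename] = count(); }'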
Re: [zfs-discuss] zpool import hanging
On May 10, 2010, at 6:28 PM, Cindy Swearingen wrote:

Hi Eduardo, Please use the following steps to collect more information:

1. Use the following command to get the PID of the zpool import process, like this:

# ps -ef | grep zpool

2. Use the actual PID found in step 1 in the following command, like this:

echo "0t<PID>::pid2proc|::walk thread|::findstack" | mdb -k

Then, send the output.

Hi Cindy,

first of all, thank you for taking your time to answer my question. Here's the output of the command you requested:

# echo "0t733::pid2proc|::walk thread|::findstack" | mdb -k
stack pointer for thread 94e4db40: fe8000d3e5b0
[ fe8000d3e5b0 _resume_from_idle+0xf8() ]
  fe8000d3e5e0 swtch+0x12a()
  fe8000d3e600 cv_wait+0x68()
  fe8000d3e640 txg_wait_open+0x73()
  fe8000d3e670 dmu_tx_wait+0xc5()
  fe8000d3e6a0 dmu_tx_assign+0x38()
  fe8000d3e700 dmu_free_long_range_impl+0xe6()
  fe8000d3e740 dmu_free_long_range+0x65()
  fe8000d3e790 zfs_trunc+0x77()
  fe8000d3e7e0 zfs_freesp+0x66()
  fe8000d3e830 zfs_space+0xa9()
  fe8000d3e850 zfs_shim_space+0x15()
  fe8000d3e890 fop_space+0x2e()
  fe8000d3e910 zfs_replay_truncate+0xa8()
  fe8000d3e9b0 zil_replay_log_record+0x1ec()
  fe8000d3eab0 zil_parse+0x2ff()
  fe8000d3eb30 zil_replay+0xde()
  fe8000d3eb50 zfsvfs_setup+0x93()
  fe8000d3ebc0 zfs_domount+0x2e4()
  fe8000d3ecc0 zfs_mount+0x15d()
  fe8000d3ecd0 fsop_mount+0xa()
  fe8000d3ee00 domount+0x4d7()
  fe8000d3ee80 mount+0x105()
  fe8000d3eec0 syscall_ap+0x97()
  fe8000d3ef10 _sys_sysenter_post_swapgs+0x14b()

The first message from this thread has three files attached with information from truss (tracing zpool import), zdb output and the entire list of threads taken from 'echo "::threadlist -v" | mdb -k'.

Thanks,
Eduardo Bragatto
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import hanging
Hi Eduardo,

Please use the following steps to collect more information:

1. Use the following command to get the PID of the zpool import process, like this:

# ps -ef | grep zpool

2. Use the actual PID found in step 1 in the following command, like this:

echo "0t<PID>::pid2proc|::walk thread|::findstack" | mdb -k

Then, send the output.

Thanks,
Cindy

On 05/10/10 14:22, Eduardo Bragatto wrote:
On May 10, 2010, at 4:46 PM, John Balestrini wrote: Recently I had a similar issue where the pool wouldn't import and attempting to import it would essentially lock the server up. Finally I used pfexec zpool import -F pool1 and simply let it do its thing. After almost 60 hours the import finished and all has been well since (except my backup procedures have improved!).

Hey John,

thanks a lot for answering -- I already allowed the "zpool import" command to run from Friday to Monday and it did not complete -- I also made sure to start it using "truss" and literally nothing has happened during that time (the truss output file does not have anything new).

While the "zpool import" command runs, I don't see any CPU or Disk I/O usage. "zpool iostat" shows very little I/O too:

# zpool iostat -v
                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
-----------  ------  -----  -----  -----  -----  -----
backup        31.4T  19.1T     11      2  29.5K  11.8K
  raidz1      11.9T   741G      2      0  3.74K  3.35K
    c3t102d0      -      -      0      0  23.8K  1.99K
    c3t103d0      -      -      0      0  23.5K  1.99K
    c3t104d0      -      -      0      0  23.0K  1.99K
    c3t105d0      -      -      0      0  21.3K  1.99K
    c3t106d0      -      -      0      0  21.5K  1.98K
    c3t107d0      -      -      0      0  24.2K  1.98K
    c3t108d0      -      -      0      0  23.1K  1.98K
  raidz1      12.2T   454G      3      0  6.89K  3.94K
    c3t109d0      -      -      0      0  43.7K  2.09K
    c3t110d0      -      -      0      0  42.9K  2.11K
    c3t111d0      -      -      0      0  43.9K  2.11K
    c3t112d0      -      -      0      0  43.8K  2.09K
    c3t113d0      -      -      0      0  47.0K  2.08K
    c3t114d0      -      -      0      0  42.9K  2.08K
    c3t115d0      -      -      0      0  44.1K  2.08K
  raidz1      3.69T  8.93T      3      0  9.42K    610
    c3t87d0       -      -      0      0  43.6K  1.50K
    c3t88d0       -      -      0      0  43.9K  1.48K
    c3t89d0       -      -      0      0  44.2K  1.49K
    c3t90d0       -      -      0      0  43.4K  1.49K
    c3t91d0       -      -      0      0  42.5K  1.48K
    c3t92d0       -      -      0      0  44.5K  1.49K
    c3t93d0       -      -      0      0  44.8K  1.49K
  raidz1      3.64T  8.99T      3      0  9.40K  3.94K
    c3t94d0       -      -      0      0  31.9K  2.09K
    c3t95d0       -      -      0      0  31.6K  2.09K
    c3t96d0       -      -      0      0  30.8K  2.08K
    c3t97d0       -      -      0      0  34.2K  2.08K
    c3t98d0       -      -      0      0  34.4K  2.08K
    c3t99d0       -      -      0      0  35.2K  2.09K
    c3t100d0      -      -      0      0  34.9K  2.08K
-----------  ------  -----  -----  -----  -----  -----

Also, the third "raidz" entry shows less "write" in bandwidth (610). This is actually the first time it's a non-zero value.

My last attempt to import it was using this command:

zpool import -o failmode=panic -f -R /altmount backup

However it did not panic. As I mentioned in the first message, it mounts 189 filesystems and hangs on #190. While the command is hanging, I can use "zfs mount" to mount filesystems #191 and above (only one filesystem does not mount and causes the import procedure to hang).

Before trying the command above, I was using only "zpool import backup", and the "iostat" output was showing ZERO for the third raidz from the list above (not sure if that means something, but it does look odd).

I'm really at a dead end here, any help is appreciated.

Thanks,
Eduardo Bragatto.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
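The two steps can also be combined into one pipeline, a small sketch assuming only a single zpool process is running:

# Grab the PID of the hung "zpool import" and dump its kernel thread stacks:
pid=$(pgrep -x zpool)
echo "0t${pid}::pid2proc|::walk thread|::findstack" | mdb -k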
Re: [zfs-discuss] ZFS and Comstar iSCSI BLK size
>-Original Message-
>From: Brandon High [mailto:bh...@freaks.com]
>Sent: Monday, May 10, 2010 9:55 AM
>
>On Sun, May 9, 2010 at 9:42 PM, Geoff Nordli wrote:
>> I am looking at using 8K block size on the zfs volume.
>
>8k is the default for zvols.
>

You are right, I didn't look at that property, and instead I was focused on the record size property.

>> I was looking at the comstar iscsi settings and there is also a blk
>> size configuration, which defaults as 512 bytes. That would make me
>> believe that all of the IO will be broken down into 512 bytes which
>> seems very inefficient.
>
>I haven't done any tuning on my comstar volumes, and they're using 8k blocks.
>The setting is in the dataset's volblocksize parameter.

When I look at the stmfadm list-lu -v output it shows me a block size of "512". I am running NexentaCore 3.0 (b134+). I wonder if the default size has changed with different versions.

>
>> It seems this value should match the file system allocation/cluster
>> size in the VM, maybe 4K if you are using an ntfs file system.
>
>You'll have more overhead using smaller volblocksize values, and get worse
>compression (since compression is done on the block). If you have dedup
>enabled, you'll create more entries in the DDT which can have pretty disastrous
>consequences on write performance.
>
>Ensuring that your VM is block-aligned to 4k (or the guest OS's block
>size) boundaries will help performance and dedup as well.

This is where I am probably the most confused; I need to get it straightened out in my mind. I thought dedup and compression were done at the record level.

As long as you are using a multiple of the file system block size, then alignment shouldn't be a problem with iscsi based zvols. When using a zvol, comstar stores the metadata in a zvol object instead of in the first part of the volume.

As Roy pointed out, you have to be careful with the record size because the DDT and L2ARC lists can consume lots of RAM.

But it seems you have four things to look at: file system block size -> iSCSI blk size -> zvol block size -> zvol record size.

What is the relationship between iscsi blk size and zvol block size?

What is the relationship between zvol block size and zvol record size?

Thanks,
Geoff
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Time Slider in Solaris10
Oh.. thanks.. I did download the latest zfs-auto-snapshot: zfs-snapshot-0.11.2 Is there a more recent version? John Balestrini wrote: I believe that Time Slider is just a front end for zfs-auto-snapshot. John On May 10, 2010, at 1:17 PM, Mary Ellen Fitzpatrick wrote: Is Time Slider available in Solaris10? Or just in Opensolaris? I am running Solaris 10 5/09 s10x_u7wos_08 X86 and wanted to automate my snapshots. From reading blogs, seems zfs-auto-snapshot is obsolete and was/is being replaced by time-slider. But I can not seem to find it for Solaris10. I do have a script/cron job that will work, but wanted to test out Time Slider -- Thanks Mary Ellen ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Thanks Mary Ellen Mary Ellen FitzPatrick Systems Analyst Bioinformatics Boston University 24 Cummington St. Boston, MA 02215 office 617-358-2771 cell 617-797-7856 mfitz...@bu.edu ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import hanging
On May 10, 2010, at 4:46 PM, John Balestrini wrote: Recently I had a similar issue where the pool wouldn't import and attempting to import it would essentially lock the server up. Finally I used pfexec zpool import -F pool1 and simply let it do its thing. After almost 60 hours the import finished and all has been well since (except my backup procedures have improved!).

Hey John,

thanks a lot for answering -- I already allowed the "zpool import" command to run from Friday to Monday and it did not complete -- I also made sure to start it using "truss" and literally nothing has happened during that time (the truss output file does not have anything new).

While the "zpool import" command runs, I don't see any CPU or Disk I/O usage. "zpool iostat" shows very little I/O too:

# zpool iostat -v
                 capacity     operations    bandwidth
pool           used  avail   read  write   read  write
-----------  ------  -----  -----  -----  -----  -----
backup        31.4T  19.1T     11      2  29.5K  11.8K
  raidz1      11.9T   741G      2      0  3.74K  3.35K
    c3t102d0      -      -      0      0  23.8K  1.99K
    c3t103d0      -      -      0      0  23.5K  1.99K
    c3t104d0      -      -      0      0  23.0K  1.99K
    c3t105d0      -      -      0      0  21.3K  1.99K
    c3t106d0      -      -      0      0  21.5K  1.98K
    c3t107d0      -      -      0      0  24.2K  1.98K
    c3t108d0      -      -      0      0  23.1K  1.98K
  raidz1      12.2T   454G      3      0  6.89K  3.94K
    c3t109d0      -      -      0      0  43.7K  2.09K
    c3t110d0      -      -      0      0  42.9K  2.11K
    c3t111d0      -      -      0      0  43.9K  2.11K
    c3t112d0      -      -      0      0  43.8K  2.09K
    c3t113d0      -      -      0      0  47.0K  2.08K
    c3t114d0      -      -      0      0  42.9K  2.08K
    c3t115d0      -      -      0      0  44.1K  2.08K
  raidz1      3.69T  8.93T      3      0  9.42K    610
    c3t87d0       -      -      0      0  43.6K  1.50K
    c3t88d0       -      -      0      0  43.9K  1.48K
    c3t89d0       -      -      0      0  44.2K  1.49K
    c3t90d0       -      -      0      0  43.4K  1.49K
    c3t91d0       -      -      0      0  42.5K  1.48K
    c3t92d0       -      -      0      0  44.5K  1.49K
    c3t93d0       -      -      0      0  44.8K  1.49K
  raidz1      3.64T  8.99T      3      0  9.40K  3.94K
    c3t94d0       -      -      0      0  31.9K  2.09K
    c3t95d0       -      -      0      0  31.6K  2.09K
    c3t96d0       -      -      0      0  30.8K  2.08K
    c3t97d0       -      -      0      0  34.2K  2.08K
    c3t98d0       -      -      0      0  34.4K  2.08K
    c3t99d0       -      -      0      0  35.2K  2.09K
    c3t100d0      -      -      0      0  34.9K  2.08K
-----------  ------  -----  -----  -----  -----  -----

Also, the third "raidz" entry shows less "write" in bandwidth (610). This is actually the first time it's a non-zero value.

My last attempt to import it was using this command:

zpool import -o failmode=panic -f -R /altmount backup

However it did not panic. As I mentioned in the first message, it mounts 189 filesystems and hangs on #190. While the command is hanging, I can use "zfs mount" to mount filesystems #191 and above (only one filesystem does not mount and causes the import procedure to hang).

Before trying the command above, I was using only "zpool import backup", and the "iostat" output was showing ZERO for the third raidz from the list above (not sure if that means something, but it does look odd).

I'm really at a dead end here, any help is appreciated.

Thanks,
Eduardo Bragatto.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Time Slider in Solaris10
I believe that Time Slider is just a front end for zfs-auto-snapshot. John On May 10, 2010, at 1:17 PM, Mary Ellen Fitzpatrick wrote: > Is Time Slider available in Solaris10? Or just in Opensolaris? > I am running Solaris 10 5/09 s10x_u7wos_08 X86 and wanted to automate my > snapshots. From reading blogs, seems zfs-auto-snapshot is obsolete and was/is > being replaced by time-slider. But I can not seem to find it for Solaris10. > I do have a script/cron job that will work, but wanted to test out Time Slider > > -- > Thanks > Mary Ellen > > > > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Time Slider in Solaris10
Is Time Slider available in Solaris10? Or just in Opensolaris? I am running Solaris 10 5/09 s10x_u7wos_08 X86 and wanted to automate my snapshots. From reading blogs, seems zfs-auto-snapshot is obsolete and was/is being replaced by time-slider. But I can not seem to find it for Solaris10. I do have a script/cron job that will work, but wanted to test out Time Slider -- Thanks Mary Ellen ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import hanging
Howdy Eduardo, Recently I had a similar issue where the pool wouldn't import and attempting to import it would essentially lock the server up. Finally I used pfexec zpool import -F pool1 and simply let it do it's thing. After almost 60 hours the imported finished and all has been well since (except my backup procedures have improved!). Good luck! John On May 10, 2010, at 12:35 PM, Eduardo Bragatto wrote: > Hi again, > > As for the NFS issue I mentioned before, I made sure the NFS server was > working and was able to export before I attempted to import anything, then I > started a new "zpool import backup: -- my hope was that the NFS share was > causing the issue, since the only filesystem shared is the one causing the > problem, but that doesn't seem to be the case. > > I've done a lot of research and could not find a similar case to mine. The > most similar one I've found was this from 2008: > > http://opensolaris.org/jive/thread.jspa?threadID=70205&tstart=15 > > I simply can not import the pool although ZFS reports it as OK. > > In that old thread, the user was also having the "zpool import" hang issue, > however he was able to run these two commands (his pool was named data1): > > zdb -e -bb data1 > zdb -e - data1 > > While my system returns: > > # zdb -e -bb backup > zdb: can't open backup: File exists > # zdb -e -ddd backup > zdb: can't open backup: File exists > > Every documentation assumes you will be able to run "zpool import" before > troubleshooting, however my problem is exactly on that command. I don't even > know where to find more detailed documentation. > > I believe there's very knowledgeable people in this list. Could someone be > kind enough to take a look and at least point me in the right direction? > > Thanks, > Eduardo Bragatto. > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool import hanging
Hi again, As for the NFS issue I mentioned before, I made sure the NFS server was working and was able to export before I attempted to import anything, then I started a new "zpool import backup: -- my hope was that the NFS share was causing the issue, since the only filesystem shared is the one causing the problem, but that doesn't seem to be the case. I've done a lot of research and could not find a similar case to mine. The most similar one I've found was this from 2008: http://opensolaris.org/jive/thread.jspa?threadID=70205&tstart=15 I simply can not import the pool although ZFS reports it as OK. In that old thread, the user was also having the "zpool import" hang issue, however he was able to run these two commands (his pool was named data1): zdb -e -bb data1 zdb -e - data1 While my system returns: # zdb -e -bb backup zdb: can't open backup: File exists # zdb -e -ddd backup zdb: can't open backup: File exists Every documentation assumes you will be able to run "zpool import" before troubleshooting, however my problem is exactly on that command. I don't even know where to find more detailed documentation. I believe there's very knowledgeable people in this list. Could someone be kind enough to take a look and at least point me in the right direction? Thanks, Eduardo Bragatto.___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How can I be sure the zfs send | zfs received is correct?
I was expecting zfs send tank/export/projects/project1...@today would send everything up to @today. That is the only snapshot, and I am not using the -i option. The thing that worries me is that tank/export/projects/project1_nb was the first file system that I tested with full dedup and compression, and the first ~300GB of usage (before I merged the other file systems) was showing a ~2.5x dedup ratio, so the data should easily be more than 600 GB. My initial worry when I started was that the migration pool wouldn't even have enough space to receive the file system, but this turned out to be a very unexpected result. My question is where the dedupped data went, if the new pool is showing a 1.0x dedup ratio and the old pool shows a 2.53x ratio, yet both take up about the same size, ~400GB. Is the -R option required for what I am trying to do? What I am trying to do is to un-dedup the file system; I would actually prefer that none of the properties were replicated. This is quite confusing, and it wouldn't surprise me if other people are taking incomplete backups with zfs send, if that's the case. I will redo the send with -R and see what happens. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Is it safe to disable the swap partition?
> "mg" == Mike Gerdts writes: mg> If Solaris is under memory pressure, [...] mg> The best thing to do with processes that can be swapped out mg> forever is to not run them. Many programs allocate memory they never use. Linux allows overcommitting by default (but disableable), but Solaris doesn't and can't, so on a Solaris system without swap those allocations turn into physical RAM that can never be used. At the time the never-to-be-used pages are allocated, ARC must be dumped to make room for them. With swap, pages that are allocated but never written can be backed by swap, and the ARC doesn't need to be dumped until the pages are actually written. Note that, in this hypothetical story, swap is never written at all, but it still has to be there. If you run a java vm on your ``storage server'', then you might care about this. I think the no-swap dogma is very soothing and yet very obviously wrong. If you want to get into the overcommit game, fine. If you want to play a game where you will overcommit up to the size of the ARC, well, ``meh'', but fine. Until then, though, swap makes sense. pgpA7wEb34DwB.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Is it safe to disable the swap partition?
On May 10, 2010, at 9:06 AM, Bob Friesenhahn wrote: > On Mon, 10 May 2010, Thomas Tornblom wrote: >> >> Sorry, but this is incorrect. >> >> Solaris (2 if you will) does indeed swap processes in case normal paging is >> deemed insufficient. >> >> See the chapters on Soft and Hard swapping in: >> >> http://books.google.com/books?id=r_cecYD4AKkC&pg=PA189&lpg=PA189&dq=solaris+internals+swapping&source=bl&ots=oBvgg3yAFZ&sig=lmXYtTLFWJr2JjueQVxsEylnls0&hl=sv&ei=JbXnS7nKF5L60wTtq9nTBg&sa=X&oi=book_result&ct=result&resnum=4&ved=0CCoQ6AEwAw#v=onepage&q&f=false > > If this book is correct, then I must be wrong. I certainly would not want to > use a system which is in this dire condition. It is correct (and recommended reading :-). I find this knowledge useful for troubleshooting. If you stumble across a stumbling system and notice that the vmstat "w" column is not zero, then you know that at some time in the past the system has experienced a severe memory shortfall. -- richard -- ZFS storage and performance consulting at http://www.RichardElling.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Mirroring USB Drive with Laptop for Backup purposes
> "bh" == Brandon High writes: bh> The drive should be on the same USB port because the device bh> path is saved in the zpool.cache. If you removed the bh> zpool.cache, it wouldn't matter where the drive was plugged bh> in. I thought it was supposed to go by devid. There was a bug a while ago that Solaris won't calculate devid for devices that say over SCSI they are ``removeable'' because, in the sense that a DynaMO or DVD-R is ``removeable'', the serial number returned by various identity commands or mode pages isn't bound to any set of stored bits, and the way devid's are used throughout Solaris means they are like a namespace or an array-of for a set of bit-stores so it's not appropriate for a DVD-R drive to have a devid. A DVD disc could have one, though---in fact a release of a pressed disc could appropriately have a non-serialized devid. However USB stick designers used to working with Microsoft don't bother to think through how the SCSI architecture should work in a sane world because they are used to reading chatty-idiot Microsoft manuals, so they fill out the page like a beaurocratic form with whatever feels appropriate and mark USB sticks ``removeable'', which according to the standard and to a sane implementer is a warning that the virtual SCSI disk attached to the virtual SCSI host adapter inside the USB pod might be soldered to removeable FLASH chips. It's quite stupid because before the OS has even determined what kind of USB device is plugged in, it already knows the device is removeable in that sense, just like it knows hot-swap SATA is removeable. USB is no more removeable, even in practical use, than SATA. (eSATA! *slap*) Even in the case of CF readers, it's probably wrong most of the time to set the removeable SCSI flag because the connection that's severable is between the virtual SCSI adapter in the ``reader'' and the virtual SCSI disk in the CF/SD/... card, while the removeable flag indicates severability between SCSI disk and storage medium. In the CF/SD/... reader case the serial number in the IDENTIFY command or mode pages will come from CF/SD/... and remain bound to the bits. The only case that might call for setting the bit is where the adapter is synthesizing a fake mode page where the removeable bit appears, but even then the bit should be clear so long as any serialized fields in other commands and mode pages are still serialized somehow (whether synthesized or not). Actual removeable in-the-scsi-standard's-sense HARD DISK drives mostly don't exist, and real removeable things in the real world attach as optical where an understanding of their removeability is embedded in the driver: ANYTHING the cd driver attaches will be treated removeable. consequently the bit is useless to the way solaris is using it, and does little more than break USB support in ways like this, but the developers refuse to let go of their dreams about what the bit was supposed to mean even though a flood of reality has guaranteed at this point their dream will never come true. I think there was some magical simon-sez flag they added to /kernel/drv/whatever.conf so the bug could be closed, so you might go hunting for that flag in which they will surely want you to encode in a baroque case-sensitive undocumented notation that ``The Microtraveler model 477217045 serial 80502813 attached to driver/hub/hub/port/function has a LYING REMOVEABLE FLAG'', but maybe you can somehow set it to '*' and rejoin reality. Still this won't help you on livecd's. 
It's probably wiser to walk away from USB unless/until there's a serious will to adopt the practical mindset needed to support it reasonably. pgpAoBbGUMwdU.pgp Description: PGP signature ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How can I be sure the zfs send | zfs received is correct?
- "Jim Horng" skrev: > zfs send tank/export/projects/project1...@today | zfs receive -d > mpool Perhaps zfs send -R is what you're looking for... roy -- Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/ -- I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og relevante synonymer på norsk. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Mirrored Servers
It sounds like you are looking for AVS.

Consider a replication scenario where A is primary and B is secondary, and A fails. Say you get A up again on Monday AM, but you are unable to summarily shut down B to bring A back online until Friday evening. During that whole time, you will not have a current mirror, because AVS copies only in one direction, from A to B. If you can shut B down promptly, then things are much easier and less complex.

I'd personally use ZFS snapshots to keep the two servers in sync every 60 seconds. I've never tested this myself, but if you are depending on the server to perform NFS, it has been said here, http://opensolaris.org/jive/thread.jspa?messageID=174846, that this will fail because the secondary's filesystem will have a different FSID, which the NFS client won't recognize.
--
Maurice Volaski, maurice.vola...@einstein.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
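A minimal sketch of that kind of snapshot-based sync (host, pool and snapshot names are placeholders, and the bookkeeping that rotates snapshot names between runs is omitted):

# Run periodically from cron: send the delta since the previous sync to host B.
prev=tank/data@sync-1        # snapshot sent on the last run
next=tank/data@sync-2        # snapshot for this run
zfs snapshot $next
zfs send -i $prev $next | ssh hostB zfs receive -F tank/data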
Re: [zfs-discuss] How can I be sure the zfs send | zfs received is correct?
On Sun, May 9, 2010 at 11:16 AM, Jim Horng wrote:
> zfs send tank/export/projects/project1...@today | zfs receive -d mpool

This won't get any snapshots before @today, which may lead to the received size being smaller. I've also noticed that different pool types (e.g. raidz vs. mirror) can lead to slight differences in space usage.

-B
-- Brandon High : bh...@freaks.com
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS and Comstar iSCSI BLK size
- "Brandon High" skrev: > On Sun, May 9, 2010 at 9:42 PM, Geoff Nordli wrote: > > I am looking at using 8K block size on the zfs volume. > > 8k is the default for zvols. So with a 1TB zbol with default blocksize, dedup is done on 8k blocks? If so, some 32 gigs of memory (or l2arc) will be required per terabyte for the DDT, which is quite a lot... Best regards roy -- Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/ -- I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og relevante synonymer på norsk. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS and Comstar iSCSI BLK size
On Sun, May 9, 2010 at 9:42 PM, Geoff Nordli wrote: > I am looking at using 8K block size on the zfs volume. 8k is the default for zvols. > I was looking at the comstar iscsi settings and there is also a blk size > configuration, which defaults as 512 bytes. That would make me believe that > all of the IO will be broken down into 512 bytes which seems very > inefficient. I haven't done any tuning on my comstar volumes, and they're using 8k blocks. The setting is in the dataset's volblocksize parameter. > It seems this value should match the file system allocation/cluster size in > the VM, maybe 4K if you are using an ntfs file system. You'll have more overhead using smaller volblocksize values, and get worse compression (since compression is done on the block). If you have dedup enabled, you'll create more entries in the DDT which can have pretty disastrous consequences on write performance. Ensuring that your VM is block-aligned to 4k (or the guest OS's block size) boundaries will help performance and dedup as well. -B -- Brandon High : bh...@freaks.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs performance issue
On Mon, May 10 at 9:08, Erik Trimble wrote: Abhishek Gupta wrote: Hi, I just installed OpenSolaris on my Dell Optiplex 755 and created raidz2 with a few slices on a single disk. I was expecting a good read/write performance but I got the speed of 12-15MBps. How can I enhance the read/write performance of my raid? Thanks, Abhi. You absolutely DON'T want to do what you've done. Creating a ZFS pool (or, for that matter, any RAID device,whether hardware or software) out of slices/partitions of a single disk is a recipe for horrible performance. In essence, you reduce your performance to 1/N (or worse) of the whole disk, where N is the number of slices you created. +1 raidz2 on a single device is the opposite of what you want to do. If you need some improvement in bitrot recovery you can enable multiple copies on a single disk, which may help, but for any of the raidz variants, you really need to use multiple physical devices to provide that capability. --eric -- Eric D. Mudama edmud...@mail.bounceswoosh.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
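For example (device names are placeholders):

# raidz2 only makes sense across separate physical devices:
zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0

# On a single-disk pool, extra protection against bitrot can come from ditto blocks:
zpool create tank c1t0d0
zfs set copies=2 tank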
Re: [zfs-discuss] zfs performance issue
Abhishek Gupta wrote: Hi, I just installed OpenSolaris on my Dell Optiplex 755 and created raidz2 with a few slices on a single disk. I was expecting a good read/write performance but I got the speed of 12-15MBps. How can I enhance the read/write performance of my raid? Thanks, Abhi. You absolutely DON'T want to do what you've done. Creating a ZFS pool (or, for that matter, any RAID device,whether hardware or software) out of slices/partitions of a single disk is a recipe for horrible performance. In essence, you reduce your performance to 1/N (or worse) of the whole disk, where N is the number of slices you created. So, create your zpool using disks or partitions from different disks. It's OK to have more than one partition on a disk - just use them in different pools for reasonable performance. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Is it safe to disable the swap partition?
On Mon, 10 May 2010, Thomas Tornblom wrote: Sorry, but this is incorrect. Solaris (2 if you will) does indeed swap processes in case normal paging is deemed insufficient. See the chapters on Soft and Hard swapping in: http://books.google.com/books?id=r_cecYD4AKkC&pg=PA189&lpg=PA189&dq=solaris+internals+swapping&source=bl&ots=oBvgg3yAFZ&sig=lmXYtTLFWJr2JjueQVxsEylnls0&hl=sv&ei=JbXnS7nKF5L60wTtq9nTBg&sa=X&oi=book_result&ct=result&resnum=4&ved=0CCoQ6AEwAw#v=onepage&q&f=false If this book is correct, then I must be wrong. I certainly would not want to use a system which is in this dire condition. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs performance issue
Hi, I just installed OpenSolaris on my Dell Optiplex 755 and created raidz2 with a few slices on a single disk. I was expecting a good read/write performance but I got the speed of 12-15MBps. How can I enhance the read/write performance of my raid? Thanks, Abhi. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] osol monitoring question
On Mon, May 10, 2010 at 7:57 AM, Roy Sigurd Karlsbakk wrote: > Hi all > > It seems that if using zfs, the usual tools like vmstat, sar, top etc are > quite worthless, since zfs i/o load is not reported as iowait etc. Are there > any plans to rewrite the old performance monitoring tools or the zfs parts to > allow for standard monitoring tools? If not, what other tools exist that can > do the same? That's nothing to do with ZFS. Solaris 10 defines iowait to be exactly zero. Which it is, being essentially meaningless. Things like vmstat and sar are a bit old anyway; I'm playing with replacements for sar. Top is still pretty useful. For zfs, zpool iostat has some utility, but I find fsstat to be pretty useful. -- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] osol monitoring question
On May 10, 2010, at 12:16 AM, Roy Sigurd Karlsbakk wrote: > - "Michael Schuster" skrev: > >> On 10.05.10 08:57, Roy Sigurd Karlsbakk wrote: >>> Hi all >>> >>> It seems that if using zfs, the usual tools like vmstat, sar, top >> etc are quite worthless, since zfs i/o load is not reported as iowait >> etc. Are there any plans to rewrite the old performance monitoring >> tools or the zfs parts to allow for standard monitoring tools? If not, >> what other tools exist that can do the same? >> >> "zpool iostat" for one. The traditional tools are quite useful. But you have to know how to use them properly. The tools I use most often are: iostat, fsstat, nfsstat, iosnoop, and nicstat. > > I know that, and iostat, etc, but wouldn't it be rather consistent to > integrate with the tools that have been used the latest two or three decades? > wio shouldn't be reported as 0% when the disks are the bottleneck... Absolutely not. Wait for I/O is a processor state and has no direct relation to I/O bottlenecks. As a result, it caused confusion for the better part of the past 30 years. In Solaris 10, wio is always zero. Alan talks about this and refers to an Infodoc describing how wio is useless. http://blogs.sun.com/tpenta/entry/how_solaris_calculates_user_system However, in the brave new world, I can't find a reference to the infodoc. Perhaps someone with a SunSolve account can find it? Suffice to say, this still trips people up and you'll find many references to posts where people try to clarify this if you google a bit. -- richard -- ZFS storage and performance consulting at http://www.RichardElling.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Plugging in a hard drive after Solaris has booted up?
Ian Collins wrote:
> Run cfgadm -c configure on the unconfigured IDs; see the man page for the gory details.

IF the BIOS is OK ;-)

I have a problem with a DELL PC: if I disable the other SATA ports, Solaris is unable to detect new drives (Linux does). If I enable the other SATA ports, the DELL BIOS will stop and ask me whether I would like to continue, so this is not an option that would survive a remote system crash/reboot.

Jörg
--
EMail: jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
j...@cs.tu-berlin.de (uni) joerg.schill...@fokus.fraunhofer.de (work)
Blog: http://schily.blogspot.com/
URL: http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
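Assuming the BIOS and controller do present the port, the hot-add usually looks something like this (the attachment-point name varies by system):

cfgadm -al                     # list attachment points; a new disk shows as unconfigured
cfgadm -c configure sata1/3    # configure the port the drive was plugged into
format                         # the new disk should now be visible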
Re: [zfs-discuss] Problems (bug?) with slow bulk ZFS filesystem creation
- "Roy Sigurd Karlsbakk" skrev: > - "charles" skrev: > > > Hi, > > > > This thread refers to Solaris 10, but it was suggested that I post > it > > here as ZFS developers may well be more likely to respond. > > > > > http://forums.sun.com/thread.jspa?threadID=5438393&messageID=10986502#10986502 > > > > Basically after about ZFS 1000 filesystem creations the creation > time > > slows down to around 4 seconds, and gets progressively worse. > > > > This is not the case for normal mkdir which creates thousands of > > directories very quickly. > > > > I wanted users home directories (60,000 of them) all to be > individual > > ZFS file systems, but there seems to be a bug/limitation due to the > > prohibitive creation time. > > Is there a chance of you running out of memory here? If ZFS runs out > of memory, it'll read indicies from disk instead of keeping them in > memory, something that can almost kill a system. Try to monitor the disk utilisation with iostat -xd 2 or something and compare the numbers with low/high dataset count. If disk usage increases, it's likely you're down on RAM. Adding more RAM or L2ARC might help. roy -- Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/ -- I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og relevante synonymer på norsk. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Problems (bug?) with slow bulk ZFS filesystem creation
- "charles" skrev: > Hi, > > This thread refers to Solaris 10, but it was suggested that I post it > here as ZFS developers may well be more likely to respond. > > http://forums.sun.com/thread.jspa?threadID=5438393&messageID=10986502#10986502 > > Basically after about ZFS 1000 filesystem creations the creation time > slows down to around 4 seconds, and gets progressively worse. > > This is not the case for normal mkdir which creates thousands of > directories very quickly. > > I wanted users home directories (60,000 of them) all to be individual > ZFS file systems, but there seems to be a bug/limitation due to the > prohibitive creation time. Is there a chance of you running out of memory here? If ZFS runs out of memory, it'll read indicies from disk instead of keeping them in memory, something that can almost kill a system. Best regards roy -- Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/ -- I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og relevante synonymer på norsk. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Problems (bug?) with slow bulk ZFS filesystem creation
Yes, I have recently tried the userquota option (one ZFS filesystem with 60,000 quotas and 60,000 ordinary 'mkdir' home directories within), and this works fine, but you end up with less granularity of snapshots.

It does seem odd that after only 1000 ZFS filesystems there is a slowdown. It does sound like a bug rather than a limitation.

I know that the slow boot problem you mentioned did get resolved in more recent versions of Solaris, although I cannot test it with 60,000 filesystems, as life is too short to wait for them to be created!
-- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedup stats per file system
On 10/05/2010 13:35, P-O Yliniemi wrote: Darren J Moffat skrev 2010-05-10 10:58: On 08/05/2010 21:45, P-O Yliniemi wrote: I have noticed that dedup is discussed a lot in this list right now.. Starting to experiment with dedup=on, I feel it would be interesting in knowing exactly how efficient dedup is. The problem is that I've found no way of checking this per file system. I have turned dedup on for a few file systems to try it out: You can't because dedup is per pool not per filesystem. Each file system gets to choose if it is opting in to the pool wide dedup. So dedup is operating on the pool level rather than the file system level, so if I have two file systems with dedup=on, they share the blocks and checksums pool wide ? Correct. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Problems (bug?) with slow bulk ZFS filesystem creation
charles wrote: > > Basically after about ZFS 1000 filesystem creations the creation time slows > down to around 4 seconds, and gets progressively worse. > You can speed up the process by initially setting the mountpoint to 'legacy'. It's not the creation that takes that much time, it's mounting and sharing. But, as said earlier, before taking it into production you should test if the reboot time is within limits. Also I think there is a hard limit of 64k mounts, at least for NFS. -Arne ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
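A rough sketch of bulk creation with legacy mountpoints (names illustrative; mounting and sharing would then be handled separately, e.g. by an automounter):

#!/bin/ksh
# Create many home filesystems without mounting or sharing each one at create time.
zfs create tank/home
i=1
while [ $i -le 60000 ]; do
    zfs create -o mountpoint=legacy tank/home/user$i
    i=$(( i + 1 ))
done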
Re: [zfs-discuss] Dedup stats per file system
Darren J Moffat skrev 2010-05-10 10:58: On 08/05/2010 21:45, P-O Yliniemi wrote: I have noticed that dedup is discussed a lot in this list right now.. Starting to experiment with dedup=on, I feel it would be interesting in knowing exactly how efficient dedup is. The problem is that I've found no way of checking this per file system. I have turned dedup on for a few file systems to try it out: You can't because dedup is per pool not per filesystem. Each file system gets to choose if it is opting in to the pool wide dedup. So dedup is operating on the pool level rather than the file system level, so if I have two file systems with dedup=on, they share the blocks and checksums pool wide ? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Problems (bug?) with slow bulk ZFS filesystem creation
On 10 May, 2010 - charles sent me these 0,8K bytes: > Hi, > > This thread refers to Solaris 10, but it was suggested that I post it here as > ZFS developers may well be more likely to respond. > > http://forums.sun.com/thread.jspa?threadID=5438393&messageID=10986502#10986502 > > Basically after about ZFS 1000 filesystem creations the creation time slows > down to around 4 seconds, and gets progressively worse. > > This is not the case for normal mkdir which creates thousands of directories > very quickly. > > I wanted users home directories (60,000 of them) all to be individual ZFS > file systems, but there seems to be a bug/limitation due to the prohibitive > creation time. If you're going to share them over nfs, you'll be looking at even worse times. In my experience, you don't want to go over 1-2k filesystems due to various scalability problems, esp if you're doing NFS as well. It will be slow to create and slow when (re)booting, but other than that it might be ok.. Look into the zfs userquota/groupquota instead.. That's what I did, and it's partly because of these issues that the userquota/groupquota got implemented I guess. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
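With that approach the quotas all live on a single filesystem, for example (names illustrative):

# One filesystem with per-user quotas instead of 60,000 datasets:
zfs create tank/home
zfs set userquota@alice=10G tank/home
zfs set userquota@bob=10G tank/home
zfs userspace tank/home        # report per-user usage against the quotas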
Re: [zfs-discuss] Daily snapshots as replacement for incremental backups
Gabriele Bulfon wrote: Hello, I have a situation where a zfs file server holding lots of graphic files cannot be backed up daily with a full backup. My idea was initially to run a full backup on Sunday through the lto library on more dedicated tapes, then have an incremental backup run on daily tapes. Brainstorming on this, led me to the idea that I could actually stop thinking about incremental backups (that may always lead me to unsafe backups anyway for some unlucky reason) and substitute the idea with daily snapshots. Actually, the full disaster ricovery is on the Sunday full backups (that can be safely taken away on Monday), while the daily solution would be just a safe place for daily errors by users (people who delete files by mistake, for example). This can be done simply running a snapshot per day during the night. My idea is to have cron to rotate snapshots during working days, so that I always have Mon,Tue,Wen,Thu,Fri,Sat snapshots, and have the cron shell delete the oldest (actually, if I have to run a Mon snapshot, I will delete the old Mon snapshots, this should run the cycle). My questions are: - is this a good and common solution? Yes, though of course you realize that snapshots are not a disaster-recovery mechanism. They're not really backups, either, in the sense that they provide no security against larger-scale failures. - is there any zfs performance degradation caused by creating and deleting snapshots on a daily basis, maybe fragmenting the file system? No. Well, that's not strictly true, but you won't run into any issues with snapshots until you have a very large number of them simultaneously. 1000s, or more. Snapshots don't fragment the file system any more than deleting files does. Taking snapshots is instantaneous, while deleting a snapshot can vary in time from virtually instantaneous to taking several hours (or more), if you have dedup turned on and have a large amount of data (and, don't have sufficient L2ARC or RAM to hold the dedup table). In the latter case, it will impact performance, as the entire pool has to be scanned to allow for proper deletion of the deduped snapshot (i.e. it has to scan the entire pool to figure out which data is deduped, and what can be safely deleted). Thanx for any suggestion Gabriele. -- Erik Trimble Java System Support Mailstop: usca22-123 Phone: x17195 Santa Clara, CA Timezone: US/Pacific (GMT-0800) ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Daily snapshots as replacement for incremental backups
Hello, I have a situation where a ZFS file server holding lots of graphic files cannot be backed up daily with a full backup. My idea was initially to run a full backup on Sunday through the LTO library on dedicated tapes, then have an incremental backup run on daily tapes. Brainstorming on this led me to the idea that I could actually stop thinking about incremental backups (which may always leave me with unsafe backups anyway for some unlucky reason) and substitute daily snapshots instead. Actually, the full disaster recovery is on the Sunday full backups (which can be safely taken away on Monday), while the daily solution would just be a safe place for daily errors by users (people who delete files by mistake, for example). This can be done simply by running one snapshot per day during the night. My idea is to have cron rotate snapshots during working days, so that I always have Mon, Tue, Wed, Thu, Fri, Sat snapshots, and have the cron script delete the oldest (actually, if I have to take a Mon snapshot, I will delete the old Mon snapshot first; this keeps the cycle going). My questions are: - is this a good and common solution? - is there any ZFS performance degradation caused by creating and deleting snapshots on a daily basis, maybe fragmenting the file system? Thanks for any suggestion, Gabriele. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
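A minimal cron-driven sketch of the weekday rotation described above; the dataset name tank/graphics and the snapshot naming are assumptions:

  #!/bin/sh
  # rotate-daily-snapshot.sh -- keep one snapshot per weekday.
  FS=tank/graphics                 # dataset name is an assumption
  DAY=`date +%a`                   # Mon, Tue, Wed, ...

  # Drop last week's snapshot for this weekday (ignore the error if it
  # does not exist yet), then take a fresh one under the same name.
  zfs destroy "${FS}@daily-${DAY}" 2>/dev/null
  zfs snapshot "${FS}@daily-${DAY}"

Run from root's crontab during the night, e.g. "0 2 * * 1-6 /root/rotate-daily-snapshot.sh", this keeps exactly one snapshot per working day.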
[zfs-discuss] Problems (bug?) with slow bulk ZFS filesystem creation
Hi, This thread refers to Solaris 10, but it was suggested that I post it here as ZFS developers may well be more likely to respond. http://forums.sun.com/thread.jspa?threadID=5438393&messageID=10986502#10986502 Basically, after about 1,000 ZFS filesystem creations the creation time slows down to around 4 seconds, and gets progressively worse. This is not the case for normal mkdir, which creates thousands of directories very quickly. I wanted users' home directories (60,000 of them) all to be individual ZFS file systems, but there seems to be a bug/limitation due to the prohibitive creation time. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
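For what it's worth, a rough way to reproduce and measure the slowdown, assuming a scratch pool named tank (the pool name and the count are assumptions); ptime prints the elapsed time of every 100th create so the trend is visible:

  #!/bin/sh
  # Create many filesystems and time every 100th creation.
  i=1
  while [ $i -le 3000 ]; do
    if [ `expr $i % 100` -eq 0 ]; then
      ptime zfs create tank/home/user$i
    else
      zfs create tank/home/user$i
    fi
    i=`expr $i + 1`
  done

If the per-create time climbs steadily somewhere past the 1,000-filesystem mark, that matches the behaviour described above.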
Re: [zfs-discuss] Mirroring USB Drive with Laptop for Backup purposes
On 05/ 7/10 10:07 PM, Bill McGonigle wrote: On 05/07/2010 11:08 AM, Edward Ned Harvey wrote: I'm going to continue encouraging you to stay "mainstream," because what people do the most is usually what's supported the best. If I may be the contrarian, I hope Matt keeps experimenting with this, files bugs, and they get fixed. His use case is very compelling; I know lots of SOHO folks who could really use a NAS where this 'just worked'. The ZFS team has done well by thinking liberally about conventional assumptions. -Bill My plan indeed is to continue with this setup (going to upgrade to build 138 to resolve my reboot issue). This particular use case is definitely compelling for me: the simple fact that I can plug my USB drive into another laptop and boot into the exact same environment is reason enough to continue with this setup and see how things go. Mind you, doing occasional zfs sends to another backup drive might be something I'll do as well :-) cheers Matt ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Hard disk buffer at 100%
Hi Eric, > Problem is the OP is mixing client 4k drives with 512b drives. How do you come to that assessment? Here's what I have:
Ap_Id                  Information
sata1/1::dsk/c7t1d0    Mod: WDC WD10EADS-00L5B1 FRev: 01.01A01
sata1/2::dsk/c7t2d0    Mod: WDC WD10EADS-00P8B0 FRev: 01.00A01
sata1/3::dsk/c7t3d0    Mod: WDC WD10EADS-00P8B0 FRev: 01.00A01
sata1/4::dsk/c7t4d0    Mod: WDC WD10EADS-00P8B0 FRev: 01.00A01
sata1/5::dsk/c7t5d0    Mod: WDC WD10EADS-00P8B0 FRev: 01.00A01
sata2/1::dsk/c0t1d0    Mod: WDC WD10EADS-00P8B0 FRev: 01.00A01
They all seem to indicate the older 512b sector size according to the WDC site, unless I'm not understanding their spec sheets. > I doubt they're "broken" per se, they're just dramatically slower > than their peers in this workload. It does make sense, though! My read speed (trying to copy 683 GB across to another machine) is roughly 7-8 Mbps, where I used to get 30-40 Mbps on average. > As a replacement recommendation, we've been beating on the WD 1TB RE3 Cool, either the RE3 or the Black drives it is :-) Thanks, Em ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedup stats per file system
On 08/05/2010 21:45, P-O Yliniemi wrote: I have noticed that dedup is discussed a lot on this list right now. Starting to experiment with dedup=on, I feel it would be interesting to know exactly how efficient dedup is. The problem is that I've found no way of checking this per file system. I have turned dedup on for a few file systems to try it out: You can't, because dedup is per pool, not per filesystem. Each file system only gets to choose whether it opts in to the pool-wide dedup. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
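The ratio is only reported pool-wide; a short sketch of where to look, assuming a pool named tank (the pool name is an assumption):

  # Pool-wide dedup ratio
  zpool get dedupratio tank
  zpool list tank        # the DEDUP column shows the same ratio

  # Detailed dedup table (DDT) statistics and histogram
  zdb -DD tank

Per-filesystem ratios aren't kept because blocks from different filesystems in the same pool can dedup against each other.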
Re: [zfs-discuss] Best practice for full system backup - equivalent of ufsdump/ufsrestore
erik.ableson said: "Just a quick comment for the send/recv operations, adding -R makes it recursive so you only need one line to send the rpool and all descendant filesystems." Yes, I know of the -R flag, but it doesn't seem to work when sending loose snapshots to the backup pool. It obviously works when piped to a file. Sorry, I can't remember what the error message was when I tried 'send -R | receive backup-pool/rpool'; it does work if done individually, though. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
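For reference, a sketch of the recursive form that generally works when the stream is received directly into another pool rather than into a file; the pool names (rpool, backup) and the snapshot name are assumptions, and -F is only needed if the target datasets already exist:

  # Recursive snapshot of the whole source hierarchy
  zfs snapshot -r rpool@backup-20100510

  # Send the hierarchy and re-create it under backup/rpool;
  # -d derives the target names from the sent snapshot names.
  zfs send -R rpool@backup-20100510 | zfs receive -dF backup/rpool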
Re: [zfs-discuss] Is it safe to disable the swap partition?
On 2010-05-10 05:58, Bob Friesenhahn wrote: On Sun, 9 May 2010, Edward Ned Harvey wrote: So, Bob, rub it in if you wish. ;-) I was wrong. I knew the behavior in Linux, which Roy seconded as "most OSes," and apparently we both assumed the same here, but that was wrong. I don't know if Solaris and OpenSolaris both have the same swap behavior. I don't know if there's *ever* a situation where Solaris/OpenSolaris would swap idle processes. But there's at least evidence that my two servers have not, or do not. Solaris and Linux are different in many ways since they are completely different operating systems. Solaris 2.X has never swapped processes. It only sends dirty pages to the paging device if there is a shortage of pages when more are requested, or if there are not enough free; first, though, it will purge seldom-accessed read-only pages, which can easily be restored. ZFS has changed things up again by not caching file data via the "unified page cache" and using a specialized ARC instead. It seems that simple paging and MMU control were found not to be smart enough. Bob Sorry, but this is incorrect. Solaris (2, if you will) does indeed swap out processes when normal paging is deemed insufficient. See the chapters on Soft and Hard swapping in: http://books.google.com/books?id=r_cecYD4AKkC&pg=PA189&lpg=PA189&dq=solaris+internals+swapping&source=bl&ots=oBvgg3yAFZ&sig=lmXYtTLFWJr2JjueQVxsEylnls0&hl=sv&ei=JbXnS7nKF5L60wTtq9nTBg&sa=X&oi=book_result&ct=result&resnum=4&ved=0CCoQ6AEwAw#v=onepage&q&f=false ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] osol monitoring question
- "Michael Schuster" wrote: > On 10.05.10 08:57, Roy Sigurd Karlsbakk wrote: > > Hi all > > > > It seems that when using ZFS, the usual tools like vmstat, sar, top > etc. are quite worthless, since ZFS I/O load is not reported as iowait > etc. Are there any plans to rewrite the old performance monitoring > tools or the ZFS parts to allow for standard monitoring tools? If not, > what other tools exist that can do the same? > > "zpool iostat" for one. I know that, and iostat, etc., but wouldn't it be more consistent to integrate with the tools that have been in use for the last two or three decades? wio shouldn't be reported as 0% when the disks are the bottleneck... Best regards roy -- Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/ -- In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best practice for full system backup - equivalent of ufsdump/ufsrestore
> Just a quick comment for the send/recv operations, adding -R makes it > recursive so you only need one line to send the rpool and all descendant > filesystems. Yes, I am aware of that, but it does not work when you are sending them loose to an existing pool. I can't remember the error message, but it didn't work for me. It seems to work fine when redirecting to a file ( > somefile.bck) or, in your case, piping it through gzip and then writing the output to a file. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] osol monitoring question
On 10.05.10 08:57, Roy Sigurd Karlsbakk wrote: Hi all It seems that when using ZFS, the usual tools like vmstat, sar, top etc. are quite worthless, since ZFS I/O load is not reported as iowait etc. Are there any plans to rewrite the old performance monitoring tools or the ZFS parts to allow for standard monitoring tools? If not, what other tools exist that can do the same? "zpool iostat" for one. Michael -- michael.schus...@oracle.com http://blogs.sun.com/recursion Recursion, n.: see 'Recursion' ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
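A minimal sketch of the ZFS-specific views that exist today, assuming a pool named tank (the pool name is an assumption):

  # Per-pool and per-vdev I/O statistics, refreshed every 5 seconds
  zpool iostat -v tank 5

  # Kernel filesystem operation counters for all ZFS filesystems
  fsstat zfs 5

Neither feeds the traditional wio accounting, so they complement rather than replace vmstat/sar/top.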
[zfs-discuss] osol monitoring question
Hi all It seems that when using ZFS, the usual tools like vmstat, sar, top etc. are quite worthless, since ZFS I/O load is not reported as iowait etc. Are there any plans to rewrite the old performance monitoring tools or the ZFS parts to allow for standard monitoring tools? If not, what other tools exist that can do the same? Best regards roy -- Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/ -- In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss