Re: [zfs-discuss] Single disk parity
Christian Auby wrote: On Wed, 8 Jul 2009, Moore, Joe wrote: That's true for the worst case, but zfs mitigates that somewhat by batching i/o into a transaction group. This means that i/o is done every 30 seconds (or 5 seconds, depending on the version you're running), allowing multiple writes to be written together in the disparate locations. I'd think that writing the same data two or three times is a much larger performance hit anyway. Calculating 5% parity and writing it in addition to the stripe might be heaps faster. Might try to do some tests on this. Before you get too happy, you should look at the current constraints. The minimum disk block size is 512 bytes for most disks, but there has been talk in the industry of cranking this up to 2 or 4 kBytes. For small files, your 5% becomes 100%, and you might as well be happy now and set copies=2. The largest ZFS block size is 128 kBytes, so perhaps you could do something with 5% overhead there, but you couldn't correct very many bits with only 5%. How many bits do you need to correct? I don't know... that is the big elephant in the room shaped like a question mark. Maybe zcksummon data will help us figure out what color the elephant might be. If you were to implement something at the DMU layer, which is where copies are, then without major structural changes to the blkptr, you are restricted to 3 DVAs. So the best you could do there is 50% overhead, which would be a 200% overhead for small files. If you were to implement at the SPA layer, then you might be able to get back to a more consistently small overhead, but that would require implementing a whole new vdev type, which means integration with install, grub, and friends. You would need to manage spatial diversity, which might impact the allocation code in strange ways, but surely is possible. The spatial diversity requirement means you basically can't gain much by replacing a compressor with additional data redundancy, though it might be an interesting proposal for the summer of code. Or you could just do it in user land, like par2. Bottom line: until you understand the failure modes you're trying to survive, you can't make significant progress except by accident. We know that redundant copies allows us to correct all bits for very little performance impact, but costs space. Trying to increase space without sacrificing dependability will cost something -- most likely performance. NB, one nice thing about copies is that you can set it per-file system. For my laptop, I don't set copies for the OS, but I do for my home directory. This is a case where I trade off dependability of read-only data which, is available on CD or on the net, to gain a little bit of space. But I don't compromise on dependability for my data. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
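As an aside, the per-filesystem copies setting Richard describes is a one-liner to try; the dataset name below is only an example, not his actual layout:

# zfs set copies=2 rpool/export/home
# zfs get copies rpool/export/home

Note that copies only affects blocks written after the property is set; existing data keeps whatever redundancy it was written with.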
Re: [zfs-discuss] Migrating 10TB of data from NTFS is there a simple way?
You might also search for OpenSolaris NAS projects. Some that I've seen previously involve nearly the same config you're building - a CF card or USB stick with the OS and a number of HDDs in a zfs pool for the data only. I am not certain which ones I've seen, but you can look for EON, and PulsarOS... http://eonstorage.blogspot.com/2008_11_01_archive.html (features page) http://eonstorage.blogspot.com/2009/05/eon-zfs-nas-0591-based-on-snv114.html http://code.google.com/p/pulsaros/ http://pulsaros.digitalplayground.at/ Haven't yet tried them, though. //Jim -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Migrating 10TB of data from NTFS is there a simple way?
> Trying to spare myself the expense as this is my home system so budget is a constraint.
> What I am trying to avoid is having multiple raidz's because every time I have another one I lose a lot of extra space to parity. Much like in raid 5.

There's a common perception, which I now tend to share, that "consumer" drives have a somewhat higher rate of unreliability and failure. Some aspects relate to design priorities (i.e. balancing price vs size vs duty cycle), others to conspiracy-theory stuff (force consumers into buying more drives more often). Hand-built computers tend to increase that rate further for any number of reasons (components, connections, thermal issues, power source issues). I've learned that the hard way while building many home computers and cheap campus servers at my Uni, including 24-drive Linux filers with mdadm and hardware RAID cards :)

Another problem is that larger drives take a lot longer to rebuild (about 4 hours just to write a single drive in your case, with an otherwise idle system), or even longer to resilver with a filled-up array like yours. This is especially a problem in classic RAID setups, where the whole drive is considered failed if anything goes wrong. Quite often some hidden problem then turns up on another drive of the array, so the whole array is considered dead, and that chance grows with disk size. That's one of several good reasons why "enterprise" drives are smaller. Hopefully ZFS contains such failures down to the few blocks that have a checksum mismatch. Anyway, I'd not be comfortable with large sets of unreliable big drives even with some redundancy; hence my somewhat arbitrary recommendation of 4-drive raidz1 sets. The industry seems to agree that at most 7-9 drives are reasonable for a single RAID5/6 volume (a vdev in the case of ZFS), though.

Since you already have 2 clean 1TB disks, you can buy just 2 more. In the end you'd have one 4*1TB raidz1 and two 4*1.5TB raidz1 vdevs in a pool, summing up to 3+(4.5*2) = 12TB of usable space in a redundant set. For me personally, that would be worth it. There is, however, some imbalance in the first set (3TB usable), which only amounts to freeing up 2*1TB drives' worth of space; that may call for more costly corrections to my calculations (i.e. a 5*1TB disk set)...

Concerning the SD/CF card for booting, I have no experience. From what I've seen, you can google for notes on card booting in the Eeepc and similar netbooks, and for comments on making livecd/liveusb-capable Solaris distros (see some at http://www.opensolaris.org/os/downloads/). You'd probably need to make sure that the BIOS emulates the card as an IDE/SATA hard disk device, and/or bundle the needed drivers into the Solaris miniroot.

> And last thx so very much for spending so much time and effort in transferring knowledge, I really do appreciate it.

You're very welcome. I do hope this helps and that you don't lose data in the process, due to my possible mistakes or misconceptions, or otherwise ;)

//Jim
-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
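For reference, the 12TB layout suggested above could be assembled roughly like this; the pool name and device names are placeholders only (check your own with the format utility first):

# zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0
# zpool add tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0
# zpool add tank raidz c2t4d0 c2t5d0 c2t6d0 c2t7d0

(The first line is the 4 x 1TB vdev, ~3TB usable; the two adds are the 4 x 1.5TB vdevs, ~4.5TB usable each.) Each "zpool add" stripes another raidz1 vdev into the pool, and as noted elsewhere in the thread a vdev cannot currently be removed again, so double-check the device names before adding.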
Re: [zfs-discuss] zfs root, jumpstart and flash archives
On 07/09/09 17:25, Mark Michael wrote:
> Thanks for the info. Hope that the pfinstall changes to support zfs root flash jumpstarts can be extended to support luupgrade -f at some point soon. BTW, where can I find an example profile? do I just substitute in the install_type flash_install archive_location ... for install_type initial_install ??

Here's a sample:

install_type flash_install
archive_location nfs schubert:/export/home/lalt/mirror.flar
partitioning explicit
pool rpool auto auto auto mirror c0t1d0s0 c0t0d0s0

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Single disk parity
> On Wed, 8 Jul 2009, Moore, Joe wrote:
> That's true for the worst case, but zfs mitigates that somewhat by batching i/o into a transaction group. This means that i/o is done every 30 seconds (or 5 seconds, depending on the version you're running), allowing multiple writes to be written together in the disparate locations.

I'd think that writing the same data two or three times is a much larger performance hit anyway. Calculating 5% parity and writing it in addition to the stripe might be heaps faster. Might try to do some tests on this.

-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
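For what it's worth, the user-land equivalent mentioned in the reply elsewhere in this thread already exists: par2 can add a configurable percentage of recovery data to ordinary files. A rough example (file names are made up):

# par2 create -r5 bigfile.par2 bigfile
# par2 verify bigfile.par2
# par2 repair bigfile.par2

The first command creates roughly 5% recovery data, the others check and repair against it. It is not integrated with ZFS checksums, of course, but it gives a feel for the space/recovery trade-off being proposed.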
Re: [zfs-discuss] zfs root, jumpstart and flash archives
Thanks for the info. Hope that the pfinstall changes to support zfs root flash jumpstarts can be extended to support luupgrade -f at some point soon. BTW, where can I find an example profile? do I just substitute in the install_type flash_install archive_location ... for install_type initial_install ?? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] creating a zpool inside a zone with zvols from the global zone
I'm not sure if this is the correct list for this query; however, I am trying to create a number of zpools inside a zone. I am running snv_117 and this is an ipkg-branded zone. Here is the zone configuration:

a...@vs-idm:~$ zonecfg -z vsnfs-02 export
create -b
set zonepath=/rpool/zones/vsnfs-02
set brand=ipkg
set autoboot=true
set ip-type=shared
add net
set address=xxx.xxx.xxx.xxx
set physical=e1000g0
end
add device
set match=/dev/zvol/dsk/rpool/[uw][0123]-test
end
add device
set match=/dev/zvol/rdsk/rpool/[uw][0123]-test
end

The device for the pool is a zvol created in the global zone and added to the local zone using "add device" in zonecfg. I get this error:

pfexec zpool create -m /VS/home/.u0 u0 /dev/zvol/dsk/rpool/u0-test
cannot create 'u0': permission denied

I take it I am trying to do something that is not intended?

Thanks, Alastair
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool replace leaves pool degraded after resilvering
2009.06 is v111b, but you're running v111a. I don't know, but perhaps the a->b transition addressed this issue, among others? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Very slow ZFS write speed to raw zvol
After reading many-many threads on ZFS performance today (top of the list in the forum, and some chains of references), I applied a bit of tuning to the server. In particular, I've set zfs_write_limit_override to 384MB so my cache is spooled to disks more frequently (if streaming lots of writes) and in smaller increments:

* echo zfs_write_limit_override/W0t402653184 | mdb -kw
set zfs:zfs_write_limit_override = 0x18000000

The system seems to be working more smoothly (vs. jerky), and "zpool iostat" values are not quite as jumpy (i.e. 320MBps to 360MBps for a certain test). The results also seem faster and more consistent.

With this tuning applied, I'm writing to a 40G zvol, 1M records (count=1048576) of:

4k (bs=4096): 17s (12s), 241MBps
8k (bs=8192): 29s (18s), 282MBps
16k (bs=16384): 54s (30s), 303MBps
32k (bs=32768): 113s (56s), 290MBps
64k (bs=65536): 269s (104s), 243MBps

And 10240 larger records of:

1 MB (bs=1048576): 33s (8s), 310MBps
2 MB (bs=2097152): 74s (23s), 276MBps

And 1024 yet larger records:

1 MB (bs=1048576): 4s (1s), 256MBps
4 MB (bs=4194304): 12s (5s), 341MBps
16MB (bs=16777216): 71s (18s), 230MBps
32MB (bs=33554432): 150s (36s), 218MBps

So the zvol picture is quite a bit better now (albeit not perfect - i.e. no values are near the 1GBps noted previously in "zpool iostat"), for both small and large blocks.

For a filesystem dataset the new values are very similar (like, to tenths of a second on the smaller blocksizes!), but as the blocksize grows, filesystems start losing to the zvols. Overall the result seems lower than achieved before I tried tuning. 1M records (count=1048576) of:

4k (bs=4096): 17s (12s), 241MBps
8k (bs=8192): 29s (18s), 282MBps
16k (bs=16384): 67s (30s), 245MBps
32k (bs=32768): 144s (55s), 228MBps
64k (bs=65536): 275s (98s), 238MBps

And 10240 larger records go better:

1 MB (bs=1048576): 33s (9s), 310MBps
2 MB (bs=2097152): 70s (21s), 292MBps

And 1024 yet larger records:

1 MB (bs=1048576): 2.8s (0.8s), 366MBps
4 MB (bs=4194304): 12s (4s), 341MBps
16MB (bs=16777216): 55s (17s), 298MBps
32MB (bs=33554432): 140s (36s), 234MBps

Occasionally I did reruns; user time for the same setups can vary significantly (like 65s vs 84s) while the system time stays pretty much the same. "zpool iostat" shows larger values (like 320MBps typically), but I think that can be attributed to writing parity stripes on raidz vdevs.

//Jim

PS: for completeness, I'll try smaller blocks without tuning in a future post.
-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
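As a quick arithmetic cross-check of that value (plain shell, nothing ZFS-specific), in case anyone wants to adapt it to a different limit:

# echo '384*1024*1024' | bc
402653184
# printf '0x%X\n' 402653184
0x18000000

i.e. 0t402653184 in the mdb command and 0x18000000 are the same 384MB figure.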
Re: [zfs-discuss] Changing GUID
On Thu, Jul 9, 2009 at 8:42 PM, Norbert wrote:
> Does anyone have the code/script to change the GUID of a ZFS pool?

I wrote such a tool for my client around a year ago and that client agreed to release the code. However, the API I used has since been changed and is not available anymore, so you cannot compile it on recent Nevada releases. I may consider retrofitting it if I have enough time and motivation.

-- Regards, Cyril
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Changing GUID
Does anyone have the code/script to change the GUID of a ZFS pool? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Question about user/group quotas
Thanks for the link Richard, I guess the next question is, how safe would it be to run snv_114 in production? Running something that would be technically "unsupported" makes a few folks here understandably nervous... -Greg On Thu, 2009-07-09 at 10:13 -0700, Richard Elling wrote: > Greg Mason wrote: > > I'm trying to find documentation on how to set and work with user and > > group quotas on ZFS. I know it's quite new, but googling around I'm just > > finding references to a ZFS quota and refquota, which are > > filesystem-wide settings, not per user/group. > > > > Cindy does an excellent job of keeping the ZFS Admin Guide up to date. > http://opensolaris.org/os/community/zfs/docs/zfsadmin.pdf > See the section titled "Setting User or Group Quotas on a ZFS File System" > -- richard > > Also, after reviewing a few bugs, I'm a bit confused about which build > > has user quota support. I recall that snv_111 has user quota support, > > but not in rquotad. According to bug 6501037, ZFS user quota support is > > in snv_114. > > > > We're preparing to roll out OpenSolaris 2009.06 (snv_111b), and we're > > also curious about being able to utilize ZFS user quotas, as we're > > having problems with NFSv4 on our clients (SLES 10 SP2). We'd like to be > > able to use NFSv3 for now (one large ZFS filesystem, with user quotas > > set), until the flaws with our Linux NFS clients can be addressed. > > > > > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
I don't swear. The word it bleeped was not a bad word -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?
I have a much more generic question regarding this thread. I have a sun T5120 (T2 quad core, 1.4GHz) with two 10K RPM SAS drives in a mirrored pool running Solaris 10 u7. The disk performance seems horrible. I have the same apps running on a Sun X2100M2 (dual core 1.8GHz AMD) also running Solaris 10u7 and an old, really poor performing SATA drive (also with ZFS), and its disk performance seems at least 5x better. I'm not offering much detail here, but I had been attributing this to what I've always observed--Solaris on x86 performs far better than on sparc for any app I've ever used. I guess the real question would be is ZFS ready for production in Solaris 10, or should I flar this bugger up and rebuild with UFS? This thread concerns me, and I really want to keep ZFS on this system for its many features. Sorry if this is off-topic, but you guys got me wondering. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Single disk parity
Haudy Kazemi wrote: Adding additional data protection options are commendable. On the other hand I feel there are important gaps in the existing feature set that are worthy of a higher priority, not the least of which is the automatic recovery of uberblock / transaction group problems (see Victor Latushkin's recovery technique which I linked to in a recent post), This does not seem to be a widespread problem. We do see the occasional complaint on this forum, but considering the substantial number of ZFS implementations in existence today, the rate seems to be quite low. In other words, the impact does not seem to be high. Perhaps someone at Sun could comment on the call rate for such conditions? I counter this. The user impact is very high when the pool is completely inaccessible due to a minor glitch in the ZFS metadata, and the user is told to restore from backups, particularly if they've been considering snapshots to be their backups (I know they're not the same thing). The incidence rate may be low, but the impact is still high, and anecdotally there have been enough reports on list to know it is a real non-zero event probability. Impact in my context is statistical. If everyone was hitting this problem, then it would have been automated long ago. Sun does track such reports and will know their rate. Think earth-asteroid collisions...doesn't happen very often but is catastrophic when it does happen. Graceful handling of low incidence high impact events plays a role in real world robustness and is important in widescale adoption of a filesystem. It is about software robustness in the face of failure vs. brittleness. (In another area, I and others found MythTV's dependence on MySQL to be source of system brittleness.) Google adopts robustness principles in its Google File System (GFS) by not trusting the hardware at all and then keeping a minimum of three copies of everything on three separate computers. Right, so you also know that the reports of this problem are for non-mirrored pools. I agree with Google, mirrors work. Consider the users/admin's dilemma of choosing between a filesystem that offers all the great features of ZFS but can be broken (and is documented to have broken) with a few miswritten bytes, or choosing a filesystem with no great features but is also generally robust to wide variety of minor metadata corrupt issues. Complex filesystems need to take special measures that their complexity doesn't compromise their efforts at ensuring reliability. ZFS's extra metadata copies provide this versus simply duplicating the file allocation table as is done in FAT16/32 filesystems (a basic filesystem). The extra filesystem complexity also makes users more dependent upon built in recovery mechanisms and makes manual recovery more challenging. (This is an unavoidable result of more complicated filesystem design.) I agree 100%. But the question here is manual vs automated, not possible vs impossible. Even the venerable UFS fsck defers to manual if things are really messed up. More below. followed closely by a zpool shrink or zpool remove command that lets you resize pools and disconnect devices without replacing them. I saw postings or blog entries from about 6 months ago that this code was 'near' as part of solving a resilvering bug but have not seen anything else since. I think many users would like to see improved resilience in the existing features and the addition of frequently long requested features before other new features are added. 
(Exceptions can readily be made for new features that are trivially easy to implement and/or are not competing for developer time with higher priority features.) In the meantime, there is the copies flag option that you can use on single disks. With immense drives, even losing 1/2 the capacity to copies isn't as traumatic for many people as it was in days gone by. (E.g. consider a 500 gb hard drive with copies=2 versus a 128 gb SSD). Of course if you need all that space then it is a no-go. Space, performance, dependability: you can pick any two. Related threads that also had ideas on using spare CPU cycles for brute force recovery of single bit errors using the checksum: There is no evidence that the type of unrecoverable read errors we see are single bit errors. And while it is possible for an error handling code to correct single bit flips, multiple bit flips would remain as a large problem space. There are error codes which can correct multiple flips, but they quickly become expensive. This is one reason why nobody does RAID-2. Expensive in CPU cycles or engineering resources or hardware or dollars? If the argument is CPU cycles, then that is the same case made against software RAID as a whole and an argument increasingly broken by modern high performance CPUs. If the argument is engineering resources, consider the complexity of ZFS itself. If the a
Re: [zfs-discuss] Question about user/group quotas
Greg Mason wrote: I'm trying to find documentation on how to set and work with user and group quotas on ZFS. I know it's quite new, but googling around I'm just finding references to a ZFS quota and refquota, which are filesystem-wide settings, not per user/group. Cindy does an excellent job of keeping the ZFS Admin Guide up to date. http://opensolaris.org/os/community/zfs/docs/zfsadmin.pdf See the section titled "Setting User or Group Quotas on a ZFS File System" -- richard Also, after reviewing a few bugs, I'm a bit confused about which build has user quota support. I recall that snv_111 has user quota support, but not in rquotad. According to bug 6501037, ZFS user quota support is in snv_114. We're preparing to roll out OpenSolaris 2009.06 (snv_111b), and we're also curious about being able to utilize ZFS user quotas, as we're having problems with NFSv4 on our clients (SLES 10 SP2). We'd like to be able to use NFSv3 for now (one large ZFS filesystem, with user quotas set), until the flaws with our Linux NFS clients can be addressed. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
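For anyone searching the archives later, the per-user/group properties described in that section look roughly like this on a build recent enough to have the feature (user, group and dataset names here are made up):

# zfs set userquota@alice=10G tank/home
# zfs set groupquota@staff=50G tank/home
# zfs get userquota@alice tank/home
# zfs userspace tank/home

The last command reports per-user space consumption against any quota. As noted elsewhere in this thread, this needs build 114 or later.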
Re: [zfs-discuss] Question about user/group quotas
Greg Mason wrote: I'm trying to find documentation on how to set and work with user and group quotas on ZFS. I know it's quite new, but googling around I'm just finding references to a ZFS quota and refquota, which are filesystem-wide settings, not per user/group. Also, after reviewing a few bugs, I'm a bit confused about which build has user quota support. I recall that snv_111 has user quota support, but not in rquotad. According to bug 6501037, ZFS user quota support is in snv_114. ZFS user quota support and the corresponding rquotad support did not integrate until build 114 so you would need to first install 2009.06 then switch to the http://pkg.opensolaris.org/dev repository and 'pkg image-update' -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
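In practice the switch Darren describes amounts to something like the following (assuming the default publisher name on a stock 2009.06 image):

# pkg set-publisher -O http://pkg.opensolaris.org/dev opensolaris.org
# pkg image-update

Since image-update clones the current boot environment, the original snv_111b install remains available as a fallback BE to boot back into.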
Re: [zfs-discuss] zfs root, jumpstart and flash archives
Flash archive on zfs means archiving an entire root pool (minus any explicitly excluded datasets), not an individual BE. These types of flash archives can only be installed using Jumpstart and are intended to install an entire system, not an individual BE. Flash archives of a single BE could perhaps be implemented in the future. Lori On 07/09/09 09:56, Mark Michael wrote: I've been hoping to get my hands on patches that permit Sol10U7 to do a luupgrade -f of a ZFS root-based ABE since Solaris 10 10/08. Unfortunately, after applying patchids 119534-15 and 124630-26 to both the PBE and the miniroot of the OS image, I'm still getting the same "ERROR: Field 2 - Invalid disk name (insert_abe_name_here)". The flarcreate command I used was simply # flarcreate -n root_var_no_snap /export/fssnap/flars/root_var which created a flar file that was about 4 to 5 times the size of a UFS-based flar file. I then used the command # luupgrade -f -n be_d70 -s /export/fssnap/os_image \ > -a /export/fssnap/flars/root_var which then failed with the pfinstall diagnostic given above. What am I still doing wrong? ttfn mm mark.o.mich...@boeing.com mark.mich...@es.bss.boeing.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Question about user/group quotas
I'm trying to find documentation on how to set and work with user and group quotas on ZFS. I know it's quite new, but googling around I'm just finding references to a ZFS quota and refquota, which are filesystem-wide settings, not per user/group. Also, after reviewing a few bugs, I'm a bit confused about which build has user quota support. I recall that snv_111 has user quota support, but not in rquotad. According to bug 6501037, ZFS user quota support is in snv_114. We're preparing to roll out OpenSolaris 2009.06 (snv_111b), and we're also curious about being able to utilize ZFS user quotas, as we're having problems with NFSv4 on our clients (SLES 10 SP2). We'd like to be able to use NFSv3 for now (one large ZFS filesystem, with user quotas set), until the flaws with our Linux NFS clients can be addressed. -- Greg Mason System Administrator Michigan State University High Performance Computing Center ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs root, jumpstart and flash archives
I've been hoping to get my hands on patches that permit Sol10U7 to do a luupgrade -f of a ZFS root-based ABE since Solaris 10 10/08. Unfortunately, after applying patchids 119534-15 and 124630-26 to both the PBE and the miniroot of the OS image, I'm still getting the same "ERROR: Field 2 - Invalid disk name (insert_abe_name_here)".

The flarcreate command I used was simply

# flarcreate -n root_var_no_snap /export/fssnap/flars/root_var

which created a flar file that was about 4 to 5 times the size of a UFS-based flar file. I then used the command

# luupgrade -f -n be_d70 -s /export/fssnap/os_image \
   -a /export/fssnap/flars/root_var

which then failed with the pfinstall diagnostic given above. What am I still doing wrong?

ttfn
mm
mark.o.mich...@boeing.com
mark.mich...@es.bss.boeing.com
-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Migrating 10TB of data from NTFS is there a simple way?
> > I installed opensolaris and setup rpool as my base install on a single 1TB drive
>
> If I understand correctly, you have rpool and the data pool configured all as one pool?

Correct.

> That's not probably what you'd really want. For one part, the bootable root pool should all be available to GRUB from a single hardware device and this precludes any striping or raidz configurations for the root pool (only single drives and mirrors are supported).

Makes sense.

> You should rather make a separate root pool (depends on your installation size, RAM -> swap, number of OS versions to roll back); I'd suffice with anything from 8 to 20Gb. And the rest of the disk (as another slice) becomes the data pool which

I would like to use a 16GB SD card for this - if there is a post or a resource on "how to" you know of, pls point me to it.

> can later be expanded by adding stripes. Obviously, data already on the disk won't magically become striped to all drives unless you rewrite it.
>
> > a single 1TB drive
>
> Minor detail: I thought you were moving 1.5TB disks? Or did you find a drive with adequately few data (1 TB used)?

I have 2 x 1TB drives that are clean and 8 x 1.5TB drives with all my data on.

> > transfering data accross till the drive was empty
>
> I thought NTFS driver for Solaris is read-only?

Nope, I copied (not moved) all the data - 800GB so far in 3 and a half hours - successfully to my rpool.

> Not a good transactional approach. Delete original data only after all copying has completed (and perhaps cross-checked) and the disk can actually be reused in the ZFS pool.
>
> For example, if you were to remake the pool (as suggested above for rpool and below for raidz data pool) - where would you re-get the original data for copying over again?
>
> > I havent worked out if I can transform my zpool into a zraid after I have copied all my data.
>
> My guess would be - no, you can't (not directly at least). I think you can mirror the striped pool's component drives on the fly, by buying new drives one at a time - which requires buying these drives. Or if you buy and attach all 8-9 drives at once,

Trying to spare myself the expense as this is my home system so budget is a constraint.

> you can build another pool with raidz layout and migrate all data to it. Your old drives can then be attached to this pool as another raidz vdev stripe (or even mirror, but that's probably not needed for your usecase). These scenarios are not unlike raid50 or raid51, respectively.
>
> In case of striping, you can build and expand your pool by vdev's of different layout and size. As said before, currently there's a problem that you can't shrink the pool to remove devices (other than break mirrors into single drives).
>
> Perhaps you can get away by buying now only the "parity" drives for your future pool layout (which depends on the number of motherboard/controller connectors, and power source capacity, and your computer case size, etc.) and following the ideas for "best-case" scenario from my post.

Motherboard has 7 SATA connectors; in addition I have an Intel SATA RAID controller with 6 connectors which I haven't put on yet, and I am using a dual-PSU Coolermaster case which supports 16 drives.

> Then you'd start the pool by making a raidz1 device of 3-5 drives total (new empty ones, possibly including the "missing" fake parity device), and then making and attaching to the pool more new similar raidz vdev's as you free up NTFS disks.
>
> I did some calculations on this last evening.
>
> For example, if your data fits on 8 "data" drives, you can make 1*8-Ddrive raidz1 set with 9 drives (8+1), 2*4-Ddrive sets with 10 drives (8+2), 3*3-Ddrive sets with 12 drives (9+3).
>
> I'd buy 4 new drives and stick with the latter 12-drive pool scenario -
> 1) build a complete 4-drive raidz1 set (3-Ddrive + 1*Pdrive),
> 2) move over 3 drives worth of data,
> 3) build and attach a fake 4-drive raidz1 set (3-Ddrive + 1 missing Pdrive),
> 4) move over 3 drives worth of data,
> 5) build and attach a fake 4-drive raidz1 set (3-Ddrive + 1 missing Pdrive),
> 6) move over 2 drives worth of data,
> 7) complete the parities for the missing Pdrives of the two faked sets.
>
> This does not in any way involve the capacity of your bootroot drives (which I think were expected to be a CF card, no?). So you already have at least one such drive ;) Even if your current drive is partially consumed by the root pool, I think you can sacrifice some 20Gb on each drive in one 4-disk raidz1 vdev. You can mirror the root pool with one of these drives, and make a mirrored swap pool on the other couple.

Ok, I am going to have to read through this slowly and fully understand the fake raid scenario. What I am trying to avoid is having multiple raidz's because every time I have another one I lose a lot of extra space to parity. Much like in raid 5.
Re: [zfs-discuss] Very slow ZFS write speed to raw zvol
On Jul 9, 2009, at 4:22 AM, Jim Klimov wrote: To tell the truth, I expected zvols to be faster than filesystem datasets. They seem to have less overhead without inodes, posix, acls and so on. So I'm puzzled by test results. I'm now considering the dd i/o block size, and it means a lot indeed, especially if compared to zvol results with small blocks like 64k. I ran a number of tests with a zvol recreated by commands before each run (this may however cause varying fragmentation impacting results of different runs): # zfs destroy -r pond/test; zfs create -V 30G pond/test; zfs set compression=off pond/test; sync; dd if=/dev/zero of=/dev/zvol/rdsk/ pond/test count=1000 bs=512; sync and tests going like # time dd if=/dev/zero of=/dev/zvol/rdsk/pond/test count=1024 bs=1048576 1024+0 records in 1024+0 records out real0m42.442s user0m0.006s sys 0m4.292s The test progresses were quite jumpy (with "zpool iostat pond 1" values varying from 30 to 70 MBps, reads coming in sometimes). So I'd stick to overall result - the rounded wallclock time it takes to write 1024 records of varying size and resulting average end-user MBps. I also write "sys" time since that's what is consumed by the kernel and the disk subsystem, after all. I don't write zpool iostat speeds, since they vary too much and I don't bother with a spreadsheen right now. But the reported values stay about halfway between "wallclock MBps" ans "sys MBps" calculations, on the perceived average, peaking at about 350MBps for large block sizes (>4MB). 1 MB (bs=1048576): 42s (4s), 24MBps 4 MB (bs=4194304): 42s (15s), 96MBps 16MB (bs=16777216): 129s-148s (62-64s), 127-110MBps 32MB (bs=33554432, 40Gb zvol): 303s (127s), 108MBps Similar results for writing a file to a filesystem; "zpool iostat" values again jumped anywhere between single MBps to GBps. Simple cleanups used like: # rm /pool/test30g; sync; time dd if=/dev/zero of=/pool/test30g count=1024 bs=33554432 Values remain somewhat consistent (in the same league, at least): 1 MB (bs=1048576, 10240 blocks): 20-21s (7-8s), 512-487MBps 1 MB (bs=1048576): 2.3s (0.6s), 445MBps 4 MB (bs=4194304): 8s (3s), 512MBps 16MB (bs=16777216): 37s (15s), 442MBps 32MB (bs=33554432): 74-103s (32-42s), 442-318MBps 64Kb (bs=65536, 545664 blocks): 94s (47s), 362MBps All in all, to make more precise results these tests should be made in greater numbers and averaged. But here we got some figures to think about... On a side note, now I'll pay more attention to tuning suggestions which involve multi-megabyte buffers for network sockets, etc. They can actually cause an impact to performance many times over! On another note, For some reason I occasionally got results like this: write: File too large 1+0 records in 1+0 records out I think the zvol was not considered created by that time. In about 10-15 sec I was able to commence the test run. Perhaps it helped that I "initialized" the zvol by a small write after creation, then: # dd if=/dev/zero of=/dev/zvol/rdsk/pond/test count=1000 bs=512 Strange... When running throughput tests the block sizes to be concerned about are: 4k, 8k, 16k and 64k. These are the sizes that most file systems an databases use. If you get 4k to perform well chances are the others will fall into line. -Ross ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
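A quick way to run exactly that sweep against the zvol used earlier in the thread (adjust the target path to your own test volume - writing to the wrong raw device will destroy data):

# for bs in 4096 8192 16384 65536; do count=`expr 1073741824 / $bs`; echo bs=$bs; time dd if=/dev/zero of=/dev/zvol/rdsk/pond/test bs=$bs count=$count; done

Each pass writes the same 1GB total, so the wall-clock times are directly comparable across block sizes.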
Re: [zfs-discuss] zfs root, jumpstart and flash archives
Thanks everyone for the patch IDs. On Wed, Jul 8, 2009 at 4:50 PM, Enda O'Connor wrote: > Hi > for sparc > 119534-15 > 124630-26 > > > for x86 > 119535-15 > 124631-27 > > higher rev's of these will also suffice. > > Note these need to be applied to the miniroot of the jumpstart image so that > it can then install zfs flash archive. > please read the README notes in these for more specific instructions, > including instructions on miniroot patching. > > Enda > > Fredrich Maney wrote: >> >> Any idea what the Patch ID was? >> >> fpsm >> >> On Wed, Jul 8, 2009 at 3:43 PM, Bob >> Friesenhahn wrote: >>> >>> On Wed, 8 Jul 2009, Jerry K wrote: >>> It has been a while since this has been discussed, and I am hoping that you can provide an update, or time estimate. As we are several months into Update 7, is there any chance of an Update 7 patch, or are we still waiting for Update 8. >>> >>> I saw that a Solaris 10 patch for supporting Flash archives on ZFS came >>> out >>> about a week ago. >>> >>> Bob >>> -- >>> Bob Friesenhahn >>> bfrie...@simple.dallas.tx.us, >>> http://www.simplesystems.org/users/bfriesen/ >>> GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ >>> ___ >>> zfs-discuss mailing list >>> zfs-discuss@opensolaris.org >>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss >>> >> ___ >> zfs-discuss mailing list >> zfs-discuss@opensolaris.org >> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
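For the archives, applying those patches to a net-install miniroot is typically along the lines below; the paths are only examples, and the patch READMEs that Enda mentions are the authoritative procedure (on x86 the miniroot may first need to be unpacked and repacked with root_archive):

# patchadd -C /export/install/s10u7/Solaris_10/Tools/Boot /var/tmp/119534-15
# patchadd -C /export/install/s10u7/Solaris_10/Tools/Boot /var/tmp/124630-26

After that, jumpstart clients booted from the patched image should be able to install a ZFS flash archive.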
[zfs-discuss] Issues with ZFS and SVM?
I wonder exactly what's going on. Perhaps it is the cache flushes that are causing the SCSI errors when trying to use the SSD (Intel X25-E and X25-M) disks? Btw, I'm seeing the same behaviour on both an X4500 (SATA/Marvell controller) and the X4240 (SAS/LSI controller). Well, almost - on the X4500 I didn't see the errors printed on the console, but things behaved strangely, and I did see the same speedup.

If SVM silently disables cache flushes then perhaps there should be a HUGE warning printed somewhere (ZFS FAQ? Solaris documentation? In zpool when creating/adding devices?) about using ZFS with SVM?

I wonder what the potential danger might be _if_ SVM disables cache flushes for the SLOG... Sure, that might mean a missed update on the filesystem, but since the data disks in the pool are raw disk devices the ZFS filesystem should be stable (sans any possibly missed updates). I think I can live with that. What I don't want is a corrupt 16TB zpool in case of a power outage...

Message was edited by: pen
-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Migrating 10TB of data from NTFS is there a simple way?
One more note, > For example, if you were to remake the pool (as suggested above for rpool and > below for raidz data pool) - where would you re-get the original data for > copying > over again? Of course, if you take on with the idea of buying 4 drives and building a raidz1 vdev right away, and if you actually moved (deleted) the data from the NTFS disk, you should start by creating this new pool with a complete raidz1 vdev. Then you transfer (copy then delete) data to it from your current ZFS pool and only then you remake/migrate the root pool if needed. Perhaps it would make sense to start with a faked raidz1 array (along with a new smaller root pool on its drives) made of just 3 more 1Tb disks, so you would just recycle and add your current zfs drive as a parity disk to this pool after all is complete. As you see, there's lots of options depending on budget, creativity and other factors. It is possible that in the course of your quest you'll try several of them. Starting out with a transactionable approach (i.e. not deleting the originals until necessary) pays off in such cases. //Jim -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Migrating 10TB of data from NTFS is there a simple way?
> I installed opensolaris and setup rpool as my base install on a single 1TB > drive If I understand correctly, you have rpool and the data pool configured all as one pool? That's not probably what you'd really want. For one part, the bootable root pool should all be available to GRUB from a single hardware device and this precludes any striping or raidz configurations for the root pool (only single drives and mirrors are supported). You should rather make a separate root pool (depends on your installation size, RAM -> swap, number of OS versions to roll back); I'd suffice with anything from 8 to 20Gb. And the rest of the disk (as another slice) becomes the data pool which can later be expanded by adding stripes. Obviously, data already on the disk won't magically become striped to all drives unless you rewrite it. > a single 1TB drive Minor detail: I thought you were moving 1.5TB disks? Or did you find a drive with adequately few data (1 TB used)? > transfering data accross till the drive was empty I thought NTFS driver for Solaris is read-only? Not a good transactional approach. Delete original data only after all copying has completed (and perhaps cross-checked) and the disk can actually be reused in the ZFS pool. For example, if you were to remake the pool (as suggested above for rpool and below for raidz data pool) - where would you re-get the original data for copying over again? > I havent worked out if I can transform my zpool int a zraid after I have > copied all my data. My guess would be - no, you can't (not directly at least). I think you can mirror the striped pool's component drives on the fly, by buying new drives one at a time - which requires buying these drives. Or if you buy and attach all 8-9 drives at once, you can build another pool with raidz layout and migrate all data to it. Your old drives can then be attached to this pool as another raidz vdev stripe (or even mirror, but that's probably not needed for your usecase). These scenarios are not unlike raid50 or raid51, respectively. In case of striping, you can build and expand your pool by vdev's of different layout and size. As said before, currently there's a problem that you can't shrink the pool to remove devices (other than break mirrors into single drives). Perhaps you can get away by buying now only the "parity" drives for your future pool layout (which depends on the number of motherboard/controller connectors, and power source capacity, and your computer case size, etc.) and following the ideas for "best-case" scenario from my post. Then you'd start the pool by making a raidz1 device of 3-5 drives total (new empty ones, possibly including the "missing" fake parity device), and then making and attaching to the pool more new similar raidz vdev's as you free up NTFS disks. I did some calculations on this last evening. For example, if your data fits on 8 "data" drives, you can make 1*8-Ddrive raidz1 set with 9 drives (8+1), 2*4-Ddrive sets with 10 drives (8+2), 3*3-Ddrive sets with 12 drives (9+3). I'd buy 4 new drives and stick with the latter 12-drive pool scenario - 1) build a complete 4-drive raidz1 set (3-Ddrive + 1*Pdrive), 2) move over 3 drives worth of data, 3) build and attach a fake 4-drive raidz1 set (3-Ddrive + 1 missing Pdrive), 4) move over 3 drives worth of data, 5) build and attach a fake 4-drive raidz1 set (3-Ddrive + 1 missing Pdrive), 6) move over 2 drives worth of data, 7) complete the parities for the missing Pdrives of the two faked sets. 
This does not in any way involve the capacity of your bootroot drives (which I think were expected to be a CF card, no?). So you already have at least one such drive ;) Even if your current drive is partially consumed by the root pool, I think you can sacrifice some 20Gb on each drive in one 4-disk raidz1 vdev. You can mirror the root pool with one of these drives, and make a mirrored swap pool on the other couple. //Jim -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
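For the curious: the usual way to fake the "missing Pdrive" in steps 3 and 5 above is a sparse file standing in for the absent disk, something like this (pool, device and file names are illustrative only):

# mkfile -n 1500g /var/tmp/fakeparity
# zpool create tank2 raidz c3t0d0 c3t1d0 c3t2d0 /var/tmp/fakeparity
# zpool offline tank2 /var/tmp/fakeparity
# rm /var/tmp/fakeparity

The vdev then runs degraded (i.e. with no parity protection) until a freed-up real disk is swapped in with something like "zpool replace tank2 /var/tmp/fakeparity c3t3d0", which corresponds to step 7. Use with care - losing any one disk of such a degraded raidz1 vdev loses the pool.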
Re: [zfs-discuss] Booting from detached mirror disk
You might also want to force ZFS into accepting a faulty root pool: # zpool set failmode=continue rpool //Jim -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs snapshoot of rpool/* to usb removable drives?
You can also select which snapshots you'd like to copy - and egrep away what you don't need. Here's what I did to back up some servers to a filer (as compressed ZFS snapshots stored into files or further simple deployment on multiple servers, as well as offsite rsyncing of the said files). The example below is a framework from our scratchpad docs, modify it to a specific server's environment. Apparently, such sending and receiving examples (see below) can be piped together without use of files (and gzip, ssh, whatever) within a local system. # ZFS snapshot dumps # prepare TAGPRV='20090427-01' TAGNEW='20090430-01-running' zfs snapshot -r pool/zones@"$TAGNEW" # incremental dump over NFS (needs set TAGNEW/TAGPRV) cd /net/back-a/export/DUMP/manual/`hostname` && \ for ZSn in `zfs list -t snapshot | grep "$TAGNEW" | awk '{ print $1 }'`; do ZSp=`echo $ZSn | sed "s/$TAGNEW/$TAGPRV/"`; Fi="`hostname`%`echo $ZSn | sed 's/\//_/g'`.incr.zfsshot.gz"; echo "=== `date`"; echo "= prev: $ZSp"; echo "= new: $ZSn"; echo "= new: incr-file: $Fi"; /bin/time zfs send -i "$ZSp" "$ZSn" | /bin/time pigz -c - > "$Fi"; echo " res = [$?]"; done # incremental dump over ssh (needs set TAGNEW/TAGPRV; paths hardcoded in the end) for ZSn in `zfs list -t snapshot | grep "$TAGNEW" | awk '{ print $1 }'`; do ZSp=`echo $ZSn | sed "s/$TAGNEW/$TAGPRV/"`; Fi="`hostname`%`echo $ZSn | sed 's/\//_/g'`.incr.zfsshot.gz"; echo "=== `date`"; echo "= prev: $ZSp"; echo "= new: $ZSn"; echo "= new: incr-file: $Fi"; /bin/time zfs send -i "$ZSp" "$ZSn" | /bin/time pigz -c - | ssh back-a "cat > /export/DUMP/manual/`hostname`/$Fi"; echo " res = [$?]"; done All in all, these lines send an incremental snapshot between $TAGPRV and $TAGNEW to per-server directories into per-snapshot files. They are quickly compressed with pigz (parallel gzip) before writing. First of all you'd of course need an initial dump (a full dump of any snapshot): # Initial dump of everything except swap volumes zfs list -H -t snapshot | egrep -vi 'swap|rpool/dump' | grep "@$TAGPRV" | awk '{ print $1 }' | while read Z; do F="`hostname`%`echo $Z | sed 's/\//_/g'`.zfsshot"; echo "`date`: $Z > $F.gz"; time zfs send "$Z" | pigz -9 > $F.gz; done Now, if your snapshots were named in an incrementing manner (like these timestamped examples above), you are going to have a directory with files named like this (it's assumed that incremented snapshots all make up a valid chain): servername%p...@20090214-01.zfsshot.gz servername%pool_zo...@20090214-01.zfsshot.gz servername%pool_zo...@20090405-03.incr.zfsshot.gz servername%pool_zo...@20090427-01.incr.zfsshot.gz servername%pool_zones_gene...@20090214-01.zfsshot.gz servername%pool_zones_gene...@20090405-03.incr.zfsshot.gz servername%pool_zones_gene...@20090427-01.incr.zfsshot.gz servername%pool_zones_general_...@20090214-01.zfsshot.gz servername%pool_zones_general_...@20090405-03.incr.zfsshot.gz servername%pool_zones_general_...@20090427-01.incr.zfsshot.gz The last one is a large snapshot of the zone (ns4) while the first ones are small datasets which simply form nodes in the hierarchical tree. There's lots of these usually :) You can simply import these files into a zfs pool by a script like: # for F in *.zfsshot.gz; do echo "=== $F"; gzcat "$F" | time zfs recv -nFvd pool; done Probably better use "zfs recv -nFvd" first (no-write verbose mode) to be certain about your write-targets and about overwriting stuff (i.e. 
"zfs recv -F" would destroy any newer snapshots, if any - so you can first check which ones, and possibly clone/rename them first). // HTH, Jim Klimov -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Very slow ZFS write speed to raw zvol
To tell the truth, I expected zvols to be faster than filesystem datasets. They seem to have less overhead without inodes, posix, acls and so on. So I'm puzzled by the test results.

I'm now considering the dd i/o block size, and it means a lot indeed, especially if compared to the zvol results with small blocks like 64k. I ran a number of tests with a zvol recreated by these commands before each run (this may however cause varying fragmentation, impacting results of different runs):

# zfs destroy -r pond/test; zfs create -V 30G pond/test; zfs set compression=off pond/test; sync; dd if=/dev/zero of=/dev/zvol/rdsk/pond/test count=1000 bs=512; sync

and tests going like

# time dd if=/dev/zero of=/dev/zvol/rdsk/pond/test count=1024 bs=1048576
1024+0 records in
1024+0 records out

real    0m42.442s
user    0m0.006s
sys     0m4.292s

The test progress was quite jumpy (with "zpool iostat pond 1" values varying from 30 to 70 MBps, reads coming in sometimes). So I'd stick to the overall result - the rounded wallclock time it takes to write 1024 records of varying size, and the resulting average end-user MBps. I also list the "sys" time since that's what is consumed by the kernel and the disk subsystem, after all. I don't list zpool iostat speeds, since they vary too much and I don't bother with a spreadsheet right now. But the reported values stay about halfway between the "wallclock MBps" and "sys MBps" calculations, on the perceived average, peaking at about 350MBps for large block sizes (>4MB).

1 MB (bs=1048576): 42s (4s), 24MBps
4 MB (bs=4194304): 42s (15s), 96MBps
16MB (bs=16777216): 129s-148s (62-64s), 127-110MBps
32MB (bs=33554432, 40Gb zvol): 303s (127s), 108MBps

Similar results for writing a file to a filesystem; "zpool iostat" values again jumped anywhere from single MBps to GBps. Simple cleanups were used, like:

# rm /pool/test30g; sync; time dd if=/dev/zero of=/pool/test30g count=1024 bs=33554432

Values remain somewhat consistent (in the same league, at least):

1 MB (bs=1048576, 10240 blocks): 20-21s (7-8s), 512-487MBps
1 MB (bs=1048576): 2.3s (0.6s), 445MBps
4 MB (bs=4194304): 8s (3s), 512MBps
16MB (bs=16777216): 37s (15s), 442MBps
32MB (bs=33554432): 74-103s (32-42s), 442-318MBps
64Kb (bs=65536, 545664 blocks): 94s (47s), 362MBps

All in all, to get more precise results these tests should be run in greater numbers and averaged. But here we have some figures to think about...

On a side note, now I'll pay more attention to tuning suggestions which involve multi-megabyte buffers for network sockets, etc. They can actually impact performance many times over!

On another note, for some reason I occasionally got results like this:

write: File too large
1+0 records in
1+0 records out

I think the zvol was not considered created by that time. In about 10-15 sec I was able to commence the test run. Perhaps it helped that I "initialized" the zvol by a small write after creation, then:

# dd if=/dev/zero of=/dev/zvol/rdsk/pond/test count=1000 bs=512

Strange...
-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool replace leaves pool degraded after resilvering
I forgot to mention this is a SunOS biscotto 5.11 snv_111a i86pc i386 i86pc version. Maurilio. -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zpool replace leaves pool degraded after resilvering
Hi, I have a PC where a pool suffered a disk failure. I replaced the failed disk and the pool resilvered, but after resilvering it was in this state:

mauri...@biscotto:~# zpool status iscsi
  pool: iscsi
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 12h33m with 0 errors on Thu Jul 9 00:07:12 2009
config:

        NAME         STATE     READ WRITE CKSUM
        iscsi        DEGRADED     0     0     0
          mirror     ONLINE       0     0     0
            c2t0d0   ONLINE       0     0     0
            c2t5d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t6d0   ONLINE       0     0     0
            c2t8d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c11t0d0  ONLINE       0     0     0
            c11t1d0  ONLINE       0     0     0
          mirror     DEGRADED     0     0     0
            c11t2d0  ONLINE       0     0     0
            c11t3d0  DEGRADED     0     0 23,0M  too many errors
        cache
          c1t4d0     ONLINE       0     0     0

errors: No known data errors

It says it resilvered ok and that there are no known data errors, but the pool is still marked as degraded. I did a zpool clear and now it says it is ok:

mauri...@biscotto:~# zpool status
  pool: iscsi
 state: ONLINE
 scrub: resilver completed after 12h33m with 0 errors on Thu Jul 9 00:07:12 2009
config:

        NAME         STATE     READ WRITE CKSUM
        iscsi        ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t0d0   ONLINE       0     0     0
            c2t5d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c2t6d0   ONLINE       0     0     0
            c2t8d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c11t0d0  ONLINE       0     0     0
            c11t1d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c11t2d0  ONLINE       0     0     0
            c11t3d0  ONLINE       0     0     0  326G resilvered
        cache
          c1t4d0     ONLINE       0     0     0

errors: No known data errors

Look at c11t3d0, which now reads 326G resilvered; my question is: is the pool ok? Why did I have to issue a zpool clear if the resilvering process completed without problems?

Best regards.
Maurilio.
-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Very slow ZFS write speed to raw zvol
Hmm, scratch that. Maybe. I did not first get the point that your writes to a filesystem dataset work quickly. Perhaps filesystem is (better) cached indeed, i.e. *maybe* zvol writes are synchronous and zfs writes may be cached and thus async? Try playing around with relevant dataset attributes... I'm running a test on my system (a snv_114 Thumper, 16Gb RAM, used for other purposes as well), the CPU is mostly idle now (2.5-3.2% kernel time, that's about it). Seems I have results not unlike yours. Not cool because I wanted to play with COMSTAR iSCSI - and I'm not sure it will perform well ;) I'm dd'ing 30Gb to an uncompressed test zvol with same 64kb block sizes (maybe they are too small?), and zpool iostat goes like this - a hundred IOs at 7Mbps for a minute, then a burst of 100-170Mbps and 20-25K IOps for a second: pond5.79T 4.41T 0106 0 7.09M pond5.79T 4.41T 0 1.93K 0 20.7M pond5.79T 4.41T 0 13.3K 0 106M pond5.79T 4.41T 0116 0 7.76M pond5.79T 4.41T 0108 0 7.23M pond5.79T 4.41T 0107 0 7.16M pond5.79T 4.41T 0107 0 7.16M or pond5.79T 4.41T 0117 0 7.83M pond5.79T 4.41T 0 5.61K 0 49.7M pond5.79T 4.41T 0 19.0K504 149M pond5.79T 4.41T 0104 0 6.96M Weird indeed. It wrote 10Gb (according to "zfs get usedbydataset pond/test") taking roughly 30 minutes after which I killed it. Now, writing to an uncompressed filesystem dataset (although very far from what's trumpeted as Thumper performance) yields quite different numbers: pond5.80T 4.40T 1 3.64K 1022 457M pond5.80T 4.40T 0866967 75.7M pond5.80T 4.40T 0 4.65K 0 586M pond5.80T 4.40T 6802 33.4K 69.2M pond5.80T 4.40T 29 2.44K 1.10M 301M pond5.80T 4.40T 32691 735K 25.0M pond5.80T 4.40T 56 1.59K 2.29M 184M pond5.80T 4.40T150768 4.61M 10.5M pond5.80T 4.40T 2 0 25.5K 0 pond5.80T 4.40T 0 2.75K 0 341M pond5.80T 4.40T 7 3.96K 339K 497M pond5.80T 4.39T 85740 3.57M 59.0M pond5.80T 4.39T 67 0 2.22M 0 pond5.80T 4.39T 9 4.67K 292K 581M pond5.80T 4.39T 4 1.07K 126K 137M pond5.80T 4.39T 27333 338K 9.15M pond5.80T 4.39T 5 0 28.0K 3.99K pond5.82T 4.37T 1 5.42K 1.67K 677M pond5.83T 4.37T 3 1.69K 8.36K 173M pond5.83T 4.37T 2 0 5.49K 0 pond5.83T 4.37T 0 6.32K 0 790M pond5.83T 4.37T 2290 7.95K 27.8M pond5.83T 4.37T 0 9.64K 1.23K 1.18G The numbers are jumpy (maybe due to fragmentation, other processes, etc.) but there are often spikes in excess of 500MBps. The whole test took a relatively little time: # time dd if=/dev/zero of=/pond/tmpnocompress/test30g bs=65536 count=50 50+0 records in 50+0 records out real1m27.657s user0m0.302s sys 0m46.976s # du -hs /pond/tmpnocompress/test30g 30G /pond/tmpnocompress/test30g To detail about the pool: The pool is on a Sun X4500 with 48 250Gb SATA drives. It was created as a 9x5 set (9 stripes made of 5-disk raidz1 vdevs) spread across different controllers, with the command: # zpool create -f pond \ raidz1 c0t0d0 c1t0d0 c4t0d0 c6t0d0 c7t0d0 \ raidz1 c0t1d0 c1t2d0 c4t3d0 c6t5d0 c7t6d0 \ raidz1 c1t1d0 c4t1d0 c5t1d0 c6t1d0 c7t1d0 \ raidz1 c0t2d0 c4t2d0 c5t2d0 c6t2d0 c7t2d0 \ raidz1 c0t3d0 c1t3d0 c5t3d0 c6t3d0 c7t3d0 \ raidz1 c0t4d0 c1t4d0 c4t4d0 c6t4d0 c7t4d0 \ raidz1 c0t5d0 c1t5d0 c4t5d0 c5t5d0 c7t5d0 \ raidz1 c0t6d0 c1t6d0 c4t6d0 c5t6d0 c6t6d0 \ raidz1 c1t7d0 c4t7d0 c5t7d0 c6t7d0 c7t7d0 \ spare c0t7d0 Alas, while there were many blogs, I couldn't find a definitive answer last year as to which Thumper layout is optimal in performance and/or reliability (in regard to 6 controllers of 8 disks each, with 2 disks on one of the controllers reserved for booting). 
As a result, we spread each raidz1 across 5 controllers, so the loss of one controller should have minimal impact on data loss on the average. Since the system layout is not symmetrical, some controllers are more important than others (say, the boot one). //Jim -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Migrating 10TB of data from NTFS is there a simple way?
> Ok so this is my solution, pls be advised I am a > total linux nube so I am learning as I go along. I > installed opensolaris and setup rpool as my base > install on a single 1TB drive. I attached one of my > NTFS drives to the system then used a utility called > prtparts to get the name of the NTFS drive attached > and then mounted it succesfully. > I then started transfering data accross till the > drive was empty (this is currently in progress) Once > thats done I will add the empty NTFS drive to my ZFS > pool and repeat the operation with my other drives. > > This leaves me with the issue of redundancy which is > sorely lacking, ideally I would like to do the same > think directly into a zraid pool, but I understand > from what I have read that you cant add single drives > to a zraid and I want all my drives in a single pool > as only to loose the space for the pool redundancy > once. > > I havent worked out if I can transform my zpool int a > zraid after I have copied all my data. > > Once again thx for the great support. And maybe > someone can direct me to an area in a forum that > explains y I cant use sudo... Hope this helps http://forums.opensolaris.com/thread.jspa?threadID=583&tstart=-1 -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss