[zfs-discuss] problem with zpool import - zil and cache drive are not displayed?
I'm at a loss; I've managed to get myself into a fix. I'm not sure where the problem is, but essentially I have a zpool I cannot import. This particular pool used to have two additional drives (not shown below), one for cache and another for log. I'm unsure why they are no longer detected on zpool import... the disks are still connected to the system and show up when running format for a list.

dar...@lexx:~# zpool import
  pool: tank
    id: 15136317365944618902
 state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank          UNAVAIL  missing device
          raidz1-0    ONLINE
            c6t4d0    ONLINE
            c6t5d0    ONLINE
            c6t6d0    ONLINE
            c6t7d0    ONLINE
          raidz1-1    ONLINE
            c6t0d0    ONLINE
            c6t1d0    ONLINE
            c6t2d0    ONLINE
            c6t3d0    ONLINE
dar...@lexx:~#

The above disks are the data disks, which appear to be online without issue. I was running version 22 on this pool. Any help appreciated.
-- This message posted from opensolaris.org
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] pool scrub clean, filesystem broken
Cindy, Thanks for the quick response. Consulting the ZFS history I note the following actions:

- imported my three-disk raid-z pool, originally created on the most recent version of OpenSolaris but now running NexentaStor 3.03
- upgraded my pool
- destroyed two file systems I was no longer using (neither of these was, of course, the file system at issue)
- destroyed a snapshot on another filesystem
- played around with permissions (these were my only actions directly on the file system)

None of these actions seemed to have a negative impact on the filesystem, and it was working well when I gracefully shut down (to physically move the computer). I am a bit at a loss. With copy-on-write and a clean pool, how can I have corruption? -brian

On Mon, Aug 2, 2010 at 12:52 PM, Cindy Swearingen cindy.swearin...@oracle.com wrote: Brian, You might try using zpool history -il to see what ZFS operations, if any, might have led up to this problem. If zpool history doesn't provide any clues, then what other operations might have occurred prior to this state? It looks like something trampled this file system... Thanks, Cindy

On 08/02/10 10:26, Brian wrote: Thanks Preston. I am actually using ZFS locally, connected directly to 3 SATA drives in a raid-z pool. The filesystem is ZFS and it mounts without complaint and the pool is clean. I am at a loss as to what is happening. -brian -- Brian Merrell, Director of Technology Backstop LLP 1455 Pennsylvania Ave., N.W. Suite 400 Washington, D.C. 20004 202-628-BACK (2225) merre...@backstopllp.com www.backstopllp.com
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
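For reference, the history check Cindy describes can be pointed at just the affected pool; a minimal sketch, with "tank" as a placeholder for whatever the real pool is called:

  # zpool history -il tank | tail -50

The -i flag includes internally logged events (such as automatic snapshot creation and destruction) and -l adds the user, hostname and zone for each entry, which helps rule out another host or a script having modified the filesystem.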
Re: [zfs-discuss] iScsi slow
On 27 May 2010 at 07:03, Brent Jones wrote:

On Wed, May 26, 2010 at 5:08 AM, Matt Connolly matt.connolly...@gmail.com wrote: I've set up an iScsi volume on OpenSolaris (snv_134) with these commands:

sh-4.0# zfs create rpool/iscsi
sh-4.0# zfs set shareiscsi=on rpool/iscsi
sh-4.0# zfs create -s -V 10g rpool/iscsi/test

The underlying zpool is a mirror of two SATA drives. I'm connecting from a Mac client with globalSAN initiator software, connected via Gigabit LAN. It connects fine, and I've initialised a Mac-format volume on that iScsi volume. Performance, however, is terribly slow, about 10 times slower than an SMB share on the same pool. I expected it would be very similar, if not faster than SMB. Here are my test results copying 3GB of data:

iScsi:      44m01s  1.185MB/s
SMB share:   4m27   11.73MB/s

Reading (the same 3GB) is also worse than SMB, but only by a factor of about 3:

iScsi:       4m36   11.34MB/s
SMB share:   1m45   29.81MB/s

cleaning up some old mail... Not unexpected. Filesystems have readahead code to prefetch enough to cover the latency of the read request. iSCSI only responds to the request. Put a filesystem on top of iSCSI and try again. For writes, iSCSI is synchronous and SMB is not. -r

Is there something obvious I've missed here? -- This message posted from opensolaris.org

Try jumbo frames, and making sure flow control is enabled on your iSCSI switches and all network cards -- Brent Jones br...@servuhome.net
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
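For what it's worth, on OpenSolaris the MTU can be raised per link with dladm; a minimal sketch, assuming a NIC and driver that support jumbo frames, and using e1000g0 purely as an example link name (the switch and the Mac initiator would need a matching MTU, and flow control is normally configured on the switch and in the NIC driver rather than through ZFS):

  # dladm show-linkprop -p mtu e1000g0
  # dladm set-linkprop -p mtu=9000 e1000g0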
[zfs-discuss] When is the L2ARC refreshed if on a separate drive?
I'm running a mirrored pair of 2 TB SATA drives as my data storage drives on my home workstation, a Core i7-based machine with 10 GB of RAM. I recently added a SandForce-based 60 GB SSD (OCZ Vertex 2, NOT the Pro version) as an L2ARC to the single mirrored pair. I'm running B134, with ZFS pool version 22, with dedup enabled. If I understand correctly, the dedup table should be in the L2ARC on the SSD, and I should have enough RAM to keep the references to that table in memory, so this should be a well-performing setup. My question is what happens at power off. Does the cache device essentially get cleared, and the machine has to rebuild it when it boots? Or is it persistent? That is, should performance improve after a little while following a reboot, or is it always constant once the L2ARC has been built? Rather informally, it sometimes seems that the hard drives are a bit slower the first time they load a program now, vs. when I didn't have the SSD installed as a cache device on the pool. But this is mainly an impression. Thanks for your help! -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Using multiple logs on single SSD devices
- Second question, how about this: partition the two X25-E drives into two, and then mirror each half of each drive as log devices for each pool. Am I missing something with this scheme? On boot, will the GUID for each pool get found by the system from the partitioned log drives?

IIRC several posts in here, some by Cindy, have been about using devices shared among pools, and what's said is that this is not recommended because of potential deadlocks. If I were you, I'd get another couple of SSDs for the new pool.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk (+47) 97542685 r...@karlsbakk.net http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] LTFS and LTO-5 Tape Drives
Has anyone looked into the new LTFS on LTO-5 for tape backups? Any idea how this would work with ZFS? I'm presuming ZFS send / receive are not going to work. But it seems rather appealing to have the metadata properly with the data, and being able to browse files directly instead of having to rely on backup software, however nice tar may be. Has anyone used this with OpenSolaris, or have an opinion on how this would work in practice? Thanks! -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] problem with zpool import - zil and cache drive are not displayed?
Darren, It looks like you've lost your log device. The newly integrated missing-log support will help once it's available. In the meantime, you should run 'zdb -l' on your log device to make sure the label is still intact. Thanks, George

Darren Taylor wrote: I'm at a loss; I've managed to get myself into a fix. I'm not sure where the problem is, but essentially I have a zpool I cannot import. This particular pool used to have two additional drives (not shown below), one for cache and another for log. I'm unsure why they are no longer detected on zpool import... the disks are still connected to the system and show up when running format for a list.

dar...@lexx:~# zpool import
  pool: tank
    id: 15136317365944618902
 state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank          UNAVAIL  missing device
          raidz1-0    ONLINE
            c6t4d0    ONLINE
            c6t5d0    ONLINE
            c6t6d0    ONLINE
            c6t7d0    ONLINE
          raidz1-1    ONLINE
            c6t0d0    ONLINE
            c6t1d0    ONLINE
            c6t2d0    ONLINE
            c6t3d0    ONLINE
dar...@lexx:~#

The above disks are the data disks, which appear to be online without issue. I was running version 22 on this pool. Any help appreciated.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
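A minimal sketch of the label check George suggests, assuming the log device is c7d0 (the name Darren mentions later in this thread) and that slice 0 was used; the actual slice may differ:

  # zdb -l /dev/rdsk/c7d0s0

A healthy device prints four copies of the vdev label, each containing the pool name, pool GUID and vdev tree; if all four come back empty or as unreadable garbage, the label really is gone and the pool will have to be imported without that log device.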
[zfs-discuss] Which ZFS events/errors appears in FMA?
Hi, Is there a summary somewhere which describes exactly which ZFS-related events/errors appear in FMA today, and also some sort of roadmap for events/errors that are planned to be reported via FMA in the future? Regards, sendai -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
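There doesn't seem to be a single summary document, but the events a given system has actually generated can be inspected directly from the FMA logs; a minimal sketch (classes such as ereport.fs.zfs.checksum or ereport.fs.zfs.io are examples of what typically shows up, not a complete list):

  # fmdump -eV | grep -i 'class.*zfs'
  # fmadm faulty

fmdump -eV walks the error-report log with full detail, and fmadm faulty lists any currently diagnosed faults, including ZFS pool and vdev faults.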
[zfs-discuss] ZFS Restripe
Hi, I have a large pool (~50TB total, ~42TB usable), composed of 4 raidz1 volumes (of 7 x 2TB disks each):

# zpool iostat -v | grep -v c4
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
backup      35.2T  15.3T    602    272  15.3M  11.1M
  raidz1    11.6T  1.06T    138     49  2.99M  2.33M
  raidz1    11.8T   845G    163     54  3.82M  2.57M
  raidz1    6.00T  6.62T    161     84  4.50M  3.16M
  raidz1    5.88T  6.75T    139     83  4.01M  3.09M
----------  -----  -----  -----  -----  -----  -----

Originally there were only the first two raidz1 volumes, and the two at the bottom were added later. You can notice that by the amount of used / free space. The first two volumes have ~11TB used and ~1TB free, while the other two have around ~6TB used and ~6TB free. I have hundreds of zfs'es storing backups from several servers. Each ZFS has about 7 snapshots of older backups. I have the impression I'm getting degradation in performance due to the limited space in the first two volumes, especially the second, which has only 845GB free. Is there any way to re-stripe the pool, so I can take advantage of all spindles across the raidz1 volumes? Right now it looks like the newer volumes are doing the heavy lifting while the other two just hold old data. Thanks, Eduardo Bragatto
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Using multiple logs on single SSD devices
On Aug 2, 2010, at 8:18 PM, Edward Ned Harvey wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Jonathan Loran

Because you're at pool v15, it does not matter if the log device fails while you're running, or you're offline and trying to come online, or whatever. Simply put, if the log device fails, unmirrored, and the version is less than 19, the pool is simply lost. There are supposedly techniques to recover, so it's not necessarily a "data unrecoverable by any means" situation, but you certainly couldn't recover without a server crash, or at least a shutdown. And it would certainly be a nightmare, at best. The system will not fall back to ZIL in the main pool. That was a feature created in v19.

Yes, after sending my query yesterday, I found the ZFS best practices guide, which I haven't read for a long time; many updates with respect to SSD devices (many by you, Ed, no?). I also found the long thread on this list about SSD best practices, which somehow I missed in my first pass. After reading this, I became much more nervous. My previous assumption when I added the log was based upon the IOP rate I saw to the ZIL, and the number of IOPs an Intel X25-E could take, and it looked like the drive should last a few years, at least. But of course, that assumes no other failure modes. Given the high price of failure, now that I know the system will suddenly go south, I realized that action needed to be taken ASAP to mirror the log.

I'm afraid it's too late for that, unless you're willing to destroy and recreate your pool. You cannot remove the existing log device. You cannot shrink it. You cannot replace it with a smaller one. The only things you can do right now are:

(a) Start mirroring that log device with another device of the same size or larger. or

(b) Buy another SSD which is larger than the first. Create a slice on the 2nd which is equal to the size of the first. Mirror the first onto the slice of the 2nd. After resilver, detach the first drive, and replace it with another one of the larger drives. Slice the 3rd drive just like the 2nd, and mirror the 2nd drive slice onto it. Now you've got a mirrored sliced device, without any downtime, but you had to buy two of the 2x larger drives in order to do it. or

(c) Destroy and recreate your whole pool, but learn from your mistake. This time, slice each SSD, and mirror the slices to form the log device.

BTW, ask me how I know this in such detail? It's cuz I made the same mistake last year. There was one interesting possibility we considered, but didn't actually implement: We are running a stripe of mirrors. We considered the possibility of breaking the mirrors, creating a new pool out of the other half using the SSD properly sliced. Using zfs send to replicate all the snapshots over to the new pool, up to a very recent time. Then, we'd be able to make a very short service window. Shutdown briefly, send that one final snapshot to the new pool, destroy the old pool, rename the new pool to take the old name, and bring the system back up again. Instead of scheduling a long service window. As soon as the system is up again, start mirroring and resilvering (er ... initial silvering), and of course, slice the SSD before attaching the mirror. Naturally there is some risk, running un-mirrored long enough to send the snaps... and so forth. Anyway, just an option to consider.

Destroying this pool is very much off the table. It holds home directories for our whole lab, about 375 of them.
If I take the system offline, then no one works until it's back up. You could say this machine is mission critical. The host has been very reliable. Everyone is now spoiled by how it never goes down, and I'm very proud of that fact. The only way I could recreate the pool would be through some clever means like you give, or I thought perhaps using AVS to replicate one side of the mirror, then everything could be done through a quick reboot. One other idea I had was using a sparse zvol for the log, but I think eventually, the sparse volume would fill up beyond its physical capacity. On top of that, this would mean we would have a log that is a zvol from another zpool, which I think could cause a boot race condition. I think the real solution to my immediate problem is this: Bite the bullet, and add storage to the existing pool. It won't be as clean as I'd like, and it would disturb my nicely balanced mirror stripe with new large empty vdevs, which I fear could impact performance down the road when the original stripe fills up, and all writes go to the new vdevs. Perhaps by the time that happens, the feature to rebalance the pool will be available, if that's even being worked on. Maybe that's wishful thinking. At any rate, if I don't have to add another pool, I can mirror the logs I have: problem solved. Finally, I'm told by my SE that ZFS in
[zfs-discuss] snapshot space - miscalculation?
zfs get all claims that I have 523G used by snapshots. I want to get rid of it, but when I look at the space used by each snapshot I can't find the one that could occupy so much space:

daten/backups  used                  959G  -
daten/backups  usedbysnapshots       523G  -
daten/backups  usedbydataset         437G  -
daten/backups  usedbychildren        0     -
daten/backups  usedbyrefreservation  0     -
daten/back...@zfs-auto-snap_hourly-2009-12-20-16_00    used  228M   -
daten/back...@zfs-auto-snap_hourly-2009-12-20-17_00    used  150K   -
daten/back...@zfs-auto-snap_monthly-2009-12-25-21_43   used  7,94M  -
daten/back...@zfs-auto-snap_monthly-2010-02-01-00_00   used  60,3M  -
daten/back...@zfs-auto-snap:daily-2010-03-01-19:20     used  0      -
daten/back...@zfs-auto-snap:monthly-2010-03-01-19:20   used  0      -
daten/back...@zfs-auto-snap:weekly-2010-03-01-19:20    used  0      -
daten/back...@zfs-auto-snap:daily-2010-03-02-00:00     used  0      -
daten/back...@zfs-auto-snap:daily-2010-03-03-00:00     used  0      -
daten/back...@zfs-auto-snap:daily-2010-03-04-00:00     used  0      -
daten/back...@zfs-auto-snap_monthly-2010-03-04-20_27   used  0      -
daten/back...@zfs-auto-snap_monthly-2010-04-01-00_00   used  57,4M  -
daten/back...@zfs-auto-snap_monthly-2010-05-01-00_00   used  57,5M  -
daten/back...@zfs-auto-snap_monthly-2010-06-01-00_00   used  57,4M  -
daten/back...@zfs-auto-snap_monthly-2010-07-01-00_00   used  0      -
daten/back...@zfs-auto-snap_daily-2010-07-02-00_00     used  0      -
daten/back...@zfs-auto-snap_daily-2010-07-03-00_00     used  0      -
daten/back...@zfs-auto-snap_daily-2010-07-04-00_00     used  0      -
daten/back...@zfs-auto-snap_daily-2010-07-05-00_00     used  2,92M  -
daten/back...@zfs-auto-snap_daily-2010-07-06-00_00     used  132K   -
daten/back...@zfs-auto-snap_daily-2010-07-07-00_00     used  0      -
daten/back...@zfs-auto-snap_daily-2010-07-08-00_00     used  0      -
daten/back...@zfs-auto-snap_daily-2010-07-09-00_00     used  0      -
daten/back...@zfs-auto-snap_daily-2010-07-10-00_00     used  0      -
daten/back...@zfs-auto-snap_daily-2010-07-11-00_00     used  0
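A likely explanation for the apparent discrepancy: a snapshot's "used" value only counts space unique to that snapshot (what would be freed by destroying it alone), while usedbysnapshots also includes blocks that are shared by two or more snapshots but no longer referenced by the live filesystem, and that shared space shows up in no individual snapshot. One way to narrow it down (a sketch; the dataset name is taken from the listing above):

  # zfs list -r -t snapshot -o name,used,referenced -s used daten/backups

Destroying the oldest snapshots one at a time and re-checking "zfs get usedbysnapshots daten/backups" after each one is a crude but reliable way to find where the 523G is actually pinned.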
Re: [zfs-discuss] When is the L2ARC refreshed if on a separate drive?
On 03 August, 2010 - valrh...@gmail.com sent me these 1,2K bytes: I'm running a mirrored pair of 2 TB SATA drives as my data storage drives on my home workstation, a Core i7-based machine with 10 GB of RAM. I recently added a sandforce-based 60 GB SSD (OCZ Vertex 2, NOT the pro version) as an L2ARC to the single mirrored pair. I'm running B134, with ZFS pool version 22, with dedup enabled. If I understand correctly, the dedup table should be in the L2ARC on the SSD, and I should have enough RAM to keep the references to that table in memory, and that this is therefore a well-performing solution. My question is what happens at power off. Does the cache device essentially get cleared, and the machine has to rebuild it when it boots? Or is it persistent. That is, should performance improve after a little while following a reboot, or is it always constant once it builds the L2ARC once? L2ARC is currently cleared at boot. There is an RFE to make it persistent. /Tomas -- Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
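On builds of that vintage the L2ARC warm-up can be watched from the ARC kstats; a minimal sketch (statistic names such as l2_size and l2_hits are the usual arcstats entries, though exact availability depends on the build):

  # kstat -p zfs:0:arcstats | grep l2_
  # kstat -p zfs:0:arcstats:l2_size

l2_size growing from zero after each boot is consistent with Tomas's point that the cache device starts out cold and has to be repopulated.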
Re: [zfs-discuss] Using multiple logs on single SSD devices
On Aug 3, 2010, at 9:29 AM, Roy Sigurd Karlsbakk wrote: - Second question, how about this: partition the two X25E drives into two, and then mirror each half of each drive as log devices for each pool. Am I missing something with this scheme? On boot, will the GUID for each pool get found by the system from the partitioned log drives? IIRC several posts in here, some by Cindy, have been about using devices shared among pools, and what's said is that this is not recommended because of potential deadlocks. No, you misunderstand. The potential deadlock condition occurs when you use ZFS in a single system to act as both the file system and a device. For example, using a zvol on rpool as a ZIL for another pool. For devices themselves, ZFS has absolutely no problem using block devices as presented by partitions or slices. This has been true for all file systems for all time. -- richard -- Richard Elling rich...@nexenta.com +1-760-896-4422 Enterprise class storage for everyone www.nexenta.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
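To make Richard's point concrete, a minimal sketch of the partition-and-mirror layout from the original question, assuming the two X25-Es show up as c8t0d0 and c9t0d0 (hypothetical names) and each has been split into slices s0 and s1 with format:

  # zpool add pool1 log mirror c8t0d0s0 c9t0d0s0
  # zpool add pool2 log mirror c8t0d0s1 c9t0d0s1

Each pool then has a mirrored log made of one slice from each SSD, so losing either SSD leaves both pools with an intact log.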
Re: [zfs-discuss] iScsi slow
On Aug 3, 2010, at 5:56 PM, Robert Milkowski mi...@task.gda.pl wrote:

On 03/08/2010 22:49, Ross Walker wrote:

On Aug 3, 2010, at 12:13 PM, Roch Bourbonnais roch.bourbonn...@sun.com wrote: Not unexpected. Filesystems have readahead code to prefetch enough to cover the latency of the read request. iSCSI only responds to the request. Put a filesystem on top of iSCSI and try again. For writes, iSCSI is synchronous and SMB is not.

It may be with ZFS, but iSCSI is neither synchronous nor asynchronous; it is simply SCSI over IP. It is the application using the iSCSI protocol that determines whether it is synchronous (issue a flush after write) or asynchronous (wait until the target flushes). I think the ZFS developers didn't quite understand that and wanted strict guidelines like NFS has, but iSCSI doesn't have those; it is a lower-level protocol than NFS, so they forced guidelines on it and violated the standard.

Nothing has been violated here. Look for the WCE flag in COMSTAR, where you can control how a given zvol should behave (synchronous or asynchronous). Additionally, in recent builds you have zfs set sync={disabled|default|always}, which also works with zvols. So you do have control over how it is supposed to behave, and to make it nice it is even on a per-zvol basis. It is just that the default is synchronous.

Ah, ok, my experience has been with Solaris and the iscsitgt which, correct me if I am wrong, is still synchronous only. -Ross
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] iScsi slow
-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Robert Milkowski
Sent: Tuesday, August 03, 2010 5:57 PM
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] iScsi slow

Nothing has been violated here. Look for the WCE flag in COMSTAR, where you can control how a given zvol should behave (synchronous or asynchronous). Additionally, in recent builds you have zfs set sync={disabled|default|always}, which also works with zvols. So you do have control over how it is supposed to behave, and to make it nice it is even on a per-zvol basis. It is just that the default is synchronous.

And if it's synchronous, you can still accelerate performance by using L2ARC and SLOG devices, just like you can with NFS, correct? -Will
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
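For reference, log and cache devices are added to an existing pool with zpool add; a slog absorbs the latency of synchronous writes (which is what the synchronous iSCSI path generates) and an L2ARC device only helps reads. A minimal sketch with purely hypothetical device names, and with the usual caveat from elsewhere in this digest that an unmirrored slog on a pool older than version 19 is risky:

  # zpool add tank log mirror c5t0d0 c5t1d0
  # zpool add tank cache c5t2d0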
Re: [zfs-discuss] iScsi slow
On Aug 3, 2010, at 5:56 PM, Robert Milkowski mi...@task.gda.pl wrote:

Nothing has been violated here. Look for the WCE flag in COMSTAR, where you can control how a given zvol should behave (synchronous or asynchronous). Additionally, in recent builds you have zfs set sync={disabled|default|always}, which also works with zvols. So you do have control over how it is supposed to behave, and to make it nice it is even on a per-zvol basis. It is just that the default is synchronous.

I forgot to ask: if the ZVOL is set async with WCE, will it still honor a flush command from the initiator and flush those TXGs held for the ZVOL? -Ross
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] When is the L2ARC refreshed if on a separate drive?
Thanks for the info! -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] iScsi slow
On 03/08/2010 23:20, Ross Walker wrote:

Nothing has been violated here. Look for the WCE flag in COMSTAR, where you can control how a given zvol should behave (synchronous or asynchronous). Additionally, in recent builds you have zfs set sync={disabled|default|always}, which also works with zvols. So you do have control over how it is supposed to behave, and to make it nice it is even on a per-zvol basis. It is just that the default is synchronous.

Ah, ok, my experience has been with Solaris and the iscsitgt which, correct me if I am wrong, is still synchronous only.

I don't remember whether it offered the ability to manipulate the zvol's WCE flag, but even if it didn't, you can do it anyway, as it is a zvol property. For an example see http://milek.blogspot.com/2010/02/zvols-write-cache.html -- Robert Milkowski http://milek.blogspot.com
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
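On builds that have the sync property (as mentioned above; older builds do not), the per-zvol behaviour can be inspected and changed directly. A minimal sketch using the rpool/iscsi/test zvol from the start of this thread:

  # zfs get sync rpool/iscsi/test
  # zfs set sync=disabled rpool/iscsi/test
  # zfs set sync=always rpool/iscsi/test

sync=disabled is fast but unsafe (acknowledged writes can be lost on power failure), while sync=always forces every write through the ZIL before it is acknowledged. On builds without the property, the WCE flag described in Robert's blog post is the equivalent knob for COMSTAR-exported zvols.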
Re: [zfs-discuss] problem with zpool import - zil and cache drive are not displayed?
On Aug 4, 2010, at 12:23 AM, Darren Taylor wrote:

Hi George, I think you are right. The log device looks to have suffered a complete loss; there is no data on the disk at all. The log device was an ACARD RAM drive (with battery backup), but somehow it has faulted, clearing all data.

--victor gave me this advice, and queried about the zpool.cache--

Looks like there's a hardware problem with c7d0 as it appears to contain garbage. Do you have a zpool.cache with this pool configuration available? Besides containing garbage, the former log device now appears to have different geometry and is not able to read in the higher LBA ranges. So I'd say it is broken.

c7d0 was the log device. I'm unsure what the next step is, but I'm assuming there is a way to grab the drive's original config from the zpool.cache file and apply it back to the drive?

I mocked up a log device in a file, and that made zpool import happier:

bash-4.0# zpool import
  pool: tank
    id: 15136317365944618902
 state: DEGRADED
status: The pool was last accessed by another system.
action: The pool can be imported despite missing or damaged devices. The fault tolerance of the pool may be compromised if imported.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank          DEGRADED
          raidz1-0    ONLINE
            c6t4d0    ONLINE
            c6t5d0    ONLINE
            c6t6d0    ONLINE
            c6t7d0    ONLINE
          raidz1-1    ONLINE
            c6t0d0    ONLINE
            c6t1d0    ONLINE
            c6t2d0    ONLINE
            c6t3d0    ONLINE
        cache
          c8d1
        logs
          c13d1s0     UNAVAIL  cannot open

bash-4.0# zpool import -fR / tank
cannot import 'tank': one or more devices is currently unavailable
        Recovery is possible, but will result in some data loss.
        Returning the pool to its state as of July 21, 2010 03:49:50 AM NZST
        should correct the problem. Approximately 91 seconds of data
        must be discarded, irreversibly. After rewind, several
        persistent user-data errors will remain. Recovery can be attempted
        by executing 'zpool import -F tank'. A scrub of the pool
        is strongly recommended after recovery.
bash-4.0#

So if you are happy with the results, you can perform the actual import with

zpool import -fF -R / tank

You should then be able to remove the log device completely. regards victor
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
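Once the pool is imported, removing the dead log device is a single command; a sketch using the c13d1s0 name from Victor's mocked-up configuration (the real device name on Darren's system may differ), and assuming the pool is at version 19 or later, which version 22 is:

  # zpool remove tank c13d1s0
  # zpool status tank

zpool status afterwards should show the pool without a logs section, with the ZIL falling back to the main pool devices.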
Re: [zfs-discuss] ZFS Restripe
Short answer: No.

Long answer: Not without rewriting the previously written data. Data is being striped over all of the top-level VDEVs, or at least it should be. But there is no way, at least not built into ZFS, to re-allocate the storage to perform I/O balancing. You would basically have to do this manually. Either way, I'm guessing this isn't the answer you wanted, but hey, you get what you get.

On Tue, Aug 3, 2010 at 13:52, Eduardo Bragatto edua...@bragatto.com wrote: Hi, I have a large pool (~50TB total, ~42TB usable), composed of 4 raidz1 volumes (of 7 x 2TB disks each). Originally there were only the first two raidz1 volumes, and the two at the bottom were added later. [...] Is there any way to re-stripe the pool, so I can take advantage of all spindles across the raidz1 volumes? Right now it looks like the newer volumes are doing the heavy lifting while the other two just hold old data. Thanks, Eduardo Bragatto

-- You can choose your friends, you can choose the deals. - Equity Private If Linux is faster, it's a Solaris bug. - Phil Harman Blog - http://whatderass.blogspot.com/ Twitter - @khyron4eva
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Restripe
On Aug 3, 2010, at 10:08 PM, Khyron wrote: Long answer: Not without rewriting the previously written data. Data is being striped over all of the top level VDEVs, or at least it should be. But there is no way, at least not built into ZFS, to re- allocate the storage to perform I/O balancing. You would basically have to do this manually. Either way, I'm guessing this isn't the answer you wanted but hey, you get what you get. Actually, that was the answer I was expecting, yes. The real question, then, is: what data should I rewrite? I want to rewrite data that's written on the nearly full volumes so they get spread to the volumes with more space available. Should I simply do a zfs send | zfs receive on all ZFSes I have? (we are talking about 400 ZFSes with about 7 snapshots each, here)... Or is there a way to rearrange specifically the data from the nearly full volumes? Thanks, Eduardo Bragatto ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [install-discuss] Installing on alternate hardware
Wow! Thanks for the information James, after consulting with my manager we're going to install the text-install version. I'm going to try that as we're installing it on a new disk. Just curious: if I do an export of about 3 zvols and reimport them, the mounts will be there, but will I have to reconfigure CIFS, permissions and users etc? Sorry, I'm but a n00b. Thanks, Em

Date: Tue, 3 Aug 2010 22:48:36 +1000
From: j...@opensolaris.org
To: emilygrettelis...@hotmail.com
CC: carls...@workingcode.com; install-disc...@opensolaris.org
Subject: Re: [install-discuss] Installing on alternate hardware

On 3/08/10 10:20 PM, Emily Grettel wrote: Thanks for the reply James, If it were my system, I'd export the ZFS volumes containing my data, reinstall on the new motherboard, and then reimport ZFS. I was thinking that too, but unfortunately I've created quite a few zones and there are quite a few users on the system. Redoing the entire server will take a week :( Thanks though, I shall try driver-discuss too!

The essential problem is that your new motherboard will have different paths to each device. As James mentioned, you could change the first line of /etc/path_to_inst, or... here's the _unsupported_, totally ugly hack way of getting a new motherboard up and running. Before you start, BE VERY GRATEFUL you're running ZFS. (I'll explain why a little later).

* touch /reconfigure
* poweroff
* replace motherboard
* turn system on
* do whatever BIOS futzing is needed in order to find your primary boot device
* at the grub boot menu, select your desired BE, navigate to the kernel$ line and hit 'e'
* go to the end of this line, and hit 'a' (to add), then add -arvs (ie, a space, then -arvs) and hit escape
* hit 'b' to boot
* Unless you're prompted for where /etc/path_to_inst is, hit enter each time you're prompted during the boot process.
* When you're asked for a username for single-user mode, type root and enter your root password.
* Run these operations to test:

  format < /dev/null
  zpool status -v
  zpool import -a
  zfs list
  dladm show-link
  dladm show-ether

The format test will print out the device paths for the devices which the kernel has probed. Note these for later. The zpool status -xv test will show you the paths to each vdev in your pools. The zpool import -a test will attempt to import as many pools as can be found. This should work seamlessly, and you should then see all your datasets in the zfs list test. The dladm tests will show you what NICs you have installed. Note the instance numbers - they almost certainly will have changed from what you have configured with /etc/hostname.$nic$inst. Change the /etc/hostname file to reflect the new instance number(s). Also, if you are running a graphics head on this system, and you've got a customised /etc/X11/xorg.conf, make sure you check the BusID settings to make sure that they're correct. Use the /usr/bin/scanpci utility for this.

Now, why should you be grateful for ZFS? Because ZFS uses the cXtYdZ number as a fallback for detecting and opening devices. What it uses as a primary method is the device id, or devid. This is closely related to the GUID aka Globally Unique IDentifier. If you want more info about devids and guids, you can review a presentation I wrote about them a while back: http://www.slideshare.net/JamesCMcPherson/what-is-a-guid

James C. McPherson -- Oracle http://www.jmcp.homeunix.com/blog
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] [install-discuss] Installing on alternate hardware
On 4/08/10 12:55 PM, Emily Grettel wrote: Wow! Thanks for the information James, after consulting with my manager we're going to install the text-install version.

Better to stick with the supportable methods, imho :-)

I'm going to try that as we're installing it on a new disk. Just curious, if I do an export of about 3 zvols and reimport them, the mounts will be there but will I have to reconfigure CIFS, permissions and users etc?

I would not expect so. You export zpools, not the datasets within them. ZVols are datasets within pools, and whatever properties you have configured for them within the pool should stick around over an export/import operation. If they don't, I would be very, very surprised.

[note: everybody was a noob at some point]

James C. McPherson -- Oracle http://www.jmcp.homeunix.com/blog
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
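To make that concrete, a minimal sketch of the export/reinstall/import cycle; "tank" and the dataset "tank/data" are stand-ins for the real names:

  # zpool export tank

(reinstall the OS or swap the hardware)

  # zpool import tank
  # zfs get sharesmb,mountpoint tank/data

Dataset-level settings such as mountpoint and sharesmb, and the ACLs stored in the filesystem, travel with the pool. What likely does need redoing after a fresh install is the OS-side service configuration: enabling the SMB/CIFS service, rejoining a workgroup or domain, and regenerating local SMB passwords for the users, since those live in the OS image rather than in the pool.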
Re: [zfs-discuss] problem with zpool import - zil and cache drive are not displayed?
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Darren Taylor

I'm not sure where the problem is, but essentially I have a zpool I cannot import. This particular pool used to have two additional drives (not shown below), one for cache and another for log. I'm unsure why they are no longer detected on zpool import... the disks are still connected to the system and show up when running format for a list.

Perhaps the log and cache were not using entire devices, but rather, just slices? I could be wrong, but I don't think zpool import will scan slices by default. If slices exist on the cache and log devices, I might suggest using the -d option of zpool import.
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Restripe
On Aug 3, 2010, at 10:57 PM, Richard Elling wrote:

Unfortunately, zpool iostat is completely useless at describing performance. The only thing it can do is show device bandwidth, and everyone here knows that bandwidth is not performance, right? Nod along, thank you.

I totally understand that, I only used the output to show the space utilization per raidz1 volume.

Yes, and you also notice that the writes are biased towards the raidz1 sets that are less full. This is exactly what you want :-) Eventually, when the less empty sets become more empty, the writes will rebalance.

Actually, if we are going to consider the values from zpool iostat, they are just slightly biased towards the volumes I would want -- for example, on the first post I've made, the volume with less free space had 845GB free... that same volume now has 833GB. I really would like to just stop writing to that volume at this point, as I've experienced very bad performance in the past when a volume gets nearly full. As a reference, here's the information I posted less than 12 hours ago:

# zpool iostat -v | grep -v c4
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
backup      35.2T  15.3T    602    272  15.3M  11.1M
  raidz1    11.6T  1.06T    138     49  2.99M  2.33M
  raidz1    11.8T   845G    163     54  3.82M  2.57M
  raidz1    6.00T  6.62T    161     84  4.50M  3.16M
  raidz1    5.88T  6.75T    139     83  4.01M  3.09M
----------  -----  -----  -----  -----  -----  -----

And here's the info from the same system, as I write now:

# zpool iostat -v | grep -v c4
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
backup      35.3T  15.2T    541    208  9.90M  6.45M
  raidz1    11.6T  1.06T    116     38  2.16M  1.41M
  raidz1    11.8T   833G    122     39  2.28M  1.49M
  raidz1    6.02T  6.61T    152     64  2.72M  1.78M
  raidz1    5.89T  6.73T    149     66  2.73M  1.77M
----------  -----  -----  -----  -----  -----  -----

As you can see, the second raidz1 volume is not being spared and has been providing almost as much space as the others (and even more compared to the first volume).

I have the impression I'm getting degradation in performance due to the limited space in the first two volumes, especially the second, which has only 845GB free.

Impressions work well for dating, but not so well for performance. Does your application run faster or slower?

You're a funny guy. :) Let me re-phrase it: I'm sure I'm getting degradation in performance, as my applications are waiting more on I/O now than they used to do (based on CPU utilization graphs I have). The impression part is that the reason is the limited space in those two volumes -- as I said, I already experienced bad performance on zfs systems running nearly out of space before.

Is there any way to re-stripe the pool, so I can take advantage of all spindles across the raidz1 volumes? Right now it looks like the newer volumes are doing the heavy lifting while the other two just hold old data.

Yes, of course. But it requires copying the data, which probably isn't feasible.

I'm willing to copy data around to get this accomplished, I'm really just looking for the best method -- I have more than 10TB free, so I have some space to play with if I have to duplicate some data and erase the old copy, for example.

Thanks, Eduardo Bragatto
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] problem with zpool import - zil and cache drive are not displayed?
On Aug 3, 2010, at 8:39 PM, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Darren Taylor I'm not sure where the problem is, but essentially i have a zpool i cannot import. This particular pool used to have a two drives (not shown below), one for cache and another for log. I'm unsure why they are no longer detected on zpool import... the disks are still connected to the system and show up when running format for a list. Perhaps the log cache were not using entire devices, but rather, just slices? I could be wrong, but I don't think zpool import will scan slices by default. Entire devices do not exist, only slices. If slices exist on the cache log devices, I might suggest using the -d option of zpool import. The -d option allows searching in another directory. -- richard -- Richard Elling rich...@nexenta.com +1-760-896-4422 Enterprise class storage for everyone www.nexenta.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS Restripe
I notice you use the word volume which really isn't accurate or appropriate here. If all of these VDEVs are part of the same pool, which as I recall you said they are, then writes are striped across all of them (with bias for the more empty aka less full VDEVs). You probably want to zfs send the oldest dataset (ZFS terminology for a file system) into a new dataset. That oldest dataset was created when there were only 2 top level VDEVs, most likely. If you have multiple datasets created when you had only 2 VDEVs, then send/receive them both (in serial fashion, one after the other). If you have room for the snapshots too, then send all of it and then delete the source dataset when done. I think this will achieve what you want. You may want to get a bit more specific and choose from the oldest datasets THEN find the smallest of those oldest datasets and send/receive it first. That way, the send/receive completes in less time, and when you delete the source dataset, you've now created more free space on the entire pool but without the risk of a single dataset exceeding your 10 TiB of workspace. ZFS' copy-on-write nature really wants no less than 20% free because you never update data in place; a new copy is always written to disk. You might want to consider turning on compression on your new datasets too, especially if you have free CPU cycles to spare. I don't know how compressible your data is, but if it's fairly compressible, say lots of text, then you might get some added benefit when you copy the old data into the new datasets. Saving more space, then deleting the source dataset, should help your pool have more free space, and thus influence your writes for better I/O balancing when you do the next (and the next) dataset copies. HTH. On Tue, Aug 3, 2010 at 22:48, Eduardo Bragatto edua...@bragatto.com wrote: On Aug 3, 2010, at 10:08 PM, Khyron wrote: Long answer: Not without rewriting the previously written data. Data is being striped over all of the top level VDEVs, or at least it should be. But there is no way, at least not built into ZFS, to re-allocate the storage to perform I/O balancing. You would basically have to do this manually. Either way, I'm guessing this isn't the answer you wanted but hey, you get what you get. Actually, that was the answer I was expecting, yes. The real question, then, is: what data should I rewrite? I want to rewrite data that's written on the nearly full volumes so they get spread to the volumes with more space available. Should I simply do a zfs send | zfs receive on all ZFSes I have? (we are talking about 400 ZFSes with about 7 snapshots each, here)... Or is there a way to rearrange specifically the data from the nearly full volumes? Thanks, Eduardo Bragatto ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- You can choose your friends, you can choose the deals. - Equity Private If Linux is faster, it's a Solaris bug. - Phil Harman Blog - http://whatderass.blogspot.com/ Twitter - @khyron4eva ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
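A minimal sketch of the copy Khyron describes, with hypothetical dataset names (backup/serverA stands in for one of the oldest datasets); the -R flag carries the existing snapshots and properties along:

  # zfs snapshot -r backup/serverA@migrate
  # zfs send -R backup/serverA@migrate | zfs receive backup/serverA_new
  # zfs destroy -r backup/serverA
  # zfs rename backup/serverA_new backup/serverA

The newly received dataset is written with the current allocation bias, so most of its blocks land on the emptier vdevs, and destroying the source afterwards frees space on the full vdevs. Verify the received copy before destroying anything.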
Re: [zfs-discuss] ZFS Restripe
On Aug 3, 2010, at 8:55 PM, Eduardo Bragatto wrote:

On Aug 3, 2010, at 10:57 PM, Richard Elling wrote:

Unfortunately, zpool iostat is completely useless at describing performance. The only thing it can do is show device bandwidth, and everyone here knows that bandwidth is not performance, right? Nod along, thank you.

I totally understand that, I only used the output to show the space utilization per raidz1 volume.

Yes, and you also notice that the writes are biased towards the raidz1 sets that are less full. This is exactly what you want :-) Eventually, when the less empty sets become more empty, the writes will rebalance.

Actually, if we are going to consider the values from zpool iostat, they are just slightly biased towards the volumes I would want -- for example, on the first post I've made, the volume with less free space had 845GB free... that same volume now has 833GB. I really would like to just stop writing to that volume at this point, as I've experienced very bad performance in the past when a volume gets nearly full.

The tipping point for the change in the first fit/best fit allocation algorithm is now 96%. Previously, it was 70%. Since you don't specify which OS, build, or zpool version, I'll assume you are on something modern. NB, zdb -m will show the pool's metaslab allocations. If there are no 100% free metaslabs, then it is a clue that the allocator might be working extra hard.

As a reference, here's the information I posted less than 12 hours ago:

# zpool iostat -v | grep -v c4
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
backup      35.2T  15.3T    602    272  15.3M  11.1M
  raidz1    11.6T  1.06T    138     49  2.99M  2.33M
  raidz1    11.8T   845G    163     54  3.82M  2.57M
  raidz1    6.00T  6.62T    161     84  4.50M  3.16M
  raidz1    5.88T  6.75T    139     83  4.01M  3.09M
----------  -----  -----  -----  -----  -----  -----

And here's the info from the same system, as I write now:

# zpool iostat -v | grep -v c4
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
backup      35.3T  15.2T    541    208  9.90M  6.45M
  raidz1    11.6T  1.06T    116     38  2.16M  1.41M
  raidz1    11.8T   833G    122     39  2.28M  1.49M
  raidz1    6.02T  6.61T    152     64  2.72M  1.78M
  raidz1    5.89T  6.73T    149     66  2.73M  1.77M
----------  -----  -----  -----  -----  -----  -----

As you can see, the second raidz1 volume is not being spared and has been providing almost as much space as the others (and even more compared to the first volume).

Yes, perhaps 1.5-2x data written to the less full raidz1 sets. The exact amount of data is not shown, because zpool iostat doesn't show how much data is written, it shows the bandwidth.

I have the impression I'm getting degradation in performance due to the limited space in the first two volumes, especially the second, which has only 845GB free.

Impressions work well for dating, but not so well for performance. Does your application run faster or slower?

You're a funny guy. :) Let me re-phrase it: I'm sure I'm getting degradation in performance, as my applications are waiting more on I/O now than they used to do (based on CPU utilization graphs I have). The impression part is that the reason is the limited space in those two volumes -- as I said, I already experienced bad performance on zfs systems running nearly out of space before.

OK, so how long are they waiting? Try iostat -zxCn and look at the asvc_t column. This will show how the disk is performing, though it won't show the performance delivered by the file system to the application. To measure the latter, try fsstat zfs (assuming you are on a Solaris distro). Also, if these are HDDs, the media bandwidth decreases and seeks increase as they fill.
ZFS tries to favor the outer cylinders (lower numbered metaslabs) to take this into account. Is there any way to re-stripe the pool, so I can take advantage of all spindles across the raidz1 volumes? Right now it looks like the newer volumes are doing the heavy while the other two just hold old data. Yes, of course. But it requires copying the data, which probably isn't feasible. I'm willing to copy data around to get this accomplish, I'm really just looking for the best method -- I have more than 10TB free, so I have some space to play with if I have to duplicate some data and erase the old copy, for example. zfs send/receive is usually the best method. -- richard -- Richard Elling rich...@nexenta.com +1-760-896-4422 Enterprise class storage for everyone www.nexenta.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org
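For reference, the checks Richard suggests can be run side by side while the backups are active; a sketch (the 10-second intervals are chosen arbitrarily, and "backup" is the pool name from this thread):

  # iostat -zxCn 10
  # fsstat zfs 10
  # zdb -m backup | more

High asvc_t values in iostat point at busy or slow disks, fsstat shows the operation rates ZFS is actually delivering to applications, and zdb -m lists the metaslabs per vdev so you can see whether the nearly full vdevs still have any largely free metaslabs left.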