Re: [zfs-discuss] VM's on ZFS - 7210
No. From what I've seen, ZFS periodically flushes writes from the ZIL to disk. You can run into a read-starvation situation where ZFS is so busy flushing to disk that reads don't get serviced. If you have VMs whose developers expect low-latency interactivity, they get unhappy. Trust me. :)

One way to address this is to have an ARC that's large enough, or to add a cache device to the zpool. I have a config where ~20 ESX VMs share a single OpenSolaris NFS server, with an Intel X25-E for ZIL and an X25-M for cache. It seems to be doing OK. There are actually two of these setups. On one of them the cache SSD died recently, and you can feel it when ZFS goes to disk for some uncached piece of data. I'll be replacing the cache SSD next week.

-Paul

On 8/27/10 1:22 PM, John wrote:
> Wouldn't it be possible to saturate the SSD ZIL with enough backlogged sync writes? What I mean is, doesn't the ZIL eventually need to make it to the pool, and if the pool as a whole (spinning disks) can't keep up with write requests from 30+ VMs, couldn't you fill up the ZIL that way?
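For anyone building a similar setup: dedicated log (ZIL) and cache (L2ARC) devices can be added to an existing pool with "zpool add". A minimal sketch - the pool and device names below are placeholders, not the actual config described above:

    # Add an SSD as a dedicated log (ZIL) device -- "tank" and the device names are examples
    zpool add tank log c2t0d0
    # Add another SSD as an L2ARC cache device
    zpool add tank cache c2t1d0
    # The devices show up under "logs" and "cache" in the pool layout
    zpool status tank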
Re: [zfs-discuss] Opensolaris is apparently dead
Apparently, I must not be using the right web form... I would update the case via the web sometimes, and it seemed like no one actually saw it. Or some other engineer would come along and ask me the same set of questions that had already been answered (and recorded in the case records!).

Another story: I had a bad DIMM in an X4240. The support tech was almost dismissive that we had a bad DIMM. I provided him with explorer outputs and IPMI outputs, reseated the DIMM, rebooted, etc. Didn't hear from him for about a week. I complained. He said I forgot to give him the full output of "prtdiag -v" to verify the size of each DIMM... as if you can't tell that from the explorer file. Silence for another week, I complained again, and then I heard from the parts department that the part was being shipped. Not exactly friendly support.

When it was just Sun, their support was pretty good. Around the time it was announced that Oracle was going to acquire Sun, Sun's support just went south. I wouldn't recommend Sun servers on the basis of the quality of the support I've been getting.

-Paul

On 8/18/10 2:39 PM, John D Groenveld wrote:
> In message <4c6c4e30.7060...@ianshome.com>, Ian Collins writes:
>> If you count Monday this week as lately, we have never had to wait more than 24 hours for replacement drives for our 45x0 or 7000 series.
>
> Same here, but two weeks ago for a failed drive in an X4150. Last week SunSolve was sending my service order requests to /dev/null, but someone manually entered them after I submitted web feedback.
>
> John
> groenv...@acm.org
[zfs-discuss] Is dedupe ready for prime time?
I've been reading this list for a while, and there's lots of discussion about b134 and deduplication. I see some posts about snapshots not being destroyed, and maybe some recovery issues. What I'd like to know is: is ZFS with deduplication stable enough to use?

I have two NFS servers, each running OpenSolaris 2009.06 (111b), as datastores for VMware ESX hosts. It works great right now, with ZIL offload and L2ARC SSDs. I still get occasional complaints from developers saying the storage is slow - which I'm guessing means read latency is not stellar on shared storage. Write latency is probably not an issue thanks to the ZIL offload. I'm guessing deduplication would solve a lot of this read latency problem by having to do fewer read I/Os. But is it stable? Can I do nightly recursive snapshots and periodically destroy old snapshots without worrying about a dozen VMs suddenly losing their datastore? I'd love to hear about your experience.

Thanks,
-Paul Choi
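For context, the nightly snapshot rotation described above would look roughly like this - the pool name and snapshot naming scheme are just examples:

    # Take a recursive snapshot of the datastore pool each night
    zfs snapshot -r tank@nightly-$(date +%Y%m%d)
    # Periodically destroy old recursive snapshots by name
    zfs destroy -r tank@nightly-20100501
    # List existing snapshots to confirm the rotation
    zfs list -t snapshot -r tank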
Re: [zfs-discuss] Is dedupe ready for prime time?
Roy,

Thanks for the info. Yeah, the bug you mentioned is pretty critical. In terms of SSDs, I have an Intel X25-M for L2ARC and an X25-E for ZIL, and the host has 24G of RAM. I'm just waiting for that 2010.03 release, or whatever we want to call it when it's released...

-Paul

On 5/18/10 12:49 PM, Roy Sigurd Karlsbakk wrote:
> ----- Paul Choi <paulc...@plaxo.com> wrote:
>> I've been reading this list for a while, there's lots of discussion about b134 and deduplication. I see some stuff about snapshots not being destroyed, and maybe some recovery issues. What I'd like to know is, is ZFS with deduplication stable enough to use?
>
> No, currently ZFS dedup is not ready for production. Several bugs have been filed, and the most problematic ones are that the system can be rendered unusable for days in some situations. Also, if using dedup, plan your memory well and spend money on L2ARC, since it _will_ require either massive amounts of RAM or some good SSDs for L2ARC.
>
> Best regards
> roy
> --
> Roy Sigurd Karlsbakk
> (+47) 97542685
> r...@karlsbakk.net
> http://blogg.karlsbakk.net/
> --
> In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
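For planning purposes, dedup's memory footprint can be estimated before enabling it. A rough sketch, assuming a pool called "tank" and b134-era tools:

    # Simulate dedup on existing data and report the expected ratio and table size
    zdb -S tank
    # Enable dedup on the pool (or on an individual filesystem)
    zfs set dedup=on tank
    # Inspect the dedup table (DDT) statistics once dedup is in use
    zdb -DD tank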
[zfs-discuss] Is it possible to replicate an entire zpool with AVS?
Hello,

Is it possible to replicate an entire zpool with AVS? From what I can see, you can replicate a zvol, because AVS is filesystem-agnostic. I can create zvols within a pool and AVS can replicate those, but that's not really what I want. If I create a zpool called disk1, there's no device node for the pool itself:

paulc...@nfs01b:/dev/zvol# find /dev/zvol
/dev/zvol
/dev/zvol/dsk
/dev/zvol/dsk/rpool
/dev/zvol/dsk/rpool/dump
/dev/zvol/dsk/rpool/swap
/dev/zvol/rdsk
/dev/zvol/rdsk/rpool
/dev/zvol/rdsk/rpool/dump
/dev/zvol/rdsk/rpool/swap
paulc...@nfs01b:/dev/zvol#

The only zvol entries I see are for zvols that have been explicitly created. Any tricks to using AVS with a zpool? Or should I just opt for periodic zfs snapshots and zfs send/receive?
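If you end up going the snapshot plus send/receive route, a minimal sketch looks like this - the pool, snapshot, and host names are placeholders:

    # Initial full replication to the remote host
    zfs snapshot -r tank@repl-1
    zfs send -R tank@repl-1 | ssh remotehost zfs receive -F backup/tank
    # Subsequent runs send only the changes between two snapshots
    zfs snapshot -r tank@repl-2
    zfs send -R -i tank@repl-1 tank@repl-2 | ssh remotehost zfs receive -F backup/tank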
Re: [zfs-discuss] Does zpool clear delete corrupted files
Hm. That's odd. "zpool clear" should've cleared the list of errors - unless you were accessing files at the same time, so more checksum errors were being reported on reads.

As for "zpool scrub", there's no benefit in your case: you're reading from the zpool anyway, and checksums are verified as you read - and I assume you're going to read every single file there is. "zpool scrub" is useful when you want to ensure the checksums are good for the whole zpool, including files you haven't read recently.

Well, good luck with your recovery efforts.

-Paul

Jonathan Loran wrote:
> Well, I tried to clear the errors, but "zpool clear" didn't clear them. I think the errors are in the metadata in such a way that they can't be cleared. I'm actually a bit scared to scrub it before I grab a backup, so I'm going to do that first. After the backup, I need to break the mirror to pull the X4540 out, and I just hope that can succeed. If not, we'll be losing some data between the time the backup is taken and when I roll out the new storage. Let this be a double warning to all you zfs-ers out there: make sure you have redundancy at the ZFS layer, and also do backups. Unfortunately for me, penny pinching has precluded both for us until now.
>
> Jon
>
> On Jun 1, 2009, at 4:19 PM, A Darren Dunham wrote:
>> On Mon, Jun 01, 2009 at 03:19:59PM -0700, Jonathan Loran wrote:
>>> Kinda scary then. Better make sure we delete all the bad files before I back it up.
>>
>> That shouldn't be necessary. Clearing the error count doesn't disable checksums. Every read is going to verify checksums on the file data blocks. If it can't find at least one copy with a valid checksum, you should just get an I/O error trying to read the file, not invalid data.
>>
>>> What's odd is we've checked a few hundred files, and most of them don't seem to have any corruption. I'm thinking what's wrong is the metadata for these files is corrupted somehow, yet we can read them just fine.
>>
>> Are you still getting errors?
>>
>> --
>> Darren
>
> -
> Jonathan Loran - IT Manager
> Space Sciences Laboratory, UC Berkeley
> (510) 643-5146
> jlo...@ssl.berkeley.edu
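Since checksums are verified on every read, one way to find out which files are actually unreadable (rather than trusting a possibly stale error list) is simply to read them all and log I/O errors. A minimal sketch, assuming the pool is mounted at /tank and the log path is arbitrary:

    # Read every file; blocks with no valid copy fail with an I/O error
    # rather than returning bad data, so failed reads identify bad files.
    find /tank -type f | while read f; do
        cat "$f" > /dev/null 2>&1 || echo "$f" >> /var/tmp/unreadable-files.txt
    done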
Re: [zfs-discuss] Does zpool clear delete corrupted files
"zpool clear" just clears the list of errors (and the count of checksum errors) from the pool's stats. It does not modify the filesystem in any manner. You run "zpool clear" to make the zpool forget that it ever had any issues.

-Paul

Jonathan Loran wrote:
> Hi list,
>
> First off:
>
> # cat /etc/release
>                     Solaris 10 6/06 s10x_u2wos_09a X86
>        Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
>                     Use is subject to license terms.
>                          Assembled 09 June 2006
>
> Here's an (almost) disaster scenario that came to life over the past week. We have a very large zpool containing over 30TB, composed (foolishly) of three concatenated iSCSI SAN devices. There's no redundancy in this pool at the ZFS level. We are actually in the process of migrating this to an X4540 + J4500 setup, but since the X4540 is part of the existing pool, we need to mirror it, then detach it so we can build out the replacement storage.
>
> What happened was that some time after I had attached the mirror to the X4540, the scsi_vhci/network connection went south and the server panicked. In the 2.5 years this system has been up, that had never happened before. When we got the thing glued back together, it immediately started resilvering from the beginning and reported about 1.9 million data errors. The list from "zpool status -v" gave over 883k bad files. This is a small percentage of the total number of files in this volume: over 80 million (about 1%).
>
> My question is this: when we clear the pool with "zpool clear", what happens to all of the bad files? Are they deleted from the pool, or do the error counters just get reset, leaving the bad files intact? I'm going to perform a full backup of this guy (not so easy on my budget), and I would rather only get the good files.
>
> Thanks,
> Jon
>
> -
> Jonathan Loran - IT Manager
> Space Sciences Laboratory, UC Berkeley
> (510) 643-5146
> jlo...@ssl.berkeley.edu
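To illustrate the distinction - the error list and counters live in the pool status, and "zpool clear" only resets them. Pool and device names here are placeholders:

    # Show error counters and the list of files with permanent errors
    zpool status -v tank
    # Reset the counters and the error list; no file data is touched
    zpool clear tank
    # Errors can also be cleared for just one device in the pool
    zpool clear tank c1t0d0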
Re: [zfs-discuss] Does zpool clear delete corrupted files
If you run "zpool scrub" on the zpool, it'll do its best to identify the file(s) or filesystems/snapshots that have issues. Since your zpool has no redundancy, it won't be able to self-heal any checksum errors... It'll take a long time, though, to scrub 30TB.

-Paul

Jonathan Loran wrote:
> Kinda scary then. Better make sure we delete all the bad files before I back it up. What's odd is we've checked a few hundred files, and most of them don't seem to have any corruption. I'm thinking what's wrong is that the metadata for these files is corrupted somehow, yet we can read them just fine. I wish I could tell which ones are really bad, so we wouldn't have to recreate them unnecessarily. They are mirrored in various places, or can be recreated via reprocessing, but recreating/restoring that many files is no easy task.
>
> Thanks,
> Jon
>
> On Jun 1, 2009, at 2:41 PM, Paul Choi wrote:
>> "zpool clear" just clears the list of errors (and the count of checksum errors) from the pool's stats. It does not modify the filesystem in any manner. You run "zpool clear" to make the zpool forget that it ever had any issues.
>>
>> -Paul
>
> -
> Jonathan Loran - IT Manager
> Space Sciences Laboratory, UC Berkeley
> (510) 643-5146
> jlo...@ssl.berkeley.edu
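For reference, a scrub runs in the background and can be monitored or stopped from the same host. A minimal sketch with a placeholder pool name:

    # Start a scrub of the whole pool (runs in the background)
    zpool scrub tank
    # Watch progress and see which files have permanent errors
    zpool status -v tank
    # Stop an in-progress scrub if it hurts production I/O too much
    zpool scrub -s tank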
Re: [zfs-discuss] Monitoring ZFS host memory use
Ben Rockwood's written a very useful util called arc_summary:
http://www.cuddletech.com/blog/pivot/entry.php?id=979

It's really good for looking at ARC usage (including memory usage). You might also be able to make some guesses based on "kstat -n zfs_file_data" and "kstat -n zfs_file_data_buf" - look for mem_inuse. Running "::memstat" in "mdb -k" also shows kernel memory usage (which probably includes ZFS overhead) and ZFS file data memory usage, but it's painfully slow to run. kstat is probably better.

-Paul Choi

Richard Elling wrote:
> Bob Friesenhahn wrote:
>> On Wed, 6 May 2009, Troy Nancarrow (MEL) wrote:
>>> Please forgive me if my searching-fu has failed me in this case, but I've been unable to find any information on how people are going about monitoring and alerting regarding memory usage on Solaris hosts using ZFS. The problem is not that the ZFS ARC is using up the memory, but that the script Nagios is using to check memory usage simply sees, say, 96% RAM used, and alerts.
>>
>> Memory is meant to be used. 96% RAM use is good since it represents an effective use of your investment.
>
> Actually, I think a percentage of RAM is a bogus metric to measure. For example, on a 2 TByte system, you would be wasting 80 GBytes. Perhaps you should look for a more meaningful threshold.
>  -- richard
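A quick sketch of the commands mentioned above (kstat names as on OpenSolaris-era kernels; the exact fields vary by build):

    # ZFS file data kstats; look at the mem_inuse field
    kstat -n zfs_file_data
    kstat -n zfs_file_data_buf
    # ARC counters that arc_summary summarizes
    kstat -n arcstats
    # Kernel-wide memory breakdown, including a "ZFS File Data" line (slow on big systems)
    echo "::memstat" | mdb -k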