Re: [zfs-discuss] Large scale performance query
RAIDZ has to rebuild data by reading all drives in the group and reconstructing from parity. Mirrors simply copy a drive.

Compare a 3 TB mirror vs. a 9x 3 TB RAIDZ2:

Mirror: read 3 TB, write 3 TB.
RAIDZ2: read 24 TB, reconstruct data on CPU, write 3 TB.

In this case, RAIDZ is at least 8x slower to resilver (assuming the CPU work and writing happen in parallel). In the meantime, performance for the array is severely degraded for RAIDZ, but not for mirrors.

Aside from resilvering, for many workloads, I have seen over 10x (!) better performance from mirrors.

-- This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
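For concreteness, the resilver arithmetic above can be written out as a back-of-the-envelope model (drive count and sizes follow the example in the post; this is not a measured benchmark):

```python
# Back-of-the-envelope resilver I/O model (assumptions from the post:
# 3 TB drives, a 2-way mirror vs. a 9-disk RAIDZ2; units in TB).
drive_tb = 3

# Mirror: read the surviving side, write the replacement.
mirror_read = drive_tb                 # 3 TB
mirror_write = drive_tb                # 3 TB

# RAIDZ2 (9 disks, one failed): read all 8 surviving drives,
# reconstruct on CPU, write the replacement.
raidz2_read = (9 - 1) * drive_tb       # 24 TB
raidz2_write = drive_tb                # 3 TB

# Data read per rebuilt drive -- the "at least 8x slower" claim.
ratio = raidz2_read / mirror_read
print(mirror_read, raidz2_read, ratio)  # 3 24 8.0
```

The model assumes reads dominate resilver time; in practice seek patterns and pool load matter too.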
Re: [zfs-discuss] Large scale performance query
I may have RAIDZ reading wrong here. Perhaps someone could clarify.

For a read-only workload, does each RAIDZ drive act like a stripe, similar to RAID5/6? Do they have independent queues? It would seem that there is no escaping read/modify/write operations for sub-block writes, forcing the RAIDZ group to act like a single stripe.

Can RAIDZ even do a partial block read? Perhaps it needs to read the full block (from all drives) in order to verify the checksum. If so, then RAIDZ groups would always act like one stripe, unlike RAID5/6.
Re: [zfs-discuss] Large scale performance query
Thanks for clarifying.

If a block is spread across all drives in a RAIDZ group, and there are no partial block reads, how can each drive in the group act like a stripe? Many RAID5/6 implementations can do partial block reads, allowing for parallel random reads across drives (as long as there are no writes in the queue).

Perhaps you are saying that they act like stripes for bandwidth purposes, but not for read ops/sec?

-Rob

-----Original Message-----
From: Bob Friesenhahn [mailto:bfrie...@simple.dallas.tx.us]
Sent: Saturday, August 06, 2011 11:41 AM
To: Rob Cohen
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Large scale performance query

On Sat, 6 Aug 2011, Rob Cohen wrote:
> Can RAIDZ even do a partial block read? Perhaps it needs to read the
> full block (from all drives) in order to verify the checksum. If so,
> then RAIDZ groups would always act like one stripe, unlike RAID5/6.

ZFS does not do partial block reads/writes. It must read the whole block in order to validate the checksum. If there is a checksum failure, then RAID5 type algorithms are used to produce a corrected block.

For this reason, it is wise to make sure that the zfs filesystem blocksize is appropriate for the task, and make sure that the system has sufficient RAM that the zfs ARC can cache enough data that it does not need to re-read from disk for recently accessed files.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
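The whole-block checksum argument can be illustrated with a toy sketch (a hypothetical 128 KiB block and SHA-256 stand in here; ZFS supports several checksum algorithms, stored in the parent block pointer, but the principle is the same):

```python
import hashlib

# A toy "block" with a whole-block checksum, analogous to the checksum
# ZFS stores in the parent block pointer. A partial read cannot
# reproduce that checksum, so the full block must be read to verify it.
block = bytes(range(256)) * 512                 # 128 KiB toy block
stored_checksum = hashlib.sha256(block).digest()

partial = block[:len(block) // 2]               # simulate a partial read
assert hashlib.sha256(partial).digest() != stored_checksum  # cannot verify
assert hashlib.sha256(block).digest() == stored_checksum    # full read can
```

This is why the read unit in ZFS is the whole block, whereas traditional RAID5/6 (with no end-to-end checksum) can return whatever sectors it reads.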
Re: [zfs-discuss] Large scale performance query
If I'm not mistaken, a 3-way mirror is not implemented behind the scenes in the same way as a 3-disk raidz3.

You should use a 3-way mirror instead of a 3-disk raidz3. RAIDZ2 requires at least 4 drives, and RAIDZ3 requires at least 5 drives. But, yes, a 3-way mirror is implemented totally differently. Mirrored drives have identical copies of the data. RAIDZ drives store the data once, plus parity data.

A 3-way mirror gives improved redundancy and read performance, but at a high capacity cost, and slower writes than a 2-way mirror. It's more common to do 2-way mirrors + hot spare. This gives comparable protection to RAIDZ2, but with MUCH better performance. Of course, mirrors cost more capacity, but it helps that ZFS's compression and thin provisioning can often offset the loss in capacity, without sacrificing performance (especially when used in combination with L2ARC).
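To make the capacity tradeoff concrete, here is rough arithmetic for a hypothetical 10-drive shelf (illustrative numbers only, ignoring ZFS metadata and slop-space overhead):

```python
# Usable capacity: 2-way mirrors + spares vs. one RAIDZ2 group,
# for a hypothetical shelf of 10 drives at 3 TB each.
n, size_tb = 10, 3

# 2-way mirrors with 2 hot spares: (10 - 2) drives form 4 pairs.
mirror_usable = ((n - 2) // 2) * size_tb   # 4 pairs -> 12 TB

# Single RAIDZ2 group: 2 drives' worth of parity.
raidz2_usable = (n - 2) * size_tb          # 24 TB

print(mirror_usable, raidz2_usable)        # 12 24
```

So the mirrored layout gives half the usable space here; the post's point is that compression and thin provisioning can claw some of that back.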
Re: [zfs-discuss] Large scale performance query
Generally, mirrors resilver MUCH faster than RAIDZ, and you only lose redundancy on that stripe, so combined, you're much closer to RAIDZ2 odds than you might think, especially with hot spare(s), which I'd recommend.

When you're talking about IOPS, each stripe can support 1 simultaneous user.

Writing: Each RAIDZ group = 1 stripe. Each mirror group = 1 stripe. So, 216 drives can be 24 stripes or 108 stripes.

Reading: Each RAIDZ group = 1 stripe. Each mirror group = 1 stripe per drive. So, 216 drives can be 24 stripes or 216 stripes.

Actually, reads from mirrors are even more efficient than reads from stripes, because the software can optimally load balance across mirrors.

So, back to the original poster's question, 9 stripes might be enough to support 5 clients, but 216 stripes could support many more.

Actually, this is an area where RAID5/6 has an advantage over RAIDZ, if I understand correctly, because for RAID5/6 on read-only workloads, each drive acts like a stripe. For workloads with writing, though, RAIDZ is significantly faster than RAID5/6, but mirrors/RAID10 give the best performance for all workloads.
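The stripe arithmetic above, spelled out (the 216-drive and 9-wide figures come from the thread; the model deliberately ignores load-balancing and caching effects):

```python
# Stripe counts for 216 drives arranged as 9-disk RAIDZ groups
# vs. 2-way mirror pairs, per the reasoning in the post.
drives = 216
raidz_width = 9

raidz_groups = drives // raidz_width      # 24 groups
mirror_pairs = drives // 2                # 108 pairs

# Writes: each RAIDZ group or mirror pair behaves as one stripe.
write_stripes_raidz = raidz_groups        # 24
write_stripes_mirror = mirror_pairs       # 108

# Reads: each side of a mirror can service reads independently.
read_stripes_raidz = raidz_groups         # 24
read_stripes_mirror = mirror_pairs * 2    # 216

print(write_stripes_mirror, read_stripes_mirror)  # 108 216
```

One simultaneous user per stripe is a simplification, but it captures why mirrors scale so much better for concurrent random reads.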
Re: [zfs-discuss] Large scale performance query
Try mirrors. You will get much better multi-user performance, and you can easily split the mirrors across enclosures.

If your priority is performance over capacity, you could experiment with n-way mirrors, since more mirrors will load balance reads better than more stripes.
Re: [zfs-discuss] latest zpool version in solaris 11 express
Plus, VirtualBox 4.1 with network in a box would like snv_159.

From http://www.virtualbox.org/wiki/Changelog:
Solaris hosts: New Crossbow based bridged networking driver for Solaris 11 build 159 and above

Rob
RE: ZFS crypto source
I guessed you wouldn't be able to say, even if... The only shortfall in capability that I'm aware of is the secure boot/FDE, which we discussed previously. I am mostly interested in the source to see how features have been implemented and to understand the system structure. I certainly wouldn't presume to make changes!

On the slightly more general topic of source on OpenSolaris, are the designs for subsystems/features available? I've found PSARC cases for some things, but I expect that more detailed design, system interaction and use cases are documented as part of the development process. Are any of these types of document made public to assist in understanding at a higher level than the source code? Again, this is really to help me understand the system, rather than to attempt any modification.

Regards
Rob

-----Original Message-----
From: Darren J Moffat [mailto:darr...@opensolaris.org]
Sent: 10 May 2011 11:17
To: Rob O'Leary
Cc: zfs-crypto-discuss@opensolaris.org
Subject: Re: ZFS crypto source

On 07/05/2011 10:57, Rob O'Leary wrote:
> Is the source for ZFS crypto likely to be released on opensolaris.org?

Older versions of the source are available from the zfs-crypto project gates: /zfs-crypto/gate/ However, in some important areas these differ quite a bit from what was finally integrated, and are not on-disk compatible.

> I searched in /onnv/onnv-gate/usr/src/uts/common/fs/zfs, which may have
> been the wrong place, for aes and crypt and got no results, so I assume
> that the zfs encryption has not been released to date.

Correct, the source has not been released. I do not know anything about future plans, nor would I be able to comment here at this time even if I did. Please bring this up with your Oracle account/support team representative if it is important to your business.

Is there something in particular you want to do with the source if you had it available to you? Are there changes you want to make?

--
Darren J Moffat
___
zfs-crypto-discuss mailing list
zfs-crypto-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-crypto-discuss
RE: Booting from encrypted ZFS WAS: RE: How to mount encrypted file system at boot? Why no pass phrase requested
Hi Dan,

Your first two interpretations are correct.

I like the idea of netbooting but unfortunately, although a good idea, it doesn't fit with the details of our use case - we temporarily take our system to a trusted location, use it and then remove it, so we do not have a permanent presence at the trusted locations (other than our base location). This means that providing the netboot environment is effectively the same problem, as anything on the same network as the data becomes subject to the same rules regarding protection.

Putting the boot system on the key media isn't quite the same as transporting the key on media alone - the key media can be read-only/only used at boot to authenticate, whereas the boot system is on writable media. (I have already considered read-only boot images on DVD but, due to the low numbers of systems and the need to make permanent changes to the system, I do not consider this approach operable.)

Regarding tampering and tamper detection, when the disks are transported, we do not rely on an IT approach to these issues.

Regards,
Rob

-----Original Message-----
From: Daniel Carosone [mailto:d...@geek.com.au]
Sent: 28 April 2011 03:21
To: Rob O'Leary
Cc: Troels Nørgaard Nielsen; zfs-crypto-discuss@opensolaris.org
Subject: Re: Booting from encrypted ZFS WAS: RE: How to mount encrypted file system at boot? Why no pass phrase requested

If I understood correctly:
- there is no requirement for the system to boot (or be bootable) outside of your secure locations.
- you are willing to accept separate tracking and tagging of removable media, e.g. for key distribution.

Consider, at least for purposes of learning from the comparison:
- having the machines netboot only, and provide the netboot environment only within the secure locations.
- having the system disks on the removable media that is handled separately, not just the keys.

Both of these share the property that the physical chassis being transported contains only encrypted disks, leaving you to make other tradeoffs with respect to risks and handling of the bootstrapping data (including keys).

My primary interest in encrypted zfs boot for the OS is more around the integrity of the boot media, for devices that may be exposed to tampering of various kinds. This is a complex issue that can only be partly addressed by ZFS, even with such additional features.

Do these sorts of concerns apply to your environment? If someone was to intercept one of these machines in transit, and tamper with OS and system executables in such a way as to disclose information/keys or otherwise alter their operation when next booted in the secure environment, would that be a concern?

-- Dan.
RE: Booting from encrypted ZFS WAS: RE: How to mount encrypted filesystem at boot? Why no pass phrase requested
Hi Michel,

I had noticed these drives in the past, but your email reminded me and I followed your link, thanks. A bit of googling showed that not everyone is having a great experience, and I couldn't find the Barracuda FDE promised in the press release. I also need SAS because of read-while-writing issues, and these are Momentus SATA disks (despite the link names below). Can I mix SAS and SATA on the same controller? Reliable, SAS, FDE does not seem to be available...

Regards,
Rob

http://forums.seagate.com/t5/Barracuda-XT-Barracuda-and/Issues-with-ST9320322AS-FDE-3-drives/m-p/29247#M12876
http://forums.seagate.com/t5/Barracuda-XT-Barracuda-and/Recovering-Formatting-FDE-Drives/td-p/7412

-----Original Message-----
From: michel.bell...@malaiwah.com [mailto:michel.bell...@malaiwah.com]
Sent: 27 April 2011 12:08
To: Rob O'Leary
Cc: zfs-crypto-discuss@opensolaris.org
Subject: Re: Booting from encrypted ZFS WAS: RE: How to mount encrypted filesystem at boot? Why no pass phrase requested

Hi,

I think the best solution for your OS drives is to have a look at disks that offer built-in full disk encryption (FDE), just like the ones offered by Seagate (example: http://www.google.com/url?q=http://www.seagate.com/ww/v/index.jsp%3Flocale%3Den-US%26name%3Ddn_sec_intro_fde%26vgnextoid%3D1831bb5f5ed93110VgnVCM10f5ee0a0aRCRDsa=Uei=8_e3TfvVIInBtgemsdjeBAved=0CAgQFjAAusg=AFQjCNGt_c3Vokq4D6hL8k25rfUcIrB2Bw). While it does not offer the flexibility of ZFS encrypted datasets, I think it would be appropriate in your situation.

I would rely on that encryption for the OS, with a static passphrase asked at boot-time, but still point sensitive information to the ZFS pool for better management of the keys, if your auditor asks for them to be rolled once in a while (for data, at least).
My 2 cents,
Michel

Sent from my BlackBerry mobile device via the Rogers Wireless network

-----Original Message-----
From: Rob O'Leary raole...@btinternet.com
Sender: zfs-crypto-discuss-boun...@opensolaris.org
Date: Wed, 27 Apr 2011 11:46:02
To: Troels Nørgaard Nielsen tro...@norgaard.co
Cc: zfs-crypto-discuss@opensolaris.org
Subject: RE: Booting from encrypted ZFS WAS: RE: How to mount encrypted file system at boot? Why no pass phrase requested

Hi Troels,

There are two things here. First, I don't want to learn another set of administration tasks (I've just had a quick look at Trusted Extensions and am shuddering at the thought) and second, the problem isn't when the system is running but when it is stopped. I believe the problem is called "data at rest". Also, notice the line where I said the auditors like a simple story. They really do.

I still want to be able to print and use the network without incurring lots of admin, re-programming or performance overhead. (Our applications are very network heavy.) But, when I shut down, I want the data on the disks to be unintelligible.

In terms of management/learning overhead, we are very familiar with tracking and accounting for documents and keys, so having a few extra keys and usb sticks to look after is no problem.

Unfortunately, I don't know enough about grub and zfs booting. So, I shall resist the temptation of "can't it just"... Almost. I'm sure there's a way. Chain from authentication phase and getting key to main boot...? (Sorry, I had to.)

Best regards,
Rob

-----Original Message-----
From: Troels Nørgaard Nielsen [mailto:tro...@norgaard.co]
Sent: 27 April 2011 11:13
To: Rob O'Leary
Cc: zfs-crypto-discuss@opensolaris.org
Subject: Re: Booting from encrypted ZFS WAS: RE: How to mount encrypted file system at boot? Why no pass phrase requested

Hi Rob,

Wouldn't the use of Solaris Trusted Extensions, by placing all 'secure' operations inside a label that can only write to the filesystem (that is encrypted) with the same label, do for you what the auditors are seeking? The base idea of Trusted Extensions is that no data can escape its label (guarded by syscall checks); to ensure traffic to the label, one can use IPsec with labeling, etc.

I think Darren is dragging along here, because implementing zfs-crypto on rpool requires grub to be aware of zfs-crypto, which is kinda hard (e.g. grub doesn't support multiple vdevs or raidz-n yet).

Best regards
Troels Nørgaard
Nørgaard Consultancy

Den 27/04/2011 kl. 09.54 skrev Rob O'Leary:

> Requirements
>
> The main requirement is to convince our security auditors that all the data on our systems is encrypted. The systems are moved between multiple trusted locations and the principal need is to ensure that, if lost or stolen while on the move, no data can be accessed. The systems are not required to operate except in a trusted location.
>
> Storing the data on encrypted zfs filesystems seems like it should be sufficient for this. But the counter argument is that you cannot _guarantee_ that no data will be accidentally copied onto un-encrypted parts of the system, say as part of the print spooling of a data report (by the system
Re: [zfs-discuss] [?] - What is the recommended number of disks for a consumer PC with ZFS
References:

Thread: ZFS effective short-stroking and connection to thin provisioning?
http://opensolaris.org/jive/thread.jspa?threadID=127608

Confused about consumer drives and zfs can someone help?
http://opensolaris.org/jive/thread.jspa?threadID=132253

Recommended RAM for ZFS on various platforms
http://opensolaris.org/jive/thread.jspa?threadID=132072

Performance advantages of spool with 2x raidz2 vdevs vs. Single vdev - Spindles
http://opensolaris.org/jive/thread.jspa?threadID=132127
Re: [zfs-discuss] problem adding second MD1000 enclosure to LSI 9200-16e
As a follow-up, I tried a SuperMicro enclosure (SC847E26-RJBOD1). I have 3 sets of 15 drives. I got the same results when I loaded the second set of drives (15 to 30).

Then, I tried changing the LSI 9200's BIOS setting for max INT 13 drives from 24 (the default) to 15. From then on, the SuperMicro enclosure worked fine, even with all 45 drives, and no kernel hangs.

I suspect that the BIOS setting would have worked with 1 MD1000 enclosure, but I never tested the MD1000s after I had the SuperMicro enclosure running. I'm not sure if the kernel hang with max int13=24 was a hardware problem, or a Solaris bug.

- Rob

> I have 15x SAS drives in a Dell MD1000 enclosure, attached to an LSI
> 9200-16e. This has been working well. The system is booting off of
> internal drives, on a Dell SAS 6ir.
>
> I just tried to add a second storage enclosure, with 15 more SAS drives,
> and I got a lockup during "Loading Kernel". I got the same results
> whether I daisy-chained the enclosures or plugged them both directly
> into the LSI 9200. When I removed the second enclosure, it booted up fine.
>
> I also have an LSI MegaRAID 9280-8e I could use, but I don't know if
> there is a way to pass the drives through without creating RAID0 virtual
> drives for each drive, which would complicate replacing disks. The 9280
> boots up fine, and the system can see the new virtual drives.
>
> Any suggestions? Is there some sort of boot procedure to get the system
> to recognize the second enclosure without locking up? Is there a special
> way to configure one of these LSI boards?
[zfs-discuss] l2arc_noprefetch
When running real data, as opposed to benchmarks, I notice that my L2ARC stops filling, even though the majority of my reads are still going to primary storage. I'm using 5 SSDs for L2ARC, so I'd expect to get good throughput, even with sequential reads.

I'd like to experiment with disabling the l2arc_noprefetch behavior, to see how the performance compares when caching more data. How exactly do I do that?

Right now, I added the following line to /etc/system, but it doesn't seem to have made a difference. I'm still seeing most of my reads go to primary storage, even though my cache should be warm by now, and my SSDs are far from full.

set zfs:l2arc_noprefetch = 0

Am I setting this wrong? Am I misunderstanding this option?

Thanks,
Rob
[zfs-discuss] problem adding second MD1000 enclosure to LSI 9200-16e
I have 15x SAS drives in a Dell MD1000 enclosure, attached to an LSI 9200-16e. This has been working well. The system is booting off of internal drives, on a Dell SAS 6ir.

I just tried to add a second storage enclosure, with 15 more SAS drives, and I got a lockup during "Loading Kernel". I got the same results whether I daisy-chained the enclosures or plugged them both directly into the LSI 9200. When I removed the second enclosure, it booted up fine.

I also have an LSI MegaRAID 9280-8e I could use, but I don't know if there is a way to pass the drives through without creating RAID0 virtual drives for each drive, which would complicate replacing disks. The 9280 boots up fine, and the system can see the new virtual drives.

Any suggestions? Is there some sort of boot procedure to get the system to recognize the second enclosure without locking up? Is there a special way to configure one of these LSI boards?

Thanks,
Rob
Re: [zfs-discuss] problem adding second MD1000 enclosure to LSI 9200-16e
Markus,

I'm pretty sure that I have the MD1000 plugged in properly, especially since the same connection works on the 9280 and Perc 6/e. It's not in split mode.

Thanks for the suggestion, though.
Re: [zfs-discuss] WarpDrive SLP-300
BTW, any new storage-controller-related drivers introduced in snv_151a?

The 64-bit drivers in 147:
-rwxr-xr-x 1 root sys 401200 Sep 14 08:44 mpt
-rwxr-xr-x 1 root sys 398144 Sep 14 09:23 mpt_sas

are a different size than in 151a:
-rwxr-xr-x 1 root sys 400936 Nov 15 23:05 /kernel/drv/amd64/mpt
-rwxr-xr-x 1 root sys 399952 Nov 15 23:06 /kernel/drv/amd64/mpt_sas

and mpt_sas has a new printf:
"reset was running, this event can not be handled this time"

Rob
Re: [zfs-discuss] zfs record size implications
Thanks, Richard. Your answers were very helpful.
[zfs-discuss] zfs record size implications
I have read some conflicting things regarding the ZFS record size setting. Could you guys verify/correct these statements? (These reflect my understanding, not necessarily the facts!)

1) The ZFS record size in a zvol is the unit that dedup happens at. So, for a volume that is shared to an NTFS machine, if the NTFS cluster size is smaller than the zvol record size, dedup will get dramatically worse, since it won't dedup clusters that are positioned differently in zvol records.

2) For shared folders, the record size is the allocation unit size, so large records can waste a substantial amount of space in cases with lots of very small files. This is different from a HW RAID stripe size, which only affects performance, not space usage.

3) Although small record sizes have a large RAM overhead for dedup tables, as long as the dedup table working set fits in RAM, and the rest fits in L2ARC, performance will be good.

Thanks,
Rob
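Statement 1 can be demonstrated with a toy simulation of block-level dedup (a hypothetical 8 KiB dedup unit and SHA-256 hashing stand in for a zvol's volblocksize and ZFS's checksums; the alignment effect is what matters):

```python
import hashlib
import os

RECORD = 8192  # hypothetical dedup unit, e.g. a zvol volblocksize

def unique_records(volume):
    """Number of distinct RECORD-sized blocks -- what dedup must store."""
    blocks = [volume[i:i + RECORD] for i in range(0, len(volume), RECORD)]
    return len({hashlib.sha256(b).digest() for b in blocks})

filedata = os.urandom(RECORD * 8)  # a 64 KiB "file" (8 records)

# Second copy aligned to the record size: every record matches,
# so dedup stores the data once.
aligned = filedata + filedata

# Second copy shifted by a 4 KiB NTFS cluster: no record of the copy
# lines up with a record of the original, so almost nothing dedups.
shifted = filedata + b"\0" * 4096 + filedata

print(unique_records(aligned))   # 8  (16 records, fully deduped)
print(unique_records(shifted))   # 17 (no record-level matches)
```

In other words, identical clusters at different offsets within records hash differently, which is why matching the NTFS cluster size to the zvol block size matters for dedup.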
[zfs-discuss] stripes of different size mirror groups
I have a couple drive enclosures:

15x 450 GB 15k RPM SAS
15x 600 GB 15k RPM SAS

I'd like to set them up like RAID10. Previously, I was using two hardware RAID10 volumes, with the 15th drive as a hot spare, in each enclosure.

Using ZFS, it could be nice to make them a single volume, so that I could share L2ARC and ZIL devices, rather than buy two sets. It appears possible to set up 7x 450 GB mirrored sets and 7x 600 GB mirrored sets in the same volume, without losing capacity.

Is that a bad idea? Is there a problem with having different stripe sizes, like this?

Thanks,
Rob
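For the capacity question, the raw numbers work out as follows (illustrative arithmetic only, ignoring ZFS metadata overhead; the 15th drive in each shelf is assumed to stay a hot spare, as in the previous setup):

```python
# Usable capacity of the proposed single pool: 7 mirror pairs of
# 450 GB plus 7 mirror pairs of 600 GB, in GB.
pairs_450 = 7 * 450   # 3150 GB from the 450 GB shelf
pairs_600 = 7 * 600   # 4200 GB from the 600 GB shelf

total = pairs_450 + pairs_600
print(total)          # 7350 -- same as two separate pools combined
```

So no capacity is lost by pooling the unequal mirrors; the open question in the thread is only how allocation behaves once the smaller vdevs fill.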
Re: [zfs-discuss] stripes of different size mirror groups
Thanks, Ian.

If I understand correctly, the performance would then drop to the same level as if I had set them up as separate volumes in the first place. So, I get double the performance for 75% of my data, and equal performance for 25% of my data, and my L2ARC will adapt to my working set across both enclosures.

That sounds like all upside and no downside, unless I'm missing something. Are there any other problems?
Re: [zfs-discuss] Performance advantages of spool with 2x raidz2 vdevs vs. Single vdev
> Hi guys, I am about to reshape my data pool and am wondering what
> performance difference I can expect from the new config vs. the old.
> The old config is a pool of a single vdev of 8 disks raidz2. The new
> pool config is 2 vdevs of 7-disk raidz2 in a single pool. I understand
> it should be better, with higher IO throughput and better read/write
> rates... but interested to hear the science behind it. ... FYI, it's
> just a home server, but I like it.

Some answers (and questions) are here:
http://www.opensolaris.org/jive/thread.jspa?threadID=102368&tstart=0

*** We need this explained in the ZFS FAQ by a Panel of Experts ***

Q: I (we) have a Home Computer and desire to use ZFS with a few large, cheap, (consumer-grade) Drives. What can I expect from 3 Drives, would I be better off with 4 or 5? Please note: I doubt I can afford as many as 10 Drives, nor could I stuff them into my Box, so please suggest options that use less than that many (most preferably less than 7).

A: ?

Thanks,
Rob
Re: [zfs-discuss] [?] - What is the recommended number of disks for a consumer PC with ZFS
> I'm building my new storage server, all the parts should come in this week. ...

Another answer is here:
http://eonstorage.blogspot.com/2010/03/whats-best-pool-to-build-with-3-or-4.html

Rob
Re: [zfs-discuss] Confused about consumer drives and zfs can someone help?
> I wanted to build a small back up (maybe also NAS) server using

A common question that I am trying to get answered (and have a few) here:
http://www.opensolaris.org/jive/thread.jspa?threadID=102368&tstart=0

Rob
Re: [zfs-discuss] Recommended RAM for ZFS on various platforms
> I'm currently planning on running FreeBSD with ZFS, but I wanted to
> double-check how much memory I'd need for it to be stable. The ZFS wiki
> currently says you can go as low as 1 GB, but recommends 2 GB; however,
> elsewhere I've seen someone claim that you need at least 4 GB. ...
> How about other OpenSolaris-based OSs, like NexentaStor? ...
> If it matters, I'm currently planning on RAID-Z2 with 4x500GB
> consumer-grade SATA drives. ... This is on an AMD64 system, and the OS
> in question will be running inside of VirtualBox ...
> Thanks, Michael

Buy the biggest Chips you can afford, and if you need to pair them (for performance), do so. You want to keep as many Memory Slots open as you can, so you can add more memory later.

I think you (or I) would be unhappy with a measly 4GB in a new System, but in reality it would be OK. If it is not OK (for you), then you have open Memory Slots in which to add more Chips (which you are certain to want to do in the future).

Rob
Re: [zfs-discuss] [?] - What is the recommended number of disks for a consumer PC with ZFS
> I'm building my new storage server, all the parts should come in this week...

How did it turn out? Did 8x 1TB Drives seem to be the correct number, or a couple too many (based on the assumption that you did not run out of space; I mean solely from a performance / 'ZFS usability' standpoint - as opposed to over three dozen tiny Drives)?

Thanks for your reply,
Rob
Re: [zfs-discuss] reconstruct recovery of rpool zpool and zfs file system with bad sectors
Roy,

Thanks for your reply. I did get a new drive and attempted the approach (as you suggested, pre your reply); however, once booted off the OpenSolaris Live CD (or the rebuilt new drive), I was not able to import the rpool (which I had established had sector errors). I expect I should have had some success if the vdev labels were intact (I currently suspect some critical boot files are impacted by bad sectors, resulting in failed boot attempts from that partition slice). Unfortunately, I didn't keep a copy of the messages (if any - I have tried many permutations since).

At my last attempt, I installed Knoppix (Debian) on one of the partitions (which also allowed access to smartctl and hdparm - I was hoping to reduce the read timeout to speed up the exercise), then added zfs-fuse (to access the space I will use to stage the recovery file) and added dd_rescue and GNU ddrescue packages. smartctl appears not to be able to manage the disk while attached to USB (but I am guessing, because I don't have much experience with it).

At this point I attempted dd_rescue to create an image of the partition with bad sectors (hoping there were efficiencies beyond normal dd), but it was at 5.6GB in 36 hours, so again I needed to abort. However, it does log the blocks attempted so far, so hopefully I can skip past them when I next get an opportunity. It does now appear that GNU ddrescue is the preferred of the two utilities, which I may opt to use to create an image of the partition before attempting recovery of the slice (rpool).

As an aside, I noticed that the Knoppix 'dmesg | grep sd' output, which reflects the primary partition devices, no longer appears to reflect the Solaris partition (p2) slice devices (as it would the extended p4 partition's logical partition devices). I suspect due to this, the rpool (on one of the Solaris partition slices) appears not to be detected by the Knoppix zfs-fuse 'zpool import' (although I can access the zpool which exists on partition p3). I wonder if this is related to the transition from ufs to zfs?
[zfs-discuss] reconstruct recovery of rpool zpool and zfs file system with bad sectors
Folks I posted this question on (OpenSolaris - Help) without any replies http://opensolaris.org/jive/thread.jspa?threadID=129436tstart=0 and am re-posting here in the hope someone can help ... I have updated the wording a little too (in an attempt to clarify) I currently use OpenSolaris on a Toshiba M10 laptop. One morning the system wouldn't boot OpenSolaris 2009.06 (it was simply unable progress to the second stage grub). On further investigation I discovered the hdd partition slice with rpool appeared to have bad sectors. Faced with either a rebuild or an attempt at recovery, I first made an attempt to recover the slice before rebuilding. The c7t0d0 HDD (p0) was divided into p1 (NTFS 24GB), p2 (OpenSolaris 24GB), p3 (OpenSolaris zfs pool for data 160GB) and p4 (50GB extended with 32GB pcfs, 12GB linux and linux swap) partitions (or something close to that). On the first Solaris partition (p2), slice 0 was the OpenSolaris rpool zpool. To attempt recovery I booted the OpenSolaris 2009.06 live CD and was able to import the ZFS pool which was configured on p3. On the p2 device (Solaris boot partition which wouldn't boot) I then ran dd if=/dev/rdsk/c7t0d0s2 bs=512 conv=sync, noerror of=/p0/s2image.dd. Due to sector read error timeouts, this took longer than my maintenance window allowed and I ended up aborting the attempt with a significant amount of sectors already captured. On block examination of this (so far) captured image.dd, I noticed the first two s0 vdev labels appeared to be intact. I then skipped the expected number of s2 sectors to get to the s0 start and copied blocks to attempt to reconstruct the s0 rpool (against this I ran zdb -l which reported the first two labels) and gave me the encouragement necessary to continue the exercise. At the next opportunity I ran the command again using the skip directive to capture the balance of slice. The result was that I had two files (images) comprising the good c7t0d0s0 sectors (with I expect the bad padded) Ie. 
an s0image_start.dd and s0image_end.dd. As mentioned, at this stage I was able to run 'zdb -l s0image_start.dd' and see the first two vdev labels, and 'zdb -l s0image_end.dd' and see the last two vdev labels. I then combined the two files (I tried various approaches, e.g. cat and dd with the append directive), however only the first two vdev labels appear to be readable in the resulting s0image_s0.dd. The resulting file size, which I expect is largely good sectors with padding for bad sectors, matches the prtvtoc s0 sector count multiplied by 512. Can anyone advise why I am unable to read the third and fourth vdev labels once the start and end files are combined? Is there another approach that may prove more fruitful? Once I have the file (with labels in the correct places) I was intending to attempt to import the vdev zpool as rpool2 or attempt any repair procedures I could locate (as far as was possible anyway) to see what data could be recovered (besides, it was an opportunity to get another close look at ZFS). Incidentally *only* the c7t0d0s0 slice appeared to have bad sectors (I do wonder what the significance of this is?). -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
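One possible explanation for the unreadable labels, sketched below with ordinary files: ZFS keeps labels L2 and L3 at fixed offsets from the *end* of the device, so if the first image came out even one sector short, or the tail was simply appended, the last two labels no longer sit where zdb looks for them. Using dd with seek= pins the tail image at its original sector offset regardless of how long the head image is. File names and sizes here are stand-ins, not the poster's actual images.

```shell
# Make a 1 MiB stand-in for the slice, then split it the way the two
# capture passes did (first half, second half).
dd if=/dev/urandom of=slice.img bs=512 count=2048 2>/dev/null
dd if=slice.img of=start.dd bs=512 count=1024 2>/dev/null
dd if=slice.img of=end.dd bs=512 skip=1024 count=1024 2>/dev/null

# Recombine: seek= places the tail at its original sector offset, so the
# end-of-device labels land exactly where they were on the real slice.
cp start.dd combined.dd
dd if=end.dd of=combined.dd bs=512 seek=1024 conv=notrunc 2>/dev/null

cmp slice.img combined.dd && echo "tail is at its original offset"
```

On the real images the seek= value would be the same sector count used as skip= during the second capture pass; plain cat only produces the same result if the first file is exactly that many sectors long.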
Re: [zfs-discuss] Does ZFS use large memory pages?
Hi Gary, I would not remove this line in /etc/system. We have been combating this bug for a while now on our ZFS file system running JES Commsuite 7. I would be interested in finding out how you were able to pinpoint the problem. We seem to have no worries with the system currently, but when the file system gets above 80% we see quite a number of issues, much the same as what you've had in the past: ps and prstat hanging. Are you able to tell me the IDR number that you applied? Thanks, Rob -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Snv_126 Kernel PF Panic
Hey All, I'm having some issues with a snv_126 file server running on an HP ML370 G6 server with an Adaptec Raid Card (31605). The server has the rpool, plus two raidz2 data pools (1.5TB and 1.0TB respectively). I have been using e-sata to back up the pools to a pool that contains 3x 1.5TB drives every week. This has all worked great for the last 4 or so months. Starting last week, the machine would panic and reboot when attempting to perform a backup. This week, the machine has been randomly rebooting every 3-15 hours (with or without the backup pool attached), complaining of: (#pf Page fault) rp=ff0010568eb0 addr=30 occurred in module zfs due to a NULL pointer dereference. I use cron to perform a scrub of all pools every night, and there have been no errors whatsoever. Below is the output from mdb $C on the core dump: rcher...@stubborn2:/var/crash/Stubborn2$ mdb 0 Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp rootnex scsi_vhci zfs sd sockfs ip hook neti sctp arp usba uhci fctl md lofs fcip fcp cpc random crypto smbsrv nfs logindmux ptm ufs nsmb sppp ipc ] $C ff000f4ef3b0 vdev_is_dead+0xc(0) ff000f4ef3d0 vdev_readable+0x16(0) ff000f4ef410 vdev_mirror_child_select+0x61(ff02fa41da10) ff000f4ef450 vdev_mirror_io_start+0xda(ff02fa41da10) ff000f4ef490 zio_vdev_io_start+0x1ba(ff02fa41da10) ff000f4ef4c0 zio_execute+0xa0(ff02fa41da10) ff000f4ef4e0 zio_nowait+0x42(ff02fa41da10) ff000f4ef580 arc_read_nolock+0x82d(0, ff02d716b000, ff02e3fdc000, 0, 0, 6, 3, ff000f4ef65c, ff000f4ef670) ff000f4ef620 arc_read+0x75(0, ff02d716b000, ff02e3fdc000, ff02e3a7f928, 0, 0, 6, 3, ff000f4ef65c, ff000f4ef670) ff000f4ef6c0 dbuf_prefetch+0x131(ff02e3a80018, 20) ff000f4ef710 dmu_zfetch_fetch+0xa8(ff02e3a80018, 20, 1) ff000f4ef750 dmu_zfetch_dofetch+0xb8(ff02e3a80278, ff02f4c52868) ff000f4ef7b0 dmu_zfetch_find+0x436(ff02e3a80278, ff000f4ef7c0, 1) ff000f4ef870 dmu_zfetch+0xac(ff02e3a80278, 2b, 4000, 1) ff000f4ef8d0 dbuf_read+0x170(ff02f3d8ea00, 0, 2) 
ff000f4ef950 dnode_hold_impl+0xed(ff02e2a2f040, 1591, 1, ff02e4e71478, ff000f4ef998) ff000f4ef980 dnode_hold+0x2b(ff02e2a2f040, 1591, ff02e4e71478, ff000f4ef998) ff000f4ef9e0 dmu_tx_hold_object_impl+0x4a(ff02e4e71478, ff02e2a2f040, 1591, 2, 0, 0) ff000f4efa00 dmu_tx_hold_bonus+0x2a(ff02e4e71478, 1591) ff000f4efa50 zfs_inactive+0x99(ff030213ae80, ff02d4ed6d88, 0) ff000f4efaa0 fop_inactive+0xaf(ff030213ae80, ff02d4ed6d88, 0) ff000f4efac0 vn_rele+0x5f(ff030213ae80) ff000f4efae0 smb_node_free+0x7d(ff02f098b2a0) ff000f4efb10 smb_node_release+0x9a(ff02f098b2a0) ff000f4efb30 smb_ofile_delete+0x76(ff03026d5d18) ff000f4efb60 smb_ofile_release+0x84(ff03026d5d18) ff000f4efb80 smb_request_free+0x23(ff02fa4b0058) ff000f4efbb0 smb_session_worker+0x6e(ff02fa4b0058) ff000f4efc40 taskq_d_thread+0xb1(ff02e51b9e90) ff000f4efc50 thread_start+8() I can provide any other info that may be needed. Thank you in advance for your help! Rob -- Rob Cherveny Manager of Information Technology American Junior Golf Association 770.868.4200 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] sharing a ssd between rpool and l2arc
"you can't use anything but a block device for the L2ARC device." Sure you can... http://mail.opensolaris.org/pipermail/zfs-discuss/2010-March/039228.html It even lives through a reboot (rpool is mounted before other pools): zpool create -f test c9t3d0s0 c9t4d0s0 zfs create -V 3G rpool/cache zpool add test cache /dev/zvol/dsk/rpool/cache reboot If you're asking for an L2ARC on rpool itself, well, yeah, it's not mounted soon enough, but the point is to put rpool, swap, and the L2ARC for your storage pool all on a single SSD. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Sun Flash Accelerator F20 numbers
"if you disable the ZIL altogether, and you have a power interruption, failed cpu, or kernel halt, then you're likely to have a corrupt unusable zpool" The pool will always be fine, no matter what. "or at least data corruption." Yeah, it's a good bet that data sent to your file or zvol will not be there when the box comes back, even though your program had finished seconds before the crash. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD As ARC
"Can't you slice the SSD in two, and then give each slice to the two zpools?" This is exactly what I do ... use 15-20 GB for root and the rest for an L2ARC. I like the idea of swapping on SSD too, but why not make a zvol for the L2ARC so you're not limited by the hard partitioning? Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SSD As ARC
"I like the idea of swapping on SSD too, but why not make a zvol for the L2ARC so you're not limited by the hard partitioning?" It lives through a reboot.. zpool create -f test c9t3d0s0 c9t4d0s0 zfs create -V 3G rpool/cache zpool add test cache /dev/zvol/dsk/rpool/cache reboot zpool status pool: rpool state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM rpool ONLINE 0 0 0 mirror ONLINE 0 0 0 c9t1d0s0 ONLINE 0 0 0 c9t2d0s0 ONLINE 0 0 0 errors: No known data errors pool: test state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM test ONLINE 0 0 0 c9t3d0s0 ONLINE 0 0 0 c9t4d0s0 ONLINE 0 0 0 cache /dev/zvol/dsk/rpool/cache ONLINE 0 0 0 errors: No known data errors ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS send and receive corruption across a WAN link?
Can a ZFS send stream become corrupt when piped between two hosts across a WAN link using 'ssh'? For example, a host in Australia sends a stream to a host in the UK as follows: # zfs send tank/f...@now | ssh host.uk zfs receive tank/bar -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
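For what it's worth, ssh applies a MAC to every packet, and 'zfs receive' verifies the checksum of each record in the stream and aborts on a mismatch, so silent corruption on the wire is unlikely to go unnoticed. If you want a belt-and-braces check anyway, you can checksum the stream on both ends. The sketch below is runnable locally: a random file stands in for the send stream and 'cat' stands in for the ssh hop.

```shell
# Stand-in for the output of 'zfs send'.
dd if=/dev/urandom of=stream.bin bs=1024 count=64 2>/dev/null

before=$(cksum < stream.bin)
after=$(cat < stream.bin | cat | cksum)   # the inner 'cat' plays the WAN hop

[ "$before" = "$after" ] && echo "stream arrived intact"
```

On the real link you could capture the stream to a file on each side, compare the cksum output, and only then feed the remote copy to zfs receive.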
Re: [zfs-discuss] Poor ZIL SLC SSD performance
"A UPS plus disabling the zil, or disabling synchronization, could possibly achieve the same result (or maybe better) IOPS-wise." Even with the fastest slog, disabling the zil will always be faster... (fewer bytes to move) "This would probably work given that your computer never crashes in an uncontrolled manner. If it does, some data may be lost (and possibly the entire pool lost, if you are unlucky)." The pool would never be at risk, but when your server reboots, its clients will be confused that things they sent, and the server promised it had saved, are gone. For some clients, this small loss might be the loss of their entire dataset. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Reading ZFS config for an extended period
"RFE open to allow you to store [DDT] on a separate top level VDEV" hmm, add to this spare, log and cache vdevs, and it's to the point of making another pool and thinly provisioning volumes to maintain partitioning flexibility. taemun: hey, thanks for closing the loop! Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cores vs. Speed?
"I like the original Phenom X3 or X4" we all agree RAM is the key to happiness. The debate is what offers the most ECC RAM for the least $. I failed to realize the AM3 CPUs accept unbuffered ECC DDR3-1333 like Lynnfield. To use Intel's 6 slots vs AMD's 4 slots, one must use registered ECC. So the low-cost mission is something like: AMD Phenom II X4 955 Black Edition Deneb 3.2GHz Socket AM3 125W $150 http://www.newegg.com/Product/Product.aspx?Item=N82E16819103808 $ 85 http://www.newegg.com/Product/Product.aspx?Item=N82E16813131609 $ 60 http://www.newegg.com/Product/Product.aspx?Item=N82E16820139050 But we are still stuck at 8G without going to expensive RAM or a more expensive CPU. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cores vs. Speed?
"if zfs overlaps mirror reads across devices." It does... I have one very old disk in this mirror, and when I attach another element one can see more reads going to the faster disks... this paste isn't from right after the attach but since the reboot, but one can still see the reads are load balanced depending on the response of the elements in the vdev. 13 % zpool iostat -v capacity operations bandwidth pool used avail read write read write -- - - - - - - rpool 7.01G 142G 0 0 1.60K 1.44K mirror 7.01G 142G 0 0 1.60K 1.44K c9t1d0s0 - - 0 0 674 1.46K c9t2d0s0 - - 0 0 687 1.46K c9t3d0s0 - - 0 0 720 1.46K c9t4d0s0 - - 0 0 750 1.46K but I also support your conclusions. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cores vs. Speed?
"Intel's RAM is faster because it needs to be." I'm confused how AMD's dual-channel, two-way interleaved 128-bit DDR2-667 into an on-CPU controller is faster than Intel's Lynnfield dual-channel, rank- and channel-interleaved DDR3-1333 into an on-CPU controller. http://www.anandtech.com/printarticle.aspx?i=3634 "With the AMD CPU, the memory will run cooler and be cheaper." Cooler yes, but only $2 more per gig for 2x the bandwidth? http://www.newegg.com/Product/Product.aspx?Item=N82E16820139050 http://www.newegg.com/Product/Product.aspx?Item=N82E16820134652 And if one uses all 16 slots, that 667MHz simm runs at 533MHz with AMD. The same is true for Lynnfield: if one uses registered DDR3, one only gets 800MHz with all 6 slots (single or dual rank). "Regardless, for zfs, memory is more important than raw CPU" agreed! but everything must be balanced. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cores vs. Speed?
"I am leaning towards AMD because of ECC support" well, let's look at Intel's offerings... RAM is faster than AMD's at 1333MHz DDR3, and one gets ECC and a thermal sensor for $10 over non-ECC http://www.newegg.com/Product/Product.aspx?Item=N82E16820139040 This MB has two Intel ethernets and, for an extra $30, an ether KVM (LOM) http://www.newegg.com/Product/Product.aspx?Item=N82E16813182212 One needs a Xeon 34xx for ECC; the 45W version isn't on newegg, and ignoring the one without Hyper-Threading leaves us http://www.newegg.com/Product/Product.aspx?Item=N82E16819117225 Yeah, @ 95W it isn't exactly low power, but 4 cores @ 2533MHz and another 4 Hyper-Thread cores is nice.. If you only need one core, the marketing paperwork claims it will push to 2.93GHz too. But the RAM bandwidth is the big win for Intel. Avoid the temptation, but @ 2.8GHz without ECC, this is close in $$ http://www.newegg.com/Product/Product.aspx?Item=N82E16819115214 Now, this gets one to 8G ECC easily... AMD's unfair advantage is all those RAM slots on their multi-die MBs... A slow AMD CPU with 64G RAM might be better depending on your working set / dedup requirements. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives
"true. but I buy a Ferrari for the engine and bodywork and chassis engineering. It is totally criminal what Sun/EMC/Dell/Netapp do charging" it's interesting to read this alongside another thread containing: "timeout issue is definitely the WD10EARS disks. replaced 24 of them with ST32000542AS (f/w CC34), and the problem departed with the WD disks." Everyone needs to eat; if Ferrari spreads their NRE over the wheels, it might be because they are light and have been tested to not melt from the heat. Sun/EMC/Dell/Netapp test each of their components and sell the total car. I'm thankful Sun shares their research and we can build on it. (btw, netapp ontap 8 is freebsd, and runs on std hardware after a little BIOS work :-) Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
"a 1U or 2U JBOD chassis for 2.5" drives" from http://supermicro.com/products/nfo/chassis_storage.cfm the E1 (single) or E2 (dual) options have a SAS expander, so http://supermicro.com/products/chassis/2U/?chs=216 fits your build, or build it yourself with http://supermicro.com/products/accessories/mobilerack/CSE-M28E2.cfm ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] 4 Internal Disk Configuration
"By partitioning the first two drives, you can arrange to have a small zfs-boot mirrored pool on the first two drives, and then create a second pool as two mirror pairs, or four drives in a raidz to support your data." agreed.. 2 % zpool iostat -v capacity operations bandwidth pool used avail read write read write - - - - - - r 8.34G 21.9G 0 5 1.62K 17.0K mirror 8.34G 21.9G 0 5 1.62K 17.0K c5t0d0s0 - - 0 2 3.30K 17.2K c5t1d0s0 - - 0 2 3.66K 17.2K - - - - - - z 375G 355G 6 32 67.2K 202K mirror 133G 133G 2 14 24.7K 84.2K c5t0d0s7 - - 0 3 53.3K 84.3K c5t1d0s7 - - 0 3 53.2K 84.3K mirror 120G 112G 1 9 21.3K 59.6K c5t2d0 - - 0 2 38.4K 59.7K c5t3d0 - - 0 2 38.2K 59.7K mirror 123G 109G 1 8 21.3K 58.6K c5t4d0 - - 0 2 36.4K 58.7K c5t5d0 - - 0 2 37.2K 58.7K - - - - - - ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs, raidz, spare and jbod
Hello Arnaud, Thanks for your reply. We have a system (2 x Xeon 5410, Intel S5000PSL mobo and 8 GB memory) with 12 x 500 GB SATA disks on an Areca 1130 controller. rpool is a mirror over 2 disks; 8 disks are in raidz2, plus 1 spare. We have 2 aggr links. Our goal is an ESX storage system; I am using iSCSI and NFS to serve space to our ESX 4.0 servers. We can remove a disk with no problem. I can do a replace and the disk is resilvered. That works fine here. Our problem comes when we make the server work a little bit harder! When we give the server a hard time, copying 60G+ of data or doing some other stuff to give the system some load, it hangs. This happens after 5 minutes, or after 30 minutes, or later, but it hangs. Then we get the problems shown in the attached pictures. I have also emailed Areca. I hope they can fix it.. Regards, Rob -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] unable to zfs destroy
this one has me a little confused. ideas? j...@opensolaris:~# zpool import z cannot mount 'z/nukeme': mountpoint or dataset is busy cannot share 'z/cle2003-1': smb add share failed j...@opensolaris:~# zfs destroy z/nukeme internal error: Bad exchange descriptor Abort (core dumped) j...@opensolaris:~# adb core core file = core -- program ``/sbin/zfs'' on platform i86pc SIGABRT: Abort $c libc_hwcap1.so.1`_lwp_kill+0x15(1, 6, 80462a8, fee9bb5e) libc_hwcap1.so.1`raise+0x22(6, 0, 80462f8, fee7255a) libc_hwcap1.so.1`abort+0xf2(8046328, fedd, 8046328, 8086570, 8086970, 400) libzfs.so.1`zfs_verror+0xd5(8086548, 813, fedc5178, 804635c) libzfs.so.1`zfs_standard_error_fmt+0x225(8086548, 32, fedc5178, 808acd0) libzfs.so.1`zfs_destroy+0x10e(808acc8, 0, 0, 80479c8) destroy_callback+0x69(808acc8, 8047910, 80555ec, 8047910) zfs_do_destroy+0x31f(2, 80479c8, 80479c4, 80718dc) main+0x26a(3, 80479c4, 80479d4, 8053fdf) _start+0x7d(3, 8047ae4, 8047ae8, 8047af0, 0, 8047af9) ^d j...@opensolaris:~# uname -a SunOS opensolaris 5.11 snv_130 i86pc i386 i86pc j...@opensolaris:~# zpool status -v z pool: z state: ONLINE status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scrub: scrub in progress for 0h39m, 19.15% done, 2h46m to go config: NAME STATE READ WRITE CKSUM z ONLINE 0 0 2 c3t0d0s7 ONLINE 0 0 4 c3t1d0s7 ONLINE 0 0 0 c2d0 ONLINE 0 0 4 errors: Permanent errors have been detected in the following files: z/nukeme:0x0 j...@opensolaris:~# zfs list z/nukeme NAME USED AVAIL REFER MOUNTPOINT z/nukeme 49.0G 496G 49.0G /z/nukeme j...@opensolaris:~# zdb -d z/nukeme 0x0 zdb: can't open 'z/nukeme': Device busy there is also no mount point /z/nukeme any ideas how to nuke /z/nukeme? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Update - mpt errors on snv 101b
I can report I/O errors with Chenbro-based LSI SASx36 IC based expanders, tested with 111b/121/128a/129. The HBA was LSI 1068 based. If I bypass the expander by adding more HBA controllers, mpt does not have I/O errors. -nola On Dec 8, 2009, at 6:48 AM, Bruno Sousa wrote: Hi James, Thank you for your feedback, and I will send the prtconf -v output to your email. I also have another system where I can test something if that's the case, and if you need extra information or even access to the system, please let me know. Thank you, Bruno James C. McPherson wrote: Bruno Sousa wrote: Hi all, During this problem I did a power-off/power-on of the server and the bus reset/scsi timeout issue persisted. After that I decided to power off/power on the jbod array, and after that everything became normal. No scsi timeouts, normal performance, everything is okay now. With this, is it safe to assume that the problem may be caused by the SAS expander (one single LSI SASX36 expander chip) used by the supermicro jbod chassis, and not by the hba/mpt driver? Hi Bruno, that is indeed what I, personally, suspect is the case. Tracking that down and conclusively proving so is, however, another thing entirely. Could you send the output from prtconf -v for your host please, so that we can have a look at the vital information for the enclosure services and SMP nodes that the SAS Expander presents? thank you, James C. McPherson -- Senior Kernel Software Engineer, Solaris Sun Microsystems http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] How can we help fix MPT driver post build 129
How can we help with what is outlined below? I can reproduce these at will, so if anyone at Sun would like an environment to test this situation, let me know. What is the best info to grab for you folks to help here? Thanks - nola This is in regard to these threads: http://www.opensolaris.org/jive/thread.jspa?messageID=421400#421400 http://www.opensolaris.org/jive/thread.jspa?threadID=118947&tstart=0 http://www.opensolaris.org/jive/thread.jspa?threadID=117702&tstart=1 http://www.opensolaris.org/jive/thread.jspa?messageID=437031&tstart=0 And bug IDs: 6894775 mpt driver timeouts and bus resets under load 6900767 Server hang with LSI 1068E based SAS controller under load Exec Summary: Those using the LSI 1068 chipset with the LSI SAS2x IC expander have IO errors under load from about build 118 to 129 (the last build I tested). At build 111b, it worked. If you take the same hardware and load test scripts: run under 111b and you're OK; run at ~118 and on and you suffer from, for example: Dec 5 08:17:04 gb2000-007 scsi: [ID 365881 kern.info] /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1): Dec 5 08:17:04 gb2000-007 Log info 0x3000 received for target 79. Dec 5 08:17:04 gb2000-007 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc Dec 5 08:17:07 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1): Dec 5 08:17:07 gb2000-007 SAS Discovery Error on port 4. DiscoveryStatus is DiscoveryStatus is |Unaddressable device found| Dec 5 08:18:09 gb2000-007 scsi: [ID 107833 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1): Dec 5 08:18:09 gb2000-007 Disconnected command timeout for Target 79 Dec 5 08:18:14 gb2000-007 scsi: [ID 365881 kern.info] /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1): Dec 5 08:18:14 gb2000-007 Log info 0x3113 received for target 79. 
Dec 5 08:18:14 gb2000-007 scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc Dec 5 08:18:17 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1): Dec 5 08:18:17 gb2000-007 mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x3000 Dec 5 08:18:17 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1): Dec 5 08:18:17 gb2000-007 mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x3000 Dec 5 08:18:19 gb2000-007 scsi: [ID 365881 kern.info] /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1): Dec 5 08:18:19 gb2000-007 Log info 0x3000 received for target 79. Dec 5 08:18:19 gb2000-007 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc Dec 5 08:18:22 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1): Dec 5 08:18:22 gb2000-007 SAS Discovery Error on port 4. DiscoveryStatus is DiscoveryStatus is |Unaddressable device found| Dec 5 08:19:24 gb2000-007 scsi: [ID 107833 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1): Dec 5 08:19:24 gb2000-007 Disconnected command timeout for Target 79 Dec 5 08:19:29 gb2000-007 scsi: [ID 365881 kern.info] /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1): Dec 5 08:19:29 gb2000-007 Log info 0x3113 received for target 79. Dec 5 08:19:29 gb2000-007 scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc Dec 5 08:19:32 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1): Dec 5 08:19:32 gb2000-007 mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x3000 Dec 5 08:19:32 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1): Dec 5 08:19:32 gb2000-007 mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x3000 Dec 5 08:19:34 gb2000-007 scsi: [ID 365881 kern.info] /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1): Dec 5 08:19:34 gb2000-007 Log info 0x3000 received for target 79. 
Dec 5 08:19:34 gb2000-007 scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc Dec 5 08:19:37 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1): Dec 5 08:19:37 gb2000-007 SAS Discovery Error on port 4. DiscoveryStatus is DiscoveryStatus is |Unaddressable device found| Dec 5 08:20:39 gb2000-007 scsi: [ID 107833 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1): Dec 5 08:20:39 gb2000-007 Disconnected command timeout for Target 79 Dec 5 08:20:39 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1): Dec 5 08:20:39 gb2000-007 mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000 Dec 5 08:20:44 gb2000-007 scsi: [ID 365881 kern.info] /p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1): Dec 5 08:20:44 gb2000-007 Log info 0x3113 received for target 79. Dec 5 08:20:44 gb2000-007 scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc Dec 5 08:20:44 gb2000-007 scsi:
Re: [zfs-discuss] Separate Zil on HDD ?
"2 x 500GB mirrored root pool 6 x 1TB raidz2 data pool I happen to have 2 x 250GB Western Digital RE3 7200rpm ... be better than having the ZIL 'inside' the zpool." Listing two log devices (a stripe) would have more spindles than your single raidz2 vdev.. but for low-cost fun one might make a tiny slice on all the disks of the raidz2 and list six log devices (a 6-way stripe) and not bother adding the other two disks. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
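The six-slice idea above would look roughly like this; the pool name, controller numbers, and slice names are made up for illustration, and this is a sketch rather than a recommendation (ZIL blocks sharing spindles with the data only helps if the workload leaves the heads idle enough):

```shell
# Hypothetical names: 'tank' is the raidz2 data pool, s3 is a small
# slice reserved on each of its six disks for the ZIL.
zpool add tank log c1t0d0s3 c1t1d0s3 c1t2d0s3 \
                   c1t3d0s3 c1t4d0s3 c1t5d0s3

# Or the two-disk stripe using the spare RE3 drives:
# zpool add tank log c2t0d0 c2t1d0

zpool status tank   # log devices appear under their own 'logs' section
```

Listing several devices after `log` (without `mirror`) stripes the ZIL across them, which is the multi-spindle effect described above.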
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
"Chenbro 16 hotswap bay case. It has 4 mini backplanes that each connect via an SFF-8087 cable" "StarTech HSB430SATBK" hmm, both are passive backplanes with one SATA tunnel per link... no SAS expanders (LSISASx36) like those found in SuperMicro or J4x00 with 4 links per connection. Wonder if there is an LSI issue with too many links in HBA mode? Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] scrub differs in execute time?
"P45 Gigabyte EP45-DS3P. I put the AOC card into a PCI slot" I'm not sure where half your disks sit or how your vdevs are configured, but the ICH10 has 6 sata ports at 300MB/s and one PCI port at 266MB/s (also shared with the IT8213 IDE chip), so in an ideal world your scrub bandwidth would be: 300*6 MB/s with 6 disks on ICH10, in a stripe; 300*1 MB/s with 6 disks on ICH10, in a raidz; 300*3+(266/3) MB/s with 3 disks on ICH10 and 3 on shared PCI, in a stripe; 266/3 MB/s with 3 disks on ICH10 and 3 on shared PCI, in a raidz; 266/6 MB/s with 6 disks on shared PCI, in a stripe; 266/6 MB/s with 6 disks on shared PCI, in a raidz. We know disks don't go that fast anyway, but going from an 8h to a 15h scrub is very reasonable depending on vdev config. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
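The ceilings above tabulate with shell arithmetic; 300 and 266 MB/s are the per-port SATA and shared-bus figures assumed in this message (a follow-up in the thread corrects the shared PCI bus to 133 MB/s, so halve those terms accordingly):

```shell
sata=300   # MB/s per ICH10 SATA port, as assumed above
pci=266    # MB/s for the whole shared PCI segment, as assumed above

echo "6 disks on ICH10, stripe:      $(( sata * 6 )) MB/s"
echo "6 disks on ICH10, raidz:       $(( sata * 1 )) MB/s"
echo "3+3 split, stripe:             $(( sata * 3 + pci / 3 )) MB/s"
echo "3+3 split, raidz:              $(( pci / 3 )) MB/s"
echo "6 disks on shared PCI, either: $(( pci / 6 )) MB/s"
```

The raidz rows are gated by the slowest member, which is why the shared-PCI raidz numbers collapse to a per-disk share of the bus.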
Re: [zfs-discuss] scrub differs in execute time?
"The ICH10 has a 32-bit/33MHz PCI bus which provides 133MB/s at half duplex." You are correct; I thought the ICH10 used a 66MHz bus, when in fact it's 33MHz. The AOC card works fine in a PCI-X 64-bit/133MHz slot, good for 1,067 MB/s, even if the motherboard uses a PXH chip via 8-lane PCIe. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] raidz-1 vs mirror
"from a two disk (10krpm) mirror layout to a three disk raidz-1." Writes will be unnoticeably slower for raidz1 because of the parity calculation and the latency of a third spindle, but reads will be 1/2 the speed of the mirror, because the mirror can split reads between two disks. Another way to say the same thing: a raidz will read at the speed of the slowest disk in the array, while a mirror will be x(number of mirrors) times faster for reads, and either layout writes at the speed of its slowest disk. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
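The ratios above can be made concrete with a toy model; the 100 MB/s per-spindle figure is invented purely for illustration (real 10krpm disks vary):

```shell
disk=100        # MB/s per spindle (assumed, for illustration only)
mirror_ways=2

echo "raidz1 read:  $(( disk )) MB/s (group acts as one stripe)"
echo "mirror read:  $(( disk * mirror_ways )) MB/s (reads split across copies)"
echo "write:        $(( disk )) MB/s either way (slowest-spindle bound)"
```

The model only shows the ratios; it ignores seek overlap, caching, and recordsize effects.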
Re: [zfs-discuss] PSARC recover files?
frequent snapshots offer outstanding oops protection. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] PSARC recover files?
"Maybe to create snapshots after the fact" how does one quiesce a drive after the fact? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS + fsck
ZFS scrub will detect many types of error in your data or the filesystem metadata. If you have sufficient redundancy in your pool and the errors were not due to dropped or misordered writes, then they can often be automatically corrected during the scrub. If ZFS detects an error from which it cannot automatically recover, it will often instantly lock your entire pool to prevent any read or write access, informing you only that you must destroy it and restore from backups to get your data back. Your only recourse in such situations is to do exactly that, or enlist the help of Victor Latushkin to attempt to recover your pool using painstaking manual manipulation. Recent putbacks seem to indicate that future releases will provide a mechanism to allow mere mortals to recover from some of the errors caused by dropped writes. cheers, Rob -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] sub-optimal ZFS performance
"So the solution is to never get more than 90% full disk space" while that's true, it's not Henrik's main discovery. Henrik points out that 1/4 of the arc is used for metadata, and sometimes that's not enough.. if echo ::arc | mdb -k | egrep ^size isn't reaching echo ::arc | mdb -k | egrep ^c and you are maxing out your metadata space, check: echo ::arc | mdb -k | grep meta_ one can set the metadata space (1G in this case) with: echo arc_meta_limit/Z 0x40000000 | mdb -kw So while Henrik's FS had some fragmentation, 1/4 of c_max wasn't enough metadata arc space for the number of files in /var/pkg/download. good find Henrik! Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs code and fishworks fork
"are you going to ask NetApp to support ONTAP on Dell systems," well, ONTAP 8.0 is built on freebsd, so it wouldn't be too hard to boot on dell hardware. Hey, at least it can do aggregates larger than 16T now... http://www.netapp.com/us/library/technical-reports/tr-3786.html Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZPOOL Metadata / Data Error - Help
"Action: Restore the file in question if possible. Otherwise restore the entire pool from backup. metadata:0x0 metadata:0x15" bet it's in a snapshot that looks to have been destroyed already. try zpool clear POOL01 zpool scrub POOL01 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] bigger zfs arc
"zfs will use as much memory as is necessary" but how is "necessary" calculated? using arc_summary.pl from http://www.cuddletech.com/blog/pivot/entry.php?id=979 my tiny system shows: Current Size: 4206 MB (arcsize) Target Size (Adaptive): 4207 MB (c) Min Size (Hard Limit): 894 MB (zfs_arc_min) Max Size (Hard Limit): 7158 MB (zfs_arc_max) so arcsize is close to the desired c; no pressure here, but it would be nice to know how c is calculated, as it's much smaller than zfs_arc_max on a system like yours with nothing else on it. "When an L2ARC is attached does it get used if there is no memory pressure?" My guess is no, for the same reason an L2ARC takes so long to fill. arc_summary.pl from the same system shows: Most Recently Used Ghost: 0% 9367837 (mru_ghost) [ Return Customer Evicted, Now Back ] Most Frequently Used Ghost: 0% 11138758 (mfu_ghost) [ Frequent Customer Evicted, Now Back ] so with no ghost hits, this system wouldn't benefit from an L2ARC even if one were added. In review (audit welcome): if arcsize = c and both are much less than zfs_arc_max, there is no point in adding system ram in hopes of increasing the arc. if m?u_ghost is a small %, there is no point in adding an L2ARC. if you do add an L2ARC, one must have ram between c and zfs_arc_max for its pointers. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
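The figures arc_summary.pl reports can also be pulled straight from kstats, and the "in review" checks scripted. The awk below is fed sample data shaped like `kstat -p zfs:0:arcstats` output (using this post's MB values) so it runs anywhere; on a live system you would pipe in the real kstat output instead. The 0.95/0.9 thresholds are arbitrary choices of mine, not ZFS constants.

```shell
# On a live system: kstat -p zfs:0:arcstats | awk ' ... same program ... '
out=$(awk '
  $1 ~ /:size$/  { size = $2 }    # current ARC size
  $1 ~ /:c$/     { c    = $2 }    # adaptive target
  $1 ~ /:c_max$/ { cmax = $2 }    # hard limit
  END {
    printf "arcsize=%d c=%d c_max=%d\n", size, c, cmax
    if (size >= 0.95 * c && c < 0.9 * cmax)
      print "ARC is at its target but far below c_max: no memory pressure"
  }' <<EOF
zfs:0:arcstats:size 4206
zfs:0:arcstats:c 4207
zfs:0:arcstats:c_max 7158
EOF
)
echo "$out"
```

The same pattern extends to mru_ghost/mfu_ghost hits for the "would an L2ARC help" question.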
[zfs-discuss] zpool create over old pool recovery
Folks, I need help with ZFS recovery following a zpool create ... We recently received new laptops (hardware refresh) and I simply transferred the multiboot hdd (using OpenSolaris 2008.11 as the primary production OS) from the old laptop to the new one (used the live DVD to do the zpool import, updated the boot archive and did a devfsadm) and worked away as usual. I then wanted to use the WinXP distribution which was shipped with the new laptop and discovered the existing partition was too small. I bought a new hdd and proceeded to partition it as required (1. WinXP 24GB, 2. Solaris 24GB, 3. Solaris 130GB, 4. extended with logical 5. FAT32 30GB, 6. Linux 20GB, 7. Linux swap 6GB). I would consider myself inexperienced with ZFS (one of the reasons I opted for OpenSolaris was to get more familiar with it and the other features before they were adopted by customers), so although I bet there are more elegant ways to do this, I stuck with what I know. I connected the hdd (with functional but inappropriate partition sizes) to the usb port. I dd'd the first Solaris partition (OpenSolaris rpool dataset) across to the new drive (used the live DVD to do the zpool import, update_grub and bootadm), and then all was as it should be. It appears I may have been able to do this from the hdd after booting OpenSolaris, but I wasn't aware of how to deal with two pools of the same name - i.e. rpool. I subsequently copied the Linux OS across (booted Linux, created an ext3fs and copied the OS files across using tar). Also from Linux, I created the FAT32 filesystem and copied the data across with tar - all okay and functional. At this point all I needed to do was copy the second 130GB Solaris partition (zfs filesystem) across, and I proceeded to create the new pool and zfs filesystem. My intention was to mount the two and simply copy the data across. What I actually did was run zpool create on the device where the existing pool (of valid data) was, and then created a zfs filesystem. 
[b]When I did the zpool status I realised what I had done and promptly disconnected the hdd from the usb port[/b]. I have googled without success to see if anyone has recovered data following an inadvertent 'zpool create'. The Sun url also says the data cannot be recovered and should be sourced from a backup. I don't have a recent backup (and I guess worse yet - I don't know what I will have lost by going back to a roughly 8-month-old backup). So I guess what I'm hoping is that, like other filesystems, if the zfs 'superstructures' are removed the data would still be in place, and using some of the cached detail perhaps it can be pieced back together? I know it's a long shot, but as I don't know ZFS well enough, I must ask the question. Some documentation seemed to suggest ZFS would advise if a healthy pool existed before blowing it away (perhaps only if it is mounted? this one wasn't imported). As there is very obviously a risk here, it would be a good time to add any possible checks to zpool create. Does anyone have any recovery advice? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] The importance of ECC RAM for ZFS
The post I read said the OpenSolaris guest crashed, and the guy clicked the ``power off guest'' button on the virtual machine. I seem to recall the guest hung. 99% of Solaris hangs (without a crash dump) are hardware in nature (my experience, backed by an uptime of 1116 days), so the finger is still pointed at VirtualBox's hardware implementation. As for ZFS requiring better hardware: you could turn off checksums and other protections so one isn't notified of issues, making it act like the others. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work
the machine hung and I had to power it off. Kinda getting off the zpool import --txg -3 request, but hangs are exceptionally rare and usually a RAM or other hardware issue; Solaris usually abends on software faults.

r...@pdm # uptime
9:33am up 1116 day(s), 21:12, 1 user, load average: 0.07, 0.05, 0.05
r...@pdm # date
Mon Jul 20 09:33:07 EDT 2009
r...@pdm # uname -a
SunOS pdm 5.9 Generic_112233-12 sun4u sparc SUNW,Ultra-250

Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Understanding SAS/SATA Backplanes and Connectivity
c4               scsi-bus  connected  configured  unknown
c4::dsk/c4t15d0  disk      connected  configured  unknown
:
c4::dsk/c4t33d0  disk      connected  configured  unknown
c4::es/ses0      ESI       connected  configured  unknown

thanks! So SATA disks show up as JBOD in IT mode. Is there some magic that load-balances the 4 SAS ports, as this shows up as one scsi-bus? ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS write I/O stalls
CPU is smoothed out quite a lot, yes, but the area under the CPU graph is less, so the rate of real work performed is less, so the entire job took longer (albeit smoother). Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS and Dinamic Stripe
try to be spread across different vdevs.

% zpool iostat -v
                capacity     operations    bandwidth
pool          used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
z            686G   434G     40      5  2.46M   271K
  c1t0d0s7   250G   194G     14      1   877K  94.2K
  c1t1d0s7   244G   200G     15      2   948K  96.5K
  c0d0       193G  39.1G     10      1   689K  80.2K

note that c0d0 is basically full, but is still serving 10 reads for every 15 on the emptier disks, and about 82% as much write bandwidth. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
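A toy model of the dynamic stripe, using the free-space figures above (pure free-space weighting is my simplifying assumption; the real metaslab allocator weighs more factors, which is why the full disk above still gets a share of the writes):

```shell
# toy write-allocation model: weight each vdev by its free space
# (assumption: pure free-space weighting; real ZFS considers more)
awk 'BEGIN {
  free["c1t0d0s7"] = 194; free["c1t1d0s7"] = 200; free["c0d0"] = 39.1  # GB, from zpool iostat -v
  for (d in free) total += free[d]
  for (d in free) printf "%-10s gets %4.1f%% of new writes\n", d, 100 * free[d] / total
}'
```

Under this model c0d0 would get roughly 9% of new writes while the emptier disks get about 45% each.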
Re: [zfs-discuss] BugID formally known as 6746456
This appears to be the fix related to the ACLs; they seem to throw all of the ASSERT panics in zfs_fuid.c under it even if they have nothing to do with ACLs, my case being one of those. Thanks for the pointer though! -Rob -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] BugID formally known as 6746456
Does anyone know if the related problems to the panics dismissed as duplicates of 6746456 ever resulted in Solaris 10 patches? It sounds like they were actually solved in OpenSolaris, but S10 still panics predictably when Linux NFS clients try to change a nobody UID/GID on a ZFS-exported filesystem. Specifically, the NFS-induced panics related to the nobody ID not mapping correctly; or, more precisely, attempts to change user/group ID nobody causing S10u7 to blow chunks in zfs_fuid.c zfs_fuid_table_load's ASSERT. While the workaround of changing the IDs on the server is possible, it pretty much torpedoes management's view of Solaris' stability and is sending fileserver duty back to Linux... :( Anybody could create a nobody file and put the system into endless boot loops without this being patched. I'm hoping further work on this issue was done on the S10 side of the house and there is a stealthy patch ID that can fix the issue. Thanks, -Rob -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] problems with l2arc in 2009.06
correct ratio of ARC to L2ARC? From http://blogs.sun.com/brendan/entry/l2arc_screenshots : It costs some DRAM to reference the L2ARC, at a rate proportional to record size. For example, it currently takes about 15 Gbytes of DRAM to reference 600 Gbytes of L2ARC at an 8 Kbyte ZFS record size. If you use a 16 Kbyte record size, that cost would be halved: 7.5 Gbytes. This means you shouldn't, for example, configure a system with only 8 Gbytes of DRAM, 600 Gbytes of L2ARC, and an 8 Kbyte record size - if you did, the L2ARC would never fully populate. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
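Brendan's figures work out to roughly 200 bytes of DRAM per cached record (my back-of-envelope constant derived from his 15 GB / 600 GB / 8 KB example, not an official number). A quick sanity check:

```shell
# back-of-envelope L2ARC header cost
# (assumes ~200 bytes of DRAM per cached record, derived from the example above)
awk 'BEGIN {
  l2arc_gb = 600; rec_kb = 8; hdr_bytes = 200
  records = l2arc_gb * 1024 * 1024 / rec_kb          # how many records fit
  dram_gb = records * hdr_bytes / (1024 ^ 3)         # DRAM to index them all
  printf "%d GB L2ARC @ %d KB records -> ~%.1f GB DRAM for headers\n",
         l2arc_gb, rec_kb, dram_gb
}'
```

Plugging in rec_kb = 16 halves the answer, matching the 7.5 GB figure quoted above.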
Re: [zfs-discuss] Replacing HDD with larger HDD..
zpool offline grow /var/tmp/disk01
zpool replace grow /var/tmp/disk01 /var/tmp/bigger_disk01

one doesn't need to offline before the replace, so as long as you have one free disk interface you can cfgadm -c configure sata0/6 each disk as you go... or you can offline and cfgadm each disk in the same port as you go. It is still the same size. I would expect it to go to 9G. A reboot or export/import would have fixed this. cannot import 'grow': no such pool available - you meant to type zpool import -d /var/tmp grow. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
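Putting the pieces together, a sketch of the whole grow cycle (pool and device names taken from the example above; the export/import step is what makes the extra capacity visible):

```shell
# replace each member of pool "grow" with a bigger device, one at a time
zpool replace grow /var/tmp/disk01 /var/tmp/bigger_disk01
zpool status grow               # wait for the resilver to finish
# ...repeat for the remaining members...

# the new capacity only shows up after a reboot or an export/import
zpool export grow
zpool import -d /var/tmp grow   # -d: directory to search for file-backed vdevs
zpool list grow                 # should now report the larger size
```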
Re: [zfs-discuss] RAIDZ2: only half the read speed?
How does one look at the disk traffic? iostat -xce 1

OpenSolaris, raidz2 across 8 7200 RPM SATA disks:
17179869184 bytes (17 GB) copied, 127.308 s, 135 MB/s
OpenSolaris, flat pool across the same 8 disks:
17179869184 bytes (17 GB) copied, 61.328 s, 280 MB/s

one raidz2 set of 8 disks can't be faster than the slowest disk in the set, as it's one vdev... I would have expected the 8-vdev set to be 8x faster than the single raidz[12] set, but like Richard said, there is another bottleneck in there that iostat will show. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SAS 15K drives as L2ARC
use a bunch of 15K SAS drives as L2ARC cache for several TBs of SATA disks? Perhaps... depends on the workload, and on whether the working set can live on the L2ARC. used mainly as astronomical images repository - hmm, perhaps two trays of 1T SATA drives all mirrors, rather than raidz sets of one tray. I.e., please don't discount how one arranges the vdevs in a given configuration. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zpool import crash, import degraded mirror?
When I type `zpool import` to see what pools are out there, it gets to /1: open(/dev/dsk/c5t2d0s0, O_RDONLY) = 6 /1: stat64(/usr/local/apache2/lib/libdevid.so.1, 0x08042758) Err#2 ENOENT /1: stat64(/usr/lib/libdevid.so.1, 0x08042758)= 0 /1: d=0x02D90002 i=241208 m=0100755 l=1 u=0 g=2 sz=61756 /1: at = Apr 29 23:41:17 EDT 2009 [ 1241062877 ] /1: mt = Apr 27 01:45:19 EDT 2009 [ 124089 ] /1: ct = Apr 27 01:45:19 EDT 2009 [ 124089 ] /1: bsz=61952 blks=122 fs=zfs /1: resolvepath(/usr/lib/libdevid.so.1, /lib/libdevid.so.1, 1023) = 18 /1: open(/usr/lib/libdevid.so.1, O_RDONLY)= 7 /1: mmapobj(7, 0x0002, 0xFEC70640, 0x080427C4, 0x) = 0 /1: close(7)= 0 /1: memcntl(0xFEC5, 4048, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0 /1: fxstat(2, 6, 0x080430C0)= 0 /1: d=0x04A0 i=5015 m=0060400 l=1 u=0 g=0 rdev=0x01800340 /1: at = Nov 19 21:19:26 EST 2008 [ 1227147566 ] /1: mt = Nov 19 21:19:26 EST 2008 [ 1227147566 ] /1: ct = Apr 29 23:23:11 EDT 2009 [ 1241061791 ] /1: bsz=8192 blks=1 fs=devfs /1: modctl(MODSIZEOF_DEVID, 0x01800340, 0x080430BC, 0xFEC51239, 0xFE8E92C0) = 0 /1: modctl(MODGETDEVID, 0x01800340, 0x0038, 0x080D5A48, 0xFE8E92C0) = 0 /1: fxstat(2, 6, 0x080430C0)= 0 /1: d=0x04A0 i=5015 m=0060400 l=1 u=0 g=0 rdev=0x01800340 /1: at = Nov 19 21:19:26 EST 2008 [ 1227147566 ] /1: mt = Nov 19 21:19:26 EST 2008 [ 1227147566 ] /1: ct = Apr 29 23:23:11 EDT 2009 [ 1241061791 ] /1: bsz=8192 blks=1 fs=devfs /1: modctl(MODSIZEOF_MINORNAME, 0x01800340, 0x6000, 0x080430BC, 0xFE8E92C0) = 0 /1: modctl(MODGETMINORNAME, 0x01800340, 0x6000, 0x0002, 0x0808FFC8) = 0 /1: close(6)= 0 /1: ioctl(3, ZFS_IOC_POOL_STATS, 0x08042220)= 0 and then the machine dies consistently with: panic[cpu1]/thread=ff01d045a3a0: BAD TRAP: type=e (#pf Page fault) rp=ff000857f4f0 addr=260 occurred in module unix due to a NULL pointer dereference zpool: #pf Page fault Bad kernel fault at addr=0x260 pid=576, pc=0xfb854e8b, sp=0xff000857f5e8, eflags=0x10246 cr0: 8005003bpg,wp,ne,et,ts,mp,pe cr4: 6f8xmme,fxsr,pge,mce,pae,pse,de cr2: 260 
cr3: 12b69 cr8: c rdi: 260 rsi:4 rdx: ff01d045a3a0 rcx:0 r8: 40 r9:21ead rax:0 rbx:0 rbp: ff000857f640 r10: bf88840 r11: ff01d041e000 r12:0 r13: 260 r14:4 r15: ff01ce12ca28 fsb:0 gsb: ff01ce985ac0 ds: 4b es: 4b fs:0 gs: 1c3 trp:e err:2 rip: fb854e8b cs: 30 rfl:10246 rsp: ff000857f5e8 ss: 38 ff000857f3d0 unix:die+dd () ff000857f4e0 unix:trap+1752 () ff000857f4f0 unix:cmntrap+e9 () ff000857f640 unix:mutex_enter+b () ff000857f660 zfs:zio_buf_alloc+2c () ff000857f6a0 zfs:arc_get_data_buf+173 () ff000857f6f0 zfs:arc_buf_alloc+a2 () ff000857f770 zfs:dbuf_read_impl+1b0 () ff000857f7d0 zfs:dbuf_read+fe () ff000857f850 zfs:dnode_hold_impl+d9 () ff000857f880 zfs:dnode_hold+2b () ff000857f8f0 zfs:dmu_buf_hold+43 () ff000857f990 zfs:zap_lockdir+67 () ff000857fa20 zfs:zap_lookup_norm+55 () ff000857fa80 zfs:zap_lookup+2d () ff000857faf0 zfs:dsl_pool_open+91 () ff000857fbb0 zfs:spa_load+696 () ff000857fc00 zfs:spa_tryimport+95 () ff000857fc40 zfs:zfs_ioc_pool_tryimport+3e () ff000857fcc0 zfs:zfsdev_ioctl+10b () ff000857fd00 genunix:cdev_ioctl+45 () ff000857fd40 specfs:spec_ioctl+83 () ff000857fdc0 genunix:fop_ioctl+7b () ff000857fec0 genunix:ioctl+18e () ff000857ff10 unix:brand_sys_sysenter+1e6 () the offending disk, c5t2d0s0, is part of a mirror that if removed I can see the results (from the other mirror half) and the machine does not crash. all 8 labels look diff perfect version=13 name='r' state=0 txg=2110897 pool_guid=10861732602511278403 hostid=13384243 hostname='nas' top_guid=6092190056527819247 guid=16682108003687674581 vdev_tree type='mirror' id=0 guid=6092190056527819247 whole_disk=0 metaslab_array=23 metaslab_shift=31 ashift=9 asize=320032473088 is_log=0 children[0] type='disk' id=0 guid=16682108003687674581 path='/dev/dsk/c5t2d0s0'
Re: [zfs-discuss] Motherboard for home zfs/solaris file server
> Intel decided we don't need ECC memory on the Core i7
Not so. I thought that was a Core i7 vs. Xeon E55xx distinction for socket LGA-1366, and that's why this X58 MB claims ECC support: http://supermicro.com/products/motherboard/Xeon3000/X58/X8SAX.cfm ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Is Disabling ARC on SolarisU4 possible?
Thanks Nathan, I want to test the underlying performance; of course the problem is I want to test the 16 or so disks in the stripe, rather than individual devices. Thanks Rob

On 28/01/2009 22:23, Nathan Kroenert nathan.kroen...@sun.com wrote:
Also - my experience with a very small ARC is that your performance will stink. ZFS is an advanced filesystem that IMO makes some assumptions about the capability and capacity of current hardware. If you don't give it what it's expecting, your results may be equally unexpected. If you are keen to test the *actual* disk performance, you should just use the underlying disk device like /dev/rdsk/c0t0d0s0. Beware, however, that any writes to these devices will indeed result in the loss of the data on those devices, zpools or other. Cheers. Nathan.

Richard Elling wrote:
Rob Brown wrote:
Afternoon, In order to test my storage I want to stop the caching effect of the ARC on a ZFS filesystem. I can do similar on UFS by mounting it with the directio flag.
No, not really the same concept, which is why Roch wrote http://blogs.sun.com/roch/entry/zfs_and_directio
I saw the following two options on a nevada box which presumably control it: primarycache secondarycache
Yes, to some degree this offers some capability. But I don't believe they are in any release of Solaris 10. -- richard
But I'm running Solaris 10U4 which doesn't have them - can I disable it? Many thanks Rob

| Robert Brown - ioko Professional Services | Mobile: +44 (0)7769 711 885 |
-- Nathan Kroenert nathan.kroen...@sun.com | Systems Engineer | Sun Microsystems | Phone: +61 3 9869-6255 | Fax: +61 3 9869-6288 | Level 7, 476 St. Kilda Road, Melbourne 3004, Victoria, Australia | Mobile: 0419 305 456
___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Is Disabling ARC on SolarisU4 possible?
Afternoon, In order to test my storage I want to stop the caching effect of the ARC on a ZFS filesystem. I can do similar on UFS by mounting it with the directio flag. I saw the following two options on a nevada box which presumably control it: primarycache secondarycache. But I'm running Solaris 10U4 which doesn't have them - can I disable it? Many thanks Rob | Robert Brown - ioko Professional Services | | Mobile: +44 (0)7769 711 885 | ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Practical Application of ZFS
ZFS is the bomb. It's a great file system. What are its real-world applications besides Solaris userspace? What I'd really like is to utilize the benefits of ZFS across all the platforms we use. For instance, we use Microsoft Windows servers as our primary platform here. How might I utilize ZFS to protect that data? The only way I can visualize doing so would be to virtualize the Windows server and store its image in a ZFS pool. That would add additional overhead but protect the data at the disk level. It would also allow snapshots of the Windows machine's virtual file. However, none of these benefits would protect Windows from hurting its own data, if you catch my meaning. Obviously ZFS is ideal for large databases served out via application-level or web servers. But what other practical ways are there to integrate ZFS into existing setups to experience its benefits? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Practical Application of ZFS
I am not experienced with iSCSI. I understand it's block-level disk access via TCP/IP. However, I don't see how using it eliminates the need for virtualization. Are you saying that a Windows server can access a ZFS drive via iSCSI and store NTFS files? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
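That is the idea. A sketch of the OpenSolaris-era setup (the shareiscsi property drives the old ZFS iSCSI target; the pool and volume names here are made up for illustration):

```shell
# carve a 100 GB block device (zvol) out of the pool and export it over iSCSI
zfs create -V 100g tank/winvol
zfs set shareiscsi=on tank/winvol
iscsitadm list target     # note the IQN the Windows initiator should connect to
# On the Windows side: point the Microsoft iSCSI Initiator at this host and
# format the new disk as NTFS. ZFS checksums and snapshots apply underneath
# NTFS, but Windows can still corrupt its own filesystem logically.
```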
Re: [zfs-discuss] Practical Application of ZFS
Wow. I will read further into this. That seems like it could have great applications. I assume the same is true of FCoE? -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] SMART data
the sata framework uses the sd driver so it's:

4 % smartctl -d scsi -a /dev/rdsk/c4t2d0s0
smartctl version 5.36 [i386-pc-solaris2.8] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device: ATA WDC WD1001FALS-0 Version: 0K05
Serial number:
Device type: disk
Local Time is: Mon Dec 8 15:14:22 2008 EST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK
Current Drive Temperature: 45 C
Error Counter logging not supported
No self-tests have been logged

5 % /opt/SUNWhd/hd/bin/hd -e c4t2
Revision: 16
Offline status 132
Selftest status 0
Seconds to collect 19200
Time in minutes to run short selftest 2
Time in minutes to run extended selftest 221
Offline capability 123
SMART capability 3
Error logging capability 1
Checksum 0x86

 ID  Identification                 Status  Current  Worst  Raw data
  1  Raw read error rate            0x2f    200      200    0
  3  Spin up time                   0x27    253      253    6216
  4  Start/Stop count               0x32    100      100    11
  5  Reallocated sector count       0x33    200      200    0
  7  Seek error rate                0x2e    100      253    0
  9  Power on hours count           0x32    100      100    446
 10  Spin retry count               0x32    100      253    0
 11  Recalibration Retries count    0x32    100      253    0
 12  Device power cycle count       0x32    100      100    11
192  Power off retract count        0x32    200      200    10
193  Load cycle count               0x32    200      200    11
194  Temperature                    0x22    105      103    45/ 0/ 0 (degrees C cur/min/max)
196  Reallocation event count       0x32    200      200    0
197  Current pending sector count   0x32    200      200    0
198  Scan uncorrected sector count  0x30    200      200    0
199  Ultra DMA CRC error count      0x32    200      200    0
200  Write/Multi-Zone Error Rate    0x82    200      200    0

http://www.opensolaris.org/jive/thread.jspa?threadID=84296 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs iscsi sustained write performance
(with iostat -xtc 1) it sure would be nice to know actv, so we would know whether the LUN was busy because its queue is full or just slow (svc_t ~200). For tracking errors try `iostat -xcen 1` and `iostat -E`. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Is SUNWhd for Thumper only?
(http://cuddletech.com/blog/pivot/entry.php?id=993). Well, SUNWhd can't dump all SMART data, but it gets some temps on a generic box..

4 % hd -a fdisk
Device    Serial  Vendor    Model             Rev   Temperature    Type
--------  ------  --------  ----------------  ----  -------------  --------
c3t0d0p0          ATA       ST3750640AS       K     255 C (491 F)  EFI
c3t1d0p0          ATA       ST3750640AS       K     255 C (491 F)  EFI
c3t2d0p0          ATA       ST3750640AS       K     255 C (491 F)  EFI
c3t4d0p0          ATA       ST3750640AS       K     255 C (491 F)  EFI
c3t5d0p0          ATA       ST3750640AS       K     255 C (491 F)  EFI
c4t0d0p0          ATA       WDC WD1001FALS-0  0K05  43 C (109 F)   EFI
c4t1d0p0          ATA       WDC WD1001FALS-0  0K05  43 C (109 F)   EFI
c4t2d0p0          ATA       WDC WD1001FALS-0  0K05  43 C (109 F)   EFI
c4t4d0p0          ATA       WDC WD1001FALS-0  0K05  42 C (107 F)   EFI
c4t5d0p0          ATA       WDC WD1001FALS-0  0K05  43 C (109 F)   EFI
c5t0d0p0          TSSTcorp  CD/DVDW SH-S162A  TS02  None           None
c5t1d0p0          ATA       WDC WD3200JD-00K  5J08  0 C (32 F)     Solaris2
c5t2d0p0          ATA       WDC WD3200JD-00K  5J08  0 C (32 F)     Solaris2
c5t3d0p0          ATA       WDC WD3200JD-00K  5J08  0 C (32 F)     Solaris2
c5t4d0p0          ATA       WDC WD3200JD-00K  5J08  0 C (32 F)     Solaris2
c5t5d0p0          ATA       WDC WD3200JD-00K  5J08  0 C (32 F)     Solaris2

Do you know of a Solaris tool to get the full SMART data? Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ? SX:CE snv_91 - ZFS - raid and mirror - drive sizes don't add correctl
Bump. Some of the threads on this were last posted to over a year ago. I checked 6485689 and it is not fixed yet; is there any work being done in this area? Thanks, Rob

There may be some work being done to fix this: zpool should support raidz of mirrors http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6485689 Discussed in this thread: Mirrored Raidz ( Posted: Oct 19, 2006 9:02 PM ) http://opensolaris.org/jive/thread.jspa?threadID=15854&tstart=0 The suggested solution (by jone, http://opensolaris.org/jive/thread.jspa?messageID=66279 ) is:
# zpool create a1pool raidz c0t0d0 c0t1d0 c0t2d0 ..
# zpool create a2pool raidz c1t0d0 c1t1d0 c1t2d0 ..
# zfs create -V a1pool/vol
# zfs create -V a2pool/vol
# zpool create mzdata mirror /dev/zvol/dsk/a1pool/vol /dev/zvol/dsk/a2pool/vol
-- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Still more questions WRT selecting a mobo for small ZFS RAID
WD Caviar Black drive [...] Intel E7200 2.53GHz 3MB L2. The P45-based boards are a no-brainer. 16G of DDR2-1066 with P45, or 8G of ECC DDR2-800 with 3210-based boards: that is the question. Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Inexpensive ZFS home server
I don't think the Pentium E2180 has the lanes to use ECC RAM. Look at the north bridge, not the CPU: the PowerEdge SC440 uses the Intel 3000 MCH, which supports up to 8GB of unbuffered ECC or non-ECC DDR2 667/533 SDRAM. It has been replaced with the Intel 32x0, which uses DDR2 800/667MHz unbuffered ECC / non-ECC SDRAM. http://www.intel.com/products/server/chipsets/3200-3210/3200-3210-overview.htm Rob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS + OpenSolaris for home NAS?
ECC? $60 unbuffered 4GB 800MHz DDR2 ECC CL5 DIMM (Kit Of 2) http://www.provantage.com/kingston-technology-kvr800d2e5k2-4g~7KIN90H4.htm for Intel 32x0 north bridge like http://www.provantage.com/supermicro-x7sbe~7SUPM11K.htm ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs-auto-snapshot 0.11 work (was Re: zfs-auto-snapshot with at scheduling )
The other changes that will appear in 0.11 (which is nearly done) are: Still looking forward to seeing .11 :) Think we can expect a release soon? (or at least svn access so that others can check out the trunk?) This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] force a reset/reinheit zfs acls?
Hello All! Is there a command to force a re-inheritance/reset of ACLs? E.g., if I have a directory full of folders that have been created with inherited ACLs, and I want to change the ACLs on the parent folder, how can I force a reapply of all ACLs? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] force a reset/reinheit zfs acls?
Rob wrote: Hello All! Is there a command to force a re-inheritance/reset of ACLs? e.g., if i have a directory full of folders that have been created with inherited ACLs, and i want to change the ACLs on the parent folder, how can i force a reapply of all ACLs? There isn't an easy way to do exactly what you want. That's unfortunate :( How do I go about requesting a feature like this? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
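Until such a feature exists, a rough workaround can be scripted (a sketch only; it assumes the Solaris `ls -V` / `chmod A=` ACL syntax, that the parent's ACEs are exactly what you want everywhere below it, and that no filenames contain spaces - test on a copy first):

```shell
# re-apply the parent directory's ACL to everything beneath it
parent=/tank/data                        # hypothetical path
# collect the parent's ACEs: ls -dV prints one ACE per line after the mode line
spec=$(ls -dV "$parent" | sed 1d | tr -d ' \t' | paste -sd, -)
# stamp the same ACL onto every child; chmod A= replaces the whole ACL
find "$parent" -exec chmod A="$spec" {} +
```

Note this applies identical ACEs to files and directories alike, which is cruder than true re-inheritance (directory-only inherit flags would need filtering).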
Re: [zfs-discuss] ? SX:CE snv_91 - ZFS - raid and mirror - drive sizes don't add correctl
There may be some work being done to fix this: zpool should support raidz of mirrors http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6485689 Discussed in this thread: Mirrored Raidz ( Posted: Oct 19, 2006 9:02 PM ) http://opensolaris.org/jive/thread.jspa?threadID=15854&tstart=0 This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS deduplication
Hi All Is there any hope for deduplication on ZFS ? Mertol Ozyoney Storage Practice - Sales Manager Sun Microsystems Email [EMAIL PROTECTED] There is always hope. Seriously though, looking at http://en.wikipedia.org/wiki/Comparison_of_revision_control_software there are a lot of choices for how we could implement this. SVN/K, Mercurial and Sun Teamware all come to mind. Simply ;) merge one of those with ZFS. It _could_ be as simple (with SVN as an example) as using directory listings to produce files which were then 'diffed'. You could then view the diffs as though they were changes made to lines of source code. Just add a tree subroutine to allow you to grab all the diffs that referenced changes to file 'xyz' and you would have easy access to all the changes of a particular file (or directory). With a speed-optimized ability to use ZFS snapshots with the tree subroutine to roll back a single file (or directory), you could undo/redo your way through the filesystem. Using LKCD (http://www.faqs.org/docs/Linux-HOWTO/Linux-Crash-HOWTO.html) you could sit out on the play and watch from the sidelines, returning to the OS when you thought you were 'safe' (and if not, jumping back out). Thus, Mertol, it is possible (and could work very well). Rob This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ? SX:CE snv_91 - ZFS - raid and mirror - drive
Though possible, I don't think we would classify it as a best practice. -- richard Looking at http://opensolaris.org/os/community/volume_manager/ I see: Supports RAID-0, RAID-1, RAID-5, Root mirroring and Seamless upgrades and live upgrades (that would go nicely with my ZFS root mirror - right). I also don't see that there is a nice GUI for those that desire one ... Looking at http://evms.sourceforge.net/gui_screen/ I see some great screenshots and page http://evms.sourceforge.net/ says it supports: Ext2/3, JFS, ReiserFS, XFS, Swap, OCFS2, NTFS, FAT -- so it might be better to suggest adding ZFS there instead of focusing on non-ZFS solutions in this ZFS discussion group. Rob This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS deduplication
On Tue, 22 Jul 2008, Miles Nordin wrote: scrubs making pools uselessly slow? Or should it be scrub-like so that already-written filesystems can be thrown into the dedup bag and slowly squeezed, or so that dedup can run slowly during the business day over data written quickly at night (fast outside-business-hours backup)? I think that the scrub-like model makes the most sense, since ZFS write performance should not be penalized. It is useful to implement score-boarding so that a block is not considered for de-duplication until it has been duplicated a certain number of times. In order to decrease resource consumption, it is useful to perform de-duplication over a span of multiple days or multiple weeks, doing just part of the job each time around. Deduping a petabyte of data seems quite challenging, yet ZFS needs to be scalable to these levels. Bob Friesenhahn In case anyone (other than Bob) missed it, this is why I suggested file-level dedup: ... using directory listings to produce files which were then 'diffed'. You could then view the diffs as though they were changes made ... We could have: block-level (if we wanted to restore an exact copy of the drive - duplicate the 'dd' command) or byte-level (if we wanted to use compression - duplicate the 'zfs set compression=on rpool' _or_ 'bzip' commands) ... etc... assuming we wanted to duplicate commands which already implement those features, and provide more than we (the filesystem) need at a very high cost (performance). So I agree with your comment about the need to be mindful of resource consumption; the ability to do this over a period of days is also useful. 
Indeed the Plan9 filesystem simply snapshots to WORM and has no delete - nor are they able to fill their drives faster than they can afford to buy new ones: Venti Filesystem http://www.cs.bell-labs.com/who/seanq/p9trace.html Rob This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
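As a toy illustration of the block-level variant being discussed (fixed 128K blocks and SHA-256 are assumptions of this sketch; a real implementation would need collision handling and a persistent block-reference table):

```shell
# toy block-level dedup scan: hash fixed-size blocks, count how many are unique
# (128K matches the default ZFS recordsize; the fixed blocking is an assumption)
bs=131072
work=$(mktemp -d)
head -c $bs /dev/urandom > "$work/a"       # three distinct blocks...
head -c $bs /dev/urandom > "$work/b"
head -c $bs /dev/urandom > "$work/c"
cat "$work/a" "$work/b" "$work/a" "$work/c" > "$work/file"   # ...block "a" stored twice
split -b $bs "$work/file" "$work/blk."     # cut the file back into blocks
sha256sum "$work"/blk.* | awk '{print $1}' | sort -u | wc -l  # -> 3 unique hashes for 4 blocks
rm -rf "$work"
```

The gap between total blocks and unique hashes is exactly the space dedup would reclaim; the scrub-like model above amounts to running this scan incrementally over existing data.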
Re: [zfs-discuss] ? SX:CE snv_91 - ZFS - raid and mirror - drive
Solaris will allow you to do this, but you'll need to use SVM instead of ZFS. Or, I suppose, you could use SVM for RAID-5 and ZFS to mirror those. -- richard Or run Linux ... Richard, the ZFS Best Practices Guide says not to: Do not use the same disk or slice in both an SVM and ZFS configuration. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Adding my own compression to zfs
Robert Milkowski wrote: During Christmas I managed to add my own compression to zfs - it was quite easy. Great to see innovation, but unless your personal compression method is somehow better (very fast with excellent compression), would it not be a better idea to use an existing (leading-edge) compression method? 7-Zip's (http://www.7-zip.org/) newest methods are LZMA and PPMD (http://www.7-zip.org/7z.html). There is a proprietary license for LZMA that _might_ interest Sun, but PPMD has no explicit license - see this link: Using PPMD for compression http://www.codeproject.com/KB/recipes/ppmd.aspx Rob This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
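Evaluating a candidate codec is cheap to prototype outside the filesystem before wiring it into ZFS. A sketch using stock gzip levels as stand-ins for the speed/ratio trade-off (LZMA or PPMD would slot into the same harness, typically compressing further at higher CPU cost):

```shell
# compare compressed sizes of the same input at two effort levels
f=$(mktemp)
seq 1 100000 > "$f"                    # repetitive, highly compressible input
raw=$(wc -c < "$f")
fast=$(gzip -1 -c "$f" | wc -c)        # cheapest effort
best=$(gzip -9 -c "$f" | wc -c)        # maximum effort
echo "raw=$raw bytes, gzip-1=$fast bytes, gzip-9=$best bytes"
rm -f "$f"
```

The interesting question for a filesystem codec is whether the extra ratio of a heavier method is worth the per-write CPU, which a harness like this answers quickly for representative data.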
Re: [zfs-discuss] How to delete hundreds of empty snapshots
> I got overzealous with snapshot creation. Every 5 mins is a bad idea.
> Way too many. What's the easiest way to delete the empty ones?
> zfs list takes FOREVER

You might enjoy reading:

ZFS snapshot massacre
http://blogs.sun.com/chrisg/entry/zfs_snapshot_massacre.

(Yes, the trailing "." is part of the URL - so include it or you'll get a 404.)

Rob
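The empty ones are at least easy to pick out mechanically: 'zfs list -H -o name,used -t snapshot' emits tab-separated rows, and an empty snapshot reports a USED of 0. A hypothetical sketch of the filtering step (the snapshot names in the sample are made up):

```python
# Fabricated stand-in for `zfs list -H -o name,used -t snapshot` output,
# which is tab-separated when -H is given.
sample_rows = [
    ("rpool/export@auto-0905", "0"),
    ("rpool/export@auto-0910", "1.2M"),
    ("rpool/export@auto-0915", "0"),
]
sample = "\n".join("\t".join(row) for row in sample_rows)


def empty_snapshots(listing: str):
    """Return the names of snapshots whose USED column is exactly 0."""
    names = []
    for line in listing.splitlines():
        name, used = line.split("\t")
        if used == "0":
            names.append(name)
    return names
```

The surviving names can then be handed to 'zfs destroy' one at a time.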
Re: [zfs-discuss] ? SX:CE snv_91 - ZFS - raid and mirror - drive
Peter Tribble wrote:
> On Sun, Jul 6, 2008 at 8:48 AM, Rob Clark wrote:
>> I have eight 10GB drives. ... I have 6 remaining 10 GB drives and I
>> desire to raid 3 of them and mirror them to the other 3 to give me raid
>> security and integrity with mirrored drive performance. I then want to
>> move my /export directory to the new drive. ...
>
> You can't do that. You can't layer raidz and mirroring. You'll either
> have to use raidz for the lot, or just use mirroring:
>
> zpool create temparray mirror c1t2d0 c1t4d0 mirror c1t5d0 c1t3d0 mirror c1t6d0 c1t8d0
>
> -Peter Tribble

Solaris may not allow me to do that, but the concept is not unheard of. Quoting the Proceedings of the Third USENIX Conference on File and Storage Technologies (http://www.usenix.org/publications/library/proceedings/fast04/tech/corbett/corbett.pdf):

"Mirrored RAID-4 and RAID-5 protect against higher order failures [4]. However, the efficiency of the array as measured by its data capacity divided by its total disk space is reduced."

[4] Qin Xin, E. Miller, T. Schwarz, D. Long, S. Brandt, W. Litwin, "Reliability mechanisms for very large storage systems", 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies, San Diego, CA, pp. 146-156, Apr. 2003.

Rob
Re: [zfs-discuss] Raid-Z with N^2+1 disks
On July 14, 2008 7:49:58 PM -0500, Bob Friesenhahn [EMAIL PROTECTED] wrote:
>> With ZFS and modern CPUs, the parity calculation is surely in the noise
>> to the point of being unmeasurable.
>
> I would agree with that. The parity calculation has *never* been a factor
> in and of itself. The problem is having to read the rest of the stripe
> and then having to wait for a disk revolution before writing.
>
> -frank

And this is where a HW RAID controller comes in. We hope it has a uP (microprocessor) for the calculations, full knowledge of the head positions, and a list of free blocks - then it simply chooses one of the drives that suits the criteria for the RAID level used and writes immediately to a free block under one of the heads. If only ... Maybe in a few years Sun will make a HW RAID controller using ZFS, once we all get the bugs out. With Flash updates this should work wonderfully.
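Frank's point is easy to see in miniature: the parity itself is a handful of XOR operations per byte, while reading the rest of the stripe and waiting for the platter are mechanical costs. A sketch of plain RAID-5-style parity (not ZFS's variable-width RAID-Z layout):

```python
from functools import reduce


def xor_parity(chunks):
    """RAID-5-style parity: the bytewise XOR across equal-sized chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


data = [b"AAAA", b"BBBB", b"CCCC"]  # three data drives' worth of one stripe
parity = xor_parity(data)

# A lost chunk is recovered as the XOR of the survivors plus the parity.
rebuilt = xor_parity([data[0], data[2], parity])
```

The reconstruction is the same XOR loop as the parity computation, which is why it is CPU noise compared to spinning the disks.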
Re: [zfs-discuss] ? SX:CE snv_91 - ZFS - raid and mirror - drive sizes don't add correc
Peter Tribble wrote:
> Because what you've created is a pool containing two components:
> - a 3-drive raidz
> - a 3-drive mirror
> concatenated together.

OK. Seems odd that ZFS would allow that (would people want that configuration instead of what I am attempting to do?).

> I think that what you're trying to do, based on your description, is to
> create one raidz and mirror that to another raidz (or create a raidz out
> of mirrored drives). You can't do that. You can't layer raidz and
> mirroring. You'll either have to use raidz for the lot, or just use
> mirroring:
>
> zpool create temparray mirror c1t2d0 c1t4d0 mirror c1t5d0 c1t3d0 mirror c1t6d0 c1t8d0

Bummer. Curiously, I can get that same odd size with either of these two commands (the second attempt sort of looks like it is raid + mirroring):

# zpool create temparray1 mirror c1t2d0 c1t4d0 mirror c1t3d0 c1t5d0 mirror c1t6d0 c1t8d0
# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t0d0s0  ONLINE       0     0     0
            c1t1d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: temparray1
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        temparray1  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c1t8d0  ONLINE       0     0     0

errors: No known data errors

# zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
rpool                 4.36G  5.42G    35K  /rpool
rpool/ROOT            3.09G  5.42G    18K  legacy
rpool/ROOT/snv_91     3.09G  5.42G  3.01G  /
rpool/ROOT/snv_91/var 84.5M  5.42G  84.5M  /var
rpool/dump             640M  5.42G   640M  -
rpool/export          14.0M  5.42G    19K  /export
rpool/export/home     14.0M  5.42G  14.0M  /export/home
rpool/swap             640M  6.05G    16K  -
temparray1            92.5K  29.3G     1K  /temparray1
# zpool destroy temparray1

And the pretty one:

# zpool create temparray raidz c1t2d0 c1t4d0 raidz c1t3d0 c1t5d0 raidz c1t6d0 c1t8d0
# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c1t0d0s0  ONLINE       0     0     0
            c1t1d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: temparray
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        temparray   ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c1t5d0  ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c1t6d0  ONLINE       0     0     0
            c1t8d0  ONLINE       0     0     0

errors: No known data errors

# zfs list
NAME                   USED  AVAIL  REFER  MOUNTPOINT
rpool                 4.36G  5.42G    35K  /rpool
rpool/ROOT            3.09G  5.42G    18K  legacy
rpool/ROOT/snv_91     3.09G  5.42G  3.01G  /
rpool/ROOT/snv_91/var 84.6M  5.42G  84.6M  /var
rpool/dump             640M  5.42G   640M  -
rpool/export          14.0M  5.42G    19K  /export
rpool/export/home     14.0M  5.42G  14.0M  /export/home
rpool/swap             640M  6.05G    16K  -
temparray               94K  29.3G     1K  /temparray
# zpool destroy temparray

That second attempt leads this newcomer to imagine that they have 3 raid drives mirrored to 3 raid drives. Is there a way to get mirror performance (double speed) with raid integrity (one drive can fail and you are OK)? I can't imagine that there exists no one who would want that configuration.

Thanks for your comment Peter.
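For what it's worth, the striped-mirror command Peter gave earlier is the usual answer to that closing question: ZFS stripes across the mirror vdevs (RAID-10 style), so reads fan out across pairs and each pair tolerates one failed drive, at the cost of half the raw capacity. A back-of-envelope comparison for six 10 GB drives (illustrative arithmetic only; the function and layout names are mine, not ZFS's):

```python
def pool_summary(n_drives, drive_gb, layout):
    """Rough usable capacity and fault tolerance for two 6-drive layouts.
    'striped_mirrors': 2-way mirrors striped together (RAID-10 style).
    'raidz1': one raidz1 vdev across all drives.
    Illustrative arithmetic, not ZFS's actual space accounting."""
    if layout == "striped_mirrors":
        usable = n_drives // 2 * drive_gb    # half the raw space
        tolerates = "one drive per mirror pair"
    elif layout == "raidz1":
        usable = (n_drives - 1) * drive_gb   # one drive's worth of parity
        tolerates = "any single drive"
    else:
        raise ValueError("unknown layout")
    return usable, tolerates
```

So the mirrored layout gives up capacity (30 GB usable vs. 50 GB for raidz1 here) in exchange for the read performance and resilver behavior you are after.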
Re: [zfs-discuss] zfs equivalent of ufsdump and ufsrestore
> I'd like to take a backup of a live filesystem without modifying the last accessed time.

Why not take a snapshot? Snapshots are read-only, so backing up from one leaves the live filesystem's atimes alone.

Rob