[zfs-discuss] zpool mirror (dumb question)
Hi there! I am new to the list, to OpenSolaris, and to ZFS. I am creating a zpool/zfs to use on my NAS server, and basically I want some redundancy for my files/media. What I am looking to do is get a bunch of 2TB drives and mount them mirrored, and in a zpool, so that I don't have to worry about running out of room. (I know, pretty typical I guess.) My problem is that not all 2TB hard drives are the same size (even though they should be 2 trillion bytes, there is still sometimes a +/- — I've only noticed this twice so far), and if I create them mirrored, one fails, and I replace the drive, and for some reason it is 1 byte smaller, it will not work. How would I go about fixing this "problem"?

This is just a thought, and I am looking for thoughts and opinions on doing it... it probably is a bad idea, but hey, does it hurt to ask? I have been thinking: would it be a good idea to have, on the 2TB drives, say 1TB or 500GB "files" and then mount those as mirrored? So basically, have a 2TB hard drive set up like this (where drive1 and drive2 are the paths to the mount points):

mkfile 465g /drive1/drive1part1
mkfile 465g /drive1/drive1part2
mkfile 465g /drive1/drive1part3
mkfile 465g /drive1/drive1part4
mkfile 465g /drive2/drive2part1
mkfile 465g /drive2/drive2part2
mkfile 465g /drive2/drive2part3
mkfile 465g /drive2/drive2part4

(I use 465g because 2TB = 2 trillion bytes, and 2 trillion / 4 = 500GB, which is about 465.66 GiB.)

And then add them to the zpool:

zpool add medianas mirror /drive1/drive1part1 /drive2/drive2part1
zpool add medianas mirror /drive1/drive1part2 /drive2/drive2part2
zpool add medianas mirror /drive1/drive1part3 /drive2/drive2part3
zpool add medianas mirror /drive1/drive1part4 /drive2/drive2part4

And then, if a drive goes and I only have a 500GB and a 1.5TB drive on hand, they could be used as replacements that way? I am sure there are performance costs in doing this, but would the performance hit outweigh the benefit of easier drive replacement after a failure?
Sorry for posting a novel, but I am just concerned about failure on bigger drives, and about putting my media/files into what basically amounts to a JBOD-type array (on steroids). Steve ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
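The slice arithmetic in the post above can be checked in the shell, and the whole setup sketched in a few lines. This is only an illustration of the poster's idea, not a recommendation: the pool name `medianas` and the `/drive1`, `/drive2` paths come from the post, and the guarded loop only runs where `mkfile` and `zpool` actually exist.

```shell
# Each 2 TB drive (2 * 10^12 bytes) is split into four file-backed vdevs.
# Compute the per-file size in whole GiB (1 GiB = 2^30 bytes).
bytes_per_drive=2000000000000
files_per_drive=4
gib=$(( bytes_per_drive / files_per_drive / 1024 / 1024 / 1024 ))
echo "$gib"   # 465

# Hypothetical setup: create the backing files and mirror them pairwise
# across the two drives. Guarded so it only runs on a system that has the
# Solaris tools (and an existing pool named 'medianas').
if command -v mkfile >/dev/null && command -v zpool >/dev/null; then
  for i in 1 2 3 4; do
    mkfile "${gib}g" "/drive1/drive1part$i"
    mkfile "${gib}g" "/drive2/drive2part$i"
    zpool add medianas mirror "/drive1/drive1part$i" "/drive2/drive2part$i"
  done
fi
```

Because every vdev is the same fixed-size file, a replacement disk that is a few bytes smaller than its sibling can still hold the same number of backing files, which is the property the poster is after.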
Re: [zfs-discuss] Best practice for full system backup - equivalent of ufsdump/ufsrestore
> From: Bob Friesenhahn [mailto:bfrie...@simple.dallas.tx.us]
> Sent: Saturday, May 01, 2010 7:07 PM
>
> On Sat, 1 May 2010, Peter Tribble wrote:
> >> With the new Oracle policies, it seems unlikely that you will be
> >> able to reinstall the OS and achieve what you had before.
> >
> > And what policies have Oracle introduced that mean you can't
> > reinstall your system?
>
> The main concern is that you might not be able to get back the same OS
> install you had before due to loss of patch access after your service
> contract has expired and Oracle arbitrarily decided not to grant you a
> new one.

It's as if you didn't even read this thread. In the proposed answers to Euan's question, there is no need to apply any patches, or to have any service contract. As long as you still have your OS install CD, or *any* OS install CD, you install a throw-away OS, just for the sake of letting the installer create the partitions, boot record, boot properties, etc. Then you immediately obliterate and overwrite rpool using your backup image. Since this restoration process puts the filesystem back into the exact state it was in before the failure, all the patches you previously had are restored, and everything is restored just as it was before the crash. There is nothing anywhere which indicates any reason you couldn't do this, even in the future. So you're totally spreading BS on this one.
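The "install a throw-away OS, then overwrite rpool" path described above could look roughly like this — a sketch, not a verified procedure, assuming you kept a recursive send stream of rpool somewhere reachable from the live CD (the stream filename, mount point, and BE name below are all illustrative):

```shell
# From the live-CD environment, after the installer has laid down a
# throw-away OS (so the disk has partitions and a boot record):
zpool import -f -R /a rpool

# Overwrite the throw-away pool contents with the backed-up stream
# (-F rolls back, -d reconstructs dataset names, -u avoids mounting).
zfs receive -Fdu rpool < /backup/rpool.zfssend

# If the restored boot environment's name differs from the throw-away's,
# point the pool at the right root dataset before rebooting:
zpool set bootfs=rpool/ROOT/mybe rpool
```

The receive restores every dataset and snapshot in the stream, which is why no patches or service contract come into it: the pool ends up bit-identical to the state captured by the backup.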
Re: [zfs-discuss] Single-disk pool corrupted after controller failure
On 05/01/2010 06:07 PM, Bill Sommerfeld wrote:
> there are two reasons why you could get this:
> 1) the labels are gone.

Possible, since I got the metadata errors on `zpool status` before.

> 2) the labels are not at the start of what solaris sees as p1, and thus
> are somewhere else on the disk. I'd look more closely at how freebsd
> computes the start of the partition or slice '/dev/ad6s1d'
> that contains the pool.
>
> I think #2 is somewhat more likely.

c5d0p1 is also the only place where zdb finds any labels at all...
Re: [zfs-discuss] Virtual to physical migration
This actually turned out to be a lot of fun! The end result is that I now have a hard disk partition which can boot in both the physical and the virtual world (got rid of the VDIs finally!). The physical world has outstanding performance but ugly graphics (1600x1200 vesa driver with weird DPI and fonts... ughhh) because ATI drivers don't work in OpenSolaris for my card. VirtualBox gives me better graphics than my real install. This is a bit of a painful pill to swallow! One thing I tested right away in the physical install was how portage performance compared to Linux. This is a test of the scheduler as well as of small-file performance of the FS. OpenSolaris emerged python-2.6.5 in 45 seconds compared to 55 seconds in Linux; cmake took 38 seconds vs 53 seconds in Linux. In general, a portage operation (like emerge -pv) completes much, much faster in OpenSolaris. That's not to say OpenSolaris doesn't have issues. I notice short freezes (keyboard/mouse and gkrellm updates) lasting a few seconds (sometimes more) during FS activity. A Firefox restart (like after an add-on install) takes forever, whereas it should be instant because all of it is in memory already. I would like to troubleshoot these sometime using DTrace. Lastly, OpenSolaris is memory hungry! It crawls without enough of it. In the VM, I have it using 1.5GB, and the package manager alone can eat more than half of that, throwing everything else into swap. -- This message posted from opensolaris.org
[zfs-discuss] ZFS Root Permissions Create Pool With Zpool
When I first started using ZFS I tried to create a pool from my disks /dev/c8d1 and /dev/c8d1. I could see the slices, though. I could not see those disks without being root and, although I understood why, ZFS didn't: it could not find the disks and did not tell me I needed to be root. That is all... Web...
Re: [zfs-discuss] Best practice for full system backup - equivalent of ufsdump/ufsrestore
On Sat, 1 May 2010, Peter Tribble wrote:
>> With the new Oracle policies, it seems unlikely that you will be able
>> to reinstall the OS and achieve what you had before.
>
> And what policies have Oracle introduced that mean you can't reinstall
> your system?

The main concern is that you might not be able to get back the same OS install you had before due to loss of patch access after your service contract has expired and Oracle arbitrarily decided not to grant you a new one. Maybe if you are able to overwrite the pool with the original pristine state rather than rely on an "install", then you would be ok.

Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Re: [zfs-discuss] Best practice for full system backup - equivalent of ufsdump/ufsrestore
On 05/1/10 04:46 PM, Edward Ned Harvey wrote:
> One more really important gotcha. Let's suppose the version of zfs on
> the CD supports up to zpool 14. Let's suppose your "live" system had
> been fully updated before the crash, and let's suppose the zpool had
> been upgraded to zpool 15. Wouldn't that mean it's impossible to
> restore your rpool using the CD?

Just make sure you have an up-to-date live CD when you upgrade your pool. It's seldom wise to upgrade a pool too quickly after an OS upgrade; you may find an issue and have to revert to a previous BE. -- Ian.
Re: [zfs-discuss] Best practice for full system backup - equivalent of ufsdump/ufsrestore
On Fri, Apr 30, 2010 at 6:39 PM, Bob Friesenhahn wrote:
> On Thu, 29 Apr 2010, Edward Ned Harvey wrote:
>>
>> This is why I suggested the technique of:
>> Reinstall the OS just like you did when you first built your machine,
>> before the catastrophe. It doesn't even matter if you make the same
>> selections you
>
> With the new Oracle policies, it seems unlikely that you will be able to
> reinstall the OS and achieve what you had before.

And what policies have Oracle introduced that mean you can't reinstall your system? -- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
Re: [zfs-discuss] Single-disk pool corrupted after controller failure
On 05/01/10 13:06, Diogo Franco wrote:
> After seeing that on some cases labels were corrupted, I tried running
> zdb -l on mine: ... (labels 0, 1 not there, labels 2, 3 are there).
> I'm looking for pointers on how to fix this situation, since the disk
> still has available metadata.

there are two reasons why you could get this:

1) the labels are gone.

2) the labels are not at the start of what solaris sees as p1, and thus are somewhere else on the disk. I'd look more closely at how freebsd computes the start of the partition or slice '/dev/ad6s1d' that contains the pool.

I think #2 is somewhat more likely.

- Bill
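Some background on why the two-reasons split above is diagnostic: ZFS keeps four 256 KiB copies of the vdev label, two at the front of the vdev and two at the end. If Solaris sees the partition start at a different offset than FreeBSD did, labels 0 and 1 miss but labels 2 and 3 (anchored to the end of the device) can still be found. A small sketch of the expected offsets, using the asize reported by zdb -l later in this thread:

```shell
# Four 256 KiB labels: L0 and L1 at the front, L2 and L3 at the back
# of the vdev. Offsets are relative to the start of the vdev.
label=$(( 256 * 1024 ))
N=497955373056              # asize from the zdb -l output in this thread
l0=0
l1=$label
l2=$(( N - 2 * label ))
l3=$(( N - label ))
echo "$l0 $l1 $l2 $l3"      # 0 262144 497954848768 497955110912
```

If `zdb -l` finds intact labels at the l2/l3 offsets but nothing at l0/l1, the partition start Solaris is using very likely does not match where FreeBSD started the 'ad6s1d' slice — reason #2 above.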
Re: [zfs-discuss] Performance drop during scrub?
On Fri, 30 Apr 2010, Freddie Cash wrote:
> Without a periodic scrub that touches every single bit of data in the
> pool, how can you be sure that 10-year-old files that haven't been
> opened in 5 years are still intact?

You don't. But it seems that having two or three extra copies of the data on different disks should instill considerable confidence. With sufficient redundancy, chances are that the computer will explode before it loses data due to media corruption. The calculated time before data loss becomes longer than even the pyramids in Egypt could withstand.

The situation becomes similar to having a house with a heavy front door with three deadbolt locks, and many glass windows. The front door with its three locks is no longer a concern when you are evaluating your home for its security against burglary or home invasion, because the glass windows are so fragile and easily broken. It is necessary to look at all the factors which might result in data loss before deciding what the most effective steps are to minimize the probability of loss.

Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
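For those who do land on the periodic-scrub side of this debate, scheduling one is a one-liner. A sketch, assuming a pool named `tank` (the schedule is arbitrary):

```shell
# Illustrative crontab entry: start a scrub of 'tank' at 02:00 on the
# first of every month. Check progress/results afterwards with:
#   zpool status -v tank
0 2 1 * * /usr/sbin/zpool scrub tank
```

The scrub runs in the background and competes with normal I/O, which is exactly the priority problem discussed elsewhere in this thread.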
Re: [zfs-discuss] ZFS: "Cannot replace a replacing drive"
On Apr 29, 2010, at 2:20 AM, Freddie Cash wrote:
> On Wed, Apr 28, 2010 at 2:48 PM, Victor Latushkin wrote:
>> 2. Run 'zdb -ddd storage' and provide section titles Dirty Time Logs
>
> See attached.

So you really do have enough redundancy to be able to handle this scenario, so this is a software bug. On a recent OpenSolaris build you should be able to detach one of the devices and replace the second one. Version 14 corresponds to build 103, and spa_vdev_detach() was changed significantly in build 105 (along with other related changes), so those changes are probably not yet available in FreeBSD.

>> 3. Try 'zpool detach' approach on v14 system?
>
> Pool upgraded successfully.

Btw, you do not have to upgrade the pool immediately along with the upgrade to a newer ZFS version. Now you cannot use the upgraded pool on a system running older bits.

> Same results to all the zpool commands, though: online, offline, detach,
> replace.

Ok, I see. I mistakenly thought that v14 is for user/group quotas, which would mean build 114, but I was wrong - user/group quotas require version 15.

>> Another option may be to try the latest OpenSolaris LiveCD (build 134).
>
> I'll have to see if I can download/make one.
>
> Does it include drivers for 3Ware 9550SXU and 9650SE RAID controllers? All
> the drives are plugged into those.

I do not know for sure, but a quick check with google suggests that chances are good.

victor
[zfs-discuss] Single-disk pool corrupted after controller failure
I had a single spare 500GB HDD and I decided to install a FreeBSD file server on it for learning purposes, and I moved almost all of my data to it. Yesterday, and naturally after no longer having backups of the data in the server, I had a controller failure (SiS 180 (oh, the quality)) and the HDD was considered unplugged. When I noticed a few checksum failures on `zpool status` (including two on metadata (small hex numbers)), I tried running `zpool scrub tank`, thinking it was regular data corruption, and then the box locked up. I had also converted the pool to v14 a few days before, so the FreeBSD v13 tools couldn't do anything to help. Today I downloaded the OpenSolaris 134 snapshot image and booted it to try and rescue the pool, but:

# zpool status
no pools available

So I couldn't run a clear, or an export, or a destroy to reimport with -D. I tried to run a regular import:

# zpool import
  pool: tank
    id: 6157028625215863355
 state: FAULTED
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

        tank      FAULTED  corrupted data
          c5d0p1  UNAVAIL  corrupted data

There was no important data written in the past two days or so, thus using an older uberblock wouldn't be a problem, so I tried using the new recovery option:

# mkdir -p /mnt/tank && zpool import -fF -R /mnt/tank tank
cannot import 'tank': one or more devices is currently unavailable
        Destroy and re-create the pool from a backup source.

I tried googling for other people with similar issues, but almost all of them had raids and other complex configurations and were not really related to this problem.
After seeing that in some cases labels were corrupted, I tried running zdb -l on mine:

# zdb -l /dev/dsk/c5d0p1
--------------------------------------------
LABEL 0
--------------------------------------------
failed to unpack label 0
--------------------------------------------
LABEL 1
--------------------------------------------
failed to unpack label 1
--------------------------------------------
LABEL 2
--------------------------------------------
    version: 14
    name: 'tank'
    state: 0
    txg: 11420324
    pool_guid: 6157028625215863355
    hostid: 2563111091
    hostname: ''
    top_guid: 1987270273092463401
    guid: 1987270273092463401
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 1987270273092463401
        path: '/dev/ad6s1d'
        whole_disk: 0
        metaslab_array: 23
        metaslab_shift: 32
        ashift: 9
        asize: 497955373056
        is_log: 0
        DTL: 111
--------------------------------------------
LABEL 3
--------------------------------------------
    version: 14
    name: 'tank'
    state: 0
    txg: 11420324
    pool_guid: 6157028625215863355
    hostid: 2563111091
    hostname: ''
    top_guid: 1987270273092463401
    guid: 1987270273092463401
    vdev_tree:
        type: 'disk'
        id: 0
        guid: 1987270273092463401
        path: '/dev/ad6s1d'
        whole_disk: 0
        metaslab_array: 23
        metaslab_shift: 32
        ashift: 9
        asize: 497955373056
        is_log: 0
        DTL: 111

I'm looking for pointers on how to fix this situation, since the disk still has available metadata.
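One thing sometimes worth trying in this situation — a sketch of a diagnostic step, not a guaranteed fix — is pointing the importer at a directory that contains only the device holding the labels, so zpool probes c5d0p1 directly instead of scanning all of /dev/dsk (the /tmp/zdev path is arbitrary):

```shell
# Restrict the import scan to a directory containing just the partition
# that zdb -l found labels on.
mkdir -p /tmp/zdev
ln -s /dev/dsk/c5d0p1 /tmp/zdev/c5d0p1
zpool import -d /tmp/zdev -fF -R /mnt/tank tank
```

Since labels 2 and 3 still record the old FreeBSD path ('/dev/ad6s1d'), narrowing the search directory at least rules out the importer getting confused by stale paths, even if it doesn't resolve the missing front labels.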
Re: [zfs-discuss] ZFS on ZFS based storage?
On Sat, May 1, 2010 at 7:08 AM, Gabriele Bulfon wrote:
> My question is:
> - is it correct to mount the iScsi device as base disks for the VM and then
> create zpools/volumes in it, considering that behind it there is already
> another zfs?

Yes, that will work fine. In fact, zfs checksums will help protect against over-the-wire errors. You can enable redundancy at either or both levels, depending on performance requirements, available space and your level of paranoia. Using mirroring or raidz in your VM will use more bandwidth to your iSCSI server.

> - in case it's correct to have the VM zfs over the storage zfs, where should
> I manage snapshots? on the VM or on the storage?

It's up to what you plan on doing with the VM. I'd probably do both, depending on the changes that I plan on making. For instance, use time slider / zfs-auto-snapshot on the VM, but also snapshot the zvol on the backing store before making any big configuration changes. -B -- Brandon High : bh...@freaks.com
Re: [zfs-discuss] Reverse lookup: inode to name lookup
> If the kernel (or root) can open an arbitrary directory by inode number,
> then the kernel (or root) can find the inode number of its parent by
> looking at the '..' entry, which the kernel (or root) can then open, and
> identify both: (a) the name of the child subdir whose inode number is
> already known, and (b) yet another '..' entry. The kernel (or root) can
> repeat this process recursively, up to the root of the filesystem tree.
> At that time, the kernel (or root) has completely identified the absolute
> path of the inode that it started with.
>
> The only question I want answered right now is:
>
> Although it is possible, is it implemented? Is there any kind of function,
> or existing program, which can be run by root, to obtain either the
> complete path of a directory by inode number, or to simply open an inode
> by number, which would leave the recursion and absolute path generation
> yet to be completed?

You can do this in the kernel by calling vnodetopath(). I don't know if it is exposed to user space. But it could be slow if you have large directories, so you have to think about where you would use it.
Re: [zfs-discuss] Reverse lookup: inode to name lookup
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Mattias Pantzare
>
>> The nfs server can find the file but not the file _name_.
>
> inode is all that the NFS server needs, it does not need the file name
> if it has the inode number.

It is not useful or helpful for you guys to debate whether or not this is possible. And it is especially not helpful to flat out say "it's not possible." Here is the final word on whether or not it's possible:

Whenever any process calls "open('/some/path/filename')", that system call is handled by the kernel, recursively resolving each name to an inode number, checking permissions, and opening that inode, until the final inode is identified and opened, or some error is encountered. The point is: obviously, the kernel has the facility to open an inode by number. However, for security reasons (enforcing permissions of parent directories before the parent directories have been identified), the ability to open an arbitrary inode by number is not normally made available to user-level applications, except perhaps when run by root.

At present, a file inode does not contain any reference to its parent directory or directories. But that's just a problem inherent to files. It is fundamentally easier to reverse lookup a directory by inode number, because this information is already in the filesystem. No filesystem enhancements are needed to reverse lookup a directory by inode number, because: (a) every directory contains an entry ".." which refers to its parent by number, and (b) every directory has precisely one parent, and no more. There is no such thing as a hardlink copy of a directory. Therefore, there is exactly one absolute path to any directory in any ZFS filesystem.

If the kernel (or root) can open an arbitrary directory by inode number, then the kernel (or root) can find the inode number of its parent by looking at the '..' entry, which the kernel (or root) can then open, and identify both: (a) the name of the child subdir whose inode number is already known, and (b) yet another '..' entry. The kernel (or root) can repeat this process recursively, up to the root of the filesystem tree. At that time, the kernel (or root) has completely identified the absolute path of the inode that it started with.

The only question I want answered right now is: although it is possible, is it implemented? Is there any kind of function, or existing program, which can be run by root, to obtain either the complete path of a directory by inode number, or to simply open an inode by number, which would leave the recursion and absolute path generation yet to be completed?
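The '..'-walk recursion described above is easy to demonstrate for a directory you can already open by path. A minimal sketch (function names are made up for illustration): it rebuilds a directory's absolute physical path by matching inode numbers in each parent with `ls -di`. It does not solve the open-by-inode-number part of the question, and it will not behave correctly across mount points.

```shell
# ino_of DIR: print DIR's inode number.
ino_of() { ls -di "$1" | awk '{print $1}'; }

# path_from_dir DIR: rebuild DIR's absolute (physical) path by walking
# '..' and matching inode numbers in each parent -- the recursion
# described in the message above. Demonstration only: it starts from a
# path, not a bare inode number, and ignores mount-point crossings.
path_from_dir() {
  cur=$1 out=
  while :; do
    ino=$(ino_of "$cur")
    parent=$cur/..
    [ "$ino" = "$(ino_of "$parent")" ] && break   # reached the root
    for entry in "$parent"/*/ "$parent"/.*/; do
      [ -d "$entry" ] || continue
      if [ "$(ino_of "$entry")" = "$ino" ]; then
        out=/$(basename "$entry")$out
        break
      fi
    done
    cur=$parent
  done
  printf '%s\n' "${out:-/}"
}
```

Note the cost the follow-up reply warns about: each step scans every entry of the parent directory, so large directories make the walk slow.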
Re: [zfs-discuss] zfs log on another zfs pool
What problem are you trying to solve?

On 1 May 2010, at 02:18, Tuomas Leikola wrote:
> Hi. I have a simple question. Is it safe to place log device on another
> zfs disk? I'm planning on placing the log on my mirrored root partition.
> Using latest opensolaris.
Re: [zfs-discuss] Best practice for full system backup - equivalent of ufsdump/ufsrestore
On Sat, 1 May 2010, Edward Ned Harvey wrote:
> Would that be fuel to recommend people, "Never upgrade your version of
> zpool or zfs on your rpool?"

It does seem to be a wise policy to not update the pool and filesystem versions unless you require a new pool or filesystem feature. Then you would update to the minimum version required to support that feature. Note that if the default filesystem version changes and you create a new filesystem, this may also cause problems (I have been bitten by that before).

Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
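The "minimum version required" policy above can be made explicit rather than relying on defaults. A sketch — the pool and dataset names are illustrative, and the version numbers are just examples:

```shell
# See which features each pool / filesystem version adds, so you can
# pick the minimum version that has what you need:
zpool upgrade -v
zfs upgrade -v

# Upgrade only to the version a needed feature requires, not the latest:
zpool upgrade -V 14 rpool

# Pin new filesystems to a known version, so a changed default
# filesystem version can't bite you later:
zfs create -o version=3 rpool/export/data
```

Pinning at creation time also keeps new filesystems receivable and mountable by the older tools on your rescue media.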
Re: [zfs-discuss] Reverse lookup: inode to name lookup
On Sat, May 1, 2010 at 16:49, wrote:
>> No, a NFS client will not ask the NFS server for a name by sending the
>> inode or NFS-handle. There is no need for a NFS client to do that.
>
> The NFS clients, certainly versions 2 and 3, only use the "file handle";
> the file handle can be decoded by the server. The filehandle does not
> contain the name, only the FSid, the inode number and the generation.
>
>> There is no way to get a name from an inode number.
>
> The nfs server knows how, so it is clearly possible. It is not exported
> to userland, but the kernel can find a file by its inumber.

The nfs server can find the file but not the file _name_. The inode is all that the NFS server needs; it does not need the file name if it has the inode number.
Re: [zfs-discuss] Reverse lookup: inode to name lookup
> No, a NFS client will not ask the NFS server for a name by sending the
> inode or NFS-handle. There is no need for a NFS client to do that.

NFS clients, certainly versions 2 and 3, only use the "file handle"; the file handle can be decoded by the server. The filehandle does not contain the name, only the FSid, the inode number and the generation.

> There is no way to get a name from an inode number.

The nfs server knows how, so it is clearly possible. It is not exported to userland, but the kernel can find a file by its inumber.

Casper
Re: [zfs-discuss] Reverse lookup: inode to name lookup
On Sat, May 1, 2010 at 16:23, wrote:
>> I understand you cannot lookup names by inode number in general, because
>> that would present a security violation. Joe User should not be able to
>> find the name of an item that's in a directory where he does not have
>> permission.
>>
>> But, even if it can only be run by root, is there some way to lookup the
>> name of an object based on inode number?
>
> Sure, that's typically how NFS works.
>
> The inode itself is not sufficient; an inode number might be recycled, and
> an old snapshot with the same inode number may refer to a different file.

No, a NFS client will not ask the NFS server for a name by sending the inode or NFS-handle. There is no need for a NFS client to do that.

There is no way to get a name from an inode number.
Re: [zfs-discuss] Reverse lookup: inode to name lookup
> I understand you cannot lookup names by inode number in general, because
> that would present a security violation. Joe User should not be able to
> find the name of an item that's in a directory where he does not have
> permission.
>
> But, even if it can only be run by root, is there some way to lookup the
> name of an object based on inode number?

Sure, that's typically how NFS works.

The inode itself is not sufficient; an inode number might be recycled, and an old snapshot with the same inode number may refer to a different file.

Casper
[zfs-discuss] ZFS on ZFS based storage?
I'm trying to guess what is the best practice in this scenario:
- let's say I have a zfs based storage (let's say nexenta) that has its zfs pools and volumes shared as iSCSI raw devices
- let's say I have another server running xvm or virtualbox connected to the storage
- let's say one of the virtual guests is OpenSolaris

My question is:
- is it correct to mount the iSCSI device as base disks for the VM and then create zpools/volumes in it, considering that behind it there is already another zfs?
- what alternatives do I have?
- in case it's correct to have the VM zfs over the storage zfs, where should I manage snapshots? on the VM or on the storage?

Thanks for any idea, Gabriele.
[zfs-discuss] Reverse lookup: inode to name lookup
Forget about files for the moment, because directories are fundamentally easier to deal with. Let's suppose I've got the inode number of some directory in the present filesystem:

[r...@filer ~]# ls -id /share/projects/foo/goo/rev1.0/working
14363 /share/projects/foo/goo/rev1.0/working/

I want to identify the previous names & locations of that directory from snapshots:

find /share/.zfs/snapshot -inum 14363

And I want to do it fast. I don't want to use "find" or anything else that needs to walk every tree of every snapshot. The answer needs to be essentially zero-time, just like the "ls -id" is essentially zero-time.

I understand you cannot lookup names by inode number in general, because that would present a security violation. Joe User should not be able to find the name of an item that's in a directory where he does not have permission.

But, even if it can only be run by root, is there some way to look up the name of an object based on inode number?
Re: [zfs-discuss] How to manage scrub priority or defer scrub?
I was going through this posting and it seems that there is some "personal tension" :). However, going back to the technical problem of scrubbing a 200 TB pool, I think this issue needs to be addressed. One warning up front: this writing is rather long, and if you would like to jump to the part dealing with scrub, jump to "Scrub implementation" below.

From my perspective:

- ZFS is great for huge amounts of data. That's what it was made for, with 128-bit and JBOD design in mind. So ZFS is perfect for internet multimedia in terms of scalability.

- ZFS is great for commodity hardware. Ok, you should use 24x7 drives, but 2 TB 7200 rpm disks are ok for internet media mass storage. We want huge amounts of data stored, and in the internet age nobody pays for this. So you must use low-cost hardware (well, it must be compatible) - but you should not need enterprise components - that's what we have ZFS as clever software for. For mass storage internet services, the alternative is NOT EMC or NetApp (remember, nobody pays a lot for the services because you can get them free at google) - the alternative is Linux-based HW raid (with its well known limitations) and home-grown solutions. Those do not have the nice ZFS features mentioned below.

- ZFS guarantees data integrity by self-healing silent data corruption (that's what the checksums are for) - but only if you have redundancy. There are a lot of posts on the net about when people notice the bad blocks - it happens when a disk in a raid5 fails and they have to resilver everything. Then you detect the missing redundancy. So people use raid6 and "hope" that everything works. Or people do scrubs on their advanced raid controllers (if they provide internal checksumming). The same problem exists for huge, passive, raidz1 data sets in ZFS. If you do not scrub the array regularly, chances are higher that you will have a bad block during resilvering, and then ZFS can not help. For active data sets the problem is not as critical, because on every read the checksum is verified - but still - because once in the arc cache nobody checks - the problem exists. So we need scrub!

- ZFS can do many other nice things. There's compression, dedup etc., however I look at them as "nice to have".

- ZFS needs proper pool design. Using ZFS right is not easy; sizing the system is even more complicated. There are a lot of threads regarding pool design - the easiest is to say "do a lot of mirrors", because then the read performance really scales. However, in internet mass media services, you can't - too expensive - because mirrored ZFS is more expensive than HW raid6 with Linux. How many members to a vdev? Multiple pools or a single pool?

- ZFS is open and community based ... well, let's see how this goes with Oracle "financing" the whole stuff :)

And some of those points make ZFS a hit for internet service provider and mass media requirements (VOD etc.)! So what's my point, you may ask? My experience with ZFS is that some points are simply not addressed well enough yet - BUT - ZFS is a living piece of software, and thanks to the many great people developing it, it evolves faster than all the other storage solutions. So for the longer term I believe ZFS will (hopefully) have all the "enterprise-ness" it needs, and it will revolutionize the storage industry (like cisco did in the 70's). I really believe that.

From my perspective, some of the points not addressed well in ZFS are:

- Pool defragmentation - you need this for a COW filesystem. I think the ZFS developers are working on this with the background rewriter, so I hope it will come in 2010. With the rewriter, the on-disk layout can be optimized for read performance for sequential workloads - also for raidz1 and raidz2 - meaning ZFS can compete with raid5 and raid6 - also for wider vdevs. And wider vdevs mean more effective capacity. If the vdev read-ahead cache works nicely with a sequentially aligned on-disk layout, then (from disk) read performance will be great.

- I/O prioritization for zvols / zfs filesystems (aka storage QoS). Unfortunately you can not prioritize I/O to zfs filesystems and zvols right now. I think this is one of the features that makes ZFS not suitable for 1st tier storage (like EMC Symmetrix or the NetApp FAS6000 series). You need prioritization here - because your SAP system really is more important than my MP3 web server :)

- Deduplication not ready for production. Currently dedup is nice, but the DDT table handling and memory sizing is tricky and hardly usable for larger pools (my perspective). The DDT is handled like any other component - meaning user I/O can push the DDT out of the arc (and the L2ARC) - even with (primarycache=secondarycache)=metadata. For typical mass media storage applications, the working set is much larger than the memory (and L2ARC), meaning your DDT will come from disk - causing real performance degradation. This is especially true for
[zfs-discuss] zfs log on another zfs pool
Hi. I have a simple question. Is it safe to place a log device on another zfs disk? I'm planning on placing the log on my mirrored root partition. Using latest opensolaris.
Re: [zfs-discuss] Panic when deleting a large dedup snapshot
- "Cindy Swearingen" wrote:
> Brandon,
>
> You're probably hitting this CR:
>
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6924824

Interesting - reported in February and still no fix?

roy