[zfs-discuss] Possible newbie question about space between zpool and zfs file systems
Sorry if this is too basic -

So I have a single zpool in addition to the rpool, called xpool.

NAME    SIZE  USED   AVAIL  CAP  HEALTH  ALTROOT
rpool   136G  109G   27.5G  79%  ONLINE  -
xpool   408G  171G   237G   42%  ONLINE  -

I have 408 GB in the pool, am using 171, leaving me 237 GB.

The pool is built up as:

  pool: xpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        xpool       ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c8t1d0  ONLINE       0     0     0
            c8t2d0  ONLINE       0     0     0
            c8t3d0  ONLINE       0     0     0

errors: No known data errors

But - and here is the question - creating file systems on it, the file systems in play report only 76 GB of space free:

xpool/zones/logserver/ROOT/zbe    975M  76.4G   975M  legacy
xpool/zones/openxsrvr            2.22G  76.4G  21.9K  /export/zones/openxsrvr
xpool/zones/openxsrvr/ROOT       2.22G  76.4G  18.9K  legacy
xpool/zones/openxsrvr/ROOT/zbe   2.22G  76.4G  2.22G  legacy
xpool/zones/puggles               241M  76.4G  21.9K  /export/zones/puggles
xpool/zones/puggles/ROOT          241M  76.4G  18.9K  legacy
xpool/zones/puggles/ROOT/zbe      241M  76.4G   241M  legacy
xpool/zones/reposerver            299M  76.4G  21.9K  /export/zones/reposerver

So my question is: where is the space from xpool being used? Or is it?

Thanks for reading.

Mike.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] corruption of ZFS on iSCSI storage
Hello, I'd like to ask for some guidance about using ZFS on iSCSI storage appliances. Recently I had an unlucky situation with a storage machine freezing. Once the storage was up again (rebooted), all the other iSCSI clients were happy, while one of them (a Sun Solaris SPARC box running Oracle) did not mount the volume, marking it as corrupted. I had no way to get my ZFS data back: I had to destroy the pool and recreate it from backups. So I have some questions regarding this nice story:

- I remember sysadmins being able to almost always recover data on corrupted UFS filesystems by the magic of superblocks. Is there something similar in ZFS? Is there really no way to access the data of a corrupted ZFS filesystem?

- In this case, the storage appliance is a legacy system based on Linux, so RAIDs/mirrors are managed on the storage side in its own way. Being an iSCSI target, this volume was mounted as a single iSCSI disk from the Solaris host and prepared as a ZFS pool consisting of that single iSCSI target. The ZFS best practices tell me that, to be safe in case of corruption, pools should always be mirrors or raidz over 2 or more disks. In this case I considered everything safe, because the mirroring and RAID were managed by the storage machine. But from the Solaris host's point of view, the pool was just one disk! And maybe this has been the point of failure. What is the correct way to go in this case?

- Finally, looking forward to running new storage appliances using OpenSolaris and its ZFS plus iscsitadm and/or COMSTAR, I feel a bit confused by the possibility of having a double-ZFS situation: in this case, I would have the storage's ZFS filesystem divided into ZFS volumes, accessed via iSCSI by a possible Solaris host that creates its own ZFS pool on top (is it too redundant?), and again I would fall into the same previous case (a host ZFS pool connected to only one iSCSI resource).

Any guidance would be really appreciated :)

Thanks a lot
Gabriele.
Re: [zfs-discuss] Possible newbie question about space between zpool and zfs file systems
Hi Michael,

For a RAIDZ pool, the zpool list command identifies the "inflated" space for the storage pool, which is the physical available space without any accounting for redundancy overhead. The zfs list command identifies how much actual pool space is available to the file systems.

See the example below of a RAIDZ-2 pool created with three 44 GB disks. The total pool capacity reported by zpool list is 134 GB. The amount of pool space that is available to the file systems is 43.8 GB, due to the RAIDZ-2 redundancy overhead.

See this FAQ section for more information ("Why doesn't the space that is reported by the zpool list command and the zfs list command match?"), although this site is dog-slow for me today:

http://hub.opensolaris.org/bin/view/Community+Group+zfs/faq#HZFSAdministrationQuestions

Thanks,

Cindy

# zpool create xpool raidz2 c3t40d0 c3t40d1 c3t40d2
# zpool list xpool
NAME    SIZE  USED  AVAIL  CAP  HEALTH  ALTROOT
xpool   134G  234K  134G   0%   ONLINE  -
# zfs list xpool
NAME   USED   AVAIL  REFER  MOUNTPOINT
xpool  73.2K  43.8G  20.9K  /xpool

On 03/15/10 08:38, Michael Hassey wrote:
> Sorry if this is too basic -
>
> So I have a single zpool in addition to the rpool, called xpool.
> [...]
> So my question is, where is the space from xpool being used? or is it?
>
> Thanks for reading.
>
> Mike.
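Cindy's numbers can be sanity-checked with a back-of-envelope calculation. This is a rough sketch (the function is mine, not any ZFS API); real pools also lose a little space to metadata and labels, which is why zfs list shows 43.8G rather than the ~44.7G the naive arithmetic predicts:

```python
def raidz_usable(disks, disk_size, parity):
    """Rough usable capacity of a raidz vdev: only (disks - parity)
    disks' worth of space holds data; the rest holds parity."""
    if disks <= parity:
        raise ValueError("need more disks than parity devices")
    return (disks - parity) * disk_size

# Cindy's example: raidz2 of 3 disks, zpool reports 134G total,
# so each disk contributes about 134/3 = 44.7G of raw space.
print(raidz_usable(3, 134 / 3, parity=2))  # ~44.7, close to the 43.8G zfs reports

# Mike's xpool: 408G of raw space over 3 disks in raidz2.
print(raidz_usable(3, 408 / 3, parity=2))  # -> 136.0
```

With 171G of the raw 408G allocated (about 57G of data after the 3x redundancy), the ~76G free that Mike's file systems report is consistent with this 136G net figure.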
Re: [zfs-discuss] Possible newbie question about space between zpool and zf
That solved it. Thank you, Cindy. zpool list NOT reporting the raidz overhead is what threw me...

Thanks again.
Re: [zfs-discuss] corruption of ZFS on iSCSI storage
On Mar 15, 2010, at 10:55 AM, Gabriele Bulfon wrote:
> - In this case, the storage appliance is a legacy system based on linux, so
> raids/mirrors are managed at the storage side its own way. [...] But from the
> solaris host point of view, the pool was just one! And maybe this has been
> the point of failure. What is the correct way to go in this case?

I'd guess this could be because the iSCSI target wasn't honoring ZFS flush requests.

> - Finally, looking forward to run new storage appliances using OpenSolaris
> and its ZFS+iscsitadm and/or comstar, I feel a bit confused by the
> possibility of having a double zfs situation [...] and again I would fall in
> the same previous case (host zfs pool connected to one only iscsi resource).

My experience with this is significantly lower-end, but I have had iSCSI shares from a ZFS NAS come up as corrupt to the client. It's fixable if you have snapshots.

I've been using iSCSI to provide Time Machine targets to OS X boxes. We had a client crash during writing, and upon reboot it showed the iSCSI volume as corrupt. You can put whatever file system you like on the iSCSI target, obviously. The current OpenSolaris iSCSI implementation, I believe, uses synchronous writes, so hopefully what happened to you wouldn't happen in this case. In my case I was using HFS+ (the OS X client has to), and I couldn't repair the volume.

However, with a snapshot I could roll it back. If you plan ahead, this should save you some restoration work (you'll need to be able to roll back all the files that have to be consistent).

Good luck,
Ware
Re: [zfs-discuss] Possible newbie question about space between zpool and zfs file systems
Yeah, this threw me. A 3-disk RAID-Z2 doesn't make sense, because at the redundancy level RAID-Z2 looks like RAID 6: there are 2 levels of parity for the data. Out of 3 disks, the equivalent of 2 disks will be used to store redundancy (parity) data and only 1 disk equivalent will store actual data. This is what others might term a "degenerate case of 3-way mirroring", except with a lot more computational overhead, since we're performing 2 parity calculations.

I'm curious what the purpose of creating a 3-disk RAID-Z2 pool is/was? (For my own personal edification. Maybe there is something for me to learn from this example.)

Aside: Does ZFS actually create the pool as a 3-way mirror, given that this configuration is effectively the same? This is a question for any of the ZFS team who may be reading, but I'm curious now.

On Mon, Mar 15, 2010 at 10:38, Michael Hassey wrote:
> Sorry if this is too basic -
>
> So I have a single zpool in addition to the rpool, called xpool.
> [...]
> So my question is, where is the space from xpool being used? or is it?
>
> Thanks for reading.
>
> Mike.

--
"You can choose your friends, you can choose the deals." - Equity Private
"If Linux is faster, it's a Solaris bug." - Phil Harman
Blog - http://whatderass.blogspot.com/
Twitter - @khyron4eva
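The "degenerate case of 3-way mirroring" observation can be illustrated numerically. A quick sketch (my own helper functions, ignoring ZFS metadata overhead): a 3-disk RAID-Z2 devotes the same fraction of raw space to data as a 3-way mirror, while a wider RAID-Z2 stripe amortizes the two parity disks:

```python
def data_fraction_raidz(disks, parity):
    """Fraction of raw capacity that holds data in a raidz vdev."""
    return (disks - parity) / disks

def data_fraction_mirror(ways):
    """Fraction of raw capacity that holds data in an n-way mirror."""
    return 1 / ways

# A 3-disk RAID-Z2 keeps the same data fraction as a 3-way mirror:
assert data_fraction_raidz(3, parity=2) == data_fraction_mirror(3)  # both 1/3

# A wider stripe amortizes the two parity disks much better:
print(data_fraction_raidz(10, parity=2))  # -> 0.8
```

The redundancy arithmetic matches, though (as noted above) the mirror avoids the parity computations and can read any single copy directly.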
Re: [zfs-discuss] corruption of ZFS on iSCSI storage
On Mar 15, 2010, at 10:55 AM, Gabriele Bulfon wrote:
> Hello, I'd like to check for any guidance about using zfs on iscsi storage
> appliances. Recently I had an unlucky situation with an unlucky storage
> machine freezing. Once the storage was up again (rebooted) all other iscsi
> clients were happy, while one of the iscsi clients (a sun solaris sparc,
> running Oracle) did not mount the volume marking it as corrupted. I had no
> way to get back my zfs data: had to destroy and recreate from backups.
> [...]
> Any guidance would be really appreciated :)
> Thanks a lot
> Gabriele.
What iSCSI target was this? If it was IET, I hope you were NOT using the write-back option on it, as it caches write data in volatile RAM. IET does support cache flushes, but if you cache in RAM (bad idea) a system lockup or panic will ALWAYS lose data.

-Ross
Re: [zfs-discuss] corruption of ZFS on iSCSI storage
Well, I actually don't know what implementation is inside this legacy machine. The machine is an AMI StoreTrends ITX, but maybe it has been built around IET; I don't know. Maybe I should disable write-back for every ZFS host connecting over iSCSI? How do I check this?

Thx
Gabriele.
Re: [zfs-discuss] corruption of ZFS on iSCSI storage
On Mar 15, 2010, at 12:13 PM, Gabriele Bulfon wrote:
> Well, I actually don't know what implementation is inside this legacy machine.
> This machine is an AMI StoreTrends ITX, but maybe it has been built around
> IET, don't know.
> Well, maybe I should disable write-back on every zfs host connecting on iscsi?
> How do I check this?

I think this would be a property of the NAS, not the clients.

--Ware
Re: [zfs-discuss] corruption of ZFS on iSCSI storage
On Mar 15, 2010, at 12:19 PM, Ware Adams wrote:
> I think this would be a property of the NAS, not the clients.

Yes, Ware's right: the setting should be on the AMI device. I don't know what target it's using either, but if it has an option to disable write-back caching, then at least if it doesn't honor flushing your data should still be safe.

-Ross
Re: [zfs-discuss] CR 6880994 and pkg fix
On Sun, March 14, 2010 13:54, Frank Middleton wrote:
>
> How can it even be remotely possible to get a checksum failure on mirrored
> drives with copies=2? That means all four copies were corrupted? Admittedly
> this is on a grotty PC with no ECC and flaky bus parity, but how come the
> same file always gets flagged as being clobbered (even though apparently it
> isn't).
>
> The oddest part is that libdlpi.so.1 doesn't actually seem to be corrupted.
> nm lists it with no problem, and you can copy it to /tmp, rename it, and
> then copy it back. objdump and readelf can all process this library with no
> problem. But "pkg fix" flags an error in its own inscrutable way. CCing
> pkg-discuss in case a pkg guru can shed any light on what the output of
> "pkg fix" (below) means. Presumably libc is OK, or it wouldn't boot :-).

This sounds really bizarre. One detail suggestion on checking what's going on (since I don't have a clue towards a real root-cause determination): get an md5sum of a clean copy of the file, say from a new install or something, and check the allegedly-corrupted copy against that. This can fairly easily give you a pretty reliable indication of whether the file is truly corrupted or not.

--
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info
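David's comparison can be scripted. A minimal sketch (the paths are illustrative, not real mount points on any particular system) that computes and compares MD5 digests of a suspect file and a known-good copy:

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 digest of a file, reading it in 1 MiB chunks
    so large libraries don't need to fit in memory at once."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical usage on the suspect library vs. a copy from install media:
#   md5_of("/lib/libdlpi.so.1") == md5_of("/mnt/clean-install/lib/libdlpi.so.1")
# Equal digests mean the on-disk bits are fine and "pkg fix" is objecting to
# something else (e.g. package metadata); differing digests mean real corruption.
```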
Re: [zfs-discuss] zpool reporting consistent read errors
On Mon, March 15, 2010 00:54, no...@euphoriq.com wrote:
> I'm running a raidz1 with 3 Samsung 1.5TB drives. Every time I scrub the
> pool I get multiple read errors, no write errors and no checksum errors on
> one drive (always the same drive, and no data loss).
>
> I've changed cables, changed the sata ports the drives are attached to, I
> always get the same outcome. The drives are new. Is this likely a drive
> problem?

Given what you've already changed, it's sounding like it could well be a drive problem. The one other thing that comes to mind is power to the drive.

--
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Re: [zfs-discuss] backup zpool to tape
Greg,

I am using NetBackup 6.5.3.1 (7.x is out) with fine results. Nice and fast.

-Scott
Re: [zfs-discuss] backup zpool to tape
Hey Scott,

Thanks for the information. I doubt I can drop that kind of cash, but back to getting bacula working!

Thanks again,
Greg
Re: [zfs-discuss] zpool reporting consistent read errors
Wow. I never thought about it. I changed the power supply to a cheap one a while back (a now seemingly foolish effort to save money) - it could be the issue. I'll change it back and let you know.

Thanks
Re: [zfs-discuss] zpool reporting consistent read errors
On 15.03.2010 21:13, no...@euphoriq.com wrote:
> Wow. I never thought about it. I changed the power supply to a cheap one a
> while back (a now seemingly foolish effort to save money) - it could be the
> issue. I'll change it back and let you know.

"Cheap" power supplies rarely are. ;)

It's been my experience that if you "overengineer" the PSU a bit, its efficiency increases (it's no longer pushing 100% of its rated spec) and the power consumed (on the 220V side) actually drops.

//Svein

--
Svein Skogen | sv...@d80.iso100.no | PGP Key: 0xE5E76831
Solberg Østli 9, 2020 Skedsmokorset, Norway
[zfs-discuss] persistent L2ARC
Greetings all,

I understand that L2ARC is still under enhancement. Does anyone know if ZFS can be upgraded to include a "persistent L2ARC", i.e. an L2ARC that will not lose its contents after a system reboot?

--
Abdullah Al-Dahlawi
George Washington University
Department of Electrical & Computer Engineering

Check The Fastest 500 Super Computers Worldwide
http://www.top500.org/list/2009/11/100
[zfs-discuss] pool causes kernel panic, recursive mutex enter, 134
Hi,

I've been using OpenSolaris for about 2 years, with a mirrored rpool and a data pool of 3 x 2 (mirrored) drives. The data pool drives are connected to SIL PCI Express cards.

Yesterday I updated from build 130 to 134. Everything seemed to be fine, and I also replaced 1 pair of mirrored drives with larger disks. Still no problems; I did some tests, rebooted a few times, checked logs - nothing special.

Today I started copying a larger amount of data. While copying, at about 40 GB, OpenSolaris gave me the first kernel panic ever seen on this system. The system rebooted, and while mounting the data pool - you may guess it - it panicked again.

What I have done so far in trying to get it up again: booted without the data drives, tried to mount manually and with -F -n (non-destructive, as the manual says); tried to mount normally with different combinations of mirrors taken offline, so that there is only a single drive for each slice. Same panic. I still have the drives that I replaced with the newer ones, but I believe they are useless since the structure has changed?

The kernel panic I get is cpu(0) recursive mutex enter, plus several lines of SIL driver errors. I also tried booting from the previous BE (130, from before the update, where the pools never got an error) - same panic.

ANY ideas for volume rescue are welcome. If I missed some important information, please tell me.

Regards,
mark
Re: [zfs-discuss] zpool reporting consistent read errors
On Mon, March 15, 2010 15:35, Svein Skogen wrote:
> On 15.03.2010 21:13, no...@euphoriq.com wrote:
>> Wow. I never thought about it. I changed the power supply to a cheap
>> one a while back (a now seemingly foolish effort to save money) - it
>> could be the issue. I'll change it back and let you know.
>
> "cheap" powersupplies rarely are. ;)

I've had all types fail on me. I think I've had more power supplies than disk drives fail on me, even. And they can produce the most *amazing* range of symptoms if they don't fail completely. Quite remarkable.

> It's been my experience that if you "overengineer" the psu a bit, the
> efficiency of the PSU increases (it's no longer pushing 100% of its
> rated spec) and actually the consumed power (on the 220v side) drops.

Strangely enough, running up to the limit is hard on components, yes.

--
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Re: [zfs-discuss] pool causes kernel panic, recursive mutex enter, 134
Some screenshots that may help:

  pool: tank
    id: 5649976080828524375
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        data          ONLINE
          mirror-0    ONLINE
            c27t2d0   ONLINE
            c27t0d0   ONLINE
          mirror-1    ONLINE
            c27t3d0   ONLINE
            c29t1d0   ONLINE
          mirror-2    ONLINE
            c27t1d0   ONLINE
            c29t0d0   ONLINE

Mar 15 21:42:50 solaris1.local ^Mpanic[cpu0]/thread=d6792f00:
Mar 15 21:42:50 solaris1.local genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=d76d3658 addr=34 occurred in module "zfs" due to a NULL pointer dereference
Mar 15 21:42:50 solaris1.local unix: [ID 10 kern.notice]
Mar 15 21:42:50 solaris1.local unix: [ID 839527 kern.notice] syseventd:
Mar 15 21:42:50 solaris1.local unix: [ID 753105 kern.notice] #pf Page fault
Mar 15 21:42:50 solaris1.local unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x34
Mar 15 21:42:50 solaris1.local unix: [ID 243837 kern.notice] pid=93, pc=0xf924b97e, sp=0xd76d36c4, eflags=0x10282
Mar 15 21:42:50 solaris1.local unix: [ID 211416 kern.notice] cr0: 8005003b cr4: 6f8
Mar 15 21:42:50 solaris1.local unix: [ID 624947 kern.notice] cr2: 34
Mar 15 21:42:50 solaris1.local unix: [ID 625075 kern.notice] cr3: 2ead020
Mar 15 21:42:50 solaris1.local unix: [ID 10 kern.notice]
Mar 15 21:42:50 solaris1.local unix: [ID 537610 kern.notice]    gs: d76d01b0  fs:        0  es:   cb0160  ds: e31a0160
Mar 15 21:42:50 solaris1.local unix: [ID 537610 kern.notice]   edi:        0 esi: de581350 ebp: d76d36a4 esp: d76d3690
Mar 15 21:42:50 solaris1.local unix: [ID 537610 kern.notice]   ebx:        0 edx:        b ecx:        0 eax:        0
Mar 15 21:42:50 solaris1.local unix: [ID 537610 kern.notice]   trp:        e err:        0 eip: f924b97e  cs:      158
Mar 15 21:42:50 solaris1.local unix: [ID 717149 kern.notice]   efl:    10282 usp: d76d36c4  ss: f924b9c6
Mar 15 21:42:50 solaris1.local unix: [ID 10 kern.notice]
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3594 unix:die+93 (e, d76d3658, 34, 0)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3644 unix:trap+1449 (d76d3658, 34, 0)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3658 unix:cmntrap+7c (d76d01b0, 0, cb0160)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d36a4 zfs:vdev_is_dead+6 (0, 0, cb36a7, e31ad)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d36c4 zfs:vdev_readable+e (0, 1, 0, fe96c13d)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3704 zfs:vdev_mirror_child_select+55 (dedc6560, 1, 0, f92)
Mar 15 21:42:50 solaris1.local genunix: [ID 353471 kern.notice] d76d3744 zfs:vdev_mirror_io_start+b3 (dedc6560, 1
Re: [zfs-discuss] Possible newbie question about space between zpool and zfs file systems
Hi Cindy,

Trying to reproduce this:

> For a RAIDZ pool, the zpool list command identifies the "inflated" space
> for the storage pool, which is the physical available space without an
> accounting for redundancy overhead.
>
> The zfs list command identifies how much actual pool space is available
> to the file systems.

I am lacking 1 TB on my pool:

u...@filemeister:~$ zpool list daten
NAME   SIZE  ALLOC  FREE   CAP  DEDUP  HEALTH  ALTROOT
daten  10T   3,71T  6,29T  37%  1.00x  ONLINE  -
u...@filemeister:~$ zpool status daten
  pool: daten
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        daten         ONLINE       0     0     0
          raidz2-0    ONLINE       0     0     0
            c10t2d0   ONLINE       0     0     0
            c10t3d0   ONLINE       0     0     0
            c10t4d0   ONLINE       0     0     0
            c10t5d0   ONLINE       0     0     0
            c10t6d0   ONLINE       0     0     0
            c10t7d0   ONLINE       0     0     0
            c10t8d0   ONLINE       0     0     0
            c10t9d0   ONLINE       0     0     0
            c11t18d0  ONLINE       0     0     0
            c11t19d0  ONLINE       0     0     0
            c11t20d0  ONLINE       0     0     0
        spares
          c11t21d0    AVAIL

errors: No known data errors
u...@filemeister:~$ zfs list daten
NAME   USED   AVAIL  REFER  MOUNTPOINT
daten  3,01T  4,98T  110M   /daten

I am counting 11 disks of 1 TB each in a raidz2 pool. That is 11 TB gross capacity, and 9 TB net. zpool is however stating 10 TB, and zfs is stating 8 TB. The difference between net and gross is correct, but where is the capacity from the 11th disk going?

Regards,

Tonmaus
Re: [zfs-discuss] Possible newbie question about space between zpool and zfs file systems
Tonmaus wrote:
> I am lacking 1 TB on my pool:
> [...]
> I am counting 11 disks 1 TB each in a raidz2 pool. This is 11 TB gross
> capacity, and 9 TB net. Zpool is however stating 10 TB and zfs is stating
> 8TB. The difference between net and gross is correct, but where is the
> capacity from the 11th disk going?

My guess is unit conversion and rounding. Your pool has 11 base-10 TB, which is 10.2445 base-2 TiB. Likewise your fs has 9 base-10 TB, which is 8.3819 base-2 TiB.

--
Carson
Re: [zfs-discuss] Possible newbie question about space between zpool and zfs file systems
On Mon, 2010-03-15 at 15:03 -0700, Tonmaus wrote:
> I am lacking 1 TB on my pool:
> [...]
> I am counting 11 disks 1 TB each in a raidz2 pool. This is 11 TB gross
> capacity, and 9 TB net. Zpool is however stating 10 TB and zfs is stating
> 8TB. The difference between net and gross is correct, but where is the
> capacity from the 11th disk going?

"1TB" disks aren't a terabyte. Remember, the storage industry uses powers of 10, not 2. It's annoying. For each GB, you lose 7% in the actual space computation; for each TB, it's about 9%. So your "1TB" disk is actually about 931 GB. 'zfs list' is going to report in actual powers of 2, just like df.

In my case, I have a 12 x 1TB configuration, and zpool list shows:

# zpool list
NAME       SIZE   USED   AVAIL  CAP  HEALTH  ALTROOT
array2540  10.9T  5.46T  5.41T  50%  ONLINE  -

Likewise:

# zfs list
NAME       USED   AVAIL  REFER  MOUNTPOINT
array2540  4.53T  4.34T  80.4M  /data

So, here's the math:

1 "storage TB" = 1e12 / (1024^3) = 931 actual GB
931 GB x 12 = 11,172 GB
but 1 TB = 1024 GB, so:
931 GB x 12 / 1024 = 10.9 TB

Quick math:
1 TB of advertised space = 0.91 TB of real space
1 GB of advertised space = 0.93 GB of real space

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Re: [zfs-discuss] Posible newbie question about space between zpool and zfs file systems
On Mon, 2010-03-15 at 15:40 -0700, Carson Gaspar wrote:
> Tonmaus wrote:
>
> > I am lacking 1 TB on my pool:
> >
> > u...@filemeister:~$ zpool list daten
> > NAME    SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
> > daten    10T  3,71T  6,29T   37%  1.00x  ONLINE  -
> > u...@filemeister:~$ zpool status daten
> >   pool: daten
> >  state: ONLINE
> >  scrub: none requested
> > config:
> >
> >         NAME          STATE     READ WRITE CKSUM
> >         daten         ONLINE       0     0     0
> >           raidz2-0    ONLINE       0     0     0
> >             c10t2d0   ONLINE       0     0     0
> >             c10t3d0   ONLINE       0     0     0
> >             c10t4d0   ONLINE       0     0     0
> >             c10t5d0   ONLINE       0     0     0
> >             c10t6d0   ONLINE       0     0     0
> >             c10t7d0   ONLINE       0     0     0
> >             c10t8d0   ONLINE       0     0     0
> >             c10t9d0   ONLINE       0     0     0
> >             c11t18d0  ONLINE       0     0     0
> >             c11t19d0  ONLINE       0     0     0
> >             c11t20d0  ONLINE       0     0     0
> >         spares
> >           c11t21d0    AVAIL
> >
> > errors: No known data errors
> > u...@filemeister:~$ zfs list daten
> > NAME    USED  AVAIL  REFER  MOUNTPOINT
> > daten  3,01T  4,98T   110M  /daten
> >
> > I am counting 11 disks of 1 TB each in a raidz2 pool. This is 11 TB
> > gross capacity, and 9 TB net. zpool is however stating 10 TB and zfs
> > is stating 8 TB. The difference between net and gross is correct,
> > but where is the capacity from the 11th disk going?
>
> My guess is unit conversion and rounding. Your pool has 11 base 10 TB,
> which is 10.2445 base 2 TiB.
>
> Likewise your fs has 9 base 10 TB, which is 8.3819 base 2 TiB.

Not quite.

11 x 10^12 =~ 10.004 x (1024^4).

So, the 'zpool list' is right on, at "10T" available.

For the 'zfs list', remember there is a slight overhead for filesystem formatting. So, instead of 9 x 10^12 =~ 8.185 x (1024^4), it shows 7.99TB usable. The roughly 200 GB difference is the overhead (or, about 3%).

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)
Re: [zfs-discuss] corruption of ZFS on iScsi storage
> Being an iscsi target, this volume was mounted as a single iscsi disk
> from the solaris host, and prepared as a zfs pool consisting of this
> single iscsi target. ZFS best practices tell me that to be safe in
> case of corruption, pools should always be mirrors or raidz on 2 or
> more disks. In this case, I considered all safe, because the mirror
> and raid was managed by the storage machine.

As far as I understand the Best Practices, redundancy needs to be within ZFS in order to provide full protection. So, actually, the Best Practices say that your scenario is rather one to be avoided.

Regards,

Tonmaus
--
This message posted from opensolaris.org
Re: [zfs-discuss] Posible newbie question about space between zpool and zfs file systems
> > My guess is unit conversion and rounding. Your pool has 11 base 10
> > TB, which is 10.2445 base 2 TiB.
> >
> > Likewise your fs has 9 base 10 TB, which is 8.3819 base 2 TiB.
>
> Not quite.
>
> 11 x 10^12 =~ 10.004 x (1024^4).
>
> So, the 'zpool list' is right on, at "10T" available.

Duh! I completely forgot about this. Thanks for the heads-up.

Tonmaus
Re: [zfs-discuss] Posible newbie question about space between zpool and zfs file systems
Someone wrote (I haven't seen the mail, only the unattributed quote):

> > My guess is unit conversion and rounding. Your pool has 11 base 10
> > TB, which is 10.2445 base 2 TiB.
> >
> > Likewise your fs has 9 base 10 TB, which is 8.3819 base 2 TiB.
>
> Not quite.
>
> 11 x 10^12 =~ 10.004 x (1024^4).
>
> So, the 'zpool list' is right on, at "10T" available.

Duh, I was doing GiB math (y = x * 10^9 / 2^30), not TiB math (y = x * 10^12 / 2^40). Thanks for the correction.

--
Carson
Re: [zfs-discuss] corruption of ZFS on iScsi storage
On Mon, Mar 15, 2010 at 9:55 AM, Gabriele Bulfon wrote:
> Hello,
> I'd like to check for any guidance about using zfs on iscsi storage
> appliances.
> Recently I had an unlucky situation with an unlucky storage machine
> freezing. Once the storage was up again (rebooted), all other iscsi
> clients were happy, while one of the iscsi clients (a sun solaris
> sparc, running Oracle) did not mount the volume, marking it as
> corrupted. I had no way to get back my zfs data: had to destroy and
> recreate from backups.
> So I have some questions regarding this nice story:
> - I remember sysadmins being able to almost always recover data on
> corrupted ufs filesystems by magic of superblocks. Is there something
> similar on zfs? Is there really no way to access data of a corrupted
> zfs filesystem?
> - In this case, the storage appliance is a legacy system based on
> linux, so raids/mirrors are managed at the storage side its own way.
> Being an iscsi target, this volume was mounted as a single iscsi disk
> from the solaris host, and prepared as a zfs pool consisting of this
> single iscsi target. ZFS best practices tell me that to be safe in
> case of corruption, pools should always be mirrors or raidz on 2 or
> more disks. In this case, I considered all safe, because the mirror
> and raid was managed by the storage machine. But from the solaris
> host point of view, the pool was just one! And maybe this has been
> the point of failure. What is the correct way to go in this case?
> - Finally, looking forward to running new storage appliances using
> OpenSolaris and its ZFS+iscsitadm and/or comstar, I feel a bit
> confused by the possibility of having a double zfs situation: in this
> case, I would have the storage zfs filesystem divided into zfs
> volumes, accessed via iscsi by a possible solaris host that creates
> his own zfs pool on it (...is it too redundant??) and again I would
> fall in the same previous case (host zfs pool connected to one only
> iscsi resource).
> Any guidance would be really appreciated :)
> Thanks a lot
> Gabriele.

To answer the other portion of your question: yes, you can roll back ZFS if you're at the proper version. The procedure is described at the link below; essentially it will try to find the last known good transaction. If that doesn't work, your only remaining option is to restore from backup:

http://docs.sun.com/app/docs/doc/817-2271/gbctt?l=ja&a=view

--Tim
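For the archives, the rollback Tim describes looks roughly like this session (a sketch, assuming a ZFS version recent enough to have pool-recovery support for zpool import; the pool name "tank" is illustrative):

```shell
# Hypothetical recovery of a pool that fails to import after an
# ungraceful storage outage.

# Dry run: report whether discarding the last few transactions would
# make the pool importable, without actually doing it.
zpool import -Fn tank

# Attempt the rollback for real; the most recent writes are lost, but
# the pool returns to its last consistent transaction group.
zpool import -F tank

# Verify the result and look for any lingering damage.
zpool status -v tank
```

If even -F cannot find an intact transaction group, restoring from backup remains the only option, as Tim says.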
Re: [zfs-discuss] persistent L2ARC
On Mon, Mar 15, 2010 at 5:39 PM, Abdullah Al-Dahlawi wrote:
> Greeting ALL
>
> I understand that L2ARC is still under enhancement. Does any one know
> if ZFS can be upgraded to include "Persistent L2ARC", i.e. L2ARC will
> not lose its contents after system reboot?

There is a bug opened for that, but it doesn't seem to be implemented yet.

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6662467

--
Giovanni
Re: [zfs-discuss] corruption of ZFS on iScsi storage
On Mar 15, 2010, at 7:11 PM, Tonmaus wrote:
>> Being an iscsi target, this volume was mounted as a single iscsi
>> disk from the solaris host, and prepared as a zfs pool consisting of
>> this single iscsi target. ZFS best practices tell me that to be safe
>> in case of corruption, pools should always be mirrors or raidz on 2
>> or more disks. In this case, I considered all safe, because the
>> mirror and raid was managed by the storage machine.
>
> As far as I understand Best Practices, redundancy needs to be within
> zfs in order to provide full protection. So, actually Best Practices
> say that your scenario is rather one to be avoided.

There is nothing saying redundancy can't be provided below ZFS; it's just that if you want auto recovery, you need redundancy within ZFS itself as well.

You can have 2 separate raid arrays served up via iSCSI to ZFS, which then makes a mirror out of the storage.

-Ross
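Ross's layout can be sketched like this (the device names are hypothetical iSCSI LUNs as they might appear under /dev/dsk on Solaris; each LUN is backed by a separate RAID array on the storage side):

```shell
# Two iSCSI targets, each exported by a different storage appliance,
# mirrored by ZFS so checksum errors on one side can be repaired from
# the other.
zpool create tank mirror \
    c2t600144F0ABCD0001d0 \
    c3t600144F0ABCD0002d0

# ZFS now both detects and repairs corruption: a block that fails its
# checksum on one LUN is rewritten from the mirror copy.
zpool status tank
```

The arrays still provide protection against individual disk failures below ZFS; the ZFS-level mirror is what restores the self-healing that a single-LUN pool gives up.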
Re: [zfs-discuss] corruption of ZFS on iScsi storage
On Mon, Mar 15, 2010 at 9:10 PM, Ross Walker wrote:
> On Mar 15, 2010, at 7:11 PM, Tonmaus wrote:
>
>>> Being an iscsi target, this volume was mounted as a single iscsi
>>> disk from the solaris host, and prepared as a zfs pool consisting
>>> of this single iscsi target. ZFS best practices tell me that to be
>>> safe in case of corruption, pools should always be mirrors or raidz
>>> on 2 or more disks. In this case, I considered all safe, because
>>> the mirror and raid was managed by the storage machine.
>>
>> As far as I understand Best Practices, redundancy needs to be within
>> zfs in order to provide full protection. So, actually Best Practices
>> say that your scenario is rather one to be avoided.
>
> There is nothing saying redundancy can't be provided below ZFS; it's
> just that if you want auto recovery you need redundancy within ZFS
> itself as well.
>
> You can have 2 separate raid arrays served up via iSCSI to ZFS which
> then makes a mirror out of the storage.
>
> -Ross

Perhaps I'm remembering incorrectly, but I didn't think mirroring would auto-heal/recover; I thought that was limited to the raidz* implementations.

--Tim
Re: [zfs-discuss] corruption of ZFS on iScsi storage
On Mar 15, 2010, at 11:10 PM, Tim Cook wrote:
> On Mon, Mar 15, 2010 at 9:10 PM, Ross Walker wrote:
>> There is nothing saying redundancy can't be provided below ZFS; it's
>> just that if you want auto recovery you need redundancy within ZFS
>> itself as well.
>>
>> You can have 2 separate raid arrays served up via iSCSI to ZFS which
>> then makes a mirror out of the storage.
>
> Perhaps I'm remembering incorrectly, but I didn't think mirroring
> would auto-heal/recover; I thought that was limited to the raidz*
> implementations.

Mirroring auto-heals; in fact, copies=2 on a single-disk vdev can auto-heal (if it isn't a disk failure).

-Ross
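The copies=2 case Ross mentions can be sketched like this (the dataset name is illustrative); ZFS then stores two copies of every data block in that dataset, so even on a single-disk vdev a corrupt block can be healed from its duplicate, though a whole-disk failure still takes the pool with it:

```shell
# Keep two copies of each data block in the hypothetical dataset
# tank/important, even on a pool with a single-disk vdev.
zfs set copies=2 tank/important

# A later scrub can then repair blocks whose primary copy fails its
# checksum, using the second copy.
zpool scrub tank
zpool status -v tank
```

Note that copies only applies to data written after the property is set; existing blocks keep the copy count they were written with.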
Re: [zfs-discuss] How to manage scrub priority or defer scrub?
On Mar 14, 2010, at 11:25 PM, Tonmaus wrote:
> Hello again,
>
> I am still concerned if my points are being well taken.
>
>> If you are concerned that a single 200TB pool would take a long time
>> to scrub, then use more pools and scrub in parallel.
>
> The main concern is not scrub time. Scrub time could be weeks if
> scrub just would behave. You may imagine that there are applications
> where segmentation is a pain point, too.

I agree.

>> The scrub will queue no more than 10 I/Os at one time to a device,
>> so devices which can handle concurrent I/O are not consumed entirely
>> by scrub I/O. This could be tuned lower, but your storage is slow
>> and *any* I/O activity will be noticed.
>
> There are a couple of things I maybe don't understand, then.
>
> - zpool iostat is reporting more than 1k of outputs while scrub ok
> - throughput is as high as can be until maxing out CPU

You would rather your CPU be idle? What use is an idle CPU, besides wasting energy :-)?

> - nominal I/O capacity of a single device is still around 90, how can
>   10 I/Os already bring down payload

90 IOPS is approximately the worst-case rate for a 7,200 rpm disk with a small, random workload. ZFS tends to write sequentially, so "random writes" tend to become "sequential writes" on ZFS. So it is quite common to see scrub workloads with >> 90 IOPS.

> - scrubbing the same pool, configured as raidz1, didn't max out CPU,
>   which is no surprise (haha, slow storage...); the notable part is
>   that it didn't slow down payload that much either.

raidz creates more, smaller writes than a mirror or simple stripe. If the disks are slow, then the IOPS will be lower and the scrub takes longer, but the I/O scheduler can manage the queue better (disks are slower).

> - scrub is obviously fine with data added or deleted during a pass.
>   So, it could be possible to pause and resume a pass, couldn't it?

You can start or stop scrubs; there is no resume directive.
There are several bugs/RFEs along these lines, something like:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6743992

> My conclusion from these observations is that not only disk speed
> counts here, but other bottlenecks may strike as well. Solving the
> issue by the wallet is one way, solving it by configuration of
> parameters is another. So, is there a lever for scrub I/O prio, or
> not? Is there a possibility to pause scrub passes and resume?

Scrub is already the lowest priority. Would you like it to be lower? I think the issue is more related to which queue is being managed by the ZFS priority scheduler rather than the lack of scheduling priority.

-- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Atlanta, March 16-18, 2010 http://nexenta-atlanta.eventbrite.com
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com