Re: [zfs-discuss] zpool resilver - error history
Marcel Gschwandl wrote:
> Hi all! I'm running a Solaris 10 Update 6 (10/08) system and had to resilver a zpool. It's now showing
> <snip> scrub: resilver completed after 9h0m with 21 errors on Wed Nov 4 22:07:49 2009 </snip>
> but I haven't found an option to see what files were affected. Is there any way to do that?
> Thanks in advance, Marcel

Try zpool status -v poolname
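When the errors do map to damaged files, the -v output lists them at the end. A hedged illustration of the shape of that output (the pool name and path are examples, not from this system):

  # zpool status -v tank
  ...
  errors: Permanent errors have been detected in the following files:
          tank/home:/export/home/user/file.dat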
Re: [zfs-discuss] sparc + zfs + nfs + mac osX = fail ?
hi folks, i'm seeing an odd problem and wondered whether others had encountered it. when i try to write to a nevada NFS share from a mac os X (10.5) client via the mac's GUI, i get a permissions error - the file is 0 bytes, date set to jan 1, 1970, and perms set to 000. writing to the share via the command line works fine, so it's not a normal permissions problem. here's the weird thing:

a) mac os X 10.4, sparc snv_84, zfs filesystem shared via NFS = no problem
b) mac os X 10.5, sparc snv_84, zfs = doesn't work
c) mac os X 10.5, sparc snv_84, UFS = no problem
d) mac os X 10.5, x86 snv_115, ZFS = no problem
e) mac os X 10.5, sparc snv_125, ZFS = doesn't work

i haven't yet tried sparc snv_125 + UFS, but i'm wondering if there's anyone out here with a working combination of: mac os X 10.5, sparc snv_120+, ZFS, NFS? i thought at first it was a problem with the mac 10.5 nfs client, but then i'd expect (c) and (d) to fail, too. i've tried existing shares, new shares, zfs-based sharing, straight share(1m) sharing, root=mac_client - none have made much difference. it's all been NFSv3. snooping the communication hasn't yielded much that's obvious. any thoughts/suggestions/wisdom?

following up on this, i can confirm the following:

x64, zfs, snv_126 = ok
sparc, zfs, snv_125 = fail
sparc, ufs, snv_125 = ok

thanks for the help offered so far; i've been shown the following by macko:

[...] OK, it looks like e_nfs_gui has the failure case and it is failing because the server for some reason is not correctly setting the mode of the new file when it is created exclusively. At Packet #87, we see an exclusive CREATE of pwl_standard.jpg followed immediately with a SETATTR call (#94) that specifies the new file's mode should be 0644. In packet #95, the server reports that the SETATTR call succeeded, but the new attributes returned for the file show that the mode is still . And subsequent ACCESS requests on that file report that no access is allowed. The same thing happens again starting at packet #188, but in that case the mode being set is 0666. But the result is the same: the server reports that the SETATTR succeeds, but the attributes show that the mode is still set to .

In e_nfs_cli and n_nfs_cli, the CREATE attempt is not done exclusively and the problem does not happen. The non-exclusive (unchecked) create succeeds and the mode is set to the mode passed in the CREATE call. In n_nfs_gui, the CREATE attempt is done exclusively like in e_nfs_gui; however, the client's attempt to set the mode via SETATTR does succeed and the new attributes show the new mode.

The difference between e_nfs_gui and n_nfs_gui appears to be in the NFS server's handling of the SETATTR request that follows the exclusive CREATE request. The client attempts to set mode=0644, uid=36493, and gid=0. The gid=0 is because the directory's gid=0 and the Mac VFS layer considers the default behavior to be to copy the directory's gid to the child. Some servers may balk at this if the credential isn't a member of the group, but the Mac NFS client will then attempt the SETATTR again without setting the uid/gid. This is what happens in n_nfs_gui. Strangely, in the e_nfs_gui trace the SETATTR request does appear to set the gid=0 successfully even though the mode seems unchanged. [...]
so - something weird is happening when an nfs call is made on a zfs filesystem to do an exclusive create + setattr. given the places where this succeeds and fails, this is starting to look like a zfs (sparc) bug - but i'm happy to be shown that it's some sort of global settings problem instead. any further suggestions? i can provide snoops/tcpdumps if anyone is interested.
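For anyone who wants to try reproducing this without the Finder, a minimal hedged sketch. The mount point is an example, and it assumes the shell's noclobber redirection opens with O_CREAT|O_EXCL (bash and ksh93 do), which the NFSv3 client should turn into an exclusive CREATE followed by a SETATTR, the same sequence macko describes:

  set -C                           # noclobber: create with O_EXCL
  > /mnt/share/testfile            # exclusive create over NFS
  chmod 644 /mnt/share/testfile    # the SETATTR that appears to get lost
  ls -l /mnt/share/testfile        # mode should be 0644, not 000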
Re: [zfs-discuss] sparc + zfs + nfs + mac osX = fail ?
i wonder whether this is related to an itunes update - it tends to fiddle about in the library for a bit when you install? (??)
Re: [zfs-discuss] Quick drive slicing madness question
Darren J Moffat darr...@opensolaris.org writes:
> Mauricio Tavares wrote:
>> If I have a machine with two drives, could I create equal size slices on the two disks, set them up as boot pool (mirror) and then use the remaining space as a striped pool for other more wasteful applications?
> You could, but why bother? Why not just create one mirrored pool?

you get half the space available... even if you don't forgo redundancy and use mirroring on both slices, you can't extend the data pool later.

> Having two pools on the same disk (or mirroring to the same disk) is asking for performance pain if both are being written to heavily.

not too common with heavy writing to rpool, is it? the main source of writing is syslog, I guess.

-- 
Kjetil T. Homme
Redpill Linpro AS - Changing the game
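A sketch of the layout being discussed, assuming hypothetical device names and that slice 0/1 on each disk were already laid out with format(1m). Note the root pool is normally built by the installer, so this is illustrative only:

  zpool create rpool mirror c0t0d0s0 c0t1d0s0   # mirrored boot pool
  zpool create data c0t0d0s1 c0t1d0s1           # striped pool on the remaining slices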
Re: [zfs-discuss] RAID-Z and virtualization
Tim Cook wrote:
> On Sun, Nov 8, 2009 at 2:03 AM, besson3c j...@netmusician.org wrote:
>
>> I'm entertaining something which might be a little wacky; I'm wondering what your general reaction to this scheme might be :)
>>
>> I would like to invest in some sort of storage appliance, and I like the idea of something I can grow over time, something that isn't tethered to my servers (i.e. not direct attach), as I'd like to keep this storage appliance beyond the life of my servers. Therefore, a RAID 5 or higher type setup in a separate 2U chassis is attractive to me.
>>
>> I do a lot of virtualization on my servers, and currently my VM host is running VMWare Server. It seems like the way forward is software-based RAID with a sophisticated file system such as ZFS or BTRFS, rather than a hardware RAID card and a dumber file system. I really like what ZFS brings to the table in terms of RAID-Z and more, so I'm thinking that it might be smart to skip getting a hardware RAID card and jump into using ZFS. The obvious problem at this point is that ZFS is not available for Linux yet, and BTRFS is not yet ready for production usage. So, I'm exploring some options. One option is to just get that RAID card and reassess all of this when BTRFS is ready, but the other option is the following...
>>
>> What if I were to run a FreeBSD VM, present it several vdisks, format these as ZFS, and serve up ZFS shares through this VM? I realize that I'm only getting the sort of userland conveniences of ZFS this way, since the host would still be writing to an EXT3/4 volume, but on the other hand perhaps these conveniences and other benefits would be worthwhile? What would I be missing out on, despite no assurances of the same integrity given the underlying EXT3/4 volume? What do you think, would setting up a VM solely for hosting ZFS shares be worth my while as a sort of bridge to BTRFS? I realize that I'd have to allocate a lot of RAM to this VM, and I'm prepared to do that.
>>
>> Is this idea retarded? Something you would recommend or do yourself? All of this convenience is pointless if there will be significant problems; I would like to eventually serve production servers this way. Fairly low volume ones, but still important to me.
>
> Why not just convert the VM's to run in virtualbox and run Solaris directly on the hardware?

That's another possibility, but it depends on how Virtualbox stacks up against VMWare Server. At this point a lot of planning would be necessary to switch to something else, although this is a possibility.

How would Virtualbox stack up against VMWare Server? Last I checked it doesn't have a remote console of any sort, which would be a deal breaker. Can I disable allocating virtual memory to Virtualbox VMs? Can I get my VMs to auto boot in a specific order at runlevel 3? Can I control my VMs via the command line? I thought Virtualbox was GUI only, designed primarily for Desktop use?

This switch will only make sense if all of this points to a net positive.

-- 
Joe Auty
NetMusician: web publishing software for musicians
http://www.netmusician.org
j...@netmusician.org
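For what the FreeBSD-VM idea would look like mechanically, a hedged sketch (the da1/da2/da3 device names are whatever the hypervisor happens to present to the guest; this shows the mechanics, not an endorsement of ZFS on virtual disks):

  # inside the FreeBSD guest, over three virtual disks:
  zpool create tank raidz da1 da2 da3
  zfs create tank/shares
  zfs set sharenfs=on tank/shares   # FreeBSD wires this through mountd/exports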
Re: [zfs-discuss] RAID-Z and virtualization
Erik Ableson wrote:
> Uhhh - for an unmanaged server you can use ESXi for free. Identical server functionality; it just requires licenses if you need multiserver features (i.e. vMotion).
> Regards, Erik Ableson

How does ESXi w/o vMotion, vSphere, and vCenter server stack up against VMWare Server? My impression was that you need these other pieces to make such an infrastructure useful?

On 8 Nov 2009, at 19:12, Tim Cook t...@cook.ms wrote:
> On Sun, Nov 8, 2009 at 11:48 AM, Joe Auty j...@netmusician.org wrote:
>> It appears that one can get more in the way of features out of VMWare Server for free than with ESX, which is seemingly a hook into buying more VMWare stuff. I've never looked at Sun xVM, in fact I didn't know it even existed, but I do now. Thank you, I will research this some more! The only other variable, I guess, is the future of said technologies given the Oracle takeover? There has been much discussion on how this impacts ZFS, but I'll have to learn how xVM might be affected, if at all.
>
> Quite frankly, I wouldn't let that stop you. Even if Oracle were to pull the plug on xVM entirely (not likely), you could very easily just move the VM's back over to *insert your favorite flavor of Linux* or Citrix Xen. Including Unbreakable Linux (Oracle's version of RHEL).

I remember now why Xen was a no-go when I last tested it. I rely on the 64 bit version of FreeBSD for most of my VM guest machines, and FreeBSD only supports running as domU on i386 systems. This is a monkey wrench! Sorry, just thinking out loud here...

> I have no idea what it supports right now. I can't even find a decent support matrix. Quite frankly, I would (and do) just use a separate server for the fileserver than the vm box. You can get 64bit cpu's with 4GB of ram for awfully cheap nowadays. That should be more than enough for most home workloads. --Tim

-- 
Joe Auty
NetMusician: web publishing software for musicians
http://www.netmusician.org
j...@netmusician.org
Re: [zfs-discuss] zpool resilver - error history
On 09/11/2009 09:57, Thomas Maier-Komor tho...@maier-komor.de wrote:
> Try zpool status -v poolname

I already tried that, it only gives me
<snip> errors: No known data errors </snip>
During the resilver it showed me some files, but not after finishing it. Thanks anyway
Re: [zfs-discuss] Accidentally mixed-up disks in RAIDZ
No, Chris, I didn't export the pool because I didn't expect this to happen. It's an excellent suggestion, so I'll try it when I get my hands on the machine. Thank you.
Leandro.

From: Chris Murray chrismurra...@gmail.com
To: Leandro Vanden Bosch l_vbo...@yahoo.com.ar
Sent: Saturday, 7 November 2009 19:13:33
Subject: RE: [zfs-discuss] Accidentally mixed-up disks in RAIDZ

> Did you export the pool before unplugging the drives? I've had occasions in the past where ZFS does get mixed-up if the machine is powered up with the drives of a currently imported pool in the wrong order. The solution in the end was to power up without the drives, export the pool, and then import. Chris

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Leandro Vanden Bosch
Sent: 07 November 2009 20:28
To: Tim Cook
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Accidentally mixed-up disks in RAIDZ

Thanks Tim for your answer! I'll try it in a few hours and post the outcome.
Regards, Leandro.

From: Tim Cook t...@cook.ms
To: Leandro Vanden Bosch l_vbo...@yahoo.com.ar
CC: zfs-discuss@opensolaris.org
Sent: Saturday, 7 November 2009 16:40:55
Subject: Re: [zfs-discuss] Accidentally mixed-up disks in RAIDZ

> On Sat, Nov 7, 2009 at 1:38 PM, Leandro Vanden Bosch l_vbo...@yahoo.com.ar wrote:
>> Hello to you all. Here's the situation: while doing a case replacement in my home storage server, I accidentally removed the post-it with the disk number from my three 1TB disks before connecting them back to the corresponding SATA connector. The issue now is that I don't know in which order they should be connected. Do any of you know how I can _safely_ bring the zpool on-line? I haven't plugged them in yet because I'm afraid of losing some valuable personal information. Thanks in advance. Leandro.
>
> Of course, it doesn't matter which drive is plugged in where. When you import a pool, zfs scans the headers of each disk to verify if they're part of a pool or not, and if they are, does the import. --Tim
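If the pool still comes up confused after the drives are reshuffled, the sequence Chris describes looks roughly like this (hedged; 'tank' is a stand-in for the real pool name):

  zpool export tank    # forget the cached device paths
  zpool import         # with no argument: scan all disks and list importable pools
  zpool import tank    # import by name, in whatever order the disks now sit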
[zfs-discuss] PSARC recover files?
This new PSARC putback that allows rolling back to an earlier valid uberblock is good. This immediately raises a question: could we use this PSARC functionality to recover deleted files? Or some variation? I don't need that functionality now, but I am just curious...
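For reference, my understanding is that the recovery support surfaces as new zpool import options (hedged, based on the snv_128-era putback; 'tank' is an example name):

  zpool import -nF tank   # dry run: report how far back a rewind would go
  zpool import -F tank    # actually rewind to the last usable txg/uberblock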
Re: [zfs-discuss] zpool resilver - error history
On Mon, 9 Nov 2009, Gschwandl Marcel HSLU TA wrote:
>> zpool status -v poolname
> I already tried that, it only gives me
> <snip> errors: No known data errors </snip>

Errors do not necessarily cause data loss. For example, there may have been sufficient redundancy that the error was able to be automatically repaired, and so there was no data loss. Metadata always has a redundant copy, and if you are using something like raidz2, then your data still has a redundant copy while resilvering a disk.

Bob
-- 
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
[zfs-discuss] can't delete a zpool
OpenSolaris 2009.06. I have a ST2540 Fiber Array directly attached to a X4150. There is a zpool on the fiber device. The zpool went into a faulted state, but I can't seem to get it back via scrub, or even delete it. Do I have to re-install the entire OS if I want to use that device again?

Thanks, Mike

# zpool list
NAME        SIZE   USED  AVAIL   CAP  HEALTH   ALTROOT
fc-disk       -      -      -     -   FAULTED  -
rpool        68G  23.3G  44.7G   34%  ONLINE   -
scsi-disk   544G    97K   544G    0%  ONLINE   -

# zpool status fc-disk
  pool: fc-disk
 state: UNAVAIL
status: One or more devices could not be used because the label is missing or
        invalid. There are insufficient replicas for the pool to continue
        functioning.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-5E
 scrub: none requested
config:

        NAME                                 STATE     READ WRITE CKSUM
        fc-disk                              UNAVAIL      0     0     0  insufficient replicas
          c0t600A0B8000389BC904524A7AF4BAd0  UNAVAIL      0     0     0  corrupted data

# zpool destroy fc-disk
internal error: Invalid argument
Abort (core dumped)
r...@vdi-storage:~#
Re: [zfs-discuss] PSARC recover files?
frequent snapshots offer outstanding oops protection. Rob
Re: [zfs-discuss] can't delete a zpool
I had the same problem recently on b125. I had a one-disc zpool "Movies" and shut down the computer, removed the "Movies" disc and inserted another one-disc zpool, "Misc". I booted and imported the "Misc" zpool, but the "Movies" zpool showed exactly the same behaviour as you report: it would not be imported, nor destroyed. I don't remember how I solved the problem, but I think I inserted the "Movies" zpool disc again and then exported it before removing the disc. Or something similar. Maybe you could try to dd the disc with zeroes and then create a new zpool?
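If it comes to that, a hedged sketch of the dd route (destructive, and the device name is an example). ZFS keeps two labels at the front of the device and two at the end, so the front has to be zeroed, and ideally the tail as well:

  # wipe the front of the disk (covers labels L0/L1):
  dd if=/dev/zero of=/dev/rdsk/c1t1d0p0 bs=1024k count=100
  # labels L2/L3 live in the last 512KB, so seek near the end too
  # (compute the offset from the device size before running that pass)
  zpool create Movies c1t1d0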
Re: [zfs-discuss] PSARC recover files?
Maybe to create snapshots after the fact as a part of some larger disaster recovery effort. (What did my pool/file-system look like at 10am?... Say 30 minutes before the database barfed on itself...)

With some enhancements, might this functionality be extendable into a poor man's CDP offering that won't protect against (non-redundant) hardware failures, but can provide some relief against app/human creativity? Seems like one of those things you never really need... until you need it that one time, at which point nothing else will do.

One would think that using zdb and friends it might be possible to walk the chain of tx-logs backwards, and each good/whole one could be a valid recover/reset-point.

--

This raises a more fundamental question that perhaps someone can comment on. Does ZFS's COW follow a fairly strict last-released-block, last-overwritten model (keeping a maximum buffer of intact data), or do previously used blocks get overwritten largely based on block/physical location, fragmentation/best-fit, etc.? In cases of blank disks/LUNs, does for instance a 1TB drive get completely COW-ed onto its blank space, or does zfs re-use previously used (and freed) space before burning through the entire disk space?

Thanks,
-- MikeE
Re: [zfs-discuss] MPxIO and removing physical devices
I'm not sure if this is exactly what you're looking for, but check out the workaround in this bug:
http://bugs.opensolaris.org/view_bug.do;jsessionid=9011b9dacffa0b615db182bbcd7b?bug_id=6559281

Basically, look through cfgadm -al and run the following command on the unusable attachment points. Example:

cfgadm -o unusable_FCP_dev -c unconfigure c2::5005076801400525

You might also try the Storage-Discuss list.
-Alex

-Original Message-
From: Karl Katzke
Sent: Tuesday, November 03, 2009 3:11 PM
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] MPxIO and removing physical devices

I am a bit of a Solaris newbie. I have a brand spankin' new Solaris 10u8 machine (x4250) that is running an attached J4400 and some internal drives. We're using multipathed SAS I/O (enabled via stmsboot), so the device mount points have been moved off from their normal c0t5d0 to long strings -- in the case of c0t5d0, it's now /dev/rdsk/c6t5000CCA00A274EDCd0. (I can see the cross-referenced devices with stmsboot -L.)

Normally, when replacing a disk on a Solaris system, I would run cfgadm -c unconfigure c0::dsk/c0t5d0. However, cfgadm -l does not list c6, nor does it list any disks. In fact, running cfgadm against the places where I think things are supposed to live gets me the following:

bash# cfgadm -l /dev/rdsk/c0t5d0
Ap_Id                Type   Receptacle   Occupant   Condition
/dev/rdsk/c0t5d0: No matching library found
bash# cfgadm -l /dev/rdsk/c6t5000CCA00A274EDCd0
cfgadm: Attachment point not found
bash# cfgadm -l /dev/dsk/c6t5000CCA00A274EDCd0
Ap_Id                Type   Receptacle   Occupant   Condition
/dev/dsk/c6t5000CCA00A274EDCd0: No matching library found
bash# cfgadm -l c6t5000CCA00A274EDCd0
Ap_Id                Type   Receptacle   Occupant   Condition
c6t5000CCA00A274EDCd0: No matching library found

I ran devfsadm -C -v and it removed all of the old attachment points for the /dev/dsk/c0t5d0 devices and created some for the c6 devices. Running cfgadm -al shows a c0, c4, and c5 -- these correspond to the actual controllers, but no devices are attached to the controllers.

I found an old email on this list about MPxIO that said the solution was basically to yank the physical device after making sure that no I/O was happening to it. While this worked and allowed us to return the device to service as a spare in the zpool it inhabits, more concerning was what happened when we ran mpathadm list lu after yanking the device and returning it to service:

bash# mpathadm list lu
        /dev/rdsk/c6t5000CCA00A2A9398d0s2
                Total Path Count: 1
                Operational Path Count: 1
        /dev/rdsk/c6t5000CCA00A29EE2Cd0s2
                Total Path Count: 1
                Operational Path Count: 1
        /dev/rdsk/c6t5000CCA00A2BDBFCd0s2
                Total Path Count: 1
                Operational Path Count: 1
        /dev/rdsk/c6t5000CCA00A2A8E68d0s2
                Total Path Count: 1
                Operational Path Count: 1
        /dev/rdsk/c6t5000CCA00A0537ECd0s2
                Total Path Count: 1
                Operational Path Count: 1
mpathadm: Error: Unable to get configuration information.
mpathadm: Unable to complete operation

(Side note: Some of the disks are single path via an internal controller, and some of them are multi path in the J4400 via two external controllers.)

A reboot fixed the 'issue' with mpathadm and it now outputs complete data. So -- how do I administer and remove physical devices that are in multipath-managed controllers on Solaris 10u8 without breaking multipath and causing configuration changes that interfere with the services and devices attached via mpathadm and the other voodoo and black magic inside?
I can't seem to find this documented anywhere, even if the instructions to enable multipathing with stmsboot -e were quite complete and worked well!

Thanks,
Karl Katzke
-- 
Karl Katzke
Systems Analyst II
TAMU - RGS
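For reference, mapping the MPxIO names back to the pre-multipath ones is the stmsboot -L Karl mentions; a hedged illustration of the shape of its output (the values below are examples):

  # stmsboot -L
  non-STMS device name            STMS device name
  ------------------------------------------------
  /dev/rdsk/c0t5d0                /dev/rdsk/c6t5000CCA00A274EDCd0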
Re: [zfs-discuss] PSARC recover files?
> Maybe to create snapshots after the fact

how does one quiesce a drive after the fact?
Re: [zfs-discuss] RAID-Z and virtualization
On 8-Nov-09, at 12:20 PM, Joe Auty wrote:
> Tim Cook wrote:
>> Why not just convert the VM's to run in virtualbox and run Solaris directly on the hardware?
>
> That's another possibility, but it depends on how Virtualbox stacks up against VMWare Server. [...] Can I disable allocating virtual memory to Virtualbox VMs? Can I get my VMs to auto boot in a specific order at runlevel 3? Can I control my VMs via the command line?

Yes, you certainly can. It works well, even for GUI-based guests, as there is vm-level VRDP (VNC/Remote Desktop) access as well as whatever remote access the guest provides.

> I thought Virtualbox was GUI only, designed for Desktop use primarily?

Not at all. Read up on VBoxHeadless.

--Toby
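A hedged taste of the headless workflow (the VM name is an example, and exact flags vary a bit between VirtualBox releases):

  VBoxHeadless --startvm "myguest" &               # run the VM with no local GUI, remote console exposed
  VBoxManage controlvm "myguest" acpipowerbutton   # clean shutdown from the command line
  VBoxManage list runningvms                       # scriptable inventory, handy for ordered boots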
Re: [zfs-discuss] PSARC recover files?
+--
| On 2009-11-09 12:18:04, Ellis, Mike wrote:
|
| Maybe to create snapshots after the fact as a part of some larger disaster recovery effort.
| (What did my pool/file-system look like at 10am?... Say 30 minutes before the database barfed on itself...)
|
| With some enhancements might this functionality be extendable into a poor man's CDP offering that won't protect against (non-redundant) hardware failures, but can provide some relief against app/human creativity.

Alternatively, you can write a cronjob/service that takes snapshots of your important filesystems. I take hourly snaps of all our homedirs, and five-minute snaps of our database volumes (InnoDB and Postgres both recover adequately; I have used these snaps to build recovery zones to pull accidentally deleted data from before; good times).

Look at OpenSolaris' Time Slider service, although writing something that does this is pretty trivial (we use a Perl program with YAML configs launched by cron every minute). My one suggestion would be to ensure the automatically taken snaps have a unique name (@auto, or whatever), so you can do bulk expiry tomorrow or next week without worry.

Cheers.
-- 
bda
cyberpunk is dead. long live cyberpunk.
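A minimal sketch of the cron approach (dataset name, schedule, and retention are examples; the 'head -n -48' trick in the expiry half needs GNU coreutils, so treat that part as pseudo-portable):

  # crontab: hourly snapshot with a greppable @auto- prefix (% must be escaped in crontab)
  0 * * * * /usr/sbin/zfs snapshot tank/home@auto-`date +\%Y\%m\%d\%H\%M`

  # expiry: keep the newest 48 @auto- snaps, destroy the rest
  zfs list -H -t snapshot -o name -s creation | grep '^tank/home@auto-' \
      | head -n -48 | xargs -n 1 zfs destroy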
[zfs-discuss] ..and now ZFS send dedupe
More ZFS goodness putback before close of play for snv_128.
http://mail.opensolaris.org/pipermail/onnv-notify/2009-November/010768.html
http://hg.genunix.org/onnv-gate.hg/rev/216d8396182e
Regards
Nigel Smith
Re: [zfs-discuss] marvell88sx2 driver build126
Hi, I can't find any bug-related issues with marvell88sx2 in b126. I looked over Dave Hollister's shoulder while he searched for marvell in his webrevs of this putback and nothing came up:

> driver change with build 126?

Not for the SATA framework, but for HBAs there is:
http://hub.opensolaris.org/bin/view/Community+Group+on/2009093001

I will find a thumper, load build 125, create a raidz pool, and upgrade to b126. I'll also send the error messages that Tim provided to someone who works in the driver group.

Thanks, Cindy

On 11/07/09 14:33, Orvar Korvar wrote:
> I saw the same checksum error problem when I booted into b126. I haven't dared try b126 again; I use b125 now, without problems. Here is my hardware: Intel Q9450 + P45 Gigabyte EP45-DS3P motherboard + Ati 4850. I have the same AOC SATA controller card, and some Samsung Spinpoint F1 1TB drives. Brand new.
Re: [zfs-discuss] ..and now ZFS send dedupe
On Mon, Nov 9, 2009 at 12:45 PM, Nigel Smith nwsm...@wilusa.freeserve.co.uk wrote:
> More ZFS goodness putback before close of play for snv_128.
> http://mail.opensolaris.org/pipermail/onnv-notify/2009-November/010768.html
> http://hg.genunix.org/onnv-gate.hg/rev/216d8396182e
> Regards Nigel Smith

Are these recent developments due to help/support from Oracle? Or is it business as usual for ZFS developments?

-- 
Brent Jones
br...@servuhome.net
Re: [zfs-discuss] ..and now ZFS send dedupe
On 11/09/09 12:58, Brent Jones wrote:
> Are these recent developments due to help/support from Oracle?

No.

> Or is it business as usual for ZFS developments?

Yes.

- Eric
-- 
Eric Schrock, Fishworks
http://blogs.sun.com/eschrock
Re: [zfs-discuss] ..and now ZFS send dedupe
Interesting stuff. By the way, is there a place to watch the latest news like this on zfs/opensolaris? rss maybe?

-- 
Roman
Re: [zfs-discuss] ..and now ZFS send dedupe
Roman Naumenko wrote:
> Interesting stuff. By the way, is there a place to watch the latest news like this on zfs/opensolaris? rss maybe?

You could subscribe to onnv-not...@opensolaris.org...

James C. McPherson
-- 
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
Re: [zfs-discuss] zfs inotify?
I'd hoped this script would work for me as a snapshot diff script, but it seems that bart doesn't play well with large filesystems (don't know the cutoff, but my zfs pools (other than rpool) are all well over 4TB). 'bart create' fails immediately with a "Value too large for defined data type" error, and this is in fact mentioned in the Solaris 10 10/09 release notes:

  Possible Error With 32-bit Applications Getting File System State on Large File Systems (6468905)
  When run on large file systems, for example ZFS, applications using statvfs(2) or statfs(2) to get
  information about the state of the file system exhibit an error. The following error message is displayed:
  Value too large for defined data type
  Workaround: Applications should use statvfs64() instead.

from http://docs.sun.com/app/docs/doc/821-0381/gdzmr?l=ena=view

and in fact, if I invoke bart via truss, I see it calls statvfs() and fails. Way to keep up with the times, Sun! Is there a 64-bit version of bart, or a better recommendation for comparing snapshots? My current backup strategy uses rsync, which I'd like to replace with zfs send/receive, but I need a way to see what changed in the past day.

Thanks,
Andrew Daugherity
Systems Analyst
Division of Research & Graduate Studies
Texas A&M University

>>> Trevor Pretty trevor_pre...@eagle.co.nz 10/26/2009 5:16 PM >>>

Paul

Being a script hacker like you, the only kludge I can think of is a script that does something like:

  ls > /tmp/foo
  sleep
  ls > /tmp/foo.new
  diff /tmp/foo /tmp/foo.new > /tmp/files_that_have_changed
  mv /tmp/foo.new /tmp/foo

Or you might be able to knock something up with bart and zfs snapshots. I did write this, which may help?

#!/bin/sh
#set -x
# Note: No implied warranty etc. applies.
# Don't cry if it does not work. I'm an SE not a programmer!
#
####################################################
#
# Version 29th Jan. 2009
#
# GOAL: Show what files have changed between snapshots
#
# But of course it could be any two directories!!
#
####################################################

## Set some variables
#
SCRIPT_NAME=$0
FILESYSTEM=$1
SNAPSHOT=$2
FILESYSTEM_BART_FILE=/tmp/filesystem.$$
SNAPSHOT_BART_FILE=/tmp/snapshot.$$
CHANGED_FILES=/tmp/changes.$$

## Declare some commands (just in case PATH is wrong, like cron)
#
BART=/bin/bart

## Usage
#
Usage()
{
        echo ""
        echo ""
        echo "Usage: $SCRIPT_NAME -q filesystem snapshot"
        echo ""
        echo "-q will stop all echos and just list the changes"
        echo ""
        echo "Examples"
        echo "  $SCRIPT_NAME /home/fred /home/.zfs/snapshot/fred"
        echo "  $SCRIPT_NAME . /home/.zfs/snapshot/fred"
        echo ""
        echo ""
        exit 1
}

### Main Part ###

## Check Usage
#
if [ $# -ne 2 ]; then
        Usage
fi

## Check we have different directories
#
if [ "$1" = "$2" ]; then
        Usage
fi

## Handle dot
#
if [ "$FILESYSTEM" = "." ]; then
        cd "$FILESYSTEM" ; FILESYSTEM=`pwd`
fi
if [ "$SNAPSHOT" = "." ]; then
        cd "$SNAPSHOT" ; SNAPSHOT=`pwd`
fi

## Check the filesystems exist. Each should be a directory
# and it should have some files
#
for FS in $FILESYSTEM $SNAPSHOT
do
        if [ ! -d $FS ]; then
                echo ""
                echo "ERROR: file system $FS does not exist"
                echo ""
                exit 1
        fi
        if [ "X`/bin/ls $FS`" = "X" ]; then
                echo ""
                echo "ERROR: file system $FS seems to be empty"
                echo ""
                exit 1
        fi
done

## Create the bart files
#
echo ""
echo "Creating bart file for $FILESYSTEM - can take a while.."
cd $FILESYSTEM ; $BART create -R . > $FILESYSTEM_BART_FILE
echo ""
echo "Creating bart file for $SNAPSHOT - can take a while.."
cd $SNAPSHOT ; $BART create -R . > $SNAPSHOT_BART_FILE

## Compare them and report the diff
#
echo ""
echo "Changes"
echo ""
$BART compare -p $FILESYSTEM_BART_FILE $SNAPSHOT_BART_FILE | awk '{print $1}' > $CHANGED_FILES
/bin/more $CHANGED_FILES
echo ""
echo ""
echo ""

## Tidy kiwi
#
/bin/rm $FILESYSTEM_BART_FILE
/bin/rm $SNAPSHOT_BART_FILE
/bin/rm $CHANGED_FILES

exit 0
Re: [zfs-discuss] ..and now ZFS send dedupe
Roman, I like to check here for recent putbacks:
http://hg.genunix.org/onnv-gate.hg/shortlog

To see new cases:
http://arc.opensolaris.org/caselog/PSARC/

Also, to see what should appear in upcoming builds (although not recently updated):
http://hub.opensolaris.org/bin/view/Community+Group+on/flag-days

Enjoy... -cheers, CSB
Re: [zfs-discuss] zfs inotify?
Andrew Daugherity wrote:
> if I invoke bart via truss, I see it calls statvfs() and fails. Way to keep up with the times, Sun!

% file /bin/truss /bin/amd64/truss
/bin/truss:       ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped, no debugging information available
/bin/amd64/truss: ELF 64-bit LSB executable AMD64 Version 1 [SSE2 SSE FXSR CMOV FPU], dynamically linked, not stripped, no debugging information available

Rob T
Re: [zfs-discuss] ..and now ZFS send dedupe
Craig S. Bell wrote:
> Roman, I like to check here for recent putbacks: http://hg.genunix.org/onnv-gate.hg/shortlog
> To see new cases: http://arc.opensolaris.org/caselog/PSARC/
> Also, to see what should appear in upcoming builds (although not recently updated): http://hub.opensolaris.org/bin/view/Community+Group+on/flag-days

The flag days page has not been updated since the switch to XWiki; it's on my todo list, but I don't have an ETA for when it'll be done.

James C. McPherson
-- 
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
Re: [zfs-discuss] ..and now ZFS send dedupe
Roman Naumenko wrote:
> James C. McPherson wrote, On 09-11-09 04:40 PM:
>> You could subscribe to onnv-not...@opensolaris.org...
>
> Thanks, James. What is the subscription process? Just to send email?

http://mail.opensolaris.org/mailman/listinfo/onnv-notify covers what's necessary (and I see you found it already).

cheers,
James C. McPherson
-- 
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
Re: [zfs-discuss] CR6894234 -- improved sgid directory compatibility with non-Solaris NFS clients
On Fri, 6 Nov 2009, James Andrewartha wrote:
> How about attacking it the other way? Sign the SCA, get a sponsor and put the fix into OpenSolaris, then sustaining just has to backport it. http://hub.opensolaris.org/bin/view/Main/participate

Do you mean the samba bug or the NFS bug? For the samba bug, I've already submitted a patch to fix the problem. For the NFS bug, while I have in the past pursued such options with open-source software, considering Solaris 10 is a commercial product for which we're paying a fairly substantial cost for support, I'd really prefer they fix it themselves...

> Also, since you know it's a NFS server issue now, have you tried asking on nfs-discuss?

Yup: http://opensolaris.org/jive/thread.jspa?messageID=430745
No responses...

-- 
Paul B. Henson | (909) 979-6361 | http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst | hen...@csupomona.edu
California State Polytechnic University | Pomona CA 91768
Re: [zfs-discuss] zfs inotify?
On Mon, Nov 09, 2009 at 03:25:02PM -0700, Robert Thurlow wrote:
> Andrew Daugherity wrote:
>> if I invoke bart via truss, I see it calls statvfs() and fails. Way to keep up with the times, Sun!
>
> % file /bin/truss /bin/amd64/truss
> /bin/truss:       ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped, no debugging information available
> /bin/amd64/truss: ELF 64-bit LSB executable AMD64 Version 1 [SSE2 SSE FXSR CMOV FPU], dynamically linked, not stripped, no debugging information available

I'm pretty sure he means that 'bart' is failing, not truss. /bin/truss is just a link to /usr/lib/isaexec, which will run the 64-bit version when appropriate.

-- 
Darren
Re: [zfs-discuss] zfs inotify?
On Nov 9, 2009, at 2:06 PM, Andrew Daugherity wrote:
> Is there a 64-bit version of bart, or a better recommendation for comparing snapshots? My current backup strategy uses rsync, which I'd like to replace with zfs send/receive, but I need a way to see what changed in the past day.

find /filesystem -mtime -1
-- richard
Re: [zfs-discuss] ZFS + fsck
On Thu Nov 5 14:38:13 PST 2009, Gary Mills wrote:
> It would be nice to see this information at:
> http://hub.opensolaris.org/bin/view/Community+Group+on/126-130
> but it hasn't changed since 23 October.

Well it seems we have an answer:
http://mail.opensolaris.org/pipermail/zfs-discuss/2009-November/033672.html

On Mon Nov 9 14:26:54 PST 2009, James C. McPherson wrote:
> The flag days page has not been updated since the switch to XWiki, it's on my todo list but I don't have an ETA for when it'll be done.

Perhaps anyone interested in seeing the flag days page resurrected can petition James to raise the priority on his todo list.
Thanks
Nigel Smith
Re: [zfs-discuss] ZFS + fsck
Nigel Smith wrote:
> On Mon Nov 9 14:26:54 PST 2009, James C. McPherson wrote:
>> The flag days page has not been updated since the switch to XWiki, it's on my todo list but I don't have an ETA for when it'll be done.
>
> Perhaps anyone interested in seeing the flag days page resurrected can petition James to raise the priority on his todo list.

Nigel, *everybody* is interested in the flag days page. Including me. Asking me to raise the priority is not helpful.

James C. McPherson
-- 
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp http://www.jmcp.homeunix.com/blog
Re: [zfs-discuss] zfs inotify?
>>> Robert Thurlow robert.thur...@sun.com 11/9/2009 4:25 PM >>>
> % file /bin/truss /bin/amd64/truss
> /bin/truss:       ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped, no debugging information available
> /bin/amd64/truss: ELF 64-bit LSB executable AMD64 Version 1 [SSE2 SSE FXSR CMOV FPU], dynamically linked, not stripped, no debugging information available

It doesn't make any difference if I invoke it with the amd64 truss. The only bart binary I can find on the system (Sol 10u8) is /usr/bin/bart, and it definitely calls statvfs(). Truss log follows at the end.

I know all about 'find -mtime ...', but that doesn't show which files have been deleted, whereas 'rsync -av --delete --backup-dir=`date +%Y%m%d`' does. (When users delete files and then need them restored a week later, it's very helpful to know which day they were deleted, as I can avoid running a find that could take quite a while. I think incremental zfs snapshots are a better strategy, but there are little hurdles like this to be crossed.)

bart (or something faster than running 'gdiff -qr snap1 snap2' on snapshots of a 2.1TB-and-growing FS) seems like a great idea, if I could find a working tool. It looks like dircmp(1) might be a possibility, but I'm open to suggestions. I suppose I could use something like AIDE or tripwire, although that seems a bit like swatting a fly with a sledgehammer.

Thanks,
Andrew

and...@imsfs-new:~$ /usr/bin/amd64/truss bart create -R /export/ims > /tmp/bart-ims
execve(/usr/bin/bart, 0x08047D6C, 0x08047D80)  argc = 4
mmap(0x, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFEFF
resolvepath(/usr/lib/ld.so.1, /lib/ld.so.1, 1023) = 12
resolvepath(/usr/bin/bart, /usr/bin/bart, 1023) = 13
sysconfig(_CONFIG_PAGESIZE) = 4096
stat64(/usr/bin/bart, 0x08047B00) = 0
open(/var/ld/ld.config, O_RDONLY) Err#2 ENOENT
stat64(/lib/libsec.so.1, 0x080473A0) = 0
resolvepath(/lib/libsec.so.1, /lib/libsec.so.1, 1023) = 16
open(/lib/libsec.so.1, O_RDONLY) = 3
mmap(0x0001, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFEFB
mmap(0x0001, 143360, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEF8
mmap(0xFEF8, 50487, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEF8
mmap(0xFEF9D000, 11909, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 53248) = 0xFEF9D000
mmap(0xFEFA, 8296, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFEFA
munmap(0xFEF8D000, 65536) = 0
memcntl(0xFEF8, 8844, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(3) = 0
stat64(/lib/libmd.so.1, 0x080473A0) = 0
resolvepath(/lib/libmd.so.1, /lib/libmd.so.1, 1023) = 15
open(/lib/libmd.so.1, O_RDONLY) = 3
mmap(0xFEFB, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFEFB
mmap(0x0001, 126976, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEF6
mmap(0xFEF6, 56424, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEF6
mmap(0xFEF7E000, 552, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 57344) = 0xFEF7E000
munmap(0xFEF6E000, 65536) = 0
memcntl(0xFEF6, 1464, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(3) = 0
stat64(/lib/libc.so.1, 0x080473A0) = 0
resolvepath(/lib/libc.so.1, /lib/libc.so.1, 1023) = 14
open(/lib/libc.so.1, O_RDONLY) = 3
mmap(0xFEFB, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFEFB
mmap(0x0001, 1208320, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEE3
mmap(0xFEE3, 1099077, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEE3
mmap(0xFEF4D000, 30183, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 1101824) = 0xFEF4D000
mmap(0xFEF55000, 4240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFEF55000
munmap(0xFEF3D000, 65536) = 0
memcntl(0xFEE3, 124080, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
close(3) = 0
stat64(/lib/libavl.so.1, 0x080473A0) = 0
resolvepath(/lib/libavl.so.1, /lib/libavl.so.1, 1023) = 16
open(/lib/libavl.so.1, O_RDONLY) = 3
mmap(0xFEFB, 32768, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFEFB
mmap(0x0001, 73728, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFEE1
mmap(0xFEE1, 2788, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_TEXT, 3, 0) = 0xFEE1
mmap(0xFEE21000, 204, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_INITDATA, 3, 4096) = 0xFEE21000
munmap(0xFEE11000, 65536) = 0
mmap(0x, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0)
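Since dircmp(1) came up: a hedged sketch of pointing it directly at two snapshots (the dataset and snapshot names are examples; -s suppresses the "same file" chatter, so what remains is the differing and only-in-one-side entries, i.e. changed, created, and deleted files):

  dircmp -s /export/ims/.zfs/snapshot/daily-20091108 /export/ims/.zfs/snapshot/daily-20091109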
Re: [zfs-discuss] zfs inotify?
Seems to me that you really want auditing. You can configure the audit system to only record the events you are interested in.
http://docs.sun.com/app/docs/doc/816-4557/auditov-1?l=ena=view
-- richard

On Nov 9, 2009, at 4:55 PM, Andrew Daugherity wrote:
> I know all about 'find -mtime ...', but that doesn't show which files have been deleted, whereas 'rsync -av --delete --backup-dir=`date +%Y%m%d`' does. (When users delete files and then need them restored a week later, it's very helpful to know which day they were deleted...)
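If auditing is already enabled, pulling out one day's file-write events would look roughly like this (a hedged sketch; the date is an example, and the 'fw' class must be in your audit flags for the records to exist at all):

  auditreduce -d 20091108 -c fw | praudit -s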
[zfs-discuss] Couple questions about ZFS writes and fragmentation
1. Is it true that because block sizes vary (in powers of 2, of course) on each write, there will be very little internal fragmentation?

2. I came upon this statement in a forum post: "ZFS uses 128K data blocks by default whereas other filesystems typically use 4K or 8K blocks. This naturally reduces the potential for fragmentation by 32X over 4k blocks." How is this true? I mean, if you have a 128k default block size and you store a 4k file within that block, then you will have a ton of slack space to clear up.

3. Another statement from a post: "the seek time for single-user contiguous access is essentially zero since the seeks occur while the application is already busy processing other data. When mirror vdevs are used, any device in the mirror may be used to read the data." All this is saying is that when you are reading off of one physical device you will already be seeking for the blocks that you need from the other device, so the seek time will no longer be an issue, right?

4. In terms of where ZFS chooses to write data, is it always going to pick one metaslab and write to only free blocks within that metaslab? Or will it go all over the place?

5. When ZFS looks for a place to write data, does it look somewhere to intelligently see that there are some number of free blocks available within this particular metaslab, and if so, where is this located?

6. Could anyone clarify this post: "ZFS uses a copy-on-write model. Copy-on-write tends to cause fragmentation if portions of existing files are updated. If a large portion of a file is overwritten in a short period of time, the result should be reasonably fragment-free, but if parts of the file are updated over a long period of time (like a database) then the file is certain to be fragmented. This is not such a big problem as it appears to be since such files were already typically accessed using random access."

7. An aside question... I was reading a paper about ZFS and it stated that offsets are something like 8 bytes from the first vdev label. Is there any reason why the storage pool is after 2 vdev labels?

Thanks guys
Re: [zfs-discuss] Couple questions about ZFS writes and fragmentation
On Mon, 9 Nov 2009, Ilya wrote:
> 2. I came upon this statement in a forum post: "ZFS uses 128K data blocks by default whereas other filesystems typically use 4K or 8K blocks. This naturally reduces the potential for fragmentation by 32X over 4k blocks." How is this true? I mean, if you have a 128k default block size and you store a 4k file within that block, then you will have a ton of slack space to clear up.

Short files are given a short block. Files larger than 128K are diced into 128K blocks, but the last block may be shorter. The fragmentation discussed is fragmentation at the file level.

> 3. Another statement from a post: "the seek time for single-user contiguous access is essentially zero since the seeks occur while the application is already busy processing other data. When mirror vdevs are used, any device in the mirror may be used to read the data." All this is saying is that when you are reading off of one physical device you will already be seeking for the blocks that you need from the other device, so the seek time will no longer be an issue, right?

The seek time becomes less of an issue for sequential reads if blocks are read from different disks, and the reads are scheduled in advance. It still consumes drive IOPS if the disk needs to seek.

> 6. Could anyone clarify this post: "ZFS uses a copy-on-write model. Copy-on-write tends to cause fragmentation if portions of existing files are updated..."

The point here is that zfs buffers unwritten data in memory for up to 30 seconds. With a large amount of buffered data, zfs is able to write the data in a more sequential and better-optimized fashion, while wasting fewer IOPS. Databases usually use random I/O and synchronous writes, which tends to scramble the data layout on disk with a copy-on-write model. Zfs is not optimized for database performance. On the other hand, the copy-on-write model reduces the chance of database corruption if there is a power failure or system crash.

Bob
-- 
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
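A quick way to see the "short files get short blocks" behavior for yourself; a hedged sketch (pool/dataset paths are examples, and the exact figure will vary with compression and metadata overhead):

  dd if=/dev/urandom of=/tank/fs/small bs=4k count=1   # write one 4k file
  sync
  du -h /tank/fs/small    # reports on the order of 4-5K allocated, not 128K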
[zfs-discuss] CIFS crashes when accessed with Adobe Photoshop Elements 6.0 via Vista
I have a repeatable test case for this incident. Every time I access my ZFS CIFS-shared file system with Adobe Photoshop Elements 6.0 from my Vista workstation, the OpenSolaris server stops serving CIFS. The share functions as expected for all other CIFS operations.

-Begin Configuration Data-

scotts:zelda# cat /etc/release
  OpenSolaris 2009.06 snv_111b X86
  Copyright 2009 Sun Microsystems, Inc. All Rights Reserved.
  Use is subject to license terms.
  Assembled 07 May 2009

scotts:zelda# uname -a
SunOS zelda 5.11 snv_111b i86pc i386 i86pc

scotts:zelda# prtdiag
System Configuration: IBM IBM eServer 325 -[8835W11]-
BIOS Configuration: IBM IBM BIOS Version 1.36 -[M1E136AUS-1.36]- 01/19/05
BMC Configuration: IPMI 1.5 (KCS: Keyboard Controller Style)

Processor Sockets
  Version   Location Tag
  -------   ------------
  Opteron   CPU0-Socket 940
  Opteron   CPU1-Socket 940

Memory Device Sockets
  Type  Status   Set  Device Locator  Bank Locator
  ----  ------   ---  --------------  ------------
  DRAM  in use   1    DDR1            Bank 0
  DRAM  in use   1    DDR2            Bank 0
  DRAM  in use   2    DDR3            Bank 1
  DRAM  in use   2    DDR4            Bank 1
  DRAM  in use   3    DDR5            Bank 2
  DRAM  in use   3    DDR6            Bank 2

On-Board Devices

Upgradeable Slots
  ID  Status     Type   Description
  --  ---------  -----  ------------
  1   in use     PCI-X  PCI-X Slot 1
  2   available  PCI-X  PCI-X Slot 2

scotts:zelda# zpool status
  pool: ary01
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        ary01       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c5t8d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c6t8d0  ONLINE       0     0     0
            c6t5d0  ONLINE       0     0     0
            c6t4d0  ONLINE       0     0     0
            c6t3d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0
        spares
          c6t1d0    AVAIL

errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME      STATE     READ WRITE CKSUM
        rpool     ONLINE       0     0     0
          c3d0s0  ONLINE       0     0     0

errors: No known data errors

scotts:zelda# zfs get all ary01/media
NAME         PROPERTY       VALUE                  SOURCE
ary01/media  type           filesystem             -
ary01/media  creation       Fri Jul 11 23:24 2008  -
ary01/media  used           347G                   -
ary01/media  available      1.09T                  -
ary01/media  referenced     344G                   -
ary01/media  compressratio  1.00x                  -
ary01/media  mounted        yes                    -
ary01/media  quota          none                   default
ary01/media  reservation    none                   default
ary01/media  recordsize     128K                   default
ary01/media  mountpoint     /shared_media          local
ary01/media  sharenfs       on                     local
ary01/media  checksum       on                     default
ary01/media  compression    off                    default
ary01/media  atime          on                     default
ary01/media  devices        on                     default
ary01/media  exec           on                     default
ary01/media  setuid         on                     default
ary01/media  readonly       off                    default
ary01/media  zoned          off                    local
ary01/media  snapdir        visible                local
ary01/media  aclmode        groupmask              default
ary01/media  aclinherit     restricted             default
ary01/media  canmount       on                     default
ary01/media  shareiscsi     off                    default
ary01/media  xattr          on                     default
ary01/media  copies         1                      default
ary01/media  version        3                      -
ary01/media  utf8only
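When the server stops serving CIFS like this, one first diagnostic step (a hedged suggestion, not part of the original report; it assumes the stock OpenSolaris in-kernel SMB service) is to check whether the SMF service dropped into maintenance and what it logged around the failure:

    # show the SMB server service state and the reason it is offline, if any
    svcs -xv smb/server
    # look for smbsrv errors logged around the time of the failure
    tail -50 /var/adm/messages
    # clear a maintenance state and restart once the share is wedged
    svcadm clear smb/server
    svcadm restart smb/server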
[zfs-discuss] How to purge bad data from snapshots
So, I had a fun ZFS learning experience a few months ago. A server of mine suddenly dropped off the network, or so it seemed. It was an OpenSolaris 2008.05 box serving up Samba shares from a ZFS pool, but it noticed too many checksum errors and so decided it was time to take the pool down, so as to save the (apparently) dying disk from further damage. Seemed inconvenient at the time, but in hindsight that's a cool feature. I haven't actually found any problems with the drive (an SSD), which has worked fine ever since. Bit rot? Power failure (we had a lot of those for a while)? Who knows. At first I was afraid my ZFS pool had corrupted itself, until I realized that it was a unique feature of ZFS actually protecting me from further damage rather than ZFS itself being the problem.

At any rate, in this case the corruption managed to make it over to my backup server, replicated with SNDR. One of the corrupted blocks happened to be referenced by every single one of my daily snapshots going back nearly a year. I had no mirrored storage and copies set to 1. Arguably a bad setup, I'm sure, but that's why I had a replicated server. I didn't care about the file referencing the corrupt block; I would just as well have deleted it, but it was still referenced by all the snapshots. It was a crisis at the time, so I just switched over to my replicated server (in case the drive on the primary server actually was bad), deleted the files containing corrupt blocks, and then deleted all the snapshots so ZFS would quit unmounting the pool, just to get going again.

Things have been fine ever since, but I still wonder - is there something different I could have done to get rid of the corrupt blocks without losing all my snapshots? (I could have restored them from backup, but it would have taken forever.) I guess I could just do clones and then have the capability of deleting stuff, but then I don't believe I'd be able to back the thing up - if I don't do incremental zfs send/recv, the backup takes over 24 hours since there are so many snapshots, and I wouldn't think clones work with incremental zfs send/recv (especially if you start deleting files willy-nilly). Am I just missing something altogether, or is restoring from backup the only option?
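For reference, a minimal sketch of the recovery path described above (pool, dataset, and snapshot names here are hypothetical):

    # list the files that reference damaged blocks
    zpool status -v tank
    # remove the affected file from the live filesystem
    rm /tank/data/corrupt_file
    # the bad blocks remain referenced by snapshots, so those go too
    zfs list -t snapshot -r tank/data
    zfs destroy tank/data@2009-01-15
    # ...repeat for each snapshot that still references the bad block

-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss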
Re: [zfs-discuss] Couple questions about ZFS writes and fragmentation
On Nov 9, 2009, at 6:42 PM, Ilya wrote:
> 1. Is it true that, because block sizes vary (in powers of 2, of course) on each write, there will be very little internal fragmentation?

The block size limit (aka recordsize) is in powers of 2. Actual block sizes are as needed.

> 2. I came upon this statement in a forum post: [i]ZFS uses 128K data blocks by default whereas other filesystems typically use 4K or 8K blocks. This naturally reduces the potential for fragmentation by 32X over 4k blocks.[/i] How is this true? I mean, if you have a 128k default block size and you store a 4k file within that block, then you will have a ton of slack space to clear up.

If a file only uses 4 KB, ZFS only allocates 4 KB to the file.

> 3. Another statement from a post: [i]the seek time for single-user contiguous access is essentially zero since the seeks occur while the application is already busy processing other data. When mirror vdevs are used, any device in the mirror may be used to read the data.[/i] All this is saying is that when you are reading off of one physical device, you will already be seeking for the blocks you need from the other device, so the seek time will no longer be an issue, right?

This comment makes no sense to me. By the time the I/O request is handled by the disk, the relationship to a user is long gone. Also, seeks only apply to HDDs. Either side of a mirror can be used for reading... that part makes sense.

> 4. In terms of where ZFS chooses to write data, is it always going to pick one metaslab and write only to free blocks within that metaslab? Or will it go all over the place?

Yes :-)

> 5. When ZFS looks for a place to write data, does it look somewhere to intelligently see that there are some number of free blocks available within a particular metaslab, and if so, where is this information located?

Yes, of course.

> 6. Could anyone clarify this post: [i]ZFS uses a copy-on-write model. Copy-on-write tends to cause fragmentation if portions of existing files are updated. If a large portion of a file is overwritten in a short period of time, the result should be reasonably fragment-free, but if parts of the file are updated over a long period of time (like a database) then the file is certain to be fragmented. This is not such a big problem as it appears to be since such files were already typically accessed using random access.[/i]

YMMV. Allan Packer and Neel did a study on the effect of this on MySQL. But some databases COW themselves, so it is not a given that the application will read data sequentially.
Video: http://www.youtube.com/watch?v=a31NhwzlAxs
Slides: http://blogs.sun.com/realneel/resource/MySQL_Conference_2009_ZFS_MySQL.pdf

> 7. An aside question... I was reading a paper about ZFS and it stated that offsets are something like 8 bytes from the first vdev label. Is there any reason why the storage pool is after 2 vdev labels?

Historically, the first 8 KB of a slice was used to store the disk label. In the bad old days, people writing applications often did not know this and would clobber the label. So the first 8 KB of the ZFS label is left unused, to preserve any existing disk label. The storage pool data starts at an offset of 4 MB, 3.5 MB past the second label. This area is reserved for a boot block. Where did you see it documented as starting after the first two labels?
-- richard
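For the curious, the labels Richard describes can be dumped directly with zdb (a sketch; the device name below is just an example):

    # dump the four vdev labels ZFS keeps on each device
    # (two at the front of the device, two at the end)
    zdb -l /dev/rdsk/c5t0d0s0

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss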
Re: [zfs-discuss] Couple questions about ZFS writes and fragmentation
Wow, this forum is great and uber-fast in response - appreciate the answers, makes sense. Only, what does ZFS do to write data? Let's say that you want to write x blocks somewhere: is ZFS going to find a pointer to the space map of some metaslab and then write there? Is it going to find a metaslab closest to the outside of the HDD for higher bandwidth?

And the label thing, heh, I made a mistake in what I read, you are right. Within the vdev array though, after the storage pool location, it also showed more vdev labels coming after it (vdev 1, vdev 2, boot block, storage space, vdev 3, vdev 4). Would there be more vdev labels after #4, or more storage space?

Thanks again
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Couple questions about ZFS writes and fragmentation
On Nov 9, 2009, at 9:15 PM, Ilya wrote:
> Wow, this forum is great and uber-fast in response - appreciate the answers, makes sense.

Nothing on TV tonight and all of my stress tests are passing :-)

> Only, what does ZFS do to write data? Let's say that you want to write x blocks somewhere: is ZFS going to find a pointer to the space map of some metaslab and then write there? Is it going to find a metaslab closest to the outside of the HDD for higher bandwidth?

By default, it does start with the metaslabs on the outer cylinders. But it may also decide to skip to another metaslab. For example, the redundant metadata is spread further away. Similarly, if you have copies=2 or 3, then those will be spatially diverse as well.

> And the label thing, heh, I made a mistake in what I read, you are right. Within the vdev array though, after the storage pool location, it also showed more vdev labels coming after it (vdev 1, vdev 2, boot block, storage space, vdev 3, vdev 4). Would there be more vdev labels after #4, or more storage space?

The 4th label (label 3) is at the end, modulo 256 KB.
-- richard
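To make the layout concrete, a sketch of the label offsets implied above (two 256 KB labels at the front, two at the end, aligned to the device size rounded down to a 256 KB boundary; the device size here is a made-up example):

    # hypothetical device size in bytes
    DEVSIZE=$((500107862016))
    LABEL=$((256 * 1024))
    ALIGNED=$(( (DEVSIZE / LABEL) * LABEL ))
    echo "L0 offset: 0"
    echo "L1 offset: $LABEL"
    echo "boot block region: $((2 * LABEL)) .. $((4 * 1024 * 1024))"
    echo "L2 offset: $((ALIGNED - 2 * LABEL))"
    echo "L3 offset: $((ALIGNED - LABEL))"

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss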