Re: [zfs-discuss] Oracle DB sequential dump questions
I would look at what size I/Os you are doing in each case. I have been playing with a T5240 and got 400Mb/s read and 200Mb/s write with iozone throughput tests on a 6-disk mirror pool, so the box and ZFS can certainly push data around - but that was using 128k blocks.

You mention the disks are doing bursts of 50-60M, which suggests they have bandwidth to spare and are not flat out trying to prefetch data. I suspect you might be IOPS bound: if you are doing a serial read-then-write workload and only writing small blocks to the tape, the higher service times on the tape device will slow down your overall read speed. If it is LTO-4, push your block size up as far as you can - 256k, 512k or higher - and maybe use truss on the process to see what read/write sizes it is doing. I also found the iosnoop tool from Brendan Gregg's DTrace toolkit very helpful in tracking down these sorts of issues.

HTH.

Cheers,
Adrian
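PS: As a rough illustration of the kind of checking I mean (the PID, device name and tape path below are placeholders, not taken from your setup):

  truss -t read,write -p 12345        # syscall-level read/write sizes the dump process is issuing
  iosnoop -d sd3                      # per-I/O sizes from the DTrace toolkit, filtered to one device
  ... | dd obs=512k of=/dev/rmt/0n    # one way to force larger records onto the tape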
[zfs-discuss] query: why does zfs boot in 10/08 not support flash archive jumpstart
With much excitement I have been reading about the new features coming in Solaris 10 10/08 and am eager to start playing with ZFS root. However, one thing that struck me as strange and somewhat annoying is that, according to the FAQs and documentation, it is not possible to do a ZFS root install using JumpStart and flash archives. I do most of my installs with flash archives, as it saves a massive amount of time in the install process and gives me consistency between builds. Really I am just curious why it isn't supported, what the intention is for supporting it, and when?

Cheers,
Adrian
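PS: For what it is worth, the 10/08 docs do describe a plain (non-flash) JumpStart profile for ZFS root - a minimal sketch, where the pool, disk and BE names are just placeholders:

  install_type initial_install
  pool rpool auto auto auto mirror c0t0d0s0 c0t1d0s0
  bootenv installbe bename s10u6BE

It is the flash side (archive_location pointing at a flar) that appears to have no ZFS-root counterpart, which is exactly the gap I am asking about.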
[zfs-discuss] ZFS error handling - suggestion
Howdy,

I have several times had issues with consumer-grade PC hardware and ZFS not getting along. The problem is not the disks but the fact that I don't have ECC memory or end-to-end checking on the data path. What is happening is that random memory errors and bit flips get written out to disk, and when the data is read back ZFS reports a checksum failure:

  pool: myth
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        myth        ONLINE       0     0    48
          raidz1    ONLINE       0     0    48
            c7t1d0  ONLINE       0     0     0
            c7t3d0  ONLINE       0     0     0
            c6t1d0  ONLINE       0     0     0
            c6t2d0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        /myth/tv/1504_20080216203700.mpg
        /myth/tv/1509_20080217192700.mpg

Note there are no per-disk errors, just errors at the raidz and pool level. I get the same thing on a mirror pool where both sides of the mirror have identical errors. All I can assume is that the data was corrupted after the checksum was calculated and was flushed to disk like that. In one past case it was a popped motherboard capacitor - but that was enough to generate these errors under load.

At any rate, ZFS is doing the right thing by telling me. What I don't like is that from that point on I can't convince ZFS to ignore it. The data in question is video files - a bit flip here or there won't matter. But if ZFS reads the affected block it returns an I/O error, and until I restore the file I have no option but to try to make the application skip over it. If it were UFS I would never have known; ZFS makes a point of stopping anything that uses the data - understandably, but annoyingly as well.

What I would like to see is an option to ZFS in the style of 'onerror' for UFS, i.e. the ability to tell ZFS to join fight club - let what doesn't matter truly slide. For example:

  zfs set erroraction=[iofail|log|ignore]

This would default to the current behaviour of iofail, but if you wanted to try to recover or repair data you could set log, to generate an FMA event recording the bad checksums, or ignore, to get on with your day. As mentioned, I see this mostly as an option to help repair data after the underlying issue has been identified or fixed. Of course it is data specific, but if the application can allow it or handle it, why should ZFS get in the way?

Just a thought.

Cheers,
Adrian

PS: And yes, I am now buying some ECC memory.
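PPS: To make the proposal concrete - this is purely hypothetical syntax, erroraction is not an existing ZFS property - the recovery workflow I have in mind would look roughly like:

  # hypothetical: log an FMA event for bad checksums but still return the data
  zfs set erroraction=log myth/tv
  # ...copy or re-encode the affected recordings...
  # then back to today's behaviour
  zfs set erroraction=iofail myth/tv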
[zfs-discuss] Re: suggestion: directory promotion to filesystem
Thanks for the replies - I imagined it would have been discussed before, but I must have been searching for the wrong terms :) Any idea on the timeline or future of 'zfs split'?

Cheers,
Adrian
[zfs-discuss] suggestion: directory promotion to filesystem
Not sure how technically feasible it is, but this is something I thought of while shuffling files around my home server. My (poor) understanding of ZFS internals is that the entire pool is effectively a tree structure, with nodes being either data or metadata. Given that, couldn't ZFS just change a directory node into a filesystem with little effort, letting me do everything ZFS does with filesystems on a subset of my filesystem? :)

Say you have some filesystems you created early on, before you had a good idea of usage. For example, I made one large 'share' filesystem and started filling it up with photos, movies and assorted downloads. A few months later I realised it would be much nicer to snapshot my movies and photos separately for backups, instead of snapshotting the whole share. Not hard to work around - a zfs create and a mv/tar command and it is done... some time later.

If there were, say, a 'zfs graft <directory> <new-filesystem>' command, you could just break the directory off as a new filesystem and away you go - no copying, no risk of cleaning up the wrong files, etc. Corollary: 'zfs merge' - take a filesystem and merge it into an existing filesystem. A rough sketch of what I mean is below.

Just a thought - any comments welcome.
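To make it concrete - neither graft nor merge exist today, and the dataset names below are made up for illustration:

  # today: create a new filesystem and copy every byte across
  zfs create data/movies
  mv /data/share/movies/* /data/movies/

  # proposed: promote the directory in place, no data movement
  zfs graft /data/share/movies data/movies
  # and the corollary, folding a filesystem back into another
  zfs merge data/movies data/share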
[zfs-discuss] Re: pools are zfs file systems?
When playing with ZFS to try to come up with some standards for using it in our environment, I also disliked having the pool's top-level dataset mounted when my intention was not to use it directly but to subdivide the space within it. Simple fix:

  zpool create data blah
  zfs create data/share
  zfs create data/oracle
  zfs set mountpoint=/export/share data/share
  zfs set mountpoint=/oracle data/oracle
  zfs set mountpoint=none data

It is only semi-clean - you have to remember to set mountpoints if you create any more children under data, and zones tended not to like importing datasets that had the mountpoint set to none.

My 2c
[zfs-discuss] S10U2: zfs instant hang with dovecot imap server using mmap
Hi,

I just upgraded my home box to Solaris 10 6/06 and converted my previous filesystems over to ZFS, including /var/mail. Previously, on S10 FCS, I was running the dovecot mail server from blastwave.org without issue. After upgrading to Update 2 I have found that the mail server hangs frequently. The imap process cannot be killed, dtraced, pstacked or trussed. After a few goes at dtrace I took a core dump and had a look at that. The stack for the imap process was simply:

0t1550::pid2proc | ::walk thread | ::findstack -v
stack pointer for thread d77f8600: d41e8e2c
  d41e8e78 0xd41e8e44(2, d41e8f44)
  d41e8ed8 zfs_write+0x59f(d6a273c0, d41e8f44, 0, d8adee10, 0)
  d41e8f0c fop_write+0x2d(d6a273c0, d41e8f44, 0, d8adee10, 0)
  d41e8f8c write+0x29a()
  d41e8fb4 sys_sysenter+0xdc()

Digging around in SunSolve I found a few references to mmap and zfs_write locks, so on a hunch (fuser had previously shown my mail file as mmapped) I disabled mmap in the dovecot configuration, and I no longer get the deadlocks. I could not find an exact match for this in SunSolve - is it a known bug, or is more work needed? I can provide the core file or SSH access to the box if more analysis is needed.

Cheers,
Adrian
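PS: For reference, the change was a single line in the dovecot configuration (mmap_disable is a standard dovecot option; the exact config file location depends on how the blastwave package lays things out):

  # dovecot.conf: stop dovecot mmap()ing index and mail files
  mmap_disable = yes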
[zfs-discuss] Re: S10U2: zfs instant hang with dovecot imap server using mmap
Oh - and by "instant hang" I mean I can reproduce it simply by rebooting, enabling the dovecot service, and then connecting to dovecot with Thunderbird.
[zfs-discuss] Re: S10U2: zfs instant hang with dovecot imap server using mmap
There are also a number of procmail processes with the following thread stacks:

0t1611::pid2proc | ::walk thread | ::findstack -v
stack pointer for thread d8ae1a00: d3bcbbac
[ d3bcbbac 0xfe826b37() ]
  d3bcbbc4 swtch+0x13e()
  d3bcbbe8 cv_wait_sig+0x119(da58fb4c, d46a8680)
  d3bcbc00 wait_for_lock+0x30(da58fac8)
  d3bcbc20 flk_wait_execute_request+0x156(da58fac8)
  d3bcbc64 flk_process_request+0x4c7(da58fac8)
  d3bcbd38 reclock+0x3a9(d6a273c0, d3bcbe64, 6, 10a, 202e92c, 0)
  d3bcbd8c fs_frlock+0x252(d6a273c0, 7, d3bcbe64, 10a, 202e92c, 0)
  d3bcbdc0 zfs_frlock+0x73(d6a273c0, 7, d3bcbe64, 10a, 202e92c, 0)
  d3bcbdf8 fop_frlock+0x2c(d6a273c0, 7, d3bcbe64, 10a, 202e92c, 0)
  d3bcbf8c fcntl+0x95d()
  d3bcbfb4 sys_sysenter+0xdc()