Re: [zfs-discuss] arc_no_grow is set to 1 and never set back to 0
On 04 January, 2012 - Steve Gonczi sent me these 2,5K bytes:

> The interesting bit is what happens inside arc_reclaim_needed(), that is,
> how it arrives at the conclusion that there is memory pressure. Maybe we
> could trace arg0, which gives the location where we have left the
> function. This would finger which return path arc_reclaim_needed() took.

It's new code, basically comparing the inuse/total/free from
"kstat -n zfs_file_data", which seems buggered.

> Steve
>
> ----- Original Message -----
> > Well, it looks like the only place this gets changed is in
> > arc_reclaim_thread for OpenSolaris. I suppose you could dtrace it to see
> > what is going on and investigate what is happening to the return code of
> > arc_reclaim_needed:
> >
> > http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/arc.c#2089
> >
> > Maybe: dtrace -n 'fbt:zfs:arc_reclaim_needed:return { trace(arg1) }'
> >
> > Dave

/Tomas
--
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
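A minimal sketch of how the two suggestions could be combined, aggregating on arg0 (the offset of the return instruction, which identifies the return path taken) together with arg1 (the return value), and then checking the kstat the new code reportedly consults. This assumes the fbt probe exists on the build in question:

  # Count hits per (return-offset, return-value) pair for a while:
  dtrace -n 'fbt:zfs:arc_reclaim_needed:return { @[arg0, arg1] = count(); }'

  # Memory figures the new code is said to compare:
  kstat -n zfs_file_data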
Re: [zfs-discuss] S11 vs illumos zfs compatibility
> if a bug fixed in Illumos is never reported to Oracle by a customer, it
> would likely never get fixed in Solaris either :-(

I would have liked to think that there was some good-will between the ex-
and current members of the zfs team, in the sense that the people who
created zfs but then left Oracle still care about it enough to want the
Oracle version to be as bug-free as possible. (Obviously I don't expect
this to be the case for developers of all software, but I think filesystem
developers are a special breed!)
Re: [zfs-discuss] S11 vs illumos zfs compatibility
On Jan 5, 2012, at 6:53 AM, sol wrote:

> if a bug fixed in Illumos is never reported to Oracle by a customer, it
> would likely never get fixed in Solaris either :-(
>
> I would have liked to think that there was some good-will between the ex-
> and current members of the zfs team, in the sense that the people who
> created zfs but then left Oracle still care about it enough to want the
> Oracle version to be as bug-free as possible.

There is good-will between the developers. And the ZFS working group has
representatives currently employed by Oracle. However, Oracle is a
lawnmower. http://www.youtube.com/watch?v=-zRN7XLCRhc

> (Obviously I don't expect this to be the case for developers of all
> software but I think filesystem developers are a special breed!)

They are! And there are a lot of really cool things happening in the wild
as well as behind Oracle's closed doors.
 -- richard

--
ZFS and performance consulting
http://www.RichardElling.com
illumos meetup, Jan 10, 2012, Menlo Park, CA
http://www.meetup.com/illumos-User-Group/events/41665962/
Re: [zfs-discuss] Stress test zfs
Ok. I blew it. I didn't add enough information. Here's some more detail:

The disk array is a RamSan array, with RAID6 and 8K stripes. I'm measuring
performance with the results of the bonnie++ output and comparing them
with the zpool iostat output. It's with zpool iostat that I'm not seeing a
lot of writes. Like I said, I'm new to this, and if I need to provide
anything else I will. Thanks, all.

On Wed, Jan 4, 2012 at 2:59 PM, grant lowe glow...@gmail.com wrote:

> Hi all,
>
> I've got Solaris 10 9/10 running on a T3. It's an Oracle box with 128GB
> of memory. I've been trying to load test the box with bonnie++. I can
> seem to get 80 to 90K writes, but can't seem to get more than a couple K
> for writes. Any suggestions? Or should I take this to a bonnie++ mailing
> list? Any help is appreciated. I'm kinda new to load testing. Thanks.
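For reference, a rough sketch of this kind of measurement: bonnie++ running against the pool in one terminal, zpool iostat watched in another. The pool name, directory and size are placeholders only (the usual advice is a working set of roughly twice RAM):

  # Terminal 1: per-vdev I/O statistics every 5 seconds
  zpool iostat -v tank 5

  # Terminal 2: bonnie++ against a filesystem on that pool
  bonnie++ -d /tank/bench -s 256g -u nobody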
Re: [zfs-discuss] Stress test zfs
grant lowe wrote:
> Ok. I blew it. I didn't add enough information. Here's some more detail:
> The disk array is a RamSan array, with RAID6 and 8K stripes. I'm
> measuring performance with the results of the bonnie++ output and
> comparing them with the zpool iostat output. It's with zpool iostat that
> I'm not seeing a lot of writes.

Since ZFS never writes data back where it was, it can coalesce multiple
outstanding writes into fewer device writes. This may be what you're
seeing. I have a ZFS IOPS demo where the (multi-threaded) application is
performing over 10,000 synchronous write IOPS, but the underlying devices
are only performing about 1/10th of that, due to ZFS coalescing multiple
outstanding writes. Sorry, I'm not familiar with what type of load bonnie
generates.

--
Andrew Gabriel | Solaris Systems Architect
Email: andrew.gabr...@oracle.com
Mobile: +44 7720 598213
Oracle EMEA Server Pre-Sales
ORACLE Corporation UK Ltd is a company incorporated in England & Wales |
Company Reg. No. 1782505 | Reg. office: Oracle Parkway, Thames Valley Park,
Reading RG6 1RA
Hardware and Software, Engineered to Work Together
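One rough way to observe this coalescing (a sketch only; interpret both outputs with care) is to compare the rate of writes entering ZFS from applications with the writes the devices actually receive over the same interval:

  # Application writes entering ZFS, counted per second:
  dtrace -n 'fbt:zfs:zfs_write:entry { @ = count(); } tick-1s { printa(@); clear(@); }'

  # Device-level writes over the same period (w/s column):
  iostat -xn 1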
Re: [zfs-discuss] Stress test zfs
What does zpool status show? Are you using the default 128k recordsize for
the zpool? Is your server x86 or SPARC? A T3: how many sockets? IMHO, a T3
for Oracle needs careful tuning, since many Oracle operations need fast
single-thread CPU.

Sent from my iPad

On Jan 5, 2012, at 11:40, grant lowe glow...@gmail.com wrote:

> Ok. I blew it. I didn't add enough information. Here's some more detail:
> The disk array is a RamSan array, with RAID6 and 8K stripes. I'm
> measuring performance with the results of the bonnie++ output and
> comparing them with the zpool iostat output. It's with zpool iostat that
> I'm not seeing a lot of writes. [...]
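The checks being asked about amount to roughly the following (the dataset name is only an example; whether a smaller recordsize actually helps depends on the workload):

  zpool status -v
  zfs get recordsize tank/oradata       # ZFS default is 128K
  # If the Oracle db_block_size is 8K, a matching recordsize is often suggested:
  zfs set recordsize=8k tank/oradata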
Re: [zfs-discuss] S11 vs illumos zfs compatibility
On Thu, Jan 5, 2012 at 8:53 AM, sol a...@yahoo.com wrote:

> if a bug fixed in Illumos is never reported to Oracle by a customer, it
> would likely never get fixed in Solaris either :-(
>
> I would have liked to think that there was some good-will between the ex-
> and current members of the zfs team, in the sense that the people who
> created zfs but then left Oracle still care about it enough to want the
> Oracle version to be as bug-free as possible.

My intention was to encourage users to report bugs to both Oracle and
Illumos. It's possible that Oracle engineers pay attention to the Illumos
bug database, but I expect that for legal reasons they will not look at
Illumos code that has any new copyright notices relative to Oracle code.
The simplest way for Oracle engineers to avoid all possible legal problems
is to simply ignore at least the Illumos source repositories, possibly
more. I'm speculating, sure; I might be wrong.

As for good will, I'm certain that there is, at least at the engineer
level, and probably beyond. But that doesn't mean that there will be bug
parity, much less feature parity.

Nico
--
Re: [zfs-discuss] Stress test zfs
One still does not understand your setup:

1. What is the HBA in the T3-2?
2. Did you set up RAID6 (how?) in the RamSan array, or present the SSDs as
   a JBOD to the zpool?
3. Which model of RamSan?
4. Is there any other storage behind the RamSan?
5. Did you set up the zpool with a ZIL (log) and/or L2ARC (cache) device?
6. IMHO, the hybrid approach to ZFS is the most cost effective: 7200rpm SAS
   disks with ZIL and L2ARC devices and a mirrored zpool (see the sketch
   below).

The problem with RAID6 at 8k and Oracle at 8k is the mismatch of stripe
size. We know the zpool uses a dynamic stripe size in its RAID, not the
same as in hardware RAID, but a similar consideration still exists.

Sent from my iPad

On Jan 5, 2012, at 11:40, grant lowe glow...@gmail.com wrote:

> Ok. I blew it. I didn't add enough information. Here's some more detail:
> The disk array is a RamSan array, with RAID6 and 8K stripes. I'm
> measuring performance with the results of the bonnie++ output and
> comparing them with the zpool iostat output. It's with zpool iostat that
> I'm not seeing a lot of writes. [...]
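A sketch of the hybrid layout suggested in item 6: a mirrored data pool plus separate log (ZIL) and cache (L2ARC) devices. Device names are purely illustrative:

  zpool create tank mirror c0t1d0 c0t2d0 mirror c0t3d0 c0t4d0
  zpool add tank log c0t5d0       # SSD slog for synchronous writes
  zpool add tank cache c0t6d0     # SSD L2ARC to extend the read cache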
Re: [zfs-discuss] S11 vs illumos zfs compatibility
On Thu, Jan 5, 2012 at 9:32 AM, Richard Elling richard.ell...@gmail.com wrote:

> There is good-will between the developers. And the ZFS working group has
> representatives currently employed by Oracle. However, Oracle is a
> lawnmower. http://www.youtube.com/watch?v=-zRN7XLCRhc
>
> They are! And there are a lot of really cool things happening in the wild
> as well as behind Oracle's closed doors.
>  -- richard

Speaking of illumos, what exactly is the deal with the zfs discuss mailing
list? There's all of 3 posts that show up for all of 2011. Am I missing
something, or is there just that little traction currently?
http://www.listbox.com/member/archive/182191/sort/time_rev/

--Tim
Re: [zfs-discuss] Stress test zfs
I just took a look at the RamSan web site; there are many whitepapers on
Oracle, none on ZFS.

Sent from my iPad

On Jan 5, 2012, at 12:58, Hung-Sheng Tsao (laoTsao) laot...@gmail.com wrote:

> One still does not understand your setup:
> 1. What is the HBA in the T3-2?
> 2. Did you set up RAID6 (how?) in the RamSan array, or present the SSDs
>    as a JBOD to the zpool? [...]
Re: [zfs-discuss] arc_no_grow is set to 1 and never set back to 0
It's supposed to be 7111576 "arc shrinks in the absence of memory
pressure", currently in status "accepted", with an RPE escalation pending.

-----Original Message-----
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Tomas Forsman
Sent: Thursday, 5 January 2012 10:35
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] arc_no_grow is set to 1 and never set back to 0

> On 04 January, 2012 - Steve Gonczi sent me these 2,5K bytes:
>
> > The interesting bit is what happens inside arc_reclaim_needed(), that
> > is, how it arrives at the conclusion that there is memory pressure.
>
> It's new code, basically comparing the inuse/total/free from
> "kstat -n zfs_file_data", which seems buggered.
>
> /Tomas
[zfs-discuss] ZFS + Dell MD1200's - MD3200 necessary?
We are looking at building a storage platform based on Dell hardware + ZFS
(likely Nexenta). We're going with Dell because they can provide solid
hardware support globally.

Are any of you using the MD1200 JBOD with head units *without* an MD3200
in front? We are being told that the MD1200s won't daisy chain unless the
MD3200 is involved. We would be looking to use some sort of LSI-based SAS
controller on the Dell front-end servers.

Looking to confirm from folks who have this deployed in the wild. Perhaps
you'd be willing to describe your setup as well, and anything we might
need to take into consideration (thinking about the best option for
getting ZIL/L2ARC devices into Dell R510 head units, for example, in a
supported manner).

Thanks,
Ray
Re: [zfs-discuss] ZFS + Dell MD1200's - MD3200 necessary?
On Thu, Jan 05, 2012 at 06:07:33PM -0800, Craig Morgan wrote:

> Ray,
>
> If you are intending to go Nexenta then speak to your local Nexenta SE;
> we've got HSL-qualified solutions which cover our h/w support, and we've
> explicitly qualed some MD1200 configs with Dell for certain deployments
> to guarantee support via both Dell h/w support and ourselves. If you
> don't know who that would be, drop me a line and I'll find someone local
> to you ...
>
> We tend to go with the LSI cards, but even there there are some issues
> with regard to Dell supply or over the counter.
>
> HTH
> Craig

Hi Craig;

Yep, we are doing this. Just trying to sanity check the suggested config
against what folks are doing in the wild, as our Dell partner doesn't seem
to think it should/can be done without the MD3200. They may have ulterior
motives, of course. :)

Thanks,
Ray

> On 6 Jan 2012, at 01:28, Ray Van Dolson wrote:
> > We are looking at building a storage platform based on Dell hardware +
> > ZFS (likely Nexenta). [...]
[zfs-discuss] ZFS Upgrade
Dear list,

I'm about to upgrade a zpool from version 10 to version 29. I suppose this
upgrade will improve several performance issues that are present on
version 10. However, inside that pool we have several ZFS filesystems, and
all of them are at version 1.

My first question: is there a problem with performance, or any other
problem, if you operate a version-29 zpool with version-1 ZFS filesystems?
Is it better to upgrade the filesystems to the latest version? Can we jump
from zfs version 1 to 5? Are there any implications for zfs send/receive
with filesystems and pools at different versions?

Thanks in advance,
Ivan
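For what it's worth, the mechanics look roughly like this (the pool name is a placeholder; running the commands without arguments first shows the current versions, and zfs upgrade goes straight from v1 to the newest version the OS supports):

  zpool upgrade                 # list pools below the latest supported version
  zfs upgrade                   # list filesystems below the latest version

  zpool upgrade -V 29 tank      # upgrade the pool to a specific version
  zfs upgrade -r tank           # upgrade all filesystems in the pool

Both upgrades are one-way, and a stream sent from a newer filesystem version cannot be received on a system that only understands older versions, so upgrade the receiving side first if send/receive is in the picture.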
[zfs-discuss] Thinking about splitting a zpool in system and data
Sorry if this list is inappropriate. Pointers welcomed.

Using Solaris 10 Update 10, x86-64.

I have been a heavy ZFS user since it became available, and I love the
system. My servers are usually small (two disks) and usually hosted in a
datacenter, so I usually create a single zpool used both for system and
data. That is, the entire system lives in one two-disk zpool. This has
worked well so far.

But my new servers have SSDs too. Using them for L2ARC is easy enough, but
I can not use them as ZIL because no separate ZIL device can be used in
root zpools. Ugh, that hurts!

So I am thinking about splitting my full two-disk zpool into two zpools,
one for system and the other for data, both mirrored across both disks. So
I would have two slices per disk.

The system is in production in a datacenter I can not access, but I have
remote KVM access. The servers are in production; I can't reinstall, but I
could be allowed small (minutes) downtimes for a while. My plan is this:

1. Do a scrub to be sure the data is OK on both disks.
2. Break the mirror. The A disk will keep working; the B disk is idle.
3. Partition the B disk with two slices instead of the current full-disk
   slice.
4. Create a "system" zpool on B.
5. Snapshot zpool/ROOT on A and zfs send it to "system" on B. Repeat
   several times until we have a recent enough copy. This stream will
   contain the OS and the zone root datasets (I have zones).
6. Change GRUB to boot from "system" instead of "zpool". Cross fingers and
   reboot. Do I have to touch the bootfs property? Now ideally I would be
   able to have "system" as the root zpool. The zones would be mounted
   from the old datasets.
7. If everything is OK, zfs send the data from the old zpool to the new
   one. After doing this a few times to get a recent copy, I would stop
   the zones and do a final copy, to be sure I have all the data and no
   changes in progress.
8. Change the zone manifests to mount the data from the new zpool.
9. Restart the zones and check that everything seems OK.
10. Restart the computer to be sure everything works. So far, if this
    doesn't work, I could go back to the old situation simply by changing
    the GRUB boot entry back to the old zpool.
11. If everything works, destroy the original zpool on A, partition the
    disk and recreate the mirroring, with B as the source.
12. Reboot to be sure everything is OK.

So, my questions:

a) Is this workflow reasonable and would it work? Is the procedure
   documented anywhere? Suggestions? Pitfalls?

b) *MUST* the SWAP and DUMP zvols reside in the root zpool, or can they
   live in a non-system zpool (always plugged in and available)? I would
   like to have a quite small system zpool (let's say 30GB; I use Live
   Upgrade and quite a few zones), but my swap is huge (32GB, and yes, I
   use it) and I would rather have SWAP and DUMP in the data zpool, if
   that is supported.

c) Currently Solaris decides to activate write caching on the SATA disks,
   which is nice. What would happen if I still use the complete disks BUT
   with two slices instead of one? Would it still have the write cache
   enabled? And yes, I have checked that cache flush works as expected,
   because I can only do around one hundred write+sync operations per
   second.

Advice?

--
Jesus Cea Avion - j...@jcea.es - http://www.jcea.es/
jabber / xmpp:j...@jabber.org
"Things are not so easy" - "My name is Dump, Core Dump"
"Love is to place your happiness in the happiness of another" - Leibniz
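A rough command-level sketch of steps 1 through 6 (here "zpool" is the name of the existing root pool as in the plan above; the disk, slice and boot-environment names are illustrative, and the installgrub stage paths are the stock Solaris 10 x86 ones):

  zpool scrub zpool                        # 1. verify both sides of the mirror
  zpool detach zpool c0t1d0s0              # 2. break the mirror, freeing disk B
  # 3. repartition disk B with format(1M): s0 for "system", s1 for "data"
  zpool create system c0t1d0s0             # 4. new root pool on the small slice
  zfs snapshot -r zpool/ROOT@migrate       # 5. copy the boot environments over
  zfs send -R zpool/ROOT@migrate | zfs receive -Fdu system
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0   # 6.
  zpool set bootfs=system/ROOT/myBE system # bootfs must point at the copied BE

The mirror would then be re-formed in step 11 with zpool attach against the repartitioned A-disk slices.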
Re: [zfs-discuss] Thinking about splitting a zpool in system and data
On Fri, Jan 6, 2012 at 12:32 PM, Jesus Cea j...@jcea.es wrote:

> So, my questions:
>
> a) Is this workflow reasonable and would it work? Is the procedure
>    documented anywhere? Suggestions? Pitfalls?

Try
http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Complete_Solaris_ZFS_Root_Pool_Recovery

> b) *MUST* the SWAP and DUMP zvols reside in the root zpool, or can they
>    live in a non-system zpool (always plugged in and available)? I would
>    like to have a quite small system zpool (let's say 30GB; I use Live
>    Upgrade and quite a few zones), but my swap is huge (32GB, and yes, I
>    use it) and I would rather have SWAP and DUMP in the data zpool, if
>    that is supported.

Try it? :D
Last time I played around with S11, you could even go without swap and
dump (with some manual setup).

> c) Currently Solaris decides to activate write caching on the SATA disks,
>    which is nice. What would happen if I still use the complete disks BUT
>    with two slices instead of one? Would it still have the write cache
>    enabled?

You can enable the disk cache manually using format.

--
Fajar
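To make the "try it" answer a bit more concrete, a sketch of what (b) and (c) could look like. Whether dump on a non-root pool is supported on Solaris 10 is exactly the open question here, so treat this as something to test on a scratch box, not a recipe; pool, zvol and size names are illustrative:

  # (b) Move swap and dump onto the data pool:
  swap -d /dev/zvol/dsk/system/swap
  zfs create -V 32g data/swap
  swap -a /dev/zvol/dsk/data/swap
  zfs create -V 4g data/dump
  dumpadm -d /dev/zvol/dsk/data/dump

  # (c) Check or enable the write cache from format's expert mode
  #     (interactive menus: select disk -> cache -> write_cache):
  format -e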