Re: [zfs-discuss] very slow write performance on 151a
hi guys, does anyone know if a fix for this (space map thrashing) is in the works? i've been running into this on and off on a number of systems i manage. sometimes i can delete snapshots and things go back to normal, sometimes the only thing that works is enabling metaslab_debug. obviously the latter is only really an option for systems with a huge amount of ram. or: am i doing something wrong?

milosz

On Mon, Dec 19, 2011 at 8:02 AM, Jim Klimov jimkli...@cos.ru wrote:

2011-12-15 22:44, milosz wrote:
There are a few metaslab-related tunables that can be tweaked as well. - Bill

For the sake of completeness, here are the relevant lines I have in /etc/system:

* fix up metaslab min size (recent default ~10Mb seems bad,
* recommended return to 4Kb, we'll do 4*8K)
* greatly increases write speed in filled-up pools
set zfs:metaslab_min_alloc_size = 0x8000
set zfs:metaslab_smo_bonus_pct = 0xc8

These values were described in greater detail on the list this summer, I think.

HTH, //Jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
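for anyone squinting at the hex in those /etc/system lines, the two values decode to 32 KiB (i.e. the 4*8K Jim mentions, down from the ~10 MB default) and 200 percent. a quick sanity check:

```python
# decode the /etc/system tunables quoted above
metaslab_min_alloc_size = 0x8000   # bytes
metaslab_smo_bonus_pct = 0xc8      # percent

print(metaslab_min_alloc_size)     # 32768, i.e. 32 KiB = 4 * 8K
print(metaslab_smo_bonus_pct)      # 200
```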
[zfs-discuss] very slow write performance on 151a
hi all, suddenly ran into a very odd issue with a 151a server used primarily for cifs... out of (seemingly) nowhere, writes are incredibly slow, often 10kb/s. this is what zpool iostat 1 looks like when i copy a big file:

storepool 13.4T 1.07T  57    0 6.13M     0
storepool 13.4T 1.07T 216   91  740K 5.58M
storepool 13.4T 1.07T 127  182  232K 1004K
storepool 13.4T 1.07T 189   99  361K 5.47M
storepool 13.4T 1.07T 357  172  910K  949K
storepool 13.4T 1.07T 454  222 1.42M 2.14M
storepool 13.4T 1.07T  55  209  711K 1.05M

basically instead of the usual txg 5-second write pattern zfs writes to the zpool every second. this is certainly not an issue with the disks... iostat -En shows no errors and -Xn shows that the disks are barely being used (20%). the only situation in which i've seen this before was a multi-terabyte pool with dedup=on and constant writes (goes away once you turn off dedup). no dedup anywhere on this zpool, though. arc usage is normal (total ram is 12gb, max is set to 11gb, current usage is 8gb). pool is an 8-disk raidz2. any ideas? pretty stumped.

milosz
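worth noting: the alloc/free columns above put the pool at roughly 93% full, well past the point where metaslab allocation is known to get expensive. back-of-the-envelope:

```python
# "alloc" and "free" columns from the zpool iostat trace above (TiB)
alloc_tib = 13.4
free_tib = 1.07
fill = alloc_tib / (alloc_tib + free_tib)
print(round(fill * 100, 1))  # 92.6 -> the pool is ~93% full
```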
Re: [zfs-discuss] very slow write performance on 151a
thanks, bill. i killed an old filesystem. also forgot about arc_meta_limit. kicked it up to 4gb from 2gb. things are back to normal.

On Thu, Dec 15, 2011 at 1:06 PM, Bill Sommerfeld sommerf...@alum.mit.edu wrote:

On 12/15/11 09:35, milosz wrote:
hi all, suddenly ran into a very odd issue with a 151a server used primarily for cifs... out of (seemingly) nowhere, writes are incredibly slow, often 10kb/s. this is what zpool iostat 1 looks like when i copy a big file:
storepool 13.4T 1.07T  57    0 6.13M     0
storepool 13.4T 1.07T 216   91  740K 5.58M
... any ideas? pretty stumped.

Behavior I've observed with multiple pools is that you will sometimes hit a performance wall when the pool gets too full; the system spends lots of time reading in metaslab metadata looking for a place to put newly-allocated blocks. If you're in this mode, kernel profiling will show a lot of time spent in metaslab-related code. Exactly where you hit the wall seems to depend on the history of what went into the pool; I've seen the problem kick in with only 69%-70% usage in one pool that was used primarily for solaris development.

The workaround turned out to be simple: delete stuff you don't need to keep. Once there was enough free space, write performance returned to normal.

There are a few metaslab-related tunables that can be tweaked as well.

- Bill
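if you want to watch whether you're bumping into arc_meta_limit, the arcstats kstats are the place to look (something like `kstat -p zfs:0:arcstats:arc_meta_used zfs:0:arcstats:arc_meta_limit` — stat names as on illumos-era builds, double-check on yours). a small sketch of parsing that `kstat -p` output, with made-up sample numbers:

```python
def parse_kstat(output):
    """Parse `kstat -p` output: one "module:inst:name:stat<TAB>value" per line."""
    stats = {}
    for line in output.strip().splitlines():
        key, value = line.split("\t")
        stats[key.rsplit(":", 1)[-1]] = int(value)
    return stats

# hypothetical capture from a box still at the default-ish 2 GB meta limit
sample = (
    "zfs:0:arcstats:arc_meta_used\t2147483648\n"
    "zfs:0:arcstats:arc_meta_limit\t2147483648\n"
)
s = parse_kstat(sample)
pct = 100.0 * s["arc_meta_used"] / s["arc_meta_limit"]
print(pct)  # 100.0 -> metadata ARC pegged at its limit; time to raise it
```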
Re: [zfs-discuss] ZFS for iSCSI based SAN
The first 2 disks are a hardware mirror of 146 GB with a Sol10 UFS filesystem on it. The next 6 will be used as a raidz2 ZFS volume of 535G, compression and shareiscsi=on. I'm going to CHAP protect it soon...

you're not going to get the random read/write performance you need for a vm backend out of any kind of parity raid. just go with 3 sets of mirrors. unless you're ok with subpar performance (and if you think you are, you should really reconsider). also you might get significant mileage out of putting an ssd in and using it for zil. here's a good post from roch's blog about parity vs mirrored setups: http://blogs.sun.com/roch/entry/when_to_and_not_to
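the rough arithmetic behind that advice, assuming ~150 small-random IOPS per spindle (an assumption for illustration, not a measured figure): a raidz vdev delivers roughly one disk's worth of random IOPS because every drive participates in each stripe, while each mirror vdev adds another disk's worth.

```python
spindle_iops = 150              # assumed small-random IOPS per drive

# one 6-disk raidz2 vdev: each random write touches the whole stripe,
# so the vdev delivers roughly a single disk's IOPS
raidz2_write_iops = spindle_iops * 1

# three 2-way mirror vdevs from the same 6 disks: writes spread across vdevs
mirror_write_iops = spindle_iops * 3

print(raidz2_write_iops, mirror_write_iops)  # 150 450
```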
Re: [zfs-discuss] ZFS for iSCSI based SAN
the VMs will be mostly low-IO systems:
- WS2003 with Trend OfficeScan, WSUS (for 300 XP) and RDP
- Solaris 10 with SRSS 4.2 (Sun Ray server)
(File and DB servers won't move to VM+SAN in the near future.) I thought (but could be wrong) that those systems could tolerate high-latency IO.

might be fine most of the time... rdp in particular is vulnerable to io spiking and disk latency. depends on how many users you have on that rdp vm. also wsus is surprisingly (or not, given it's a microsoft production) resource-hungry. if those servers are on physical boxes right now i'd do some perfmon captures and add up the iops.

what you're getting out of it is essentially a 500gb 15k drive with a high mttdl

That's what i wanted, a rock-solid disk area, despite not-as-good-as-i'd-like random IO.

fair enough.
Re: [zfs-discuss] ZFS write I/O stalls
is this a direct write to a zfs filesystem or is it some kind of zvol export? anyway, sounds similar to this: http://opensolaris.org/jive/thread.jspa?threadID=105702&tstart=0

On Tue, Jun 23, 2009 at 7:14 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote:

It has been quite some time (about a year) since I did testing of batch processing with my software (GraphicsMagick). In the intervening time, ZFS added write throttling. I am using Solaris 10 with kernel 141415-03.

Quite a while back I complained that ZFS was periodically stalling the writing process (which UFS did not do). The ZFS write-throttling feature was supposed to avoid that. In my testing today I am still seeing ZFS stall the writing process periodically. When the process is stalled, there is a burst of disk activity, a burst of context switching, and total CPU use drops to almost zero. Zpool iostat says that read bandwidth is 15.8M and write bandwidth is 15.8M over a 60 second averaging interval. Since my drive array is good for writing over 250MB/second, this is a very small write load and the array is loafing.

My program uses the simple read-process-write approach. Each file written (about 8MB/file) is written contiguously and written just once. Data is read and written in 128K blocks. For this application there is no value obtained by caching the file just written. From what I am seeing, reading occurs as needed, but writes are being batched up until the next ZFS synchronization cycle. During the ZFS synchronization cycle it seems that processes are blocked from writing. Since my system has a lot of memory and the ARC is capped at 10GB, quite a lot of data can be queued up to be written. The ARC is currently running at its limit of 10GB. If I tell my software to invoke fsync() before closing each written file, then the stall goes away, but the program then needs to block so there is less beneficial use of the CPU.
If this application stall annoys me, I am sure that it would really annoy a user with mission-critical work which needs to get done on a uniform basis. If I run this little script then the application runs more smoothly, but I see evidence of many shorter stalls:

while true
do
  sleep 3
  sync
done

Is there a solution in the works for this problem?

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
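for reference, the fsync-before-close variant Bob describes looks roughly like this (a python sketch of the pattern rather than GraphicsMagick's actual C code): forcing each file to stable storage before close means the writer absorbs the flush cost itself instead of piling data up for the next txg sync.

```python
import os

def write_file_synced(path, data, blocksize=128 * 1024):
    """Write a file in 128K blocks, then fsync before close so the
    writer blocks on the flush instead of queueing data in the ARC."""
    with open(path, "wb") as f:
        for off in range(0, len(data), blocksize):
            f.write(data[off:off + blocksize])
        f.flush()
        os.fsync(f.fileno())  # block until the data is on stable storage

# usage sketch: one ~8 MB output file, as in the batch workload described
# write_file_synced("/tank/out/frame0001.tif", b"\0" * (8 * 1024 * 1024))
```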
Re: [zfs-discuss] zfs on 32 bit?
thank you, caspar. to sum up here (seems to have been a lot of confusion in this thread): the efi vs. smi thing that richard and a few other people have talked about is not the issue at the heart of this. this:

32 bit Solaris can use at most 2^31 as a disk address; a disk block is 512 bytes, so in total it can address 2^40 bytes. The SMI label found in Solaris 10 (update 8?) and OpenSolaris has been enhanced and can address 2TB, but only on a 64 bit system.

is what the problem is. so 32-bit zfs cannot use disks larger than 1(.09951)tb regardless of whether it's for the root pool or not. let me repeat that i do not consider this a bug and do not want to see it fixed.
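the arithmetic in casper's explanation, spelled out:

```python
max_blocks = 2 ** 31   # largest block address a 32-bit daddr can hold
block_size = 512       # bytes per disk block
limit = max_blocks * block_size

print(limit == 2 ** 40)        # True: exactly 1 TiB
print(round(limit / 1e12, 5))  # 1.09951 -> the "1(.09951)tb" ceiling above
```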
Re: [zfs-discuss] zio_taskq_threads and TXG sync
wow, that hasn't been a recognized problem since this past april? i've been seeing it for a -long- time. i think i first reported it back in december. are people actively working on it?

On Tue, Jun 16, 2009 at 10:24 AM, Marcelo Leal no-re...@opensolaris.org wrote:

Hello all, I'm trying to understand the ZFS IO scheduler ( http://www.eall.com.br/blog/?p=1170 ), and why the system sometimes seems to be stalled for a few seconds, and every application that needs some IO (mostly reads, i think) has serious problems. That can be a big problem with iSCSI or NFS soft mounts. Looking at the code, i could get to the zio_taskq_threads structure, and to this bug report: http://bugs.opensolaris.org/bugdatabase/printableBug.do?bug_id=6826241 And it seems like it was already integrated into newer releases (i don't know since when)... Could somebody explain the real diff between the ISSUE and INTR, READ and WRITE changes, and maybe why the first implementation used the same value for both? ;-) Another change that i did not fully understand was the time between txg syncs going from 5s to 30s, which i think can make this problem worse, because we will have more data to commit. Well, too many questions... ;-)

PS: Where can i find the patches and attachments from bugs.opensolaris.org? The files mention attachments, but i can not find them.

Thanks a lot for your time!

Leal [ http://www.eall.com.br/blog ]
-- This message posted from opensolaris.org
Re: [zfs-discuss] zfs on 32 bit?
yeah, i get a nice clean zfs error message about disk size limits when i try to add the disk.

On Tue, Jun 16, 2009 at 4:26 PM, roland no-re...@opensolaris.org wrote:

the only problems i've run into are: slow (duh) and will not take disks that are bigger than 1tb

do you think that 1tb limit is due to 32bit solaris ?
Re: [zfs-discuss] zfs on 32 bit?
yeah i pretty much agree with you on this. the fact that no one has brought this up before is a pretty good indication of the demand. there are about 1000 things i'd rather see fixed/improved than max disk size on a 32bit platform.

On Tue, Jun 16, 2009 at 5:55 PM, Neal Pollack neal.poll...@sun.com wrote:

On 06/16/09 02:39 PM, roland wrote:
so, we have a 128bit fs, but only support for 1tb on 32bit? i`d call that a bug, isn`t it ? is there a bugid for this? ;)

Well, opinion is welcome. I'd call it an RFE. With 64 bit versions of the CPU chips so inexpensive these days, how much money do you want me to invest in moving modern features and support to old versions of the OS? I mean, Microsoft could, on a technical level, backport all new features from Vista and Windows Seven to Windows 95. But if they did that, their current offering would lag, since all the engineers would be working on the older stuff. Heck, you can buy a 64 bit CPU motherboard very very cheap. The staff that we do have are working on modern features for the 64bit version, rather than spending all their time in the rear-view mirror. Live life forward. Upgrade.

Changing all the data structures in the 32 bit OS to handle super-large disks is, well, sorta like trying to get a Pentium II to handle HD video. I'm sure, with enough time and money, you might find a way. But is it worth it? Or is it cheaper to buy a new pump?

Neal
Re: [zfs-discuss] zfs on 32 bit?
one of my disaster recovery servers has been running on 32bit hardware (ancient northwood chip) for about a year. the only problems i've run into are: slow (duh) and will not take disks that are bigger than 1tb. that is kind of a bummer and means i'll have to switch to a 64bit base soon. everything else has been fine.
Re: [zfs-discuss] how to reliably determine what is locking up my zvol?
deleting the lu's via sbdadm solved this. still wondering if there is some reliable way to figure out what is using the zvol, though =)

On Wed, May 20, 2009 at 6:32 PM, milosz mew...@gmail.com wrote:

-bash-3.2# zpool export exchbk
cannot remove device links for 'exchbk/exchbk-2': dataset is busy

this is a zvol used for a comstar iscsi backend:

-bash-3.2# stmfadm list-lu -v
LU Name: 600144F0EAC009004A0A4F410001
    Operational Status: Offline
    Provider Name     : sbd
    Alias             : /dev/zvol/rdsk/exchbk/exchbk-1
    View Entry Count  : 1
LU Name: 600144F0EAC009004A1307D20001
    Operational Status: Offline
    Provider Name     : sbd
    Alias             : /dev/zvol/rdsk/exchbk/exchbk-2
    View Entry Count  : 1

i offlined everything. it's driving me nuts.
[zfs-discuss] how to reliably determine what is locking up my zvol?
-bash-3.2# zpool export exchbk
cannot remove device links for 'exchbk/exchbk-2': dataset is busy

this is a zvol used for a comstar iscsi backend:

-bash-3.2# stmfadm list-lu -v
LU Name: 600144F0EAC009004A0A4F410001
    Operational Status: Offline
    Provider Name     : sbd
    Alias             : /dev/zvol/rdsk/exchbk/exchbk-1
    View Entry Count  : 1
LU Name: 600144F0EAC009004A1307D20001
    Operational Status: Offline
    Provider Name     : sbd
    Alias             : /dev/zvol/rdsk/exchbk/exchbk-2
    View Entry Count  : 1

i offlined everything. it's driving me nuts.
Re: [zfs-discuss] ZFS using all memory for cache.
google evil tuning guide and you will find it. you can throw a zfs into the query too, or not. zfs will basically use as much ram as it can. see section 2.2, limiting arc cache

On Mon, May 11, 2009 at 11:16 AM, Ross Schaulis ross.schau...@sun.com wrote:

(Please reply to me directly as I am not on the ZFS alias)

IHAC running BEA WebLogic on a T2000 with ZFS. Here's what he's telling me.
- He found himself running out of memory on the T2000 (16GB)
- He rebooted his T2000 and got all his memory back
- He ran his system for a while and then did a vmstat and showed he had 10G available. He copied what should have been ~5GB to disk, immediately re-ran vmstat, and it showed he only had 2GB of memory left!!!

My guess is that ZFS is using the system memory for cache and is not giving it back? OR maybe ZFS will give it back when asked? Are there better commands to run to see actual memory available? Is there a way to cap the amount of memory being used for cache by ZFS? Or better yet, is there a tuning guide I can give to my customer to help him better understand and run his ZFS environment? (I think there used to be an evil tuning guide... but I have been unable to find it)

RS
--
Ross Schaulis | Senior Systems Engineer | State and Local Government | (877) 249-7441 office | ross.schau...@sun.com
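for the archives, the knob that section of the tuning guide describes is zfs_arc_max in /etc/system. for example, to cap the ARC at 4 GB on a box like that 16 GB T2000 (the 4 GB figure here is just an illustration — size it to your workload):

```
* cap the ZFS ARC at 4 GB (0x100000000 bytes); takes effect after a reboot
set zfs:zfs_arc_max = 0x100000000
```

note the cap limits cache growth; it does not change the fact that the ARC shrinks under memory pressure, just how big it is allowed to get.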
Re: [zfs-discuss] Areca 1160 ZFS
with pass-through disks on areca controllers you have to set the lun id (i believe) using the volume command. when you issue a volume info your disk id's should look like this (if you want solaris to see the disks): 0/1/0, 0/2/0, 0/3/0, 0/4/0, etc. the middle part there (again, i think that's supposed to be lun id) is what you need to set manually for each disk. it's actually my #1 peeve with using areca with solaris.

On Thu, May 7, 2009 at 4:29 PM, Gregory Skelton gskel...@gravity.phys.uwm.edu wrote:

Hi Everyone, I want to start out by saying ZFS has been a life saver to me, and the scientific collaboration I work for. I can't imagine working with the TBs of data that we do without the snapshots or the ease of moving the data from one pool to another.

Right now I'm trying to set up a whitebox with OpenSolaris. It has an Areca 1160 RAID controller (latest firmware), SuperMicro H8SSL-I mobo, and a SuperMicro IPMI card. I haven't been working with Solaris for all that long, and wanted to create a zpool similar to our x4500's. From the documentation it says to use the format command to locate the disks. OpenSolaris lives on a 2 disk mirrored raid, and I was hoping I could have the disks pass through so that zfs could manage the zpool.

What am I doing wrong here, that I can't see all the disks? Or do I have to use a RAID 5 underneath the zpool? Any and all help is appreciated.

Thanks, Gregory

r...@nfs0009:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
  0. c3t0d0 DEFAULT cyl 48627 alt 2 hd 255 sec 63
     /p...@0,0/pci1166,3...@1/pci1166,1...@d/pci8086,3...@1/pci17d3,1...@e/s...@0,0
  1. c3t1d0 DEFAULT cyl 48639 alt 2 hd 255 sec 63
     /p...@0,0/pci1166,3...@1/pci1166,1...@d/pci8086,3...@1/pci17d3,1...@e/s...@1,0
Specify disk (enter its number):

r...@nfs0009:~# ./cli64 disk info
 #  Ch# ModelName            Capacity  Usage
===============================================
 1  1   WDC WD4000YS-01MPB1  400.1GB   Raid Set # 00
 2  2   WDC WD4000YS-01MPB1  400.1GB   Raid Set # 00
 3  3   WDC WD4000YS-01MPB1  400.1GB   Pass Through
 4  4   WDC WD4000YS-01MPB1  400.1GB   Pass Through
 5  5   WDC WD4000YS-01MPB1  400.1GB   Pass Through
 6  6   WDC WD4000YS-01MPB1  400.1GB   Pass Through
 7  7   WDC WD4000YS-01MPB1  400.1GB   Pass Through
 8  8   WDC WD4000YS-01MPB1  400.1GB   Pass Through
 9  9   WDC WD4000YS-01MPB1  400.1GB   Pass Through
10  10  WDC WD4000YS-01MPB1  400.1GB   Pass Through
11  11  WDC WD4000YS-01MPB1  400.1GB   Pass Through
12  12  WDC WD4000YS-01MPB1  400.1GB   Pass Through
13  13  WDC WD4000YS-01MPB1  400.1GB   Pass Through
14  14  WDC WD4000YS-01MPB1  400.1GB   Pass Through
15  15  WDC WD4000YS-01MPB1  400.1GB   Pass Through
16  16  WDC WD4000YS-01MPB1  400.1GB   Pass Through
===============================================
GuiErrMsg0x00: Success.

r...@nfs0009:~#
Re: [zfs-discuss] zfs iscsi sustained write performance
sorry, that 60% statement was misleading... i will VERY OCCASIONALLY get a spike to 60%, but i'm averaging more like 15%, with the throughput often dropping to zero for several seconds at a time. that iperf test more or less demonstrates it isn't a network problem, no? also i have been using microsoft iscsi initiator... i will try doing a solaris-solaris test later.
Re: [zfs-discuss] zfs iscsi sustained write performance
iperf test coming out fine, actually...

iperf -s -w 64k
iperf -c -w 64k -t 900 -i 5

[ ID] Interval          Transfer     Bandwidth
[  5] 0.0-899.9 sec     81.1 GBytes  774 Mbits/sec

totally steady. i could probably implement some tweaks to improve it, but if i were getting a steady 77% of gigabit i'd be very happy. not seeing any cpu saturation with mpstat... nothing unusual other than low activity while zfs commits writes to disk (ostensibly this is when the transfer rate troughs)...
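the summary line checks out, keeping in mind iperf reports Transfer in binary GBytes but Bandwidth in decimal Mbits:

```python
gbytes = 81.1    # iperf "Transfer" column (binary gigabytes)
seconds = 899.9  # iperf "Interval"
mbits_per_sec = gbytes * 2**30 * 8 / seconds / 1e6

print(round(mbits_per_sec))               # 774 -> matches the Bandwidth column
print(round(mbits_per_sec / 1000 * 100))  # 77 -> the "77% of gigabit" figure
```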
Re: [zfs-discuss] zfs iscsi sustained write performance
thanks for your responses, guys... the nagle's tweak is the first thing i did, actually. not sure what the network limiting factors could be here... there's no switch, jumbo frames are on... maybe it's the e1000g driver? it's been wonky since build 94 or so. even during the write bursts i'm only getting 60% of gigabit on average.
[zfs-discuss] zfs iscsi sustained write performance
hi all, currently having trouble with sustained write performance with my setup... ms server 2003/ms iscsi initiator 2.08 w/ intel e1000g nic directly connected to snv_101 w/ intel e1000g nic. basically, given enough time, the sustained write behavior is perfectly periodic. if i copy a large file to the iscsi target, iostat reports 10 seconds or so of -no- writes to disk, just small reads... then 2-3 seconds of disk-maxed writes, during which time windows reports the write performance dropping to zero (disk queues maxed). so iostat will report something like this for each of my zpool disks (with iostat -xtc 1):

 1s: %b 0
 2s: %b 0
 3s: %b 0
 4s: %b 0
 5s: %b 0
 6s: %b 0
 7s: %b 0
 8s: %b 0
 9s: %b 0
10s: %b 0
11s: %b 100
12s: %b 100
13s: %b 100
14s: %b 0
15s: %b 0

it looks like solaris hangs out caching the writes and not actually committing them to disk... when the cache gets flushed, the iscsitgt (or whatever) just stops accepting writes. this is happening across controllers and zpools. also, a test copy of a 10gb file from one zpool to another (not iscsi) yielded similar iostat results: 10 seconds of big reads from the source zpool, 2-3 seconds of big writes to the target zpool (target zpool is 5x bigger than source zpool). anyone got any ideas? point me in the right direction?

thanks, milosz
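one way to see why the windows-side throughput craters: with the pattern above, the disks only accept data ~3 seconds out of every ~13, so sustained throughput is capped at that duty cycle times the burst rate. the burst figure below is an assumed placeholder, the timings are from the iostat trace:

```python
burst_secs = 3    # %b at 100 in the trace above
idle_secs = 10    # %b at 0 while the txg accumulates in RAM
duty_cycle = burst_secs / (burst_secs + idle_secs)

burst_mb_s = 250  # assumed: what the disks can sink while flushing
sustained = duty_cycle * burst_mb_s
print(round(duty_cycle, 2), round(sustained))  # 0.23 58
```

so even with healthy disks, a bursty commit cycle like this caps the average at roughly a quarter of the hardware's streaming rate.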
Re: [zfs-discuss] zfs iscsi sustained write performance
compression is off across the board. svc_t is only maxed during the periods of heavy write activity (2-3 seconds every 10 or so seconds)... otherwise disks are basically idling.
Re: [zfs-discuss] zfs iscsi sustained write performance
my apologies... 11s, 12s, and 13s represent the number of seconds in a read/write period, not disks. so, 11 seconds into a period, %b suddenly jumps to 100 after having been 0 for the first 10.