Re: [zfs-discuss] [storage-discuss] Supermicro SAS/SATA controllers?
On Mon, Apr 13, 2009 at 3:27 PM, Miles Nordin wrote:
>>> "nl" == Nicholas Lee writes:
>
> nl> 1. Is the cache only used for RAID modes and not in JBOD mode?
>
> well, there are different LSI cards and firmwares and drivers, but:
>
>   The X4150 SAS RAID controllers will use the on-board battery-backed
>   cache even when disks are presented as individual LUNs.
>     -- "Aaron Blew", Wed, 3 Sep 2008 15:29:29 -0700
>
>   We're using an Infortrend SATA/SCSI disk array with individual LUNs,
>   but it still uses the disk cache.
>     -- Tomas Ögren, Thu, 4 Sep 2008 10:20:30 +0200
>
> nl> 2. If it is used by the controller, is it driver dependent? Does
> nl> it only work if the driver can handle the cache?
>
> driver is proprietary. :) no way to know.
>
> nl> 3. If the cache does work, what happens if there is a power reset?
>
> Obviously it is supposed to handle this. But, yeah, as you said,
> _when_ is the battery-backed cache flushed? At boot during the BIOS
> probe? What if you're using SPARC and don't do a BIOS probe? By the
> driver? When the ``card's firmware boots''? How can you tell whether
> the cache has got stuff in it or not? What if you're doing maintenance
> like replacing disks---something not unlikely to coincide with unclean
> shutdowns. Will this confuse it?

I hadn't thought about this scenario. ZFS handles so much of what once
would have been done in hardware and by drivers. While this is good, it
leaves a huge grey area where it is hard for those of us on the front
line to make decisions about best choices.

> The driver and the ``firmware'' are all proprietary, so there's no way
> to look into the matter yourself other than exhaustive testing, and
> there's no vendor standing squarely behind the overall system like
> there is with an external array.
>
> but...it's so extremely cheap and fast that I think there's a huge

That's the big point: 10,000 USD for a 2U 12-disk 10 TB raw NAS, or
100,000 USD for the equivalent appliance.
> segment of the market, the segment which cares about being extremely
> cheap and fast, that uses this stuff as a matter of course. I guess
> these are the guys who were supposed to start using ZFS, but for now
> I guess the hardware cache is still faster for ``hardware''
> raid-on-a-card.
>
> I think the ideal device would have a fully open-source driver stack,
> and a light on the SSD slog, or battery+RAM, or supercap+RAM+CF, to
> indicate whether it's empty or not. If it's missing and not empty,
> then the pool will always refuse to auto-import but always import if
> ``forced''; if it's missing and empty, then the pool will sometimes
> auto-import (ex., always if there was a clean shutdown and sometimes
> if there wasn't); and if forced to import when the light's out, the
> pool will be fsync-consistent. Currently we're short of the ideal
> even using the ZFS-style slog, but AIUI you can get closer if you
> make a backup of your empty slog right after you attach it and stash
> the .dd.gz file somewhere outside the pool---you can force the import
> of a pool with a dirty, missing slog by substituting an old empty
> slog with the right label on it. However, still a closed driver,
> still nothing with fancy lights on it. :)

The only issue I have with slog-type devices at the moment is that they
are not removable and thus not easily replaceable. It seems that if you
want a production system using slogs, you must mirror them---otherwise,
if the slog is corrupted, you can only revert to a backup.

> nl> iRAM device seems like a hack,
>
> There's also the ACARD device:
>
>   ACARD ANS-9010B    $250
>   plus 8GB RAM        $86
>   plus 16GB CF        $44
>
> It's also got a battery, but can dump/restore the RAM to a CF card.
> It's physically larger and neither cheaper nor faster than an Intel
> X25-E, but at least it doesn't have the fragmentation problems to
> worry about. I've not tested it myself.
> Someone on the list tested it, but IIRC he did not use it as a slog,
> nor comment on how the CF dumping feature works (it sounds kind of
> sketchy. ``buttons'' are involved, which to me sounds very bad).

I've seen these before, but dismissed them since they are 5.25" units,
which is tricky in rack systems that generally only cater for 3.5". I
wonder if it is possible to pull these apart and put them in a smaller
case.

Has anyone done any specific testing with SSD devices and Solaris other
than the FISHWORKS stuff? Which is better for what---SLC or MLC?

Nicholas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
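The empty-slog backup trick Miles describes could be sketched roughly as below. This is a hedged sketch of untested command fragments, not a procedure from the thread: the pool name (tank), device (c1t2d0), and backup path are placeholders, and you should verify the recovery behaviour on a scratch pool before depending on it.

```
# Attach a dedicated log device (slog) to the pool, then immediately
# back up an image of it while it is still empty.
zpool add tank log c1t2d0
dd if=/dev/rdsk/c1t2d0s0 bs=1M | gzip > /safe/place/slog-empty.dd.gz

# Later, if the slog is lost while dirty: write the empty image (with
# its valid label) onto a replacement at the same path, then force
# the import. Synchronous writes that only reached the old slog are
# gone; the rest of the pool imports fsync-consistent.
gzip -dc /safe/place/slog-empty.dd.gz | dd of=/dev/rdsk/c1t2d0s0 bs=1M
zpool import -f tank
```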
Re: [zfs-discuss] Any news on ZFS bug 6535172?
Gary,

How full is the pool?

-- Sanjeev

On Sun, Apr 12, 2009 at 08:39:03AM -0500, Gary Mills wrote:
> We're running a Cyrus IMAP server on a T2000 under Solaris 10 with
> about 1 TB of mailboxes on ZFS filesystems. Recently, when under
> load, we've had incidents where IMAP operations became very slow. The
> general symptoms are that the number of imapd, pop3d, and lmtpd
> processes increases, the CPU load average increases, but the ZFS I/O
> bandwidth decreases. At the same time, ZFS filesystem operations
> become very slow. A rewrite of a small file can take two minutes.
>
> We've added memory; this was an improvement, but the incidents
> continued. The next step is to disable ZFS prefetch and test this
> under load. If that doesn't help either, we're down to ZFS bugs.
>
> Our incidents seem similar to the ones at UC Davis:
>
> http://vpiet.ucdavis.edu/docs/EmailReviewCmte.Report_Feb2008.pdf
>
> These were attributed to bug 6535160, but this one is fixed on our
> server with patch 127127-11. Bug 6535172, ``zil_sync causing long
> hold times on zl_lock'', doesn't have a patch yet:
>
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6535172
>
> Could this bug cause our problem? How do I confirm that it does?
> Is there a workaround?
>
> Cyrus IMAP uses several moderate-sized databases that are
> memory-mapped by all processes. I can move these from ZFS to UFS if
> this is likely to help.
>
> --
> -Gary Mills--Unix Support--U of M Academic Computing and Networking-

--
Sanjeev Bagewadi
Solaris RPE
Bangalore, India
Re: [zfs-discuss] Any news on ZFS bug 6535172?
On Sun, Apr 12, 2009 at 05:01:57PM -0400, Ellis, Mike wrote:
> Is the netapp iscsi-lun forcing a full sync as a part of zfs's
> 5-second sync/flush type of thing? (Not needed since the netapp
> guarantees the write once it acks it)

I've asked that of our Netapp guy, but so far I haven't heard from him.
Is there a way to determine this from the iSCSI initiator side? I do
have a test mail server that I can play with.

> That could make a big difference...
> (Perhaps disabling the write-flush in zfs will make a big difference
> here, especially on a write-heavy system)

--
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
Re: [zfs-discuss] [storage-discuss] Supermicro SAS/SATA controllers?
On Sun, Apr 12, 2009 at 7:24 PM, Miles Nordin wrote:
> nl> Supermicro have several LSI controllers. AOC-USASLP-L8i with
> nl> the LSI 1068E
>
> That's what I'm using. It uses the proprietary mpt driver.
>
> nl> and AOC-USASLP-H8iR with the LSI 1078.
>
> I'm not using this.
>
> nl> How does the performance compare to the Marvell?
>
> don't know, but the proprietary Marvell driver uses the SATA
> framework, and the LSI proprietary driver attaches like a SCSI
> controller without using the SATA framework. If you are trying to
> burn CDs or play DVDs or use 'smartctl' on your hard disks, I'm not
> sure if it will work with LSI.

Disk storage only. I usually use USB CD-ROMs for servers if I need
them.

> nl> The LSI1068E has 16MB SRAM onboard cache - I expect this helps
> nl> performance, but does it cause issues with the ZIL?
>
> no, it is just sillyness. It's just part of the controller/driver,
> not something to worry about.

I guess when you think about it, it is actually smaller (now) than the
cache on many HDDs. Probably a waste of space.

> nl> The LSI1078 has 512MB DDR2 onboard cache with a battery backup
> nl> option.
>
> yeah, without the battery the onboard cache may be a liability rather
> than an asset. You will have to worry if the card is unsafely
> offering to store things in this volatile cache. I'm not sure how it
> works out in practice.

I guess this is my main point of worry about this card.

1. Is the cache only used for RAID modes and not in JBOD mode?
2. If it is used by the controller, is it driver dependent? Does it
   only work if the driver can handle the cache?
3. If the cache does work, what happens if there is a power reset?
   - In the first case, if it is driver independent and simply does a
     cache-to-disk flush of IO commands on power restart, would that
     cause corruption with zfs?
   - In the second case, similar to the first, but is it now dependent
     on the driver? How stable is the driver? Is corruption a more
     likely event?
4. In either case, the option to turn off the cache might be
   important.
5. Furthermore, without a battery you might also want to turn off the
   cache.

> I think the battery-backed caches are much cheaper than an SSD slog,
> and the bandwidth to the cache is much higher than bandwidth to a
> single SATA port too. I don't like it, though, because data collects
> inside the cache which I can't get out. OTOH, slog plus data disks I
> can easily move from one machine to another while diagnosing a
> problem, if I suspect a motherboard or the LSI card itself is bad,
> for example.

I agree with your points. Even though an iRAM device seems like a hack,
without good information about the stability of controller-based cache
they seem like the more portable solution.

> nl> using the battery backup option, allowing "zil disable"?
>
> please reread the best practices. I think you're confusing two
> different options and planning to do something unsafe.

Sorry, I meant zfs_nocacheflush - which should only be used when NVRAM
or a secure power supply is available.

> t> UIO fits just fine in a normal chassis, you just have to
> t> remove the bracket. [...] it's really not a big deal.
>
> +1, that supermicro card, Nicholas, is UIO rather than PCIe, and it
> does work for me in a plain PCIe slot with the bracket removed. so
> long as you are not moving around the machine too much, I agree it's
> not a big deal.

I plan to use Supermicro chassis in a rack - so both a m/b designed for
UIO and in a stable location. Should be fine.

Nicholas
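For reference, the zfs_nocacheflush tunable mentioned above is set in /etc/system on Solaris 10/OpenSolaris (a reboot is required). This is a config fragment only; the warning comments are mine, restating the caveat from the thread:

```
* /etc/system fragment: stop ZFS from issuing cache-flush commands.
* ONLY safe when every device under the pool has nonvolatile
* (battery/supercap-backed) write cache or a guaranteed power supply;
* otherwise a power loss can silently lose committed data.
set zfs:zfs_nocacheflush = 1
```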
Re: [zfs-discuss] Any news on ZFS bug 6535172?
On Sun, Apr 12, 2009 at 12:23:03PM -0700, Richard Elling wrote:
> These disks are pretty slow. JBOD? They are not 100% busy, which
> means that either the cached data is providing enough response to the
> apps, or the apps are not capable of producing enough load -- which
> means the bottleneck may be elsewhere.

They are four 500-gig iSCSI LUNs exported from a Netapp filer, with
Solaris multipathing. Yes, the I/O is normally mostly writes, with
reads being satisfied from various caches.

> You can use fsstat to get a better idea of what sort of I/O the
> applications are seeing from the file system. That might be
> revealing.

Thanks for the suggestion. There are so many `*stat' commands that I
forget about some of them. I've run a baseline with `fsstat', but the
server is mostly idle now. I'll have to wait for another incident!
What option to `fsstat' do you recommend? Here's a sample of the
default output:

$ fsstat zfs 5 5
  new  name   name  attr  attr lookup rddir  read  read write write
 file remov   chng   get   set    ops   ops   ops bytes   ops bytes
3.56M 1.53M  3.83M 1.07G 1.53M  2.47G 4.09M 56.4M 1.83T 61.1M  306G zfs
   13     1     16 1.40K     5  11.6K     0     5 38.5K   125  127K zfs
   18     0     18 3.61K     6  21.1K     0     6 16.7K    97  244K zfs
   26     4     25 1.73K    10  6.76K     0    18  178K   142  817K zfs
   12     3     13 3.90K     5  9.00K     0     5 32.8K   108  287K zfs
    7     2      7 1.98K     3  5.87K     0     7 67.5K   108 2.34M zfs

--
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
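To make sense of fsstat samples like these, it helps to turn the human-readable counters back into raw numbers, e.g. to see the average bytes moved per write op. A minimal sketch, assuming fsstat's suffixes are binary (1024-based), which is my assumption rather than something stated in the thread:

```python
# Parse fsstat's human-readable counters (e.g. "56.4M", "1.83T") back
# into plain floats so per-op averages can be computed from a sample.
SUFFIX = {'K': 2**10, 'M': 2**20, 'G': 2**30, 'T': 2**40}

def parse_count(token: str) -> float:
    """Convert an fsstat token like '38.5K' into a plain float."""
    if token[-1] in SUFFIX:
        return float(token[:-1]) * SUFFIX[token[-1]]
    return float(token)

def avg_write_size(write_ops: str, write_bytes: str) -> float:
    """Average bytes per write op for one fsstat sample row."""
    return parse_count(write_bytes) / parse_count(write_ops)

# One of the 5-second samples above: 125 write ops moving 127K.
print(round(avg_write_size("125", "127K")))  # -> 1040, i.e. ~1 KB writes
```

Small average write sizes like this are consistent with Cyrus rewriting many small mailbox files.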
Re: [zfs-discuss] Any news on ZFS bug 6535172?
On Sun, Apr 12, 2009 at 10:49:49AM -0700, Richard Elling wrote:
> Gary Mills wrote:
> > We're running a Cyrus IMAP server on a T2000 under Solaris 10 with
> > about 1 TB of mailboxes on ZFS filesystems. Recently, when under
> > load, we've had incidents where IMAP operations became very slow.
> > The general symptoms are that the number of imapd, pop3d, and lmtpd
> > processes increases, the CPU load average increases, but the ZFS
> > I/O bandwidth decreases. At the same time, ZFS filesystem
> > operations become very slow. A rewrite of a small file can take
> > two minutes.
>
> Bandwidth is likely not the issue. What does the latency to disk
> look like?

Yes, I have statistics! This set was taken during an incident on
Thursday. The load average was 12. There were about 5700 Cyrus
processes running. Here are the relevant portions of `iostat -xn 5 4':

                    extended device statistics
 r/s   w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
23.8  20.7 1195.0  677.8  0.0  1.0    0.0   22.2   0  37 c4t60A98000433469764E4A2D456A644A74d0
29.0  23.5 1438.3  626.8  0.0  1.3    0.0   25.4   0  44 c4t60A98000433469764E4A2D456A696579d0
22.8  26.6 1356.7  822.1  0.0  1.3    0.0   26.2   0  32 c4t60A98000433469764E4A476D2F664E4Fd0
26.4  27.3 1516.0  850.7  0.0  1.4    0.0   26.5   0  38 c4t60A98000433469764E4A476D2F6B385Ad0
                    extended device statistics
 r/s   w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
39.7  27.0 1395.8  285.5  0.0  1.1    0.0   16.3   0  51 c4t60A98000433469764E4A2D456A644A74d0
52.5  29.8 1890.8  175.1  0.0  1.8    0.0   22.3   0  63 c4t60A98000433469764E4A2D456A696579d0
30.0  33.3 1940.2  432.8  0.0  1.2    0.0   19.4   0  34 c4t60A98000433469764E4A476D2F664E4Fd0
39.9  42.5 2062.1  616.7  0.0  1.9    0.0   22.9   0  50 c4t60A98000433469764E4A476D2F6B385Ad0
                    extended device statistics
 r/s   w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
43.8  47.6 1691.5  504.8  0.0  1.6    0.0   17.3   0  59 c4t60A98000433469764E4A2D456A644A74d0
55.4  62.4 2027.8  517.0  0.0  2.2    0.0   18.5   0  72 c4t60A98000433469764E4A2D456A696579d0
18.6  76.8  682.3  843.5  0.0  1.1    0.0   12.0   0  34 c4t60A98000433469764E4A476D2F664E4Fd0
30.2 115.8  873.6  905.8  0.0  2.2    0.0   15.1   0  52 c4t60A98000433469764E4A476D2F6B385Ad0
                    extended device statistics
 r/s   w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
49.8  21.8 2438.7  400.3  0.0  1.7    0.0   24.0   0  62 c4t60A98000433469764E4A2D456A644A74d0
53.2  34.0 2741.3  218.0  0.0  2.1    0.0   24.4   0  63 c4t60A98000433469764E4A2D456A696579d0
14.0  26.8  506.2  482.1  0.0  0.7    0.0   18.2   0  32 c4t60A98000433469764E4A476D2F664E4Fd0
23.4  38.8  484.5  582.3  0.0  1.1    0.0   18.2   0  42 c4t60A98000433469764E4A476D2F6B385Ad0

--
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
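Since Richard's question is about latency, the asvc_t column (active service time, ms) is the one to watch in samples like these. A small sketch for screening such samples mechanically; the 20 ms threshold is an illustrative choice of mine, not a rule from the thread:

```python
# Flag iostat -xn devices whose mean active service time (asvc_t, ms)
# across several samples exceeds a threshold. 20 ms is an illustrative
# cutoff, not a recommendation from the thread.

def slow_devices(samples, threshold_ms=20.0):
    """samples: iterable of (device, asvc_t_ms) pairs taken from
    successive iostat -xn intervals. Returns {device: mean_asvc_t}
    for devices whose mean exceeds the threshold."""
    by_dev = {}
    for dev, asvc_t in samples:
        by_dev.setdefault(dev, []).append(asvc_t)
    return {dev: sum(v) / len(v)
            for dev, v in by_dev.items()
            if sum(v) / len(v) > threshold_ms}

# asvc_t of the first LUN across the four samples above.
lun_a = [("...456A644A74d0", t) for t in (22.2, 16.3, 17.3, 24.0)]
print(slow_devices(lun_a))  # mean 19.95 ms -> {} (just under 20 ms)
```

For iSCSI LUNs over a network to a filer, service times in the 15-25 ms range are not obviously pathological, which fits Richard's point that bandwidth alone does not explain the slowdowns.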
Re: [zfs-discuss] Any news on ZFS bug 6535172?
Gary Mills wrote:
> We're running a Cyrus IMAP server on a T2000 under Solaris 10 with
> about 1 TB of mailboxes on ZFS filesystems. Recently, when under
> load, we've had incidents where IMAP operations became very slow. The
> general symptoms are that the number of imapd, pop3d, and lmtpd
> processes increases, the CPU load average increases, but the ZFS I/O
> bandwidth decreases. At the same time, ZFS filesystem operations
> become very slow. A rewrite of a small file can take two minutes.

Bandwidth is likely not the issue. What does the latency to disk look
like?
-- richard

> We've added memory; this was an improvement, but the incidents
> continued. The next step is to disable ZFS prefetch and test this
> under load. If that doesn't help either, we're down to ZFS bugs.
>
> Our incidents seem similar to the ones at UC Davis:
>
> http://vpiet.ucdavis.edu/docs/EmailReviewCmte.Report_Feb2008.pdf
>
> These were attributed to bug 6535160, but this one is fixed on our
> server with patch 127127-11. Bug 6535172, ``zil_sync causing long
> hold times on zl_lock'', doesn't have a patch yet:
>
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6535172
>
> Could this bug cause our problem? How do I confirm that it does?
> Is there a workaround?
>
> Cyrus IMAP uses several moderate-sized databases that are
> memory-mapped by all processes. I can move these from ZFS to UFS if
> this is likely to help.
[zfs-discuss] Any news on ZFS bug 6535172?
We're running a Cyrus IMAP server on a T2000 under Solaris 10 with
about 1 TB of mailboxes on ZFS filesystems. Recently, when under load,
we've had incidents where IMAP operations became very slow. The general
symptoms are that the number of imapd, pop3d, and lmtpd processes
increases, the CPU load average increases, but the ZFS I/O bandwidth
decreases. At the same time, ZFS filesystem operations become very
slow. A rewrite of a small file can take two minutes.

We've added memory; this was an improvement, but the incidents
continued. The next step is to disable ZFS prefetch and test this under
load. If that doesn't help either, we're down to ZFS bugs.

Our incidents seem similar to the ones at UC Davis:

http://vpiet.ucdavis.edu/docs/EmailReviewCmte.Report_Feb2008.pdf

These were attributed to bug 6535160, but this one is fixed on our
server with patch 127127-11. Bug 6535172, ``zil_sync causing long hold
times on zl_lock'', doesn't have a patch yet:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6535172

Could this bug cause our problem? How do I confirm that it does? Is
there a workaround?

Cyrus IMAP uses several moderate-sized databases that are memory-mapped
by all processes. I can move these from ZFS to UFS if this is likely to
help.

--
-Gary Mills--Unix Support--U of M Academic Computing and Networking-