Re: [zfs-discuss] [storage-discuss] Supermicro SAS/SATA controllers?

2009-04-12 Thread Nicholas Lee
On Mon, Apr 13, 2009 at 3:27 PM, Miles Nordin  wrote:

> > "nl" == Nicholas Lee  writes:
>
> nl>1. Is the cache only used for RAID modes and not in JBOD
>nl> mode?
>
> well, there are different LSI cards and firmwares and drivers, but:
>
>  The X4150 SAS RAID controllers will use the on-board battery backed cache
>  even when disks are presented as individual LUNs.
>  -- "Aaron Blew" 
> Wed, 3 Sep 2008 15:29:29 -0700
>
>  We're using an Infortrend SATA/SCSI disk array with individual LUNs, but
>  it still uses the disk cache.
>  -- Tomas Ögren 
>Thu, 4 Sep 2008 10:20:30 +0200
>
>nl> 2. If it is used by the controller is it driver
>nl> dependent?  Only works if the driver can handle the cache
>
> driver is proprietary. :)  no way to know.
>
>nl> 3. If the cache does work what happens if there is a power
>nl> reset?
>
> Obviously it is supposed to handle this.  But, yeah, as you said,
> _when_ is the battery-backed cache flushed?  At boot during the BIOS
> probe?  What if you're using SPARC and don't do a BIOS probe?  by the
> driver?  When the ``card's firmware boots?''  How can you tell if the
> cache has got stuff in it or not?  What if you're doing maintenance
> like replacing disks---something not unlikely to coincide with unclean
> shutdowns.  Will this confuse it?
>

I hadn't thought about this scenario. ZFS handles so much of what would once
have been done in hardware and by drivers.  While this is good, it leaves a
huge grey area where it is hard for those of us on the front line to make
decisions about the best choices.



> The driver and the ``firmware'' is all proprietary, so there's no way
> to look into the matter yourself other than exhaustive testing, and
> there's no vendor standing squarely behind the overall system like
> there is with an external array.
>
> but...it's so extremely cheap and fast that I think there's a huge


That's the big point.  10,000 USD for a 2U 12-disk 10TB raw NAS or 100,000
USD for the equivalent appliance.



> segment of market, the segment which cares about being extremely cheap
> and fast, that uses this stuff as a matter of course.  I guess these
> are the guys who were supposed to start using ZFS but for now I guess
> the hardware cache is still faster for ``hardware'' raid-on-a-card.
>




> I think the ideal device would have a fully open-source driver stack,
> and a light on the SSD slog, or battery+RAM, or supercap+RAM+CF, to
> indicate if it's empty or not.  If it's missing and not empty then the
> pool will always refuse to auto-import but always import if
> ``forced'', and if it's missing and empty then the pool will sometimes
> auto-import (ex., always if there was a clean shutdown and sometimes
> if there wasn't), and if forced to import when the light's out the
> pool will be fsync-consistent.  Currently we're short of the ideal
> even using the ZFS-style slog, but AIUI you can get closer if you make
> a backup of your empty slog right after you attach it and stash the
> .dd.gz file somewhere outside the pool---you can force the import of a
> pool with a dirty, missing slog by substituting an old empty slog with
> the right label on it.  However, still closed driver, still nothing
> with fancy lights on it. :)
>


The only issue I have with slog-type devices at the moment is that they are
not removable and thus not easily replaceable.  It seems that if you want a
production system using slogs then you must mirror them - otherwise, if the
slog is corrupted, you can only revert to a backup.
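
For what it's worth, here is my understanding of the "stash an empty slog
image" trick Miles mentions.  The device names below are made up and I have
not tested any of this, so treat it as a rough outline rather than a
procedure:

  # right after attaching the (still empty) slog, copy its contents to a
  # file kept somewhere outside the pool
  $ zpool add tank log c2t0d0s0
  $ dd if=/dev/rdsk/c2t0d0s0 of=/var/tmp/slog-empty.dd bs=1024k
  $ gzip /var/tmp/slog-empty.dd

  # later, if the slog dies and the pool refuses to import, write the saved
  # image onto a same-sized replacement device and force the import
  $ gzcat /var/tmp/slog-empty.dd.gz | dd of=/dev/rdsk/c3t0d0s0 bs=1024k
  $ zpool import -f tank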


>
>nl> iRAM device seems like a hack,
>
> There's also the ACARD device:
>
> acard ANS-9010B   $250
>  plus 8GB RAM      $86
>  plus 16GB CF      $44
>
> It's also got a battery but can dump/restore the RAM to a CF card.
> It's physically larger and not cheaper nor faster than Intel X25E but
> at least it doesn't have the fragmentation problems to worry about.
> I've not tested it myself.  Someone on the list tested it, but IIRC he
> did not use it as a slog, nor comment on how the CF dumping feature
> works (it sounds kind of sketchy.  ``buttons'' are involved, which to
> me sounds very bad).
>

I've seen these before, but dismissed them because they are 5.25" units,
which is tricky in rack systems that generally only cater for 3.5" bays.
I wonder if it is possible to pull these apart and put them in a smaller case.


Has anyone done any specific testing with SSD devices and Solaris other than
the FISHWORKS stuff?  Which is better for what - SLC or MLC?

Nicholas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Any news on ZFS bug 6535172?

2009-04-12 Thread Sanjeev
Gary,

How full is the pool?
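
If it is easier to just paste the output, the quick way to check is
something like:

  $ zpool list                            # CAP column shows percent used
  $ zfs list -o name,used,avail,refer     # per-filesystem usage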

-- Sanjeev
On Sun, Apr 12, 2009 at 08:39:03AM -0500, Gary Mills wrote:
> We're running a Cyrus IMAP server on a T2000 under Solaris 10 with
> about 1 TB of mailboxes on ZFS filesystems.  Recently, when under
> load, we've had incidents where IMAP operations became very slow.  The
> general symptoms are that the number of imapd, pop3d, and lmtpd
> processes increases, the CPU load average increases, but the ZFS I/O
> bandwidth decreases.  At the same time, ZFS filesystem operations
> become very slow.  A rewrite of a small file can take two minutes.
> 
> We've added memory; this was an improvement, but the incidents
> continued.  The next step is to disable ZFS prefetch and test this
> under load.  If that doesn't help either, we're down to ZFS bugs.
> 
> Our incidents seem similar to the ones at UC Davis:
> 
> http://vpiet.ucdavis.edu/docs/EmailReviewCmte.Report_Feb2008.pdf
> 
> These were attributed to bug 6535160, but this one is fixed on our
> server with patch 127127-11.  Bug 6535172, ``zil_sync causing long
> hold times on zl_lock'', doesn't have a patch yet:
> 
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6535172
> 
> Could this bug cause our problem?  How do I confirm that it does?
> Is there a workaround?
> 
> Cyrus IMAP uses several moderate-sized databases that are
> memory-mapped by all processes.  I can move these from ZFS to UFS if
> this is likely to help.
> 
> -- 
> -Gary Mills--Unix Support--U of M Academic Computing and Networking-
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 

Sanjeev Bagewadi
Solaris RPE 
Bangalore, India
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Any news on ZFS bug 6535172?

2009-04-12 Thread Gary Mills
On Sun, Apr 12, 2009 at 05:01:57PM -0400, Ellis, Mike wrote:
> Is the netapp iscsi-lun forcing a full sync as a part of zfs's
> 5-second sync/flush type of thing? (Not needed since the netapp
> guarantees the write once it acks it)

I've asked that of our Netapp guy, but so far I haven't heard from
him.  Is there a way to determine this from the iSCSI initiator
side?  I do have a test mail server that I can play with.

> That could make a big difference...
> (Perhaps disabling the write-flush in zfs will make a big difference
> here, especially on a write-heavy system)

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] Supermicro SAS/SATA controllers?

2009-04-12 Thread Nicholas Lee
On Sun, Apr 12, 2009 at 7:24 PM, Miles Nordin  wrote:

>
>nl> Supermicro have several LSI controllers.  AOC-USASLP-L8i with
>nl> the LSI 1068E
>
> That's what I'm using.  It uses the proprietary mpt driver.
>
>nl> and AOC-USASLP-H8iR with the LSI 1078.
>
> I'm not using this.
>
>nl> How does the performance compare to the Marvell?
>
> don't know, but the proprietary Marvell driver uses the SATA
> framework, and the LSI proprietary driver attaches like a SCSI
> controller without using SATA framework.  If you are trying to burn
> CD's or play DVD's or use 'smartctl' on your hard disks, I'm not sure
> if it will work with LSI.


Disk storage only. I usually use USB cdroms for servers if I need them.


>nl> The LSI1068E has 16MB SRAM onboard cache - I expect this helps
>nl> performance, but does it cause issues with the ZIL?
>
> no, it is just silliness.  It's just part of the controller/driver,
> not something to worry about.
>

I guess when you think about it, it is actually smaller (these days) than the
cache on many HDDs. Probably a waste of space.


>
>nl> The LSI1078 has 512MB DDR2 onboard cache with a battery backup
>nl> option.
>
> yeah, without the battery the onboard cache may be a liability rather
> than an asset.  You will have to worry if the card is unsafely
> offering to store things in this volatile cache.  I'm not sure how it
> works out in practice.
>

I guess this is my main point of worry about this card.


   1. Is the cache only used for RAID modes and not in JBOD mode?
   2. If it is used by the controller, is it driver dependent?  Does it only
   work if the driver can handle the cache?
   3. If the cache does work, what happens after a power reset?
      - In the first case, if it is driver independent and simply flushes the
      cached IO commands to disk on power restart, would that cause
      corruption with zfs?
      - In the second case, similar to the first but now dependent on the
      driver: how stable is the driver?  Is corruption a more likely event?
   4. In either case the option to turn off the cache might be important.
   5. Furthermore, without a battery you might also want to turn off the
   cache.


> I think the battery-backed caches are much cheaper than a SSD slog,
> and the bandwidth to the cache is much higher than bandwidth to a
> single SATA port too.  I don't like it, though, because data collects
> inside the cache which I can't get out.  OTOH, slog plus data disks I
> can easily move from one machine to another while diagnosing a
> problem, if i suspect a motherboard or the LSI card itself is bad, for
> example.
>

I agree with your points.  Even though an iRAM device seems like a hack,
without good information about the stability of controller-based cache they
seem like the more portable solution.


>    nl> using the battery backup option, allowing "zil disable"?
>
> please reread the best practices.  I think you're confusing two
> different options and planning to do something unsafe.
>

Sorry, I meant zfs_nocacheflush - which should only be used when NVRAM or a
secure power supply is available.
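
For the record, what I have in mind is the documented tunable from the ZFS
Evil Tuning Guide - roughly like the following, though I have not tried it
here and the exact behaviour is release-dependent:

  # /etc/system -- takes effect after a reboot
  set zfs:zfs_nocacheflush = 1

  # or, on a running Solaris 10 system
  # echo zfs_nocacheflush/W0t1 | mdb -kw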



> t> UIO fits just fine in a normal chassis, you just have to
> t> remove the bracket. [...] it's really not a big deal.
>
> +1, that supermicro card, Nicholas, is UIO rather than PCIe, and it
> does work for me in plain PCIe slot with the bracket removed.  so long
> as you are not moving around the machine too much, I agree it's not a
> big deal.
>

I plan to use Supermicro chassis in a rack - so both a m/b designed for UIO
and a stable location. Should be fine.

Nicholas
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Any news on ZFS bug 6535172?

2009-04-12 Thread Gary Mills
On Sun, Apr 12, 2009 at 12:23:03PM -0700, Richard Elling wrote:
> These disks are pretty slow.  JBOD?  They are not 100% busy, which
> means that either the cached data is providing enough response to the
> apps, or the apps are not capable of producing enough load -- which
> means the bottleneck may be elsewhere.

They are four 500-gig iSCSI LUNs exported from a Netapp filer, with
Solaris multipathing.  Yes, the I/O is normally mostly writes, with
reads being satisfied from various caches.

> You can use fsstat to get a better idea of what sort of I/O the applications
> are seeing from the file system.  That might be revealing.

Thanks for the suggestion.  There are so many `*stat' commands that I
forget about some of them.  I've run a baseline with `fsstat', but the
server is mostly idle now.  I'll have to wait for another incident!
What option to `fsstat' do you recommend?  Here's a sample of the
default output:

$ fsstat  zfs 5 5
  new  name  name  attr  attr lookup rddir  read  read write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
3.56M 1.53M 3.83M 1.07G 1.53M  2.47G 4.09M 56.4M 1.83T 61.1M  306G zfs
   13     1    16 1.40K     5  11.6K     0     5 38.5K   125  127K zfs
   18     0    18 3.61K     6  21.1K     0     6 16.7K    97  244K zfs
   26     4    25 1.73K    10  6.76K     0    18  178K   142  817K zfs
   12     3    13 3.90K     5  9.00K     0     5 32.8K   108  287K zfs
    7     2     7 1.98K     3  5.87K     0     7 67.5K   108 2.34M zfs
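
I haven't decided which flag to try next time; from the fsstat(1M) man page
the likely candidates seem to be something like:

  $ fsstat -i zfs 5     # I/O operations and bytes
  $ fsstat -n zfs 5     # naming operations (create, remove, rename, lookup)
  $ fsstat -v zfs 5     # vnode-level operation counts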
-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Any news on ZFS bug 6535172?

2009-04-12 Thread Gary Mills
On Sun, Apr 12, 2009 at 10:49:49AM -0700, Richard Elling wrote:
> Gary Mills wrote:
> >We're running a Cyrus IMAP server on a T2000 under Solaris 10 with
> >about 1 TB of mailboxes on ZFS filesystems.  Recently, when under
> >load, we've had incidents where IMAP operations became very slow.  The
> >general symptoms are that the number of imapd, pop3d, and lmtpd
> >processes increases, the CPU load average increases, but the ZFS I/O
> >bandwidth decreases.  At the same time, ZFS filesystem operations
> >become very slow.  A rewrite of a small file can take two minutes.
> >  
> 
> Bandwidth is likely not the issue.  What does the latency to disk look like?

Yes, I have statistics!  This set was taken during an incident on
Thursday.  The load average was 12.  There were about 5700 Cyrus
processes running.  Here are the relevant portions of `iostat -xn 5 4':

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   23.8   20.7 1195.0  677.8  0.0  1.0    0.0   22.2   0  37 c4t60A98000433469764E4A2D456A644A74d0
   29.0   23.5 1438.3  626.8  0.0  1.3    0.0   25.4   0  44 c4t60A98000433469764E4A2D456A696579d0
   22.8   26.6 1356.7  822.1  0.0  1.3    0.0   26.2   0  32 c4t60A98000433469764E4A476D2F664E4Fd0
   26.4   27.3 1516.0  850.7  0.0  1.4    0.0   26.5   0  38 c4t60A98000433469764E4A476D2F6B385Ad0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   39.7   27.0 1395.8  285.5  0.0  1.1    0.0   16.3   0  51 c4t60A98000433469764E4A2D456A644A74d0
   52.5   29.8 1890.8  175.1  0.0  1.8    0.0   22.3   0  63 c4t60A98000433469764E4A2D456A696579d0
   30.0   33.3 1940.2  432.8  0.0  1.2    0.0   19.4   0  34 c4t60A98000433469764E4A476D2F664E4Fd0
   39.9   42.5 2062.1  616.7  0.0  1.9    0.0   22.9   0  50 c4t60A98000433469764E4A476D2F6B385Ad0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   43.8   47.6 1691.5  504.8  0.0  1.6    0.0   17.3   0  59 c4t60A98000433469764E4A2D456A644A74d0
   55.4   62.4 2027.8  517.0  0.0  2.2    0.0   18.5   0  72 c4t60A98000433469764E4A2D456A696579d0
   18.6   76.8  682.3  843.5  0.0  1.1    0.0   12.0   0  34 c4t60A98000433469764E4A476D2F664E4Fd0
   30.2  115.8  873.6  905.8  0.0  2.2    0.0   15.1   0  52 c4t60A98000433469764E4A476D2F6B385Ad0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   49.8   21.8 2438.7  400.3  0.0  1.7    0.0   24.0   0  62 c4t60A98000433469764E4A2D456A644A74d0
   53.2   34.0 2741.3  218.0  0.0  2.1    0.0   24.4   0  63 c4t60A98000433469764E4A2D456A696579d0
   14.0   26.8  506.2  482.1  0.0  0.7    0.0   18.2   0  32 c4t60A98000433469764E4A476D2F664E4Fd0
   23.4   38.8  484.5  582.3  0.0  1.1    0.0   18.2   0  42 c4t60A98000433469764E4A476D2F6B385Ad0

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Any news on ZFS bug 6535172?

2009-04-12 Thread Richard Elling

Gary Mills wrote:

We're running a Cyrus IMAP server on a T2000 under Solaris 10 with
about 1 TB of mailboxes on ZFS filesystems.  Recently, when under
load, we've had incidents where IMAP operations became very slow.  The
general symptoms are that the number of imapd, pop3d, and lmtpd
processes increases, the CPU load average increases, but the ZFS I/O
bandwidth decreases.  At the same time, ZFS filesystem operations
become very slow.  A rewrite of a small file can take two minutes.
  


Bandwidth is likely not the issue.  What does the latency to disk look like?
-- richard


We've added memory; this was an improvement, but the incidents
continued.  The next step is to disable ZFS prefetch and test this
under load.  If that doesn't help either, we're down to ZFS bugs.

Our incidents seem similar to the ones at UC Davis:

http://vpiet.ucdavis.edu/docs/EmailReviewCmte.Report_Feb2008.pdf

These were attributed to bug 6535160, but this one is fixed on our
server with patch 127127-11.  Bug 6535172, ``zil_sync causing long
hold times on zl_lock'', doesn't have a patch yet:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6535172

Could this bug cause our problem?  How do I confirm that it does?
Is there a workaround?

Cyrus IMAP uses several moderate-sized databases that are
memory-mapped by all processes.  I can move these from ZFS to UFS if
this is likely to help.

  

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Any news on ZFS bug 6535172?

2009-04-12 Thread Gary Mills
We're running a Cyrus IMAP server on a T2000 under Solaris 10 with
about 1 TB of mailboxes on ZFS filesystems.  Recently, when under
load, we've had incidents where IMAP operations became very slow.  The
general symptoms are that the number of imapd, pop3d, and lmtpd
processes increases, the CPU load average increases, but the ZFS I/O
bandwidth decreases.  At the same time, ZFS filesystem operations
become very slow.  A rewrite of a small file can take two minutes.

We've added memory; this was an improvement, but the incidents
continued.  The next step is to disable ZFS prefetch and test this
under load.  If that doesn't help either, we're down to ZFS bugs.
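
For reference, the prefetch change we plan to test is the standard tunable
from the ZFS Evil Tuning Guide - something like the following, subject to the
exact syntax for our Solaris 10 build:

  # /etc/system -- disable file-level prefetch, then reboot
  set zfs:zfs_prefetch_disable = 1

  # or poke the running kernel
  # echo zfs_prefetch_disable/W0t1 | mdb -kw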

Our incidents seem similar to the ones at UC Davis:

http://vpiet.ucdavis.edu/docs/EmailReviewCmte.Report_Feb2008.pdf

These were attributed to bug 6535160, but this one is fixed on our
server with patch 127127-11.  Bug 6535172, ``zil_sync causing long
hold times on zl_lock'', doesn't have a patch yet:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6535172

Could this bug cause our problem?  How do I confirm that it does?
Is there a workaround?

Cyrus IMAP uses several moderate-sized databases that are
memory-mapped by all processes.  I can move these from ZFS to UFS if
this is likely to help.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss