Re: [zfs-discuss] RFE: Un-dedup for unique blocks

2013-01-22 Thread Gary Mills
On Tue, Jan 22, 2013 at 11:54:53PM +0000, Edward Ned Harvey 
(opensolarisisdeadlongliveopensolaris) wrote:
  From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
  boun...@opensolaris.org] On Behalf Of Nico Williams
  
  As for swap... really, you don't want to swap.  If you're swapping you
  have problems.  
 
 In solaris, I've never seen it swap out idle processes; I've only
 seen it use swap for the bad bad bad situation.  I assume that's all
 it can do with swap.

You would be wrong.  Solaris uses swap space for paging.  Paging out
unused portions of an executing process from real memory to the swap
device is certainly beneficial.  Swapping out complete processes is a
desperation move, but paging out most of an idle process is a good
thing.
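
A quick way to watch this in practice (a rough sketch; the exact
column names vary by release):

    # swap -s          # swap space allocated, reserved, and available
    # vmstat -p 5      # paging broken out by page type; the api/apo/apf
                       # columns are the anonymous pages moving to and
                       # from the swap device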

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] LUN sizes

2012-10-29 Thread Gary Mills
On Mon, Oct 29, 2012 at 09:30:47AM -0500, Brian Wilson wrote:
 
 First I'd like to note that contrary to the nomenclature there isn't
 any one SAN product that all operates the same. There are a number
 of different vendor provided solutions that use a FC SAN to deliver
 luns to hosts, and they each have their own limitations. Forgive my
 pedantry, please.
 
 On Sun, Oct 28, 2012 at 04:43:34PM +0700, Fajar A. Nugraha wrote:
  On Sat, Oct 27, 2012 at 9:16 PM, Edward Ned Harvey
  (opensolarisisdeadlongliveopensolaris)
  opensolarisisdeadlongliveopensola...@nedharvey.com wrote:
   From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
   boun...@opensolaris.org] On Behalf Of Fajar A. Nugraha
  
   So my
   suggestion is actually just present one huge 25TB LUN to zfs and let
   the SAN handle redundancy.
 
 You are entering the uncharted waters of ``multi-level disk
 management'' here. Both ZFS and the SAN use redundancy and error-
 checking to ensure data integrity. Both of them also do automatic
 replacement of failing disks. A good SAN will present LUNs that
 behave as perfectly reliable virtual disks, guaranteed to be error
 free. Almost all of the time, ZFS will find no errors. If ZFS does
 find an error, there's no nice way to recover. Most commonly, this
 happens when the SAN is powered down or rebooted while the ZFS host
 is still running.
 
 On your host side, there's also the consideration of ssd/scsi
 queuing. If you're running on only one LUN, you're limiting your
 IOPS to only one IO queue over your FC paths, and if you have that
 throttled (per many storage vendors recommendations about
 ssd:ssd_max_throttle and zfs:zfs_vdev_max_pending), then one LUN
 will throttle your IOPS back on your host. That might also motivate
 you to split into multiple LUNS so your OS doesn't end up
 bottle-necking your IO before it even gets to your SAN HBA.
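
For reference, the throttles mentioned above are normally set in
/etc/system.  A sketch, with illustrative values rather than
recommendations:

    * /etc/system -- check your storage vendor's documentation
    * for the values they actually recommend
    set ssd:ssd_max_throttle = 20
    set zfs:zfs_vdev_max_pending = 10

A reboot is needed for /etc/system changes to take effect.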

That's a performance issue rather than a reliability issue.  The other
performance issue to consider is block size.  At the last place I
worked, we used an Iscsi LUN from a Netapp filer.  This LUN reported a
block size of 512 bytes, even though the Netapp itself used a 4K
block size.  This means that the filer was doing the block size
conversion, resulting in much more I/O than the ZFS layer intended.
The fact that Netapp does COW made this situation even worse.

My impression was that very few of their customers encountered this
performance problem because almost all of them used their Netapp only
for NFS or CIFS.  Our Netapp was extremely reliable but did not have
the Iscsi LUN performance that we needed.

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zpool LUN Sizes

2012-10-28 Thread Gary Mills
On Sun, Oct 28, 2012 at 04:43:34PM +0700, Fajar A. Nugraha wrote:
 On Sat, Oct 27, 2012 at 9:16 PM, Edward Ned Harvey
 (opensolarisisdeadlongliveopensolaris)
 opensolarisisdeadlongliveopensola...@nedharvey.com wrote:
  From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
  boun...@opensolaris.org] On Behalf Of Fajar A. Nugraha
 
  So my
  suggestion is actually just present one huge 25TB LUN to zfs and let
  the SAN handle redundancy.
 
  create a bunch of 1-disk volumes and let ZFS handle them as if they're JBOD.
 
 The last time I used IBM's enterprise storage (which was, admittedly, a
 long time ago) you couldn't even do that. And looking at Morris' mail
 address, it should be relevant :)
 
 ... or probably it's just me who hasn't found out how to do that, which
 is why I suggested just using whatever the SAN can present :)

You are entering the uncharted waters of ``multi-level disk
management'' here.  Both ZFS and the SAN use redundancy and error-
checking to ensure data integrity.  Both of them also do automatic
replacement of failing disks.  A good SAN will present LUNs that
behave as perfectly reliable virtual disks, guaranteed to be error
free.  Almost all of the time, ZFS will find no errors.  If ZFS does
find an error, there's no nice way to recover.  Most commonly, this
happens when the SAN is powered down or rebooted while the ZFS host
is still running.

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What happens when you rm zpool.cache?

2012-10-21 Thread Gary Mills
On Sun, Oct 21, 2012 at 11:40:31AM +0200, Bogdan Ćulibrk wrote:
Follow up question regarding this: is there any way to disable
automatic import of any non-rpool on boot without any hacks of removing
zpool.cache?

Certainly.  Import it with an alternate cache file.  You do this by
specifying the `cachefile' property on the command line.  The `zpool'
man page describes how to do this.
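
For example (a sketch; the pool and cache-file names are made up):

    # zpool import -o cachefile=/etc/zfs/datapool.cache datapool
    # zpool get cachefile datapool

Setting `cachefile=none' instead keeps the pool out of the default
/etc/zfs/zpool.cache entirely, so it will never be imported
automatically at boot.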

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [developer] Re: History of EPERM for unlink() of directories on ZFS?

2012-06-26 Thread Gary Mills
On Tue, Jun 26, 2012 at 10:41:14AM -0500, Nico Williams wrote:
 On Tue, Jun 26, 2012 at 9:44 AM, Alan Coopersmith
 alan.coopersm...@oracle.com wrote:
  On 06/26/12 05:46 AM, Lionel Cons wrote:
  On 25 June 2012 11:33,  casper@oracle.com wrote:
  To be honest, I think we should also remove this from all other
  filesystems and I think ZFS was created this way because all modern
  filesystems do it that way.
 
  This may be wrong way to go if it breaks existing applications which
  rely on this feature. It does break applications in our case.
 
  Existing applications rely on the ability to corrupt UFS filesystems?
  Sounds horrible.
 
 My guess is that the OP just wants unlink() of an empty directory to
 be the same as rmdir() of the same.  Or perhaps they want unlink() of
 a non-empty directory to result in a recursive rm...  But if they
 really want hardlinks to directories, then yeah, that's horrible.

This all sounds like a good use for LD_PRELOAD and a tiny library
that intercepts and modernizes system calls.

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs and iscsi performance help

2012-01-27 Thread Gary Mills
On Fri, Jan 27, 2012 at 03:25:39PM +1100, Ivan Rodriguez wrote:
 
 We have a backup server with a zpool of 20 TB, and we transfer
 information using zfs snapshots every day (we have around 300
 filesystems on that pool).  The storage is a Dell MD3000i connected
 by iSCSI, and the pool is currently version 10.  The same storage is
 connected to another server with a smaller pool of 3 TB (also zpool
 version 10); that server is working fine and speed is good between
 the storage and the server.  However, on the server with the 20 TB
 pool, performance is an issue.  After we restart the server,
 performance is good, but over time, let's say a week, the
 performance keeps dropping until we have to bounce the server again
 (the same behavior with the new version of Solaris, except that
 performance drops in 2 days).  There are no errors in the logs, on
 the storage, or in `zpool status -v'.

This sounds like a ZFS cache problem on the server.  You might check
on how cache statistics change over time.  Some tuning may eliminate
this degradation.  More memory may also help.  Does a scrub show any
errors?  Does the performance drop affect reads or writes or both?
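
The ARC counters are all visible through kstat, so a periodic sample
is easy to collect (a sketch):

    # kstat -p zfs:0:arcstats | egrep 'size|hits|misses'

Logging that from cron and comparing a fresh boot against a week-old
one should show whether the cache is what's degrading.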

 We suspect that the pool has some issues, probably corruption
 somewhere.  We tested Solaris 10 8/11 with zpool version 29,
 although we haven't upgraded the pool itself.  With the new Solaris
 the performance is even worse, and every time that we restart the
 server we get stuff like this:
 
  SOURCE: zfs-diagnosis, REV: 1.0
  EVENT-ID: 0168621d-3f61-c1fc-bc73-c50efaa836f4
 DESC: All faults associated with an event id have been addressed.
  Refer to http://sun.com/msg/FMD-8000-4M for more information.
  AUTO-RESPONSE: Some system components offlined because of the
 original fault may have been brought back online.
  IMPACT: Performance degradation of the system due to the original
 fault may have been recovered.
  REC-ACTION: Use fmdump -v -u EVENT-ID to identify the repaired components.
 [ID 377184 daemon.notice] SUNW-MSG-ID: FMD-8000-6U, TYPE: Resolved,
 VER: 1, SEVERITY: Minor
 
 And we need to export and import the pool in order to be  able to  access it.

This is a separate problem, introduced with an upgrade to the Iscsi
service.  The new one has a dependency on the name service (typically
DNS), which means that it isn't available when the zpool import is
done during the boot.  Check with Oracle support to see if they have
found a solution.
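
A quick way to check whether that dependency is what's biting you
(a sketch; the initiator FMRI varies between releases, so find it
first):

    # svcs -a | grep -i iscsi        # find the initiator service
    # svcs -d <initiator-FMRI>       # the services it depends on
    # svcs -D <initiator-FMRI>       # the services that depend on it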

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] unable to access the zpool after issue a reboot

2012-01-26 Thread Gary Mills
On Thu, Jan 26, 2012 at 04:36:58PM +0100, Christian Meier wrote:
 Hi Sudheer
 
  3) bash-3.2# zpool status
       pool: pool name
      state: UNAVAIL
     status: One or more devices could not be opened.  There are insufficient
             replicas for the pool to continue functioning.
     action: Attach the missing device and online it using 'zpool online'.
        see: http://www.sun.com/msg/ZFS-8000-3C
       scan: none requested
     config:

             NAME         STATE     READ WRITE CKSUM
             pool name    UNAVAIL       0     0     0  insufficient replicas
               c5t1d1     UNAVAIL       0     0     0  cannot open

This means that, at the time of that import, device c5t1d1 was not
available.  What does `ls -l /dev/rdsk/c5t1d1s0' show for the physical
path?

  And the important thing is when I export and import the zpool, then I
  was able to access it.

Yes, later the device became available.  After the boot, `svcs' will
show you the services listed in order of their completion times.  The
ZFS mount is done by this service:

svc:/system/filesystem/local:default

The zpool import (without the mount) is done earlier.  Check to see
if any of the FC services run too late during the boot.
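
For example (a sketch):

    # svcs -s STIME -o STIME,FMRI                    # completion order
    # svcs -d svc:/system/filesystem/local:default   # what the mounts wait for
    # svcs -l svc:/system/filesystem/local:default   # full detail

Any FC service that completes after filesystem/local is a candidate.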

 As Gary and Bob mentioned, I saw this Issue with ISCSI Devices.
 Instead of export / import is a zpool clear also working?
 
 mpathadm list LU
 mpathadm show LU /dev/rdsk/c5t1d1s2

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] unable to access the zpool after issue a reboot

2012-01-24 Thread Gary Mills
On Tue, Jan 24, 2012 at 05:33:39PM +0530, sureshkumar wrote:
 
 I am new to Solaris and I am facing an issue with the dynapath
 [multipath s/w] for Solaris 10u10 x86.
 
 I am facing an issue with the zpool.
 
 My problem is that I am unable to access the zpool after issuing a reboot.

I've seen this happen when the zpool was built on an Iscsi LUN.  At
reboot time, the ZFS import was done before the Iscsi driver was able
to connect to its target.  After the system was up, an export and
import was successful.  The solution was to add a new service that
imported the zpool later during the reboot.
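
The start method for such a service can be very short.  A minimal
sketch, with a hypothetical pool name:

#!/sbin/sh
# Import the data pool once the iSCSI initiator is up.  Run as the
# start method of a transient SMF service that depends on the iSCSI
# initiator service, and that the application in turn depends on.

POOL=datapool

case "$1" in
start)
        zpool list $POOL >/dev/null 2>&1 || zpool import $POOL
        ;;
stop)
        zpool export $POOL
        ;;
esac
exit 0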

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs defragmentation via resilvering?

2012-01-16 Thread Gary Mills
On Mon, Jan 16, 2012 at 09:13:03AM -0600, Bob Friesenhahn wrote:
 On Mon, 16 Jan 2012, Jim Klimov wrote:
 
 I think that in order to create a truly fragmented ZFS layout,
 Edward needs to do sync writes (without a ZIL?) so that every
 block and its metadata go to disk (coalesced as they may be)
 and no two blocks of the file would be sequenced on disk together.
 Although creating snapshots should give that effect...
 
 In my experience, most files on Unix systems are re-written from
 scatch.  For example, when one edits a file in an editor, the editor
 loads the file into memory, performs the edit, and then writes out
 the whole file.  Given sufficient free disk space, these files are
 unlikely to be fragmented.
 
 The case of slowly written log files or random-access databases are
 the worse cases for causing fragmentation.

The case I've seen was with an IMAP server with many users.  E-mail
folders were represented as ZFS directories, and e-mail messages as
files within those directories.  New messages arrived randomly in the
INBOX folder, so that those files were written all over the place on
the storage.  Users also deleted many messages from their INBOX
folder, but the files were retained in snapshots for two weeks.  On
IMAP session startup, the server typically had to read all of the
messages in the INBOX folder, making this portion slow.  The server
also had to refresh the folder whenever new messages arrived, making
that portion slow as well.  Performance degraded when the storage
became 50% full.  It would increase markedly when the oldest snapshot
was deleted.

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Does raidzN actually protect against bitrot? If yes - how?

2012-01-15 Thread Gary Mills
On Sun, Jan 15, 2012 at 04:06:33PM +0000, Peter Tribble wrote:
 On Sun, Jan 15, 2012 at 3:04 PM, Jim Klimov jimkli...@cos.ru wrote:
  Does raidzN actually protect against bitrot?
  That's a kind of radical, possibly offensive, question formula
  that I have lately.
 
 Yup, it does. That's why many of us use it.

There's actually no such thing as bitrot on a disk.  Each sector on
the disk is accompanied by a CRC that's verified by the disk
controller on each read.  It will either return correct data or report
an unreadable sector.  There's nothing inbetween.

Of course, if something outside of ZFS writes to the disk, then data
belonging to ZFS will be modified.  I've heard of RAID controllers or
SAN devices doing this when they modify the disk geometry or reserved
areas on the disk.

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Very poor pool performance - no zfs/controller errors?!

2011-12-19 Thread Gary Mills
On Mon, Dec 19, 2011 at 11:58:57AM +, Jan-Aage Frydenbø-Bruvoll wrote:
 
 2011/12/19 Hung-Sheng Tsao (laoTsao) laot...@gmail.com:
  did you run a scrub?
 
 Yes, as part of the previous drive failure. Nothing reported there.
 
 Now, interestingly - I deleted two of the oldest snapshots yesterday,
 and guess what - the performance went back to normal for a while. Now
 it is severely dropping again - after a good while on 1.5-2GB/s I am
 again seeing write performance in the 1-10MB/s range.

That behavior is a symptom of fragmentation.  Writes slow down
dramatically when there are no contiguous blocks available.  Deleting
a snapshot provides some of these, but only temporarily.

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Does the zpool cache file affect import?

2011-08-29 Thread Gary Mills
I have a system with ZFS root that imports another zpool from a start
method.  It uses a separate cache file for this zpool, like this:

if [ -f $CCACHE ]
then
echo Importing $CPOOL with cache $CCACHE
zpool import -o cachefile=$CCACHE -c $CCACHE $CPOOL
else
echo Importing $CPOOL with device scan
zpool import -o cachefile=$CCACHE $CPOOL
fi

It also exports that zpool from the stop method, which has the side
effect of deleting the cache.  This all works nicely when the server
is rebooted.

What will happen when the server is halted without running the stop
method, so that that zpool is not exported?  I know that there is a
flag in the zpool that indicates when it's been exported cleanly.  The
cache file will exist when the server reboots.  Will the import fail
with the `The pool was last accessed by another system.' error, or
will the import succeed?  Does the cache change the import behavior?
Does it recognize that the server is the same system?  I don't want
to include the `-f' flag in the commands above when it's not needed.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How create a FAT filesystem on a zvol?

2011-07-12 Thread Gary Mills
On Sun, Jul 10, 2011 at 11:16:02PM +0700, Fajar A. Nugraha wrote:
 On Sun, Jul 10, 2011 at 10:10 PM, Gary Mills mi...@cc.umanitoba.ca wrote:
  The `lofiadm' man page describes how to export a file as a block
  device and then use `mkfs -F pcfs' to create a FAT filesystem on it.
 
  Can't I do the same thing by first creating a zvol and then creating
  a FAT filesystem on it?
 
 seems not.
[...]
 Some solaris tools (like fdisk, or mkfs -F pcfs) needs disk geometry
 to function properly. zvols doesn't provide that. If you want to use
 zvols to work with such tools, the easiest way would be using lofi, or
 exporting zvols as iscsi share and import it again.
 
 For example, if you have a 10MB zvol and use lofi, fdisk would show
 these geometry
 
  Total disk size is 34 cylinders
  Cylinder size is 602 (512 byte) blocks
 
 ... which will then be used if you run mkfs -F pcfs -o
 nofdisk,size=20480. Without lofi, the same command would fail with
 
 Drive geometry lookup (need tracks/cylinder and/or sectors/track):
 Operation not supported

So, why can I do it with UFS?

# zfs create -V 10m rpool/vol1
# newfs /dev/zvol/rdsk/rpool/vol1
newfs: construct a new file system /dev/zvol/rdsk/rpool/vol1: (y/n)? y
Warning: 4130 sector(s) in last cylinder unallocated
/dev/zvol/rdsk/rpool/vol1:  20446 sectors in 4 cylinders of 48 tracks, 128 
sectors
10.0MB in 1 cyl groups (14 c/g, 42.00MB/g, 20160 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32,

Why is this different from PCFS?

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How create a FAT filesystem on a zvol?

2011-07-10 Thread Gary Mills
The `lofiadm' man page describes how to export a file as a block
device and then use `mkfs -F pcfs' to create a FAT filesystem on it.

Can't I do the same thing by first creating a zvol and then creating
a FAT filesystem on it?  Nothing I've tried seems to work.  Isn't the
zvol just another block device?
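
For the record, the lofiadm route from the man page looks like this
(a sketch; the file name and sizes are made up):

    # mkfile 10m /export/fat.img
    # lofiadm -a /export/fat.img
    /dev/lofi/1
    # mkfs -F pcfs -o nofdisk,size=20480 /dev/rlofi/1
    # mount -F pcfs /dev/lofi/1 /mnt

It's substituting a zvol for the lofi device in that sequence that I
can't get to work.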

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] write cache partial-disk pools (was Server with 4 drives, how to configure ZFS?)

2011-06-20 Thread Gary Mills
On Sun, Jun 19, 2011 at 08:03:25AM -0700, Richard Elling wrote:
 On Jun 19, 2011, at 6:28 AM, Edward Ned Harvey wrote:
  From: Richard Elling [mailto:richard.ell...@gmail.com]
  Sent: Saturday, June 18, 2011 7:47 PM
  
  Actually, all of the data I've gathered recently shows that the number of
  IOPS does not significantly increase for HDDs running random workloads.
  However the response time does :-( 
  
  Could you clarify what you mean by that?  
 
 Yes. I've been looking at what the value of zfs_vdev_max_pending should be.
 The old value was 35 (a guess, but a really bad guess) and the new value is
 10 (another guess, but a better guess).  I observe that data from a fast, 
 modern 
 HDD, for  1-10 threads (outstanding I/Os) the IOPS ranges from 309 to 333 
 IOPS. 
 But as we add threads, the average response time increases from 2.3ms to 
 137ms.
 Since the whole idea is to get lower response time, and we know disks are not 
 simple queues so there is no direct IOPS to response time relationship, maybe 
 it
 is simply better to limit the number of outstanding I/Os.

How would this work for a storage device with an intelligent
controller that provides only a few LUNs to the host, even though it
contains a much larger number of disks?  I would expect the controller
to be more efficient with a large number of outstanding IOs because it
could distribute those IOs across the disks.  It would, of course,
require a non-volatile cache to provide fast turnaround for writes.
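
For anyone experimenting along these lines, the limit can be
inspected and changed on a live system with mdb (a sketch; 10 is just
the value mentioned above):

    # echo zfs_vdev_max_pending/D | mdb -k        # show the current value
    # echo zfs_vdev_max_pending/W0t10 | mdb -kw   # set it to 10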

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] JBOD recommendation for ZFS usage

2011-05-30 Thread Gary Mills
On Mon, May 30, 2011 at 08:06:31AM +0200, Thomas Nau wrote:
 
 We are looking for JBOD systems which
 (1) hold 20+ 3.5" SATA drives
 (2) are rack mountable
 (3) have all the nice hot-swap stuff
 (4) allow 2 hosts to connect via SAS (4+ lines per host) and see
 all available drives as disks, no RAID volume.
 In a perfect world both hosts would connect each using
 two independent SAS connectors
 
 The box will be used in a ZFS Solaris/based fileserver in a
 fail-over cluster setup. Only one host will access a drive
 at any given time.

I'm using a J4200 array as shared storage for a cluster.  It needs a
SAS HBA in each cluster node.  The disks in the array are visible to
both nodes in the cluster.  Here's the feature list.  I don't know if
it's still available:

Sun Storage J4200 Array:
* Scales up to 48 SAS/SATA disk drives
* Up to 72 Gb/sec of total bandwidth
* Four x4-wide 3 Gb/sec SAS host/uplink ports (48 Gb/sec bandwidth)
* Two x4-wide 3 Gb/sec SAS expansion ports (24 Gb/sec bandwidth)

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best practice for boot partition layout in ZFS

2011-04-06 Thread Gary Mills
On Wed, Apr 06, 2011 at 08:08:06AM -0700, Erik Trimble wrote:
On 4/6/2011 7:50 AM, Lori Alt wrote:
On 04/ 6/11 07:59 AM, Arjun YK wrote:
  
I'm not sure there's a defined best practice.  Maybe someone else
can answer that question.  My guess is that in environments where,
before, a separate ufs /var slice was used, a separate zfs /var
dataset with a quota might now be appropriate.
Lori

Traditionally, the reason for a separate /var was one of two major
items:
(a)  /var was writable, and / wasn't - this was typical of diskless or
minimal local-disk configurations. Modern packaging systems are making
this kind of configuration increasingly difficult.
(b) /var held a substantial amount of data, which needed to be handled
separately from /  - mail and news servers are a classic example
For typical machines nowadays, with large root disks, there is very
little chance of /var suddenly exploding and filling /  (the classic
example of being screwed... wink).  Outside of the above two cases,
about the only other place I can see that having /var separate is a
good idea is for certain test machines, where you expect frequent
memory dumps (in /var/crash) - if you have a large amount of RAM,
you'll need a lot of disk space, so it might be good to limit /var in
this case by making it a separate dataset.

People forget (c), the ability to set different filesystem options on
/var.  You might want to have `setuid=off' for improved security, for
example.
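
With a separate dataset that's a one-line change (a sketch; the
dataset name depends on how your boot environment is laid out):

    # zfs set setuid=off rpool/ROOT/s10be/var
    # zfs set quota=8g rpool/ROOT/s10be/var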

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] One LUN per RAID group

2011-02-14 Thread Gary Mills
With ZFS on a Solaris server using storage on a SAN device, is it
reasonable to configure the storage device to present one LUN for each
RAID group?  I'm assuming that the SAN and storage device are
sufficiently reliable that no additional redundancy is necessary on
the Solaris ZFS server.  I'm also assuming that all disk management is
done on the storage device.

I realize that it is possible to configure more than one LUN per RAID
group on the storage device, but doesn't ZFS assume that each LUN
represents an independant disk, and schedule I/O accordingly?  In that
case, wouldn't ZFS I/O scheduling interfere with I/O scheduling
already done by the storage device?

Is there any reason not to use one LUN per RAID group?

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] One LUN per RAID group

2011-02-14 Thread Gary Mills
On Mon, Feb 14, 2011 at 03:04:18PM -0500, Paul Kraus wrote:
 On Mon, Feb 14, 2011 at 2:38 PM, Gary Mills mi...@cc.umanitoba.ca wrote:
 
  Is there any reason not to use one LUN per RAID group?
[...]
 In other words, if you build a zpool with one vdev of 10GB and
 another with two vdev's each of 5GB (both coming from the same array
 and raid set) you get almost exactly twice the random read performance
 from the 2x5 zpool vs. the 1x10 zpool.

This finding is surprising to me.  How do you explain it?  Is it
simply that you get twice as many outstanding I/O requests with two
LUNs?  Is it limited by the default I/O queue depth in ZFS?  After
all, all of the I/O requests must be handled by the same RAID group
once they reach the storage device.

 Also, using a 2540 disk array setup as a 10 disk RAID6 (with 2 hot
 spares), you get substantially better random read performance using 10
 LUNs vs. 1 LUN. While inconvenient, this just reflects the scaling of
  ZFS with the number of vdevs and not spindles.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool-poolname has 99 threads

2011-01-31 Thread Gary Mills
After an upgrade of a busy server to Oracle Solaris 10 9/10, I notice
a process called zpool-poolname that has 99 threads.  This seems to be
a limit, as it never goes above that.  It is lower on workstations.
The `zpool' man page says only:

  Processes
 Each imported pool has an associated process,  named  zpool-
 poolname.  The  threads  in  this process are the pool's I/O
 processing threads, which handle the compression,  checksum-
 ming,  and other tasks for all I/O associated with the pool.
     This process exists to  provide  visibility  into  the  CPU
 utilization  of the system's storage pools. The existence of
 this process is an unstable interface.

There are several thousand processes doing ZFS I/O on the busy server.
Could this new process be a limitation in any way?  I'd just like to
rule it out before looking further at I/O performance.
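
The thread count itself is easy to watch (a sketch; substitute your
pool's name):

    # ps -eo pid,nlwp,comm | grep zpool-
    # prstat -L -p `pgrep -f zpool-poolname`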

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sliced iSCSI device for doing RAIDZ?

2010-09-24 Thread Gary Mills
On Fri, Sep 24, 2010 at 12:01:35AM +0200, Alexander Skwar wrote:
 
  Suppose they gave you two huge lumps of storage from the SAN, and you
  mirrored them with ZFS.  What would you do if ZFS reported that one of
  its two disks had failed and needed to be replaced?  You can't do disk
  management with ZFS in this situation anyway because those aren't real
  disks.  Disk management all has to be done on the SAN storage device.
 
 Yes. I was rather thinking about RAIDZ instead of mirroring.

I was just using a simpler example.

  Anyway. Without redundancy, ZFS cannot do recovery, can
  it? As far as I understand, it could detect block-level corruption
  even if there's no redundancy. But it could not correct such
  corruption.
 
 Or is that a wrong understanding?

That's correct, but it also should never happen.

  If I got the gist of what you wrote, it boils down to how reliable
  the SAN is? But SANs could also have block-level corruption,
  no? I'm a bit confused because of the (perceived?) contradiction
  with the Best Practices Guide? :)

The real problem is that ZFS was not designed to run in a SAN
environment, that is one where all of the disk management and
sufficient redundancy reside in the storage device on the SAN.  ZFS
certainly can't do any disk management in this situation.  Error
detection and correction is still a debatable issue, one that quickly
becomes exceedingly complex.  The decision rests on probabilities
rather than certainties.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sliced iSCSI device for doing RAIDZ?

2010-09-23 Thread Gary Mills
On Tue, Sep 21, 2010 at 05:48:09PM +0200, Alexander Skwar wrote:
 
 We're using ZFS via iSCSI on a S10U8 system. As the ZFS Best
 Practices Guide http://j.mp/zfs-bp states, it's advisable to use
 redundancy (ie. RAIDZ, mirroring or whatnot), even if the underlying
 storage does its own RAID thing.
 
  Now, our storage does RAID, and the storage people say it is
  impossible to have it export iSCSI devices which have no
  redundancy/RAID.

If you have a reliable Iscsi SAN and a reliable storage device, you
don't need the additional redundancy provided by ZFS.

  Actually, where would there be a difference? I mean, those iSCSI
  devices don't represent real disks/spindles anyway; they're just
  some sort of abstraction. So, if they gave me 3x400 GB compared
  to 1200 GB in one huge lump like they do now, it could be that
  those would use the same spots on the real hard drives.

Suppose they gave you two huge lumps of storage from the SAN, and you
mirrored them with ZFS.  What would you do if ZFS reported that one of
its two disks had failed and needed to be replaced?  You can't do disk
management with ZFS in this situation anyway because those aren't real
disks.  Disk management all has to be done on the SAN storage device.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS with Equallogic storage

2010-08-22 Thread Gary Mills
On Sat, Aug 21, 2010 at 06:36:37PM -0400, Toby Thain wrote:
 
 On 21-Aug-10, at 3:06 PM, Ross Walker wrote:
 
 On Aug 21, 2010, at 2:14 PM, Bill Sommerfeld bill.sommerf...@oracle.com 
  wrote:
 
 On 08/21/10 10:14, Ross Walker wrote:
 ...
 Would I be better off forgoing resiliency for simplicity, putting  
 all my faith into the Equallogic to handle data resiliency?
 
 IMHO, no; the resulting system will be significantly more brittle.
 
 Exactly how brittle I guess depends on the Equallogic system.
 
 If you don't let zfs manage redundancy, Bill is correct: it's a more  
 fragile system that *cannot* self heal data errors in the (deep)  
 stack. Quantifying the increased risk, is a question that Richard  
 Elling could probably answer :)

That's because ZFS does not have a way to handle a large class of
storage designs, specifically the ones with raw storage and disk
management being provided by reliable SAN devices.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris startup script location

2010-08-18 Thread Gary Mills
On Wed, Aug 18, 2010 at 12:16:04AM -0700, Alxen4 wrote:
  Is there any way to run a start-up script before a non-root pool is mounted?
  
  For example, I'm trying to use a ramdisk as a ZIL device (ramdiskadm),
  so I need to create the ramdisk before the actual pool is mounted; otherwise it
  complains that the log device is missing :)

Yes, it's actually quite easy.  You need to create an SMF manifest and
method.  The manifest should make the ZFS mount dependent on it with
the `dependent' and `/dependent' tag pair.  It also needs to be
dependent on the resources it needs, with the `dependency' and
`/dependency' pairs.  It should also specify a `single_instance/' and
`transient' service.  The method script can do whatever the mount
requires, such as creating the ramdisk.
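
The method script itself can be trivial.  A minimal sketch, with a
hypothetical ramdisk name and size:

#!/sbin/sh
# Start method for a transient service that the ZFS mount service
# depends on; it creates the ramdisk the pool expects as its log
# device, and tears it down again on stop.

case "$1" in
start)
        /usr/sbin/ramdiskadm -a zil0 1g
        ;;
stop)
        /usr/sbin/ramdiskadm -d zil0
        ;;
esac
exit 0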

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-16 Thread Gary Mills
On Fri, Aug 13, 2010 at 01:54:13PM -0700, Erast wrote:
 
 On 08/13/2010 01:39 PM, Tim Cook wrote:
 http://www.theregister.co.uk/2010/08/13/opensolaris_is_dead/
 
 I'm a bit surprised at this development... Oracle really just doesn't
 get it.  The part that's most disturbing to me is the fact they won't be
 releasing nightly snapshots.  It appears they've stopped Illumos in its
 tracks before it really even got started (perhaps that explains the
 timing of this press release)
 
 Wrong. Be patient, with the pace of current Illumos development it soon 
 will have all the closed binaries liberated and ready to sync up with 
 promised ON code drops as dictated by GPL and CDDL licenses.

Is this what you mean, from:

http://hub.opensolaris.org/bin/view/Main/opensolaris_license

Any Covered Software that You distribute or otherwise make available
in Executable form must also be made available in Source Code form and
that Source Code form must be distributed only under the terms of this
License. You must include a copy of this License with every copy of
the Source Code form of the Covered Software You distribute or
otherwise make available. You must inform recipients of any such
Covered Software in Executable form as to how they can obtain such
Covered Software in Source Code form in a reasonable manner on or
through a medium customarily used for software exchange.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS development moving behind closed doors

2010-08-13 Thread Gary Mills
If this information is correct,

http://opensolaris.org/jive/thread.jspa?threadID=133043

further development of ZFS will take place behind closed doors.
Opensolaris will become the internal development version of Solaris
with no public distributions.  The community has been abandoned.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs upgrade unmounts filesystems

2010-07-29 Thread Gary Mills
Zpool upgrade on this system went fine, but zfs upgrade failed:

# zfs upgrade -a
cannot unmount '/space/direct': Device busy
cannot unmount '/space/dcc': Device busy
cannot unmount '/space/direct': Device busy
cannot unmount '/space/imap': Device busy
cannot unmount '/space/log': Device busy
cannot unmount '/space/mysql': Device busy
2 filesystems upgraded

Do I have to shut down all the applications before upgrading the
filesystems?  This is on a Solaris 10 5/09 system.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs upgrade unmounts filesystems

2010-07-29 Thread Gary Mills
On Thu, Jul 29, 2010 at 10:26:14PM +0200, Pawel Jakub Dawidek wrote:
 On Thu, Jul 29, 2010 at 12:00:08PM -0600, Cindy Swearingen wrote:
  
  I found a similar zfs upgrade failure with the device busy error, which
  I believe was caused by a file system mounted under another file system.
  
  If this is the cause, I will file a bug or find an existing one.

No, it was caused by processes active on those filesystems.

  The workaround is to unmount the nested file systems and upgrade them
  individually, like this:
  
  # zfs upgrade space/direct
  # zfs upgrade space/dcc

Except that I couldn't unmount them because the filesystems were busy.

 'zfs upgrade' unmounts file system first, which makes it hard to upgrade
 for example root file system. The only work-around I found is to clone
 root file system (clone is created with most recent version), change
 root file system to newly created clone, reboot, upgrade original root
 file system, change root file system back, reboot, destroy clone.

In this case it wasn't the root filesystem, but I still had to disable
twelve services before doing the upgrade and enable them afterwards.
`fuser -c' is useful to identify the processes.  Mapping them to
services can be difficult.  The server is essentially down during the
upgrade.
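
The sequence was roughly (a sketch, using one of the filesystems
above):

    # fuser -c /space/imap          # PIDs with files open on the filesystem
    # ps -fp "<pid> <pid> ..."      # see what those processes are
    # svcs -p                       # lists the PIDs belonging to each service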

For a root filesystem, you might have to boot off the failsafe archive
or a DVD and import the filesystem in order to upgrade it.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS disks hitting 100% busy

2010-06-07 Thread Gary Mills
Our e-mail server started to slow down today.  One of the disk devices
is frequently at 100% usage.  The heavy writes seem to cause reads to
run quite slowly.  In the statistics below, `c0t0d0' is UFS, containing
the / and /var slices.  `c0t1d0' is ZFS, containing /var/log/syslog,
a couple of databases, and the GNU mailman files.  It's this latter
disk that's been hitting 100% usage.

  $ iostat -xn 5 3
                      extended device statistics
      r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
      8.2   57.8  142.6   538.2  0.0  1.7    0.1   25.2   0  48 c0t0d0
      5.8  273.0  303.4 24115.9  0.0 18.6    0.0   66.7   0  73 c0t1d0
                      extended device statistics
      r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
      0.0   57.2    0.0   294.6  0.0  1.3    0.0   22.1   0  64 c0t0d0
      0.2  370.2    1.1 33968.5  0.0 31.4    0.0   84.9   1 100 c0t1d0
                      extended device statistics
      r/s    w/s   kr/s    kw/s wait actv wsvc_t asvc_t  %w  %b device
      0.8   61.0    6.4   503.0  0.0  2.5    0.0   40.0   0  70 c0t0d0
      0.0  295.8    0.0 35273.3  0.0 35.0    0.0  118.3   0 100 c0t1d0

This system is running Solaris 10 5/09 on a Sun 4450 server.  Both the
disk devices are actually hardware-mirrored pairs of SAS disks, with
the Adaptec RAID controller.  Can anything be done to either reduce
the amount of I/O or to improve the write bandwidth?  I assume that
adding another disk device to the zpool will double the bandwidth.
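
If we do add a device, the change itself is a one-liner (a sketch;
the pool and second device names are hypothetical, and note that a
device added this way cannot later be removed from the pool):

    # zpool add <poolname> c0t2d0
    # zpool iostat -v <poolname> 5     # confirm writes spread across both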

/var/log/syslog is quite large, reaching about 600 megabytes before
it's rotated.  This takes place each night, with compression bringing
it down to about 70 megabytes.  The server handles about 500,000
messages a day.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is the J4200 SAS array suitable for Sun Cluster?

2010-05-17 Thread Gary Mills
On Sun, May 16, 2010 at 01:14:24PM -0700, Charles Hedrick wrote:
 We use this configuration. It works fine. However I don't know
 enough about the details to answer all of your questions.
 
 The disks are accessible from both systems at the same time. Of
 course with ZFS you had better not actually use them from both
 systems.

That's what I wanted to know.  I'm not familiar with SAS fabrics, so
it's good to know that they operate similarly to multi-initiator SCSI
in a cluster.

 Actually, let me be clear about what we do. We have two J4200's and
 one J4400. One J4200 uses SAS disks, the others SATA. The two with
 SATA disks are used in Sun cluster configurations as NFS
 servers. They fail over just fine, losing no state. The one with SAS
 is not used with Sun Cluster. Rather, it's a Mysql server with two
 systems, one of them as a hot spare. (It also acts as a mysql slave
 server, but it uses different storage for that.) That means that our
 actual failover experience is with the SATA configuration. I will
 say from experience that in the SAS configuration both systems see
 the disks at the same time. I even managed to get ZFS to mount the
 same pool from both systems, which shouldn't be possible. Behavior
 was very strange until we realized what was going on.

Our situation is that we only need a small amount of shared storage
in the cluster.  It's intended for high-availability of core services,
such as DNS and NIS, rather than as a NAS server.

 I get the impression that they have special hardware in the SATA
 version that simulates SAS dual interface drives. That's what lets
 you use SATA drives in a two-node configuration. There's also some
 additional software setup for that configuration.

That would be the SATA interposer that does that.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Does ZFS use large memory pages?

2010-05-07 Thread Gary Mills
On Thu, May 06, 2010 at 07:46:49PM -0700, Rob wrote:
 Hi Gary,
 I would not remove this line in /etc/system.
 We have been combatting this bug for a while now on our ZFS file
 system running JES Commsuite 7.
 
 I would be interested in finding out how you were able to pin point
 the problem.

Our problem was a year ago.  Careful reading of Sun bug reports
helped.  Opening a support case with Sun helped even more.  Large
memory pages were likely not involved.

  We seem to have no worries with the system currently, but when the
  file system gets above 80% we seem to have quite a number of
  issues, much the same as what you've had in the past, with ps and prstat
  hanging.
  
  Are you able to tell me the IDR number that you applied?

The IDR was only needed last year.  Upgrading to Solaris 10 10/09
and applying the latest patches resolved the problem.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Is the J4200 SAS array suitable for Sun Cluster?

2010-05-03 Thread Gary Mills
I'm setting up a two-node cluster with 1U x86 servers.  It needs a
small amount of shared storage, with two or four disks.  I understand
that the J4200 with SAS disks is approved for this use, although I
haven't seen this information in writing.  Does anyone have experience
with this sort of configuration?  I have a few questions.

I understand that the J4200 with SATA disks will not do SCSI
reservations.  Will it with SAS disks?

The X4140 seems to require two SAS HBAs, one for the internal disks
and one for the external disks.  Is this correct?

Will the disks in the J4200 be accessible from both nodes, so that
the cluster can fail over the storage?  I know this works with a
multi-initiator SCSI bus, but I don't know about SAS behavior.

Is there a smaller, and cheaper, SAS array that can be used in this
configuration?  It would still need to have redundant power and
redundant SAS paths.

I plan to use ZFS everywhere, for the root filesystem and the shared
storage.  The only exception will be UFS for /globaldevices .

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-26 Thread Gary Mills
On Mon, Apr 26, 2010 at 01:32:33PM -0500, Dave Pooser wrote:
 On 4/26/10 10:10 AM, Richard Elling richard.ell...@gmail.com wrote:
 
  SAS shines with multiple connections to one or more hosts.  Hence, SAS
  is quite popular when implementing HA clusters.
 
 So that would be how one builds something like the active/active controller
 failover in standalone RAID boxes. Is there a good resource on doing
 something like that with an OpenSolaris storage server? I could see that as
 a project I might want to attempt.

This is interesting.  I have a two-node SPARC cluster that uses a
multi-initiator SCSI array for shared storage.  As an application
server, it need only two disks in the array.  They are a ZFS mirror.
This all works quite nicely under Sun Cluster.

I'd like to duplicate this configuration with two small x86 servers
and a small SAS array, also with only two disks.  It should be easy to
find a pair of 1U servers, but what's the smallest SAS array that's
available?  Does it need an array controller?  What's needed on the
servers to connect to it?

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Snapshot recycle freezes system activity

2010-03-11 Thread Gary Mills
On Thu, Mar 04, 2010 at 04:20:10PM -0600, Gary Mills wrote:
 We have an IMAP e-mail server running on a Solaris 10 10/09 system.
 It uses six ZFS filesystems built on a single zpool with 14 daily
 snapshots.  Every day at 11:56, a cron command destroys the oldest
 snapshots and creates new ones, both recursively.  For about four
 minutes thereafter, the load average drops and I/O to the disk devices
 drops to almost zero.  Then, the load average shoots up to about ten
 times normal and then declines to normal over about four minutes, as
 disk activity resumes.  The statistics return to their normal state
 about ten minutes after the cron command runs.

I'm pleased to report that I found the culprit and the culprit was me!
Well, ZFS peculiarities may be involved as well.  Let me explain:

We had a single second-level filesystem and five third-level
filesystems, all with 14 daily snapshots.  The snapshots were
maintained by a cron command that did a `zfs list -rH -t snapshot -o
name' to get the names of all of the snapshots, extracted the part
after the `@', and then sorted them uniquely to get a list of suffixes
that were older than 14 days.  The suffixes were Julian dates so they
sorted correctly.  It then did a `zfs destroy -r' to delete them.  The
recursion was always done from the second-level filesystem.  The
top-level filesystem was empty and had no snapshots.  Here's a portion
of the script:

zfs list -rH -t snapshot -o name $FS | \
cut -d@ -f2 | \
sort -ur | \
sed 1,${NR}d | \
xargs -I '{}' zfs destroy -r $FS@'{}'

zfs snapshot -r $...@$jd

Just over two weeks ago, I rearranged the filesystems so that the
second-level filesystem was newly-created and initially had no
snapshots.  It did have a snapshot taken every day thereafter, so that
eventually it also had 14 of them.  It was during that interval that
the complaints started.  My statistics clearly showed the performance
stall and subsequent recovery.  Once that filesystem reached 14
snapshots, the complaints stopped and the statistics showed only a
modest increase in CPU activity, but no stall.

During this interval, the script was doing a recursive destroy for a
snapshot that didn't exist at the specified level, but only existed in
the descendent filesystems.  I'm assuming that that unusual situation
was the cause of the stall, although I don't have good evidence.  By
the time the complaints reached my ears, and I was able to refine my
statistics gathering sufficiently, the problem had gone away.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Snapshot recycle freezes system activity

2010-03-09 Thread Gary Mills
On Mon, Mar 08, 2010 at 03:18:34PM -0500, Miles Nordin wrote:
  gm == Gary Mills mi...@cc.umanitoba.ca writes:
 
 gm destroys the oldest snapshots and creates new ones, both
 gm recursively.
 
 I'd be curious if you try taking the same snapshots non-recursively
 instead, does the pause go away?  

I'm still collecting statistics, but that is one of the things I'd
like to try.

 Because recursive snapshots are special: they're supposed to
 atomically synchronize the cut-point across all the filesystems
 involved, AIUI.  I don't see that recursive destroys should be
 anything special though.
 
 gm Is it destroying old snapshots or creating new ones that
 gm causes this dead time?
 
 sortof seems like you should tell us this, not the other way
 around. :)  Seriously though, isn't that easy to test?  And I'm curious
 myself too.

Yes, that's another thing I'd like to try.  I'll just put a `sleep'
in the script between the two actions to see if the dead time moves
later in the day.
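
The test version of the cron script would look something like this (a
sketch; the variable names are illustrative: $FS is the second-level
filesystem, $NR the number of suffixes to keep, $JD the day's
Julian-date suffix):

zfs list -rH -t snapshot -o name $FS | \
cut -d@ -f2 | \
sort -ur | \
sed 1,${NR}d | \
xargs -I '{}' zfs destroy -r $FS@'{}'

# separate the destroys from the snapshots to see which one stalls
sleep 600

# non-recursive variant: snapshot each filesystem individually
for fs in `zfs list -rH -o name $FS`
do
        zfs snapshot $fs@$JD
done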

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Snapshot recycle freezes system activity

2010-03-09 Thread Gary Mills
On Mon, Mar 08, 2010 at 01:23:10PM -0800, Bill Sommerfeld wrote:
 On 03/08/10 12:43, Tomas Ögren wrote:
 So we tried adding 2x 4GB USB sticks (Kingston Data
 Traveller Mini Slim) as metadata L2ARC and that seems to have pushed the
 snapshot times down to about 30 seconds.
 
 Out of curiosity, how much physical memory does this system have?

Mine has 64 GB of memory with the ARC limited to 32 GB.  The Cyrus
IMAP processes, thousands of them, use memory mapping extensively.
I don't know if this design affects the snapshot recycle behavior.
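
For reference, the 32 GB cap is a single /etc/system line, with the
value in bytes (a sketch):

    set zfs:zfs_arc_max = 0x800000000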

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Snapshot recycle freezes system activity

2010-03-05 Thread Gary Mills
On Thu, Mar 04, 2010 at 04:20:10PM -0600, Gary Mills wrote:
 We have an IMAP e-mail server running on a Solaris 10 10/09 system.
 It uses six ZFS filesystems built on a single zpool with 14 daily
 snapshots.  Every day at 11:56, a cron command destroys the oldest
 snapshots and creates new ones, both recursively.  For about four
 minutes thereafter, the load average drops and I/O to the disk devices
 drops to almost zero.  Then, the load average shoots up to about ten
 times normal and then declines to normal over about four minutes, as
 disk activity resumes.  The statistics return to their normal state
 about ten minutes after the cron command runs.

I should mention that this seems to be a new problem.  We've been
using the same scheme to cycle snapshots for several years.  The
complaints of an unresponsive interval have only happened recently.
I'm still waiting for our help desk to report on when the complaints
started.  It may be the result of some recent change we made, but
so far I can't tell what that might have been.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Snapshot recycle freezes system activity

2010-03-04 Thread Gary Mills
We have an IMAP e-mail server running on a Solaris 10 10/09 system.
It uses six ZFS filesystems built on a single zpool with 14 daily
snapshots.  Every day at 11:56, a cron command destroys the oldest
snapshots and creates new ones, both recursively.  For about four
minutes thereafter, the load average drops and I/O to the disk devices
drops to almost zero.  Then, the load average shoots up to about ten
times normal and then declines to normal over about four minutes, as
disk activity resumes.  The statistics return to their normal state
about ten minutes after the cron command runs.

Is it destroying old snapshots or creating new ones that causes this
dead time?  What does each of these procedures do that could affect
the system?  What can I do to make this less visible to users?

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Snapshot recycle freezes system activity

2010-03-04 Thread Gary Mills
On Thu, Mar 04, 2010 at 07:51:13PM -0300, Giovanni Tirloni wrote:
 
On Thu, Mar 4, 2010 at 7:28 PM, Ian Collins ...@ianshome.com
wrote:

Gary Mills wrote:

  We have an IMAP e-mail server running on a Solaris 10 10/09 system.
  It uses six ZFS filesystems built on a single zpool with 14 daily
  snapshots.  Every day at 11:56, a cron command destroys the oldest
  snapshots and creates new ones, both recursively.  For about four
  minutes thereafter, the load average drops and I/O to the disk
  devices
  drops to almost zero.  Then, the load average shoots up to about
  ten
  times normal and then declines to normal over about four minutes,
  as
  disk activity resumes.  The statistics return to their normal state
  about ten minutes after the cron command runs.
  Is it destroying old snapshots or creating new ones that causes
  this
  dead time?  What does each of these procedures do that could affect
  the system?  What can I do to make this less visible to users?
  
  I have a couple of Solaris 10 boxes that do something similar
  (hourly snaps) and I've never seen any lag in creating and
  destroying snapshots.  One system with 16 filesystems takes 5
  seconds to destroy the 16 oldest snaps and create 5 recursive new
  ones.  I logged load average on these boxes and there is a small
  spike on the hour, but this is down to sending the snaps, not
  creating them.
  
We've seen the behaviour that Gary describes while destroying datasets
recursively (600GB and with 7 snapshots). It seems that close to the
end the server stalls for 10-15 minutes and NFS activity stops. For
small datasets/snapshots that doesn't happen or is harder to notice.
Does ZFS have to do something special when it's done releasing the
data blocks at the end of the destroy operation ?

That does sound similar to the problem here.  The zpool is 3 TB in
size with about 1.4 TB used.  It does sound as if the stall happens
during the `zfs destroy -r' rather than during the `zfs snapshot -r'.
What can zfs be doing when the CPU load average drops and disk I/O is
close to zero?

I also had peculiar problem here recently when I was upgrading the ZFS
filesystems on our test server from 3 to 4.  When I tried `zfs upgrade
-a', the command hung for a long time and could not be interrupted,
killed, or traced.  Eventually it terminated on its own.  Only the two
upper-level filesystems had been upgraded.  I upgraded the lower-
level ones individually with `zfs upgrade' with no further problems.
I had previously upgraded the zpool with no problems.  I don't know if
this behavior is related to the stall on the production server.  I
haven't attempted the upgrades there yet.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How do separate ZFS filesystems affect performance?

2010-01-14 Thread Gary Mills
On Thu, Jan 14, 2010 at 10:58:48AM +1100, Daniel Carosone wrote:
 On Wed, Jan 13, 2010 at 08:21:13AM -0600, Gary Mills wrote:
  Yes, I understand that, but do filesystems have separate queues of any
  sort within the ZIL?
 
 I'm not sure. If you can experiment and measure a benefit,
 understanding the reasons is helpful but secondary.  If you can't
 experiment so easily, you're stuck asking questions, as now, to see
 whether the effort of experimenting is potentially worthwhile. 

Yes, we're stuck asking questions.  I appreciate your responses.

 Some other things to note (not necessarily arguments for or against):
 
  * you can have multiple slog devices, in case you're creating
so much ZIL traffic that ZIL queueing is a real problem, however
shared or structured between filesystems.

For the time being, I'd like to stay with the ZIL that's internal
to the zpool.

  * separate filesystems can have different properties which might help
tuning and experiments (logbias, copies, compress, *cache), as well
the recordsize.  Maybe you will find that compress on mailboxes
helps, as long as you're not also compressing the db's?

Yes, that's a good point in favour of a separate filesystem.
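
For example, if we do split them out, I'd expect the setup to look
something like the following, where the dataset names and property
values are only guesses at this point:

  # zfs create space/db
  # zfs set recordsize=8k space/db       (match the databases' small updates)
  # zfs set compression=on space/mail    (compress mailboxes, not the db's)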

  * separate filesystems may have different recovery requirements
(snapshot cycles).  Note that taking snapshots is ~free, but
keeping them and deleting them have costs over time.  Perhaps you
can save some of these costs if the db's are throwaway/rebuildable. 

Also a good point.

  If not, would it help to put the database
  filesystems into a separate zpool?
 
 Maybe, if you have the extra devices - but you need to compare with
 the potential benefit of adding those devices (and their IOPS) to
 benefit all users of the existing pool.
 
 For example, if the databases are a distinctly different enough load,
 you could compare putting them on a dedicated pool on ssd, vs using
 those ssd's as additional slog/l2arc.  Unless you can make quite
 categorical separations between the workloads, such that an unbalanced
 configuration matches an unbalanced workload, you may still be better
 with consolidated IO capacity in the one pool.

As well, I'd like to keep all of the ZFS pools on the same external
storage device.  This makes migrating to a different server quite easy.

 Note, also, you can only take recursive atomic snapshots within the
 one pool - this might be important if the db's have to match the
 mailbox state exactly, for recovery.

That's another good point.  It's certainly better to have synchronized
snapshots.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Does ZFS use large memory pages?

2010-01-12 Thread Gary Mills
On Mon, Jan 11, 2010 at 01:43:27PM -0600, Gary Mills wrote:
 
 This line was a workaround for bug 6642475 that had to do with
 searching for large contiguous pages. The result was high system
 time and slow response.  I can't find any public information on this
 bug, although I assume it's been fixed by now.  It may have only
 affected Oracle database.

I eventually found it.  The bug is not visible from Sunsolve even with
a contract, but it is in bugs.opensolaris.org without one.  This is
extremely confusing.

 I'd like to remove this line from /etc/system now, but I don't know
 if it will have any adverse effect on ZFS or the Cyrus IMAP server
 that runs on this machine.  Does anyone know if ZFS uses large memory
 pages?

Bug 6642475 is still outstanding, although related bugs have been fixed.
I'm going to leave `set pg_contig_disable=1' in place.
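
For anyone following along, the current value can also be checked on a
running kernel; something like this should do it (mdb syntax quoted
from memory, so verify before relying on it):

  # echo 'pg_contig_disable/D' | mdb -k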

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How do separate ZFS filesystems affect performance?

2010-01-12 Thread Gary Mills
I'm working with a Cyrus IMAP server running on a T2000 box under
Solaris 10 10/09 with current patches.  Mailboxes reside on six ZFS
filesystems, each containing about 200 gigabytes of data.  These are
part of a single zpool built on four Iscsi devices from our Netapp
filer.

One of these ZFS filesystems contains a number of global and per-user
databases in addition to one sixth of the mailboxes.  I'm thinking of
moving these databases to a separate ZFS filesystem.  Access to these
databases must be quick to ensure responsiveness of the server.  We
are currently experiencing a slowdown in performance when the number
of simultaneous IMAP sessions rises above 3000.  These databases are
opened and memory-mapped by all processes.  They have the usual
requirement for locking and synchronous writes whenever they are
updated.
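
What I have in mind is roughly the following, with the new filesystem
name and the recordsize being nothing more than placeholders for now:

  # zfs create -o recordsize=8k space/imap-meta

followed by stopping Cyrus, copying the databases across, and pointing
its configuration at the new location.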

Is moving the databases (IMAP metadata) to a separate ZFS filesystem
likely to improve performance?  I've heard that this is important, but
I'm not clear why this is.  Does each filesystem have its own queue in
the ARC or ZIL?  Here are some statistics taken while the server was
busy and access was slow:

# /usr/local/sbin/zilstat 5 5
   N-Bytes  N-Bytes/s N-Max-Rate   B-Bytes  B-Bytes/s B-Max-Rate    ops  <=4kB 4-32kB >=32kB
   1126664 225332 515872   1148518422970363469312292163 
51 79
740536 148107 250896953548819070974005888198106 
24 68
758344 151668 179104   1254604825092092682880227 93 
45 89
603304 120660 204344917913618358272084864179 89 
23 67
948896 189779 346520   1588019231760384173824262108 
32123
# /usr/local/sbin/arcstat 5 5
Time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz c
10:50:16  191M   31M 16   14M8   17M   48   18M   1230G   32G
10:50:211K   148 1076572   5878   1530G   32G
10:50:261K   154 1288765   7296   1830G   32G
10:50:31   79661  7547 6   3525830G   32G
10:50:361K   117  9   105812   5344   1030G   32G

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How do separate ZFS filesystems affect performance?

2010-01-12 Thread Gary Mills
On Tue, Jan 12, 2010 at 11:11:36AM -0600, Bob Friesenhahn wrote:
 On Tue, 12 Jan 2010, Gary Mills wrote:
 
 Is moving the databases (IMAP metadata) to a separate ZFS filesystem
 likely to improve performance?  I've heard that this is important, but
 I'm not clear why this is.
 
 There is an obvious potential benefit in that you are then able to 
 tune filesystem parameters to best fit the needs of the application 
 which updates the data.  For example, if the database uses a small 
 block size, then you can set the filesystem blocksize to match.  If 
 the database uses memory mapped files, then using a filesystem 
 blocksize which is closest to the MMU page size may improve 
 performance.

I found a couple of references that suggest just putting the databases
on their own ZFS filesystem has a great benefit.  One is an e-mail
message to a mailing list from Vincent Fox at UC Davis.  They run a
similar system to ours at that site.  He says:

Particularly the database is important to get its own filesystem so
that its queue/cache are separated.

The second one is from:

http://blogs.sun.com/roch/entry/the_dynamics_of_zfs

He says:

For file modification that come with some immediate data integrity
constraint (O_DSYNC, fsync etc.) ZFS manages a per-filesystem intent
log or ZIL.

This sounds like the ZIL queue mentioned above.  Is I/O for each of
those handled separately?

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Does ZFS use large memory pages?

2010-01-11 Thread Gary Mills
Last April we put this in /etc/system on a T2000 server with large ZFS
filesystems:

set pg_contig_disable=1

This was while we were attempting to solve a couple of ZFS problems
that were eventually fixed with an IDR.  Since then, we've removed
the IDR and brought the system up to Solaris 10 10/09 with current
patches.  It's stable now, but seems slower.

This line was a workaround for bug 6642475 that had to do with
searching for large contiguous pages. The result was high system
time and slow response.  I can't find any public information on this
bug, although I assume it's been fixed by now.  It may have only
affected Oracle database.

I'd like to remove this line from /etc/system now, but I don't know
if it will have any adverse effect on ZFS or the Cyrus IMAP server
that runs on this machine.  Does anyone know if ZFS uses large memory
pages?
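
If it helps anyone answer, I can collect data from the running system.
I was thinking of something along these lines, although I'm not sure
how conclusive either would be for ZFS's own kernel memory (the pid is
hypothetical):

  # pmap -sx 12345 | head    (page sizes in use by one imapd process)
  # trapstat -T 5 1          (TLB miss activity broken down by page size)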

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS filesystems not mounted on reboot with Solaris 10 10/09

2009-12-19 Thread Gary Mills
I have a system that was recently upgraded to Solaris 10 10/09.  It
has a UFS root on local disk and a separate zpool on Iscsi disk.
After a reboot, the ZFS filesystems were not mounted, although the
zpool had been imported.  `zfs mount' showed nothing.  `zfs mount -a'
mounted them nicely.  The `canmount' property is `on'.  Why would they
not be mounted at boot?  This used to work with earlier releases of
Solaris 10.

The `zfs mount -a' at boot is run by the /system/filesystem/local:default
service.  It didn't record any errors on the console or in the log

[ Dec 19 08:09:11 Executing start method (/lib/svc/method/fs-local) ]
[ Dec 19 08:09:12 Method start exited with status 0 ]

Is a dependency missing?
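
For the record, this is what I intend to compare against a system that
still mounts correctly (the exact iSCSI service name may differ from
box to box, hence the grep):

  # svcs -d svc:/system/filesystem/local:default    (what fs-local depends on)
  # svcs -a | grep -i iscsi                         (the initiator service here)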

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Permanent errors on two files

2009-12-06 Thread Gary Mills
On Fri, Dec 04, 2009 at 02:52:47PM -0700, Cindy Swearingen wrote:
 
 If space/dcc is a dataset, is it mounted? ZFS might not be able to
 print the filenames if the dataset is not mounted, but I'm not sure
 if this is why only object numbers are displayed.

Yes, it's mounted and is quite an active filesystem.

 I would also check fmdump -eV to see how frequent the hardware
 has had problems.

That shows ZFS checksum errors in July, but nothing since that time.
There were also DIMM errors before that, starting in June.  We
replaced the failed DIMMs, also in July.  This is an X4450 with ECC
memory.  There were no disk errors reported.  I suppose we can blame
the memory.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Permanent errors on two files

2009-12-06 Thread Gary Mills
On Sat, Dec 05, 2009 at 01:52:12AM +0300, Victor Latushkin wrote:
 On Dec 5, 2009, at 0:52, Cindy Swearingen cindy.swearin...@sun.com  
 wrote:
 
 The zpool status -v command will generally print out filenames, dnode
 object numbers, or identify metadata corruption problems. These look
 like object numbers, because they are large, rather than metadata
 objects, but an expert will have to comment.
 
 Yes, these are object numbers, and the most likely reason they are not
 turned into filenames is that the corresponding files no longer exist.

That seems to be the case:

# zdb -d space/dcc 0x11e887 0xba25aa
Dataset space/dcc [ZPL], ID 21, cr_txg 19, 20.5G, 3672408 objects

 So I'd run scrub another time, if the files are gone and there are no  
 other corruptions scrub will reset error log and zpool status should  
 become clean.

That worked.  After the scrub, there are no errors reported.
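
For the archives, that was just the obvious pair of commands:

  # zpool scrub space
  # zpool status -v space    (once the scrub had completed)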

 You might be able to identify these object numbers with zdb, but
 I'm not sure how do that.
 
 You can try to use zdb this way to check if these objects still exist
 
 zdb -d space/dcc 0x11e887 0xba25aa

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] If you have ZFS in production, willing to share some details (with me)?

2009-09-21 Thread Gary Mills
On Fri, Sep 18, 2009 at 01:51:52PM -0400, Steffen Weiberle wrote:
 I am trying to compile some deployment scenarios of ZFS.
 
 # of systems

One, our e-mail server for the entire campus.

 amount of storage

2 TB that's 58% used.

 application profile(s)

This is our Cyrus IMAP spool.  In addition to user's e-mail folders
(directories) and messages (files), it contains global, per-folder,
and per-user databases.  The latter two types are quite small.

 type of workload (low, high; random, sequential; read-only, read-write, 
 write-only)

It's quite active.  Message files arrive randomly and are deleted
randomly.  As a result, files in a directory are not located in
proximity on the storage.  Individual users often read all of their
folders and messages in one IMAP session.  Databases are quite active.
Each incoming message adds a file to a directory and reads or updates
several databases.  Most IMAP I/O is done with mmap() rather than with
read()/write().  So far, IMAP performance is adequate.  The backup,
done by EMC Networker, is very slow because it must read thousands of
small files in directory order.

 storage type(s)

We are using an Iscsi SAN with storage on a Netapp filer.  It exports
four 500-gb LUNs that are striped into one ZFS pool.  All disk
management is done on the Netapp.  We have had several disk
and replacements on the Netapp, with no effect on the e-mail server.

 industry

A University with 35,000 enabled e-mail accounts.

 whether it is private or I can share in a summary
 anything else that might be of interest

You are welcome to share this information.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS commands hang after several zfs receives

2009-09-15 Thread Gary Mills
On Tue, Sep 15, 2009 at 08:48:20PM +1200, Ian Collins wrote:
 Ian Collins wrote:
 I have a case open for this problem on Solaris 10u7.
 
 The case has been identified and I've just received an IDR,which I 
 will test next week.  I've been told the issue is fixed in update 8, 
 but I'm not sure if there is an nv fix target.
 
 I'll post back once I've abused a test system for a while.
 
 The IDR I was sent appears to have fixed the problem.  I have been 
 abusing the box for a couple of weeks without any lockups.  Roll on 
 update 8!

Was that IDR140221-17?  That one fixed a deadlock bug for us back
in May.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-07 Thread Gary Mills
On Mon, Jul 06, 2009 at 04:54:16PM +0100, Andrew Gabriel wrote:
 Andre van Eyssen wrote:
 On Mon, 6 Jul 2009, Gary Mills wrote:
 
 As for a business case, we just had an extended and catastrophic
 performance degradation that was the result of two ZFS bugs.  If we
 have another one like that, our director is likely to instruct us to
 throw away all our Solaris toys and convert to Microsoft products.
 
 If you change platform every time you get two bugs in a product, you 
 must cycle platforms on a pretty regular basis!
 
 You often find the change is towards Windows. That very rarely has the 
 same rules applied, so things then stick there.

There's a more general principle in operation here.  Organizations do
sometimes change platforms for peculiar reasons, but once they do that
they're not going to do it again for a long time.  That's why they
disregard problems with the new platform.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-06 Thread Gary Mills
On Sat, Jul 04, 2009 at 07:18:45PM +0100, Phil Harman wrote:
 Gary Mills wrote:
 On Sat, Jul 04, 2009 at 08:48:33AM +0100, Phil Harman wrote:
   
 ZFS doesn't mix well with mmap(2). This is because ZFS uses the ARC  
 instead of the Solaris page cache. But mmap() uses the latter. So if  
 anyone maps a file, ZFS has to keep the two caches in sync.
 
 That's the first I've heard of this issue.  Our e-mail server runs
 Cyrus IMAP with mailboxes on ZFS filesystems.  Cyrus uses mmap(2)
 extensively.  I understand that Solaris has an excellent
 implementation of mmap(2).  ZFS has many advantages, snapshots for
 example, for mailbox storage.  Is there anything that we can be do to
 optimize the two caches in this environment?  Will mmap(2) one day
 play nicely with ZFS?
 
[..]
 Software engineering is always about prioritising resource. Nothing 
 prioritises performance tuning attention quite like compelling 
 competitive data. When Bart Smaalders and I wrote libMicro we generated 
 a lot of very compelling data. I also coined the phrase If Linux is 
 faster, it's a Solaris bug. You will find quite a few (mostly fixed) 
 bugs with the synopsis linux is faster than solaris at 
 
 So, if mmap(2) playing nicely with ZFS is important to you, probably the 
 best thing you can do to help that along is to provide data that will 
 help build the business case for spending engineering resource on the issue.

First of all, how significant is the double caching in terms of
performance?  If the effect is small, I won't worry about it anymore.

What sort of data do you need?  Would a list of software products that
utilize mmap(2) extensively and could benefit from ZFS be suitable?

As for a business case, we just had an extended and catastrophic
performance degradation that was the result of two ZFS bugs.  If we
have another one like that, our director is likely to instruct us to
throw away all our Solaris toys and convert to Microsoft products.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-04 Thread Gary Mills
On Sat, Jul 04, 2009 at 08:48:33AM +0100, Phil Harman wrote:
 ZFS doesn't mix well with mmap(2). This is because ZFS uses the ARC  
 instead of the Solaris page cache. But mmap() uses the latter. So if  
 anyone maps a file, ZFS has to keep the two caches in sync.

That's the first I've heard of this issue.  Our e-mail server runs
Cyrus IMAP with mailboxes on ZFS filesystems.  Cyrus uses mmap(2)
extensively.  I understand that Solaris has an excellent
implementation of mmap(2).  ZFS has many advantages, snapshots for
example, for mailbox storage.  Is there anything that we can be do to
optimize the two caches in this environment?  Will mmap(2) one day
play nicely with ZFS?

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files

2009-06-18 Thread Gary Mills
On Thu, Jun 18, 2009 at 12:12:16PM +0200, Cor Beumer - Storage Solution 
Architect wrote:
 
 What they noticed on the the X4500 systems, that when the zpool became 
 filled up for about 50-60% the performance of the system
 did drop enormously.
 They do claim this has to do with the fragmentation of the ZFS 
 filesystem. So we did try over there putting an S7410 system in with 
 about the same config on disks, 44x 1TB SATA BUT 4x 18GB WriteZilla (in 
 a stripe) we were able to get much and much more i/o's from the system 
 the the comparable X4500, however they did put it in production for a 
 couple of weeks, and as soon as the ZFS filesystem did come in the range 
 of about 50-60% filling the did see the same problem.

We had a similar problem with a T2000 and 2 TB of ZFS storage.  Once
the usage reached 1 TB, the write performance dropped considerably and
the CPU consumption increased.  Our problem was indirectly a result of
fragmentation, but it was solved by a ZFS patch.  I understand that
this patch, which fixes a whole bunch of ZFS bugs, should be released
soon.  I wonder if this was your problem.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What causes slow performance under load?

2009-05-13 Thread Gary Mills
On Mon, Apr 27, 2009 at 04:47:27PM -0500, Gary Mills wrote:
 On Sat, Apr 18, 2009 at 04:27:55PM -0500, Gary Mills wrote:
  We have an IMAP server with ZFS for mailbox storage that has recently
  become extremely slow on most weekday mornings and afternoons.  When
  one of these incidents happens, the number of processes increases, the
  load average increases, but ZFS I/O bandwidth decreases.  Users notice
  very slow response to IMAP requests.  On the server, even `ps' becomes
  slow.
 
 The cause turned out to be this ZFS bug:
 
 6596237: Stop looking and start ganging
 
 Apparently, the ZFS code was searching the free list looking for the
 perfect fit for each write.  With a fragmented pool, this search took
 a very long time, delaying the write.  Eventually, the requests arrived
 faster than writes could be sent to the devices, causing the server
 to be unresponsive.

We also had another problem, due to this ZFS bug:

6591646: Hang while trying to enter a txg while holding a txg open

This was a deadlock, with one thread blocking hundreds of other
threads.  Our symptom was that all zpool I/O would stop and the `ps'
command would hang.  A reboot was the only way out.

If you have a support contract, Sun will supply an IDR that fixes
both problems.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What causes slow performance under load?

2009-04-27 Thread Gary Mills
On Sat, Apr 18, 2009 at 04:27:55PM -0500, Gary Mills wrote:
 We have an IMAP server with ZFS for mailbox storage that has recently
 become extremely slow on most weekday mornings and afternoons.  When
 one of these incidents happens, the number of processes increases, the
 load average increases, but ZFS I/O bandwidth decreases.  Users notice
 very slow response to IMAP requests.  On the server, even `ps' becomes
 slow.

The cause turned out to be this ZFS bug:

6596237: Stop looking and start ganging

Apparently, the ZFS code was searching the free list looking for the
perfect fit for each write.  With a fragmented pool, this search took
a very long time, delaying the write.  Eventually, the requests arrived
faster than writes could be sent to the devices, causing the server
to be unresponsive.

There isn't a patch for this one yet, but Sun will supply an IDR if
you open a support case.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Peculiarities of COW over COW?

2009-04-26 Thread Gary Mills
On Sun, Apr 26, 2009 at 05:19:18PM -0400, Ellis, Mike wrote:

 As soon as you put those zfs blocks ontop of iscsi, the netapp won't
 have a clue as far as how to defrag those iscsi files from the
 filer's perspective.  (It might do some fancy stuff based on
 read/write patterns, but that's unlikely)

Since the LUN is just a large file on the Netapp, I assume that all
it can do is to put the blocks back into sequential order.  That might
have some benefit overall.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Peculiarities of COW over COW?

2009-04-26 Thread Gary Mills
On Sun, Apr 26, 2009 at 05:02:38PM -0500, Tim wrote:
 
On Sun, Apr 26, 2009 at 3:52 PM, Gary Mills mi...@cc.umanitoba.ca
wrote:

  We run our IMAP spool on ZFS that's derived from LUNs on a Netapp
  filer.  There's a great deal of churn in e-mail folders, with
  messages
  appearing and being deleted frequently.

  Should ZFS and the Netapp be using the same blocksize, so that they
  cooperate to some extent?
  
Just make sure ZFS is using a block size that is a multiple of 4k,
which I believe it does by default.

Okay, that's good.
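
For anyone else wanting to check their own pool, something like this
shows it; the 128K default is already a multiple of 4k:

  $ zfs get recordsize space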

I have to ask though... why not just serve NFS off the filer to the
Solaris box?  ZFS on a LUN served off a filer seems to make about as
much sense as sticking a ZFS based lun behind a v-filer (although the
latter might actually might make sense in a world where it were
supported *cough*neverhappen*cough* since you could buy the cheap
newegg disk).

I prefer NFS too, but the IMAP server requires POSIX semantics.
I believe that NFS doesn't support that, at least NFS version 3.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What is the 32 GB 2.5-Inch SATA Solid State Drive?

2009-04-25 Thread Gary Mills
On Fri, Apr 24, 2009 at 09:08:52PM -0700, Richard Elling wrote:
 Gary Mills wrote:
 Does anyone know about this device?
 
 SESX3Y11Z 32 GB 2.5-Inch SATA Solid State Drive with Marlin Bracket
 for Sun SPARC Enterprise T5120, T5220, T5140 and T5240 Servers, RoHS-6
 Compliant
 
 This is from Sun's catalog for the T5120 server.  Would this work well
 as a separate ZIL device for ZFS?  Is there any way I could use this in
 a T2000 server?  The brackets appear to be different.
 
 The brackets are different.  T2000 uses nemo bracket and T5120 uses
 marlin.  For the part-number details, SunSolve is your friend.
 http://sunsolve.sun.com/handbook_pub/validateUser.do?target=Systems/SE_T5120/components
 http://sunsolve.sun.com/handbook_pub/validateUser.do?target=Systems/SunFireT2000_R/components

I see also that no SSD is listed for the T2000.  Has anyone gotten one
to work as a separate ZIL device for ZFS?
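
If the bracket problem can be solved, I assume the ZFS side would just
be the usual slog addition, something like the line below, where the
device name is hypothetical and the pool must be at a version that
supports separate log devices:

  # zpool add space log c1t4d0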

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] What is the 32 GB 2.5-Inch SATA Solid State Drive?

2009-04-24 Thread Gary Mills
Does anyone know about this device?

SESX3Y11Z 32 GB 2.5-Inch SATA Solid State Drive with Marlin Bracket
for Sun SPARC Enterprise T5120, T5220, T5140 and T5240 Servers, RoHS-6
Compliant

This is from Sun's catalog for the T5120 server.  Would this work well
as a separate ZIL device for ZFS?  Is there any way I could use this in
a T2000 server?  The brackets appear to be different.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What causes slow performance under load?

2009-04-22 Thread Gary Mills
On Tue, Apr 21, 2009 at 04:09:03PM -0400, Oscar del Rio wrote:
 There's a similar thread on hied-emailad...@listserv.nd.edu
 that might help or at least can get you in touch with other University 
 admins in a similar situation.
 
 https://listserv.nd.edu/cgi-bin/wa?A1=ind0904&L=HIED-EMAILADMIN
 Thread: mail systems using ZFS filesystems?

Thanks.  Those problems do sound similar.  I also see positive
experiences with T2000 servers, ZFS, and Cyrus IMAP from UC Davis.

None of the people involved seem to be active on either the ZFS
mailing list or the Cyrus list.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What causes slow performance under load?

2009-04-21 Thread Gary Mills
On Tue, Apr 21, 2009 at 09:34:57AM -0500, Patrick Skerrett wrote:
 I'm fighting with an identical problem here & am very interested in this 
 thread.
 
 Solaris 10 127112-11 boxes running ZFS on a fiberchannel raid5 device 
 (hardware raid).

You are about a year behind in kernel patches.  There is one patch
that addresses similar problems.  I'd recommend installing all of
the new patches.  This bug seems to be relevant:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6535160

 Randomly one lun on a machine will stop writing for about 10-15 minutes 
 (during a busy time of day), and then all of a sudden become active with 
 a burst of activity. Reads will continue to happen.

One thing that seems to have solved our hang and stall problems is
to set `pg_contig_disable=1' in the kernel.  I believe that only
systems with Niagara CPUs are affected.  It has to do with kernel
code for handling two different sizes of memory pages.  You can find
more information here:

http://forums.sun.com/thread.jspa?threadID=5257060
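
The setting can also be tried on a live kernel before committing it to
/etc/system, with the usual caveats about poking the kernel with mdb
(syntax quoted from memory, so double-check it first):

  # echo 'pg_contig_disable/W 1' | mdb -kw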

Also, open a support case with Sun if you haven't already.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What causes slow performance under load?

2009-04-20 Thread Gary Mills
On Sat, Apr 18, 2009 at 04:27:55PM -0500, Gary Mills wrote:
 We have an IMAP server with ZFS for mailbox storage that has recently
 become extremely slow on most weekday mornings and afternoons.  When
 one of these incidents happens, the number of processes increases, the
 load average increases, but ZFS I/O bandwidth decreases.  Users notice
 very slow response to IMAP requests.  On the server, even `ps' becomes
 slow.

After I moved a couple of Cyrus databases from ZFS to UFS on Sunday
morning, the server seemed to run quite nicely.  One of these
databases is memory-mapped by all of the lmtpd and pop3d processes.
The other is opened by all the lmtpd processes.  Both were quite
active, with many small writes, so I assumed they'd be better on UFS.
All of the IMAP mailboxes were still on ZFS.

However, this morning, things went from bad to worse.  All writes to
the ZFS filesystems stopped completely.  Look at this:

$ zpool iostat 5 5
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
space   1.04T   975G 86 67  4.53M  2.57M
space   1.04T   975G  5  0   159K  0
space   1.04T   975G  7  0   337K  0
space   1.04T   975G  3  0   179K  0
space   1.04T   975G  4  0   167K  0

`fsstat' told me that there were both writes and memory-mapped I/O
to UFS, but nothing to ZFS.  At the same time, the `ps' command
would hang and could not be interrupted.  `truss' on `ps' looked
like this, but it eventually also stopped and could not be interrupted.

open(/proc/6359/psinfo, O_RDONLY) = 4
read(4, 02\0\0\0\0\0\001\0\018D7.., 416)  = 416
close(4)= 0
open(/proc/12782/psinfo, O_RDONLY)= 4
read(4, 02\0\0\0\0\0\001\0\0 1EE.., 416)  = 416
close(4)= 0

What could cause this sort of behavior?  It happened three times today!

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What causes slow performance under load?

2009-04-19 Thread Gary Mills
On Sat, Apr 18, 2009 at 09:41:39PM -0500, Tim wrote:
 
On Sat, Apr 18, 2009 at 9:01 PM, Gary Mills mi...@cc.umanitoba.ca
wrote:

  On Sat, Apr 18, 2009 at 06:53:30PM -0400, Ellis, Mike wrote:
   In case the writes are a problem: When zfs sends a sync-command
  to
   the iscsi luns, does the netapp just ack it, or does it wait till
  it
   fully destages? Might make sense to disable write/sync in
   /etc/system to be sure.
  So far I haven't been able to get an answer to that question from
  Netapp.  I'm assuming that it acks it as soon as it's in the
  Netapp's
  non-volatile write cache.
  
IIRC, it should just ack it.  What version of ONTAP are you running?

It seems to be this one:

  MODEL: FAS3020-R5
  SW VERSION:7.2.3

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What causes slow performance under load?

2009-04-19 Thread Gary Mills
On Sat, Apr 18, 2009 at 11:45:54PM -0500, Mike Gerdts wrote:
 [perf-discuss cc'd]
 
 On Sat, Apr 18, 2009 at 4:27 PM, Gary Mills mi...@cc.umanitoba.ca wrote:
  Many other layers are involved in this server.  We use scsi_vhci for
  redundant I/O paths and Sun's Iscsi initiator to connect to the
  storage on our Netapp filer.  The kernel plays a part as well.  How
  do we determine which layer is responsible for the slow performance?
 
 Have you disabled the nagle algorithm for the iscsi initiator?
 
 http://bugs.opensolaris.org/view_bug.do?bug_id=6772828

I tried that on our test IMAP backend the other day.  It made no
significant difference to read or write times or to ZFS I/O bandwidth.
I conclude that the Iscsi initiator has already sized its TCP packets
to avoid Nagle delays.
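
For anyone else who wants to run the same experiment, the system-wide
way to do it is roughly as below; I won't swear this is exactly what
the bug report recommends, and note that it affects every TCP
connection on the box, not just Iscsi:

  # ndd -get /dev/tcp tcp_naglim_def      (4095 is the default)
  # ndd -set /dev/tcp tcp_naglim_def 1    (1 effectively disables Nagle)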

 Also, you may want to consider doing backups from the NetApp rather
 than from the Solaris box.

I've certainly recommended finding a different way to perform backups.

 Assuming all of your LUNs are in the same
 volume on the filer, a snapshot should be a crash-consistent image of
 the zpool.  You could verify this by making the snapshot rw and trying
 to import the snapshotted LUNs on another host.

That part sounds scary!  The filer exports four LUNs that are combined
into one ZFS pool on the IMAP server.  These LUNs are volumes on the
filer.  How can we safely import them on another host?

 Anyway, this would
 remove the backup-related stress on the T2000.  You can still do
 snapshots at the ZFS layer to give you file level restores.  If the
 NetApp caught on fire, you would simply need to restore the volume
 containing the LUNs (presumably a small collection of large files)
 which would go a lot quicker than a large collection of small files.

Yes, a disaster recovery would be much quicker in this case.

 Since iSCSI is in the mix, you should also be sure that your network
 is appropriately tuned.  Assuming that you are using the onboard
 e1000g NICs, be sure that none of the bad counters are incrementing:
 
  $ kstat -p e1000g | nawk '$0 ~ /err|drop|fail|no/ && $NF != 0'
 
 If this gives any output, there is likely something amiss with your network.

Only this:
e1000g:0:e1000g0:unknowns   1764449

I don't know what those are, but it's e1000g1 and e1000g2 that are
used for the Iscsi network.

 The output from iostat -xCn 10 could be interesting as well.  If
 asvc_t is high (>30?), it means the filer is being slow to respond.
 If wsvc_t is frequently non-zero, there is some sort of a bottleneck
 that prevents the server from sending requests to the filer.  Perhaps
 you have tuned ssd_max_throttle or Solaris has backed off because the
 filer said to slow down.  (Assuming that ssd is used with iSCSI LUNs).

Here's an example, taken from one of the busy periods:

extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    5.0    0.0    7.7  0.0  0.1    4.1   24.8   1   1 c1t2d0
   27.0   13.8 1523.4  172.9  0.0  0.5    0.0   11.8   0  38 
c4t60A98000433469764E4A2D456A644A74d0
   42.0   21.4 2027.3  350.0  0.0  0.9    0.0   13.9   0  60 
c4t60A98000433469764E4A2D456A696579d0
   40.8   25.0 1993.5  339.1  0.0  0.8    0.0   11.8   0  52 
c4t60A98000433469764E4A476D2F664E4Fd0
   42.0   26.6 1968.4  319.1  0.0  0.8    0.0   11.8   0  56 
c4t60A98000433469764E4A476D2F6B385Ad0

The service times seem okay to me.  There's no `throttle' setting in
any of the relevant driver conf files.

 What else is happening on the filer when mail gets slow?  That is, are
 you experiencing slowness due to a mail peak or due to some research
 project that happens to be on the same spindles?  What does the
 network look like from the NetApp side?

Our Netapp guy tells me that the filer is operating normally when the
problem occurs.  The Iscsi network is less than 10% utilized.

 Are the mail server and the NetApp attached to the same switch, or are
 they at opposite ends of the campus?  Is there something between them
 that is misbehaving?

I don't think so.  We have dedicated ethernet ports on both the IMAP
server and the filer for Iscsi, along with a pair of dedicated switches.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] What causes slow performance under load?

2009-04-18 Thread Gary Mills
We have an IMAP server with ZFS for mailbox storage that has recently
become extremely slow on most weekday mornings and afternoons.  When
one of these incidents happens, the number of processes increases, the
load average increases, but ZFS I/O bandwidth decreases.  Users notice
very slow response to IMAP requests.  On the server, even `ps' becomes
slow.

We've tried a number of things, each of which made an improvement, but
the problem still occurs.  The ZFS ARC size was about 10 GB, but was
diminishing to 1 GB when the server was busy.  In fact, it was
unusable when that happened.  Upgrading memory from 16 GB to 64 GB
certainly made a difference.  The ARC size is always over 30 GB now.
Next, we limited the number of `lmtpd' (local delivery) processes to
64.  With those two changes, the server still became very slow at busy
times, but no longer became unresponsive.  The final change was to
disable ZFS prefetch.  It's not clear if this made an improvement.
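
For completeness, the prefetch change was made the usual way, roughly
the following in /etc/system followed by a reboot:

  * disable ZFS file-level prefetch
  set zfs:zfs_prefetch_disable=1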

The server is a T2000 running Solaris 10.  It's a Cyrus murder back-
end, essentially only an IMAP server.  We did recently upgrade the
front-end, from a 4-CPU SPARC box to a 16-core Intel box with more
memory, also running Solaris 10.  The front-end runs sendmail and
proxies IMAP and POP connections to the back-end, and also forwards
SMTP for local deliveries to the back-end, using LMTP.

Cyrus runs thousands of `imapd' processes, with many `pop3d', and
`lmtpd' processes as well.  This should be an ideal workload for a
Niagara box.  All of these memory-map several moderate-sized
databases, both Berkeley DB and skiplist types, and occasionally
update those databases.  Our EMC Networker client also often runs
during the day, doing backups.  All of the IMAP mailboxes reside on
six ZFS filesystems, using a single 2-TB pool.  It's only 51% occupied
at the moment.

Many other layers are involved in this server.  We use scsi_vhci for
redundant I/O paths and Sun's Iscsi initiator to connect to the
storage on our Netapp filer.  The kernel plays a part as well.  How
do we determine which layer is responsible for the slow performance?

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What causes slow performance under load?

2009-04-18 Thread Gary Mills
On Sat, Apr 18, 2009 at 05:25:17PM -0500, Bob Friesenhahn wrote:
 On Sat, 18 Apr 2009, Gary Mills wrote:
 
 How do we determine which layer is responsible for the slow 
 performance?
 
 If the ARC size is diminishing under heavy load then there must be 
 excessive pressure for memory from the kernel or applications.  A 30GB 
 ARC is quite large.  The slowdown likely increases the amount of RAM 
 needed since more simultaneous requests are taking place at once and 
 not completing as quickly as they should.  Once the problem starts, it 
 makes itself worse.

It was diminishing under load when the server had only 16 GB of
memory.  There certainly was pressure then, so much so that the server
became unresponsive.  Once we upgraded that to 64 GB, the ARC size
stayed high.  I gather then that there's no longer pressure for memory
by any of the components that might need it.

 It is good to make sure that the backup software is not the initial 
 cause of the cascade effect.

The backup is also very slow, often running for 24 hours.  Since it's
spending most of its time reading files, I assume that it must be
cycling a cache someplace.  I don't know if it's suffering from the
same performance problem or if it's interfering with the IMAP service.
Certainly, killing the backup doesn't seem to provide any relief.  I
don't like the idea of backups running in the daytime, but I get
overruled in that one.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What causes slow performance under load?

2009-04-18 Thread Gary Mills
On Sat, Apr 18, 2009 at 09:58:19PM -0400, Ellis, Mike wrote:
 I've found that (depending on the backup software) the backup agents
 tend to run a single thread per filesystem. While that can backup
 several filesystems concurrently, the single filesystem backup is
 single-threaded...

Yes, they do that.  There are two of them running right now, but
together they're only using 0.6% CPU.  They're sleeping most of
the time.

 I assume you're using zfs snapshots so you don't get fuzzy backups
 (over the 20-hour period...)

That's what I've been recommending.  We do have 14 daily snapshots
available.  I named them by Julian date, but our backups person
doesn't like them because the names keep changing.

 Can you take a snapshot, and then have your backup software instead
 of backing up 1 entire fs/tree backup a bunch of the high-level
 filesystems concurrently? That could make a big difference on
 something like a t2000.

Wouldn't there be one recent snapshot for each ZFS filesystem?  We've
certainly discussed backing up snapshots, but I wouldn't expect it to
be much different.  Wouldn't it still read all of the same files,
except for ones that were added after the snapshot was taken?

 (You're not by chance using any type of ssh-transfers etc as part of
 the backups are you)

No, Networker uses RPC to connect to the backup server, but there's no
encryption or compression on the client side.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What causes slow performance under load?

2009-04-18 Thread Gary Mills
On Sat, Apr 18, 2009 at 06:06:49PM -0700, Richard Elling wrote:
 [CC'ed to perf-discuss]
 
 Gary Mills wrote:
 We have an IMAP server with ZFS for mailbox storage that has recently
 become extremely slow on most weekday mornings and afternoons.  When
 one of these incidents happens, the number of processes increases, the
 load average increases, but ZFS I/O bandwidth decreases.  Users notice
 very slow response to IMAP requests.  On the server, even `ps' becomes
 slow.
 
 If memory is being stolen from the ARC, then the consumer must be outside
 of ZFS.  I think this is a case for a traditional performance assessment.

It was being stolen from the ARC, but once we added memory, that was
no longer the case.  ZFS is still one of the suspects.

 The server is a T2000 running Solaris 10.  It's a Cyrus murder back-
 end, essentially only an IMAP server.  We did recently upgrade the
 front-end, from a 4-CPU SPARC box to a 16-core Intel box with more
 memory, also running Solaris 10.  The front-end runs sendmail and
 proxies IMAP and POP connections to the back-end, and also forwards
 SMTP for local deliveries to the back-end, using LMTP.
 
 Cyrus runs thousands of `imapd' processes, with many `pop3d', and
 `lmtpd' processes as well.  This should be an ideal workload for a
 Niagara box.  All of these memory-map several moderate-sized
 databases, both Berkeley DB and skiplist types, and occasionally
 update those databases.  Our EMC Networker client also often runs
 during the day, doing backups.  All of the IMAP mailboxes reside on
 six ZFS filesystems, using a single 2-TB pool.  It's only 51% occupied
 at the moment.
 
 Many other layers are involved in this server.  We use scsi_vhci for
 redundant I/O paths and Sun's Iscsi initiator to connect to the
 storage on our Netapp filer.  The kernel plays a part as well.  How
 do we determine which layer is responsible for the slow performance?
 
 prstat is your friend.  Find out who is consuming the resources and work
 from there.

What resources are visible with prstat, other than CPU and memory?
Even at the busiest times, all of the processes only add up to about
6% of the CPU.  The load average does rise alarmingly.  Nothing is using
large amounts of memory, although with thousands of processes, it would
add up.
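
If microstate accounting would reveal more than the default prstat
view, I can run something like this during the next incident and
report back:

  # prstat -mL 5    (per-thread microstates: latency, locks, traps, etc.)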

 I've found that it often makes sense to create processor sets and segregate
 dissimilar apps into different processor sets. mpstat can then clearly show
 how each processor set consumes its processors.  IMAP workloads can
 be very tricky, because of the sort of I/O generated and because IMAP
 allows searching to be done on the server, rather than the client (eg POP)

What would I look for with mpstat?

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Any news on ZFS bug 6535172?

2009-04-13 Thread Gary Mills
On Mon, Apr 13, 2009 at 09:08:09AM +0530, Sanjeev wrote:
 
 How full is the pool ? 

Only 50%, but it started with two 500-gig LUNs initially.  We added
two more when it got up to 300 gigabytes.

  # zpool list
  NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
  space  1.99T  1.02T   992G    51%  ONLINE  -
  # zpool status
pool: space
   state: ONLINE
  status: The pool is formatted using an older on-disk format.  The pool can
  still be used, but some features are unavailable.
  action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
  pool will no longer be accessible on older software versions.
   scrub: none requested
  config:
  
  NAME                                       STATE     READ WRITE CKSUM
  space                                      ONLINE       0     0     0
    c4t60A98000433469764E4A2D456A644A74d0    ONLINE       0     0     0
    c4t60A98000433469764E4A2D456A696579d0    ONLINE       0     0     0
    c4t60A98000433469764E4A476D2F6B385Ad0    ONLINE       0     0     0
    c4t60A98000433469764E4A476D2F664E4Fd0    ONLINE       0     0     0
  
  errors: No known data errors

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Any news on ZFS bug 6535172?

2009-04-12 Thread Gary Mills
We're running a Cyrus IMAP server on a T2000 under Solaris 10 with
about 1 TB of mailboxes on ZFS filesystems.  Recently, when under
load, we've had incidents where IMAP operations became very slow.  The
general symptoms are that the number of imapd, pop3d, and lmtpd
processes increases, the CPU load average increases, but the ZFS I/O
bandwidth decreases.  At the same time, ZFS filesystem operations
become very slow.  A rewrite of a small file can take two minutes.

We've added memory; this was an improvement, but the incidents
continued.  The next step is to disable ZFS prefetch and test this
under load.  If that doesn't help either, we're down to ZFS bugs.

Our incidents seem similar to the ones at UC Davis:

http://vpiet.ucdavis.edu/docs/EmailReviewCmte.Report_Feb2008.pdf

These were attributed to bug 6535160, but this one is fixed on our
server with patch 127127-11.  Bug 6535172, ``zil_sync causing long
hold times on zl_lock'', doesn't have a patch yet:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6535172

Could this bug cause our problem?  How do I confirm that it does?
Is there a workaround?

Cyrus IMAP uses several moderate-sized databases that are
memory-mapped by all processes.  I can move these from ZFS to UFS if
this is likely to help.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Any news on ZFS bug 6535172?

2009-04-12 Thread Gary Mills
On Sun, Apr 12, 2009 at 10:49:49AM -0700, Richard Elling wrote:
 Gary Mills wrote:
 We're running a Cyrus IMAP server on a T2000 under Solaris 10 with
 about 1 TB of mailboxes on ZFS filesystems.  Recently, when under
 load, we've had incidents where IMAP operations became very slow.  The
 general symptoms are that the number of imapd, pop3d, and lmtpd
 processes increases, the CPU load average increases, but the ZFS I/O
 bandwidth decreases.  At the same time, ZFS filesystem operations
 become very slow.  A rewrite of a small file can take two minutes.
   
 
 Bandwidth is likely not the issue.  What does the latency to disk look like?

Yes, I have statistics!  This set was taken during an incident on
Thursday.  The load average was 12.  There were about 5700 Cyrus
processes running.  Here are the relevant portions of `iostat -xn 5 4':

extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   23.8   20.7 1195.0  677.8  0.0  1.0    0.0   22.2   0  37 
c4t60A98000433469764E4A2D456A644A74d0
   29.0   23.5 1438.3  626.8  0.0  1.3    0.0   25.4   0  44 
c4t60A98000433469764E4A2D456A696579d0
   22.8   26.6 1356.7  822.1  0.0  1.3    0.0   26.2   0  32 
c4t60A98000433469764E4A476D2F664E4Fd0
   26.4   27.3 1516.0  850.7  0.0  1.4    0.0   26.5   0  38 
c4t60A98000433469764E4A476D2F6B385Ad0
extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   39.7   27.0 1395.8  285.5  0.0  1.1    0.0   16.3   0  51 
c4t60A98000433469764E4A2D456A644A74d0
   52.5   29.8 1890.8  175.1  0.0  1.8    0.0   22.3   0  63 
c4t60A98000433469764E4A2D456A696579d0
   30.0   33.3 1940.2  432.8  0.0  1.2    0.0   19.4   0  34 
c4t60A98000433469764E4A476D2F664E4Fd0
   39.9   42.5 2062.1  616.7  0.0  1.9    0.0   22.9   0  50 
c4t60A98000433469764E4A476D2F6B385Ad0
extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   43.8   47.6 1691.5  504.8  0.0  1.6    0.0   17.3   0  59 
c4t60A98000433469764E4A2D456A644A74d0
   55.4   62.4 2027.8  517.0  0.0  2.2    0.0   18.5   0  72 
c4t60A98000433469764E4A2D456A696579d0
   18.6   76.8  682.3  843.5  0.0  1.1    0.0   12.0   0  34 
c4t60A98000433469764E4A476D2F664E4Fd0
   30.2  115.8  873.6  905.8  0.0  2.2    0.0   15.1   0  52 
c4t60A98000433469764E4A476D2F6B385Ad0
extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   49.8   21.8 2438.7  400.3  0.0  1.7    0.0   24.0   0  62 
c4t60A98000433469764E4A2D456A644A74d0
   53.2   34.0 2741.3  218.0  0.0  2.1    0.0   24.4   0  63 
c4t60A98000433469764E4A2D456A696579d0
   14.0   26.8  506.2  482.1  0.0  0.7    0.0   18.2   0  32 
c4t60A98000433469764E4A476D2F664E4Fd0
   23.4   38.8  484.5  582.3  0.0  1.1    0.0   18.2   0  42 
c4t60A98000433469764E4A476D2F6B385Ad0

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Any news on ZFS bug 6535172?

2009-04-12 Thread Gary Mills
On Sun, Apr 12, 2009 at 12:23:03PM -0700, Richard Elling wrote:
 These disks are pretty slow.  JBOD?  They are not 100% busy, which
 means that either the cached data is providing enough response to the
 apps, or the apps are not capable of producing enough load -- which
 means the bottleneck may be elsewhere.

They are four 500-gig Iscsi LUNs exported from a Netapp filer, with
Solaris multipathing.  Yes, the I/O is normally mostly writes, with
reads being satisfied from various caches.

 You can use fsstat to get a better idea of what sort of I/O the applications
 are seeing from the file system.  That might be revealing.

Thanks for the suggestion.  There are so many `*stat' commands that I
forget about some of them.  I've run a baseline with `fsstat', but the
server is mostly idle now.  I'll have to wait for another incident!
What option to `fsstat' do you recommend?  Here's a sample of the
default output:

$ fsstat  zfs 5 5
 new  name   name  attr  attr lookup rddir  read read  write write
 file remov  chng   get   setops   ops   ops bytes   ops bytes
3.56M 1.53M 3.83M 1.07G 1.53M  2.47G 4.09M 56.4M 1.83T 61.1M  306G zfs
   13 116 1.40K 5  11.6K 0 5 38.5K   125  127K zfs
   18 018 3.61K 6  21.1K 0 6 16.7K97  244K zfs
   26 425 1.73K10  6.76K 018  178K   142  817K zfs
   12 313 3.90K 5  9.00K 0 5 32.8K   108  287K zfs
7 2 7 1.98K 3  5.87K 0 7 67.5K   108 2.34M zfs
-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Any news on ZFS bug 6535172?

2009-04-12 Thread Gary Mills
On Sun, Apr 12, 2009 at 05:01:57PM -0400, Ellis, Mike wrote:
 Is the netapp iscsi-lun forcing a full sync as a part of zfs's
 5-second sync/flush type of thing? (Not needed since the netapp
 guarantees the write once it acks it)

I've asked that of our Netapp guy, but so far I haven't heard from
him.  Is there a way to determine this from the Iscsi initiator
side?  I do have a test mail server that I can play with.

 That could make a big difference...
 (Perhaps disabling the write-flush in zfs will make a big difference
 here, especially on a write-heavy system)

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Efficient backup of ZFS filesystems?

2009-04-10 Thread Gary Mills
On Thu, Apr 09, 2009 at 04:25:58PM +0200, Henk Langeveld wrote:
 Gary Mills wrote:
 I've been watching the ZFS ARC cache on our IMAP server while the
 backups are running, and also when user activity is high.  The two
 seem to conflict.  Fast response for users seems to depend on their
 data being in the cache when it's needed.  Most of the disk I/O seems
 to be writes in this situation.  However, the backup needs to stat
 all files and read many of them.  I'm assuming that all of this
 information is also added to the ARC cache, even though it may never
 be needed again.  It must also evict user data from the cache, causing
 it to be reloaded every time it's needed.
 
 Find out whether you have a problem first.  If not, don't worry, but
 read on.  If you do have a problem, add memory or an L2ARC device.

We do have a problem, but not with the backup itself.  The backup is
slow, but I expect that's just because it's reading a very large
number of small files.  Our problem is with normal IMAP operations
becoming quite slow at times.  I'm wondering if the backup is
contributing to this problem.

 The ARC was designed to mitigate the effect of any single burst of
 sequential I/O, but the size of the cache dedicated to more Frequently
 used pages (the current working set) will still be reduced, depending
 on the amount of activity on either side of the cache.

That's a nice design, better than a simple cache.

 As the ARC maintains a shadow list of recently evicted pages from both
 sides of the cache, such pages that are accessed again will then return
 to the 'Frequent' side of the cache.
 
 There will be continuous competition between 'Recent' and 'Frequent'
 sides of the ARC (and for convenience, I'm glossing over the existence
 of 'Locked' pages).
 
 Several reasons might cause pathological behaviour - a backup process
 might access the same metadata multiple times, causing that data to
 be promoted to 'Frequent', flushing out application related data.
 (ZFS does not differentiate between data and metadata for resource
  allocation, they all use the same I/O mechanism and cache.)

That might be possible in our case.

 On the other hand, you might just not have sufficient memory to keep
 most of your metadata in the cache, or the backup process is just too
 aggressive.   Adding memory or an L2cache might help.

We've added memory.  That did seem to help, although the problem's
still there.  I assume the L2cache is not available in Solaris 10.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Efficient backup of ZFS filesystems?

2009-04-06 Thread Gary Mills
I've been watching the ZFS ARC cache on our IMAP server while the
backups are running, and also when user activity is high.  The two
seem to conflict.  Fast response for users seems to depend on their
data being in the cache when it's needed.  Most of the disk I/O seems
to be writes in this situation.  However, the backup needs to stat
all files and read many of them.  I'm assuming that all of this
information is also added to the ARC cache, even though it may never
be needed again.  It must also evict user data from the cache, causing
it to be reloaded every time it's needed.

We use Networker for backups now.  Is there some way to configure ZFS
so that backups don't churn the cache?  Is there a different way to
perform backups to avoid this problem?  We do keep two weeks of daily
ZFS snapshots to use for restores of recently-lost data.  We still
need something for longer-term backups.
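
One alternative I've been considering, since the daily snapshots already
exist, is a stream-based backup that avoids the file-level walk entirely.
The dataset and snapshot names here are only illustrative:

   # zfs send tank/imap@2009-04-05 > /backup/imap-full
   # zfs send -i tank/imap@2009-04-05 tank/imap@2009-04-06 > /backup/imap-incr

An occasional full stream plus daily incrementals would avoid the stat()
of every mailbox file, at the cost of giving up file-level restores from
the long-term copies.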

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How to set a minimum ARC size?

2009-04-02 Thread Gary Mills
We have an IMAP server that uses ZFS filesystems for all of its
mailbox and database files.  As the number of users increases,
with a consequent increase in the number of processes, the ARC
size decreases from 10 gigabytes down to 2 gigabytes.  I know that
it's supposed to do that, but in this case ZFS is starved for memory
and the whole thing slows to a crawl.  Is there a way to set a
minimum ARC size so that this doesn't happen?

We are going to upgrade the memory, but a lower limit on ARC size
might still be a good idea.
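
The only knob I'm aware of is the zfs_arc_min tunable in /etc/system,
which takes effect after a reboot.  The value below is just an
illustration of a 4-gigabyte floor:

   set zfs:zfs_arc_min = 0x100000000

I'd be glad to hear if there's a better-supported way to do this.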

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs related google summer of code ideas - your vote

2009-03-04 Thread Gary Mills
On Tue, Mar 03, 2009 at 11:35:40PM +0200, C. Bergström wrote:
 
 Here's more or less what I've collected...
 
[..]
   10) Did I miss something..

I suppose my RFE for two-level ZFS should be included, unless nobody
intends to attach a ZFS file server to a SAN with ZFS on application
servers.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs related google summer of code ideas - your vote

2009-03-04 Thread Gary Mills
On Wed, Mar 04, 2009 at 01:20:42PM -0500, Miles Nordin wrote:
  gm == Gary Mills mi...@cc.umanitoba.ca writes:
 
 gm I suppose my RFE for two-level ZFS should be included,
 
 Not that my opinion counts for much, but I wasn't deaf to it---I did
 respond.

I appreciate that.

 I thought it was kind of based on mistaken understanding.  It included
 this strangeness of the upper ZFS ``informing'' the lower one when
 corruption had occured on the network, and the lower ZFS was supposed
 to do something with the physical disks...to resolve corruption on the
 network?  why?  IIRC several others pointed out the same bogosity.

It's simply a consequence of ZFS's end-to-end error detection.
There are many different components that could contribute to such
errors.  Since only the lower ZFS has data redundancy, only it can
correct the error.  Of course, if something in the data path
consistently corrupts the data regardless of its origin, it won't be
able to correct the error.  The same thing can happen in the simple
case, with one ZFS over physical disks.

 It makes slightly more sense in the write direction than the read
 direction maybe, but I still don't fully get the plan.  It is a new
 protocol to replace iSCSI?  or NFS?  or, what?  Is it a re-invention
 of pNFS or Lustre, but with more work since you're starting from zero,
 and less architectural foresight?

I deliberately did not specify the protocol to keep the concept
general.  Anything that works and solves the problem would be good.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs related google summer of code ideas - your vote

2009-03-04 Thread Gary Mills
On Wed, Mar 04, 2009 at 06:31:59PM -0700, Dave wrote:
 Gary Mills wrote:
 On Wed, Mar 04, 2009 at 01:20:42PM -0500, Miles Nordin wrote:
 gm == Gary Mills mi...@cc.umanitoba.ca writes:
 gm I suppose my RFE for two-level ZFS should be included,
 
 It's simply a consequence of ZFS's end-to-end error detection.
 There are many different components that could contribute to such
 errors.  Since only the lower ZFS has data redundancy, only it can
 correct the error.  Of course, if something in the data path
 consistently corrupts the data regardless of its origin, it won't be
 able to correct the error.  The same thing can happen in the simple
 case, with one ZFS over physical disks.
 
 I would argue against building this into ZFS. Any corruption happening 
 on the wire should not be the responsibility of ZFS. If you want to make 
 sure your data is not corrupted over the wire, use IPSec. If you want to 
 prevent corruption in RAM, use ECC sticks, etc.

But what if the `wire' is a SCSI bus?  Would you want ZFS to do error
correction in that case?  There are many possible wires.  Every
component does its own error checking of some sort, but in its own
domain.  This brings us back to end-to-end error checking again. Since
we are designing a filesystem, that's where the reliability should
reside.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE for two-level ZFS

2009-02-21 Thread Gary Mills
On Thu, Feb 19, 2009 at 12:36:22PM -0800, Brandon High wrote:
 On Thu, Feb 19, 2009 at 6:18 AM, Gary Mills mi...@cc.umanitoba.ca wrote:
  Should I file an RFE for this addition to ZFS?  The concept would be
  to run ZFS on a file server, exporting storage to an application
  server where ZFS also runs on top of that storage.  All storage
  management would take place on the file server, where the physical
  disks reside.  The application server would still perform end-to-end
  error checking but would notify the file server when it detected an
  error.
 
 You could accomplish most of this by creating a iSCSI volume on the
 storage server, then using ZFS with no redundancy on the application
 server.

That's what I'd like to do, and what we do now.  The RFE is to take
advantage of the end-to-end checksums in ZFS in spite of having no
redundancy on the application server.  Having all of the disk
management in one place is a great benefit.
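
For anyone following along, the configuration we run today looks roughly
like this; the names, sizes, and addresses are only placeholders.  On the
storage server, carve out a zvol and export it over Iscsi:

   # zfs create -V 500g tank/appvol
   # zfs set shareiscsi=on tank/appvol

On the application server, discover the target and build a non-redundant
pool on the imported LUN:

   # iscsiadm add discovery-address 192.168.1.10:3260
   # iscsiadm modify discovery --sendtargets enable
   # devfsadm -i iscsi
   # zpool create apppool c4t0d0

The checksums on apppool detect corruption end to end, but with a single
LUN there is nothing local to repair it from, which is exactly the gap
the RFE is about.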

 You'll have two layers for checksums, one on the storage server's
 zpool and a second on the application server's filesystem. The
 application server won't be able to notify the storage server that
 it's detected a bad checksum, other than through retries, but can
 write a user-space monitor that watches for ZFS checksum errors and
 sends notification to the storage server.

The RFE is to enable the two instances of ZFS to exchange information
about checksum failures.

 To poke a hole in your idea: What if the app server does find an
 error? What's the storage server to do at that point? Provided that
 the storage server's zpool already has redundancy, the data written to
 disk should already be exactly what was received from the client. If
 you want to have the ability to recover from erorrs on the app server,
 you should use a redundant zpool - Either a mirror or a raidz.

Yes, if the two instances of ZFS disagree, we have a problem that
needs to be resolved: they need to cooperate in this endeavour.

 If you're concerned about data corruption in transit, then it sounds
 like something akin to T10 DIF (which others mentioned) would fit the
 bill. You could also tunnel the traffic over a transit layer such as
 TLS or SSH that provides a measure of validation. Latency should be
 fun to deal with however.

I'm mainly concerned that ZFS on the application server will detect a
checksum error and then be unable to preserve the data.  Iscsi already
has TCP checksums.  I assume that FC-AL does as well.  Using more
reliable checksums has no benefit if ZFS will still detect end-to-end
checksum errors.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE for two-level ZFS

2009-02-20 Thread Gary Mills
On Thu, Feb 19, 2009 at 09:59:01AM -0800, Richard Elling wrote:
 Gary Mills wrote:
 Should I file an RFE for this addition to ZFS?  The concept would be
 to run ZFS on a file server, exporting storage to an application
 server where ZFS also runs on top of that storage.  All storage
 management would take place on the file server, where the physical
 disks reside.  The application server would still perform end-to-end
 error checking but would notify the file server when it detected an
 error.
 
 Currently, this is done as a retry. But retries can suffer from cached
 badness.

So, ZFS on the application server would retry the read from the
storage server.  This would be the same as it does from a physical
disk, I presume.  However, if the checksum failure persisted, it
would declare an error.  That's where the RFE comes in, because it
would then notify the file server to utilize its redundant data
source.  Perhaps this could be done as part of the retry, using
existing protocols.

 There are several advantages to this configuration.  One current
 recommendation is to export raw disks from the file server.  Some
 storage devices, including I assume Sun's 7000 series, are unable to
 do this.  Another is to build two RAID devices on the file server and
 to mirror them with ZFS on the application server.  This is also
 sub-optimal as it doubles the space requirement and still does not
 take full advantage of ZFS error checking.  Splitting the
 responsibilities works around these problems
 
 I'm not convinced, but here is how you can change my mind.
 
 1. Determine which faults you are trying to recover from.

I don't think this has been clearly identified, except that they are
``those faults that are only detected by end-to-end checksums''.

 2. Prioritize these faults based on their observability, impact,
 and rate.

Perhaps the project should be to extend end-to-end checksums in
situations that don't have end-to-end redundancy.  Redundancy at the
storage layer would be required, of course.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] RFE for two-level ZFS

2009-02-19 Thread Gary Mills
Should I file an RFE for this addition to ZFS?  The concept would be
to run ZFS on a file server, exporting storage to an application
server where ZFS also runs on top of that storage.  All storage
management would take place on the file server, where the physical
disks reside.  The application server would still perform end-to-end
error checking but would notify the file server when it detected an
error.

There are several advantages to this configuration.  One current
recommendation is to export raw disks from the file server.  Some
storage devices, including I assume Sun's 7000 series, are unable to
do this.  Another is to build two RAID devices on the file server and
to mirror them with ZFS on the application server.  This is also
sub-optimal as it doubles the space requirement and still does not
take full advantage of ZFS error checking.  Splitting the
responsibilities works around these problems.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread Gary Mills
On Thu, Feb 12, 2009 at 11:53:40AM -0500, Greg Palmer wrote:
 Ross wrote:
 I can also state with confidence that very, very few of the 100 staff 
 working here will even be aware that it's possible to unmount a USB volume 
 in windows.  They will all just pull the plug when their work is saved, 
 and since they all come to me when they have problems, I think I can 
 safely say that pulling USB devices really doesn't tend to corrupt 
 filesystems in Windows.  Everybody I know just waits for the light on the 
 device to go out.
   
 The key here is that Windows does not cache writes to the USB drive 
 unless you go in and specifically enable them. It caches reads but not 
 writes. If you enable them you will lose data if you pull the stick out 
 before all the data is written. This is the type of safety measure that 
 needs to be implemented in ZFS if it is to support the average user 
 instead of just the IT professionals.

That implies that ZFS will have to detect removable devices and treat
them differently than fixed devices.  It might have to be an option
that can be enabled for higher performance with reduced data security.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Two-level ZFS

2009-02-02 Thread Gary Mills
On Mon, Feb 02, 2009 at 09:53:15PM +0700, Fajar A. Nugraha wrote:
 On Mon, Feb 2, 2009 at 9:22 PM, Gary Mills mi...@cc.umanitoba.ca wrote:
  On Sun, Feb 01, 2009 at 11:44:14PM -0500, Jim Dunham wrote:
  If there are two (or more) instances of ZFS in the end-to-end data
  path, each instance is responsible for its own redundancy and error
  recovery. There is no in-band communication between one instance of
  ZFS and another instances of ZFS located elsewhere in the same end-to-
  end data path.
 
  I must have been unclear when I stated my question.  The
  configuration, with ZFS on both systems, redundancy only on the
  file server, and end-to-end error detection and correction, does
  not exist.
 
   What additions to ZFS are required to make this work?
 
 None. It's simply not possible.

You're talking about the existing ZFS implementation; I'm not!
Is ZFS now frozen in time, with only bugs being fixed?  I have
difficulty believing that.  Putting a wire between two layers
of ZFS should indeed be possible.  Think about the Amber Road
products, from the Fishworks team.  They run ZFS and export Iscsi
and FC-AL.  Redundancy and disk management are already present in
these products.  Should it be implemented again in each of the
servers that imports LUNs from these products?  I think not.

 I believe Jim already state that, but let me give some additional
 comment that might be helpful.
 
 (1) zfs can provide end-to-end protection ONLY if you use it end-to-end.
 This means :
 - no other filesystem on top of it (e.g. do not use UFS on zvol or
 something similar)
 - no RAID/MIRROR under it (i.e. it must have access to the disk as JBOD)

Exactly!  That leads to my question.  What information needs to be
exchanged between ZFS on the file server and ZFS on the application
server so that end-to-end protection can be maintained with redundancy
and disk management only on the file server?

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Two-level ZFS

2009-02-02 Thread Gary Mills
On Sun, Feb 01, 2009 at 11:44:14PM -0500, Jim Dunham wrote:
 I wrote:
 
 I realize that this configuration is not supported.
 
 The configuration is supported, but not in the manner mentioned below.
 
 If there are two (or more) instances of ZFS in the end-to-end data  
 path, each instance is responsible for its own redundancy and error  
 recovery. There is no in-band communication between one instance of  
 ZFS and another instances of ZFS located elsewhere in the same end-to- 
 end data path.

I must have been unclear when I stated my question.  The
configuration, with ZFS on both systems, redundancy only on the
file server, and end-to-end error detection and correction, does
not exist.  What additions to ZFS are required to make this work?

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Two-level ZFS

2009-02-01 Thread Gary Mills
I realize that this configuration is not supported.  What's required
to make it work?  Consider a file server running ZFS that exports a
volume with Iscsi.  Consider also an application server that imports
the LUN with Iscsi and runs a ZFS filesystem on that LUN.  All of the
redundancy and disk management takes place on the file server, but
end-to-end error detection takes place on the application server.
This is a reasonable configuration, is it not?

When the application server detects a checksum error, what information
does it have to return to the file server so that it can correct the
error?  The file server could then retry the read from its redundant
source, which might be a mirror or might be synthetic data from
RAID-5.  It might also indicate that a disk must be replaced.

Must any information accompany each block of data sent to the
application server so that the file server can identify the source
of the data in the event of an error?

Does this additional exchange of information fit into the Iscsi
protocol, or does it have to flow out of band somehow?

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] What are the usual suspects in data errors?

2009-01-14 Thread Gary Mills
I realize that any error can occur in a storage subsystem, but most
of these have an extremely low probability.  I'm interested in this
discussion in only those that do occur occasionally, and that are
not catastrophic.

Consider the common configuration of two SCSI disks connected to
the same HBA that are configured as a mirror in some manner.  In this
case, the data path in general consists of:

o The application
o The filesystem
o The drivers
o The HBA
o The SCSI bus
o The controllers
o The heads and platters

Many of those components have their own error checking.  Some have
error correction.  For example, parity checking is done on a SCSI bus,
unless it's specifically disabled.  Do SATA and PATA connections also
do error checking?  Disk sector I/O uses CRC error checking and
correction.  Memory buffers would often be protected by parity memory.
Is there any more that I've missed?

Now, let's consider common errors.  To me, the most frequent would
be a bit error on a disk sector.  In this case, the controller would
report a CRC error and would not return bad data.  The filesystem
would obtain the data from its redundant copy.  I assume that ZFS
would also rewrite the bad sector to correct it.  The application
would not see an error.  Similar events would happen for a parity
error on the SCSI bus.

What can go wrong with the disk controller?  A simple seek to the
wrong track is not a problem because the track number is encoded on
the platter.  The controller will simply recalibrate the mechanism and
retry the seek.  If it computes the wrong sector, that would be a
problem.  Does this happen with any frequency?  In this case, ZFS
would detect a checksum error and obtain the data from its redundant
copy.

A logic error in ZFS might result in incorrect metadata being written
with valid checksum.  In this case, ZFS might panic on import or might
correct the error.  How is this sort of error prevented?

If the application wrote bad data to the filesystem, none of the
error checking in lower layers would detect it.  This would be
strictly an error in the application.

Some errors might result from a loss of power if some ZFS data was
written to a disk cache but never was written to the disk platter.
Again, ZFS might panic on import or might correct the error.  How is
this sort of error prevented?

After all of this discussion, what other errors can ZFS checksums
reasonably detect?  Certainly if some of the other error checking
failed to detect an error, ZFS would still detect one.  How likely
are these other error checks to fail?

Is there anything else I've missed in this analysis?
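
As a practical footnote, the checksum counters are easy to exercise: a
scrub forces every block through the checksum path, and the CKSUM column
of `zpool status' shows what was caught.  The pool name is only an
example:

   # zpool scrub tank
   # zpool status -v tank

Whether those counters ever move is probably the best measure of how
often the lower-level error checks fail in practice.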

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] snapshot before patching..

2008-12-30 Thread Gary Mills
On Tue, Dec 30, 2008 at 02:06:16PM +0100, dick hoogendijk wrote:
 What kind of snapshot do I need to be on the safe side patching a S10u6
 system? rpool? rpool/ROOT? rpool/ROOT/BE?

Use Live Upgrade.  Create a new boot environment and apply the
patches to that.  Activate the new BE and `init 6'.

 And how/what do I do to reverse to the non-patched system in case
 something goes terribly wrong? ;-)

Just revert to the old BE.
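
A minimal sequence, assuming the patches have been unpacked under
/var/tmp/patches and `patched' is the name chosen for the new BE, would
be something like:

   # lucreate -n patched
   # luupgrade -t -n patched -s /var/tmp/patches <patch-id> ...
   # luactivate patched
   # init 6

To back out, `luactivate' the old BE by name and `init 6' again.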

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to create a basic new filesystem?

2008-12-20 Thread Gary Mills
On Sat, Dec 20, 2008 at 03:52:46AM -0800, Uwe Dippel wrote:
 This might sound sooo simple, but it isn't. I read the ZFS Administration 
 Guide and it did not give an answer; at least no simple answer, simple enough 
 for me to understand.
 The intention is to follow the thread Easiest way to replace a boot disk 
 with a larger one.
 The command given would be 
 zpool attach rpool /dev/dsk/c1d0s0 /dev/dsk/c2d0s0
 as far as I understand in my case. What it says is cannot open 
 '/dev/dsk/c2d0s0': No such device or address. format shows that the 
 partition exists:

The problem is that fdisk partitions are not the same as Solaris
partitions.  The admin guide refers to a Solaris partition.  For
Solaris 10 x86, this has to be created inside an fdisk partition.

 # format
 Searching for disks...done
 AVAILABLE DISK SELECTIONS:
0. c1d0 DEFAULT cyl 17020 alt 2 hd 255 sec 63
   /p...@0,0/pci-...@9/i...@0/c...@0,0
1. c2d0 DEFAULT cyl 10442 alt 2 hd 255 sec 126
   /p...@0,0/pci-...@9/i...@1/c...@0,0
 Specify disk (enter its number): 1
 selecting c2d0
 Controller working list found
 [disk formatted, defect list found]
 FORMAT MENU:
 [...]
  Total disk size is 38912 cylinders
  Cylinder size is 32130 (512 byte) blocks
 
Cylinders
   Partition   StatusType  Start   End   Length%
   =   ==  =   ===   ==   ===
   1 Linux native  019  20  0
   2 Solaris2 19  1046210444 27
   3 Other OS   10463  130742612  7
   4 EXT-DOS13075  3891225838 66

These are fdisk partitions.

 To my understanding, there is no need to format before using a file system in 
 ZFS.
 The Creating a Basic ZFS File System is not clear to me. The first (and 
 only) command it offers, creates a mirrored storage of a whole disk; none of 
 which I intend to do. (I suggested before, to offer a guide as well 
 containing all the *basic* commands.) I wonder if I really need to use 
 format-partition first to create slice s0 in that second (DOS)partition of 
 c2d0 before ZFS can use it?

The Solaris `format' command is used to create Solaris partitions, and
the label that describes them.  For a ZFS root pool, you have to use a
Solaris label, and a partition (slice).  This was slice 0 in your
example.
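
A rough outline of what has to happen before the attach will succeed,
using the device names from your example: run format on c2d0, use its
fdisk and partition menus to give slice 0 its space inside the Solaris2
partition and write the SMI label, and only then attach:

   # format c2d0
   # zpool attach rpool c1d0s0 c2d0s0
   # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2d0s0

The installgrub step is what makes the second disk bootable on x86; as
far as I know, `zpool attach' does not do that for you.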

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to create a basic new filesystem?

2008-12-20 Thread Gary Mills
On Sat, Dec 20, 2008 at 06:10:10AM -0800, Uwe Dippel wrote:
 
 thanks. All my servers run OpenBSD, so I know the difference between
 a DOS-partition and a slice. :)

My background is Solaris SPARC, where things are simpler.  Solaris
writes a label to a physical disk to define slices (Solaris
partitions) on the disk.  The `format' command sees the physical disk.
In the case of Solaris x86, this command sees one fdisk partition,
which it treats as a disk.  I generally create a single fdisk
partition that occupies the entire disk, to return to simplicity.

 My confusion is about the labels. I could not label it what I
 wanted, like zfsed or pool, it had to be root. And since we can have
 only a single bf-partition per drive (dsk), I was thinking ZFS would
 take the (existing but unlabeled) s0 to attach to. This does not
 seem to be the case.

The tag that appears on the partition menu isn't used in normal
operation of the system.  There are only a few valid choices, but
`root' is fine.

 Out of curiosity: how does it matter (to ZFS) if /dsk/c3t1d0s0 is a
 complete drive or exists in a bf-partition?
 
 One way or another, /dev/dsk/c2d0s0 seems to be over-defined now.

If you give `zpool' a complete disk, by omitting the slice part, it
will write its own label to the drive.  If you specify it with a
slice, it expects that you have already defined that slice.  For a
root pool, it has to be a slice.
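
To illustrate the difference, with device names that are only examples:

   # zpool create datapool c3t1d0        (whole disk: zpool writes an
                                          EFI label itself)
   # zpool attach rpool c1d0s0 c2d0s0    (slice: created with format
                                          first, under an SMI label)

Root pools are the second case because the boot loader cannot read an
EFI-labelled disk.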

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Split responsibility for data with ZFS

2008-12-12 Thread Gary Mills
On Thu, Dec 11, 2008 at 10:41:26PM -0600, Bob Friesenhahn wrote:
 On Thu, 11 Dec 2008, Gary Mills wrote:
 The split responsibility model is quite appealing.  I'd like to see
 ZFS address this model.  Is there not a way that ZFS could delegate
 responsibility for both error detection and correction to the storage
 device, at least one more sophisticated than a physical disk?
 
 Why is split responsibility appealing?  In almost any complex system 
 whether it be government or computing, split responsibility results in 
 indecision and confusion.  Heirarchical decision making based on 
 common rules is another matter entirely.

Now this becomes semantics.  There still has to be a hierarchy, but
it's split into areas of responsibility.  In the case of ZFS over SAN
storage, the area boundary now is the SAN cable.

 Unfortunately SAN equipment 
 is still based on technology developed in the early '80s and simply 
 tries to behave like a more reliable disk drive rather than a 
 participating intelligent component in a system which may detect, 
 tolerate, and spontaneously correct any faults.

That's exactly what I'm asking.  How can ZFS and SAN equipment be
improved so that they cooperate to make the whole system more
reliable?  Converting the SAN storage into a JBOD is not a valid
solution.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Split responsibility for data with ZFS

2008-12-11 Thread Gary Mills
On Wed, Dec 10, 2008 at 12:58:48PM -0800, Richard Elling wrote:
 Nicolas Williams wrote:
 On Wed, Dec 10, 2008 at 01:30:30PM -0600, Nicolas Williams wrote:
   
 On Wed, Dec 10, 2008 at 12:46:40PM -0600, Gary Mills wrote:
 
 On the server, a variety of filesystems can be created on this virtual
 disk.  UFS is most common, but ZFS has a number of advantages over
 UFS.  Two of these are dynamic space management and snapshots.  There
 are also a number of objections to employing ZFS in this manner.
   
 ZFS has very strong error detection built-in, and for mirrored and
 RAID-Zed pools can recover from errors automatically as long as there's
 a mirror left or enough disks in RAID-Z left to complete the recovery.
 
 Oh, but I get it: all the redundancy here would be in the SAN, and the
 ZFS pools would have no mirrors, no RAID-Z.
   
 Note that you'll generally be better off using RAID-Z than HW RAID-5.
 
 Precisely because ZFS can reconstruct the correct data if it's
 responsible for redundancy.
 
 But note that the setup you describe puts ZFS in no worse a situation
 than any other filesystem.
 
 Well, actually, it does.  ZFS is susceptible to a class of failure modes
 I classify as kill the canary types.  ZFS will detect errors and complain
 about them, which results in people blaming ZFS (the canary).  If you
 follow this forum, you'll see a kill the canary post about every month
 or so. 
 
 By default, ZFS implements the policy that uncorrectable, but important
 failures may cause it to do an armadillo impression (staying with the
 animal theme ;-) but for which some other file systems, like UFS, will
 blissfully ignore -- putting data at risk.  Occasionally, arguments will
 arise over whether this is the best default policy, though most folks
 seem to agree that data corruption is a bad thing.  Later versions of
 ZFS, particularly that available in Solaris 10 10/08 and all OpenSolaris
 releases, allow system admins to have better control over these policies.

Yes, that's what I was getting at.  Without redundancy at the ZFS
level, ZFS can report errors but not correct them.  Of course, with a
reliable SAN and storage device, those errors will never happen.
Certainly, vendors of these products will claim that they have
extremely high standards of data integrity.  Data corruption is the
worst nightmare of storage designers, after all.  It rarely happens,
although I have seen it on one occasion in a high-quality storage
device.

The split responsibility model is quite appealing.  I'd like to see
ZFS address this model.  Is there not a way that ZFS could delegate
responsibility for both error detection and correction to the storage
device, at least one more sophisticated than a physical disk?
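
One partial mitigation that exists today is the copies property, which
stores extra ditto blocks so that ZFS can repair at least some bad reads
even on a single SAN LUN.  The dataset name is just an example:

   # zfs set copies=2 sanpool/data

It roughly doubles the space used by that dataset and does nothing for
the loss of the whole LUN, but it does restore a measure of self-healing
without redundancy at the pool level.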

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Split responsibility for data with ZFS

2008-12-10 Thread Gary Mills
Large sites that have centralized their data with a SAN typically have
a storage device export block-oriented storage to a server, with a
fibre-channel or Iscsi connection between the two.  The server sees
this as a single virtual disk.  On the storage device, the blocks of
data may be spread across many physical disks.  The storage device
looks after redundancy and management of the physical disks.  It may
even phone home when a disk fails and needs to be replaced.  The
storage device provides reliability and integrity for the blocks of
data that it serves, and does this well.

On the server, a variety of filesystems can be created on this virtual
disk.  UFS is most common, but ZFS has a number of advantages over
UFS.  Two of these are dynamic space management and snapshots.  There
are also a number of objections to employing ZFS in this manner.
``ZFS cannot correct errors'', and ``you will lose all of your data''
are two of the alarming ones.  Isn't ZFS supposed to ensure that data
written to the disk are always correct?  What's the real problem here?

This is a split responsibility configuration where the storage device
is responsible for integrity of the storage and ZFS is responsible for
integrity of the filesystem.  How can it be made to behave in a
reliable manner?  Can ZFS be better than UFS in this configuration?
Is a different form of communication between the two components
necessary in this case?

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Separate /var

2008-12-02 Thread Gary Mills
On Mon, Dec 01, 2008 at 04:45:16PM -0700, Lori Alt wrote:
On 11/27/08 17:18, Gary Mills wrote:
 On Fri, Nov 28, 2008 at 11:19:14AM +1300, Ian Collins wrote:
 On Fri 28/11/08 10:53 , Gary Mills [EMAIL PROTECTED] sent:
 On Fri, Nov 28, 2008 at 07:39:43AM +1100, Edward Irvine wrote:
 
 I'm currently working with an organisation who
 want use ZFS for their   full zones. Storage is SAN attached, and they
 also want to create a   separate /var for each zone, which causes issues
 when the zone is   installed. They believe that a separate /var is
 still good practice.
 If your mount options are different for /var and /, you will need
 a separate filesystem.  In our case, we use `setuid=off' and
 `devices=off' on /var for security reasons.  We do the same thing
 for home directories and /tmp .
 
 For zones?
 
 Sure, if you require different mount options in the zones.
 
I looked into this and found that, using ufs, you can indeed set up
the zone's /var directory as a separate file system.  I don't know about
how LiveUpgrade works with that configuration (I didn't try it).
But I was at least able to get the zone to install and boot.
But with zfs, I couldn't even get a zone with a separate /var
dataset to install, let alone be manageable with LiveUpgrade.
I configured the zone like so:
# zonecfg -z z4
z4: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:z4 create
zonecfg:z4 set zonepath=/zfszones/z4
zonecfg:z4 add fs
zonecfg:z4:fs set dir=/var
zonecfg:z4:fs set special=rpool/ROOT/s10x_u6wos_07b/zfszones/z4/var
zonecfg:z4:fs set type=zfs
zonecfg:z4:fs end
zonecfg:z4 exit
I then get this result from trying to install the zone:
prancer# zoneadm -z z4 install
Preparing to install zone z4.
ERROR: No such file or directory: cannot mount /zfszones/z4/root/var

You might have to pre-create this filesystem. `special' may not be
needed at all.

in non-global zone to install: the source block device or directory
rpool/ROOT/s10x_u6wos_07b/zfszones/z1/var cannot be accessed
ERROR: cannot setup zone z4 inherited and configured file systems
ERROR: cannot setup zone z4 file systems inherited and configured
from the global zone
ERROR: cannot create zone boot environment z4
I don't fully  understand the failures here.  I suspect that there are
problems both in the zfs code and zones code.  It SHOULD work though.
The fact that it doesn't seems like a bug.
In the meantime, I guess we have to conclude that a separate /var
in a non-global zone is not supported on zfs.  A separate /var in
the global zone is supported  however, even when the root is zfs.

I haven't tried ZFS zone roots myself, but I do have a few comments.
ZFS filesystems are cheap because they don't require separate disk
slices.  As well, they are attribute boundaries.  Those are necessary
or convenient in some cases.
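
If anyone wants to experiment further, pre-creating the dataset named by
the zonecfg fs resource, with a legacy mountpoint so the zones framework
can mount it itself, would be the first thing I'd try.  This is untested
on my part:

   # zfs create -p -o mountpoint=legacy \
         rpool/ROOT/s10x_u6wos_07b/zfszones/z4/var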

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Separate /var

2008-11-27 Thread Gary Mills
On Fri, Nov 28, 2008 at 07:39:43AM +1100, Edward Irvine wrote:
 
 I'm currently working with an organisation who want use ZFS for their  
 full zones. Storage is SAN attached, and they also want to create a  
 separate /var for each zone, which causes issues when the zone is  
 installed. They believe that a separate /var is still good practice.

If your mount options are different for /var and /, you will need
a separate filesystem.  In our case, we use `setuid=off' and
`devices=off' on /var for security reasons.  We do the same thing
for home directories and /tmp .
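
Creating the filesystems with the options up front keeps things tidy;
the dataset names here just follow our local layout:

   # zfs create -o setuid=off -o devices=off tank/var
   # zfs create -o setuid=off -o devices=off tank/home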

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Separate /var

2008-11-27 Thread Gary Mills
On Fri, Nov 28, 2008 at 11:19:14AM +1300, Ian Collins wrote:
 On Fri 28/11/08 10:53 , Gary Mills [EMAIL PROTECTED] sent:
  On Fri, Nov 28, 2008 at 07:39:43AM +1100, Edward Irvine wrote:
   
   I'm currently working with an organisation who
  want use ZFS for their   full zones. Storage is SAN attached, and they
  also want to create a   separate /var for each zone, which causes issues
  when the zone is   installed. They believe that a separate /var is
  still good practice.
  If your mount options are different for /var and /, you will need
  a separate filesystem.  In our case, we use `setuid=off' and
  `devices=off' on /var for security reasons.  We do the same thing
  for home directories and /tmp .
  
 For zones?

Sure, if you require different mount options in the zones.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Fwd: [osol-announce] IMPT: Do not use SXCE Build 102

2008-11-17 Thread Gary Mills
On Mon, Nov 17, 2008 at 07:27:50AM +0200, Johan Hartzenberg wrote:
 
Thank you for the details.  A few more questions:  Does booting into
build 102 require a `zpool online' on the root pool? And the above disable -t
is temporary till the next reboot - any specific reason for doing it
that way?  And last question:  What do I lose when I disable
sysevent?

I can answer that last one, since I tried it.  When you reboot without
sysevent, you will find that the console login service cannot run.
You'll wind up in single-user mode.  You can enable sysevent at that
point and reboot again.  Disabling it with `-t' after the system's up
seems to do no harm.
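
For the record, the commands involved are simply these; the :default
instance name is the usual one:

   # svcadm disable -t svc:/system/sysevent:default
   # svcadm enable svc:/system/sysevent:default

The `-t' form does not persist across a reboot, which is why the
permanent disable was the one that produced the single-user surprise.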

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is there a baby thumper?

2008-11-05 Thread Gary Mills
On Tue, Nov 04, 2008 at 05:48:26PM -0600, Tim wrote:
 
Well, what's the end goal?  What are you testing for that you need
from the thumper?
I/O interfaces?  CPU?  Chipset?  If you need *everything* you don't
have any other choice.

I suppose that something with SATA disks and the same disk controller
would be most suitable.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Is there a baby thumper?

2008-11-04 Thread Gary Mills
One of our storage guys would like to put a thumper into service, but
he's looking for a smaller model to use for testing.  Is there something
that has the same CPU, disks, and disk controller as a thumper, but
fewer disks?  The ones I've seen all have 48 disks.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is there a baby thumper?

2008-11-04 Thread Gary Mills
On Tue, Nov 04, 2008 at 03:31:16PM -0700, Carl Wimmi wrote:
 
 There isn't a de-populated version.
 
 Would X4540 with 250 or 500 GB drives meet your needs?

That might be our only choice.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

