Re: [zfs-discuss] question about COW and snapshots

2011-06-16 Thread Simon Walter

On 06/16/2011 09:09 AM, Erik Trimble wrote:
We had a similar discussion a couple of years ago here, under the 
title A Versioning FS. Look through the archives for the full 
discussion.


The gist is that application-level versioning (and consistency) is 
completely orthogonal to filesystem-level snapshots and consistency.  
IMHO, they should never be mixed together - there are way too many 
corner cases and application-specific memes for a filesystem to ever 
fully handle file-level versioning and *application*-level data 
consistency.  Don't mistake one for the other, and, don't try to *use* 
one for the other.  They're completely different creatures.




I guess that is true of the current FSs available. Though it would be 
nice to essentially have a versioning FS in the kernel rather than an 
application in userspace. But I digress. I'll use SVN and WebDAV.


Thanks for the advice everyone.


Re: [zfs-discuss] question about COW and snapshots

2011-06-16 Thread Erik Trimble

On 6/16/2011 12:09 AM, Simon Walter wrote:

On 06/16/2011 09:09 AM, Erik Trimble wrote:
We had a similar discussion a couple of years ago here, under the 
title A Versioning FS. Look through the archives for the full 
discussion.


The gist is that application-level versioning (and consistency) is 
completely orthogonal to filesystem-level snapshots and consistency.  
IMHO, they should never be mixed together - there are way too many 
corner cases and application-specific memes for a filesystem to ever 
fully handle file-level versioning and *application*-level data 
consistency.  Don't mistake one for the other, and, don't try to 
*use* one for the other.  They're completely different creatures.




I guess that is true of the current FSs available. Though it would be 
nice to essentially have a versioning FS in the kernel rather than an 
application in userspace. But I digress. I'll use SVN and WebDAV.


Thanks for the advice everyone.

It's not really a technical problem, it's a knowledge locality problem.
The *knowledge* of where to checkmark, where to version, and what data
consistency means is held at the application level, and can ONLY be
known by each individual application. There's no way a filesystem (or
anything like that) can make the proper decisions without the
application telling it what those decisions should be. So what would be
the point of a smart versioning FS? The intelligence can't be built
into the FS; it would still have to be built into each and every
application.


So, if your apps have to be programmed to be
versioning/consistency/checkmarking aware in any case, how would having
a fancy Versioning filesystem be any better than using what we do now
(i.e. svn/hg/cvs/git on top of ZFS/btrfs/et al.)?  ZFS at least makes
significant practical advances by rolling the logical volume manager
into the filesystem level, but I can't see any such advantage for a
Versioning FS.


--
Erik Trimble
Java Platform Group Infrastructure
Mailstop:  usca22-317
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (UTC-0800)



Re: [zfs-discuss] # disks per vdev

2011-06-16 Thread Lanky Doodle
Thanks guys.

I have decided to bite the bullet and change to 2TB disks now rather than go 
through all the effort using 1TB disks and then maybe changing in 6-12 months 
time or whatever. The price difference between 1TB and 2TB disks is marginal 
and I can always re-sell my 6x 1TB disks.

I think I have also narrowed down the raid config to these three:

2x 7 disk raid-z2 with 1 hot spare - 20TB usable
3x 5 disk raid-z2 with 0 hot spare - 18TB usable
2x 6 disk raid-z2 with 2 hot spares - 16TB usable

with option 1 probably being preferred at the moment.
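For reference, option 1 laid out as a single pool would look roughly like the
following (the disk names below are made up; substitute whatever your
controller actually reports):

  zpool create tank \
    raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 \
    raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 \
    spare  c2t7d0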

I am aware that bad batches of disks do exist, so I tend to either a) buy them 
in sets from different suppliers or b) use different manufacturers. How 
sensitive is ZFS to different disks, in terms of disk features (NCQ, RPM speed, 
firmware/software versions, cache, etc.)?

Thanks


Re: [zfs-discuss] zfs global hot spares?

2011-06-16 Thread Fred Liu
 This message is from the disk saying that it aborted a command. These
 are
 usually preceded by a reset, as shown here. What caused the reset
 condition?
 Was it actually target 11 or did target 11 get caught up in the reset
 storm?
 

It happened in the middle of the night and nobody touched the file box.
I assume this is the transitional state before the disk is *thoroughly* damaged:

Jun 10 09:34:11 cn03 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Jun 10 09:34:11 cn03 EVENT-TIME: Fri Jun 10 09:34:11 CST 2011
Jun 10 09:34:11 cn03 PLATFORM: X8DTH-i-6-iF-6F, CSN: 1234567890, HOSTNAME: cn03
Jun 10 09:34:11 cn03 SOURCE: zfs-diagnosis, REV: 1.0
Jun 10 09:34:11 cn03 EVENT-ID: 4f4bfc2c-f653-ed20-ab13-eef72224af5e
Jun 10 09:34:11 cn03 DESC: The number of I/O errors associated with a ZFS device exceeded
Jun 10 09:34:11 cn03 acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD for more information.
Jun 10 09:34:11 cn03 AUTO-RESPONSE: The device has been offlined and marked as faulted.  An attempt
Jun 10 09:34:11 cn03 will be made to activate a hot spare if available.
Jun 10 09:34:11 cn03 IMPACT: Fault tolerance of the pool may be compromised.
Jun 10 09:34:11 cn03 REC-ACTION: Run 'zpool status -x' and replace the bad device.

After I rebooted it, I got:
Jun 10 11:38:49 cn03 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.11 Version snv_134 64-bit
Jun 10 11:38:49 cn03 genunix: [ID 683174 kern.notice] Copyright 1983-2010 Sun Microsystems, Inc.  All rights reserved.
Jun 10 11:38:49 cn03 Use is subject to license terms.
Jun 10 11:38:49 cn03 unix: [ID 126719 kern.info] features: 7f7f<sse4_2,sse4_1,ssse3,cpuid,mwait,tscp,cmp,cx16,sse3,nx,asysc,htt,sse2,sse,sep,pat,cx8,pae,mca,mmx,cmov,de,pge,mtrr,msr,tsc,lgpg>

Jun 10 11:39:06 cn03 scsi: [ID 365881 kern.info] /pci@0,0/pci8086,3410@9/pci1000,72@0 (mpt_sas0):
Jun 10 11:39:06 cn03    mptsas0 unrecognized capability 0x3

Jun 10 11:39:42 cn03 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g5000c50009723937 (sd3):
Jun 10 11:39:42 cn03    drive offline
Jun 10 11:39:47 cn03 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g5000c50009723937 (sd3):
Jun 10 11:39:47 cn03    drive offline
Jun 10 11:39:52 cn03 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g5000c50009723937 (sd3):
Jun 10 11:39:52 cn03    drive offline
Jun 10 11:39:57 cn03 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g5000c50009723937 (sd3):
Jun 10 11:39:57 cn03    drive offline


 
 Hot spare will not help you here. The problem is not constrained to one
 disk.
 In fact, a hot spare may be the worst thing here because it can kick in
 for the disk
 complaining about a clogged expander or spurious resets.  This causes a
 resilver
 that reads from the actual broken disk, that causes more resets, that
 kicks out another
 disk that causes a resilver, and so on.
  -- richard
 

So warm spares could be a better choice in this situation?
BTW, under what conditions does a SCSI reset storm happen?
How can we be immune to this, so as not to interrupt the file service?
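(My understanding of the warm-spare approach: keep the disk installed but not
configured as a spare, and only replace by hand once a human has confirmed
which disk is really bad - roughly, with made-up pool/device names:

  zpool status -x                      # confirm which device is actually faulted
  zpool replace tank c1t11d0 c1t12d0   # resilver onto the warm spare by hand
)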


Thanks.
Fred




Re: [zfs-discuss] # disks per vdev

2011-06-16 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Lanky Doodle
 
 can you have one vdev that is a duplicate of another
 vdev? By that I mean say you had 2x 7 disk raid-z2 vdevs, instead of them
 both being used in one large pool could you have one that is a backup of
the
 other, allowing you to destroy one of them and re-build without data loss?

Well, you can't make a vdev from other vdevs, so you can't make a mirror of
raidz, if that's what you were hoping.

As Cindy mentioned, you can split mirrors...

Or you could use zfs send | zfs receive, to sync one pool to another pool.
This does not care whether the architectures of the two pools are the same
(the second pool could have different or nonexistent redundancy), but it will
be based on snapshots.
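Roughly (pool names made up; an incremental send can follow the first full
copy):

  zfs snapshot -r tank@backup1
  zfs send -R tank@backup1 | zfs receive -Fd backup

  # later, send only what changed since backup1
  zfs snapshot -r tank@backup2
  zfs send -R -i backup1 tank@backup2 | zfs receive -Fd backup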



Re: [zfs-discuss] question about COW and snapshots

2011-06-16 Thread Toby Thain
On 16/06/11 3:09 AM, Simon Walter wrote:
 On 06/16/2011 09:09 AM, Erik Trimble wrote:
 We had a similar discussion a couple of years ago here, under the
 title A Versioning FS. Look through the archives for the full
 discussion.

 The gist is that application-level versioning (and consistency) is
 completely orthogonal to filesystem-level snapshots and consistency. 
 IMHO, they should never be mixed together - there are way too many
 corner cases and application-specific memes for a filesystem to ever
 fully handle file-level versioning and *application*-level data
 consistency.  Don't mistake one for the other, and, don't try to *use*
 one for the other.  They're completely different creatures.

 
 I guess that is true of the current FSs available. Though it would be
 nice to essentially have a versioning FS in the kernel rather than an
 application in userspace. But I digress. I'll use SVN and WebDAV.


To use Svn correctly here, you have to resolve the same issue. Svn has a
global revision, just as a snapshot is a state for an *entire*
filesystem. You don't seem to have taken that into sufficient account
when talking about ZFS; it doesn't align with your goal of consistency
from the point of view of a *single document*.

You'll only be able to make a useful snapshot in Svn at moments when
*all* documents in the repository are in a consistent state (I'm
assuming this is a multi-user system). That's a much stronger guarantee
than you probably 'require' for your purpose, so it makes me wonder
whether what you really want is a document database (or, to be honest,
an ordinary filesystem; you can snapshot single documents in an
ordinary filesystem using say hard links) where the state of each
session/document is *independent*. You can see that the latter model is
much more like Google Docs, not to mention simpler; and the snapshot
model is not like it at all.

--Toby

 
 Thanks for the advice everyone.
 



[zfs-discuss] Server with 4 drives, how to configure ZFS?

2011-06-16 Thread Nomen Nescio
Has there been any change to the server hardware with respect to number of
drives since ZFS has come out? Many of the servers around still have an even
number of drives (2, 4) etc. and it seems far from optimal from a ZFS
standpoint. All you can do is make one or two mirrors, or a 3 way mirror and
a spare, right? Wouldn't it make sense to ship with an odd number of drives
so you could at least use RAIDZ? Or stop making provision for anything except
one or two drives, or no drives at all, and require CD or netbooting and just
expect everybody to be using NAS boxes? I am just a home server user, what
do you guys who work on commercial accounts think? How are people using
these servers?


Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?

2011-06-16 Thread Jim Klimov

As recently discussed on this list, after all ZFS does not care
very much about the number of drives in a raidzN set, so optimization
is not about stripe alignment and such, but about the number of spindles,
resilver times, number of redundancy disks, etc.

In my setups with 4 identical drives in a server I typically make
a 10-20GB rpool as a mirror of slices on a couple of the drives, a
same-sized pool for swap on the other couple of drives, and
this leaves me with 4 identically sized slices for a separate
data pool. Depending on requirements we can do any layout:
performance (raid10) vs. reliability (raidz2) vs. space (raidz1).
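For example, the data pool on the four leftover slices can be built in any of
those shapes (the slice names below are only illustrative):

  zpool create data mirror c0t0d0s3 c0t1d0s3 mirror c0t2d0s3 c0t3d0s3   # raid10-style
  zpool create data raidz2 c0t0d0s3 c0t1d0s3 c0t2d0s3 c0t3d0s3          # more reliable
  zpool create data raidz  c0t0d0s3 c0t1d0s3 c0t2d0s3 c0t3d0s3          # more space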

HTH,
//Jim


2011-06-16 0:33, Nomen Nescio wrote:

Has there been any change to the server hardware with respect to number of
drives since ZFS has come out? Many of the servers around still have an even
number of drives (2, 4) etc. and it seems far from optimal from a ZFS
standpoint. All you can do is make one or two mirrors, or a 3 way mirror and
a spare, right? Wouldn't it make sense to ship with an odd number of drives
so you could at least RAIDZ? Or stop making provision for anything except 1
or two drives or no drives at all and require CD or netbooting and just
expect everybody to be using NAS boxes? I am just a home server user, what
do you guys who work on commercial accounts think? How are people using
these servers?




Re: [zfs-discuss] question about COW and snapshots

2011-06-16 Thread Frank Van Damme
Op 15-06-11 05:56, Richard Elling schreef:
 You can even have applications like databases make snapshots when
 they want.

Makes me think of a backup utility called mylvmbackup, which is written
with Linux in mind - basically it locks the MySQL tables, takes an LVM
snapshot and releases the lock (and then you back up the database files
from the snapshot). It should work at least as well with ZFS.
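With ZFS the same trick would look something like the sketch below (the
dataset name is made up, and this assumes the mysql client's SYSTEM command is
available in this mode, so the snapshot is taken while the lock is still held):

  # hold the read lock for the whole mysql session; snapshot while it is held
  { echo "FLUSH TABLES WITH READ LOCK;"
    echo "SYSTEM zfs snapshot tank/mysql@backup-today"
    echo "UNLOCK TABLES;"
  } | mysql -u root -p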

-- 
No part of this copyright message may be reproduced, read or seen,
dead or alive or by any means, including but not limited to telepathy
without the benevolence of the author.


Re: [zfs-discuss] question about COW and snapshots

2011-06-16 Thread Frank Van Damme
Op 15-06-11 14:30, Simon Walter schreef:
 Anyone know how Google Docs does it?

Anyone from Google on the list? :-)

Seriously, this is the kind of feature to be found in Serious CMS
applications, like, as already mentioned, Alfresco.

-- 
No part of this copyright message may be reproduced, read or seen,
dead or alive or by any means, including but not limited to telepathy
without the benevolence of the author.


Re: [zfs-discuss] question about COW and snapshots

2011-06-16 Thread Casper . Dik

Op 15-06-11 05:56, Richard Elling schreef:
 You can even have applications like databases make snapshots when
 they want.

Makes me think of a backup utility called mylvmbackup, which is written
with Linux in mind - basically it locks mysql tables, takes an LVM
snapshot and releases the lock (and then you backup the database files
from the snapshot). Should work at least as well with ZFS.

If a database engine or another application keeps both the data and the
log in the same filesystem, a snapshot wouldn't create inconsistent data.
(I think this would be true for vim and a large number of database 
engines; vim will detect the swap file, and a database should be able to 
detect the inconsistency, roll back, and re-apply the log file.)

Casper



Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?

2011-06-16 Thread Bob Friesenhahn

On Wed, 15 Jun 2011, Nomen Nescio wrote:


Has there been any change to the server hardware with respect to number of
drives since ZFS has come out? Many of the servers around still have an even
number of drives (2, 4) etc. and it seems far from optimal from a ZFS
standpoint. All you can do is make one or two mirrors, or a 3 way mirror and
a spare, right? Wouldn't it make sense to ship with an odd number of drives
so you could at least RAIDZ? Or stop making provision for anything except 1


Yes, it all seems pretty silly.  Using a small dedicated boot drive 
(maybe an SSD or Compact Flash) would make sense so that the main 
disks can all be used in one pool.  FreeBSD apparently supports 
booting from raidz so it would allow booting from a four-disk raidz 
pool.  Unfortunately, Solaris does not support that.


Given a fixed number of drive bays, there may be value to keeping one 
drive bay completely unused (hot/cold spare, or empty).  The reason 
for this is that it allows you to insert new drives in order to 
upgrade the drives in your pool, or handle the case of a broken drive 
bay.  Without the ability to insert a new drive, you need to 
compromise the safety of your pool in order to replace a drive or 
upgrade the drives to a larger size.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?

2011-06-16 Thread Marty Scholes
 Has there been any change to the server hardware with
 respect to number of
 drives since ZFS has come out? Many of the servers
 around still have an even
 number of drives (2, 4) etc. and it seems far from
 optimal from a ZFS
 standpoint. All you can do is make one or two
 mirrors, or a 3 way mirror and
 a spare, right? 

With four drives you could also make a RAIDZ3 set, allowing you to have the 
lowest usable space, poorest performance and worst resilver times possible.

Sorry, couldn't resist.


Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?

2011-06-16 Thread Eric Sproul
On Wed, Jun 15, 2011 at 4:33 PM, Nomen Nescio nob...@dizum.com wrote:
 Has there been any change to the server hardware with respect to number of
 drives since ZFS has come out? Many of the servers around still have an even
 number of drives (2, 4) etc. and it seems far from optimal from a ZFS
 standpoint.

With enterprise-level 2.5" drives hitting 1TB, I've decided to buy
only 2.5"-based chassis, which typically provide 6-8 bays in a 1U form
factor.  That's more than enough to build an rpool mirror and a
raidz1+spare, raidz2, or 3x-mirror pool for data.  Having 8 bays is
also a nice fit for the typical 8-port SAS HBA.

Eric


Re: [zfs-discuss] # disks per vdev

2011-06-16 Thread Roy Sigurd Karlsbakk
 I have decided to bite the bullet and change to 2TB disks now rather
 than go through all the effort using 1TB disks and then maybe changing
 in 6-12 months time or whatever. The price difference between 1TB and
 2TB disks is marginal and I can always re-sell my 6x 1TB disks.
 
 I think I have also narrowed down the raid config to these 4;
 
 2x 7 disk raid-z2 with 1 hot spare - 20TB usable
 3x 5 disk raid-z2 with 0 hot spare - 18TB usable
 2x 6 disk raid-z2 with 2 hot spares - 16TB usable
 
 with option 1 probably being preferred at the moment.

I would choose option 1. I have similar configurations in production. A hot 
spare can be very good when a drive dies while you're not watching.

 I am aware that bad batches of disks do exist so I tend to either a)
 buy them in sets from different suppliers or b) use different
 manufacturers. How sensitive to different disks is ZFS, in terms of
 disk features (NCQ, RPM speed, firmware/software versions, cache etc).

For a home server, it shouldn't make much difference - the network is likely to 
be the bottleneck anyway. If you mix drives with different spin rates in a 
pool/vdev, the slower ones will probably pull down performance, so if you're 
considering green drives, you should use them for all the drives. Mixing 
Seagate, Samsung and Western Digital drives should work well for this.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly.
It is an elementary imperative for all educators to avoid excessive use of
idioms of foreign origin. In most cases adequate and relevant synonyms exist
in Norwegian.


Re: [zfs-discuss] # disks per vdev

2011-06-16 Thread Richard Elling
On Jun 16, 2011, at 2:07 AM, Lanky Doodle wrote:

 Thanks guys.
 
 I have decided to bite the bullet and change to 2TB disks now rather than go 
 through all the effort using 1TB disks and then maybe changing in 6-12 months 
 time or whatever. The price difference between 1TB and 2TB disks is marginal 
 and I can always re-sell my 6x 1TB disks.
 
 I think I have also narrowed down the raid config to these 4;
 
 2x 7 disk raid-z2 with 1 hot spare - 20TB usable
 3x 5 disk raid-z2 with 0 hot spare - 18TB usable
 2x 6 disk raid-z2 with 2 hot spares - 16TB usable
 
 with option 1 probably being preferred at the moment.

Sounds good to me.

 
 I am aware that bad batches of disks do exist so I tend to either a) buy them 
 in sets from different suppliers or b) use different manufacturers. How 
 sensitive to different disks is ZFS, in terms of disk features (NCQ, RPM 
 speed, firmware/software versions, cache etc).

Actually, ZFS has no idea it is talking to a disk. ZFS uses block devices. So 
there is nothing
in ZFS that knows about NCQ, speed, or any of those sorts of attributes. For 
the current disk
drive market, you really don't have much choice... most vendors offer very 
similar disks.
 -- richard



[zfs-discuss] Versioning FS was: question about COW and snapshots

2011-06-16 Thread Yaverot


--- erik.trim...@oracle.com wrote:
| So, if your apps have to be programmed to be 
| versioning/consistency/checkmarking aware in any case, how 
| would having a fancy Versioning filesystem be any better 
| than using what we do now? 
| (i.e. svn/hg/cvs/git on top of ZFS/btrfs/et al)   
| ZFS at least makes significant practical advances by rolling 
| the logical volume manager into the filesystem level, but 
| I can't see any such advantage for a Versioning FS.

Given what I've read here, the advantage of /a/ versioning FS would be to
have calls that make it easy for the app to version/checkmark/roll back
individual files, without having to worry about the details of how that is
handled. The FS can make multiple copies, or just store deltas, as it sees
appropriate. The app can look for matching revision tags and/or auto-rollback
on corrupt files.

So the magic FS a few people want (and I wouldn't mind) can't exist.  But
having an interface, AND getting apps to use it, one that is common enough
across multiple OSes/FSes...

So the problems are properly defining the interface (technical), getting
enough support among the major file systems (social), dealing with slow
upgrading and backwards compatibility (time), and then finally getting enough
apps using the interface (technical & social).



[zfs-discuss] Resizing ZFS partition, shrinking NTFS?

2011-06-16 Thread Clive Meredith
Problem:

I currently run a dual-boot machine with a 45GB partition for Win7 Ultimate and 
a 25GB partition for OpenSolaris 10 (134).  I need to shrink NTFS to 20GB and 
increase the ZFS partition to 45GB.  Is this possible please?  I have looked at 
using the partition tool in OpenSolaris but both partitions are locked, even 
under admin.  Win7 won't allow me to shrink the dynamic volume, as the Finish 
button is always greyed out, so no luck in that direction.

Thanks in advance.


Re: [zfs-discuss] Resizing ZFS partition, shrinking NTFS?

2011-06-16 Thread Cindy Swearingen

Hi Clive,

What you are asking is not recommended nor supported and could render
your ZFS root pool unbootable. (I'm not saying that some expert
couldn't do it, but it's risky - data-corruption risky.)

ZFS expects the partition boundaries to remain the same unless you
replace the original disk with another disk, attach another disk and
detach the original disk, or expand a pool's underlying LUN.

If you have a larger disk, this is what I would recommend: attach the
larger disk and then detach the smaller disk. The full steps are
documented on the solarisinternals.com wiki, ZFS troubleshooting
section, under replacing the root pool disk.
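In outline (disk names made up, and on x86 don't forget to install the boot
blocks on the new disk before detaching the old one):

  zpool attach rpool c0t0d0s0 c0t1d0s0     # attach the larger disk
  zpool status rpool                       # wait for the resilver to finish
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0
  zpool detach rpool c0t0d0s0              # then detach the smaller disk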


Thanks,

Cindy


On 06/16/11 13:21, Clive Meredith wrote:

Problem:

I currently run a dual-boot machine with a 45GB partition for Win7 Ultimate and 
a 25GB partition for OpenSolaris 10 (134).  I need to shrink NTFS to 20GB and 
increase the ZFS partition to 45GB.  Is this possible please?  I have looked at 
using the partition tool in OpenSolaris but both partitions are locked, even 
under admin.  Win7 won't allow me to shrink the dynamic volume, as the Finish 
button is always greyed out, so no luck in that direction.

Thanks in advance.



Re: [zfs-discuss] Versioning FS was: question about COW and snapshots

2011-06-16 Thread Nico Williams
As Casper pointed out, the right thing to do is to build applications
such that they can detect mid-transaction state and roll it back (or
forward, if there's enough data).  Then mid-transaction snapshots are
fine, and the lack of APIs by which to inform the filesystem of
application transaction boundaries becomes much less of an issue
(adding such APIs is not a good solution, since it'd take many years
for apps to take advantage of them and more years still for legacy
apps to be upgraded or decommissioned).

The existing FS interfaces provide enough that one can build
applications this way.

Nico
--


Re: [zfs-discuss] question about COW and snapshots

2011-06-16 Thread Nico Williams
On Thu, Jun 16, 2011 at 8:51 AM,  casper@oracle.com wrote:
 If a database engine or another application keeps both the data and the
 log in the same filesystem, a snapshot wouldn't create inconsistent data
 (I think this would be true with vim and a large number of database
 engines; vim will detect the swap file and datbase should be able to
 detect the inconsistency and rollback and re-apply the log file.)

Correct.  SQLite3 will be able to recover automatically from restores
of mid-transaction snapshots.

VIM does not recover automatically, but it does notice the swap file
and warns the user and gives them a way to handle the problem.

(When you save a file, VIM renames the old one out of the way, creates
a new file with the original name, writes the new contents to it,
closes it, then unlinks the swap file.  On recovery VIM notices the
swap file and gives the user a menu of choices.)
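The same write-then-rename idea is easy to copy in a script or application; a
minimal sketch (the file name and the generate_report command are made up, and
a careful program would also fsync the new file before the rename):

  tmp="report.txt.new.$$"
  generate_report > "$tmp"      # write the complete new version somewhere else
  mv "$tmp" report.txt          # rename is atomic within a single filesystem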

I believe this is the best solution: write applications so they can
recover from being restarted with data restored from a mid-transaction
snapshot.

Nico
--


Re: [zfs-discuss] question about COW and snapshots

2011-06-16 Thread Nico Williams
That said, losing committed transactions when you needed and thought
you had ACID semantics... is bad.  But that's implied in any
restore-from-backups situation.  So you replicate/distribute
transactions so that restore from backups (or snapshots) is an
absolutely last resort matter, and if you ever have to restore from
backups you also spend time manually tracking down (from
counterparties, paper trails kept elsewhere, ...) any missing
transactions.

Nico
--


Re: [zfs-discuss] question about COW and snapshots

2011-06-16 Thread Richard Elling
On Jun 16, 2011, at 12:09 AM, Simon Walter wrote:

 On 06/16/2011 09:09 AM, Erik Trimble wrote:
 We had a similar discussion a couple of years ago here, under the title A 
 Versioning FS. Look through the archives for the full discussion.
 
 The gist is that application-level versioning (and consistency) is 
 completely orthogonal to filesystem-level snapshots and consistency.  IMHO, 
 they should never be mixed together - there are way too many corner cases 
 and application-specific memes for a filesystem to ever fully handle 
 file-level versioning and *application*-level data consistency.  Don't 
 mistake one for the other, and, don't try to *use* one for the other.  
 They're completely different creatures.
 
 
 I guess that is true of the current FSs available. Though it would be nice to 
 essentially have a versioning FS in the kernel rather than an application in 
 userspace.

You can run OpenVMS :-)
 -- richard



Re: [zfs-discuss] question about COW and snapshots

2011-06-16 Thread Paul Kraus
On Thu, Jun 16, 2011 at 4:20 PM, Richard Elling
richard.ell...@gmail.com wrote:

 You can run OpenVMS :-)

Since *you* brought it up (I was not going to :-), how does VMS'
versioning FS handle those issues?

I know that SAM-FS has rules for _when_ copies of a file are made, so
that intermediate states are not captured. The last time I touched
SAM-FS there was _not_ a nice user interface to the previous version,
you had to trudge through log files and then pull the version you
wanted directly from secondary storage (but they did teach us how to do
that in the SAM-FS / QFS class).

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
- Technical Advisor, RPI Players


Re: [zfs-discuss] question about COW and snapshots

2011-06-16 Thread Freddie Cash
The OpenVMS filesystem is what you are looking for.

On Thu, Jun 16, 2011 at 12:09 AM, Simon Walter si...@gikaku.com wrote:

 On 06/16/2011 09:09 AM, Erik Trimble wrote:

 We had a similar discussion a couple of years ago here, under the title A
 Versioning FS. Look through the archives for the full discussion.

 The gist is that application-level versioning (and consistency) is
 completely orthogonal to filesystem-level snapshots and consistency.  IMHO,
 they should never be mixed together - there are way too many corner cases
 and application-specific memes for a filesystem to ever fully handle
 file-level versioning and *application*-level data consistency.  Don't
 mistake one for the other, and, don't try to *use* one for the other.
  They're completely different creatures.


 I guess that is true of the current FSs available. Though it would be nice
 to essentially have a versioning FS in the kernel rather than an application
 in userspace. But I digress. I'll use SVN and WebDAV.

 Thanks for the advice everyone.




-- 
Freddie Cash
fjwc...@gmail.com


Re: [zfs-discuss] OpenIndiana | ZFS | scrub | network | awful slow

2011-06-16 Thread Sven C. Merckens
Hi Roy, hi Dan,

many thanks for your responses.

I am using napp-it to control the OpenSolaris systems.
The napp-it interface shows a dedup factor of 1.18x on System 1 and 1.16x on 
System 2.
Dedup is always on (not only at the start), and compression is also activated:
System 1 = compression on (lzjb?)
System 2 = compression on (gzip-6)
compression rates:
System 1 = 1.10x
System 2 = 1.48x

Compression and dedup were among the primary reasons to choose ZFS in this 
situation.

We tried more RAM (48GB) at the beginning to check whether it would do anything 
for performance. It did not, but we had only about 3-4TB of data on the storage 
at that time (and performance was good). So I will order some RAM modules and 
double the RAM to 48GB.

The RAM usage is about 21GB, with 3GB free (on both systems, after a while). At 
the start (after a few hours of usage and only 3-4TB of data) the usage was 
identical. I read in some places that ZFS will use all memory, leaving only 
1GB free (so I thought the RAM wasn't being used completely). The swap isn't 
used by the system.

Now the systems are idle and the RAM usage is very low:
top:

System 1
Memory: 24G phys mem, 21G free mem, 12G total swap, 12G free swap

System 2
Memory: 24G phys mem, 21G free mem, 12G total swap, 12G free swap


When I start reading about 12GB from System 2, RAM usage goes up and
performance is at about 65-70MB/s via GigaBit (iSCSI):
Memory: 24G phys mem, 3096M free mem, 12G total swap, 12G free swap

OK, I understand, more RAM won't hurt... ;)

On System 1 there is no such massive change in RAM usage while copying files to 
and from the volume, but the performance is only about 20MB/s via GigaBit (iSCSI).

So RAM can't be the issue on System 1 (which has more data stored).
This system is also equipped with a 240GB SSD used for L2ARC, attached to the 
second LSI controller inside the server enclosure.


Roy:
But is the L2ARC also important while writing to the device? Because the 
storage systems are used most of the time only for writing data, the read cache 
(as I thought) isn't a performance factor... Please correct me if my thoughts 
are wrong...

But it is only a small additional cost to also add a 120GB/240GB OCZ Vertex 
2 SSD to System 2 (≈ 150/260 Euro). I will give it a try.

Would it be better to add the SSD to the LSI-Controller (put it in the 
JBOD-Storage) or put it in the server enclosure itself and connect it to the 
internal SATA-Controller?


Do you have any tips for the dataset settings?
These are the current settings:


PROPERTY         System 1       System 2
used             34.4T          19.4T
available        10.7T          40.0T
referenced       34.4T          19.4T
compressratio    1.10x          1.43x
mounted          yes            yes
quota            none           none
reservation      none           none
recordsize       128K           128K
mountpoint       /              /
sharenfs         off            off
checksum         on             on
compression      on             gzip
atime            off            off
devices          on             on
exec             on             on
setuid           on             on
readonly         off            off
zoned            off            off
snapdir          hidden         hidden
aclinherit       passthrough    passthrough
canmount         on             on
xattr            on             on
copies           1              1
version          5              5
utf8only         off            off
normalization    none           none
casesensitivity  insensitive    insensitive
vscan            off            off
nbmand           off            off
sharesmb         off            off
refquota         none           none
refreservation   none           none
primarycache     metadata       all
secondarycache   all

Re: [zfs-discuss] OpenIndiana | ZFS | scrub | network | awful slow

2011-06-16 Thread Bill Sommerfeld
On 06/16/11 15:36, Sven C. Merckens wrote:
 But is the L2ARC also important while writing to the device? Because
 the storage systems are used most of the time only for writing data,
 the read cache (as I thought) isn't a performance factor... Please
 correct me, if my thoughts are wrong.

if you're using dedup, you need a large read cache even if you're only
doing application-layer writes, because you need fast random read access
to the dedup tables while you write.
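A rough way to see how big the dedup table on an existing pool already is
(pool name made up; the bytes-per-entry figure is only a ballpark):

  zdb -DD tank    # prints DDT histograms, including the total number of entries
  # multiply the entry count by roughly 300+ bytes to estimate how much
  # ARC/L2ARC the DDT wants in order to stay fast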

- Bill




Re: [zfs-discuss] question about COW and snapshots

2011-06-16 Thread Erik Trimble

On 6/16/2011 1:32 PM, Paul Kraus wrote:

On Thu, Jun 16, 2011 at 4:20 PM, Richard Elling
richard.ell...@gmail.com  wrote:


You can run OpenVMS :-)

Since *you* brought it up (I was not going to :-), how does VMS'
versioning FS handle those issues ?

It doesn't, per se.  VMS's filesystem has a versioning concept (i.e. 
every time you do a close() on a file, it creates a new file with the 
version number appended, e.g.  foo;1  and foo;2  are the same file, 
different versions).  However, it is completely missing the rest of the 
features we're talking about, like data *consistency* in that file. It's 
still up to the app using the file to figure out what data consistency 
means, and such.  Really, all VMS adds is versioning, nothing else (no 
API, no additional features, etc.).



I know that SAM-FS has rules for _when_ copies of a file are made, so
that intermediate states are not captured. The last time I touched
SAM-FS there was _not_ a nice user interface to the previous version,
you had to trudge through log files and then pull the version you
wanted directly from secondary storage (but they did teach us how to
that in the SAM-FS / QFS class).


I'd have to look, but I *think* there is a better way to get to the file 
history/version information now.


--
Erik Trimble
Java Platform Group Infrastructure
Mailstop:  usca22-317
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (UTC-0800)



Re: [zfs-discuss] Resizing ZFS partition, shrinking NTFS?

2011-06-16 Thread John D Groenveld
In message 444915109.61308252125289.JavaMail.Twebapp@sf-app1, Clive Meredith 
writes:
I currently run a dual-boot machine with a 45GB partition for Win7 Ultimate
and a 25GB partition for OpenSolaris 10 (134).  I need to shrink NTFS to 20GB
and increase the ZFS partition to 45GB.  Is this possible please?  I have
looked at using the partition tool in OpenSolaris but both partitions are
locked, even under admin.  Win7 won't allow me to shrink the dynamic volume,
as the Finish button is always greyed out, so no luck in that direction.

Shrink the NTFS filesystem first.
I've used the Knoppix LiveCD against a defragmented NTFS.

Then use beadm(1M) to duplicate your OpenSolaris BE to
a USB drive and also send snapshots of any other rpool ZFS
there.

Then I would boot the USB drive, run format, fdisk and recreate
the Solaris fdisk partition on your system, recreate the rpool
on slice 0 of that fdisk partition, use beadm(1M) to copy
your BE back to your new rpool, and then restore any other ZFS
from those snapshots.
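Roughly, the copy-out step could look like this (pool and BE names are made up):

  zpool create usbpool c2t0d0s0                  # pool on the USB drive
  beadm create -p usbpool opensolaris-backup     # duplicate the current BE into it
  zfs snapshot -r rpool/export@move
  zfs send -R rpool/export@move | zfs receive -d usbpool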

John
groenv...@acm.org



Re: [zfs-discuss] # disks per vdev

2011-06-16 Thread Daniel Carosone
On Thu, Jun 16, 2011 at 07:06:48PM +0200, Roy Sigurd Karlsbakk wrote:
  I have decided to bite the bullet and change to 2TB disks now rather
  than go through all the effort using 1TB disks and then maybe changing
  in 6-12 months time or whatever. The price difference between 1TB and
  2TB disks is marginal and I can always re-sell my 6x 1TB disks.
  
  I think I have also narrowed down the raid config to these 4;
  
  2x 7 disk raid-z2 with 1 hot spare - 20TB usable
  3x 5 disk raid-z2 with 0 hot spare - 18TB usable
  2x 6 disk raid-z2 with 2 hot spares - 16TB usable
  
  with option 1 probably being preferred at the moment.
 
 I would choose option 1. I have similar configurations in
 production. A hot spare can be very good when a drive dies while
 you're not watching. 

I would probably also go for option 1, with some additional
considerations:

1 - are the 2 vdevs in the same pool, or two separate pools?

If the majority of your bulk data can be balanced manually or by
application software across 2 filesystems/pools, this offers you the
opportunity to replicate smaller more critical data between pools (and
controllers).  This offers better protection against whole-pool
problems (bugs, fat fingers).  With careful arrangement, you could
even have one pool spun down most of the time. 

You mentioned something early on that implied this kind of thinking,
but it seems to have gone by the wayside since.

If you can, I would recommend 2 pools if you go for 2
vdevs. Conversely, in one pool, you might as well go for 15xZ3 since
even this will likely cover performance needs (and see #4).

2 - disk purchase schedule

With 2 vdevs, regardless of 1 or 2 pools, you could defer purchase of
half the 2Tb drives.  With 2 pools, you can use the 6x1Tb and change
that later to 7x with the next purchase, with some juggling of
data. You might be best to buy 1 more 1Tb to get the shape right at 
the start for in-place upgrades, and in a single pool this is
essentially mandatory.

By the time you need more space to buy the second tranche of drives,
3+Tb drives may be the better option.

3 - spare temperature

for levels raidz2 and better, you might be happier with a warm spare
and manual replacement, compared to overly-aggressive automated
replacement if there is a cascade of errors.  See recent threads.

You may also consider a cold spare, leaving a drive bay free for
disks-as-backup-tapes swapping.  If you replace the 1Tb's now,
repurpose them for this rather than reselling.  

Whatever happens, if you have a mix of drive sizes, your spare should
be of the larger size. Sorry for stating the obvious! :-)

4 - the 16th port

Can you find somewhere inside the case for an SSD as L2ARC on your
last port?  Could be very worthwhile for some of your other data and
metadata (less so the movies).

--
Dan.



Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?

2011-06-16 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Nomen Nescio
 
 Has there been any change to the server hardware with respect to number
 of
 drives since ZFS has come out? Many of the servers around still have an
even
 number of drives (2, 4) etc. and it seems far from optimal from a ZFS
 standpoint. 

I don't see the problem. Install the OS onto a mirrored partition, and
configure all the remaining storage however you like - raid or mirror or
whatever.

My personal preference, assuming 4 disks, since the OS is mostly reads and
only a little bit of writes, is to create a 4-way mirrored 100G partition
for the OS; the remaining 900G of each disk (or whatever) becomes either
a stripe of mirrors or raidz, as appropriate in your case, for the
storage pool.
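With made-up slice names, and assuming the installer already built a two-way
rpool mirror on the first two small slices, that works out to roughly:

  # grow the OS mirror to 4-way
  zpool attach rpool c0t0d0s0 c0t2d0s0
  zpool attach rpool c0t0d0s0 c0t3d0s0

  # the big second slices become the data pool (stripe of mirrors shown;
  # raidz works the same way)
  zpool create tank mirror c0t0d0s1 c0t1d0s1 mirror c0t2d0s1 c0t3d0s1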



[zfs-discuss] write cache partial-disk pools (was Server with 4 drives, how to configure ZFS?)

2011-06-16 Thread Daniel Carosone
On Thu, Jun 16, 2011 at 09:15:44PM -0400, Edward Ned Harvey wrote:
 My personal preference, assuming 4 disks, since the OS is mostly reads and
 only a little bit of writes, is to create a 4-way mirrored 100G partition
 for the OS, and the remaining 900G of each disk (or whatever) becomes either
 a stripe of mirrors or raidz, as appropriate in your case, for the
 storagepool.

Is it still the case, as it once was, that allocating anything other
than whole disks as vdevs forces NCQ / write cache off on the drive
(either or both, forget which, guess write cache)? 

If so, can this be forced back on somehow to regain performance when
known to be safe?  

I think the original assumption was that zfs-in-a-partition likely
implied the disk was shared with ufs, rather than another async-safe
pool. 

--
Dan.





Re: [zfs-discuss] write cache partial-disk pools (was Server with 4 drives, how to configure ZFS?)

2011-06-16 Thread Edward Ned Harvey
 From: Daniel Carosone [mailto:d...@geek.com.au]
 Sent: Thursday, June 16, 2011 10:27 PM
 
 Is it still the case, as it once was, that allocating anything other
 than whole disks as vdevs forces NCQ / write cache off on the drive
 (either or both, forget which, guess write cache)?

I will only say, that regardless of whether or not that is or ever was true,
I believe it's entirely irrelevant.  Because your system performs read and
write caching and buffering in ram, the tiny little ram on the disk can't
possibly contribute anything.

When it comes to reads:  The OS does readahead more intelligently than the
disk could ever hope.  Hardware readahead is useless.

When it comes to writes:  Categorize as either async or sync.

When it comes to async writes:  The OS will buffer and optimize, and the
applications have long since marched onward before the disk even sees the
data.  It's irrelevant how much time has elapsed before the disk finally
commits to platter.

When it comes to sync writes:  The write will not be completed, and the
application will block, until all the buffers have been flushed.  Both ram
and disk buffer.  So neither the ram nor disk buffer is able to help you.

It's like selling usb fobs labeled USB2 or USB3.  If you look up or measure
the actual performance of any one of these devices, they can't come anywhere
near the bus speed...  In fact, I recently paid $45 for a USB3 16G fob,
which is finally able to achieve 380 Mbit.  Oh, thank goodness I'm no longer
constrained by that slow 480 Mbit bus...   ;-)   Even so, my new fob is
painfully slow compared to a normal cheap-o usb2 hard disk.  They just put
these labels on there because it's a marketing requirement.  Something that
formerly mattered one day, but people still use as a purchasing decider.



Re: [zfs-discuss] write cache partial-disk pools (was Server with 4 drives, how to configure ZFS?)

2011-06-16 Thread Daniel Carosone
On Thu, Jun 16, 2011 at 10:40:25PM -0400, Edward Ned Harvey wrote:
  From: Daniel Carosone [mailto:d...@geek.com.au]
  Sent: Thursday, June 16, 2011 10:27 PM
  
  Is it still the case, as it once was, that allocating anything other
  than whole disks as vdevs forces NCQ / write cache off on the drive
  (either or both, forget which, guess write cache)?
 
 I will only say, that regardless of whether or not that is or ever was true,
 I believe it's entirely irrelevant.  Because your system performs read and
 write caching and buffering in ram, the tiny little ram on the disk can't
 possibly contribute anything.

I disagree.  It can vastly help improve the IOPS of the disk and keep
the channel open for more transactions while one is in progress.
Otherwise, the channel is idle, blocked on command completion, while
the heads seek. 

 When it comes to reads:  The OS does readahead more intelligently than the
 disk could ever hope.  Hardware readahead is useless.

Little argument here, although the disk is aware of physical geometry
and may well read an entire track. 

 When it comes to writes:  Categorize as either async or sync.
 
 When it comes to async writes:  The OS will buffer and optimize, and the
 applications have long since marched onward before the disk even sees the
 data.  It's irrelevant how much time has elapsed before the disk finally
 commits to platter.

To the application in the short term, but not to the system. TXG closes
have to wait for that, and applications have to wait for those to
close so the next can open and accept new writes.

 When it comes to sync writes:  The write will not be completed, and the
 application will block, until all the buffers have been flushed.  Both ram
 and disk buffer.  So neither the ram nor disk buffer is able to help you.

Yes. With write cache on in the drive, and especially with multiple
outstanding commands, the async writes can all be streamed quickly to
the disk. Then a cache sync can be issued, before the sync/FUA writes
to close the txg are done.

Without write cache, each async write (though deferred and perhaps
coalesced) is synchronous to platters.  This adds latency and
decreases IOPS, impacting other operations (reads) as well.
Please measure it, you will find this impact significant and even
perhaps drastic for some quite realistic workloads.

All this before the disk write cache has any chance to provide
additional benefit by seek optimisations - ie, regardless of whether
it is successful or not in doing so.

--
Dan.



Re: [zfs-discuss] write cache partial-disk pools (was Server with 4 drives, how to configure ZFS?)

2011-06-16 Thread Neil Perrin

On 06/16/11 20:26, Daniel Carosone wrote:

On Thu, Jun 16, 2011 at 09:15:44PM -0400, Edward Ned Harvey wrote:
  

My personal preference, assuming 4 disks, since the OS is mostly reads and
only a little bit of writes, is to create a 4-way mirrored 100G partition
for the OS, and the remaining 900G of each disk (or whatever) becomes either
a stripe of mirrors or raidz, as appropriate in your case, for the
storagepool.



Is it still the case, as it once was, that allocating anything other
than whole disks as vdevs forces NCQ / write cache off on the drive
(either or both, forget which, guess write cache)?


It was once the case that using a slice as a vdev forced the write cache off,
but I just tried it and found it wasn't disabled - at least with the current
source.

In fact it looks like we no longer change the setting.
You may want to experiment yourself on your ZFS version (see below for how to
check).


 


If so, can this be forced back on somehow to regain performance when
known to be safe?  
  


Yes: format -e -> select disk -> cache -> write -> display/enable/disable

I think the original assumption was that zfs-in-a-partition likely
implied the disk was shared with ufs, rather than another async-safe
pool.


- Correct.


Neil.


Re: [zfs-discuss] Resizing ZFS partition, shrinking NTFS?

2011-06-16 Thread Michael Schuster

On 17.06.2011 01:44, John D Groenveld wrote:

In message 444915109.61308252125289.JavaMail.Twebapp@sf-app1, Clive Meredith
writes:

I currently run a dual-boot machine with a 45GB partition for Win7 Ultimate
and a 25GB partition for OpenSolaris 10 (134).  I need to shrink NTFS to 20GB
and increase the ZFS partition to 45GB.  Is this possible please?  I have
looked at using the partition tool in OpenSolaris but both partitions are
locked, even under admin.  Win7 won't allow me to shrink the dynamic volume,
as the Finish button is always greyed out, so no luck in that direction.


Shrink the NTFS filesystem first.
I've used the Knoppix LiveCD against a defragmented NTFS.

Then use beadm(1M) to duplicate your OpenSolaris BE to
a USB drive and also send snapshots of any other rpool ZFS
there.


I'd suggest a somewhat different approach:
1) boot a live CD and use something like parted to shrink the NTFS partition
2) create a new partition, without a filesystem, in the space now freed from NTFS
3) boot OpenSolaris and add the partition from 2) as a vdev to your zpool.

HTH
Michael
--
Michael Schuster
http://recursiveramblings.wordpress.com/