Re: [zfs-discuss] Implementing fbarrier() on ZFS

2007-02-12 Thread Jeff Bonwick

That is interesting. Could this account for disproportionate kernel
CPU usage for applications that perform I/O one byte at a time, as
compared to other filesystems? (Nevermind that the application
shouldn't do that to begin with.)


No, this is entirely a matter of CPU efficiency in the current code.
There are two issues; we know what they are; and we're fixing them.

The first is that as we translate from znode to dnode, we throw away
information along the way -- we go from znode to object number (fast),
but then we have to do an object lookup to get from object number to
dnode (slow, by comparison -- or more to the point, slow relative to
the cost of writing a single byte).  But this is just stupid, since
we already have a dnode pointer sitting right there in the znode.
We just need to fix our internal interfaces to expose it.

The second problem is that we're not very fast at partial-block
updates.  Again, this is entirely a matter of code efficiency,
not anything fundamental.


I still would love to see something like fbarrier() defined by some
standard (de facto or otherwise) to make the distinction between
ordered writes and guaranteed persistence more easily exploited in the
general case for applications, and encourage filesystems/storage
systems to optimize for that case (i.e., not have fbarrier() simply
fsync()).


Totally agree.

Jeff
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Implementing fbarrier() on ZFS

2007-02-12 Thread Peter Schuller
> I agree about the usefulness of fbarrier() vs. fsync(), BTW.  The cool
> thing is that on ZFS, fbarrier() is a no-op.  It's implicit after
> every system call.

That is interesting. Could this account for disproportionate kernel
CPU usage for applications that perform I/O one byte at a time, as
compared to other filesystems? (Nevermind that the application
shouldn't do that to begin with.)

But the fact that you effectively have an fbarrier() is extremely
nice. Guess that is yet another reason to prefer ZFS for certain
(granted, very specific) cases.

I still would love to see something like fbarrier() defined by some
standard (de facto or otherwise) to make the distinction between
ordered writes and guaranteed persistence more easily exploited in the
general case for applications, and encourage filesystems/storage
systems to optimize for that case (i.e., not have fbarrier() simply
fsync()).

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <[EMAIL PROTECTED]>'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Implementing fbarrier() on ZFS

2007-02-12 Thread Peter Schuller
> That said, actually implementing the underlying mechanisms may not be
> worth the trouble.  It is only a matter of time before disks have fast
> non-volatile memory like PRAM or MRAM, and then the need to do
> explicit cache management basically disappears.

I meant fbarrier() as a syscall exposed to userland, like fsync(), so
that userland applications can achieve ordered semantics without
synchronous writes. Whether or not ZFS in turn manages to eliminate
synchronous writes by some feature of the underlying storage mechanism
is a separate issue. But even if not, an fbarrier() exposes an
asynchronous method of ensuring relative order of I/O operations to
userland, which is often useful.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <[EMAIL PROTECTED]>'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Implementing fbarrier() on ZFS

2007-02-12 Thread Jeff Bonwick

Do you agree that there is a major tradeoff of
"builds up a wad of transactions in memory"?


I don't think so.  We trigger a transaction group commit when we
have lots of dirty data, or 5 seconds elapse, whichever comes first.
In other words, we don't let updates get stale.

Jeff

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Implementing fbarrier() on ZFS

2007-02-12 Thread Erblichs
Jeff Bonwick,

Do you agree that there is a major tradeoff of
"builds up a wad of transactions in memory"?

We lose the changes if we have an unstable
environment.

Thus, I don't quite understand why a 2-phase
approach to commits isn't done. First, take the
transactions as they come and do a minimal amount
of delayed writing. If the number of transactions
builds up, then convert to the delayed-write scheme.

The assumption here is that not all ZFS environments are
write-heavy; many are write-once with read-many type accesses.
My assumption is that attribute/metadata reading
outweighs all other accesses.

Wouldn't this approach allow minimal outstanding
transactions and favor read access? Yes, the assumption
is that once the "wad" is started, the amount of writing
could be substantial and thus the amount of available
bandwidth for reading is reduced. This would then allow
for more N states to be available. Right?

Second, there are multiple uses of "then" (then pushes,
then flushes all disk..., then writes the new uberblock,
then flushes the caches again), which seems to have
some level of possible parallelism that should reduce the
latency from the start to the final write. Or did you just
say that for simplicity's sake?

Mitchell Erblich
---


Jeff Bonwick wrote:
> 
> Toby Thain wrote:
> > I'm no guru, but would not ZFS already require strict ordering for its
> > transactions ... which property Peter was exploiting to get "fbarrier()"
> > for free?
> 
> Exactly.  Even if you disable the intent log, the transactional nature
> of ZFS ensures preservation of event ordering.  Note that disk caches
> don't come into it: ZFS builds up a wad of transactions in memory,
> then pushes them out as a transaction group.  That entire group will
> either commit or not.  ZFS writes all the new data to new locations,
> then flushes all disk write caches, then writes the new uberblock,
> then flushes the caches again.  Thus you can lose power at any point
> in the middle of committing transaction group N, and you're guaranteed
> that upon reboot, everything will either be at state N or state N-1.
> 
> I agree about the usefulness of fbarrier() vs. fsync(), BTW.  The cool
> thing is that on ZFS, fbarrier() is a no-op.  It's implicit after
> every system call.
> 
> Jeff
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: ZFS vs NFS vs array caches, revisited

2007-02-12 Thread Marion Hakanson
[EMAIL PROTECTED] said:
> [b]How the ZFS striped on 7 slices of FC-SATA LUN via NFS worked [u]146 times
> faster[/u] than the ZFS on 1 slice of the same LUN via NFS???[/b] 

Well, I do have more info to share on this issue, though how it worked
faster in that test still remains a mystery.  Folks may recall that I said:

> Not that I'm complaining, mind you.  I appear to have stumbled across a
> way to get NFS over ZFS to work at a reasonable speed, without making changes
> to the array (nor resorting to giving ZFS SVM soft partitions instead of
> "real" devices).  Suboptimal, mind you, but it's workable if our Hitachi
> folks don't turn up a way to tweak the array.

Unfortunately, I was wrong.  I _don't_ know how to make it go fast.  While
I _have_ been able to reproduce the result on a couple different LUN/slice
configurations, I don't know what triggers the "fast" behavior.  All I can
say for sure is that a little dtrace one-liner that counts sync-cache calls
turns up no such calls (for both local ZFS and remote NFS extracts) when
things are going fast on a particular filesystem.

By comparison, a local ZFS tar-extraction triggers 12 sync-cache calls,
and one hits 288 such calls during an NFS extraction before interrupting
the run after 30 seconds (est. 1/100th of the way through) when things
are working in the "slow" mode.  Oh yeah, here's the one-liner (type in
the command, run your test in another session, then hit ^C on this one):

  dtrace -n fbt::ssd_send_scsi_SYNCHRONIZE_CACHE:entry'[EMAIL PROTECTED] = count()}'

This is my first ever use of dtrace, so please be gentle with me (:-).
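
(The aggregation body in the one-liner above was mangled by the list archiver's
address scrubbing. A working equivalent, assuming the ssd target driver and an
arbitrary aggregation name, would look something like:

  dtrace -n 'fbt::ssd_send_scsi_SYNCHRONIZE_CACHE:entry { @calls = count(); }'

Press ^C after the test run to print the call count.)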


[EMAIL PROTECTED] said:
> Guess I should go read the ZFS source code (though my 10U3 surely lags the
> Opensolaris stuff). 

I did go read the source code, for my own edification.  To reiterate what
was said earlier:

[EMAIL PROTECTED] said:
> The point is that the flushes occur whether or not ZFS turned the caches on
> or not (caches might be turned on by some other means outside the visibility
> of ZFS). 

My limited reading of ZFS (on opensolaris.org site) code so far has turned
up no obvious way to make ZFS skip the sync-cache call.  However my dtrace
test, unless it's flawed, shows that on some filesystems, the call is made,
and on other filesystems the call is not made.


[EMAIL PROTECTED] said:
> 2.I never saw the storage controller with cache-per-LUN setting. Cache size
> doesn't depend on number of LUNs IMHO, it's a fixed size per controller or
> per FC port, SAN-experts-please-fix-me-if-I'm-wrong. 

Robert has already mentioned array cache being reserved on a per-LUN basis
in Symmetrix boxes.  Our low-end HDS unit also has cache pre-fetch settings
on a per-LUN basis (defaults according to number of disks in RAID-group).

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] SPEC SFS testing of NFS/ZFS/B56

2007-02-12 Thread Leon Koll
Hello,
I am running the SPEC SFS benchmark [1] on a dual Xeon 2.80GHz box with 4GB memory. 
More details:
snv_56, zil_disable=1, zfs_arc_max = 0x8000 #2GB
Configurations that were tested: 
160 dirs/1 zfs/1 zpool/4 SAN LUNs 
160 zfs'es/1 zpool/4 SAN LUNs
40 zfs'es/4 zpools/4 SAN LUNs
One zpool was created on 4 SAN LUNs. The SAN storage array used doesn't honor 
flush cache commands. 
NFSD_SERVERS=1024, NFS3 via UDP was used.
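
(For reference, a sketch of how the ZIL and ARC tunables above are typically set
in /etc/system on this vintage of Solaris; the ARC value here assumes the 2GB
noted above, i.e. 0x80000000, since the posted 0x8000 looks truncated. A reboot
is needed for the settings to take effect.

  * disable the ZIL for the benchmark run
  set zfs:zil_disable = 1
  * cap the ARC at 2GB (assumed value)
  set zfs:zfs_arc_max = 0x80000000
)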
Max. number of obtained SPEC NFS IOPS: 5K
Max. number of SPEC NFS IOPS for SVM/VxFS configuration obtained before: 24K [2]
So we have almost a five-times difference. Can we improve this? How can we 
accelerate this NFS/ZFS setup?
Two serious problems were observed:
1. Degradation of benchmark results on the same setup: the same benchmark gave 
4030 IOPS the first time and only 2037 IOPS when run a second time.
2. When 4 zpools were used instead of 1, the result degraded by about 4 times.

The benchmark report shows an abnormally high share of [b]readdirplus[/b] 
operations, which reached 50% of the test time; their share of the SFS mix is 9%. 
Does this point to some known problem? Increasing the DNLC size doesn't help in 
the ZFS case; I checked this.
I will appreciate your help very much. This testing is a part of preparation 
for production deployment. I will provide any additional information that may 
be needed.

Thank you,
[i]-- leon[/i]










[1] http://www.spec.org/osg/sfs/
[2] http://napobo3.blogspot.com/2006/08/spec-sfs-bencmark-of-zfsufsvxfs.html
[3] http://www.opensolaris.org/jive/thread.jspa?threadID=23263
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Implementing fbarrier() on ZFS

2007-02-12 Thread Jeff Bonwick

Toby Thain wrote:
I'm no guru, but would not ZFS already require strict ordering for its 
transactions ... which property Peter was exploiting to get "fbarrier()" 
for free?


Exactly.  Even if you disable the intent log, the transactional nature
of ZFS ensures preservation of event ordering.  Note that disk caches
don't come into it: ZFS builds up a wad of transactions in memory,
then pushes them out as a transaction group.  That entire group will
either commit or not.  ZFS writes all the new data to new locations,
then flushes all disk write caches, then writes the new uberblock,
then flushes the caches again.  Thus you can lose power at any point
in the middle of committing transaction group N, and you're guaranteed
that upon reboot, everything will either be at state N or state N-1.
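
(If you want to watch this happen, a rough DTrace sketch -- assuming spa_sync()
takes the transaction group number as its second argument -- is:

  dtrace -n 'fbt::spa_sync:entry { printf("%Y syncing txg %d", walltimestamp, arg1); }'

Under load you should see one line per transaction group commit, normally every
few seconds.)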

I agree about the usefulness of fbarrier() vs. fsync(), BTW.  The cool
thing is that on ZFS, fbarrier() is a no-op.  It's implicit after
every system call.

Jeff
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Implementing fbarrier() on ZFS

2007-02-12 Thread Chris Csanady

2007/2/12, Frank Hofmann <[EMAIL PROTECTED]>:

On Mon, 12 Feb 2007, Chris Csanady wrote:

> This is true for NCQ with SATA, but SCSI also supports ordered tags,
> so it should not be necessary.
>
> At least, that is my understanding.

Except that ZFS doesn't talk SCSI, it talks to a target driver. And that
one may or may not treat async I/O requests dispatched via its strategy()
entry point as strictly ordered / non-coalescible / non-cancellable.

See e.g. disksort(9F).


Yes, however, this functionality could be exposed through the target
driver.  While the implementation does not (yet) take full advantage
of ordered tags, Linux does provide an interface to do this:

   http://www.mjmwired.net/kernel/Documentation/block/barrier.txt


From a correctness standpoint, the interface seems worthwhile, even if
the mechanisms are never implemented.  It just feels wrong to execute
a synchronize cache command from ZFS, when often that is not the
intention.  The changes to ZFS itself would be very minor.

That said, actually implementing the underlying mechanisms may not be
worth the trouble.  It is only a matter of time before disks have fast
non-volatile memory like PRAM or MRAM, and then the need to do
explicit cache management basically disappears.

Chris
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS tie-breaking

2007-02-12 Thread Darren Dunham
> Then there is a failure, such that D1 becomes disconnected.  ZFS 
> continues to write on D0.  If D1 were to become reconnected, it would 
> get resilvered normally and all would be well.
> 
> But suppose instead there is a crash, and when the system reboots it is 
> connected only to D1, and D0 is not available.  Does ZFS have any way to 
> know that the data on D1 (while self-consistent) is stale and should not 
> be used?
> 
> The specific case of interest is not necessarily a single-server 
> environment (although thinking of just one server simplifies the 
> scenario without reducing it too far), but a cluster where ZFS is used 
> as a fail-over file system and connectivity issues are more likely to 
> arise.  SVM does have a means of detecting this scenario and refusing to 
> mount the stale mirror.

Are you referring to SVM's requirement for a strict quorum or some other
mechanism?

I don't know what requirements ZFS has on pool membership for import...

-- 
Darren Dunham   [EMAIL PROTECTED]
Senior Technical Consultant TAOShttp://www.taos.com/
Got some Dr Pepper?   San Francisco, CA bay area
 < This line left intentionally blank to confuse you. >
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Implementing fbarrier() on ZFS

2007-02-12 Thread Frank Hofmann

On Mon, 12 Feb 2007, Toby Thain wrote:

[ ... ]
I'm no guru, but would not ZFS already require strict ordering for its 
transactions ... which property Peter was exploiting to get "fbarrier()" for 
free?


It achieves this by flushing the disk write cache when there's a need to 
barrier, which completes outstanding writes.


A "perfect fsync()" for ZFS shouldn't need to do much more; that it does 
right now is something that, as I understand, is being worked on.


FrankH.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Implementing fbarrier() on ZFS

2007-02-12 Thread Toby Thain


On 12-Feb-07, at 5:55 PM, Frank Hofmann wrote:


On Mon, 12 Feb 2007, Peter Schuller wrote:


Hello,

Often fsync() is used not because one cares that some piece of data is on
stable storage, but because one wants to ensure the subsequent I/O operations
are performed after previous I/O operations are on stable storage. In these
cases the latency introduced by an fsync() is completely unnecessary. An
fbarrier() or similar would be extremely useful to get the proper semantics
while still allowing for better performance than what you get with fsync().

My assumption has been that this has not been traditionally implemented for
reasons of implementation complexity.

Given ZFS's copy-on-write transactional model, would it not be almost trivial
to implement fbarrier()? Basically just choose to wrap up the transaction at
the point of fbarrier() and that's it.

Am I missing something?


How do you guarantee that the disk driver and/or the disk firmware  
doesn't reorder writes ?


The only guarantee for in-order writes, on actual storage level, is  
to complete the outstanding ones before issuing new ones.


Or am _I_ now missing something :)


I'm no guru, but would not ZFS already require strict ordering for  
its transactions ... which property Peter was exploiting to get  
"fbarrier()" for free?


--Toby
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS tie-breaking

2007-02-12 Thread Ed Gould

Consider the following scenario involving various failures.

We have a zpool composed of a simple mirror of two devices D0 and D1 
(these may be local disks, slices, LUNs on a SAN, or whatever).  For the 
sake of this scenario, it's probably most intuitive to think of them as 
LUNs on a SAN.  Initially, all is well and both halves of the mirror are 
in sync; the data on D0 is fully consistent with that on D1.


Then there is a failure, such that D1 becomes disconnected.  ZFS 
continues to write on D0.  If D1 were to become reconnected, it would 
get resilvered normally and all would be well.


But suppose instead there is a crash, and when the system reboots it is 
connected only to D1, and D0 is not available.  Does ZFS have any way to 
know that the data on D1 (while self-consistent) is stale and should not 
be used?


The specific case of interest is not necessarily a single-server 
environment (although thinking of just one server simplifies the 
scenario without reducing it too far), but a cluster where ZFS is used 
as a fail-over file system and connectivity issues are more likely to 
arise.  SVM does have a means of detecting this scenario and refusing to 
mount the stale mirror.


Thanks.
--
--Ed

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Implementing fbarrier() on ZFS

2007-02-12 Thread Bart Smaalders

Peter Schuller wrote:

Hello,

Often fsync() is used not because one cares that some piece of data is on 
stable storage, but because one wants to ensure the subsequent I/O operations 
are performed after previous I/O operations are on stable storage. In these 
cases the latency introduced by an fsync() is completely unnecessary. An 
fbarrier() or similar would be extremely useful to get the proper semantics 
while still allowing for better performance than what you get with fsync().


My assumption has been that this has not been traditionally implemented for 
reasons of implementation complexity.


Given ZFS's copy-on-write transactional model, would it not be almost trivial 
to implement fbarrier()? Basically just choose to wrap up the transaction at 
the point of fbarrier() and that's it.


Am I missing something?

(I do not actually have a use case for this on ZFS, since my experience with 
ZFS is thus far limited to my home storage server. But I have wished for an 
fbarrier() many many times over the past few years...)




Hmmm... is store ordering what you're looking for?  Eg
make sure that in the case of power failure, all previous writes
will be visible after reboot if any subsequent writes are visible.


- Bart


--
Bart Smaalders  Solaris Kernel Performance
[EMAIL PROTECTED]   http://blogs.sun.com/barts
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] number of lun's that zfs can handle

2007-02-12 Thread Claus Guttesen

Hi.

I have tested zfs for a while and am very impressed with the ease one
can create filesystems (tanks). I'm about to try it out on an atabeast
with 42 ata 400 GB disks for internal use, mainly as a fileserver. If
this goes well (as I assume it will) I'll consider deploying zfs on a
larger scale in about a year from now.

First I wanted to stripe two or four disks on the atabeast and do a
raidz2 volume in zfs. The consensus (reading the FAQ and browsing the
archives) appears to be to let zfs handle the disks if possible. Then I
was planning to create four raidz2 filesystems (8+2) with two spares,
but would have preferred to create one large volume if possible rather
than four smaller ones.

Searching the archives brought me to
http://www.opensolaris.org/jive/thread.jspa?messageID=72858 which is
an elegant way to solve my quest for space. May I suggest that the
answer be included in the FAQ.

First of all I will create one large volume doing

zpool create myspace \
 raidz2 dev01 dev02 .. dev10 spare dev21 \
 raidz2 dev11 dev12 .. dev20 spare dev21 \
 raidz2 dev22 dev23 .. dev31 spare dev42 \
 raidz2 dev32 dev33 .. dev41 spare dev42

Can two raidz2 pools share the same spare?
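
(As far as I understand, hot spares in ZFS are pool-wide rather than tied to a
particular raidz2 group, so a sketch of the above with shared spares -- device
names as in the example -- would be:

  zpool create myspace \
    raidz2 dev01 dev02 .. dev10 \
    raidz2 dev11 dev12 .. dev20 \
    raidz2 dev22 dev23 .. dev31 \
    raidz2 dev32 dev33 .. dev41 \
    spare dev21 dev42

That way either spare can cover a failed disk in any of the four raidz2 groups.)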

Our main storage is a HDS 9585V Thunder with vxfs and raid5 on 400 GB
sata disks handled by the storage system. If I were to migrate to zfs,
that would mean 390 JBODs. What's the largest volume that zfs has
been exposed to? Would it be viable to repeat the above-mentioned
command to create something like a 100+TB filesystem and create
smaller ones inside, each 4 TB in size?

The files are approx. 1 MB and the thumbnails around 20 KB.

regards
Claus
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Implementing fbarrier() on ZFS

2007-02-12 Thread Frank Hofmann

On Mon, 12 Feb 2007, Chris Csanady wrote:

[ ... ]

> Am I missing something?

How do you guarantee that the disk driver and/or the disk firmware doesn't
reorder writes ?

The only guarantee for in-order writes, on actual storage level, is to
complete the outstanding ones before issuing new ones.


This is true for NCQ with SATA, but SCSI also supports ordered tags,
so it should not be necessary.

At least, that is my understanding.


Except that ZFS doesn't talk SCSI, it talks to a target driver. And that 
one may or may not treat async I/O requests dispatched via its strategy() 
entry point as strictly ordered / non-coalescible / non-cancellable.


See e.g. disksort(9F).

FrankH.



Chris


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to backup a slice ? - newbie

2007-02-12 Thread Richard Elling

comment below...

Uwe Dippel wrote:

Dear Richard,

 > > Could it be that you are looking for the zfs clone subcommand?
 >
 > I'll have to look into it !

I *did* look into it.
man zfs, /clone. This is what I read:

Clones
   A clone is a writable volume or file system whose initial contents are
   the same as another dataset. As with snapshots, creating a clone is
   nearly instantaneous, and initially consumes no additional space.

   Clones can only be created from a snapshot. When a snapshot is cloned,
   it creates an implicit dependency between the parent and child. Even
   though the clone is created somewhere else in the dataset hierarchy,
   the original snapshot cannot be destroyed as long as a clone exists.
   The "origin" property exposes this dependency, and the destroy command
   lists any such dependencies, if they exist.

   The clone parent-child dependency relationship can be reversed by using
   the "promote" subcommand. This causes the "origin" file system to become
   a clone of the specified file system, which makes it possible to destroy
   the file system that the clone was created from.

...
zfs clone snapshot filesystem|volume

   Creates a clone of the given snapshot. See the "Clones" section for
   details. The target dataset can be located anywhere in the ZFS
   hierarchy, and is created as the same type as the original.
...
Example 9 Creating a ZFS Clone

   The following command creates a writable file system whose initial
   contents are the same as "pool/home/[EMAIL PROTECTED]".

 # zfs clone pool/home/[EMAIL PROTECTED] pool/clone

Richard, I can read and usually understand Shakespeare, though my mother 
tongue is not English. And I've been in computers for 25 years, but this 
is definitely above my head.


Yeah, I know what you mean.  And I don't think that you wanted to clone
when a simple copy would suffice.

In order to understand clones, you need to understand snapshots.  In my
mind a clone is a writable snapshot, similar to a fork in source code
management.  This is not what you currently need.

The latter comes closest to being understood, but does not address my 
persistent problem of having slices on other disks, not a new pool 
within my file system.


zpools are composed of devices.
zfs file systems are created in zpools.
Historically, a file system was created on one device and there was only
one file system per device.  If you don't understand this simple change,
then the rest gets very confusing.

To me it currently looks like a 'dead' invention, like so many other great 
ideas in the history of mankind.
Seriously, I saw the flash presentation and knew ZFS is *the* filesystem for 
at least as long as I live !
On the other hand, it needs a 'handle'; it needs to solve legacy 
problems. To me, the worst decision taken so far is that we cannot 
readily associate an arbitrary disk partition or slice - though formatted 
as ZFS - with a mount point in our systems, do something that we 
control, and then relinquish the association.


See previous point.

In order to be accepted broadly, IMHO a new filesystem - as much as 
it shines - can only succeed if it offers a transition from what we 
system admins have been doing all along, and adds all those fantastic 
features.
Look, I was kind of feeling bad and stupid about my initial post, because 
I'd answer RTFM myself if someone asked this on a list for BSD or Linux. 
And the desire is so straightforward:
 - replicating an existing, 'live' file system on another drive, any 
other drive


tar, cpio, rsync, rdist, cp, pax, zfs send/receive,... take your pick.

 - associate (mount) any slice from an arbitrary other drive to a branch 
in my file system


Perhaps you are getting confused over the default mount point for ZFS
file systems?  You can set a specific mount point for each ZFS file system
as a "mountpoint" property.  There is an example of this in the zfs(1m)
man page:
  EXAMPLES
   Example 1 Creating a ZFS File System Hierarchy

   The following commands create a file system named
   "pool/home" and a file system named "pool/home/bob". The
   mount point "/export/home" is set for the parent file
   system, and automatically inherited by the child file system.
 # zfs create pool/home
 # zfs set mountpoint=/export/home pool/home
 # zfs create pool/home/bob

What you end up with in this example is:
ZFS file system "pool/home" mounted as "/export/home" (rather than
  the default "/pool/home")
ZFS file system "pool/home/bob" mounted as "/export/home/bob"
IMHO, this isn't clear from the example :-(
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Implementing fbarrier() on ZFS

2007-02-12 Thread Chris Csanady

2007/2/12, Frank Hofmann <[EMAIL PROTECTED]>:

On Mon, 12 Feb 2007, Peter Schuller wrote:

> Hello,
>
> Often fsync() is used not because one cares that some piece of data is on
> stable storage, but because one wants to ensure the subsequent I/O operations
> are performed after previous I/O operations are on stable storage. In these
> cases the latency introduced by an fsync() is completely unnecessary. An
> fbarrier() or similar would be extremely useful to get the proper semantics
> while still allowing for better performance than what you get with fsync().
>
> My assumption has been that this has not been traditionally implemented for
> reasons of implementation complexity.
>
> Given ZFS's copy-on-write transactional model, would it not be almost trivial
> to implement fbarrier()? Basically just choose to wrap up the transaction at
> the point of fbarrier() and that's it.
>
> Am I missing something?

How do you guarantee that the disk driver and/or the disk firmware doesn't
reorder writes ?

The only guarantee for in-order writes, on actual storage level, is to
complete the outstanding ones before issuing new ones.


This is true for NCQ with SATA, but SCSI also supports ordered tags,
so it should not be necessary.

At least, that is my understanding.

Chris
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to backup a slice ? - newbie

2007-02-12 Thread Richard Elling

Uwe Dippel wrote:

On 2/11/07, Richard Elling <[EMAIL PROTECTED]> wrote:
D'Oh!  someone needs to update 
www.opensolaris.org/os/community/zfs/demos/zfs_demo.pdf

answers below...



About a year ago we changed 'backup' to 'send' and 'restore' to 'receive'
The zfs_demo.pdf needs to be updated.


Oh, yes, then, please !


Cindy has found the source document and is bringing it up to date.
Thanks Cindy!


What is using c0d0s7?  Was it previously exported?  If you really don't
want the data on c0d0s7 any more, try using the '-f' flag.



> A third way also doesn't work:
> % zpool export home
> cannot unmount '/export/home': Device busy

This is often the case if there is an active process with files open or
current working directory in /export/home.


Also, this might find its way into the demo / document ...


IIRC, it was added to the sun-managers FAQ sometime around 1990.  It is
not ZFS-specific.  Eventually a '-f' flag was added to umount(1m).
That option also exists for "zfs unmount".
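
(A sketch of the usual sequence when /export/home is busy, using the pool name
'home' from the commands above:

  fuser -cu /export/home        # see which processes hold the mount point open
  zfs unmount -f /export/home   # force the unmount if they can't be stopped
  zpool export -f home          # then force the export

Stopping the offending processes first is of course the cleaner way.)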


What exactly are you trying to accomplish?


Quite straightforward: I have an install on c0d1 and want to transfer
that install to c0d0. As sysadmin, I need to do that frequently. On
c0d1, s1 is 'home'; on c0d0 it will be s7. Different size, so 'dd' is
out. Usually (BSD and Linux), 'dump' works extremely well for me, to
create a dump-file from a *mounted* file-system; which needs 'restore'
(or '|') for the other partition.


tar, cpio, rsync, rdist, cp, pax, zfs send/receive,... take your pick.


Could it be that you are looking for the zfs clone subcommand?


I'll have to look into it !


These are good questions, we should look to update the FAQ to show
some examples of common procedures.


Yes, please ! - ZFS seems to be so rich in features and so versatile.
If you guys are not careful, though, you are moving too fast for
newcomers. And then, what is 'sooo obvious' for you as developers,
might simply scare off others, who have not the slightest clue how to even
*start* ! - One of my largest hurdles was and is the lack of a 'mount
/dev/dsk/cndmpx /mnt/helper'. *You* don't need it, but I still have no
clue, how to read a file on an unmounted slice on the other drive !!:
I am now on c0d0, everything quite okay, but I need a file from my
'old' home on c0d1s1. See, for you this is obvious, for me, after
hours of reading, not. So I need to boot to the other drive, copy the
file to '/' (ufs), reboot to c0d0, mount ufs on c0d1 and read that
file !! You will laugh about this, but your examples are simply all
'too high' and there are too many commands for me to know how to mount
an inactive slice *without creating a mirror, clone, slave, backup
..*; just to *read* a simple file and umount safely again! :)


I'm not sure why you would need to "boot to the other drive" when
you could just mount it?
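
(For the specific example -- pulling a file off the 'old' home pool on c0d1s1
without rebooting -- a rough sketch, assuming the pool is still named 'home':

  mkdir -p /mnt/helper
  zpool import -R /mnt/helper home   # add -f if the pool was not cleanly exported
  # copy whatever is needed from under /mnt/helper/..., then release it:
  zpool export home

The -R option mounts the pool's file systems under an alternate root so they
don't collide with anything on the running system.)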


Thanks for listening; and don't forget us beginners ! In the end, you
will need people to migrate to ZFS, and then it would be good to have
a 'cheat sheet'; a side-by-side comparison of 'classical' file system
tasks and commands with those used for ZFS.


I think this is a good idea if we could keep it at the procedural
level and not get into the "this option flag == that option flag" details.
Perhaps we should start another thread on this.
 -- richard


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Implementing fbarrier() on ZFS

2007-02-12 Thread Frank Hofmann

On Mon, 12 Feb 2007, Peter Schuller wrote:


Hello,

Often fsync() is used not because one cares that some piece of data is on
stable storage, but because one wants to ensure the subsequent I/O operations
are performed after previous I/O operations are on stable storage. In these
cases the latency introduced by an fsync() is completely unnecessary. An
fbarrier() or similar would be extremely useful to get the proper semantics
while still allowing for better performance than what you get with fsync().

My assumption has been that this has not been traditionally implemented for
reasons of implementation complexity.

Given ZFS's copy-on-write transactional model, would it not be almost trivial
to implement fbarrier()? Basically just choose to wrap up the transaction at
the point of fbarrier() and that's it.

Am I missing something?


How do you guarantee that the disk driver and/or the disk firmware doesn't 
reorder writes ?


The only guarantee for in-order writes, on actual storage level, is to 
complete the outstanding ones before issuing new ones.


Or am _I_ now missing something :)

FrankH.



(I do not actually have a use case for this on ZFS, since my experience with
ZFS is thus far limited to my home storage server. But I have wished for an
fbarrier() many many times over the past few years...)

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <[EMAIL PROTECTED]>'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Implementing fbarrier() on ZFS

2007-02-12 Thread Peter Schuller
Hello,

Often fsync() is used not because one cares that some piece of data is on 
stable storage, but because one wants to ensure the subsequent I/O operations 
are performed after previous I/O operations are on stable storage. In these 
cases the latency introduced by an fsync() is completely unnecessary. An 
fbarrier() or similar would be extremely useful to get the proper semantics 
while still allowing for better performance than what you get with fsync().

My assumption has been that this has not been traditionally implemented for 
reasons of implementation complexity.

Given ZFS's copy-on-write transactional model, would it not be almost trivial 
to implement fbarrier()? Basically just choose to wrap up the transaction at 
the point of fbarrier() and that's it.

Am I missing something?

(I do not actually have a use case for this on ZFS, since my experience with 
ZFS is thus far limited to my home storage server. But I have wished for an 
fbarrier() many many times over the past few years...)

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <[EMAIL PROTECTED]>'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disk Failure Rates and Error Rates -- ( Off topic: Jim Gray lost at sea)

2007-02-12 Thread Richard Elling

Henk Langeveld wrote:

Selim Daoud wrote:

here's an interesting status report published by Microsoft labs


http://research.microsoft.com/research/pubs/view.aspx?msr_tr_id=MSR-TR-2005-166 


That is the paper in which Jim Gray coined "Mean time to data loss".
It's been quoted here before.


Nit: MTTDL has been in the reliability vernacular for quite some time,
long (decades) before Gray's 2005 paper.


Sad note: Turing award winner Jim Gray has been missing now for two
weeks, after he went sailing out of San Francisco on his sailboat
Tenacious on Jan 28.

Friends of Jim mobilised Amazon's Mechanical Turk to scan the ocean
for possible signs of the boat.  The search efforts can be tracked
at http://openphi.net/tenacious/


We all hope for his safe return.
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re[4]: [zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?

2007-02-12 Thread eric kustarz


On Feb 12, 2007, at 7:52 AM, Robert Milkowski wrote:


Hello Roch,

Monday, February 12, 2007, 3:54:30 PM, you wrote:

RP> Duh!.

RP> Long syncs (which delay the next sync) are also possible on
RP> write-intensive workloads. Throttling heavy writers, I
RP> think, is the key to fixing this.

Well, then maybe it's not the cause of our problems.
Nevertheless 60-90s for unlink() is just plain wrong especially when
you've got <10ms IOs to array, almost zero writes, plenty of CPU free,
etc.

Definitely something is wrong here.


Looks like spa_sync() via the txg_sync_thread thread is taking way  
too long, which is causing new (NFS) requests to be delayed (such as  
unlink).


Is this just an NFS server, or is there local activity as well?

A complete threadlist would be interesting, as would memory usage.
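
(A sketch of how to gather both on the live system, assuming mdb -k is available:

  echo "::threadlist -v" | mdb -k > /var/tmp/threadlist.txt
  echo "::memstat" | mdb -k

The threadlist shows where the txg_sync and NFS service threads are blocked, and
::memstat gives a rough breakdown of kernel vs. anon vs. free memory.)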

Have you increased the load on this machine?  I have seen a similar  
situation (new requests being blocked waiting for the sync thread to  
finish), but that's only been when either 1) the hardware is broken  
and taking too long or 2) the server is way overloaded.


eric

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS mirrored laptop

2007-02-12 Thread Francois Dion
Been using ZFS for a good bit now, and particularly on my laptop. Until
B60 is out, I've kind of refrained from using ZFS boot. Works fine, but
I ran into various issues, plus when it is upgrade time, that is a bit
brutal.

What I've been wanting is a way to make my laptop a bit more
"redundant", so to speak, a bit more travel proof.

So, I just set aside a slice on the internal hard disk, and added a
compact flash memory card in a PC card adapter, and created a mirrored
pool out of the two "devices". My Documents folder is now automatically
mirrored on top of all the other ZFS advantages. It has been pretty
solid.
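
(At its core this is a sketch like the following -- the pool, dataset, and
device names here are made up:

  zpool create docs mirror c0d0s7 c2t0d0p0   # internal slice + CF card in the adapter
  zfs create docs/Documents
  zfs set mountpoint=/export/home/me/Documents docs/Documents

zpool status docs then shows the health of both sides of the mirror.)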

I just replicated the steps with an IBM microdrive in a similar laptop
and put the steps on my blog: http://solarisdesktop.blogspot.com

In a similar vein, I've been demoing raidz, raidz2 and hot spares with
usb sticks for work and I've found some issues which I'll post later
this week to this list.

Francois
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Disk Failure Rates and Error Rates -- ( Off topic: Jim Gray lost at sea)

2007-02-12 Thread Anantha N. Srirama
Here's another website working on his rescue; my prayers are for a safe return 
of this CS icon.

http://www.helpfindjim.com/
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RACF: SunFire x4500 Thumper Evaluation

2007-02-12 Thread eric kustarz


On Feb 12, 2007, at 8:05 AM, Robert Petkus wrote:


Some comments from the author:
1. It was a preliminary scratch report not meant to be exhaustive  
and complete by any means.  A comprehensive report of our findings  
will be released soon.


2. I claim responsibility for any benchmarks gathered from Thumper  
and the Linux/FASST/ZFS configuration.  Any metrics regarding the  
"Jackrabbit" SI system was provided by others and simply graphed by  
myself.


3. We didn't have the evaluation unit for long.  However, I did  
have time to test Thumper configured with both Solaris 10 u2 and  
Fedora Core 6 Linux (both ext3 and xfs) running iozone, fileop,  
nfs, dcache, and gridftp.  Comparative results will be in the next  
report.


4. Why didn't I upgrade to S10U3?  Time mostly.  Plus it was not
clear, at least to me, that u3 offered much in the way of
performance gain in our configuration.  U3 seemed more of a feature
upgrade -- (http://www.cuddletech.com/blog/pivot/entry.php?id=777 -->
ZFS Command Improvements and Changes, including RAIDZ-2, Hot-Spares,
Recursive Snapshots, Promotion of Clones, Compact NFSv4 ACL's,
Destroyed Pools Recovery, Error Clearing, ZFS integration with FMA).
Correct me if I'm wrong; i.e., what would we have gained
performance-wise using U3 in our configuration?


RAIDZ-2 would have been a fair comparison to RAID-6 (which was used  
on the JackRabbit linux config).


I'd have to look closer into how iozone does its writes, but the
re-write tests could be hurt in s10u2 and fixed in s10u3 by:

6424554 full block re-writes need not read data in
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6424554

The prefetching code was improved in s10u3 via:
6447377 ZFS prefetch is inconsistant
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6447377

Depending on the workload, this bug may have been in play as well:
6440499 zil should avoid txg_wait_synced() and use dmu_sync() to  
issue parallel IOs when fsyncing (INT snv_43, S10_U3):

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6440499

eric


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RACF: SunFire x4500 Thumper Evaluation

2007-02-12 Thread Robert Petkus

Some comments from the author:
1. It was a preliminary scratch report not meant to be exhaustive and 
complete by any means.  A comprehensive report of our findings will be 
released soon.


2. I claim responsibility for any benchmarks gathered from Thumper and 
the Linux/FASST/ZFS configuration.  Any metrics regarding the 
"Jackrabbit" SI system was provided by others and simply graphed by myself.


3. We didn't have the evaluation unit for long.  However, I did have 
time to test Thumper configured with both Solaris 10 u2 and Fedora Core 
6 Linux (both ext3 and xfs) running iozone, fileop, nfs, dcache, and 
gridftp.  Comparative results will be in the next report.


4. Why didn't I upgrade to S10U3?  Time mostly.  Plus it was not clear, 
at least to me, that u3 offered much in the way of performance gain in 
our configuration.  U3 seemed more of a feature upgrade -- 
(http://www.cuddletech.com/blog/pivot/entry.php?id=777 --> ZFS Command 
Improvements and Changes, including RAIDZ-2, Hot-Spares, Recursive 
Snapshots, Promotion of Clones, Compact NFSv4 ACL's, Destroyed Pools 
Recovery, Error Clearing, ZFS integration with FMA).  Correct me if I'm 
wrong;  i.e., what would we have gained performance-wise using U3 in our 
configuration?


5. "Solaris doesn't support dual-core Intel" was an honest mistake.  See 
earlier comments in this thread.


6. I imagine that the Scalable Informatics crew was indeed upset that 
their "Jackrabbit" did not fare as well in our tests.  That's life.  And 
while they may have some valid points in their rebuttal piece (mostly 
complaints about our configuration, etc.), well, that's beyond the scope 
of this mailing list.


7. Thumper/ZFS fared nicely.  I do regret not having the time to have 
loaded Solaris Express in order to test iSCSI performance.  Does anyone 
have benchmarks in this area to share?



--
Robert Petkus
RHIC/USATLAS Computing Facility
Brookhaven National Laboratory
Physics Dept. - Bldg. 510A
Upton, New York 11973

http://www.bnl.gov/RHIC
http://www.acf.bnl.gov



Luke Lonergan wrote:

Has someone e-mailed the author to recommend upgrading to S10U3?  I'm
shocked the eval was favorable with S10U2 given S10U3's substantial
performance improvements...

- Luke

  

Rayson Ho wrote:

  

Interesting...




http://www.rhic.bnl.gov/RCF/LiaisonMeeting/20070118/Other/thumper-eval.pdf



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[4]: [zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?

2007-02-12 Thread Robert Milkowski
Hello Roch,

Monday, February 12, 2007, 3:54:30 PM, you wrote:

RP> Duh!.

RP> Long syncs (which delay the next sync) are also possible on
RP> write-intensive workloads. Throttling heavy writers, I
RP> think, is the key to fixing this.

Well, then maybe it's not the cause of our problems.
Nevertheless 60-90s for unlink() is just plain wrong especially when
you've got <10ms IOs to array, almost zero writes, plenty of CPU free,
etc.

Definitely something is wrong here.


-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: ZFS volume is hosing BIOS POST on Ultra20 (BIOS

2007-02-12 Thread Eric Haycraft
I had the same issue with zfs killing my Ultra20. I can confirm that flashing 
the BIOS fixed the issue.

http://www.sun.com/desktop/workstation/ultra20/downloads.jsp#Ultra

Eric
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re[2]: [zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?

2007-02-12 Thread Roch - PAE

Duh!.

Long syncs (which delay the next sync) are also possible on
write-intensive workloads. Throttling heavy writers, I
think, is the key to fixing this.

Robert Milkowski writes:
 > Hello Roch,
 > 
 > Monday, February 12, 2007, 3:19:23 PM, you wrote:
 > 
 > RP> Robert Milkowski writes:
 >  >> bash-3.00# dtrace -n fbt::txg_quiesce:return'{printf("%Y 
 > ",walltimestamp);}'
 >  >> dtrace: description 'fbt::txg_quiesce:return' matched 1 probe
 >  >> CPU IDFUNCTION:NAME
 >  >>   3  38168   txg_quiesce:return 2007 Feb 12 14:08:15 
 >  >>   0  38168   txg_quiesce:return 2007 Feb 12 14:12:14 
 >  >>   3  38168   txg_quiesce:return 2007 Feb 12 14:15:05 
 >  >> ^C
 >  >> 
 >  >> 
 >  >> 
 >  >> Why I do not see it exactly every 5s?
 >  >> On other server I get output exactly every 5s.
 >  >>  
 >  >>  
 > 
 > RP> I am not sure about this specific function but if the
 > RP> question is the same as why is the pool synching more often
 > RP> than 5sec, then that can be because of low memory condition
 > RP> (if we have too much dirty memory to sync we don't wait the
 > RP> 5 seconds.). See arc_tempreserve_space around (ERESTART).
 > 
 > The opposite - why it's not syncing every 5s and rather every
 > few minutes on that server.
 > 
 > 
 > -- 
 > Best regards,
 >  Robertmailto:[EMAIL PROTECTED]
 >http://milek.blogspot.com
 > 
 > ___
 > zfs-discuss mailing list
 > zfs-discuss@opensolaris.org
 > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?

2007-02-12 Thread Robert Milkowski
Hello Roch,

Monday, February 12, 2007, 3:19:23 PM, you wrote:

RP> Robert Milkowski writes:
 >> bash-3.00# dtrace -n fbt::txg_quiesce:return'{printf("%Y ",walltimestamp);}'
 >> dtrace: description 'fbt::txg_quiesce:return' matched 1 probe
 >> CPU IDFUNCTION:NAME
 >>   3  38168   txg_quiesce:return 2007 Feb 12 14:08:15 
 >>   0  38168   txg_quiesce:return 2007 Feb 12 14:12:14 
 >>   3  38168   txg_quiesce:return 2007 Feb 12 14:15:05 
 >> ^C
 >> 
 >> 
 >> 
 >> Why I do not see it exactly every 5s?
 >> On other server I get output exactly every 5s.
 >>  
 >>  

RP> I am not sure about this specific function but if the
RP> question is the same as why is the pool synching more often
RP> than 5sec, then that can be because of low memory condition
RP> (if we have too much dirty memory to sync we don't wait the
RP> 5 seconds.). See arc_tempreserve_space around (ERESTART).

The opposite - why it's not syncing every 5s but rather every
few minutes on that server.


-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?

2007-02-12 Thread Roch - PAE

Robert Milkowski writes:
 > bash-3.00# dtrace -n fbt::txg_quiesce:return'{printf("%Y ",walltimestamp);}'
 > dtrace: description 'fbt::txg_quiesce:return' matched 1 probe
 > CPU IDFUNCTION:NAME
 >   3  38168   txg_quiesce:return 2007 Feb 12 14:08:15 
 >   0  38168   txg_quiesce:return 2007 Feb 12 14:12:14 
 >   3  38168   txg_quiesce:return 2007 Feb 12 14:15:05 
 > ^C
 > 
 > 
 > 
 > Why I do not see it exactly every 5s?
 > On other server I get output exactly every 5s.
 >  
 >  

I am not sure about this specific function but if the
question is the same as why is the pool synching more often
than 5sec, then that can be because of a low memory condition
(if we have too much dirty memory to sync we don't wait the
5 seconds). See arc_tempreserve_space around (ERESTART).

-r




 > This message posted from opensolaris.org
 > ___
 > zfs-discuss mailing list
 > zfs-discuss@opensolaris.org
 > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: NFS/ZFS performance problems - txg_wait_open() deadlocks?

2007-02-12 Thread Robert Milkowski
bash-3.00# dtrace -n fbt::txg_quiesce:return'{printf("%Y ",walltimestamp);}'
dtrace: description 'fbt::txg_quiesce:return' matched 1 probe
CPU IDFUNCTION:NAME
  3  38168   txg_quiesce:return 2007 Feb 12 14:08:15 
  0  38168   txg_quiesce:return 2007 Feb 12 14:12:14 
  3  38168   txg_quiesce:return 2007 Feb 12 14:15:05 
^C



Why I do not see it exactly every 5s?
On other server I get output exactly every 5s.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re[2]: [storage-discuss] Why doesn't Solaris remove a faulty disk from operation?

2007-02-12 Thread Robert Milkowski
Hello Matty,

Monday, February 12, 2007, 1:44:13 AM, you wrote:

M> On 2/11/07, Robert Milkowski <[EMAIL PROTECTED]> wrote:
>> Hello Matty,
>>
>> Sunday, February 11, 2007, 6:56:14 PM, you wrote:
>>
>> M> Howdy,
>>
>> M> On one of my Solaris 10 11/06 servers, I am getting numerous errors
>> M> similar to the following:
>>
>> AFAIK nothing has been integrated yet to do it.
>> A hot spare will kick in automatically only when zfs can't open a device;
>> other than that, you are in manual mode for now.

M> Yikes! Does anyone from the ZFS / storage team happen to know when
M> work will complete to detect and replace failed disk drives? If hot
M> spares don't actually kick in to replace failed drives, is there any
M> value in using them?

Of course there is.
Nevertheless I completely agree that HS support in ZFS is "somewhat"
lacking.

-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss