Re: [zfs-discuss] memory hog

2008-06-23 Thread Edward
So does that mean ZFS is not for consumer computers?
If ZFS requires 4GB of RAM for operation, does that mean I will need 8GB+ of RAM if I
were to use Photoshop or any other memory-intensive application?

And it seems ZFS memory usage scales with the amount of HDD space?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] memory hog

2008-06-23 Thread James C. McPherson
Edward wrote:
 So does that mean ZFS is not for consumer computer?

Not at all. Consumer computers are plenty powerful enough
to use ZFS with.


 If ZFS require 4GB of Ram for operation, that means i will
 need  8GB+ Ram if i were to use Photoshop or any other memory
 intensive application?

ZFS doesn't require 4GB of RAM. That's merely a recommendation
of the amount you might want installed in your system - a subtle
difference :-)

 And it seems ZFS memory usage scales with the amount of HDD space?

I'm not quite sure how to address this, could you re-phrase
your question please?

You might find this wiki page useful
http://www.solarisinternals.com/wiki/index.php/ZFS_Configuration_Guide,
along with the others that it points to.


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] memory hog

2008-06-23 Thread Erik Trimble
Edward wrote:
 So does that mean ZFS is not for consumer computer?
 If ZFS require 4GB of Ram for operation, that means i will need  8GB+ Ram if 
 i were to use Photoshop or any other memory intensive application?

   
No.  It works fine on desktops - I'm writing this on an older Athlon64 
with 1GB.   Memory pressure does seem to become a bit more of an issue 
when I'm doing more I/O on the box (which, I'm assuming, is due to the 
various caches), so for things like compiling, I feel a little cramped.  

Personally (in my experience only), I'd say that ZFS works well for use
on the desktop, ASSUMING you dedicate 1GB of RAM solely to the OS (and
ZFS).  For very heavy I/O work, I think at least 2GB is a better idea.

So, size your total memory accordingly.

 And it seems ZFS memory usage scales with the amount of HDD space?
   
I think the more proper thing to say is that ZFS memory usage is 
relative to the amount of I/O you are doing.  Very heavy I/O uses much 
more RAM.  It is not per se connected to total size of the pool.

That is, if I've got several TB of disk in a zpool but I'm doing only
10 ops/sec, it will consume much less RAM than if I have a 100GB zpool
and I'm trying to do 1000 ops/sec.
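If you want to watch what the cache is actually doing, one quick check (on
Solaris/OpenSolaris; this is a sketch from memory, so double-check the statistic
names on your build) is the arcstats kstat:

o current ARC size, in bytes
# kstat -p zfs:0:arcstats:size
o the ceiling the ARC will grow to
# kstat -p zfs:0:arcstats:c_max

Watching "size" while you run your workload shows how much RAM the ARC is
really holding at any moment.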

-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] memory hog

2008-06-23 Thread Kaiwai Gardiner
Erik Trimble wrote:
 Edward wrote:
   
 So does that mean ZFS is not for consumer computer?
 If ZFS require 4GB of Ram for operation, that means i will need  8GB+ Ram if 
 i were to use Photoshop or any other memory intensive application?

   
 
 No.  It works fine on desktops - I'm writing this on an older Athlon64 
 with 1GB.   Memory pressure does seem to become a bit more of an issue 
 when I'm doing more I/O on the box (which, I'm assuming, is due to the 
 various caches), so for things like compiling, I feel a little cramped.  

 Personally, (in my experience only), I'd say that ZFS works well for use 
 on the desktop, ASSUMING you dedicate 1GB of RAM to solely the OS (and 
 ZFS).  For very heavy I/O work, I think at least 2GB is a better idea.

 So, size your total memory accordingly.

I've got a Dell Dimension 8400 with 2.5GB RAM and a P4 3.2GHz processor; I
haven't noticed any slowdowns either. Memory is so cheap that adding an
extra 2GB is only around NZ$100 these days anyway.

Matthew
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] memory hog

2008-06-23 Thread Nico Sabbi
On Monday 23 June 2008 09:39:13 Kaiwai Gardiner wrote:
 Erik Trimble wrote:
  Edward wrote:
  So does that mean ZFS is not for consumer computer?
  If ZFS require 4GB of Ram for operation, that means i will need 
  8GB+ Ram if i were to use Photoshop or any other memory
  intensive application?
 
  No.  It works fine on desktops - I'm writing this on an older
  Athlon64 with 1GB.   Memory pressure does seem to become a bit
  more of an issue when I'm doing more I/O on the box (which, I'm
  assuming, is due to the various caches), so for things like
  compiling, I feel a little cramped.
 
  Personally, (in my experience only), I'd say that ZFS works well
  for use on the desktop, ASSUMING you dedicate 1GB of RAM to
  solely the OS (and ZFS).  For very heavy I/O work, I think at
  least 2GB is a better idea.
 
  So, size your total memory accordingly.

 I've got a Dell Dimension 8400 w/ 2.5gb ram and p4 3.2Ghz
 processor; I haven't noticed any slow downs either. Memory is so
 cheap, adding an extra 2gb is only around NZ$100 these days anyway.

 Matthew

This is the kind of reasoning that hides problems rather than
correcting them. Sooner or later the problems will show up in
other - maybe worse - forms.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] swap dump on ZFS volume

2008-06-23 Thread jan damborsky
Hi folks,

I am a member of the Solaris Install team and I am currently working
on making the Slim installer compliant with the ZFS boot design specification:

http://opensolaris.org/os/community/arc/caselog/2006/370/commitment-materials/spec-txt/

After the ZFS boot project was integrated into Nevada and support
for installation on a ZFS root was delivered into the legacy installer,
some differences arose between how the Slim installer implements
ZFS root and how it is done in the legacy installer.

One part we need to change in the Slim installer is to create
swap & dump on ZFS volumes instead of utilizing a UFS slice for this,
as defined in the design spec and implemented in the SXCE installer.

When reading through the specification and looking at the SXCE
installer source code, I realized some points are not quite
clear to me.

Could I please ask you to help me clarify them, so that I
follow the right way as far as implementation of those features
is concerned?

Thank you very much,
Jan


[i] Formula for calculating dump & swap size


I have gone through the specification and found that the
following formula should be used for calculating the default
size of swap & dump during installation:

o size of dump: 1/4 of physical memory
o size of swap: max of (512MiB, 1% of rpool size)

However, looking at the source code, the SXCE installer
calculates the default sizes using a slightly different
algorithm:

size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))
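(Just to illustrate the formula with numbers: a desktop with 2 GiB of memory gets
MAX(512 MiB, MIN(1 GiB, 32 GiB)) = 1 GiB each for swap and dump, while a server
with 128 GiB of memory is capped at MAX(512 MiB, MIN(64 GiB, 32 GiB)) = 32 GiB each.)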

Are there any preferences as to which one should be used, or is
there any other possibility we might take into account?


[ii] Procedure of creating dump & swap
--

Looking at the SXCE source code, I have discovered that the following
commands should be used for creating swap & dump:

o swap
# /usr/sbin/zfs create -b PAGESIZE -V <size_in_mb>m rpool/swap
# /usr/sbin/swap -a /dev/zvol/dsk/rpool/swap

o dump
# /usr/sbin/zfs create -b 128*1024 -V <size_in_mb>m rpool/dump
# /usr/sbin/dumpadm -d /dev/zvol/dsk/rpool/dump
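For illustration only, on a hypothetical x86 box with 4 GB volumes the placeholders
might be filled in like this (/usr/bin/pagesize prints the value to use for PAGESIZE;
the sizes are just examples):

# /usr/sbin/zfs create -b $(/usr/bin/pagesize) -V 4g rpool/swap
# /usr/sbin/zfs create -b 128k -V 4g rpool/dump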

Could you please let me know if my observations are correct
or if I should use a different approach?

As far as setting the volume block size is concerned (the -b option),
how are those numbers to be determined? Will they be the same in
different scenarios, or are there plans to tune them in some way
in the future?


[iii] Is there anything else I should be aware of ?
---


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raid card vs zfs

2008-06-23 Thread Mertol Ozyoney
I agree with the other comments. From day 1, ZFS has been fine-tuned for JBODs.
While RAID cards are welcome, ZFS will perform better with JBODs.
Most RAID cards have limited power and bandwidth to support the platter
speeds of newer drives, and the ZFS code seems to be more intelligent
about caching.

A few days ago a customer tested a Sun Fire X4500 connected to a network
with 4 x 1 Gbit Ethernet ports. The X4500 has modest CPU power and does not use
any RAID card. The unit easily performed at 400 MB/sec in write-from-LAN tests,
which was clearly limited by the Ethernet ports.

Mertol 



Mertol Ozyoney 
Storage Practice - Sales Manager

Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +90212335
Email [EMAIL PROTECTED]



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Bob Friesenhahn
Sent: Monday, June 23, 2008 5:33 AM
To: kevin williams
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] raid card vs zfs

On Sun, 22 Jun 2008, kevin williams wrote:

 The article says that ZFS eliminates the need for a RAID card and is 
 faster because the striping is running on the main cpu rather than 
 an old chipset on a card.  My question is, is this true?  Can I

Ditto what the other guys said.  Since ZFS may generate more I/O 
traffic from the CPU, you will want an adaptor with lots of I/O ports. 
SATA/SAS with a port per drive is ideal.  It is useful to have a NVRAM 
cache on the card if you will be serving NFS or running a database, 
although some vendors sell this NVRAM cache as a card which plugs into 
the backplane and uses a special driver.  ZFS is memory-hungry so 4GB 
of RAM is a good starting point for a server.  Make sure that your CPU 
and OS are able to run a 64-bit kernel.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] memory hog

2008-06-23 Thread Mertol Ozyoney
No, ZFS loves memory and, unlike most other filesystems around, it can make good
use of memory. But ZFS will free memory if it recognizes that other apps require
it, and you can also limit the amount of cache the ARC will use.
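Capping the ARC is usually a one-line tunable in /etc/system (the value below is
just an illustration - 1 GB expressed in bytes - and it takes effect at the next
reboot):

set zfs:zfs_arc_max = 0x40000000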

In my experience, ZFS still performs nicely on 1 GB boxes.

PS: How much does 4 GB of RAM cost for a desktop?



Mertol Ozyoney 
Storage Practice - Sales Manager

Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +90212335
Email [EMAIL PROTECTED]



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Edward
Sent: Monday, June 23, 2008 9:32 AM
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] memory hog

So does that mean ZFS is not for consumer computer?
If ZFS require 4GB of Ram for operation, that means i will need  8GB+ Ram if
i were to use Photoshop or any other memory intensive application?

And it seems ZFS memory usage scales with the amount of HDD space?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [SOLVED] Confusion with snapshot send-receive

2008-06-23 Thread Andrius

James C. McPherson wrote:

Andrius wrote:

Boyd Adamson wrote:

Andrius [EMAIL PROTECTED] writes:


Hi,
there is a small confusion with send receive.

zfs andrius/sounds was snapshoted @421 and should be copied to new
zpool beta that on external USB disk.
After
/usr/sbin/zfs send andrius/[EMAIL PROTECTED] | ssh host1 /usr/sbin/zfs recv 
beta

or
usr/sbin/zfs send andrius/[EMAIL PROTECTED] | ssh host1 /usr/sbin/zfs recv
beta/sounds
answer come
ssh: host1: node name or service name not known

What has been done bad?


Your machine cannot resolve the name host1 into an IP address. This is
a network configuration problem, not a zfs problem.

You should find that
ssh host1

fails too.

Second pool is in the same machine. What to write instead of host2?


try

/usr/sbin/zfs send andrius/[EMAIL PROTECTED] | /usr/sbin/zfs recv beta/sounds


You only need to pipe the zfs send output through ssh if
you're actually sending it to a different system.
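In other words (the host name here is hypothetical), keep the pipeline local when
both pools are on the same machine, and only add ssh when the receiving pool lives
elsewhere:

# zfs send andrius/sounds@421 | zfs recv beta/sounds
# zfs send andrius/sounds@421 | ssh otherhost zfs recv beta/sounds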


James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcphttp://www.jmcp.homeunix.com/blog



Thanks, it works. Just strange that your simplified example is not in the
ZFS Administration Guide. Somebody wanted to complicate things.




--
Regards,
Andrius Burlega

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs mirror broken?

2008-06-23 Thread Justin Vassallo
I am running zfs 3 on SunOS zen 5.10 Generic_118855-33 i86pc i386 i86pc

What is baffling is that the disk did come online and appear as healthy, but
zpool showed the fs inconsistency. As Miles said, after the disk came back
the resilver did not resume.

The only additions I have to the sequence shown are:
1) I am absolutely sure there were no disk writes in the interim, since the
non-global zones which use these filesystems were halted during the operation
2) The first time I unplugged the disk, I upgraded to a larger disk, so I
still have that original disk intact
3) I was afraid that zfs might resilver backwards, i.e. from the 22% image
back to the original copy. I therefore pulled the new disk out again.

Current status:
# zpool status
  pool: external
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed with 0 errors on Sat Jun 21 07:42:03 2008
config:

NAME   STATE READ WRITE CKSUM
external   ONLINE   26.57   114 0
  c12t0d0p0ONLINE   4   114 0
  mirror   ONLINE   26.57 0 0
c13t0d0p0  ONLINE   55.25 4.48K 0
c16t0d0p0  ONLINE   0 0 53.14

Can I be sure that the unrecoverable error found is on the failed mirror?

I was thinking of the following ways forward. Any comments most welcome:
1) run a scrub (the commands I have in mind are sketched below). I am thinking
that kicking this off might actually corrupt data in the second vdev, so maybe
starting off with 2 might be a better idea...
2) physically replace disk1 with the ORIGINAL disk2 and attempt a scrub
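For option 1 that would just be the standard commands (pool name as in the status
output above), watching progress with zpool status:

# zpool scrub external
# zpool status -v external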

justin


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Miles Nordin
Sent: 21 June 2008 02:46
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] zfs mirror broken?

 jb == Jeff Bonwick [EMAIL PROTECTED] writes:

jb If you say 'zpool online pool disk' that should tell ZFS
jb that the disk is healthy again and automatically kick off a
jb resilver.

jb Of course, that should have happened automatically.

with b71 I find that it does sometimes happen automatically, but the
resilver isn't enough to avoid checksum errors later.  Only a
manually-requested scrub will stop any more checksum errors from
accumulating.

Also, if I reboot before one of these auto-resilvers finishes, or plug in
the component that flapped while powered down, the auto-resilver never
resumes.

 While one vdev was resilvering at 22% (HD replacement), the
 original disk went away 

so if I understand you, it happened like this:

#1#2

  online online
t online UNPLUG
i online UNPLUG-- filesystem writes
m online UNPLUG-- filesystem writes
e online online
| online resilver - online
v UNPLUGxxx  online-- fs reads allowed?  how?
  online onlinewhy no resilvering?

It seems to me like DTRT after #1 is unplugged is to take the whole pool
UNAVAIL until the original disk #1 comes back.  When the original disk #1
drops off, the only available component left is the #2 component that
flapped earlier and is being resilvered, so #2 is out-of-date and should be
ignored.  but I'm pretty sure ZFS doesn't work that way, right?

What does it do?  Will it serve incorrect, old data?  Will it somehow return
I/O errors for data that has changed on #1 and not been resilvered onto #2
yet?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raid card vs zfs

2008-06-23 Thread Charles Soto
On 6/23/08 6:22 AM, Mertol Ozyoney [EMAIL PROTECTED] wrote:

 A few days a ago a customer tested a Sunfire X4500 connected to a network
 with 4 x 1 Gbit ethernets. X4500 have modest CPU power and do not use any
 Raid card. The unit easly performaed 400 MB/sec on write from LAN tests
 which clearly limited by the ethernet ports.
 
 Mertol 

This is what we are seeing with our X4500.  Clearly, the four Ethernet
channels are our limiting factor.  We put 10Gbps Ethernet on the unit, but
as this is currently the only 10-gig host on our network (waiting for Vmware
drivers to support the X6250 cards we bought), I can't really test that
fully.  We're using this as a NFS/Samba server, so JBOD with ZFS is fast
enough.

I'm waiting for COMSTAR and ADM to really take advantage of the Thumper
platform.  The complete storage stack that Sun and the OpenSolaris project
have envisioned will make such commodity hardware useful pieces of our
solution.  I love our EMC/Brocade/HP SAN gear, but it's just too expensive
to scale (particularly when it comes to total data management).

Charles

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] memory hog

2008-06-23 Thread Charles Soto
On 6/23/08 6:24 AM, Mertol Ozyoney [EMAIL PROTECTED] wrote:

 No, ZFS loves memory and unlike most other FS's around it can make good use
 of memory. But ZFS will free memory if it recognizes that other apps require
 memory or you can limit the cache ARC will be using.

This is an important distinction.  There are many examples of software which
does not utilize the resources we make available.  I'm happy with code that
takes advantage of these additional resources to improve performance.
Otherwise, it becomes difficult to make cost/benefit decisions.  I need
more performance.  It's worth $x to get that.


 To my experiance ZFS still performs nicely on 1 GB boxes.

This is probably fine for the typical consumer usage pattern.

 PS: How much 4 GB Ram costs for a desktop ?

I just bought 2GB DIMMs for $40.  IIRC, they were Kingston, so not a no-name
brand.

Charles

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] swap dump on ZFS volume

2008-06-23 Thread Richard Elling
Hi Jan, comments below...

jan damborsky wrote:
 Hi folks,

 I am member of Solaris Install team and I am currently working
 on making Slim installer compliant with ZFS boot design specification:

 http://opensolaris.org/os/community/arc/caselog/2006/370/commitment-materials/spec-txt/

 After ZFS boot project was integrated into Nevada and support
 for installation on ZFS root delivered into legacy installer,
 some differences occurred between how Slim installer implements
 ZFS root and how it is done in legacy installer.

 One part is that we need to change in Slim installer is to create
 swap  dump on ZFS volume instead of utilizing UFS slice for this
 as defined in design spec and implemented in SXCE installer.

 When reading through the specification and looking at SXCE
 installer source code, I have realized some points are not quite
 clear to me.

 Could I please ask you to help me clarify them in order to
 follow the right way as far as implementation of that features
 is concerned ?

 Thank you very much,
 Jan


 [i] Formula for calculating dump  swap size
 

 I have gone through the specification and found that
 following formula should be used for calculating default
 size of swap  dump during installation:

 o size of dump: 1/4 of physical memory
   

This is a non-starter for systems with 1-4 TBytes of physical
memory.  There must be a reasonable maximum cap, most
likely based on the size of the pool, given that we regularly
boot large systems from modest-sized disks.

 o size of swap: max of (512MiB, 1% of rpool size)

 However, looking at the source code, SXCE installer
 calculates default sizes using slightly different
 algorithm:

 size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))

 Are there any preferences which one should be used or is
 there any other possibility we might take into account ?
   

zero would make me happy :-)  But there are some cases where swap
space is preferred.  Again, there needs to be a reasonable cap.  In
general, the larger the system, the less use for swap during normal
operations, so for most cases there is no need for really large swap
volumes.  These can also be adjusted later, so the default can be
modest.  One day perhaps it will be fully self-adjusting like it is
with other UNIX[-like] implementations.


 [ii] Procedure of creating dump  swap
 --

 Looking at the SXCE source code, I have discovered that following
 commands should be used for creating swap  dump:

 o swap
 # /usr/sbin/zfs create -b PAGESIZE -V size_in_mbm rpool/swap
 # /usr/sbin/swap -a /dev/zvol/dsk/rpool/swap

 o dump
 # /usr/sbin/zfs create -b 128*1024 -V size_in_mbm rpool/dump
 # /usr/sbin/dumpadm -d /dev/zvol/dsk/rpool/dump

 Could you please let me know, if my observations are correct
 or if I should use different approach ?

 As far as setting of volume block size is concerned (-b option),
 how that numbers are to be determined ? Will they be the same in
 different scenarios or are there plans to tune them in some way
 in future ?
   

Setting the swap blocksize to pagesize is interesting, but should be
ok for most cases.  The reason I say it is interesting is because it
is optimized for small systems, but not for larger systems which
typically see more use of large page sizes.  OTOH larger systems
should not swap, so it is probably a non-issue for them.  Small
systems should see this as the best solution.

Dump just sets the blocksize to the default, so it is a no-op.
 -- richard


 [iii] Is there anything else I should be aware of ?
 ---
   

Installation should *not* fail due to running out of space because
of large dump or swap allocations.  I think the algorithm should
first take into account the space available in the pool after accounting
for the OS.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Oracle and ZFS

2008-06-23 Thread Mertol Ozyoney
Hi All ;

 

One of our customers suffered from a FS being corrupted after an unattended
shutdown due to a power problem.

They want to switch to ZFS. 

 

From what I have read, ZFS will most probably not be corrupted by the same
event. But I am not sure how Oracle will be affected by a sudden power
outage when placed on ZFS?

 

Any comments ?

 

PS: I am aware of UPSs and similar technologies, but the customer is still asking
those if ... questions ...

 

Mertol 

 

 

 



Mertol Ozyoney 
Storage Practice - Sales Manager

Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +90212335
Email [EMAIL PROTECTED]

 

 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle and ZFS

2008-06-23 Thread Chris Cosby
From my usage, the first question you should ask your customer is how much
of a performance hit they can spare when switching to ZFS for Oracle. I've
done lots of tweaking (following threads I've read on the mailing list), but
I still can't seem to get enough performance out of any databases on ZFS.
I've tried using zvols, cooked files on top of ZFS filesystems, everything,
but either raw disk devices via the old style DiskSuite tools or cooked
files on top of the same are far more performant than anything on ZFS. Your
mileage may vary, but so far, that's where I stand.

As for the corrupted filesystem, ZFS is much better, but there are still no
guarantees that your filesystem won't be corrupted during a hard shutdown.
The CoW and checksumming gives you a much lower incidence of corruption, but
the customer still needs to be made aware that things like battery backed
controllers, managed UPS, redundant power supplies, and the like are the
first thing they need to put into place - not the last.

On Mon, Jun 23, 2008 at 11:56 AM, Mertol Ozyoney [EMAIL PROTECTED]
wrote:

  Hi All ;



 One of our customer is suffered from FS being corrupted after an unattanded
 shutdonw due to power problem.

 They want to switch to ZFS.



 From what I read on, ZFS will most probably not be corrupted from the same
 event. But I am not sure how will Oracle be affected from a sudden power
 outage when placed over ZFS ?



 Any comments ?



 PS: I am aware of UPS's and smilar technologies but customer is still
 asking those if ... questions ...



 Mertol








 *Mertol Ozyoney *
 Storage Practice - Sales Manager

 *Sun Microsystems, TR*
 Istanbul TR
 Phone +902123352200
 Mobile +905339310752
 Fax +90212335
 Email [EMAIL PROTECTED] [EMAIL PROTECTED]









-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] memory hog

2008-06-23 Thread Edward
Yes, you are all correct. RAM costs next to nothing today, even though prices might
be bouncing back to their normal margin. DDR2 RAM is relatively cheap, not to
mention that DDR3 will bring us double or more memory capacity.

Most people could afford 4GB of RAM on their desktop today, with 8GB of RAM for
prosumers. At today's prices I reckon ALL systems, even entry level, should have
2GB of RAM standard.

But the sad thing is that Windows XP / Vista is still 32-bit. It doesn't recognize
more than 3.x GB of RAM. The 64-bit version is still premature and hardly any OEMs
are adopting it. Hardware makers have yet to fully jump on board with 64-bit drivers.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [caiman-discuss] swap dump on ZFS volume

2008-06-23 Thread Lori Alt

Richard Elling wrote:

Hi Jan, comments below...

jan damborsky wrote:
  

Hi folks,

I am member of Solaris Install team and I am currently working
on making Slim installer compliant with ZFS boot design specification:

http://opensolaris.org/os/community/arc/caselog/2006/370/commitment-materials/spec-txt/

After ZFS boot project was integrated into Nevada and support
for installation on ZFS root delivered into legacy installer,
some differences occurred between how Slim installer implements
ZFS root and how it is done in legacy installer.

One part is that we need to change in Slim installer is to create
swap  dump on ZFS volume instead of utilizing UFS slice for this
as defined in design spec and implemented in SXCE installer.

When reading through the specification and looking at SXCE
installer source code, I have realized some points are not quite
clear to me.

Could I please ask you to help me clarify them in order to
follow the right way as far as implementation of that features
is concerned ?

Thank you very much,
Jan


[i] Formula for calculating dump  swap size


I have gone through the specification and found that
following formula should be used for calculating default
size of swap  dump during installation:

o size of dump: 1/4 of physical memory
  



This is a non-starter for systems with 1-4 TBytes of physical
memory.  There must be a reasonable maximum cap, most
likely based on the size of the pool, given that we regularly
boot large systems from modest-sized disks.
Actually, starting with build 90, the legacy installer sets the default
size of the swap and dump zvols to half the size of physical memory,
but no more than 32 GB and no less than 512 MB.  Those are just the
defaults.  Administrators can use the zfs command to modify the volsize
property of both the swap and dump zvols (to any value, including
values larger than 32 GB).
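For example (dataset names as created by the installer; the size is arbitrary):

# zfs set volsize=16g rpool/dump

For rpool/swap, I believe the swap device also has to be removed and re-added
with swap(1M) before the new size is actually used.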




o size of swap: max of (512MiB, 1% of rpool size)

However, looking at the source code, SXCE installer
calculates default sizes using slightly different
algorithm:

size_of_swap = size_of_dump = MAX(512 MiB, MIN(physical_memory/2, 32 GiB))

Are there any preferences which one should be used or is
there any other possibility we might take into account ?
  



zero would make me happy :-)  But there are some cases where swap
space is preferred.  Again, there needs to be a reasonable cap.  In
general, the larger the system, the less use for swap during normal
operations, so for most cases there is no need for really large swap
volumes.  These can also be adjusted later, so the default can be
modest.  One day perhaps it will be fully self-adjusting like it is
with other UNIX[-like] implementations.

  

[ii] Procedure of creating dump  swap
--

Looking at the SXCE source code, I have discovered that following
commands should be used for creating swap  dump:

o swap
# /usr/sbin/zfs create -b PAGESIZE -V size_in_mbm rpool/swap
# /usr/sbin/swap -a /dev/zvol/dsk/rpool/swap

o dump
# /usr/sbin/zfs create -b 128*1024 -V size_in_mbm rpool/dump
# /usr/sbin/dumpadm -d /dev/zvol/dsk/rpool/dump



The above commands for creating the swap and dump zvols match
what the legacy installer does, as of build 90.


Could you please let me know, if my observations are correct
or if I should use different approach ?

As far as setting of volume block size is concerned (-b option),
how that numbers are to be determined ? Will they be the same in
different scenarios or are there plans to tune them in some way
in future ?


There are no plans to tune this.  The block sizes are appropriate
for the way the zvols are to be used.

  



Setting the swap blocksize to pagesize is interesting, but should be
ok for most cases.  The reason I say it is interesting is because it
is optimized for small systems, but not for larger systems which
typically see more use of large page sizes.  OTOH larger systems
should not swap, so it is probably a non-issue for them.  Small
systems should see this as the best solution.

Dump just sets the blocksize to the default, so it is a no-op.
 -- richard

  

[iii] Is there anything else I should be aware of ?
---
  



Installation should *not* fail due to running out of space because
of large dump or swap allocations.  I think the algorithm should
first take into account the space available in the pool after accounting
for the OS.


  

The Caiman team can make their own decision here, but we
decided to be more hard-nosed about disk space requirements in the
legacy install.  If the pool is too small to accommodate the recommended
swap and dump zvols, then maybe this system isn't a good candidate for
a zfs root pool.  Basically, we decided that since you almost
can't buy disks smaller than 60 GB these days, it's not worth much
effort to facilitate the setup of zfs root pools on disks that are smaller
than 

Re: [zfs-discuss] memory hog

2008-06-23 Thread Tim
On Mon, Jun 23, 2008 at 11:18 AM, Edward [EMAIL PROTECTED] wrote:

 Yes you are all correct. Ram cost nothing today, even though it might be
 bouncing back to their normal margin. DDR2 Ram are relatively cheap. Not to
 mention DDR3 will bring us double or more memory capacity.


Not likely.  Their *normal margins* were because of their collusion.  The
anti-trust lawsuit and subsequent multi-billion dollar settlement assured
we won't be seeing that again anytime soon.



 Most people could afford 4GB Ram on their Desktop today. With 8GB Ram for
 Prosumers. At todays price i reckon ALL systems, even entry level should
 have 2GB Ram Standard.


And most vista systems do.  OEM's slowly learned their lesson.




 But the sad thing is Windows XP / Vista is still 32Bit. It doesn't
 recognize more then 3.x GB of Ram. 64Bit version is still premature and
 hardly OEM are adopting it. Hardware makers have yet to full jump on broad
 for 64 bit drivers.


False; both of them recognize well in excess of 4GB of RAM.  What they CAN'T
do is address it for *ONE* process.  That's why applications like Oracle
were quick to hop on the 64-bit bandwagon; they actually need it.  I don't
know of too many consumer-level apps besides maybe Photoshop (and Firefox ;) )
that come anywhere near 4GB of RAM usage.






 This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle and ZFS

2008-06-23 Thread Richard Elling
Mertol Ozyoney wrote:

 Hi All ;

 One of our customer is suffered from FS being corrupted after an 
 unattanded shutdonw due to power problem.

 They want to switch to ZFS.

 From what I read on, ZFS will most probably not be corrupted from the 
 same event. But I am not sure how will Oracle be affected from a 
 sudden power outage when placed over ZFS ?

 Any comments ?


Most databases have the ability to recover from unscheduled interruptions
without causing corruption. ZFS works in the same way -- you will recover
to a stable point in time. In-flight transactions will not be completed, as
expected. Upon restart, ZFS recovery will happen first, followed by the
database recovery.

 PS: I am aware of UPS’s and smilar technologies but customer is still 
 asking those if ... questions ...



UPS's fail, too. When we design highly available services, we will
expect that unscheduled interruptions will occur -- that is the only way
to design effective solutions.
-- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS root finally here in SNV90

2008-06-23 Thread Lori Alt

Mike Gerdts wrote:

On Wed, Jun 4, 2008 at 11:18 PM, Rich Teer [EMAIL PROTECTED] wrote:
  

Why would one do that?  Just keep an eye on the root pool and all is good.



The only good argument I have for separating out some of /var is for
boot environment management.  I grew tired of repeating my arguments
and suggestions and wrote a blog entry.

http://mgerdts.blogspot.com/2008/03/future-of-opensolaris-boot-environment.html

  

Sorry it's taken me so long to weigh in on this.

The reason that the install program permits the user
to set up a separate /var dataset is because some
production environments require it.  More exactly, some
production environments require that /var have its
own slice so that unrestrained growth in /var can't
fill up the root file system.  (I have no idea whether
this is actually a good or sufficient policy.
I just know that some customer environments have
such a policy.)

With zfs, we don't actually have to put /var in its own
slice.  We can achieve the same goal by putting it
in its own dataset and assigning a quota to that dataset.
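Something along these lines, for example (the dataset name is hypothetical and
depends on how the boot environment is laid out, and the quota is arbitrary):

# zfs set quota=8g rpool/ROOT/snv_90/var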

That's really the only reason we offered this option.

Lori
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle and ZFS

2008-06-23 Thread Miles Nordin
 mo == Mertol Ozyoney [EMAIL PROTECTED] writes:

mo One of our customer is suffered from FS being corrupted after
mo an unattanded shutdonw due to power problem.

mo They want to switch to ZFS.

mo From what I read on, ZFS will most probably not be corrupted
mo from the same event.

It's not supposed to happen with UFS, either.  nor XFS, JFS, ext3,
reiserfs, FFS+softdep, plain FFS, mac-HFS+journal.  All filesystems in
popular use for many years except maybe NTFS are supposed to obey
fsync and survive kernel crashes and unplanned power outage that
happens after fsync returns, without losing any data written before
fsync was called.  The fact that they don't in practice is a warning
that ZFS might not, either, no matter what it promises in theory.

I think many cheap PeeCee RAID setups without batteries suffer from
``the RAID5 write hole'' which takes away all the guarantees of
no-power-fail-corruption that the filesystems made, and these broken
no-battery setups seem to be really popular.  If one used ZFS on top
of such a no-battery RAID instead of switching it to JBOD mode, ZFS
would be vulnerable, too.

One interesting part of ZFS's ``in theory'' pitch is that, if you use
redundancy with ZFS, the checksums may somewhat address this problem
described below:

 http://linuxmafia.com/faq/Filesystems/reiserfs.html

-8-
You see, when you yank the power cord out of the wall, not all parts
of the computer stop functioning at the same time. As the voltage
starts dropping on the +5 and +12 volt rails, certain parts of the
system may last longer than other parts. For example, the DMA
controller, hard drive controller, and hard drive unit may continue
functioning for several hundred of milliseconds, long after the DIMMs,
which are very voltage sensitive, have gone crazy, and are returning
total random garbage. If this happens while the filesystem is writing
critical sections of the filesystem metadata, well, you get to visit
the fun Web pages at http://You.Lose.Hard/ .

I was actually told about this by an XFS engineer, who discovered this
about the hardware. Their solution was to add a power-fail interrupt
and bigger capacitors in the power supplies in SGI hardware; and, in
Irix, when the power-fail interrupt triggers, the first thing the OS
does is to run around frantically aborting I/O transfers to the
disk. Unfortunately, PC-class hardware doesn't have power-fail
interrupts. Remember, PC-class hardware is cr*p.
-8-

I would suspect a ZFS mirror might have a better shot of coming
through that type of crazy power failure, but I don't know how
anything can be robust to a mysterious force that scribbles randomly
all over the disk.

On the downside there are some things I thought I understood about
SVM's ideas of quorum that I do not yet understand in the ZFS world.

also...FTR I use his ext3 rather than XFS myself, but I'm a little
skeptical of Ted Ts'o ranting above because he is defending a shortcut
he took writing his own filesystem.

And I'm not sure the cord-pulling problem he describes is really
universal, and is really a reason for XFS-users losing data that
ext3-users don't---it sounds like it could be a specific-quirk type
problem, a blip in history just like ``the 5-volt rail'' he talks
about (+5V?  what did they used to run on 5 volts, a disk motor or a
battery charger or something?).  The SGI engineers had the problem on
their specific hardware, and solved it, but it may or may not exist on
present machines.  Maybe current hardware has other equally weird
problems when one pulls the power cord.

-- 
READ CAREFULLY. By reading this fortune, you agree, on behalf of your employer,
to release me from all obligations and waivers arising from any and all
NON-NEGOTIATED  agreements, licenses, terms-of-service, shrinkwrap, clickwrap,
browsewrap, confidentiality, non-disclosure, non-compete and acceptable use
policies (BOGUS AGREEMENTS) that I have entered into with your employer, its
partners, licensors, agents and assigns, in perpetuity, without prejudice to my
ongoing rights and privileges. You further represent that you have the
authority to release me from any BOGUS AGREEMENTS on behalf of your employer.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle and ZFS

2008-06-23 Thread Keith Bierman

On Jun 23, 2008, at 11:36 AM, Miles Nordin wrote:

 unplanned power outage that
 happens after fsync returns

Aye, but isn't that the real rub ... when the power fails after the  
write but *before* the fsync has occurred...


-- 
Keith H. Bierman   [EMAIL PROTECTED]  | AIM kbiermank
5430 Nassau Circle East  |
Cherry Hills Village, CO 80113   | 303-997-2749
speaking for myself* Copyright 2008




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle and ZFS

2008-06-23 Thread Richard Elling
Miles Nordin wrote:
 mo == Mertol Ozyoney [EMAIL PROTECTED] writes:
 

 mo One of our customer is suffered from FS being corrupted after
 mo an unattanded shutdonw due to power problem.

 mo They want to switch to ZFS.

 mo From what I read on, ZFS will most probably not be corrupted
 mo from the same event.

 It's not supposed to happen with UFS, either.  nor XFS, JFS, ext3,
 reiserfs, FFS+softdep, plain FFS, mac-HFS+journal.  All filesystems in
 popular use for many years except maybe NTFS are supposed to obey
 fsync and survive kernel crashes and unplanned power outage that
 happens after fsync returns, without losing any data written before
 fsync was called.  The fact that they don't in practice is a warning
 that ZFS might not, either, no matter what it promises in theory.
   

There is a more common failure mode at work here.  Most low-cost
disks have their volatile write cache enabled.  UFS knows nothing of
such caches and believes the disk has committed data when it acks.
In other words, even with O_DSYNC and friends doing the right
thing in the OS, the disk lies about the persistence of the data.  ZFS
knows disks lie, so it sends sync commands when necessary to help
ensure that the data is flushed to persistent storage.  But even if it is
not flushed, the ZFS on-disk format is such that you can recover to
a point in time where the file system is consistent. This is not the
case for UFS which was designed to trust the hardware.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] memory hog

2008-06-23 Thread Charles Soto



On 6/23/08 11:59 AM, Tim [EMAIL PROTECTED] wrote:

 On Mon, Jun 23, 2008 at 11:18 AM, Edward [EMAIL PROTECTED] wrote:
 
 But the sad thing is Windows XP / Vista is still 32Bit. It doesn't
 recognize more then 3.x GB of Ram. 64Bit version is still premature and
 hardly OEM are adopting it. Hardware makers have yet to full jump on broad
 for 64 bit drivers.
 
 
 false, both of them recognize well in excess of 4GB of ram.  What they CAN'T
 do is address it for *ONE* process.  That's why applications like oracle
 were quick to hop on the 64bit bandwagon, they actually need it.  I don't
 know of too many consumer level apps besides maybe photoshop (and firefox ;)
 ) that come anywhere near 4GB ram usage.


While Edward is technically incorrect, the ceiling is still 4GB total
physical memory:

http://msdn.microsoft.com/en-us/library/aa366778.aspx

Note that even though

A 25% higher RAM ceiling is one thing, but it's a far cry from the 64-128GB
the enterprise target Windows versions can use (yes, some of them are
32-bit but if you pay the extra $, you are allowed to use more RAM).  The
3GB per-process limit is the real factor.  But then again, who runs Oracle
on Windows? :)

Charles
(ok, I have, but only for testing)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] memory hog

2008-06-23 Thread Tim
On Mon, Jun 23, 2008 at 1:26 PM, Charles Soto [EMAIL PROTECTED] wrote:




 On 6/23/08 11:59 AM, Tim [EMAIL PROTECTED] wrote:

  On Mon, Jun 23, 2008 at 11:18 AM, Edward [EMAIL PROTECTED] wrote:
 
  But the sad thing is Windows XP / Vista is still 32Bit. It doesn't
  recognize more then 3.x GB of Ram. 64Bit version is still premature and
  hardly OEM are adopting it. Hardware makers have yet to full jump on
 broad
  for 64 bit drivers.
 
 
  false, both of them recognize well in excess of 4GB of ram.  What they
 CAN'T
  do is address it for *ONE* process.  That's why applications like oracle
  were quick to hop on the 64bit bandwagon, they actually need it.  I don't
  know of too many consumer level apps besides maybe photoshop (and firefox
 ;)
  ) that come anywhere near 4GB ram usage.


 While Edward is technically incorrect, the ceiling is still 4GB total
 physical memory:

 http://msdn.microsoft.com/en-us/library/aa366778.aspx

 Note that even though

 A 25% higher RAM ceiling is one thing, but it's a far cry from the 64-128GB
 the enterprise target Windows versions can use (yes, some of them are
 32-bit but if you pay the extra $, you are allowed to use more RAM).  The
 3GB per-process limit is the real factor.  But then again, who runs Oracle
 on Windows? :)

 Charles
 (ok, I have, but only for testing)




Read the fine print:

Limits on physical memory for 32-bit platforms also depend on the Physical
Address Extension (PAE),
http://msdn.microsoft.com/en-us/library/aa366796%28VS.85%29.aspx,
which allows 32-bit Windows systems to use more than 4 GB of physical
memory.
PAE is enabled by default on XP after SP1, and all builds of Vista.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] memory hog

2008-06-23 Thread Brian Hechinger
On Mon, Jun 23, 2008 at 03:16:45PM -0400, Brian H. Nelson wrote:
 
 Limits on physical memory for 32-bit platforms also depend on the 
 Physical Address Extension 
 http://msdn.microsoft.com/en-us/library/aa366796%28VS.85%29.aspx 
 (PAE), which allows 32-bit Windows systems to use more than 4 GB of 
 physical memory.
 
 PAE is enabled by default on XP after SP1, and all builds of vista.
 
 Read the regular-sized print in the XP and Vista tables:
 
 Under Windows, the 4GB limit is a LICENSING limit, not a problem of 
 addressability, PAE or otherwise. The 4GB limit is also in place for 
 32-bit Windows Server Standard editions. If you want to be able to use 
 more memory, you need to pay more money (as Charles already stated).

Regardless of licensing issues, PAE is an ugly hack and shouldn't be used
if at all possible. ;)

-brian
-- 
Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix. -- IRC User (http://www.bash.org/?841435)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] memory hog

2008-06-23 Thread Erik Trimble
Brian Hechinger wrote:
 On Mon, Jun 23, 2008 at 03:16:45PM -0400, Brian H. Nelson wrote:
   
 Limits on physical memory for 32-bit platforms also depend on the 
 Physical Address Extension 
 http://msdn.microsoft.com/en-us/library/aa366796%28VS.85%29.aspx 
 (PAE), which allows 32-bit Windows systems to use more than 4 GB of 
 physical memory.

 PAE is enabled by default on XP after SP1, and all builds of vista.
   
 Read the regular-sized print in the XP and Vista tables:

 Under Windows, the 4GB limit is a LICENSING limit, not a problem of 
 addressability, PAE or otherwise. The 4GB limit is also in place for 
 32-bit Windows Server Standard editions. If you want to be able to use 
 more memory, you need to pay more money (as Charles already stated).
 

 Regardless of licensing issues, PAE is an ugly hack and shouldn't be used
 it at all possible. ;)

 -brian
   
But, but, but, PAE works so nice on my Solaris 8 x86 boxes for 
massive /tmp.   :-)


To be even more pedantic about XP, here's the FINAL word from Microsoft
about PAE and 2+ GB RAM support:

http://msdn.microsoft.com/en-us/library/ms791485.aspx

http://www.microsoft.com/whdc/system/platform/server/PAE/PAEmem.mspx


Bottom line:  Windows XP (any SP) supports a MAXIMUM of 4GB of RAM,
regardless of the various switches. This is a CODE limit, not a license
limit.  While there are a bunch of APIs which are nominally available
under XP for use of 4+GB address spaces, the OS kernel itself is limited
to 4GB of physical RAM.



Back on topic:  the one thing I haven't tried out is ZFS on a 
32-bit-only system with PAE, and more than 4GB of RAM.   Anyone?


-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [SOLVED] Confusion with snapshot send-receive

2008-06-23 Thread Cindy . Swearingen
I modified the ZFS Admin Guide to show a simple zfs send | zfs recv
example, then a more complex example using ssh to another system.

Thanks for the feedback...

Cindy

Andrius wrote:
 James C. McPherson wrote:
 
 Andrius wrote:

 Boyd Adamson wrote:

 Andrius [EMAIL PROTECTED] writes:

 Hi,
 there is a small confusion with send receive.

 zfs andrius/sounds was snapshoted @421 and should be copied to new
 zpool beta that on external USB disk.
 After
 /usr/sbin/zfs send andrius/[EMAIL PROTECTED] | ssh host1 /usr/sbin/zfs 
 recv beta
 or
 usr/sbin/zfs send andrius/[EMAIL PROTECTED] | ssh host1 /usr/sbin/zfs recv
 beta/sounds
 answer come
 ssh: host1: node name or service name not known

 What has been done bad?


 Your machine cannot resolve the name host1 into an IP address. 
 This is
 a network configuration problem, not a zfs problem.

 You should find that
 ssh host1

 fails too.

 Second pool is in the same machine. What to write instead of host2?


 try

 /usr/sbin/zfs send andrius/[EMAIL PROTECTED] | /usr/sbin/zfs recv beta/sounds


 You only need to pipe the zfs send output through ssh if
 you're actually sending it to a different system.


 James C. McPherson
 -- 
 Senior Kernel Software Engineer, Solaris
 Sun Microsystems
 http://blogs.sun.com/jmcphttp://www.jmcp.homeunix.com/blog

 
 Thanks, it works. Just strange that your simplified example is not in the
 ZFS Administration Guide. Somebody wanted to complicate things.
 
 
 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS-Performance: Raid-Z vs. Raid5/6 vs. mirrored

2008-06-23 Thread Richard Elling
Ralf Bertling wrote:
 Hi list,
 as this matter pops up every now and then in posts on this list I just 
 want to clarify that the real performance of RaidZ (in its current 
 implementation) is NOT anything that follows from raidz-style data 
 efficient redundancy or the copy-on-write design used in ZFS.

 In a M-Way mirrored setup of N disks you get the write performance of 
 the worst disk and a read performance that is the sum of all disks 
 (for streaming and random workloads, while latency is not improved)
 Apart from the write performance you get very bad disk utilization 
 from that scenario.

I beg to differ with very bad disk utilization.  IMHO you get perfect
disk utilization for M-way redundancy :-)

 In Raid-Z currently we have to distinguish random reads from streaming 
 reads:
 - Write performance (with COW) is (N-M)*worst single disk write 
 performance since all writes are streaming writes by design of ZFS 
 (which is N-M-1 times faste than mirrored)
 - Streaming read performance is N*worst read performance of a single 
 disk (which is identical to mirrored if all disks have the same speed)
 - The problem with the current implementation is that N-M disks in a 
 vdev are currently taking part in reading a single byte from a it, 
 which i turn results in the slowest performance of N-M disks in question.

You will not be able to predict real-world write or sequential
read performance with this simple analysis because there are
many caches involved.  The caching effects will dominate for
many cases.  ZFS actually works well with write caches, so
it will be doubly difficult to predict write performance.

You can predict small, random read workload performance,
though, because you can largely discount the caching effects
for most scenarios, especially JBODs.


 Now lets see if this really has to be this way (this implies no, 
 doesn't it ;-)
 When reading small blocks of data (as opposed to streams discussed 
 earlier) the requested data resides on a single disk and thus reading 
 it does not require to send read commands to all disks in the vdev. 
 Without detailed knowledge of the ZFS code, I suspect the problem is 
 the logical block size of any ZFS operation always uses the full 
 stripe. If true, I think this is a design error.

No, the reason is that the block is checksummed and we check
for errors upon read by verifying the checksum.  If you search
the zfs-discuss archives you will find this topic arises every 6
months or so.  Here is a more interesting thread on the subject,
dated November 2006:
http://mail.opensolaris.org/pipermail/zfs-discuss/2006-November/035711.html

You will also note that for fixed record length workloads, we
tend to recommend the blocksize be matched with the ZFS
recordsize. This will improve efficiency for reads, in general.
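For example, for a database doing fixed 8 KB reads and writes that would be
something like (dataset name hypothetical):

# zfs set recordsize=8k tank/oracle/data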

 Without that, random reads to a raid-z are almost as fast as mirrored 
 data. 
 The theoretical disadvantages come from disks that have different 
 speed (probably insignificant in any real-life scenario) and the 
 statistical probability that by chance a few particular random reads 
 do in fact have to access the same disk drive to be fulfilled. (In a 
 mirrored setup, ZFS can choose from all idle devices, whereas in 
 RAID-Z it has to wait for the disk that holds the data to be ready 
 processing its current requests).
 Looking more closely, this effect mostly affects latency (not 
 performance) as random read-requests coming in should be distributed 
 equally across all devices even bette if the queue of requests gets 
 longer (this would however require ZFS to reorder requests for 
 maximum performance.

ZFS does re-order I/O.  Array controllers re-order the re-ordered
I/O. Disks then re-order I/O, just to make sure it was re-ordered
again. So it is also difficult to develop meaningful models of disk
performance in these complex systems.


 Since this seems to be a real issue for many ZFS users, it would be 
 nice if someone who has more time than me to look into the code, can 
 comment on the amount of work required to boost RaidZ read performance.

Periodically, someone offers to do this... but I haven't seen an
implementation.


 Doing so would level the tradeoff between read- write- performance and 
 disk utilization significantly.
 Obviously if disk space (and resulting electricity costs) do not 
 matter compared to getting maximum read performance, you will always 
 be best of with 3 or even more way mirrors and a very large number of 
 vdevs in your pool.

Space, performance, reliability: pick two.

sidebar
The ZFS checksum has proven to be very effective at
identifying data corruption in systems.  In a traditional
RAID-5 implementation, like SVM, the data is assumed
to be correct if the read operation returned without an
error. If you try to make SVM more reliable by adding a
checksum, then you will end up at approximately the
same place ZFS is: by distrusting the hardware you take
a performance penalty, but improve your data 

Re: [zfs-discuss] ZFS root finally here in SNV90

2008-06-23 Thread Mike Gerdts
On Mon, Jun 23, 2008 at 4:04 PM, Orvar Korvar
[EMAIL PROTECTED] wrote:
 Wouldn't it be nice to break out all file systems into separate zfs file 
 systems? Then you could snapshot each file system individually. Just like 
 each user has his own filesystem, and I can snapshot that filesystem 
 independently from other users.

 Because right now, if I do a snapshot of /, then everything gets snapshotted, 
 even /var which changes a lot. I don't want to snapshot /var. I only want to 
 snapshot /usr.


Some things in /var are likely appropriate to snapshot with /.  For
example, /var/sadm has lots of information about which packages and
patches are installed.  There is a lot of other stuff that shouldn't
be snapshotted with it.  I have proposed /var/share to cope with this.

http://mgerdts.blogspot.com/2008/03/future-of-opensolaris-boot-environment.html
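
As a rough illustration, a minimal sketch assuming a hypothetical pool
"rpool" with one dataset per user (names are made up):

  zfs create -p rpool/export/home/alice
  zfs create -p rpool/export/home/bob
  zfs snapshot rpool/export/home/alice@today    # snapshots only alice's files
  zfs snapshot -r rpool/export/home@today       # or everything under home at once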

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CIFS HA service with solaris 10 and SC 3.2

2008-06-23 Thread Ross
Yeah, that's something I'd love to see.  CIFS isn't quite there yet, but it's 
miles ahead of Samba, and as soon as it is ready we'll want to be rolling it 
out under Sun Cluster.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CIFS HA service with solaris 10 and SC 3.2

2008-06-23 Thread Richard Elling
Marcelo Leal wrote:
 Thanks all for the answers!
 Seems like the way to get an OpenSolaris storage solution is the CIFS 
 project. And there is no agent to provide HA, so that seems like a good project 
 too.

   

Currently, the HA-NFS service requires that you disable the
sharenfs property.
http://docs.sun.com/app/docs/doc/820-2565/geaov?a=view

I'm not sure of the reasoning here, except that the NFS agent
monitor currently reads dfstab for configuration information.

ZFS offers a different approach. For the alias, Solaris Cluster
will know that for an HA-NFS implementation, there will be
some devices containing a file system which must be mounted
prior to starting the NFS service. This ballet is scheduled based
on the configuration of the cluster and its services, including IP
addresses, storage affinity, etc.  With ZFS sharenfs, shareiscsi,
and sharesmb, some of the ballet steps are combined with
zpool import. IMHO, it would be worthwhile to investigate
how to leverage and adjust the NFS and Samba agents to
also understand how ZFS works and do the right thing.  Adding
iSCSI should be a trivial addition.  Might be a good project...
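
For instance, a minimal sketch assuming a hypothetical pool "tank" (dataset
names are made up; the shares follow the pool to whichever node imports it):

  zfs set sharenfs=on tank/export         # the NFS share is part of the dataset
  zfs set sharesmb=on tank/cifs           # CIFS share via the in-kernel server
  zfs set shareiscsi=on tank/vols/lun0    # iSCSI target for a zvol
  zpool export tank                       # on the failing node
  zpool import tank                       # on the takeover node; shares reappear
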
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle and ZFS

2008-06-23 Thread Miles Nordin
 re == Richard Elling [EMAIL PROTECTED] writes:
 kb == Keith Bierman [EMAIL PROTECTED] writes:

re the disk lies about the persistence of the data.  ZFS knows
re disks lie, so it sends sync commands when necessary

(1) i don't think ``lie'' is a correct characterization given that the
sync commands exist, but point taken about the other area of risk.

I suspect there may be similar problems in ZFS's write path when
one is using iSCSI targets.  Or it's just common for iSCSI target
implementations to suck (lie).  or maybe it's something else I'm
seeing.

(2) i thought the recommendation that one give ZFS whole disks and let
it put EFI labels on them came from the Solaris behavior that,
only in a whole-disk-for-zfs configuration, will the Solaris
drivers refrain from explicitly disabling the write cache in these
inexpensive disks.  so the cache shouldn't be a problem for UFS,
but it might be for non-Solaris operating systems (even for ZFS on
platforms where ZFS is ported but the SYNCHRONIZE CACHE commands
don't make it through some mid-layer or CAM or driver).

kb Aye, but isn't that the real rub ... when the power fails
kb after the write but *before* the fsync has occurred...

no, there is no rub here, I was only speaking precisely.  A proper
DBMS (anything except MySQL) is also designed to understand that power
failures happen.  It does its writes in a deliberate order such that
it won't return success to the application calling it until it gets
the return from fsync(), and also so that the system will never
recover such that a transaction is half-completed.

re the ZFS on-disk format is such that you can recover to a point
re in time where the file system is consistent.

do you mean that, ``after a power outage ZFS will always recover the
filesystem to a state that it passed through in the moments leading up
to the outage,'' while UFS, which logs only metadata, typically
recovers to some state the filesystem never passed through---but it
never loses fsync()ed data nor data that wasn't written ``recently''
before the crash?

For casual filesystem use, or for applications that weren't designed
with cord-pulling in mind, ZFS's guarantee is larger and more
comforting.  But for databases, I don't think the distinction matters
because they call fsync() at deliberate moments and do their own
copy-on-write logging above the filesystem, so they provide the same
consistency guarantees whether operating on UFS or ZFS.  It would be
fine to feed a database the type of hacked non-CoW zvol that's used
for swap, if fsync could be made to work there, which maybe it can't.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CIFS HA service with solaris 10 and SC 3.2

2008-06-23 Thread Bob Friesenhahn
On Mon, 23 Jun 2008, Ross wrote:

 Yeah, that's something I'd love to see.  CIFS isn't quite there yet, 
 but it's miles ahead of Samba, and as soon as it is ready we'll want 
 to be rolling it out under Sun Cluster.

If Samba has already been there for many people for many years, in what 
way is native CIFS miles ahead of Samba?  Does this apply to CIFS in 
general or just for HA?  This is not meant as a silly question since I 
would like to understand the benefits (beyond the native ACLs).

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] memory hog

2008-06-23 Thread Brian Hechinger
On Mon, Jun 23, 2008 at 01:36:53PM -0700, Erik Trimble wrote:
 But, but, but, PAE works so nice on my Solaris 8 x86 boxes for 
 massive /tmp.   :-)

What CPU?  If it's a 64-bit CPU, you don't need PAE. ;)

 Back on topic:  the one thing I haven't tried out is ZFS on a 
 32-bit-only system with PAE, and more than 4GB of RAM.   Anyone?

Probably poorly.  ZFS needs address space, which is lacking in a 32-bit
kernel.

-brian
-- 
Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix. -- IRC User (http://www.bash.org/?841435)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool iostat

2008-06-23 Thread Brian Hechinger
On Thu, Jun 19, 2008 at 10:06:19AM +0100, Robert Milkowski wrote:
 Hello Brian,
 
 BH A three-way mirror and three disks in a double parity array are going to 
 get you
 BH the same usable space.  They are going to get you the same level of 
 redundancy.
 BH The only difference is that the RAIDZ2 is going to consume a lot more CPU 
 cycles
 BH calculating parity for no good cause.
 
 And you will also get higher IOPS with 3-way mirror.

That's a good point that I completely forgot to make, thanks!
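
To make the comparison concrete, a minimal sketch with made-up device
names (either layout survives any two disk failures and yields one disk's
worth of usable space):

  zpool create mtank mirror c1t0d0 c1t1d0 c1t2d0   # 3-way mirror: a read can be served by any disk
  zpool create ztank raidz2 c1t3d0 c1t4d0 c1t5d0   # raidz2: same space and redundancy, more parity math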

-brian
-- 
Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix. -- IRC User (http://www.bash.org/?841435)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS root finally here in SNV90

2008-06-23 Thread Richard Elling
Brian Hechinger wrote:
 On Mon, Jun 23, 2008 at 11:18:21AM -0600, Lori Alt wrote:
   
 Sorry it's taken me so long to weigh in on this.
 

 You're busy with important things, we'll forgive you. ;)

   
 With zfs, we don't actually have to put /var in its own
 slice.  We can achieve the same goal by putting it
 in its own dataset and assigning a quota to that dataset.

 That's really the only reason we offered this option.
 

 And thank you for doing so.  I will always put /var in its own area
 even if the definition of that area has changed with the use of ZFS.

 Rampant writes to /var can *still* run / out of space even on ZFS, being
 able to keep that from happening is never a bad idea as far as I'm
 concerned. :)

   

I think the ability to have different policies for file systems
is pure goodness -- though you pay for it on the backup/
restore side.
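
By "different policies" I mean things like the following minimal sketch
(dataset names and property values are only examples):

  zfs set compression=on rpool/ROOT/snv90/var   # compress the chatty stuff
  zfs set atime=off rpool/export/home           # skip access-time updates for home dirs
  zfs set copies=2 rpool/export/home/docs       # extra redundancy for precious files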

A side question though, my friends who run Windows,
Linux, or OSX don't seem to have this bias towards isolating
/var.  Is this a purely Solaris phenomenon?  If so, how do we
fix it?
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS root finally here in SNV90

2008-06-23 Thread Chris Cosby
On Mon, Jun 23, 2008 at 8:45 PM, Richard Elling [EMAIL PROTECTED]
wrote:

 Brian Hechinger wrote:
  On Mon, Jun 23, 2008 at 11:18:21AM -0600, Lori Alt wrote:
 
  Sorry it's taken me so long to weigh in on this.
 
 
  You're busy with important things, we'll forgive you. ;)
 
 
  With zfs, we don't actually have to put /var in its own
  slice.  We can achieve the same goal by putting it
  in its own dataset and assigning a quota to that dataset.
 
  That's really the only reason we offered this option.
 
 
  And thank you for doing so.  I will always put /var in its own area
  even if the definition of that area has changed with the use of ZFS.
 
  Rampant writes to /var can *still* run / out of space even on ZFS, being
  able to keep that from happening is never a bad idea as far as I'm
  concerned. :)
 
 

 I think the ability to have different policies for file systems
 is pure goodness -- though you pay for it on the backup/
 restore side.

 A side question though, my friends who run Windows,
 Linux, or OSX don't seem to have this bias towards isolating
 /var.  Is this a purely Solaris phenomenon?  If so, how do we
 fix it?

I don't think it's a Solaris phenomenon, and it's not really a /var thing.
UNIX heads have always had to contend with the disaster that is a full /
filesystem. /var was always the most common culprit for causing it to run
out of space. If you talk to the really paranoid among us, we run a
read-only root filesystem. The real way to fix it, in zfs terms, is to
reserve a minimum amount of space in / - thereby guaranteeing that you don't
fill up your root filesystem.
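
Something along these lines, as a minimal sketch (the dataset name and the
size are made up):

  # guarantee the root dataset a minimum amount of space,
  # no matter what the other datasets in the pool do
  zfs set reservation=2G rpool/ROOT/snv90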


  -- richard

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




-- 
chris -at- microcozm -dot- net
=== Si Hoc Legere Scis Nimium Eruditionis Habes
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS root finally here in SNV90

2008-06-23 Thread Brian Hechinger
On Mon, Jun 23, 2008 at 05:45:45PM -0700, Richard Elling wrote:
 
 I think the ability to have different policies for file systems
 is pure goodness -- though you pay for it on the backup/
 restore side.

That's a price I for one am willing to pay. ;)

 A side question though, my friends who run Windows,
 Linux, or OSX don't seem to have this bias towards isolating
 /var.  Is this a purely Solaris phenomenon?  If so, how do we
 fix it?

This is not a purely Solaris phenomenon, this is a UNIX phenomenon.
People who run Linux or OSX (I can't speak for Windows users) tend to
be new to the game and feel that This 40/80/500GB disk will never
fill up and so don't believe that separating /var is needed.

It doesn't matter how big your disk is, a rampant process can fill up
any disk of any size, it's just a matter of how long it takes.

It isn't just /var that can cause trouble either, it's just that /var
is the usual suspect since it's the filesystem that tends to get written
to by the largest number of different processes.  /tmp on a system that
doesn't do tmpfs (BSD for example) is another likely candidate.

Keeping / as far away from everything else as possible is never a bad
idea.  ZFS only makes this task easier (IMHO) since you can set quotas
and reserves on different filesystems, thus protecting yourself from
damage, and also at the same time not wasting disk space that could
be better used elsewhere.
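
Concretely, something like this minimal sketch (dataset names and sizes are
made up):

  zfs set quota=8G rpool/ROOT/snv90/var            # a runaway logger can't eat the whole pool
  zfs set reservation=512M rpool/ROOT/snv90/var    # ...but /var is also promised a minimum
  zfs get quota,reservation rpool/ROOT/snv90/var   # check both at once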

Opinionated? Me?

Yes.  ;)

-brian
-- 
Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix. -- IRC User (http://www.bash.org/?841435)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS root finally here in SNV90

2008-06-23 Thread Maurice Castro
Hi All,
the separating of /var is something that comes from the Unix  
tradition. Much of the Unix tradition of systems administration is  
based on making sure systems with many users remain stable and so  
administrators are prepared to work to make the system more reliable.  
Common Windows, Linux and OS X practices are dominated by the concept 
of a personal computer, i.e. you only hurt yourself, so ease is a priority 
for them.

The original filesystem layout separated
/
/var
/tmp
/usr
onto separate filesystems. In the bad old days, every write carried the 
risk that the filesystem might be left unstable, so the aim was to 
minimise writes to /, since without / even booting to a minimal 
environment is a serious trial.

/tmp was used for data that is not required to persist over reboots.
/var was used for data that should persist over reboots

The other filesystems were used to store user files / non-minimal boot  
programs etc

By separating the filesystems it is possible to make a far more  
recoverable system in the event of:
- a user deciding to fill up all of one piece of temporary storage 
(ramdisk /tmp was one of those optimisations that Sun made that had 
some serious negative consequences; many admins on large shared 
systems turn it back into a disk-backed filesystem)
- a high write rate to other filesystems (separation reduces the risk 
of boot-affecting writes being made)

So keeping /var and /tmp separate makes life much easier. Some of us 
have even been known to run with a read-only root filesystem.

Linux and Windows users appear to value the flexibility of not having 
to make system-use decisions (i.e. how big /var and /tmp should be) at 
installation, and of being able to use the disk as they see fit; however, 
they are typically not managing systems for others, and so they have 
made a choice of convenience which can be seriously inconvenient in a 
shared environment.

Maurice Castro

On 24/06/2008, at 10:45 AM, Richard Elling wrote:


 I think the ability to have different policies for file systems
 is pure goodness -- though you pay for it on the backup/
 restore side.

 A side question though, my friends who run Windows,
 Linux, or OSX don't seem to have this bias towards isolating
 /var.  Is this a purely Solaris phenomenon?  If so, how do we
 fix it?
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Oracle and ZFS

2008-06-23 Thread Richard Elling
Miles Nordin wrote:
 re == Richard Elling [EMAIL PROTECTED] writes:
 kb == Keith Bierman [EMAIL PROTECTED] writes:
 

 re the disk lies about the persistence of the data.  ZFS knows
 re disks lie, so it sends sync commands when necessary

 (1) i don't think ``lie'' is a correct characterization given that the
 sync commands exist, but point taken about the other area of risk.
   

IMNSHO, they lie. Some disks do not disable volatile write
caches, even when you ask them.  I've got a scar... right there
below the ORA-27062 and next to the FC-disk firmware scars...
I think Torrey's is on his backside... :-)

 I suspect there may be similar problems in ZFS's write path when
 one is using iSCSI targets.  Or it's just common for iSCSI target
 implementations to suck (lie).  or maybe it's something else I'm
 seeing.
   

I hope they aren't making assumptions about volatility...

 (2) i thought the recommendation that one give ZFS whole disks and let
 it put EFI labels on them came from the Solaris behavior that,
 only in a whole-disk-for-zfs configuration, will the Solaris
 drivers refrain from explicitly disabling the write cache in these
 inexpensive disks.  so the cache shouldn't be a problem for UFS,
 but it might be for non-Solaris operating systems (even for ZFS on
 platforms where ZFS is ported but the SYNCHRONIZE CACHE commands
 don't make it through some mid-layer or CAM or driver).
   

Close.  By default, Solaris will try to disable the write cache,
ostensibly to protect UFS.  But if the whole disk is in use by
ZFS, then it will enable the write cache and ZFS uses the
synchronize cache commands, as appropriate.  Solaris is a
bit conservative here, maybe too conservative.  In some
cases you can override it with format -e.
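
For the curious, the knob lives in format's expert mode; a rough sketch of
the interactive path (the exact menu wording varies with driver and disk type):

  # format -e
  #   select the disk
  #   cache -> write_cache -> display    (show the current state)
  #   cache -> write_cache -> enable     (or disable)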

 kb Aye, but isn't that the real rub ... when the power fails
 kb after the write but *before* the fsync has occurred...

 no, there is no rub here, I was only speaking precisely.  A proper
 DBMS (anything except MySQL) is also designed to understand that power
 failures happen.  It does its writes in a deliberate order such that
 it won't return success to the application calling it until it gets
 the return from fsync(), and also so that the system will never
 recover such that a transaction is half-completed.
   

ZFS has similar protections. The most interesting is that since it is
COW, the metadata is (almost) never overwritten.  The almost
applies to the uberblocks which use a circular queue.

 re the ZFS on-disk format is such that you can recover to a point
 re in time where the file system is consistent.

 do you mean that, ``after a power outage ZFS will always recover the
 filesystem to a state that it passed through in the moments leading up
 to the outage,'' while UFS, which logs only metadata, typically
 recovers to some state the filesystem never passed through---but it
 never loses fsync()ed data nor data that wasn't written ``recently''
 before the crash?
   

The system can lose fsync()ed data if UFS thinks it wrote
to persistent storage, but was actually writing to volatile
storage.  This may be less common, though.  I think the
more common symptom is a need to fsck to rebuild the
metadata.

 For casual filesystem use, or for applications that weren't designed
 with cord-pulling in mind, ZFS's guarantee is larger and more
 comforting.  But for databases, I don't think the distinction matters
 because they call fsync() at deliberate moments and do their own
 copy-on-write logging above the filesystem, so they provide the same
 consistency guarantees whether operating on UFS or ZFS.  It would be
 fine to feed a database the type of hacked non-CoW zvol that's used
 for swap, if fsync could be made to work there, which maybe it can't.
   

Hacked non-COW zvol?  Since COW occurs at the DMU layer,
below ZPL or ZVol, I don't see how to bypass it.  AFAIK,
the trick to using ZVols for swap was to just fix some bugs
in ZFS and rewrite the pertinent parts of the installer(s).

The subject of a non-COW volume does come up periodically.
I refer to these as raw devices :-)  Since many of the
features of ZFS depend on COW, if you get rid of COW
then you get rid of the features, and you might as well use
raw devices, no?
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS root finally here in SNV90

2008-06-23 Thread Richard Elling
Maurice Castro wrote:
 Hi All,
   the separating of /var is something that comes from the Unix  
 tradition. Much of the Unix tradition of systems administration is  
 based on making sure systems with many users remain stable and so  
 administrators are prepared to work to make the system more reliable.  
 Common Windows, Linux and OS X practices are dominated by the concept  
 of a personal computer ie you only hurt yourself so ease is a priority  
 to them.
   

So the consensus is that if we are to compete with them at
the desktop, then a simpler, easier to maintain file system
structure is cool.

 The original filesystem layout separated
 /
 /var
 /tmp
 /usr
 onto separate filesystems. In the bad old days every time there is a  
 write there is risk that the filesystem may be made unstable so the  
 aim was to minimise writes to / as without / booting to a minimal  
 environment is a serious trial.
   

Actually, no.  There was /.  /var didn't show up until SunOS 4
circa 1988.  /usr was made separable when it began to grow
bigger than the disks available at the time, circa 1986 or so.
In any case, well after UNIX was established.

 /tmp was used for data that is not required to persist over reboots.
 /var was used for data that should persist over reboots

 The other filesystems were used to store user files / non-minimal boot  
 programs etc

 By separating the filesystems it is possible to make a far more  
 recoverable system in the event of:
 - a user deciding to fill up all of one piece of temporary storage  
 (ramdisk /tmp was one of those optimisations that sun made that had  
 some serious negative consequences; many admins on large shared  
 systems make it back into a disk backed filesystem)
 - high write rate to other filesystems reduces risk of boot affecting  
 writes from being made
   

The reason for separating was very different, though this was
also a side-product.  In the days of diskless systems, you could
share parts of the OS as read-only.  /usr originally contained
/usr/tmp and /usr/Richard (or whatever). To make /usr be
read-only, user home directories and tmp had to be moved
out to /var.  / also was unique to each diskless client, so while
they could share much of the stuff in /usr, each had to have
its own /.  Some people also took advantage of the fact that
/usr/spool is now in /var/spool, where the printer files and
mail collected and separated these out so that you wouldn't
have to back them up (tapes were 60MBytes or so at the
time).  So, there you have it: /, /var, /usr, and /export/home.
Each has a different policy, which is the key here.

NB.  UFS, by default, reserves 10% to which only root can
write.  So a regular user could not directly impact a running
UNIX system using UFS. ZFS does not have such a reserve,
so if you want to implement it, you will end up with a
separate file system somewhere.
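
One way to approximate it is a minimal sketch like the following (names and
sizes are made up): an empty "slack" dataset whose reservation can be
dropped in an emergency.

  zfs create -o reservation=10G -o mountpoint=none tank/slack   # hold back ~10% of a 100G pool
  zfs set reservation=none tank/slack                           # release it when you really need it
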
 -- richard

 So keeping /var and /tmp separate make life much easier. Some of us  
 have even been known to run with a read-only root filesystem.

 Linux and windows users appear to value the flexibility of not having  
 to make system use decisions ie how big /var and /tmp should be at  
 installation and being able to use the disk as they see fit; however,  
 they are typically not managing systems for others and so they have  
 made a choice of convenience which can be seriously inconvenient in a  
 shared environment.

 Maurice Castro

 On 24/06/2008, at 10:45 AM, Richard Elling wrote:

   
 I think the ability to have different policies for file systems
 is pure goodness -- though you pay for it on the backup/
 restore side.

 A side question though, my friends who run Windows,
 Linux, or OSX don't seem to have this bias towards isolating
 /var.  Is this a purely Solaris phenomenon?  If so, how do we
 fix it?
 -- richard
 

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs primarycache and secondarycache properties

2008-06-23 Thread Darren Reed
Moved from PSARC to zfs-code...this discussion is separate from the case.

Eric kustarz wrote:

 On Jun 23, 2008, at 1:20 PM, Darren Reed wrote:

 eric kustarz wrote:

 On Jun 23, 2008, at 1:07 PM, Darren Reed wrote:

 Tim Haley wrote:
 
 primarycache=all | none | metadata

 Controls what is cached in the primary cache (ARC).  If set to
 all, then both user data and metadata is cached.  If set to
 none, then neither user data nor metadata is cached.  If set to
 metadata, then only metadata is cached.  The default 
 behavior is
 all.


 The description above kind of implies that user data is somehow 
 separate from metadata
 but it isn't possible to say cache only user data (with the text 
 given.)  Is this just an
 oversight or is this really saying you cannot cache only the user 
 data?

 We couldn't come up with any realistic workload that would want to 
 cache user data but not metadata, so we're not allowing it.

 We can always add the option later, but if someone has a realistic 
 use case for it, i'd be happy to add it now.

 It's not so much the why, but maybe I'd like to say the primarycache
 gets metadata and the secondary cache gets user data (or vice versa.)
 Does that make sense?  Or would that require linkage between metadata
 and user data (across cache boundaries) in order to maintain sanity?

 It is the why.  If there's no reason to do it, then we shouldn't 
 allow it (adds more complexity, more confusion, more ways for a 
 customer to shoot themselves in the foot).

 However, if there is a legitimate use case, let's discuss that.

In considering the why, being aware of some implementation details
seems necessary, such as:
- is there a difference in size between the primary and secondary cache
- how big is the meta data relative to user data
- how many meta data items are there relative to user data

So if I'm constantly accessing just a few files, I may prefer to have all of
the metadata cached by a smaller (primary?) cache, and for user data not
to be able to cause any problems there (that capability is there now, ok).
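
With the proposed properties, that could look something like this minimal
sketch (dataset names are made up):

  zfs set primarycache=metadata tank/bulk     # bulk data can't push metadata out of the ARC
  zfs set primarycache=all tank/hotfiles      # the few hot files stay fully cached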

So the question becomes why does one not want the metadata in the same
cache as the user data?

So I spent some time thinking about different directions you could build
on this in the future, for example:
1) controlling the size of the ARC/L2ARC by controlling the cache size
2) specifying different backing storage for primary/secondary cache
3) having more than two levels of cache
...none of which is precluded by current efforts.

With (2), if the backing storage for each cache is different and it is 
slower
to access the secondary cache than the primary, then you may not want
metadata to be stored in the secondary cache for performance reasons.

As an example, you might be using NVRAM (be it flash or otherwise)
for the primary cache and ordinary RAM for the secondary.  In this case
you probably don't want any metadata to be stored in the secondary
cache (power failure issues) but  the same may not hold for user data.
But I'm probably wrong about that.

Darren

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS root finally here in SNV90

2008-06-23 Thread Mike Gerdts
On Mon, Jun 23, 2008 at 8:06 PM, Brian Hechinger [EMAIL PROTECTED] wrote:
 This is not a purely Solaris phenomenon, this is a UNIX phenomenon.
 People who run Linux or OSX (I can't speak for Windows users) tend to
 be new to the game and feel that This 40/80/500GB disk will never
 fill up and so don't believe that separating /var is needed.

Why is having a full /var so much better than having a full /?  I've
had a number of Solaris systems fail to boot because it can't update
/var/adm/utmpx, but I've never had one fail to boot because / was
full.  As best as I can deduce, the root file system corruption when
it gets full is a combination of ancient history and urban legend.
I've brought this up on a lengthy thread over at sysadmin-discuss a
while back and have had no one refute my assertion with credible data.

http://mail.opensolaris.org/pipermail/sysadmin-discuss/2007-September/001668.html

I've also shared more detailed thoughts on file system sprawl at...

http://mail.opensolaris.org/pipermail/sysadmin-discuss/2007-September/001641.html

Really it boils down to this: lots of file systems to hold the OS add
administrative complexity and rarely save more work than they create.
I believe this especially holds true for enterprise server
environments where downtime is really expensive.  I much prefer to ask
for a 3 hour outage to patch than a 5 hour outage to relayout file
systems then patch.  Of course today's development work will make the
3 hour outage for patching a thing of ancient history as well.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS root finally here in SNV90

2008-06-23 Thread ian
Mike Gerdts writes: 

 On Mon, Jun 23, 2008 at 8:06 PM, Brian Hechinger [EMAIL PROTECTED] wrote:
 This is not a purely Solaris phenomenon, this is a UNIX phenomenon.
 People who run Linux or OSX (I can't speak for Windows users) tend to
 be new to the game and feel that This 40/80/500GB disk will never
 fill up and so don't believe that separating /var is needed.
 
With ZFS boot, the point is moot. 

 Why is having a full /var so much better than having a full /?  I've
 had a number of Solaris systems fail to boot because it can't update
 /var/adm/utmpx, but I've never had one fail to boot because / was
 full.  As best as I can deduce, the root file system corruption when
 it gets full is a combination of ancient history and urban legend.
 I've brought this up on a lengthy thread over at sysadmin-discuss a
 while back and have had no one refute my assertion with credible data. 
 
We can only hope that ZFS boot will consign this never-ending layout 
argument to the dust of history. 

Ian 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] mv between ZFSs on same zpool

2008-06-23 Thread Darren Reed
Yaniv Aknin wrote:
 Thanks for the reference.

 I read that thread to the end, and saw there are some complex considerations 
 regarding changing st_dev on an open file, but no decision. Despite this 
 complexity, I think the situation is quite brain damaged - I'm moving large 
 files between ZFSs all the time, otherwise I can't separate the tree as I'd 
 like to, and it's fairly annoying to think these blocks are basically not 
 doing anything at 50MB/s.

 I think even a hack will do for a start (do I hear 'zmv').

 Thoughts? Objections?
   

It is filed as an RFE - 6650426 and 6483179 are related.

Darren

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss