Re: [zfs-discuss] x4500 vs AVS ?

2008-09-10 Thread Victor Latushkin
On 09.09.08 19:32, Richard Elling wrote:
 Ralf Ramge wrote:
 Richard Elling wrote:

 Yes, you're right. But sadly, in the mentioned scenario of having 
 replaced an entire drive, the entire disk is rewritten by ZFS.
 No, this is not true.  ZFS only resilvers data.
 Okay, I see we have a communication problem here. Probably my fault, I 
 should have written the entire data and metadata.
 I made the assumption that a 1 TB drive in a X4500 may have up to 1 TB 
 of data on it. Simply because nobody buys the 1 TB X4500 just to use 
 10% of the disk space, he would have bought the 250 GB, 500 GB or 750 
 GB model then.
 
 Actually, they do :-)  Some storage vendors insist on it, to keep
 performance up -- short-stroking.
 
 I've done several large-scale surveys of this and the average usage
 is 50%.  This is still a large difference in resilver times between
 ZFS and SVM.

There is RFE 6722786, "resilver on mirror could reduce window of 
vulnerability", which aims to reduce this difference for mirrors.

See here: http://bugs.opensolaris.org/view_bug.do?bug_id=6722786

Wbr,
Victor


Re: [zfs-discuss] ZFS over multiple iSCSI targets

2008-09-10 Thread James Andrewartha
Tuomas Leikola wrote:
 On Mon, Sep 8, 2008 at 8:35 PM, Miles Nordin [EMAIL PROTECTED] wrote:
ps iSCSI with respect to write barriers?

 +1.

 Does anyone even know of a good way to actually test it?  So far it
 seems the only way to know if your OS is breaking write barriers is to
 trade gossip and guess.
 
 Write a program that writes backwards (every other block to avoid
 write merges) with and without O_DSYNC, measure speed.
 
 I think you can also deduce driver and drive cache flush correctness
 by calculating the best theoretical correct speed (which should be
 really slow, one write per disc spin)
 
 this has been on my TODO list for ages.. :(

Does the perl script at http://brad.livejournal.com/2116715.html do what you
want?
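
For anyone who wants to experiment, below is a rough sketch (in Python) of the
kind of test Tuomas describes above.  It is untested, the block size and region
size are just placeholders, and it overwrites whatever it is pointed at, so use
scratch storage only.

#!/usr/bin/env python
# Sketch of a write-barrier sanity check: write every other 4 KB block
# backwards over a small region, once with O_DSYNC and once without,
# and compare the achieved writes per second.
# WARNING: destructive -- point it only at a scratch device or file
# that already exists and is at least 8 MB.
import os, sys, time

if len(sys.argv) != 2:
    sys.exit("usage: barrier_test.py <scratch device or file>")
DEV = sys.argv[1]
BLKSZ = 4096
NBLKS = 2048                      # 8 MB region to exercise
BUF = b"\0" * BLKSZ

def rate(flags):
    fd = os.open(DEV, flags)
    start = time.time()
    # walk backwards, skipping every other block, to defeat write merging
    for blk in range(NBLKS - 1, -1, -2):
        os.lseek(fd, blk * BLKSZ, os.SEEK_SET)
        os.write(fd, BUF)
    os.close(fd)
    return (NBLKS / 2) / (time.time() - start)

sync_rate = rate(os.O_WRONLY | os.O_DSYNC)
plain_rate = rate(os.O_WRONLY)
print("O_DSYNC: %.0f writes/s, plain: %.0f writes/s" % (sync_rate, plain_rate))
# A 7200 rpm disk revolves 120 times per second, so honest synchronous
# writes should be in that ballpark; thousands of O_DSYNC writes/s suggest
# a cache or barrier is being ignored somewhere in the path.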

-- 
James Andrewartha


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-09-10 Thread W. Wayne Liauh
 I'm a fan of ZFS since I've read about it last year.
 
 Now I'm on the way to build a home fileserver and I'm
 thinking to go with Opensolaris and eventually ZFS!!

This seems to be a good candidate to build a home ZFS server:

http://tinyurl.com/msi-so

It's cheap, low power, fan-less; the only concern is the Realtek 8111C NIC.  
According to a Sun Blogger, there is no Solaris driver:

http://blogs.sun.com/roberth/entry/msi_wind_as_a_low

(Thanks for the info)


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-09-10 Thread Mads Toftum
On Wed, Sep 10, 2008 at 03:57:13AM -0700, W. Wayne Liauh wrote:
 This seems to be a good candidate to build a home ZFS server:
 
 http://tinyurl.com/msi-so
 
 It's cheap, low power, fan-less; the only concern is the Realtek 8111C NIC.  
 According to a Sun Blogger, there is no Solaris driver:
 
Looking at the pictures, there may not be a cpu fan but there's still a
case fan. One could also argue that the case really isn't optimal for
multiple disks.

vh

Mads Toftum
-- 
http://soulfood.dk


[zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD

2008-09-10 Thread Axel Schmalowsky
Hello list,

I hope that someone can help me on this topic.

I'd like to know what the *real* advantages of Nexenta/ZFS (i.e. 
ZFS/StorageTek) over DRBD/Heartbeat are.
I'm pretty new to this topic and hence do not have enough experience to judge 
their respective advantages/disadvantages reasonably.

Any suggestion would be appreciated.


-- 
Best regards

Axel Schmalowsky
Platform Engineer
___

domainfactory GmbH
Oskar-Messter-Str. 33
85737 Ismaning
Germany

Telefon:  +49 (0)89 / 55266-356
Telefax:  +49 (0)89 / 55266-222

E-Mail:   [EMAIL PROTECTED]
Internet: www.df.eu

Registergericht: Amtsgericht München
HRB 150294, Geschäftsführer Tobias Marburg, Jochen Tuchbreiter


Re: [zfs-discuss] The best motherboard for a home ZFS fileserver

2008-09-10 Thread Al Hopper
On Wed, Sep 10, 2008 at 5:57 AM, W. Wayne Liauh [EMAIL PROTECTED] wrote:
 I'm a fan of ZFS since I've read about it last year.

 Now I'm on the way to build a home fileserver and I'm
 thinking to go with Opensolaris and eventually ZFS!!

 This seems to be a good candidate to build a home ZFS server:

 http://tinyurl.com/msi-so

 It's cheap, low power, fan-less; the only concern is the Realtek 8111C NIC.  
 According to a Sun Blogger, there is no Solaris driver:

 http://blogs.sun.com/roberth/entry/msi_wind_as_a_low

 (Thanks for the info)
 --

From the other reviews I've read on the Atom 230 and 270, I don't
think this box has enough CPU horsepower for a ZFS based fileserver
- or maybe I have different performance expectations than the OP.  To
each his own.

I would like to give the list a heads-up on a mini-ITX board that is
already available based on the Atom 330 - the dual core version of the
chip.  Here you'll find a couple of pictures of the board:
http://www.mp3car.com/vbulletin/general-hardware-discussion/123966-intel-d945gclf2-dual-core-atom.html
 NB: the "2" at the end of the part # indicates the Atom 330-based part; no
"2" indicates the board with the single-core Atom.  Also: the 330 has
twice the cache of the single-core Atom.  This board is already
available for around $85.  Bear in mind that the chipset used on this
board dissipates around 45 watts - so don't just look at the power
dissipation numbers for the CPU.

I'm not specifically recommending this board for use as a ZFS based
fileserver - but it might provide a solution for someone on this list.

PS: Since the Atom supports hyperthreading, the Atom 330 will appear
to Solaris as 4 CPUs.

Regards,

-- 
Al Hopper Logical Approach Inc,Plano,TX [EMAIL PROTECTED]
 Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/


[zfs-discuss] Intel M-series SSD

2008-09-10 Thread Al Hopper
Interesting flash technology overview and SSD review here:

http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3403
and another review here:
http://www.tomshardware.com/reviews/Intel-x25-m-SSD,2012.html

Regards,

-- 
Al Hopper  Logical Approach Inc,Plano,TX [EMAIL PROTECTED]
   Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/


Re: [zfs-discuss] Sun samba - ZFS ACLs

2008-09-10 Thread Paul B. Henson
On Sat, 6 Sep 2008, Sean McGrath wrote:

   The sfw project's bit has whats needed here, the libsunwrap.a src etc,
   http://www.opensolaris.org/os/project/sfwnv/

Thanks for the pointer; I was able to pull out the libsunwrap.a source code
and use it to compile the bundled Samba source from S10U5, although a
couple of functions in vfs_zfsacl.c returned NTSTATUS instead of BOOL,
which initially caused a compilation error until I fixed it. I don't think
the source code shipped with Solaris is the same source code actually used
to make the binary packages :(.

For the benefit of anyone with a similar problem that finds this thread via
a search, it turns out the issue was actually with the nfs4acl module,
which vfs_zfsacl.c uses.

From README.nfs4acls.txt:


mode = [simple|special]
- simple: don't use OWNER@ and GROUP@ special IDs in ACEs (default)
- special: use OWNER@ and GROUP@ special IDs in ACEs instead of simple
  user/group ids.


The default for the NFS4 ACL mapper subsystem is to remove special ACEs
and replace them with specific user/group ACEs. Kind of seems like a dumb
default to me. If you add "nfs4: mode = special" to your smb.conf, things
work as expected.
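
For anyone following along, the change amounts to something like the smb.conf
fragment below.  The share name and path are made up, and the vfs objects line
is just the usual way the zfsacl module gets loaded, so adjust it to match
your build if it differs:

[export]
    path = /tank/export
    vfs objects = zfsacl
    nfs4: mode = special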

Thanks again for the help...


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  [EMAIL PROTECTED]
California State Polytechnic University  |  Pomona CA 91768


Re: [zfs-discuss] Intel M-series SSD

2008-09-10 Thread Bob Friesenhahn
On Wed, 10 Sep 2008, Al Hopper wrote:

 Interesting flash technology overview and SSD review here:

 http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3403
 and another review here:
 http://www.tomshardware.com/reviews/Intel-x25-m-SSD,2012.html

These seem like regurgitations of the same marketing drivel that you 
notified us about before.

These Intel products are assembled in China based on non-Intel FLASH 
components (from Micron).  There is little reason to believe that 
Intel will corner the market due to having an aggressive marketing 
department.  There are other companies in the business who may seem 
oddly silent compared with Intel/Micron, but enjoy a vastly larger 
share of the FLASH market.

These reviews continue their apples/oranges comparison by comparing 
cheap lowest-grade desktop/laptop drives with the expensive Intel SSD 
drives.  The hard drive performance specified is for low-grade 
consumer drives rather than enterprise drives.  The hard drive 
reliability specified is for low-grade consumer drives rather than 
enterprise drives.  The table at Tom's Hardware talks about 160GB SSD 
drives which are not even announced.

The SLC storage sizes are still quite tiny.  The wear leveling 
algorithm ensures that the drive starts losing its memory in all 
locations at about the same time.  RAID does not really help much here 
for reliability since RAID systems are usually comprised of the same 
devices installed at the same time and seeing identical write 
activity.  RAID works due to failures being random.  If the failures 
are not random (i.e. all drives start reporting read errors at once) 
then RAID does not really help. Hopefully the integration with the OS 
is sufficient that the user knows it is time to change out the drive 
before it is too late to salvage the data.

Write performance to SSDs is not all it is cracked up to be.  Buried 
in the AnandTech writeup, there is mention that while 4K can be 
written at once, 512KB needs to be erased at once.  This means that 
write performance to an empty device will seem initially pretty good, 
but then it will start to suffer as 512KB regions need to be erased to 
make space for more writes.  ZFS's COW scheme will initially be fast, 
but then the writes will slow after all blocks on the device have been 
written to before.  Since writing to a used drive incurs additional 
latency, the device will need to buffer writes in RAM so that it 
returns to the user faster.  This may increase the chance of data loss 
due to power failure.
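
Back-of-the-envelope, using the figures above: if a 4KB write that lands in a
previously used region forces a 512KB erase-and-rewrite, that is 512/4 = 128
blocks handled per block written, i.e. up to roughly 128x write amplification
in the worst case.  Real firmware batches and remaps to do far better than
that, so treat it as an upper bound rather than a measurement.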

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD

2008-09-10 Thread Erast Benson
Well, obviously it's a Linux vs. OpenSolaris question. The most serious
advantage of OpenSolaris is ZFS and its enterprise-level storage stack.
Linux just isn't there yet.

On Wed, 2008-09-10 at 14:51 +0200, Axel Schmalowsky wrote:
 Hallo list,
 
 hope that so can help me on this topic.
 
 I'd like to know where the *real* advantages of Nexenta/ZFS (i.e. 
 ZFS/StorageTek) over DRBD/Heartbeat are.
 I'm pretty new to this topic and hence do not have enough experience to judge 
 their respective advantages/disadvantages reasonably.
 
 Any suggestion would be appreciated.
 
 



Re: [zfs-discuss] Intel M-series SSD

2008-09-10 Thread Keith Bierman

On Sep 10, 2008, at 11:40 AM, Bob Friesenhahn wrote:


 Write performance to SSDs is not all it is cracked up to be.  Buried
 in the AnandTech writeup, there is mention that while 4K can be
 written at once, 512KB needs to be erased at once.  This means that
 write performance to an empty device will seem initially pretty good,
 but then it will start to suffer as 512KB regions need to be erased to
 make space for more writes.

That assumes that one doesn't code up the system to batch up erases  
prior to writes.

...
 returns to the user faster.  This may increase the chance of data loss
 due to power failure.


Presumably anyone deft enough to design such an enterprise grade  
device will be able to provide enough super-capacitor (or equivalent)  
to ensure that DRAM is flushed to SSD before anything bad happens.

Clever use of such devices in L2ARC and slog ZFS configurations (or  
moral equivalents in other environments) is pretty much the only  
affordable way (vs. huge numbers of spindles) to bridge the gap  
between rotating rust and massively parallel CPUs.

One imagines that Intel will go back to fabbing their own at some  
point; that is closer to their usual business model than OEMing other  
people's parts ;


-- 
Keith H. Bierman   [EMAIL PROTECTED]  | AIM kbiermank
5430 Nassau Circle East  |
Cherry Hills Village, CO 80113   | 303-997-2749
speaking for myself* Copyright 2008






Re: [zfs-discuss] Intel M-series SSD

2008-09-10 Thread Richard Elling
Bob Friesenhahn wrote:
 On Wed, 10 Sep 2008, Al Hopper wrote:

   
 Interesting flash technology overview and SSD review here:

 http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3403
 and another review here:
 http://www.tomshardware.com/reviews/Intel-x25-m-SSD,2012.html
 

 These seem like regurgitations of the same marketing drivel that you 
 notified us about before.

 These Intel products are assembled in China based on non-Intel FLASH 
 components (from Micron).  There is little reason to believe that 
 Intel will corner the market due to having an aggressive marketing 
 department.  There are other companies in the business who may seem 
 oddly silent compared with Intel/Micron, but enjoy a vastly larger 
 share of the FLASH market.
   

Intel and Micron have a joint venture for doing the flash SSDs.
For some reason, Intel's usually excellent marketing team wasn't involved
in naming the JV, so it is called IM Flash Technologies... boring :-)
http://www.imftech.com/

Samsung is another major vendor, rumored to be trying to buy Sandisk,
but it ain't over 'til it's over... might be a JV opportunity, too.
http://www.ft.com/cms/s/2/eb1f748e-7f34-11dd-a3da-77b07658.html

 These reviews continue their apples/oranges comparison by comparing 
 cheap lowest-grade desktop/laptop drives with the expensive Intel SSD 
 drives.  The hard drive performance specified is for low-grade 
 consumer drives rather than enterprise drives.  The hard drive 
 reliability specified is for low-grade consumer drives rather than 
 enterprise drives.  The table at Tom's Hardware talks about 160GB SSD 
 drives which are not even announced.

 The SLC storage sizes are still quite tiny.  The wear leveling 
 algorithm ensures that the drive starts losing its memory in all 
 locations at about the same time.  RAID does not really help much here 
 for reliability since RAID systems are usually comprised of the same 
 devices installed at the same time and seeing identical write 
 activity.  RAID works due to failures being random.  If the failures 
 are not random (i.e. all drives start reporting read errors at once) 
 then RAID does not really help. Hopefully the integration with the OS 
 is sufficient that the user knows it is time to change out the drive 
 before it is too late to salvage the data.
   

I think the market segments are becoming more solidified.  There is
clearly a low-cost consumer market.  But there is also a large, unsatisfied
demand for enterprise-class SSDs.  Intel has already announced an SLC-based
Extreme product line.  Brian's blog seems to be one of the
best distilled descriptions I've seen:
http://www.edn.com/blog/40040/post/360032036.html
 -- richard

 Write performance to SSDs is not all it is cracked up to be.  Buried 
 in the AnandTech writeup, there is mention that while 4K can be 
 written at once, 512KB needs to be erased at once.  This means that 
 write performance to an empty device will seem initially pretty good, 
 but then it will start to suffer as 512KB regions need to be erased to 
 make space for more writes.  ZFS's COW scheme will intially be fast, 
 but then the writes will slow after all blocks on the device have been 
 written to before.  Since writing to a used drive incurs additional 
 latency, the device will need to buffer writes in RAM so that it 
 returns to the user faster.  This may increase the chance of data loss 
 due to power failure.

 Bob
 ==
 Bob Friesenhahn
 [EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,http://www.GraphicsMagick.org/




Re: [zfs-discuss] Intel M-series SSD

2008-09-10 Thread Keith Bierman

On Sep 10, 2008, at 12:37 PM, Bob Friesenhahn wrote:

 On Wed, 10 Sep 2008, Keith Bierman wrote:

 written at once, 512KB needs to be erased at once.  This means that
 write performance to an empty device will seem initially pretty  
 good,
 but then it will start to suffer as 512KB regions need to be  
 erased to
 make space for more writes.

 That assumes that one doesn't code up the system to batch up  
 erases prior to writes.

 Is the notion of block erase even exposed via SATA/SCSI  
 protocols? Maybe it is for CD/DVD type devices.

 This is something that only the device itself would be aware of.  
 Only the device knows if the block has been used before.


A conspiracy between the device and a savvy host is sure to emerge ;
 ...
 That is reasonable.  It adds to product cost and size though. Super- 
 capacitors are not super-small.

True, but for enterprise class devices they are sufficiently small.  
Laptops will have a largish battery and won't need the caps ;  
Desktops will be on their own.

-- 
Keith H. Bierman   [EMAIL PROTECTED]  | AIM kbiermank
5430 Nassau Circle East  |
Cherry Hills Village, CO 80113   | 303-997-2749
speaking for myself* Copyright 2008






Re: [zfs-discuss] Intel M-series SSD

2008-09-10 Thread Bob Friesenhahn
On Wed, 10 Sep 2008, Keith Bierman wrote:
 ...
 That is reasonable.  It adds to product cost and size though. 
 Super-capacitors are not super-small.
 
 True, but for enterprise class devices they are sufficiently small. Laptops 
 will have a largish battery and won't need the caps ; Desktops will be on 
 their own.

The Intel SSDs are still not advertised as enterprise class devices.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD

2008-09-10 Thread Erast Benson
On Wed, 2008-09-10 at 14:36 -0400, Maurice Volaski wrote:
 A disadvantage, however, is that Sun StorageTek Availability Suite 
 (AVS), the DRBD equivalent in OpenSolaris, is much less flexible than 
 DRBD. For example, AVS is intended to replicate in one direction, 
 from a primary to a secondary, whereas DRBD can switch on the fly. 
 See 
 http://www.opensolaris.org/jive/thread.jspa?threadID=68881tstart=30 
 for details on this.

I would be curious to see production environments switching direction
on the fly at that low level... Usually some top-level brain does that
in the context of HA fail-over and so on.

Well, AVS actually does reverse synchronization, and does it very well.



Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD

2008-09-10 Thread Maurice Volaski
On Wed, 2008-09-10 at 14:36 -0400, Maurice Volaski wrote:
  A disadvantage, however, is that Sun StorageTek Availability Suite
  (AVS), the DRBD equivalent in OpenSolaris, is much less flexible than
  DRBD. For example, AVS is intended to replicate in one direction,
  from a primary to a secondary, whereas DRBD can switch on the fly.
  See
  http://www.opensolaris.org/jive/thread.jspa?threadID=68881tstart=30
  for details on this.

I would be curious to see production environments switching direction
on the fly at that low level... Usually some top-level brain does that
in context of HA fail-over and so on.

By switching on the fly, I mean if the primary services are taken 
down and then brought up on the secondary, the direction of 
synchronization gets reversed. That's not possible with AVS because...

well, AVS actually does reverse synchronization and does it very good.

It's a one-time operation that re-reverses once it completes.
-- 

Maurice Volaski, [EMAIL PROTECTED]
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University


Re: [zfs-discuss] Intel M-series SSD

2008-09-10 Thread Bob Friesenhahn
On Wed, 10 Sep 2008, Keith Bierman wrote:

 written at once, 512KB needs to be erased at once.  This means that
 write performance to an empty device will seem initially pretty good,
 but then it will start to suffer as 512KB regions need to be erased to
 make space for more writes.

 That assumes that one doesn't code up the system to batch up erases prior to 
 writes.

Is the notion of block erase even exposed via SATA/SCSI protocols? 
Maybe it is for CD/DVD type devices.

This is something that only the device itself would be aware of. 
Only the device knows if the block has been used before.  Only the 
device knows the block of physical storage which will be used for the 
write.  The device does not know what can be erased before it sees an 
(over)write request, and if the write request is for a smaller size, 
then existing data needs to be moved (for leveling) or buffered and 
written back to the same locations.  This means that 512KB needs to be 
erased and re-written.

 Presumably anyone deft enough to design such an enterprise grade device will 
 be able to provide enough super-capacitor (or equivalent) to ensure that DRAM 
 is flushed to SSD before anything bad happens.

That is reasonable.  It adds to product cost and size though. 
Super-capacitors are not super-small.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/



Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD

2008-09-10 Thread Erast Benson
On Wed, 2008-09-10 at 15:00 -0400, Maurice Volaski wrote:
 On Wed, 2008-09-10 at 14:36 -0400, Maurice Volaski wrote:
   A disadvantage, however, is that Sun StorageTek Availability Suite
   (AVS), the DRBD equivalent in OpenSolaris, is much less flexible than
   DRBD. For example, AVS is intended to replicate in one direction,
   from a primary to a secondary, whereas DRBD can switch on the fly.
   See
   http://www.opensolaris.org/jive/thread.jspa?threadID=68881tstart=30
   for details on this.
 
 I would be curious to see production environments switching direction
 on the fly at that low level... Usually some top-level brain does that
 in context of HA fail-over and so on.
 
 By switching on the fly, I mean if the primary services are taken 
 down and then brought up on the secondary, the direction of 
 synchronization gets reversed. That's not possible with AVS because...
 
 well, AVS actually does reverse synchronization and does it very good.
 
 It's a one-time operation that re-reverses once it completes.

When the primary is repaired, you want to have it online and retain the
changes made on the secondary. Your secondary did its job and switched
back to its secondary role. This HA fail-back cycle can be repeated as
many times as you need using the reverse sync command.
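
For context, the fail-back sequence looks roughly like the following.  This is
from memory of the AVS documentation rather than a tested configuration, and
the set name zfs-tank is borrowed from another post in this digest, so check
the exact flags against sndradm(1M) before relying on them:

  # on the repaired primary, pull back the changes made on the secondary
  sndradm -g zfs-tank -u -r      # reverse update sync (secondary -> primary)
  # once the reverse sync completes, the set resumes its normal
  # primary -> secondary replication direction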



Re: [zfs-discuss] Any commands to dump all zfs snapshots like NetApp snapmirror

2008-09-10 Thread Richard Elling
Haiou Fu (Kevin) wrote:
 I wonder if there are any equivalent commands in zfs to dump all its 
 associated snapshots at maximum efficiency (only the changed data blocks 
 among all snapshots)?   I know you can just zfs send all snapshots but each 
 one is like a full dump and if you use zfs send -i it is hard to maintain 
 the relationship of the snapshots.

 In NetApp filer,  I can do snapmirror store /vol/vol01 ...  then everything 
 in /vol/vol01 and all of its snapshots will be mirrored to destination, and 
 it is block level which means only the changes in snapshots are sent out.

 So I wonder if there is an equivalent for ZFS to do similar things:
 For example:   given one zfs:  diskpool/myzfs,  it has n snapshots:
  diskpool/[EMAIL PROTECTED], diskpool/[EMAIL PROTECTED],..  
 diskpool/[EMAIL PROTECTED],
 and if you just do zfs  magic subcommands diskpool/myzfs ... and it will 
 dump myzfs and all its n snapshots out at maximum efficiency (only changed 
 data in snapshots) ?

 Any hints/helps are appreciated!
   

zfs send -I
 -- richard
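
For example, with hypothetical pool, file system, and snapshot names:

  # one stream carrying every incremental between the first and last snapshot
  # (the receiving side must already have @snap1):
  zfs send -I mypool/myzfs@snap1 mypool/myzfs@snapN | gzip > /somewhere/myzfs-1-N.gz
  # or a full replication stream of the file system plus all of its snapshots:
  zfs send -R mypool/myzfs@snapN | gzip > /somewhere/myzfs-all.gz
  # a replication stream can later be restored with something like:
  gzcat /somewhere/myzfs-all.gz | zfs receive -d mypool2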



Re: [zfs-discuss] x4500 vs AVS ?

2008-09-10 Thread Matt Beebe
Just to clarify a few items... consider a setup where we desire to use AVS to 
replicate the ZFS pool on a 4-drive server to like hardware.  The 4 drives are 
set up as RAID-Z.

If we lose a drive (say #2) in the primary server, RAID-Z will take over, and 
our data will still be available, but the array is in a degraded state.

But what happens to the secondary server?  Specifically to its bit-for-bit copy 
of drive #2... presumably it is still good, but ZFS will offline that disk on 
the primary server, replicate the metadata, and when/if I promote the 
secondary server, it will also be running in a degraded state (i.e. 3 out of 4 
drives).  Correct?

In this scenario, my replication hasn't really bought me any increased 
availability... or am I missing something?  

Also, if I do choose to fail over to the secondary, can I just do a scrub of the 
broken drive (which isn't really broken, but the zpool would be inconsistent 
at some level with the other online drives) and get back to full speed 
quickly?  Or will I always have to wait until one of the servers resilvers 
itself (from scratch?) and re-replicates itself?

thanks in advance.

-Matt


Re: [zfs-discuss] SAS or SATA HBA with write cache

2008-09-10 Thread Matt Beebe
 
 I'm guessing one of the reasons you wanted a non-RAID controller with
 a write cache was so that if the controller failed, and the exact same
 model wasn't available to replace it, most of your pool would still be
 readable with any random controller, modulo risk of corruption from
 the lost write cache.  so... with the slog, you don't have that,
 because there are magic irreplaceable bits stored on the slog without
 which your whole pool is useless.
 


Actually I just wanted to get the benefit of increased write cache without 
paying for the RAID controller... All the best practice guides say that using 
your RAID controller is generally redundant, and you should use RaidZ (or a 
variant) in most implementations (leaving room for scenarios where hardware 
mirroring of some of the drives may be better, etc).  

Telling the RAID controller to export each drive as a single LUN works with 
most of the RAID controllers out there... but in addition to being painful to 
configure (on most of the RAID cards), you're paying for RAID hardware logic 
that goes unused.  Also, all the RAID cards (that I've seen) write some sort of 
magic secret on the drive (even in 1:1 config) that messes with you when you 
need to replace/move the drives down the road.

So how 'bout it hardware vendors?  when can we get a PCIe(x8) SAS/SATA 
controller with an x4 internal port and an x4 external port and 512MB battery 
backed cache for about $250??  :)  Heck, I'd take SATA only if I could get it 
at a decent price point... 

While we're at it, I'd also be happy with a PCIe(x4) card with 2 or 4 DIMM 
slots and a battery back-up that exposes itself as a system drive (ala iRAM, 
but PCIe not SATA 150) for slog and read cache... say $150 price point?  
heehee...  there is an SSD based option out there, but it has 80GB available, 
and starts at $2500 (overkill for my requirement)

-Matt


Re: [zfs-discuss] Any commands to dump all zfs snapshots like NetApp snapmirror

2008-09-10 Thread Haiou Fu (Kevin)
Can you explain more about zfs send -I?  I know zfs send -i but didn't 
know there was a -I option.  In which release is this option available?
Thanks!


Re: [zfs-discuss] Any commands to dump all zfs snapshots like NetApp snapmirror

2008-09-10 Thread Haiou Fu (Kevin)
The closest thing I can find is:
http://bugs.opensolaris.org/view_bug.do?bug_id=6421958

But just like it says: "Incremental +
recursive will be a bit trickier, because how do you specify the multiple
source and dest snaps?"

Let me clarify this more:

Without send -r I need to do something like this:

   Given  a zfs file system myzfs in zpool  mypool,  it has N snapshots:
mypool/myzfs
mypool/[EMAIL PROTECTED]
mypool/[EMAIL PROTECTED]

mypool/[EMAIL PROTECTED],

   Do the following things:

   zfs snapshot mypool/[EMAIL PROTECTED] 
   zfs send mypool/[EMAIL PROTECTED] | gzip -   /somewhere/myzfs-current.gz
   zfs send -i mypool/[EMAIL PROTECTED] mypool/[EMAIL PROTECTED] | gzip -  
/somewhere/myzfs-1.gz
   zfs send -i mypool/[EMAIL PROTECTED] mypool/[EMAIL PROTECTED] | gzip -  
/somewhere/myzfs-2.gz 
  ..
   zfs send -i mypool/[EMAIL PROTECTED] mypool/[EMAIL PROTECTED] | gzip -  
/somewhere/myzfs-N.gz

   As you can see, the above commands are kind of a stupid solution, and they 
don't reach maximum efficiency, because those myzfs-1 ~ N.gz files contain 
a lot of common stuff in them!
I wonder how send -r would do in the above situation?  How does it choose the 
multiple source and dest snaps? And is -r efficient enough to just dump the 
incremental changes?  What is the corresponding receive command for send -r?
(receive -r, I guess?)

Thanks!


Re: [zfs-discuss] Any commands to dump all zfs snapshots like NetApp snapmirror

2008-09-10 Thread Richard Elling
Haiou Fu (Kevin) wrote:
 The closest thing I can find is:
 http://bugs.opensolaris.org/view_bug.do?bug_id=6421958
   

Look at the man page section on zfs(1m) for -R and -I option explanations.
http://docs.sun.com/app/docs/doc/819-2240/zfs-1m?a=view

 But just like it says:   Incremental +
 recursive will be a bit tricker, because how do you specify the multiple
 source and dest snaps?  

 Let me clarify this more:

 Without send -r I need do something like this;

Given  a zfs file system myzfs in zpool  mypool,  it has N 
 snapshots:
 mypool/myzfs
 mypool/[EMAIL PROTECTED]
 mypool/[EMAIL PROTECTED]
 
 mypool/[EMAIL PROTECTED],

Do following things:

zfs snapshot mypool/[EMAIL PROTECTED] 
zfs send mypool/[EMAIL PROTECTED] | gzip -   
 /somewhere/myzfs-current.gz
zfs send -i mypool/[EMAIL PROTECTED] mypool/[EMAIL PROTECTED] | gzip - 
  /somewhere/myzfs-1.gz
zfs send -i mypool/[EMAIL PROTECTED] mypool/[EMAIL PROTECTED] | gzip - 
  /somewhere/myzfs-2.gz 
   ..
zfs send -i mypool/[EMAIL PROTECTED] mypool/[EMAIL PROTECTED] | gzip - 
  /somewhere/myzfs-N.gz

As you can see, above commands are kind of a stupid solution, and it 
 didn't reach maximum efficiency because those myzfs-1 ~ N.gz files 
 contain a lot of common stuffs in them!
   

No, in this example, each file will contain only the incremental
changes.

 I wonder how will send -r do in above situation?  How does it 
 choose multiple source and dest snaps? And can -r efficient enough to just 
 dump the incremental changes?  What is the corresponding receive command for 
 send -r?
 (receive -r ? I guess? )
   

No, receive handles what was sent.
 -- richard



Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD

2008-09-10 Thread Maurice Volaski
On Wed, 2008-09-10 at 15:00 -0400, Maurice Volaski wrote:
  On Wed, 2008-09-10 at 14:36 -0400, Maurice Volaski wrote:
A disadvantage, however, is that Sun StorageTek Availability Suite
(AVS), the DRBD equivalent in OpenSolaris, is much less flexible than
DRBD. For example, AVS is intended to replicate in one direction,
from a primary to a secondary, whereas DRBD can switch on the fly.
See
http://www.opensolaris.org/jive/thread.jspa?threadID=68881tstart=30
for details on this.
  
  I would be curious to see production environments switching direction
  on the fly at that low level... Usually some top-level brain does that
  in context of HA fail-over and so on.

  By switching on the fly, I mean if the primary services are taken
  down and then brought up on the secondary, the direction of
  synchronization gets reversed. That's not possible with AVS because...

  well, AVS actually does reverse synchronization and does it very good.

  It's a one-time operation that re-reverses once it completes.

When primary is repaired you want to have it on-line and retain the
changes made on the secondary.

Not necessarily. Even when the primary is ready to go back into 
service, I may not want to revert to it for one reason or another. 
That means I am without a live mirror because AVS' realtime mirroring 
is only one direction, primary to secondary.
-- 

Maurice Volaski, [EMAIL PROTECTED]
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University


Re: [zfs-discuss] SAS or SATA HBA with write cache

2008-09-10 Thread Will Murnane
On Wed, Sep 10, 2008 at 16:56, Matt Beebe [EMAIL PROTECTED] wrote:
 So how 'bout it hardware vendors?  when can we get a PCIe(x8) SAS/SATA 
 controller with an x4 internal port and an x4 external port and 512MB battery 
 backed cache for about $250??  :)  Heck, I'd take SATA only if I could get it 
 at a decent price point...
The Supermicro AOC-USAS-S4iR fits the bill, nearly.  It's got 4
internal and 4 external ports, on PCI express x8, with 256 MB of
cache, for about $320[1].  Adding battery backup is about another
$150[2].

 While we're at it, I'd also be happy with a PCIe(x4) card with 2 or 4 DIMM 
 slots and a battery back-up that exposes itself as a system drive (ala iRAM, 
 but PCIe not SATA 150) for slog and read cache... say $150 price point?  
 heehee...  there is an SSD based option out there, but it has 80GB available, 
 and starts at $2500 (overkill for my requirement)
Not terribly likely to see this soon, I'm afraid.  Memory interface
technology keeps changing once every couple years, and that makes such
a device less attractive to market.  Consider that DDR(-1) RAM is
almost three times as expensive as DDR-2 RAM ($184 versus $67 for a
stick of 2GB ECC... and SDRAM? Survey says $500 easy) and having the
latest generation seems to make sense.  But that means (as a
manufacturer) your device goes obsolete quicker, you sell fewer units,
and make less return on your investment.  So unless a common memory
bus is developed, such a device would be a bad investment.

Actually, what I'd rather have than battery backup is a large enough
flash device to store the contents of RAM, and a battery big enough to
get everything dumped to persistent storage.  That takes out the
question of running out of batteries prematurely, and leaves only the
question of batteries losing capacity over time and needing to replace
them.

Will

[1]: http://www.wiredzone.com/itemdesc.asp?ic=32005545
[2]: http://www.wiredzone.com/itemdesc.asp?ic=10017972


Re: [zfs-discuss] Any commands to dump all zfs snapshots like NetApp snapmirror

2008-09-10 Thread Richard Elling
 correction below...

Richard Elling wrote:
 Haiou Fu (Kevin) wrote:
   
 The closest thing I can find is:
 http://bugs.opensolaris.org/view_bug.do?bug_id=6421958
   
 

 Look at the man page section on zfs(1m) for -R and -I option explanations.
 http://docs.sun.com/app/docs/doc/819-2240/zfs-1m?a=view

   
 But just like it says:   Incremental +
 recursive will be a bit tricker, because how do you specify the multiple
 source and dest snaps?  

 Let me clarify this more:

 Without send -r I need do something like this;

Given  a zfs file system myzfs in zpool  mypool,  it has N 
 snapshots:
 mypool/myzfs
 mypool/[EMAIL PROTECTED]
 mypool/[EMAIL PROTECTED]
 
 mypool/[EMAIL PROTECTED],

Do following things:

zfs snapshot mypool/[EMAIL PROTECTED] 
zfs send mypool/[EMAIL PROTECTED] | gzip -   
 /somewhere/myzfs-current.gz
zfs send -i mypool/[EMAIL PROTECTED] mypool/[EMAIL PROTECTED] | gzip 
 -  /somewhere/myzfs-1.gz
zfs send -i mypool/[EMAIL PROTECTED] mypool/[EMAIL PROTECTED] | gzip 
 -  /somewhere/myzfs-2.gz 
   ..
zfs send -i mypool/[EMAIL PROTECTED] mypool/[EMAIL PROTECTED] | gzip 
 -  /somewhere/myzfs-N.gz

As you can see, above commands are kind of a stupid solution, and it 
 didn't reach maximum efficiency because those myzfs-1 ~ N.gz files 
 contain a lot of common stuffs in them!
   
 

 No, in this example, each file will contain only the incremental
 changes.

   

I read this wrong, because I was looking at the end snapshot, not
the start. What you wrote won't work.  If you do something like:
zfs send -i [EMAIL PROTECTED] [EMAIL PROTECTED] ...
zfs send -i [EMAIL PROTECTED] [EMAIL PROTECTED] ...

then there would be no overlap.  But I think you will find that
zfs send -I is altogether more convenient.
 -- richard
 I wonder how will send -r do in above situation?  How does it 
 choose multiple source and dest snaps? And can -r efficient enough to just 
 dump the incremental changes?  What is the corresponding receive command for 
 send -r?
 (receive -r ? I guess? )
   
 

 No, receive handles what was sent.
  -- richard




Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD

2008-09-10 Thread Erast Benson
On Wed, 2008-09-10 at 18:37 -0400, Maurice Volaski wrote:
 On Wed, 2008-09-10 at 15:00 -0400, Maurice Volaski wrote:
   On Wed, 2008-09-10 at 14:36 -0400, Maurice Volaski wrote:
 A disadvantage, however, is that Sun StorageTek Availability Suite
 (AVS), the DRBD equivalent in OpenSolaris, is much less flexible than
 DRBD. For example, AVS is intended to replicate in one direction,
 from a primary to a secondary, whereas DRBD can switch on the fly.
 See
 http://www.opensolaris.org/jive/thread.jspa?threadID=68881tstart=30
 for details on this.
   
   I would be curious to see production environments switching direction
   on the fly at that low level... Usually some top-level brain does that
   in context of HA fail-over and so on.
 
   By switching on the fly, I mean if the primary services are taken
   down and then brought up on the secondary, the direction of
   synchronization gets reversed. That's not possible with AVS because...
 
   well, AVS actually does reverse synchronization and does it very good.
 
   It's a one-time operation that re-reverses once it completes.
 
 When primary is repaired you want to have it on-line and retain the
 changes made on the secondary.
 
 Not necessarily. Even when the primary is ready to go back into 
 service, I may not want to revert to it for one reason or another. 
 That means I am without a live mirror because AVS' realtime mirroring 
 is only one direction, primary to secondary.

This is why I tried to state that this is not a realistic environment for
non-shared-storage HA deployments. DRBD is trying to emulate shared-storage
behavior at the wrong level, where in fact usage of FC/iSCSI-connected
storage needs to be considered.



Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD

2008-09-10 Thread Maurice Volaski
On Wed, 2008-09-10 at 18:37 -0400, Maurice Volaski wrote:
  On Wed, 2008-09-10 at 15:00 -0400, Maurice Volaski wrote:
On Wed, 2008-09-10 at 14:36 -0400, Maurice Volaski wrote:
  A disadvantage, however, is that Sun StorageTek Availability Suite
  (AVS), the DRBD equivalent in OpenSolaris, is much less 
flexible than
  DRBD. For example, AVS is intended to replicate in one direction,
  from a primary to a secondary, whereas DRBD can switch on the fly.
  See
  http://www.opensolaris.org/jive/thread.jspa?threadID=68881tstart=30
  for details on this.

I would be curious to see production environments switching direction
on the fly at that low level... Usually some top-level brain does that
in context of HA fail-over and so on.
  
By switching on the fly, I mean if the primary services are taken
down and then brought up on the secondary, the direction of
synchronization gets reversed. That's not possible with AVS because...
  
well, AVS actually does reverse synchronization and does it very good.
  
It's a one-time operation that re-reverses once it completes.
  
  When primary is repaired you want to have it on-line and retain the
  changes made on the secondary.

  Not necessarily. Even when the primary is ready to go back into
  service, I may not want to revert to it for one reason or another.
  That means I am without a live mirror because AVS' realtime mirroring
  is only one direction, primary to secondary.

This why I tried to state that this is not realistic environment for
non-shared storage HA deployments.

What's not realistic? DRBD's highly flexible ability to switch roles 
on the fly is a huge advantage over AVS. But this is not to say AVS 
is not realistic. It's just a limitation.

DRBD trying to emulate shared-storage
behavior at a wrong level where in fact usage of FC/iSCSI-connected
storage needs to be considered.

This makes no sense to me. We're talking about mirroring the storage 
of two physical and independent systems. How did the concept of 
shared storage get in here?
-- 

Maurice Volaski, [EMAIL PROTECTED]
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University


Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD

2008-09-10 Thread Erast Benson
On Wed, 2008-09-10 at 19:10 -0400, Maurice Volaski wrote:
 On Wed, 2008-09-10 at 18:37 -0400, Maurice Volaski wrote:
   On Wed, 2008-09-10 at 15:00 -0400, Maurice Volaski wrote:
 On Wed, 2008-09-10 at 14:36 -0400, Maurice Volaski wrote:
   A disadvantage, however, is that Sun StorageTek Availability Suite
   (AVS), the DRBD equivalent in OpenSolaris, is much less 
 flexible than
   DRBD. For example, AVS is intended to replicate in one direction,
   from a primary to a secondary, whereas DRBD can switch on the fly.
   See
   
  http://www.opensolaris.org/jive/thread.jspa?threadID=68881tstart=30
   for details on this.
 
 I would be curious to see production environments switching 
  direction
 on the fly at that low level... Usually some top-level brain does 
  that
 in context of HA fail-over and so on.
   
 By switching on the fly, I mean if the primary services are taken
 down and then brought up on the secondary, the direction of
 synchronization gets reversed. That's not possible with AVS because...
   
 well, AVS actually does reverse synchronization and does it very 
  good.
   
 It's a one-time operation that re-reverses once it completes.
   
   When primary is repaired you want to have it on-line and retain the
   changes made on the secondary.
 
   Not necessarily. Even when the primary is ready to go back into
   service, I may not want to revert to it for one reason or another.
   That means I am without a live mirror because AVS' realtime mirroring
   is only one direction, primary to secondary.
 
 This why I tried to state that this is not realistic environment for
 non-shared storage HA deployments.
 
 What's not realistic? DRBD's highly flexible ability to switch roles 
 on the fly is a huge advantage over AVS. But this is not to say AVS 
 is not realistic. It's just a limitation.
 
 DRBD trying to emulate shared-storage
 behavior at a wrong level where in fact usage of FC/iSCSI-connected
 storage needs to be considered.
 
 This makes no sense to me. We're talking about mirroring the storage 
 of two physical and independent systems. How did the concept of 
 shared storage get in here?

This is really outside of the ZFS discussion now... but your point is taken. If
you want mirror-like behavior for your 2-node cluster, you'll get some
benefits from DRBD, but my point is that such a solution is trying to solve two
problems at the same time, replication and availability, which is in my
opinion plain wrong.



Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD

2008-09-10 Thread Maurice Volaski
On Wed, 2008-09-10 at 19:10 -0400, Maurice Volaski wrote:
  On Wed, 2008-09-10 at 18:37 -0400, Maurice Volaski wrote:
On Wed, 2008-09-10 at 15:00 -0400, Maurice Volaski wrote:
  On Wed, 2008-09-10 at 14:36 -0400, Maurice Volaski wrote:
A disadvantage, however, is that Sun StorageTek 
Availability Suite
(AVS), the DRBD equivalent in OpenSolaris, is much less
  flexible than
DRBD. For example, AVS is intended to replicate in one 
direction,
from a primary to a secondary, whereas DRBD can switch 
on the fly.
See
   
http://www.opensolaris.org/jive/thread.jspa?threadID=68881tstart=30
for details on this.
  
  I would be curious to see production environments 
switching direction
  on the fly at that low level... Usually some top-level 
brain does that
  in context of HA fail-over and so on.

  By switching on the fly, I mean if the primary services are taken
  down and then brought up on the secondary, the direction of
  synchronization gets reversed. That's not possible with 
AVS because...

  well, AVS actually does reverse synchronization and does 
it very good.

  It's a one-time operation that re-reverses once it completes.

When primary is repaired you want to have it on-line and retain the
changes made on the secondary.
  
Not necessarily. Even when the primary is ready to go back into
service, I may not want to revert to it for one reason or another.
That means I am without a live mirror because AVS' realtime mirroring
is only one direction, primary to secondary.
  
  This why I tried to state that this is not realistic environment for
  non-shared storage HA deployments.

  What's not realistic? DRBD's highly flexible ability to switch roles
  on the fly is a huge advantage over AVS. But this is not to say AVS
  is not realistic. It's just a limitation.

  DRBD trying to emulate shared-storage
  behavior at a wrong level where in fact usage of FC/iSCSI-connected
  storage needs to be considered.

  This makes no sense to me. We're talking about mirroring the storage
  of two physical and independent systems. How did the concept of
  shared storage get in here?

This is really outside of ZFS discussion now... But your point taken. If
you want mirror-like behavior of your 2-node cluster, you'll get some
benefits of DRBD but my point is that such solution trying to solve two
problems at the same time: replication and availability, which is in my
opinion plain wrong.

Uh, no, DRBD addresses only replication. Linux-HA (aka Heartbeat) 
addresses availability. They can be an integrated solution and are to 
some degree intended that way, so I have no idea where your opinion 
is coming from.

For replication, OpenSolaris is largely limited to using AVS, whose 
functionality is limited, at least relative to DRBD. But there seem 
to be a few options to implement availability, which should include 
Linux-HA itself as it should run on OpenSolaris!

But relevant to the poster's initial question, ZFS is so far and away 
more advanced than any Linux filesystem can even dream about that it 
handily nullifies any disadvantage in having to run AVS.
-- 

Maurice Volaski, [EMAIL PROTECTED]
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University


[zfs-discuss] ZPOOL Import Problem

2008-09-10 Thread Leopold, Corey
I ran into an odd problem importing a zpool while testing AVS.  I was
trying to simulate a drive failure, break SNDR replication, and then
import the pool on the secondary.  To simulate the drive failure I just
offlined one of the disks in the RAIDZ set.



--
pr1# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
rpool   ONLINE   0 0 0
  c3t0d0s0  ONLINE   0 0 0

errors: No known data errors

  pool: tank
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
tank  ONLINE   0 0 0
  raidz1  ONLINE   0 0 0
c5t0d0s0  ONLINE   0 0 0
c5t1d0s0  ONLINE   0 0 0
c5t2d0s0  ONLINE   0 0 0
c5t3d0s0  ONLINE   0 0 0

errors: No known data errors
pr1# zpool offline
missing pool name
usage:
offline [-t] pool device ...
pr1# zpool offline tank c5t0d0s0
pr1# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
rpool   ONLINE   0 0 0
  c3t0d0s0  ONLINE   0 0 0

errors: No known data errors

  pool: tank
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning
in a
degraded state.
action: Online the device using 'zpool online' or replace the device
with
'zpool replace'.
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
tank  DEGRADED 0 0 0
  raidz1  DEGRADED 0 0 0
c5t0d0s0  OFFLINE  0 0 0
c5t1d0s0  ONLINE   0 0 0
c5t2d0s0  ONLINE   0 0 0
c5t3d0s0  ONLINE   0 0 0

errors: No known data errors
pr1# zpool export tank
---

I then disabled SNDR replication.

pr1# sndradm -g zfs-tank -d
Disable Remote Mirror? (Y/N) [N]: Y
-

Then I try to import the ZPOOL on the secondary.

--
pr2# zpool import
  pool: tank
id: 9795707198744908806
 state: DEGRADED
status: One or more devices are offlined.
action: The pool can be imported despite missing or damaged devices.
The
fault tolerance of the pool may be compromised if imported.
config:

tank  DEGRADED
  raidz1  DEGRADED
c5t0d0s0  OFFLINE
c5t1d0s0  ONLINE
c5t2d0s0  ONLINE
c5t3d0s0  ONLINE
pr2# zpool import tank
cannot import 'tank': one or more devices is currently unavailable
pr2# zpool import -f tank
cannot import 'tank': one or more devices is currently unavailable
pr2#
---

Importing on the primary gives the same error.

Anyone have any ideas?

Thanks

Corey


Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD

2008-09-10 Thread Erast Benson
On Wed, 2008-09-10 at 19:42 -0400, Maurice Volaski wrote:
 On Wed, 2008-09-10 at 19:10 -0400, Maurice Volaski wrote:
   On Wed, 2008-09-10 at 18:37 -0400, Maurice Volaski wrote:
 On Wed, 2008-09-10 at 15:00 -0400, Maurice Volaski wrote:
   On Wed, 2008-09-10 at 14:36 -0400, Maurice Volaski wrote:
 A disadvantage, however, is that Sun StorageTek 
 Availability Suite
 (AVS), the DRBD equivalent in OpenSolaris, is much less
   flexible than
 DRBD. For example, AVS is intended to replicate in one 
 direction,
 from a primary to a secondary, whereas DRBD can switch 
 on the fly.
 See

 http://www.opensolaris.org/jive/thread.jspa?threadID=68881tstart=30
 for details on this.
   
   I would be curious to see production environments 
 switching direction
   on the fly at that low level... Usually some top-level 
 brain does that
   in context of HA fail-over and so on.
 
   By switching on the fly, I mean if the primary services are taken
   down and then brought up on the secondary, the direction of
   synchronization gets reversed. That's not possible with 
 AVS because...
 
   well, AVS actually does reverse synchronization and does 
 it very good.
 
   It's a one-time operation that re-reverses once it completes.
 
 When primary is repaired you want to have it on-line and retain the
 changes made on the secondary.
   
 Not necessarily. Even when the primary is ready to go back into
 service, I may not want to revert to it for one reason or another.
 That means I am without a live mirror because AVS' realtime mirroring
 is only one direction, primary to secondary.
   
   This why I tried to state that this is not realistic environment for
   non-shared storage HA deployments.
 
   What's not realistic? DRBD's highly flexible ability to switch roles
   on the fly is a huge advantage over AVS. But this is not to say AVS
   is not realistic. It's just a limitation.
 
   DRBD trying to emulate shared-storage
   behavior at a wrong level where in fact usage of FC/iSCSI-connected
   storage needs to be considered.
 
   This makes no sense to me. We're talking about mirroring the storage
   of two physical and independent systems. How did the concept of
   shared storage get in here?
 
 This is really outside of ZFS discussion now... But your point taken. If
 you want mirror-like behavior of your 2-node cluster, you'll get some
 benefits of DRBD but my point is that such solution trying to solve two
 problems at the same time: replication and availability, which is in my
 opinion plain wrong.
 
 Uh, no, DRBD addresses only replication. Linux-HA (aka Heartbeat) 
 address availability. They can be an integrated solution and are to 
 some degree intended that way, so I have no idea where your opinion 
 is coming from.

Because in my opinion DRBD takes on some responsibility of the management layer,
if you will. The classic, predominant replication schema in HA clusters is
primary-backup (or master-slave), and the backup is by definition not
necessarily a system identical to the primary. Having said that, it is noble for
DRBD to implement role switching, and not a bad idea for many small
deployments.

 For replication, OpenSolaris is largely limited to using AVS, whose 
 functionality is limited, at least relative to DRBD. But there seems 
 to be a few options to implement availability, which should include 
 Linux-HA itself as it should run on OpenSolaris!

Everything is implementable and I believe AVS designers thought about
dynamic switching of roles, but they end up with what we have today,
they likely discarded this idea.

AVS does not switch roles, and that forces IT admins to use it as a
primary-backup data protection service only.
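
(For concreteness, the usual AVS Remote Mirror flow looks roughly like
the following. The host names, data devices and bitmap volumes are made
up, and sndradm(1M) should be checked for the exact syntax; this is a
sketch, not a tested procedure.)

    # enable the replication set (run on both hosts), async mode
    sndradm -e primhost /dev/rdsk/c1t1d0s0 /dev/rdsk/c1t1d0s1 \
            sechost /dev/rdsk/c1t1d0s0 /dev/rdsk/c1t1d0s1 ip async

    # initial full synchronization, primary -> secondary
    sndradm -m

    # after a failover, copy the changes made on the secondary back to
    # the repaired primary (reverse update sync); once it completes,
    # normal primary -> secondary replication resumes
    sndradm -u -r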

 But relevant to the poster's initial question: ZFS is so far and away
 more advanced than anything a Linux filesystem can even dream about
 that it handily nullifies any disadvantage of having to run AVS.

Right.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Apache module for ZFS ACL based authorization

2008-09-10 Thread Paul B. Henson

We are currently working on a Solaris/ZFS-based central file system to
replace the DCE/DFS-based implementation we have had in place for over
10 years. One of the features of our previous implementation was that
access to files, regardless of method (CIFS, AFP, HTTP, FTP, etc.), was
completely controlled by the DFS ACL. Our ZFS implementation will be
available via NFSv4 and CIFS, both of which respect the ACL. To provide
ZFS ACL-based authorization to files via HTTP, I put together a small
Apache module. The module either delivers files without requiring
authentication (if they are world-readable) or requires authentication
and restricts delivery to users who have access based on the ACL.
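
(For anyone unfamiliar with what the module evaluates: these are the
ordinary ZFS/NFSv4 ACLs, visible and settable from the shell. A quick
illustration; the user and path below are made up:)

    # grant one user read access to a file via the ZFS ACL
    chmod A+user:jdoe:read_data:allow /export/web/report.html

    # list the ACL entries the module would consult
    ls -V /export/web/report.html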

If anyone is interested in taking a look at it, it is available from:

http://www.csupomona.edu/~henson/www/projects/mod_authz_fsacl/dist/mod_authz_fsacl-0.10.tar.gz


I'd appreciate any feedback, particularly about things that don't work
right :).


-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  [EMAIL PROTECTED]
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD

2008-09-10 Thread Richard Elling
Erast Benson wrote:
 Uh, no, DRBD addresses only replication. Linux-HA (aka Heartbeat)
 addresses availability. They can be an integrated solution and are to
 some degree intended that way, so I have no idea where your opinion
 is coming from.
 

 Because, in my opinion, DRBD takes over some responsibility of the
 management layer, if you will. The classic, predominant replication
 schema in HA clusters is primary-backup (or master-slave), and the
 backup is by definition not necessarily a system identical to the
 primary. Having said that, it is noble of DRBD to implement role
 switching, and it is not a bad idea for many small deployments.
   

The problem with fully automated systems for remote replication is
that they are fully automated.  This opens you up to a set of failure modes
that you may want to avoid, such as replication of data that you don't
want to replicate.  This is why most replication is used to support disaster
recovery cases and the procedures wrapped around disaster recovery
also consider the case where the primary data has been damaged -- and
you really don't want that damage to spread.

It so happens that snapshots are another method which can be used to
limit the spread of damage, so there might be an opportunity here.
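
(A minimal sketch of that opportunity, using snapshots plus incremental
send/receive. The pool, dataset, snapshot names and the target host are
made up, and the previous snapshot is assumed to already exist on the
receiving side.)

    # take today's snapshot on the primary
    zfs snapshot tank/data@2008-09-10

    # ship only the changes since yesterday's snapshot to the DR host;
    # damage that occurs after the snapshot cannot leak into it
    zfs send -i tank/data@2008-09-09 tank/data@2008-09-10 | \
        ssh drhost zfs receive tank/data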

By analogy, in the Oracle world, RAC does not replace DataGuard.

 For replication, OpenSolaris is largely limited to using AVS, whose
 functionality is limited, at least relative to DRBD. But there seem to
 be a few options for implementing availability, which should include
 Linux-HA itself, as it should run on OpenSolaris!
 

I disagree; there are many ways to remotely replicate Solaris systems.
TrueCopy and SRDF are perhaps the most popular, but almost all storage
arrays have some sort of replication facility.  In truth, there is
little market demand for fully automated solutions at the OS level,
because of the reasons mentioned above.


 Everything is implementable, and I believe the AVS designers thought
 about dynamic switching of roles, but they ended up with what we have
 today; they likely discarded the idea.
   

It is open source...
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD

2008-09-10 Thread Carson Gaspar
Let me drag this thread kicking and screaming back to ZFS...

Use case:

- We need an NFS server that can be replicated to another building to 
handle both scheduled powerdowns and unplanned outages. For scheduled 
powerdowns we'd want to fail over a week in advance, and fail back some 
time later.

- We will use a virtual IP for seamless failover, and require that we 
not get stale NFS filehandles during a failover event, as world reboots 
are messy and expensive.

- We do _not_ require synchronous replication. Async is fine.

Today, there is _no_ reasonably priced solution that allows us to use
ZFS for this. Yes, we have SRDF, and use it where required, but it's
hideously expensive, and has its own risks (see below).

Today our solution for the above is NetApp filers. We're not thrilled 
with everything about them, but they mostly work.

One of the projects I've done is set up automated zfs replication for 
pure DR. But sadly this is of limited use, as Solaris / ZFS's lack of 
ability to maintain NFS file handles across systems is a deal killer for 
most of our users.

AVS does not appear to support our weeks-long failover model, and 
exposes us to too much risk of simultaneous data loss.

SRDF and Veritas replication have the same data loss risk. SRDF can
easily support a weeks-long personality swap; I don't know enough about
Veritas' product.

-- 
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-10 Thread Ralf Ramge
Matt Beebe wrote:

 But what happens to the secondary server? Specifically, to its
 bit-for-bit copy of Drive #2... presumably it is still good, but ZFS
 will offline that disk on the primary server and replicate the
 metadata, and when/if I promote the secondary server, it will also be
 running in a degraded state (i.e. 3 out of 4 drives). Correct?



Correct.

 In this scenario, my replication hasn't really bought me any increased
 availability... or am I missing something?



No. You have an increase in availability when the entire primary node
goes down, but you're not particularly safer when it comes to degraded
zpools.


 Also, if I do choose to fail over to the secondary, can I just scrub
 the broken drive (which isn't really broken, but the zpool would be
 inconsistent at some level with the other online drives) and get back
 to full speed quickly? Or will I always have to wait until one of the
 servers resilvers itself (from scratch?) and re-replicates itself?


I have not tested this scenario, so I can't say anything about this.
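
(If someone does test it, the commands involved would presumably be the
usual ones; the pool and device names below are made up, and this is
only a sketch.)

    # on the promoted secondary: check whether the pool is degraded
    zpool status -x tank

    # bring the "broken" disk back online; ZFS resilvers only the data
    # recorded as changed while the device was offline, not the whole disk
    zpool online tank c1t2d0

    # optionally verify everything afterwards
    zpool scrub tank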

-- 

Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963
[EMAIL PROTECTED] - http://web.de/

1&1 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484

Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Thomas 
Gottschlich, Matthias Greve, Robert Hoffmann, Markus Huhn, Oliver Mauss, 
Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Nexenta/ZFS vs Heartbeat/DRBD

2008-09-10 Thread Maurice Volaski
The problem with fully automated systems for remote replication is
that they are fully automated.  This opens you up to a set of failure modes
that you may want to avoid, such as replication of data that you don't
want to replicate.  This is why most replication is used to support disaster
recovery cases and the procedures wrapped around disaster recovery
also consider the case where the primary data has been damaged -- and
you really don't want that damage to spread.

I seem to have misrepresented how automatic DRBD is, because it offers
a rich set of policies and strategies for dealing with faults. For one
thing, it (along with Heartbeat) will bend over backwards to avoid
disasters such as split-brain. It's more likely that one would cause
split-brain by manually executing the wrong command than through
anything DRBD (or Heartbeat) does.

I disagree; there are many ways to remotely replicate Solaris systems.
TrueCopy and SRDF are perhaps the most popular, but almost all

I was referring to cost-free options.
-- 

Maurice Volaski, [EMAIL PROTECTED]
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss