[zfs-discuss] Re: How to destroy a pool which you can't import because it is in faulted st

2006-09-07 Thread Lieven De Geyndt
So I can manage the file system mounts/automounts using the legacy option,
but I can't manage the auto-import of the pools. Or should I delete the
zpool.cache file during boot?
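
(A minimal sketch of the two options being discussed, assuming a pool named
'tank' with a file system 'tank/fs' - both names are just examples:)

  # legacy mounting: ZFS stops auto-mounting the file system; it is mounted
  # by hand or via /etc/vfstab instead
  zfs set mountpoint=legacy tank/fs
  mount -F zfs tank/fs /mnt/fs

  # the pool itself is still recorded in /etc/zfs/zpool.cache and is opened
  # at boot; exporting it removes it from the cache, which is the supported
  # alternative to deleting the cache file
  zpool export tank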
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Performance problem of ZFS ( Sol 10U2 )

2006-09-07 Thread Ivan Debnár
Hi,
 
I deployed ZFS on our mail server recently, hoping for eternal peace after
running on UFS and moving files with each TB added.
 
It is a mail server - its mdirs are on a ZFS pool:
  capacity operationsbandwidth
poolused  avail   read  write   read  write
-  -  -  -  -  -  - 
mailstore  3.54T  2.08T280295  7.10M  5.24M
  mirror590G   106G 34 31   676K   786K
c6t3d0 -  - 14 16   960K   773K
c8t22260001552EFE2Cd0  -  - 16 18  1.06M   786K
  mirror613G  82.9G 51 37  1.44M   838K
c6t3d1 -  - 20 19  1.57M   824K
c5t1d1 -  - 20 24  1.40M   838K
c8t227C0001559A761Bd0  -  -  5101   403K  4.63M
  mirror618G  78.3G133 60  6.23M   361K
c6t3d2 -  - 40 27  3.21M   903K
c4t2d0 -  - 23 81  1.91M  2.98M
c8t221200015599F2CFd0  -  -  6108   442K  4.71M
  mirror613G  83.2G110 51  3.66M   337K
c6t3d3 -  - 36 25  2.72M   906K
c5t2d1 -  - 29 65  1.80M  2.92M
  mirror415G  29.0G 30 28   460K   278K
c6t3d4 -  - 11 19   804K   268K
c4t1d2 -  - 15 22   987K   278K
  mirror255G   441G 26 49   536K  1.02M
c8t22110001552F3C46d0  -  - 12 27   835K  1.02M
c8t224B0001559BB471d0  -  - 12 29   835K  1.02M
  mirror257G   439G 32 52   571K  1.04M
c8t22480001552D7AF8d0  -  - 14 28  1003K  1.04M
c4t1d0 -  - 14 32  1002K  1.04M
  mirror251G   445G 28 53   543K  1.02M
c8t227F0001552CB892d0  -  - 13 28   897K  1.02M
c8t22250001559830A5d0  -  - 13 30   897K  1.02M
  mirror   17.4G   427G 22 38   339K   393K
c8t22FA00015529F784d0  -  -  9 19   648K   393K
c5t2d2 -  -  9 23   647K   393K


It is 3x dual-iSCSI + 2x dual SCSI DAS arrays (RAID0, 13x250).

I have a problem, however:
The 2 SCSI arrays were able to handle the mail traffic fine with UFS on them.
The new config with 3 additional arrays seems to have problems using ZFS.
Writes are waiting 10-15 seconds to get to disk - so the queue fills very
quickly; reads are quite OK.
I assume this is the problem of ZFS preferring reads to writes.

I also see in 'zpool iostat -v 1' that writes are issued to disk only once
every 10 secs, and then it's 2000 rq in one second.
Reads are sustained at ca. 800 rq/s.

Is there a way to tune this read/write ratio? Is this a known problem?

I tried changing vq_max_pending as suggested by Eric in
http://blogs.sun.com/erickustarz/entry/vq_max_pending
but there was no change in this write behaviour.

iostat shows ca. 20-30 ms asvc_t, 0%w, and ca. 30% busy on all drives, so they
do not seem saturated. (Before, with UFS, they had 90% busy, 1% wait.)

The system is Sol 10 U2 on a Sun x4200 with 4 GB RAM.

Please, if you could give me some hint to really make this work, as the way
back to UFS is almost impossible on a live system.
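
(For reference, a rough DTrace sketch, run as root, to count synchronous-write
activity per application - zil_commit is the ZFS kernel function behind fsync,
and the 10-second window is arbitrary:)

  dtrace -n 'fbt::zil_commit:entry { @[execname] = count(); } tick-10s { exit(0); }'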



-- 
Ivan Debnár

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: How to destroy a pool which you can't import because it is in faulted st

2006-09-07 Thread James C. McPherson

Lieven De Geyndt wrote:

So I can manage the file system mounts/automounts using the legacy option,
but I can't manage the auto-import of the pools. Or should I delete the
zpool.cache file during boot?



Doesn't this come back to the self-induced problem, namely that they are
trying a poor man's cluster?

If you want cluster functionality then pay for a proper solution.
If you can't afford a proper solution then you will *always* get
hurt when you come up against a problem of your own making.

I saw this scenario *many* times while working in Sun's CPRE and
PTS organisations.

Save yourself the hassle and do things right from the start.


James C. McPherson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: How to destroy a pool which you can't import because it is in faulted st

2006-09-07 Thread Frank Cusack

On September 7, 2006 6:55:48 PM +1000 James C. McPherson [EMAIL PROTECTED] 
wrote:

Doesn't this come back to the self-induced problem, namely that they are
trying a poor man's cluster?

If you want cluster functionality then pay for a proper solution.
If you can't afford a proper solution then you will *always* get
hurt when you come up against a problem of your own making.

I saw this scenario *many* times while working in Sun's CPRE and
PTS organisations.

Save yourself the hassle and do things right from the start.


AIUI, there is no ZFS cluster option today.  SC3.2 (with HA-ZFS) is only
in beta, so it can't be done right from the start with ZFS.

[I'm not disagreeing with you, though.]

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: Re: ZFS forces system to paging to the point it is

2006-09-07 Thread Jürgen Keil
 We are trying to obtain a mutex that is currently held
 by another thread trying to get memory.

Hmm, reminds me a bit of the zvol swap hang I got
some time ago:

http://www.opensolaris.org/jive/thread.jspa?threadID=11956&tstart=150

I guess if the other thread is stuck trying to get memory, then
it is allocating the memory with KM_SLEEP while holding
a mutex?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: How to destroy a pool which you can't import because it is in faulted st

2006-09-07 Thread James C. McPherson

Lieven De Geyndt wrote:

I know this is not supported, but we are trying to build a safe configuration
until ZFS is supported in Sun Cluster. The customer did order SunCluster, but
needs a workaround until the release date. And I think it must be possible to
set up.


So build them a configuration which works and is supported today, and
design it so that the migration plan you also provide them makes it
reasonably pain-free to move to HA-ZFS when SC3.2 is released.

James
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: Re: ZFS forces system to paging to the point it is

2006-09-07 Thread Mark Maybee

Jürgen Keil wrote:

We are trying to obtain a mutex that is currently held
by another thread trying to get memory.



Hmm, reminds me a bit of the zvol swap hang I got
some time ago:

http://www.opensolaris.org/jive/thread.jspa?threadID=11956&tstart=150

I guess if the other thread is stuck trying to get memory, then
it is allocating the memory with KM_SLEEP while holding
a mutex?
 

Yup, this is essentially another instance of this problem.

-Mark

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re[2]: [zfs-discuss] Re: Re: ZFS forces system to paging to the point it is

2006-09-07 Thread Robert Milkowski
Hello Mark,

Thursday, September 7, 2006, 12:32:32 AM, you wrote:

MM Robert Milkowski wrote:
 
 
 On Wed, 6 Sep 2006, Mark Maybee wrote:
 
 Robert Milkowski wrote:

 ::dnlc!wc


  1048545 3145811 76522461

 Well, that explains half your problem... and maybe all of it:
 
 
 
 After I reduced vdev prefetch from 64K to 8K, for the last few hours the
 system has been working properly without the workaround and free memory
 stays at about 1 GB.
 
 Reducing vdev prefetch to 8K also reduced read throughput 10x.
 
 I believe this is somehow related - maybe the vdev cache was so aggressive (I
 got 40-100 MB/s of reads) and was consuming memory so fast that the thread
 which is supposed to reclaim memory couldn't keep up?

MM I suppose, although the data volume doesn't seem that high... maybe you
MM are just operating at the hairy edge here.  Anyway, I have filed a bug
MM to track this issue:

MM 6467963 do_dnlc_reduce_cache() can be blocked by ZFS_OBJ_HOLD_ENTER()

Well, it was working so far, and then in less than 5 minutes free
memory went to 0 and the system was so unresponsive I couldn't log in.

So I guess exporting/importing the pool and, in addition, lowering vdev
prefetch to 8K is needed here. Hope it will stay up longer that way.

:(
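
(For reference, a rough sketch of how the numbers above can be inspected and
the prefetch size lowered at run time; the vdev-cache tunable name is an
assumption for this build, so verify it against the zfs module source before
using it:)

  # how many entries the DNLC currently holds (same ::dnlc dcmd as quoted above)
  echo "::dnlc ! wc -l" | mdb -k

  # drop the vdev cache/prefetch block size from 64K (2^16) to 8K (2^13);
  # tunable name is an assumption - check your build first
  echo "zfs_vdev_cache_bshift/W 0t13" | mdb -kw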

-- 
Best regards,
 Robert   mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance problem of ZFS ( Sol 10U2 )

2006-09-07 Thread Mark Maybee

Ivan,

What mail clients use your mail server?  You may be seeing the
effects of:

6440499 zil should avoid txg_wait_synced() and use dmu_sync() to issue 
parallel IOs when fsyncing


This bug was fixed in Nevada build 43, and I don't think it made it into
s10 update 2.  It will, of course, be in update 3 and be available in
a patch at some point.
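
(A rough DTrace sketch, run as root, to see whether fsync latency shows the
multi-second stalls described above; zil_commit is the kernel function
underneath fsync on ZFS:)

  dtrace -n '
    fbt::zil_commit:entry  { self->ts = timestamp; }
    fbt::zil_commit:return /self->ts/ {
      @["zil_commit latency (ms)"] = quantize((timestamp - self->ts) / 1000000);
      self->ts = 0;
    }'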

Ivan Debnár wrote:

Hi,
 
I deployed ZFS on our mail server recently, hoping for eternal peace after running on UFS and moving files with each TB added.
 
It is a mail server - its mdirs are on a ZFS pool:

[zpool iostat -v output quoted in full in the original message above]


It is 3x dual-iSCSI + 2x dual SCSI DAS arrays (RAID0, 13x250).

I have a problem, however:
The 2 SCSI arrays were able to handle the mail traffic fine with UFS on them.
The new config with 3 additional arrays seems to have problems using ZFS.
Writes are waiting 10-15 seconds to get to disk - so the queue fills very
quickly; reads are quite OK.
I assume this is the problem of ZFS preferring reads to writes.

I also see in 'zpool iostat -v 1' that writes are issued to disk only once
every 10 secs, and then it's 2000 rq in one second.
Reads are sustained at ca. 800 rq/s.

Is there a way to tune this read/write ratio? Is this a known problem?

I tried changing vq_max_pending as suggested by Eric in
http://blogs.sun.com/erickustarz/entry/vq_max_pending
but there was no change in this write behaviour.

iostat shows ca. 20-30 ms asvc_t, 0%w, and ca. 30% busy on all drives, so they
do not seem saturated. (Before, with UFS, they had 90% busy, 1% wait.)

The system is Sol 10 U2 on a Sun x4200 with 4 GB RAM.

Please, if you could give me some hint to really make this work, as the way
back to UFS is almost impossible on a live system.





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: How to destroy a pool which you can't import

2006-09-07 Thread Eric Schrock
On Thu, Sep 07, 2006 at 11:32:18AM -0700, Darren Dunham wrote:
 
 I know that VxVM stores the autoimport information on the disk
 itself.  It sounds like ZFS doesn't and it's only in the cache (is this
 correct?) 

I'm not sure what 'autoimport' is, but ZFS always stores enough
information on the disks to open the pool, provided all the devices (or
at least one device from each toplevel vdev) can be scanned.  The cache
simply provides a list of known pools and their approximate
configuration, so that we don't have to scan every device (and every
file) on boot to know where pools are located.

It's important to distinguish between 'opening' a pool and 'importing' a
pool.  Opening a pool involves reading the data off disk and
constructing the in-core representation of the pool.  It doesn't matter
if this data comes from the cache, from on-disk, or out of thin air.

Importing a pool is an intentional action which reconstructs the pool
configuration from on-disk data, which it then uses to open the pool.
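
(In command terms - a sketch, with the pool name purely illustrative:)

  # scan attached devices for importable pools by reading their on-disk labels
  zpool import

  # reconstruct the configuration of a specific pool from disk and open it
  zpool import mailstore

  # at boot, pools already listed in /etc/zfs/zpool.cache are simply opened;
  # no scan and no explicit import is needed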

 Let's imagine that I lose a motherboard on a SAN host and it crashes.  To
 get things going I import the pool on another host and run the apps
 while I repair the first one.  Hardware guy comes in and swaps the
 motherboard, then lets the machine boot.  While it boots, will it try to
 re-import the pool it had before it crashed?  Will it succeed?

Yes, it will open every pool that it has in the cache.  Fundamentally,
this is operator error.  We have talked about storing the hostid of the
last machine to open the pool to detect this case, but then we've also
talked about ways of sharing snapshots from the same pool read-only to
multiple hosts.  So it's not clear that hostid != self is a valid
check when opening a pool, and it would also make failback somewhat more
complicated.

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320

2006-09-07 Thread James Dickens

On 9/7/06, Torrey McMahon [EMAIL PROTECTED] wrote:

Nicolas Dorfsman wrote:
 The hard part is getting a set of simple requirements. As you go into more
 complex data center environments you get hit with older Solaris revs, other
 OSs, SOX compliance issues, etc. etc. etc. The world where most of us seem
 to be playing with ZFS is on the lower end of the complexity scale. Sure,
 throw your desktop some fast SATA drives. No problem. Oh wait, you've got
 ten Oracle DBs on three E25Ks that need to be backed up every other blue
 moon ...


   Another fact is CPU use.

   Does anybody really know what the effects of an intensive CPU workload on
ZFS performance will be, and the effects of ZFS RAID computation on an
intensive CPU workload?


With ZFS I have found that memory is a much greater limitation. Even
my dual 300 MHz U2 has no problem filling 2x 20 MB/s SCSI channels, even
with compression enabled, using raidz and 10k rpm 9 GB drives; thanks
to its 2 GB of RAM it does great at everything I throw at it. On the
other hand, my Blade 1500 with 512 MB RAM and 3x 18 GB 10k rpm drives on
2x 40 MB/s SCSI channels (OS on an 80 GB IDE drive) has problems
interactively, because as soon as you push ZFS hard it hogs all the RAM,
and it may take 5 or 10 seconds to get a response in xterms while the
machine clears out RAM and loads its applications/data back into RAM.

James Dickens
uadmin.blogspot.com



   I heard a story about a customer complaining about his high-end server's
performance; when a guy came on site...and discovered beautiful SVM RAID-5
volumes, the solution was almost found.


RAID calculations take CPU time, but I haven't seen numbers on ZFS usage.
SVM is known for using a fair bit of CPU when performing R5 calculations,
and I'm sure other OSes have the same issue. EMC used to go around saying
that offloading RAID calculations to their storage arrays would increase
application performance because you would free up CPU time to do other
stuff. The EMC effect is how they used to market it.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Recommendation ZFS on StorEdge 3320 - offtopic

2006-09-07 Thread Richard Elling - PAE

[EMAIL PROTECTED] wrote:

This is the case where I don't understand Sun's policy at all: Sun
doesn't offer a really cheap JBOD which can be bought just for ZFS. And
don't even tell me about 3310/3320 JBODs - they are horribly expensive :-(


Yep, multipacks have been EOL for some time now -- killed by big disks.  Back
when disks were small, people would buy multipacks to attach to their
workstations. There was a time when none of the workstations had internal
disks, but I'd be dating myself :-)

For datacenter-class storage, multipacks were not appropriate.  They only
had single-ended SCSI interfaces with a limited cable budget, which
restricted their use in racks.  Also, they weren't designed for a rack
environment, so they weren't mechanically appropriate either.  I suppose
you can still find them on eBay.


If Sun wants ZFS to be adopted more quickly, it should have such a _really_
cheap JBOD.


I don't quite see this in my crystal ball.  Rather, I see all of the SAS/SATA
chipset vendors putting RAID in the chipset.  Basically, you can't get a
dumb interface anymore, except for fibre channel :-).  In other words, if
we were to design a system in a chassis with perhaps 8 disks, then we would
also use a controller which does RAID.  So, we're right back to square 1.
 -- richard
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


RE: [zfs-discuss] Performance problem of ZFS ( Sol 10U2 )

2006-09-07 Thread Ivan Debnár
Hi, thanks for the response.

As this is a closed-source mail server (CommuniGate Pro), I can't give a 100%
answer, but the writes that I see taking too much time (15-30 secs) are writes
from the temp queue to final storage, and from my understanding they are sync,
so the queue manager can guarantee they are on stable storage.

Apart from that, however, the interactive access to the mail store modifies
filenames a lot (read/opened/deleted flags are part of the filename for each
file); the content of the files is not changed any more. Also, moves between
directories and deletions are only directory operations (I don't know whether
they are sync or not - you know the OS internals better).

From the description of the error you mentioned, and also
6440499 zil should avoid txg_wait_synced() and use dmu_sync() to issue parallel
IOs when fsyncing

I think that this may also be my case.

So my question is: I run Sol 10 U2 - is there a way to quickly test the new
ZFS without reinstalling the whole system?
Please say there is...

Ivan


-Original Message-
From: eric kustarz [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 07, 2006 8:39 PM
To: Ivan Debnár
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Performance problem of ZFS ( Sol 10U2 )

Ivan Debnár wrote:

Hi,
 
I deployed ZFS on our mail server recently, hoping for eternal peace after
running on UFS and moving files with each TB added.
 
It is a mail server - its mdirs are on a ZFS pool:
[zpool iostat -v output quoted in full in the original message above]


It is 3x dual-iSCSI + 2x dual SCSI DAS arrays (RAID0, 13x250).

I have a problem, however:
The 2 SCSI arrays were able to handle the mail traffic fine with UFS on them.
The new config with 3 additional arrays seems to have problems using ZFS.
Writes are waiting 10-15 seconds to get to disk - so the queue fills very
quickly; reads are quite OK.
  


Are those synchronous writes or asynchronous?  If both, what are the
percentages of each?

Neil just putback a fix into snv_48 for:
6413510 zfs: writing to ZFS filesystem slows down fsync() on other files in the 
same FS

Basically the fsync/synchronous writes end up doing more work than they
should - instead of writing the data and meta-data for just the file you're
trying to fsync, you will write (and wait for) other files' data and
meta-data too.

eric

I assume this is the problem of ZFS preferring reads to writes.

I also see in 'zpool iostat -v 1' that writes are issued to disk only once
every 10 secs, and then it's 2000 rq in one second.
Reads are sustained at ca. 800 rq/s.

Is there a way to tune this read/write ratio? Is this a known problem?

I tried changing vq_max_pending as suggested by Eric in
http://blogs.sun.com/erickustarz/entry/vq_max_pending
but there was no change in this write behaviour.

iostat shows ca. 20-30 ms asvc_t, 0%w, and ca. 30% busy on all drives so these

[zfs-discuss] Re: Recommendation ZFS on StorEdge 3320

2006-09-07 Thread Anton B. Rang
The bigger problem with system utilization for software RAID is the cache, not 
the CPU cycles proper. Simply preparing to write 1 MB of data will flush half 
of a 2 MB L2 cache. This hurts overall system performance far more than the few 
microseconds that XORing the data takes.

(A similar effect occurs with file system buffering, and this is one reason why 
direct I/O is attractive for databases — there’s no pollution of the system 
cache.)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: How to destroy a pool which you can't import

2006-09-07 Thread Eric Schrock
On Thu, Sep 07, 2006 at 01:09:47PM -0700, Frank Cusack wrote:
 
 That zfs needs to address.
 
 What if I simply lose power to one of the hosts, and then power is restored?

Then use a layered clustering product - that's what this is for.  For
example, SunCluster doesn't use the cache file in the traditional way,
and will make sure the host coming back up doesn't access the pool
before it is able to do so.

If you are going to 'roll your own' clustering, then you will need to
come up with some appropriate conversation between the two hosts to know
when it is OK to come completely up.  You can use alternate root pools
(with '/' they become effectively temporary), allow the host to come
all the way up, and have the failback conversation with the other
host before explicitly importing the pool.
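
(A sketch of the alternate-root approach described above; the pool name is
illustrative:)

  # import with an alternate root of '/': mountpoints are unchanged, but the
  # pool is treated as temporary and is not re-opened automatically at boot
  zpool import -R / mailstore

  # failback: the surviving host gives the pool up, the repaired host takes it
  zpool export mailstore          # on the host releasing the pool
  zpool import -R / mailstore     # on the host taking it over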

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: How to destroy a pool which you can't import

2006-09-07 Thread Eric Schrock
On Thu, Sep 07, 2006 at 01:52:33PM -0700, Darren Dunham wrote:
 
 What are the problems that you see with that check?  It appears similar
 to what VxVM has been using (although they do not use the `hostid` as
 the field), and that appears to have worked well in most cases.
 
 I don't know what issues appear with multiple hosts.  My worry is that
 an accidental import would allow two machines to update uberblock and
 other metadata to the point that you get corruption.  If the sharing
 hosts get read-only access and never touch the metadata, then (for me)
 the hostname check becomes much less relevant.  If they want to import
 it, fine...but don't corrupt anything.
 

I agree that it's a useful check against accidental mistakes - as long as
we're not talking about some built-in clustering behavior.  We just
haven't thought through what the experience should be.  In particular,
there are some larger issues in relation to FMA that need to be addressed.
For example, we would want the pool to show up as faulted, but there
needs to be a consistent way to 'repair' such a pool.  Should it be an
extension of 'zpool clear', or should it be done through 'fmadm
repair'?  Or even 'zpool export' followed by 'zpool import'?  We're
going to have to answer these questions for the next phase of ZFS/FMA
interaction, so maybe it would be a good time to think
about this problem as well.
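
(For reference, the three candidate repair paths named above as they exist
today - the pool name and fault UUID are placeholders:)

  zpool clear tank                         # clear error counts / device faults
  zpool export tank && zpool import tank   # rebuild the config from on-disk data
  fmadm repair <fault-uuid>                # mark an FMA-diagnosed fault repaired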

And of course, you'll always be able to shoot yourself in the foot if
you try, either by 'repairing' a pool that's actively shared, or by
force-importing a pool that's actively in use somewhere else.

- Eric

--
Eric Schrock, Solaris Kernel Development   http://blogs.sun.com/eschrock
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re: How to destroy a pool which you can't import

2006-09-07 Thread Anton B. Rang
A determined administrator can always get around any checks and cause problems. 
We should do our very best to prevent data loss, though! This case is 
particularly bad since simply booting a machine can permanently damage the pool.

And why would we want a pool imported on another host, or not marked as 
belonging to this host, to show up as faulted? That seems an odd use of the 
word.  Unavailable, perhaps, but not faulted.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss