[zfs-discuss] GSoC 09 zfs ideas?

2009-02-25 Thread C. Bergström


Hi everyone.

I've got a couple ideas for good zfs GSoC projects, but wanted to stir 
some interest.  Anyone interested to help mentor?  The deadline is 
around the corner so if planning hasn't happened yet it should start 
soon.  If there is interest who would the org administrator be?


Thanks

./Christopher
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs streams data corruption

2009-02-25 Thread Moore, Joe
Miles Nordin wrote:
   that SQLite2 should be equally as tolerant of snapshot backups as it
   is of cord-yanking.
 
 The special backup features of databases including ``performing a
 checkpoint'' or whatever, are for systems incapable of snapshots,
 which is most of them.  Snapshots are not writeable, so this ``in the
 middle of a write'' stuff just does not happen.

This is correct.  The general term for these sorts of point-in-time backups is 
crash consistent.  If the database can be recovered easily (and/or
automatically) from pulling the plug (or a kill -9), then a snapshot is an 
instant backup of that database.

In-flight transactions (ones that have not been committed) at the database 
level are rolled back.  Applications using the database will be confused by 
this in a recovery scenario, since transactions that were reported as committed 
are gone when the database comes back.  But that's the case any time a database 
moves backward in time.
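
As a minimal sketch (pool/dataset names made up, and assuming the database lives 
entirely in that one dataset), the crash-consistent backup is simply:

  # atomic, crash-consistent point-in-time copy of the dataset holding the DB
  zfs snapshot tank/db@backup-20090225
  # optionally clone it and point a test instance at the clone to verify recovery
  zfs clone tank/db@backup-20090225 tank/db-verify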

 Of course Toby rightly pointed out this claim does not apply if you
 take a host snapshot of a virtual disk, inside which a database is
 running on the VM guest---that implicates several pieces of
 untrustworthy stacked software.  But for snapshotting SQLite2 to clone
 the currently-running machine I think the claim does apply, no?


Snapshots of a virtual disk are also crash-consistent.  If the VM has not 
committed its transactionally-committed data and is still holding it in volatile 
memory, that VM is not maintaining its ACID requirements, and that's a bug in 
either the database or in the OS running on the VM.  The snapshot represents 
the disk state as if the VM were instantly gone.  If the VM or the database 
can't recover from pulling the virtual plug, the snapshot can't help that.

That said, it is a good idea to quiesce the software stack as much as possible 
to make the recovery from the crash-consistent image as painless as possible.  
For example, if you take a snapshot of a VM running on an EXT2 filesystem (or 
unlogged UFS, for that matter) the recovery will require an fsck of that 
filesystem to ensure that the filesystem structure is consistent.  Performing a 
lockfs on the filesystem while the snapshot is taken could mitigate that, but 
that's still outside the scope of the ZFS snapshot.
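
A rough sketch of that mitigation (paths and dataset names invented; the lockfs 
runs inside the guest, the snapshot on the host):

  # inside the guest: write-lock and flush the UFS filesystem
  lockfs -w /data
  # on the host: snapshot the dataset holding the virtual disk image
  zfs snapshot tank/vdisks@quiesced
  # inside the guest: release the lock
  lockfs -u /data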

--Joe


Re: [zfs-discuss] zfs streams data corruption

2009-02-25 Thread Toby Thain


On 25-Feb-09, at 9:53 AM, Moore, Joe wrote:


Miles Nordin wrote:
  that SQLite2 should be equally as tolerant of snapshot backups as it
  is of cord-yanking.

The special backup features of databases including ``performing a
checkpoint'' or whatever, are for systems incapable of snapshots,
which is most of them.  Snapshots are not writeable, so this ``in the
middle of a write'' stuff just does not happen.


This is correct.  The general term for these sorts of point-in-time  
backups is crash consistent.  If the database can be recovered  
easily (and/or automatically) from pulling the plug (or a kill -9),  
then a snapshot is an instant backup of that database.


In-flight transactions (ones that have not been committed) at the  
database level are rolled back.  Applications using the database  
will be confused by this in a recovery scenario, since  
transactions that were reported as committed are gone when the database  
comes back.  But that's the case any time a database moves  
backward in time.



Of course Toby rightly pointed out this claim does not apply if you
take a host snapshot of a virtual disk, inside which a database is
running on the VM guest---that implicates several pieces of
untrustworthy stacked software.  But for snapshotting SQLite2 to clone
the currently-running machine I think the claim does apply, no?



Snapshots of a virtual disk are also crash-consistent.  If the VM  
has not committed its transactionally-committed data and is still  
holding it in volatile memory, that VM is not maintaining its ACID  
requirements, and that's a bug in either the database or in the OS  
running on the VM.


Or the virtual machine! I hate to dredge up the recent thread again -  
but if your virtual machine is not maintaining guest barrier  
semantics (write ordering) on the underlying host, then your snapshot  
may contain inconsistencies entirely unexpected to the virtualised  
transactional/journaled database or filesystem.[1]


I believe this can be reproduced by simply running VirtualBox with  
default settings (ignore flush), though I have been too busy lately  
to run tests which could prove this. (Maybe others would be  
interested in testing as well.) I infer this explanation from  
consistency failures in InnoDB and ext3fs that I have seen[2], which  
would not be expected on bare metal in pull-plug tests. My point is  
not about VB specifically, but just that in general, the consistency  
issue - already complex on bare metal - is tangled further as the  
software stack gets deeper.


--Toby

[1] - The SQLite web site has a good summary of related issues.
http://sqlite.org/atomiccommit.html
[2] http://forums.virtualbox.org/viewtopic.php?t=13661

The snapshot represents the disk state as if the VM were instantly  
gone.  If the VM or the database can't recover from pulling the  
virtual plug, the snapshot can't help that.


That said, it is a good idea to quiesce the software stack as much  
as possible to make the recovery from the crash-consistent image as  
painless as possible.  For example, if you take a snapshot of a VM  
running on an EXT2 filesystem (or unlogged UFS, for that matter) the  
recovery will require an fsck of that filesystem to ensure that the  
filesystem structure is consistent.  Performing a lockfs on the  
filesystem while the snapshot is taken could mitigate that, but  
that's still outside the scope of the ZFS snapshot.


--Joe




Re: [zfs-discuss] zfs streams data corruption

2009-02-25 Thread Miles Nordin
 jm == Moore, Joe joe.mo...@siemens.com writes:

jm This is correct.  The general term for these sorts of
jm point-in-time backups is crash consistant.

phew, thanks, glad I wasn't talking out my ass again.

jm In-flight transactions (ones that have not been committed) at
jm the database level are rolled back.  Applications using the
jm database will be confused by this in a recovery scenario,
jm since transactions that were reported as committed are gone when
jm the database comes back.  But that's the case any time a
jm database moves backward in time.

hm.  I thought a database would not return success to the app until it
was actually certain the data was on disk with fsync() or whatever,
and this is why databases like NVRAM's and slogs.  Are you saying it's
a common ``optimisation'' for DBMS to worry about write barriers only,
not about flushing?

jm Snapshots of a virtual disk are also crash-consistant.  If the
jm VM has not committed its transactionally-committed data and is
jm still holding it volatile memory, that VM is not maintaining
jm its ACID requirements, and that's a bug in either the database
jm or in the OS running on the VM.

I'm betting mostly ``the OS running inside the VM'' and ``the virtualizer
itself''.  For the latter, from Toby's thread:

-8-
If desired, the virtual disk images (VDI) can be flushed when the
guest issues the IDE FLUSH CACHE command. Normally these requests are
ignored for improved performance.
To enable flushing, issue the following command:
 VBoxManage setextradata VMNAME VBoxInternal/Devices/piix3ide/0/LUN#[x]/Config/IgnoreFlush 0
-8-

Virtualizers are able to take snapshots themselves without help from
the host OS, so I would expect at least those to work, and host
snapshots to be fixable.  VirtualBox has a ``pause'' feature---it
could pretend it's received a flush command from the guest, and flush
whatever internal virtualizer buffers it has to the host OS when
paused.

Also a host snapshot is a little more forgiving than a host cord-yank
because the snapshot will capture things applications like VBox have
written to files but not fsync()d yet.  so it's ok for snapshots but
not cord-yanks if VBox never bothers to call fsync().  It's just not
okay that VBox might buffer data internally sometimes.

Even if that's all sorted, though, ``the OS running inside the
VM''---neither UFS nor ext3 sends these cache flush commands to
virtual drives.  At least for ext3, the story is pretty long:

 http://lwn.net/Articles/283161/
  So, for those that wish to enable them, barriers apparently are
  turned on by giving barrier=1 as an option to the mount(8) command,
  either on the command line or in /etc/fstab:
   mount -t ext3 -o barrier=1 device mount point
  (but, does not help at all if using LVM2 because LVM2 drops the barriers)

ext3 gets away with it because drive write buffers are small enough
that it can mostly get away with flushing only the journal, and the
journal's written in LBA order, so except when it wraps around there's
little incentive for drives to re-order it.  But ext3's supposed
ability to mostly work ok without barriers depends on assumptions
about physical disks---the size of the write cache being 32MB, their
reordering sorting algorithm being elevator-like---that probably don't
apply to a virtual disk so a Linux guest OS very likely is ``broken''
w.r.t. taking these crash-consistent virtual disk snapshots.

And also a Solaris guest: we've been told UFS+logging expects the
write cache to be *off* for correctness.  I don't know if UFS is less
good at evading the problem than ext3, or if Solaris users are just
more conservative.  but, with a virtual disk the write cache will
always be effectively on no matter what simon-sez flags you pass to
that awful 'format' tool.  That was never on the bargaining table
because there's no other way it can have remotely reasonable
performance.

Possibly the ``pause'' command would be a workaround for this because
it could let you force a barrier into the write stream yourself (one
the guest OS never sent) and then take a snapshot right after the
barrier with no writes allowed between barrier and snapshot.  If the
fake barrier is inserted into the stack right at the guest/VBox
boundary, then it should make the overall system behave as well as the
guest running on a drive with the write cache disabled.  I'm not sure
such a barrier is actually implied by VBox ``pause'' but if I were
designing the pause feature it would be.
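
If ``pause'' really did imply such a barrier (an assumption, not something I've
verified), the whole workaround would be roughly this (VM and dataset names
made up):

  VBoxManage controlvm myguest pause
  zfs snapshot tank/vbox/myguest@clean
  VBoxManage controlvm myguest resume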




Re: [zfs-discuss] Backing up ZFS snapshots

2009-02-25 Thread Blake
I'm sure that's true.  My point was that, given the choice between a
zfs send/recv from one set of devices to another, where the target is
another pool, and sending a zfs stream to a tarball, I'd sooner choose
a solution that's all live filesystems.
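
Roughly this, say (pool names made up, and assuming bits recent enough for
send -R):

  zfs snapshot -r tank@backup
  zfs send -R tank@backup | ssh backuphost zfs receive -d backuppool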

If backups are *really* important, then it's certainly better to use a
product with commercial support.  I think Amanda is zfs-aware now?


On Mon, Feb 23, 2009 at 12:16 PM, Miles Nordin car...@ivy.net wrote:
 b == Blake  blake.ir...@gmail.com writes:

     c There are other problems besides the versioning.

     b Agreed - I don't think that archiving simply the send stream
     b is a smart idea (yet, until the stream format is stabilized

 *there* *are* *other* *problems* *besides* *the* *versioning*!





Re: [zfs-discuss] Motherboard for home zfs/solaris file server

2009-02-25 Thread Brandon High
On Tue, Feb 24, 2009 at 2:29 PM, Tim t...@tcsac.net wrote:
 Given the current state of AMD, I think we all know that's not likely.  Why
 cut into the revenue of your server line chips when you don't have to?
  Right?

AMD has nothing to do with whether ECC exists on the Nehalem.

Most likely ECC is in the memory controller of the Nehalem die and is
just disabled on the i7. It wouldn't make any sense to tape out a
whole new die for the server version of the chip. The Xeon could use
another stepping, but I'd expect Intel to use the same die on both
consumer and server versions of the chip.

-B

-- 
Brandon High : bh...@freaks.com


Re: [zfs-discuss] zfs streams data corruption

2009-02-25 Thread Miles Nordin
 tt == Toby Thain t...@telegraphics.com.au writes:

 c so it's ok for snapshots but not cord-yanks if VBox never
 c bothers to call fsync().

tt Taking good host snapshots may require VB to do that, though.

AIUI the contents of a snapshot on the host will be invariant no
matter where VBox places host fsync() calls along the timeline, or if
it makes them at all.

The host snapshot will not be invariant of when applications running
inside the guest call fsync(), because this inner fsync() implicates
the buffer cache in the guest OS, possibly flush commands at the
guest/VBox driver/virtualdisk boundary, and stdio buffers inside the
VBox app.

so...in the sense that, in a hypothetical nonexistent working overall
system, a guest app calling fsync() eventually propagates out until
finally VBox calls fsync() on the host's kernel, then yeah, observing
a lack of fsync()'s coming out of VBox probably means host snapshots
won't be crash-consistent.  BUT the effect of the fsync() on the host
itself is not what's needed for host snapshots (only needed for host
cord-yanks).  It's all the other stuff that's needed for host
snapshots---flushing the buffer cache inside the guest OS, flushing
VBox's stdio buffers, u.s.w., that makes a bunch of write()'s spew out
just before the fsync() and dams up other write()s inside VBox and the
guest OS until after the fsync() comes out.

 c   But ext3's supposed ability to mostly work ok without
 c barriers

tt Without *working* barriers, you mean? I haven't RTFS but I
tt suspect ext3 needs functioning barriers to maintain crash
tt consistency.

no, the lwn article says that ext3 is just like Solaris UFS and never
issues a cache flush to the drive (except on SLES where Novell made
local patches to their kernel).

ext3 probably does still use an internal Linux barrier API to stop
dangerous kinds of reordering within the Linux buffer cache, but
nothing that makes it down to the drive (nor into VBox).  so I think
even if you turn on the flush-respecting feature in VBox, Linux ext3
and Solaris UFS would both still be necessarily unsafe (according to
our theory so far), at least unsafe from: (1) host cord-yanking, (2)
host snapshots taken without ``pausing'' the VM.

If you're going to turn on the VBox flush option, maybe it would be
worth trying XFS or ext4 or ZFS inside the guest and comparing their
corruptability.

For VBox to simulate a real disk with its write cache turned off, and
thus work better with UFS and ext3, VBox would need to make sure
writes are not re-ordered.  For the unpaused-host-snapshot case this
should be relatively easy---just make VBox stop using stdio, and call
write() exactly once for every disk command the guest issues and call
it in the same order the guest passed it.  It's not necessary to call
fsync() at all, so it should not make things too much slower.

For the host cord-yanking case, I don't think POSIX gives enough to
achieve this and still be fast because you'd be expected to call
fsync() between each write.  What we really want is some flag, ``make
sure my writes appear to have been done in order after a crash.''  I
don't think there's such a thing as a write barrier in POSIX, only the
fsync() flush command?  

Maybe it should be a new rule of zvol's that they always act this
way. It need not slow things down much for the host to arrange that
writes not appear to have been reordered: all you have to do is batch
them into chunks along the timeline, and make sure all the writes in a
chunk commit, or none of them do.  It doesn't matter how big the
chunks are nor where they start and end.  It's sort of a degenerate
form of the snapshot case: with the fwrite()-to-write() change above
we can already take a clean snapshot without fsync(), so just pretend
as though you were taking a snapshot a couple of times a minute, and after
losing power roll back to the newest one that survived.  I'm not sure
real snapshots are the right way to implement it, but the idea is with
a COW backing store it should be well within reach to provide the
illusion writes are never reordered (and thus that your virtual hard
disk has its write cache turned off) without adding lots of io/s the
way fsync() does.  This still compromises the D in ACID for databases
running inside the guest, in the host cord-yank case, but it should
stop the corruption.
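
A crude userland approximation of the rollback half would be (zvol name made
up; it obviously does nothing about the buffering/ordering problem itself):

  # keep a rolling crash-consistent point a couple of times a minute
  while true; do
    zfs destroy tank/vboxdisk@rolling 2>/dev/null
    zfs snapshot tank/vboxdisk@rolling
    sleep 30
  done

  # after losing power, roll the virtual disk back to the last clean point
  zfs rollback tank/vboxdisk@rolling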




Re: [zfs-discuss] Motherboard for home zfs/solaris file server

2009-02-25 Thread Tim
On Wed, Feb 25, 2009 at 2:39 PM, Brandon High bh...@freaks.com wrote:

 On Tue, Feb 24, 2009 at 2:29 PM, Tim t...@tcsac.net wrote:
  Given the current state of AMD, I think we all know that's not likely.
  Why
  cut into the revenue of your server line chips when you don't have to?
   Right?

 AMD has nothing to do with whether ECC exists on the Nehalem.


Of course it does.  Competition directly affects the features provided on
every vendor's products in a market segment.



 Most likely ECC is in the memory controller of the Nehalem die, it's
 just disabled on the i7. It wouldn't make any sense to tape out a
 whole new die for the server version of the chip. The Xeon could use
 another stepping, but I'd expect Intel to use the same on both
 consumer and server versions of the chip.


The fact Intel put a memory controller on die is PROOF that AMD has a direct
effect on their product roadmap.  Do you think Intel would have willingly
killed off their lucrative northbridge chipset business without AMD forcing
their hand?  Please.

--Tim


[zfs-discuss] Solaris 8/9 branded zones on ZFS root?

2009-02-25 Thread Rich Teer
Hi all,

I have a situation where I need to consolidate a few servers running
Solaris 9 and 8.  If the application doesn't run natively on Solaris
10 or Nevada, I was thinking of using Solaris 9 or 8 branded zones.
My intent would be for the global zone to use ZFS boot/root; would I
be correct in thinking that this will be OK for the branded zones?
That is, they don't care about the underlying file system type?

Or am I stuck with using UFS for the root file systems of Solaris 8
and 9 branded zones?  (I sure hope not!)

Many TIA,

-- 
Rich Teer, SCSA, SCNA, SCSECA

URLs: http://www.rite-group.com/rich
  http://www.linkedin.com/in/richteer


[zfs-discuss] Fwd: Solaris 8/9 branded zones on ZFS root?

2009-02-25 Thread Peter Pickford
Hi Rich,

Solaris 8/9 zones seem to work fine with zfs root for the zone.

Only problem so far is where to put the root file system for the zone
in the zfs file system hierarchy.

Branded zones do not seem to be part of the luupgrade scheme.

At the moment I have another tree of file systems on rpool.

Problem is that the boot code doesn't mount the root file system for
the zone when you re-boot.


Thanks

Peter


2009/2/25 Rich Teer rich.t...@rite-group.com:
 Hi all,

 I have a situation where I need to consolidate a few servers running
 Solaris 9 and 8.  If the application doesn't run natively on Solaris
 10 or Nevada, I was thinking of using Solaris 9 or 8 branded zones.
 My intent would be for the global zone to use ZFS boot/root; would I
 be correct in thinking that this will be OK for the branded zones?
 That is, they don't care about the underlying file system type?

 Or am I stuck with using UFS for the root file systems of Solaris 8
 and 9 branded zones?  (I sure hope not!)

 Many TIA,

 --
 Rich Teer, SCSA, SCNA, SCSECA

 URLs: http://www.rite-group.com/rich
      http://www.linkedin.com/in/richteer



Re: [zfs-discuss] [zones-discuss] Solaris 8/9 branded zones on ZFS root?

2009-02-25 Thread Timothy Kennedy



Rich Teer wrote:


I have a situation where I need to consolidate a few servers running
Solaris 9 and 8.  If the application doesn't run natively on Solaris
10 or Nevada, I was thinking of using Solaris 9 or 8 branded zones.
My intent would be for the global zone to use ZFS boot/root; would I
be correct in thinking that this will be OK for the branded zones?


That's correct.  I have some solaris 8 zones running under cluster
control, where zonepath is zfs, and they're doing just fine.
Nothing special had to be done.
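
Roughly what that looks like (zone/pool names changed, and the SUNWsolaris8
template name and install flags are from memory, so double-check them against
the brand docs):

  zfs create -o mountpoint=/zones rpool/zones
  zfs create rpool/zones/s8zone
  chmod 700 /zones/s8zone
  zonecfg -z s8zone 'create -t SUNWsolaris8; set zonepath=/zones/s8zone; commit'
  zoneadm -z s8zone install -p -a /net/images/s8-system.flar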

-Tim

--
Timothy Kennedy
SDE Infrastructure Operations Manager
Email:  timothy.kenn...@sun.com
Phone:  +1-703-636-0531 / x53151
AIM/Skype: tkSUNW


Re: [zfs-discuss] Motherboard for home zfs/solaris file server

2009-02-25 Thread Brandon High
On Wed, Feb 25, 2009 at 12:52 PM, Tim t...@tcsac.net wrote:
 Of course it does.  Competition directly affects the features provided on
 everyone in a market segment's products.

The server and workstation market demands ECC. Any die that would be
used in the server or workstation market would need to have ECC.

 The fact Intel put a memory controller on die is PROOF that AMD has a direct
 effect on their product roadmap.  Do you think Intel would have willingly
 killed off their lucrative northbridge chipset business without AMD forcing
 their hand?  Please.

Intel moved to an on-die memory controller because the front side bus
architecture was becoming a bottleneck as the number of cores
increased.

The fact that AMD's chips already have an on-die controller certainly
influenced Intel's direction - I'm not disputing that. The fact of the
matter is that an on-die MC is an efficient way to have high
bandwidth and low-latency access to memory. The IBM POWER6 has on-die
memory controllers as well, which is less likely to be due to any
market pressure caused by AMD since the two firms' products don't
directly compete. It's just a reasonable engineering decision.

-B

-- 
Brandon High : bh...@freaks.com


Re: [zfs-discuss] zfs streams data corruption

2009-02-25 Thread Greg Palmer

Miles Nordin wrote:

gp Performing a checkpoint will perform such tasks as making sure
gp that all transactions recorded in the log but not yet written
gp to the database are written out and that the system is not in
gp the middle of a write when you grab the data.

great copying of buzzwords out of a glossary, 

It wasn't copied from a glossary; I just tried to simplify it enough for 
you to understand. I apologize if I didn't accomplish that goal.



but does it change my
claim or not?  My claim is: 


  that SQLite2 should be equally as tolerant of snapshot backups as it
  is of cord-yanking.
  
You're missing the point here, Miles. The folks weren't asking for a 
method to confirm that their database could perform proper error 
recovery and survive having the cord yanked out of the 
wall. They were asking for a reliable way to back up their data. The best 
way to do that is not by snapshotting alone. The process of performing 
database backups is well understood and supported throughout the industry.


Relying on the equivalent of crashing the database to perform backups 
isn't how professionals get the job done. There is a reason that 
database vendors do not suggest you back up their databases by pulling the 
plug out of the wall or killing the running process. The best way to 
back up a database is by using a checkpoint. Your comment about 
checkpoints being for systems where snapshots are not available is not 
accurate. That is the normal method of backing up databases under 
Solaris, among others. Checkpoints are useful for all systems since they 
guarantee that the database files are consistent and do not require 
recovery, which doesn't always work no matter what the glossy brochures 
tell you. Typically they are used in concert with snapshots. Force the 
checkpoint, trigger the snapshot, and you're golden.
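
As a concrete illustration (MySQL/InnoDB syntax used purely as an example, with
made-up names; other databases have their own checkpoint or hot-backup
commands):

  # in one mysql session, which must stay open while the snapshot is taken:
  #   FLUSH TABLES WITH READ LOCK;
  # from another shell, while that lock is held:
  zfs snapshot tank/mysql@backup
  # back in the same mysql session:
  #   UNLOCK TABLES;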


Let's take a simple case of a transaction which consists of three 
database updates within a transaction. One of those writes succeeds, you 
take a snapshot and then the two other writes succeed. Everyone 
concerned with the transaction believes it succeeded but your snapshot 
does not show that. When the database starts up again, the data it will 
have in your snapshot indicates the transaction never succeeded and 
therefore it will roll back the database transaction and you will lose 
that transaction. Well, it will, assuming that all code involved in that 
recovery works flawlessly. Issuing a checkpoint, on the other hand, causes 
the database to complete the transaction including ensuring consistency 
of the database files before you take your snapshot. NOTE: If you issue 
a checkpoint and then perform a snapshot you will get consistent data 
which does not require the database to perform recovery. Matter of fact, 
that's the best way to do it.


Your dismissal of write activity taking place is inaccurate. Snapshots 
take a picture of the file system at a point in time. They have no 
knowledge of whether only one of the three writes required for the 
database to be consistent has completed (refer to the example above). Data 
does not hit the disk instantly; it takes some finite amount of time 
between when the write command is issued and when it arrives at the disk. 
Plainly, terminating the writes after they are issued and before 
they have completed is possible and a matter of timing. The database on the 
other hand does understand when the transaction has completed and allows 
outside processes to take advantage of this knowledge via checkpointing.


All real database systems have flaws in the recovery process and so far 
every database system I've seen has had issues at one time or another. 
If we were in a perfect world it SHOULD work every time but we aren't in 
a perfect world. ZFS promises on-disk consistency, but as we saw in the 
recent "Unreliable for professional usage" thread it is possible 
to have issues. Likewise with database systems.


Regards,
 Greg


Re: [zfs-discuss] zfs streams data corruption

2009-02-25 Thread Miles Nordin
 gp == Greg Palmer gregorylpal...@netscape.net writes:

gp Relying on the equivalent of crashing the database to perform
gp backups isn't how professionals get the job done.

well, nevertheless, it is, and should be, supported by SQLite2.

gp Let's take a simple case of a transaction which consists of
gp three database updates within a transaction. One of those
gp writes succeeds, you take a snapshot and then the two other
gp writes succeed.  Everyone concerned with the transaction
gp believes it succeeded but your snapshot does not show that.

I'm glad you have some rigid procedures that work well for you, but it
sounds like you do not understand how DBMS's actually deal with their
backing store.

You could close the gap by reviewing the glossary entry for ACID.
It's irrelevant whether the transaction spawns one write or
three---the lower parts of the DBMS make updates transactional.  As
long as writes are not re-ordered or silently discarded, it's not a
hand-waving recovery-from-chaos process.  It's certain.  Sometimes
writes ARE lost or misordered, or there are bugs in the DBMS or bad
RAM or who knows what, so I'm not surprised your vendor has given you
hand-waving recovery tools along with a lot of scary disclaimers.  Nor
am I surprised that they ask you to follow procedures that avoid
exposing their bugs.  But it's just plain wrong that the only way to
achieve a correct backup is with the vendor's remedial freezing tools.

I don't understand why you are dwelling on ``everyone concerned
believes it succeeded but it's not in the backup.''  So what?
Obviously the backup has to stop including things at some point.  As
long as the transaction is either in the backup or not in the backup,
the backup is FINE.  It's a BACKUP.  It has to stop somewhere.

You seem to be concerned that a careful forensic scientist could dig
into the depths of the backup and find some lingering evidence that a
transaction might have once been starting to come into existence.  As
far as I'm concerned, that transaction is ``not in the backup'' and
thus fine.

You might also have a look at the, somewhat overcomplicated
w.r.t. database-running-snapshot backups, SQLite2 atomic commit URL
Toby posted:

  http://sqlite.org/atomiccommit.html

Their experience points out, filesystems tend to do certain
somewhat-predictable but surprising things to the data inside files
when the cord is pulled, things which taking a snapshot won't do.  so,
I was a little surprised to read about some of the crash behaviors
SQLite had to deal with, but, with slight reservation, I stand by my
statement that the database should recover swiftly and certainly when
the cord is pulled.  But! it looks like recovering from a
``crash-consistent'' snapshot is actually MUCH easier than a pulled
cord, at least a pulled cord with some of the filesystems SQLite2 aims
to support.

gp [snapshots] have no knowledge of whether or not one of three
gp writes required for the database to be consistent have
gp completed.

it depends on what you mean by consistent.  In my language, the
database is always consistent, after each of those three writes.  The
DBMS orders the writes carefully to ensure this.  Especially in the
case of a lightweight DB like SQLite2 this is the main reason you use
the database in the first place.

gp Data does not hit the disk instantly, it takes some finite
gp amount of time in between when the write command is issued for
gp it to arrive at the disk.

I'm not sure it's critical to my argument, but, snapshots in ZFS have
nothing to do with when data ``hits the disk''.

gp ZFS promises on disk consistency but as we saw in the recent
gp thread about Unreliable for professional usage it is
gp possible to have issues. Likewise with database systems.

yes, finally we are in agreement!  Here is where we disagree: you want
to add a bunch of ponderous cargo-cult procedures and dire warnings,
like some convoluted way to tell SMF to put SQLite2 into
remedial-backup mode before taking a ZFS snapshot to clone a system.
I want to fix the bugs in SQLite2, or in whatever is broken, so that
it does what it says on the tin.

The first step in doing that is to convince people like you that there
is *necessarily* a bug if the snapshot is not a working backup.

Never mind the fact that your way simply isn't workable with hundreds
of these lightweight SQLite/db4/whatever databases all over the system
in nameservices and Samba and LDAP and Thunderbird and so on.
Workable or not, it's not _necessary_, and installing this confusing
and incorrect expectation that it's necessary blocks bugs from getting
fixed, and is thus harmful for reliability overall (see _unworkable_
one sentence ago).

HTH.




Re: [zfs-discuss] [zones-discuss] Solaris 8/9 branded zones on ZFS root?

2009-02-25 Thread Rich Teer
On Wed, 25 Feb 2009, Timothy Kennedy wrote:

 That's correct.  I have some solaris 8 zones running under cluster
 control, where zonepath is zfs, and they're doing just fine.
 Nothing special had to be done.

Excellent!  Just the news I was hoping for.

Thanks again,

-- 
Rich Teer, SCSA, SCNA, SCSECA

URLs: http://www.rite-group.com/rich
  http://www.linkedin.com/in/richteer


Re: [zfs-discuss] [zones-discuss] Solaris 8/9 branded zones on ZFS root?

2009-02-25 Thread Nicolas Dorfsman


On 25 Feb 2009, at 23:12, Timothy Kennedy wrote:


Rich Teer wrote:


I have a situation where I need to consolidate a few servers running
Solaris 9 and 8.  If the application doesn't run natively on Solaris
10 or Nevada, I was thinking of using Solaris 9 or 8 branded zones.
My intent would be for the global zone to use ZFS boot/root; would I
be correct in thinking that this will be OK for the branded zones?


That's correct.  I have some solaris 8 zones running under cluster
control, where zonepath is zfs, and they're doing just fine.
Nothing special had to be done.



Which ACL model is then used?



Nico


[zfs-discuss] Can VirtualBox run a 64 bit guests on 32 bit host

2009-02-25 Thread Harry Putnam
I've seen some talk on vmware forums indicating it is possible to run
64 bit guests on a 32 bit host as long as something called VT
technology is available.

I have an athlon64 +3200 running 32 bit WinXP pro.  I wondered if I
would be able to run opensol in 64 bit as guest inside VirtualBox?

I suspect the Mobo (AOpen AK86-L) on that Athlon may cause problems if
installing opensol directly onto that hardware.  I don't see any AOpen
motherboards on the HCL.  This one is somewhat dated.. maybe 4 yrs old
or so, so I doubt it has VT technology.. although I really don't know.

I'm successfully running Opensol-11 on that machine as a `VMware'
guest but didn't know how to try to force a 64 bit install or if it
would be a bad idea anyway.

My whole purpose is to experiment with zfs... would I see much
difference if opensol was installed 64 bit as compared to 32 bit?

I noticed the Simon blogs that describe how to setup a home zfs server
( http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/ )
mention it is best setup 64 bit, but no real reason is given.
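
(Side question: I gather that, once it's installed, running this inside the
guest shows whether the kernel actually came up 64 bit -- is that the right
check?)

  isainfo -kv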




Re: [zfs-discuss] Can VirtualBox run a 64 bit guests on 32 bit host

2009-02-25 Thread Tim
Then you would be looking for AMD-V extensions.  VT is only for Intel chips.

I highly doubt a 3200+ has the AMD-V extensions.

--Tim


On Wed, Feb 25, 2009 at 7:14 PM, Harry Putnam rea...@newsguy.com wrote:

 I've seen some talk on vmware forums indicating it is possible to run
 64 bit guests on a 32 bit host as long as something called VT
 technology is available.

 I have an athlon64 +3200 running 32 bit WinXP pro.  I wondered if I
 would be able to run opensol in 64 bit as guest inside VirtualBox?

 I suspect the Mobo (AOpen AK86-L) on that Athlon may cause problems if
 installing opensol directly onto that hardware.  I don't see any AOpen
 motherboards on the HCL.  This one is somewhat dated.. maybe 4 yrs old
 or so I doubt it has VT technology.. although I really don't know.

 I'm successfully running Opensol-11 on that machine as a `VMware'
 guest but didn't know how to try to force a 64 bit install or if it
 would be a bad idea anyway.

 My whole purpose is to experiment with zfs... would I see much
 difference if opensol was installed 64 bit as compared to 32 bit?

 I noticed the Simon blogs that describe how to setup a home zfs server
 ( http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/ )
 mention it is best setup 64 bit, but no real reason is given.





Re: [zfs-discuss] Can VirtualBox run a 64 bit guests on 32 bit host

2009-02-25 Thread Mark
Hi Harry,

I doubt it too. Try here to be sure (no need to install, unzip in a folder
and just run).

CPUID http://www.cpuid.com/

Check the processor features when you run the app. I hope that helps.
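
If you can boot a Linux live CD on that box, the same information is in
/proc/cpuinfo ('svm' = AMD-V, 'vmx' = Intel VT):

  grep -Eo 'svm|vmx' /proc/cpuinfo | sort -u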


/Mark :-)


2009/2/26 Tim t...@tcsac.net

 Then you would be looking for AMD-V extensions.  VT is only for Intel
 chips.

 I highly doubt a 3200+ has the AMD-V extensions.

 --Tim



 On Wed, Feb 25, 2009 at 7:14 PM, Harry Putnam rea...@newsguy.com wrote:

 I've seen some talk on vmware forums indicating it is possible to run
 64 bit guests on a 32 bit host as long as something called VT
 technology is available.

 I have an athlon64 +3200 running 32 bit WinXP pro.  I wondered if I
 would be able to run opensol in 64 bit as guest inside VirtualBox?

 I suspect the Mobo (AOpen AK86-L) on that Athlon may cause problems if
 installing opensol directly onto that hardware.  I don't see any AOpen
 motherboards on the HCL.  This one is somewhat dated.. maybe 4 yrs old
 or so I doubt it has VT technology.. although I really don't know.

 I'm successfully running Opensol-11 on that machine as a `VMware'
 guest but didn't know how to try to force a 64 bit install or if it
 would be a bad idea anyway.

 My whole purpose is to experiment with zfs... would I see much
 difference if opensol was installed 64 bit as compared to 32 bit?

 I noticed the Simon blogs that describe how to setup a home zfs server
 ( http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/ )
 mention it is best setup 64 bit, but no real reason is given.









-- 

Laurence J. Peter  - If two wrongs don't make a right, try three.


Re: [zfs-discuss] GSoC 09 zfs ideas?

2009-02-25 Thread Darren J Moffat

C. Bergström wrote:


Hi everyone.

I've got a couple ideas for good zfs GSoC projects, but wanted to stir 
some interest.  Anyone interested to help mentor?  The deadline is 
around the corner so if planning hasn't happened yet it should start 
soon.  If there is interest who would the org administrator be?


I might be interested in mentoring.  I've done GSoC mentoring in the past.

--
Darren J Moffat


Re: [zfs-discuss] Can VirtualBox run a 64 bit guests on 32 bit host

2009-02-25 Thread Harry Putnam
Mark mark.hom...@gmail.com writes:

 I doubt it too. Try here to be sure (no need to install, unzip in a folder
 and just run).

 CPUID http://www.cpuid.com/

 Check the processor features when you run the app. I hope that helps.

That is a nice little tool.  Thanks



[zfs-discuss] Virtual zfs server vs hardware zfs server

2009-02-25 Thread Harry Putnam
I'm experimenting with a zfs home server.  Running Opensol-11 by
way of vmware on WinXP.

It seems one way to avoid all the hardware problems one might run into
trying to install opensol on available or spare hardware.

Are there some bad gotchas about running opensol/zfs through vmware and
never going to real hardware?

One thing that comes to mind is the overhead of two OSs on one processor.
An Athlon64 2.2 +3400 running 32bit Windows XP and opensol in VMware.

But if I lay off the windows OS... like not really working it with
transcribing video or compressing masses of data or the like. Is this
likely to be a problem?

Also I'm losing out on going 64 bit since it's not likely this machine
supports the AMD-V extensions... and I'm short on SATA connections.  I
only have two onboard, but plan to install a PCI-style SATA controller
to squeeze in some more discs.

It's a big old Antec case so I don't think getting the discs in there
will be much of a problem.  But I have wondered if a PCI SATA controller
is likely to be a big problem.

So, are there things I need to know about that will make running a zfs
home server from vmware a bad idea?

The server will be serving as a backup destination for 5 home machines
and most likely would see service only about 2-3 days a week as far as
any kind of heavy usage goes, like ghosted disc images and other large
chunks of data, plus a regular 3-day-a-week backup running from Windows
using `retrospect' to back up user directories and changed files in
C:\.

A 6th (Linux) machine may eventually start using the server but for
now it's pretty self-contained and has lots of disc space.
