Re: [zfs-discuss] APPLE: ZFS need bug corrections instead of new func! Or?

2009-06-19 Thread Dave



Haudy Kazemi wrote:



I think a better question would be: what kind of tests would be most
promising for turning some subclass of these lost pools reported on
the mailing list into an actionable bug?

my first bet would be writing tools that test for ignored sync cache
commands leading to lost writes, and apply them to the case when iSCSI
targets are rebooted but the initiator isn't.

I think in the process of writing the tool you'll immediately bump
into a defect, because you'll realize there is no equivalent of a
'hard' iSCSI mount like there is in NFS.  and there cannot be a strict
equivalent to 'hard' mounts in iSCSI, because we want zpool redundancy
to preserve availability when an iSCSI target goes away.  I think the
whole model is wrong somehow.
  
I'd surely hope that a ZFS pool with redundancy built on iSCSI targets 
could survive the loss of some targets whether due to actual failures or 
necessary upgrades to the iSCSI targets (think OS upgrades + reboots on 
the systems that are offering iSCSI devices to the network.)




I've had a mirrored zpool created from Solaris iSCSI target servers in 
production since April 2008. I've had disks die and reboots of the 
target servers - ZFS has handled them very well. My biggest wish is to 
be able to tune the iSCSI timeout value so ZFS can fail over reads/writes 
to the other half of the mirror more quickly than it does now (about 180 
seconds on my config). A minor gripe considering the features that ZFS 
provides.
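
A hedged workaround sketch (assuming the iSCSI LUNs sit under the sd driver;
note this affects every sd disk, not just the iSCSI ones, so test it first):
shorten the sd command timeout in /etc/system so a dead target is declared
failed sooner.

  # Default sd_io_time is 60 seconds per command attempt, retried several
  # times, which roughly matches the ~180 seconds observed above.
  set sd:sd_io_time=0x14        # 0x14 = 20 seconds; takes effect after reboot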


I've also had the zfs server (the initiator aggregating the mirrored 
disks) unintentionally power cycled with the iscsi zpool imported. The 
pool re-imported and scrubbed fine.


ZFS is definitely my FS of choice - by far.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] APPLE: ZFS need bug corrections instead of new func! Or?

2009-06-19 Thread Haudy Kazemi



I think a better question would be: what kind of tests would be most
promising for turning some subclass of these lost pools reported on
the mailing list into an actionable bug?

my first bet would be writing tools that test for ignored sync cache
commands leading to lost writes, and apply them to the case when iSCSI
targets are rebooted but the initiator isn't.

I think in the process of writing the tool you'll immediately bump
into a defect, because you'll realize there is no equivalent of a
'hard' iSCSI mount like there is in NFS.  and there cannot be a strict
equivalent to 'hard' mounts in iSCSI, because we want zpool redundancy
to preserve availability when an iSCSI target goes away.  I think the
whole model is wrong somehow.
  
I'd surely hope that a ZFS pool with redundancy built on iSCSI targets 
could survive the loss of some targets whether due to actual failures or 
necessary upgrades to the iSCSI targets (think OS upgrades + reboots on 
the systems that are offering iSCSI devices to the network.)


My suggestion is use multi-way redundancy with iSCSI...e.g. 3 way 
mirrors or RAIDZ2...so that you can safely offline one of the iSCSI 
targets while still leaving the pool with some redundancy.  Sure there 
is an increased risk while that device is offline, but the window of 
opportunity is small for a failure of the 2nd level redundancy; and even 
then nothing is yet lost until a 3rd device has a fault.  Failure 
handling should also distinguish between complete failure (e.g. the 
device no longer responds to commands at all) and intermittent failure 
(e.g. a "sticky" patch of sectors, or a drive that stops responding for 
a minute because it has a non-changeable TLER value that can cause 
trouble in a RAID configuration).  Drives span a gradation from complete 
failure to flaky to flawless...if the software running on top of them 
recognizes this, better decisions can be made about what to do when an 
error is encountered than with the simplistic good/failed model that 
RAID implementations have used for years.
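
As a hedged illustration of that layout (device names are hypothetical), a
pool of 3-way mirrors with each leg on a different iSCSI target host:

  zpool create tank \
      mirror c2t1d0 c3t1d0 c4t1d0 \
      mirror c2t2d0 c3t2d0 c4t2d0

  # take the target host behind c4* down for maintenance, keeping 2-way
  # redundancy in the meantime
  zpool offline tank c4t1d0
  zpool offline tank c4t2d0
  # ...reboot/upgrade that target, then bring its legs back:
  zpool online tank c4t1d0
  zpool online tank c4t2d0      # only the changed blocks are resilvered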


My preference for storage behavior is that it should not cause a system 
panic (ever).  Graceful error recovery techniques are important.  File 
system error messages should be passed up the line when possible so the 
user can figure out something is amiss with some files (even if not all) 
even though the sysadmin is not around or email notification of problems 
is not working.  If it is possible to return a CRC error to a 
network share client, that would seem to be a close match for an 
uncorrectable checksum failure.  (Windows throws these errors when it 
cannot read a CD/DVD.)


A good damage mitigation feature is to provide some mechanism that allows a 
user to ignore the checksum failure, since in many user-data cases partial 
recovery is preferable to no recovery.  To ensure that damaged files are 
not accidentally confused with good files, ignoring the checksum 
failures might only be allowed through a special "recovery filesystem" 
that only lists damaged files the authenticated user has access to.  
From the network client's perspective, this would be another shared 
folder/subfolder that is only present when uncorrectable, damaged files 
have been found.  ZFS would set up the appropriate links to replicate 
the directory structure of the original as needed to include the damaged 
file.
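
(Such a recovery filesystem doesn't exist today; the closest current
mechanism, sketched here with a hypothetical pool and path, is asking the pool
which files have unrecoverable errors and then salvaging them with a tool that
tolerates read errors:

  zpool status -v tank
  #   errors: Permanent errors have been detected in the following files:
  #           /tank/home/alice/photo1234.jpg

Reading such a file normally still returns an I/O error on the bad blocks, so
partial recovery currently means copying it with something that skips over
unreadable regions.)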


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Server Cloning With ZFS?

2009-06-19 Thread Andre Wenas
The device tree for your 250 might be different, so you may need to 
hack path_to_inst, /devices, and /dev to make it boot successfully.




On Jun 20, 2009, at 10:18 AM, Dave Ringkor   
wrote:


Cindy, my question is about what "system specific info" is  
maintained that would need to be changed?  To take my example, my  
E450, "homer", has disks that are failing and it's a big clunky  
server anyway, and management wants to decommission it.  But we have  
an old 220R racked up doing nothing, and it's not scheduled for  
disposal.


What would be wrong with this:
1) Create a recursive snapshot of the root pool on homer.
2) zfs send this snapshot to a file on some NFS server.
3) Boot my 220R (same architecture as the E450) into single user  
mode from a DVD.

4) Create a zpool on the 220R's local disks.
5) zfs receive the snapshot created in step 2 to the new pool.
6) Set the bootfs property.
7) Reboot the 220R.

Now my 220R comes up as "homer", with its IP address, users, root  
pool filesystems, any software that was installed in the old homer's  
root pool, etc.


Since ZFS filesystems don't care about the underlying disk structure  
-- they only care about the pool, and I've already created a pool  
for them on the 220R using the disks it has, there shouldn't be any  
storage-type "system specific into" to change, right?  And sure, the  
220R might have a different number and speed of CPUs, and more or  
less RAM than the E450 had.  But when you upgrade a server in place  
you don't have to manually configure the CPUs or RAM, and how is  
this different?


The only thing I can think of that I might need to change, in order  
to bring up my 220R and have it "be" homer, is the network  
interfaces, from hme to bge or whatever.  And that's a simple config  
setting.


I don't care about Flash.  Actually, if you wanted to provision new  
servers based on a golden image like you can with Flash, couldn't  
you just take a recursive snapshot of a zpool as above, "receive" it  
in an empty zpool on another server, set your bootfs, and do a sys- 
unconfig?


So my big question is, with a server on ZFS root, what "system  
specific info" would still need to be changed?

--
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Things I Like About ZFS

2009-06-19 Thread Blake
On Fri, Jun 19, 2009 at 10:30 PM, Dave Ringkor wrote:
> I'll start:
>
> - The commands are easy to remember -- all two of them.  Which is easier, SVM 
> or ZFS, to mirror your disks?  I've been using SVM for years and still have 
> to break out the manual to use metadb, metainit, metastat, metattach, 
> metadetach, etc.  I hardly ever have to break out the ZFS manual.  I can 
> actually remember the commands and options to do things.  Don't even start me 
> on VxVM.
>
> - Boasting to the unconverted.  We still have a lot of VxVM and SVM on 
> Solaris, and LVM on AIX, in the office.  The other admins are always having 
> issues with storage migrations, full filesystems, Live Upgrade, corrupted 
> root filesystems, etc.  I love being able to offer solutions to their 
> immediate problems, and follow it up with, "You know, if your box was on ZFS 
> this wouldn't be an issue."

Interesting.  Usually the problems make their way to this list more
than the successes.  Glad to hear it!

BTW, ZFS just saved my skin tonight after I botched an OpenNMS upgrade
and was able to go back to my auto-snapshots :)

And there was a power failure earlier that took down a bunch of hosts
that rely on our multi-terabyte ZFS filer, as well as the filer itself
- no waiting around for fsck, thanks!

Blake
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Things I Like About ZFS

2009-06-19 Thread Ian Collins

Dave Ringkor wrote:


- Boasting to the unconverted.  We still have a lot of VxVM and SVM on Solaris, and LVM 
on AIX, in the office.  The other admins are always having issues with storage 
migrations, full filesystems, Live Upgrade, corrupted root filesystems, etc.  I love 
being able to offer solutions to their immediate problems, and follow it up with, 
"You know, if your box was on ZFS this wouldn't be an issue."
  
Then you ask them how much they paid for their storage - that really 
annoys Windows admins!


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Things I Like About ZFS

2009-06-19 Thread Dave Ringkor
I'll start:

- The commands are easy to remember -- all two of them.  Which is easier, SVM 
or ZFS, to mirror your disks?  I've been using SVM for years and still have to 
break out the manual to use metadb, metainit, metastat, metattach, metadetach, 
etc.  I hardly ever have to break out the ZFS manual.  I can actually remember 
the commands and options to do things.  Don't even start me on VxVM.
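
A hedged side-by-side of that point (hypothetical disk names), mirroring a
data disk:

  # SVM
  metadb -a -f c0t0d0s7 c0t1d0s7
  metainit d11 1 1 c0t0d0s0
  metainit d12 1 1 c0t1d0s0
  metainit d10 -m d11
  metattach d10 d12

  # ZFS
  zpool create tank mirror c0t0d0 c0t1d0
  # or, to mirror an existing single-disk pool:
  zpool attach tank c0t0d0 c0t1d0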

- Boasting to the unconverted.  We still have a lot of VxVM and SVM on Solaris, 
and LVM on AIX, in the office.  The other admins are always having issues with 
storage migrations, full filesystems, Live Upgrade, corrupted root filesystems, 
etc.  I love being able to offer solutions to their immediate problems, and 
follow it up with, "You know, if your box was on ZFS this wouldn't be an issue."
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] recover data after zpool create

2009-06-19 Thread Haudy Kazemi

Kees Nuyt wrote:

On Fri, 19 Jun 2009 11:50:07 PDT, stephen bond
 wrote:

  

Kees,

is it possible to get at least the contents of /export/home ?

that is supposedly a separate file system.



That doesn't mean that data is in one particular spot on the
disk. The blocks of the ZFS filesystems can be interspersed.
  
You can try a recovery tool that supports file carving.  This technique 
looks for files based on their signatures while ignoring damaged, 
nonexistent, or unsupported partition and/or filesystem info.  Works 
best on small files, but gets worse as file sizes increase (or more 
accurately, gets worse as file fragmentation increases).  Should work 
well for files smaller than the stripe size, but possibly not at all for 
compressed files unless you are using a data recovery app that 
understands ZFS compression formats (I don't know of any myself).  
Disable or otherwise do not run scrub or any other command that may 
write to the array until you have exhausted your recovery options or no 
longer care to keep trying.


EasyRecovery supports file carving, as do RecoverMyFiles and 
TestDisk.  I'm sure there are others too.  Not all programs actually 
call it file carving.  The effectiveness of the programs may vary, so it 
is worthwhile to try any demo versions.  The programs will need direct 
block-level access to the drive...network shares won't work.  You can run 
the recovery software on whatever OS it needs, and based on what you are 
asking for, you don't need to seek recovery software that is explicitly 
Solaris compatible.


is there a 
way to look for files using some low level disk reading
tool. If you are old enough to remember the 80s 
there was stuff like PCTools that could read anywhere
on the disk. 



I am old enough. I was the proud owner of a 20 MByte
harddisk back then (~1983).
Disks were so much smaller, you could practically scroll
most of the contents in a few hours.
The on disk data structures are much more complicated now.
  
I recall using a 12.5 MHz 286 Amdek (Wyse) PC with a 20 MB 3600 rpm 
Miniscribe MFM drive.  A quick Google search for this item says its 
transfer rate specs were 0.625 MB/sec, which sounds about right IIRC (if 
you chose the optimal interleave when formatting).  If you had the wrong 
interleave, performance suffered, but I also recall that the drive 
made less noise.  I think I even ran that drive at a suboptimal 
interleave for a while simply because it was quieter...you could say it 
was an early, indirect form of AAM (acoustic management).


To put that drive capacity and transfer rate into perspective against a 
modern drive: you could theoretically fill the 20 MB drive in 
20/0.625 = 32 seconds.  A 500 GB (base 10) SATA2 drive (WD5000AAKS) has an 
average write rate of 68 MB/sec, so it takes roughly 466*1024/68 = 7012 
seconds to fill.  Capacity growth is significantly outpacing read/write 
performance, which I've seen summed up as: modern drives are becoming the 
tapes of yesteryear.


Those data recovery tools took advantage of the filesystem's design that 
it only erased the index entry (sometimes only a single character in the 
filename) in the FAT.  When NTFS came out, it took a few years for 
unerase and general purpose NTFS recovery to be possible.  This was 
actually a concern of mine and one reason I delayed using NTFS by 
default on several Windows 2000/XP systems.  I waited until good 
recovery tools were available before I committed to the new filesystem 
(in spite of it being journaled, there initially just weren't any 
recovery tools available in case things went horribly wrong, Live CDs 
were not yet available, and there weren't any read/write NTFS tools 
available for DOS or Linux).  In short, graceful degradation and the 
availability of recovery tools is important in selecting a filesystem, 
particularly when used on a desktop that may not have regular backups.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Server Cloning With ZFS?

2009-06-19 Thread Dave Ringkor
Cindy, my question is about what "system specific info" is maintained that 
would need to be changed?  To take my example, my E450, "homer", has disks that 
are failing and it's a big clunky server anyway, and management wants to 
decommission it.  But we have an old 220R racked up doing nothing, and it's not 
scheduled for disposal.  

What would be wrong with this:
1) Create a recursive snapshot of the root pool on homer.
2) zfs send this snapshot to a file on some NFS server.
3) Boot my 220R (same architecture as the E450) into single user mode from a 
DVD.
4) Create a zpool on the 220R's local disks.
5) zfs receive the snapshot created in step 2 to the new pool.
6) Set the bootfs property.
7) Reboot the 220R.
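
As a hedged command sketch of those steps (pool/BE names and the target disk
are hypothetical; on SPARC I'd expect to also need to install the ZFS boot
block):

  # on homer
  zfs snapshot -r rpool@migrate
  zfs send -R rpool@migrate > /net/nfsserver/backup/homer-rpool.zfs

  # on the 220R, booted single-user from DVD
  zpool create -f rpool c0t0d0s0
  zfs receive -Fdu rpool < /net/nfsserver/backup/homer-rpool.zfs
  zpool set bootfs=rpool/ROOT/s10be rpool
  installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk \
      /dev/rdsk/c0t0d0s0
  init 6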

Now my 220R comes up as "homer", with its IP address, users, root pool 
filesystems, any software that was installed in the old homer's root pool, etc.

Since ZFS filesystems don't care about the underlying disk structure -- they 
only care about the pool, and I've already created a pool for them on the 220R 
using the disks it has, there shouldn't be any storage-type "system specific 
into" to change, right?  And sure, the 220R might have a different number and 
speed of CPUs, and more or less RAM than the E450 had.  But when you upgrade a 
server in place you don't have to manually configure the CPUs or RAM, and how 
is this different?

The only thing I can think of that I might need to change, in order to bring up 
my 220R and have it "be" homer, is the network interfaces, from hme to bge or 
whatever.  And that's a simple config setting.

I don't care about Flash.  Actually, if you wanted to provision new servers 
based on a golden image like you can with Flash, couldn't you just take a 
recursive snapshot of a zpool as above, "receive" it in an empty zpool on 
another server, set your bootfs, and do a sys-unconfig?

So my big question is, with a server on ZFS root, what "system specific info" 
would still need to be changed?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mobo SATA migration to AOC-SAT2-MV8 SATA card

2009-06-19 Thread Jeff Bonwick
Yep, right again.

Jeff

On Fri, Jun 19, 2009 at 04:21:42PM -0700, Simon Breden wrote:
> Hi,
> 
> I'm using 6 SATA ports from the motherboard but I've now run out of SATA 
> ports, and so I'm thinking of adding a Supermicro AOC-SAT2-MV8 8-port SATA 
> controller card.
> 
> What is the procedure for migrating the drives to this card?
> Is it a simple case of (1) issuing a 'zpool export pool_name' command, (2) 
> shutdown, (3) insert card and move all SATA cables for drives from mobo to 
> card, (4) boot and issue a 'zpool import pool_name' command ?
> 
> Thanks,
> Simon
> 
> http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/
> -- 
> This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Replacing a failed drive

2009-06-19 Thread Jeff Bonwick
Yep, you got it.

Jeff

On Fri, Jun 19, 2009 at 04:15:41PM -0700, Simon Breden wrote:
> Hi,
> 
> I have a ZFS storage pool consisting of a single RAIDZ2 vdev of 6 drives, and 
> I have a question about replacing a failed drive, should it occur in future.
> 
> If a drive fails in this double-parity vdev, then am I correct in saying that 
> I would need to (1) unplug the old drive once I've identified the drive id 
> (c1t0d0 etc), (2) plug in the new drive on the same SATA cable, and (3) issue 
> a 'zpool replace pool_name drive_id' command etc, at which point ZFS will 
> resilver the new drive from the parity data ?
> 
> Thanks,
> Simon
> -- 
> This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Mobo SATA migration to AOC-SAT2-MV8 SATA card

2009-06-19 Thread Simon Breden
Hi,

I'm using 6 SATA ports from the motherboard but I've now run out of SATA ports, 
and so I'm thinking of adding a Supermicro AOC-SAT2-MV8 8-port SATA controller 
card.

What is the procedure for migrating the drives to this card?
Is it a simple case of (1) issuing a 'zpool export pool_name' command, (2) 
shutdown, (3) insert card and move all SATA cables for drives from mobo to 
card, (4) boot and issue a 'zpool import pool_name' command ?
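
In command form, a hedged sketch of what I mean:

  zpool export pool_name
  # shut down, fit the AOC-SAT2-MV8, move the SATA cables, boot, then:
  zpool import pool_name
  # if the name isn't known, plain 'zpool import' lists importable pools
  # along with their new device paths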

Thanks,
Simon

http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Replacing a failed drive

2009-06-19 Thread Simon Breden
Hi,

I have a ZFS storage pool consisting of a single RAIDZ2 vdev of 6 drives, and I 
have a question about replacing a failed drive, should it occur in future.

If a drive fails in this double-parity vdev, then am I correct in saying that I 
would need to (1) unplug the old drive once I've identified the drive id 
(c1t0d0 etc), (2) plug in the new drive on the same SATA cable, and (3) issue a 
'zpool replace pool_name drive_id' command etc, at which point ZFS will 
resilver the new drive from the parity data ?
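
In command form, a hedged sketch of what I mean:

  zpool offline pool_name drive_id   # optional, before pulling the old drive
  # swap in the new drive on the same SATA cable, then:
  zpool replace pool_name drive_id
  zpool status pool_name             # watch the resilver progress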

Thanks,
Simon
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 7110 questions

2009-06-19 Thread lawrence ho
The Dell SAS controller probably has an on-board write cache, which helps with 
performance (write commit).

Based on my limited understanding, the 7110 does not have a write cache on its 
SAS controller.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 resilvering spare taking forever?

2009-06-19 Thread Tomas Ögren
On 19 June, 2009 - Joe Kearney sent me these 3,8K bytes:

> I've got a Thumper running snv_57 and a large ZFS pool.  I recently
> noticed a drive throwing some read errors, so I did the right thing
> and zfs replaced it with a spare.

Are you taking snapshots periodically? If so, you're using a build old
enough to restart resilver/scrub whenever a snapshot is taken.

There has also been some bug where 'zpool status' as root restarts
resilver/scrub as well. Try as non-root.

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] recover data after zpool create

2009-06-19 Thread Kees Nuyt
On Fri, 19 Jun 2009 11:50:07 PDT, stephen bond
 wrote:

>Kees,
>
>is it possible to get at least the contents of /export/home ?
>
>that is supposedly a separate file system.

That doesn't mean that data is in one particular spot on the
disk. The blocks of the ZFS filesystems can be interspersed.

>is there a 
>way to look for files using some low level disk reading
>tool. If you are old enough to remember the 80s 
>there was stuff like PCTools that could read anywhere
>on the disk. 

I am old enough. I was the proud owner of a 20 MByte
harddisk back then (~1983).
Disks were so much smaller, you could practically scroll
most of the contents in a few hours.
The on disk data structures are much more complicated now.

>I need some text files, which should be 
>easy to recover.

You could read the device using dd and pipe it block by
block into some smart filter that skips blocks with
gibberish and saves anything that looks like text.
You can try to search blocks for typical phrases you know
are in the text and filter blocks on that property.
sed or awk are your friends.
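
A hedged sketch of that approach (device name hypothetical; send the output
to a different disk than the one being recovered):

  dd if=/dev/rdsk/c0t0d0s0 bs=128k conv=noerror 2>/dev/null \
      | strings -n 8 \
      | grep -i 'some phrase you remember' > /otherdisk/hits.txt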

>Are there any rules on how zfs structures itself?
>maybe the old file allocation table still exists
>and just needs to be restored.

You'll have to understand the internals, the on-disk format
is documented, but not easy to grasp.

zdb is the program you'd use to analyse the zpool.

>thank you very much
>Stephen

Good luck.
-- 
  (  Kees Nuyt
  )
c[_]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] APPLE: ZFS need bug corrections instead of new func! Or?

2009-06-19 Thread Nicolas Williams
On Fri, Jun 19, 2009 at 04:09:29PM -0400, Miles Nordin wrote:
> Also, as I said elsewhere, there's a barrier controlled by Sun to
> getting bugs accepted.  This is a useful barrier: the bug database is
> a more useful drive toward improvement if it's not cluttered.  It also
> means, like I said, sometimes the mailing list is a more useful place
> for information.

There's two bug databases, sadly.  bugs.opensolaris.org is like you
describe, whereas defect.opensolaris.org is not.

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] APPLE: ZFS need bug corrections instead of new func! Or?

2009-06-19 Thread Miles Nordin
> "th" == Tim Haley  writes:

th> The second is marked as a duplicate of 6784395, fixed in
th> snv_107, 20 weeks ago.

Yeah nice sleuthing. :/

I understood Bogdan's post was a trap: ``provide bug numbers.  Oh,
they're fixed?  nothing to see here then.  no bugs?  nothing to see
here then.''  But think about it.  Does this mean ZFS was not broken
before those bugs were filed?  It does not.  now, extrapolate: imagine
looking back on this day from the future.

In the next line of that post right below where I give the bug
numbers, I provide context explaining why I still think there's a
problem.

Also, as I said elsewhere, there's a barrier controlled by Sun to
getting bugs accepted.  This is a useful barrier: the bug database is
a more useful drive toward improvement if it's not cluttered.  It also
means, like I said, sometimes the mailing list is a more useful place
for information.

HTH.

I think a better question would be: what kind of tests would be most
promising for turning some subclass of these lost pools reported on
the mailing list into an actionable bug?

my first bet would be writing tools that test for ignored sync cache
commands leading to lost writes, and apply them to the case when iSCSI
targets are rebooted but the initiator isn't.

I think in the process of writing the tool you'll immediately bump
into a defect, because you'll realize there is no equivalent of a
'hard' iSCSI mount like there is in NFS.  and there cannot be a strict
equivalent to 'hard' mounts in iSCSI, because we want zpool redundancy
to preserve availability when an iSCSI target goes away.  I think the
whole model is wrong somehow.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on 32 bit?

2009-06-19 Thread Miles Nordin
> "fan" == Fajar A Nugraha  writes:
> "et" == Erik Trimble  writes:

   fan> The N610N that I have (BCM3302, 300MHz, 64MB) isn't even
   fan> powerful enough to saturate either the gigabit wired 

I can't find that device.  Did you misspell it or something?  BCM
probably means Broadcom, and Broadcom is probably MIPS---it's TI
(omap) and Marvell (orion) that are selling arm.

Anyway I don't think saturating gigabit is the minimum acceptable
performance considering the external storage people actually use right
now.

That said, ARM is interesting because the chips just recently got a
lot faster at the same power/price point, like >1GHz.  There are a
whole batch of new netbooks (i've been calling them HypeBooks because
they will probably fail) based on these new fast omap chips.  Also the
next Orion stepping is supposed to have crypto accel which makes a big
difference in AES per watt.  I will be trying ZFS crypto once it's
released, and my understanding from Linux dmcrypt users is, on
ordinary CPU's it's a serious bottleneck/powerhog.  Right now it makes
more sense to me to do the crypto on Linux iSCSI targets, where I can
do it on hardware-accel Via C7 (also 32-bit), and put several C7 chips
into one zpool since they are device-granularity.

The 64MB may be a show-stopper for ZFS on the whole ARM platform
though.  I brought it up because arm is a 32-bit platform.

et> a Sun 7110-style system shrunk down to a PCI-E controller -
et> you have a simple host-based control program, hook a disk (or
et> storage system) to the ARM HBA, and you could have a nice
et> little embedded ZFS system.

haha yeah!  Oxford 911 firewire-to-?ATA bridges already have an ARM
core inside them.

If such a thing is ever made, I hope it's not sold by Sun so that I
can demand CDDL source.  Otherwise it will probably be treated like
7000---people will be meant to buy the card to get access to a special
closed-source stable branch that has more bugfixes than sol10 but
fewer regressions than SXCE.

et> Either that, or if someone would figure out a way to have
et> multiple-chip ARM implementations (where they could spread out
et> the load efficiently).

yeah seriously though, this is a good chip.  it's interesting in the
same way SPARC is interesting---gate count per throughput, watts per
throughput.  The downside is that it doesn't have the stone-squeezing
high-end proprietary C compiler and fancy Java runtime with mature JIT
that Sun has for SPARC.  The upside is the price point is orders of
magnitude off the T2000 which means it can seep into all kinds of
weird fun markets.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] APPLE: ZFS need bug corrections instead of new func! Or?

2009-06-19 Thread Tim Haley

Miles Nordin wrote:

"bmm" == Bogdan M Maryniuk  writes:


   bmm> OK, so what is the status of your bugreport about this?

That's a good question if it's meant genuinely, and not to be
obstructionist.  It's hard to report one bug with clear information
because the problem isn't well-isolated yet.

In my notes:  6565042, 6749630


The first of which is marked as fixed in snv_77, 19 months ago.

The second is marked as a duplicate of 6784395, fixed in snv_107, 20 weeks ago.

-tim


but as I said before, I've found the information on the mailing list
more useful w.r.t. this particular problem.

You can see how those bugs are about specific,
methodically-reproduceable problems.  Bugs are not ``I have been
losing more zpools than I lost UFS/vxfs filesystems on top of the same
storage platform.''  


It may take a while.






___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] APPLE: ZFS need bug corrections instead of new func! Or?

2009-06-19 Thread Miles Nordin
> "bmm" == Bogdan M Maryniuk  writes:

   bmm> OK, so what is the status of your bugreport about this?

That's a good question if it's meant genuinely, and not to be
obstructionist.  It's hard to report one bug with clear information
because the problem isn't well-isolated yet.

In my notes:  6565042, 6749630

but as I said before, I've found the information on the mailing list
more useful w.r.t. this particular problem.

You can see how those bugs are about specific,
methodically-reproduceable problems.  Bugs are not ``I have been
losing more zpools than I lost UFS/vxfs filesystems on top of the same
storage platform.''  

It may take a while.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] x4500 resilvering spare taking forever?

2009-06-19 Thread Joe Kearney
I've got a Thumper running snv_57 and a large ZFS pool.  I recently noticed a 
drive throwing some read errors, so I did the right thing and zfs replaced it 
with a spare.

Everything went well, but the resilvering process seems to be taking an 
eternity:

# zpool status
  pool: bigpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress, 4.66% done, 12h16m to go
config:

NAME  STATE READ WRITE CKSUM
bigpool   ONLINE   0 0 0
  raidz2  ONLINE   0 0 0
c4t4d0ONLINE   0 0 0
c7t4d0ONLINE   0 0 0
c6t4d0ONLINE   0 0 0
c1t4d0ONLINE   0 0 0
c0t4d0ONLINE   0 0 0
c4t0d0ONLINE   0 0 0
c7t0d0ONLINE   0 0 0
c6t0d0ONLINE   0 0 0
c1t0d0ONLINE   0 0 0
c0t0d0ONLINE   0 0 0
  raidz2  ONLINE   0 0 0
c5t5d0ONLINE   0 0 0
c4t5d0ONLINE   0 0 0
c7t5d0ONLINE   0 0 0
c6t5d0ONLINE   0 0 0
c1t5d0ONLINE   0 0 0
c0t5d0ONLINE   0 0 0

** Here's the resilver
   spare ONLINE   0 0 0
  c4t1d0  ONLINE  18 0 0
  c5t1d0  ONLINE   0 0 0
***

c7t1d0ONLINE   0 0 0
c6t1d0ONLINE   0 0 0
c1t1d0ONLINE   0 0 0
c0t1d0ONLINE   0 0 0
  raidz2  ONLINE   0 0 0
c5t6d0ONLINE   0 0 0
c4t6d0ONLINE   0 0 0
c7t6d0ONLINE   0 0 0
c6t6d0ONLINE   0 0 0
c1t6d0ONLINE   0 0 0
c0t6d0ONLINE   0 0 0
c4t2d0ONLINE   0 0 0
c7t2d0ONLINE   0 0 0
c6t2d0ONLINE   0 0 0
c1t2d0ONLINE   0 0 0
c0t2d0ONLINE   0 0 0
  raidz2  ONLINE   0 0 0
c5t7d0ONLINE   0 0 0
c4t7d0ONLINE   0 0 0
c7t7d0ONLINE   0 0 0
c6t7d0ONLINE   0 0 0
c1t7d0ONLINE   0 0 0
c0t7d0ONLINE   0 0 0
c5t3d0ONLINE   0 0 0
c4t3d0ONLINE   0 0 0
c7t3d0ONLINE   0 0 0
c6t3d0ONLINE   0 0 0
c1t3d0ONLINE   0 0 0
c0t3d0ONLINE   0 0 0
spares
  c5t1d0  INUSE currently in use
  c5t2d0  AVAIL   

Looks just fine, except it's been running for 3 days already!  These are 500 GB 
drives.

Should I have removed the bad drive and just replaced it vs. trying to swap in 
a spare?  Is there some sort of contention issue because the spare and the 
original drive are still both up?

Not sure what to think here...
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] APPLE: ZFS need bug corrections instead of new func! Or?

2009-06-19 Thread Miles Nordin
> "ic" == Ian Collins  writes:

 >> Access to the bug database is controlled.

ic> No, the bug databse is open.

no, it isn't.  Not all the bugs are visible, and after submitting a
bug it has to be approved.  Neither is true of the mailing list.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] recover data after zpool create

2009-06-19 Thread stephen bond
Kees,

is it possible to get at least the contents of /export/home ?

that is supposedly a separate file system. is there a way to look for files 
using some low level disk reading tool. If you are old enough to remember the 
80s there was stuff like PCTools that could read anywhere on the disk. I need 
some text files, which should be easy to recover. 
Are there any rules on how zfs structures itself? Maybe the old file 
allocation table still exists and just needs to be restored.
thank you very much
Stephen
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is the PROPERTY compression will increase the ZFS I/O th

2009-06-19 Thread Scott Meilicke
Generally, yes. Test it with your workload and see how it works out for you.
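
A hedged sketch of such a test (dataset names hypothetical):

  zfs create tank/ctest
  zfs set compression=gzip-9 tank/ctest
  # copy a representative sample of your data in, time it, then check:
  zfs get compressratio tank/ctest
  # repeat with compression=lzjb and compression=off; gzip-9 is CPU-heavy and
  # can slow writes even when the ratio is good, while lzjb is much cheaper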

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Is the PROPERTY compression will increase the ZFS I/O throughput?

2009-06-19 Thread Chookiex
Hi all.


Because the compression property decreases file size, file I/O 
should decrease as well.
So, would turning on compression increase ZFS I/O throughput?

For example:
I turn on gzip-9 on a server with 2x 4-core Xeons and 8 GB RAM.
It compresses my files with a compressratio of 2.5x+; with lzjb, I get 
about 1.5x on the same files.

Could that translate into higher throughput? Does anyone have an idea?

thanks 


  ___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS, ESX ,and NFS. oh my!

2009-06-19 Thread HUGE | David Stahl

I would think you would run into the same problem I have, where you can't
see child ZFS filesystems through a parent filesystem's NFS share.


> From: Scott Meilicke 
> Date: Fri, 19 Jun 2009 08:29:29 PDT
> To: 
> Subject: Re: [zfs-discuss] ZFS, ESX ,and NFS. oh my!
> 
> So how are folks getting around the NFS speed hit? Using SSD or battery backed
> RAM ZILs?
> 
> Regarding limited NFS mounts, underneath a single NFS mount, would it work to:
> 
> * Create a new VM
> * Remove the VM from inventory
> * Create a new ZFS file system underneath the original
> * Copy the VM to that file system
> * Add to inventory
> 
> At this point the VM is running underneath it's own file system. I don't know
> if ESX would see this?
> 
> To create another VM:
> 
> * Snap the original VM
> * Create a clone underneath the original NFS FS, along side the original VM
> ZFS.
> 
> Laborious to be sure.
> -- 
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS, ESX ,and NFS. oh my!

2009-06-19 Thread Scott Meilicke
So how are folks getting around the NFS speed hit? Using SSD or battery backed 
RAM ZILs?

Regarding limited NFS mounts, underneath a single NFS mount, would it work to:

* Create a new VM
* Remove the VM from inventory
* Create a new ZFS file system underneath the original
* Copy the VM to that file system
* Add to inventory

At this point the VM is running underneath its own file system. I don't know 
whether ESX would see this.

To create another VM:

* Snap the original VM
* Create a clone underneath the original NFS FS, along side the original VM ZFS.

Laborious to be sure.
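
A hedged sketch of the commands behind that workflow (dataset/VM names
hypothetical):

  zfs create tank/nfsvms/vm01            # per-VM child filesystem
  # move the VM's files into it and re-register the VM, then cloning becomes:
  zfs snapshot tank/nfsvms/vm01@gold
  zfs clone tank/nfsvms/vm01@gold tank/nfsvms/vm02

Whether ESX actually sees the children under the one NFS mount is the open
question -- NFSv3 clients don't cross dataset boundaries unless each child is
mounted separately.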
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-19 Thread Darren J Moffat

Bill Sommerfeld wrote:

On Wed, 2009-06-17 at 12:35 +0200, casper@sun.com wrote:
I still use "disk swap" because I have some bad experiences 
with ZFS swap.  (ZFS appears to cache and that is very wrong)


I'm experimenting with running zfs swap with the primarycache attribute
set to "metadata" instead of the default "all".  

aka: 

	zfs set primarycache=metadata rpool/swap 


seems like that would be more likely to behave appropriately.


Agreed, and for the "just in case" scenario, secondarycache=none - but 
then again, using an SSD as swap could be interesting
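
e.g., a hedged one-liner, assuming an L2ARC device is attached to the pool:

  zfs set secondarycache=none rpool/swap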


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression at zfs filesystem creation

2009-06-19 Thread Bill Sommerfeld
On Wed, 2009-06-17 at 12:35 +0200, casper@sun.com wrote:
> I still use "disk swap" because I have some bad experiences 
> with ZFS swap.  (ZFS appears to cache and that is very wrong)

I'm experimenting with running zfs swap with the primarycache attribute
set to "metadata" instead of the default "all".  

aka: 

zfs set primarycache=metadata rpool/swap 

seems like that would be more likely to behave appropriately.

- Bill



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS, ESX ,and NFS. oh my!

2009-06-19 Thread Moore, Joe
Scott Meilicke wrote:
> Obviously iSCSI and NFS are quite different at the storage level, and I
> actually like NFS for the flexibility over iSCSI (quotas, reservations,
> etc.)

Another key difference between them is that with iSCSI, the VMFS filesystem 
(built on the zvol presented as a block device) never frees up unused disk 
space.

Once ESX has written to a block on that zvol, it will always be taking up space 
in your zpool, even if you delete the .vmdk file that contains it.  The zvol 
has no idea that the block is not used any more.

With NFS, ZFS is aware that the file is deleted, and can deallocate those 
blocks.

This would be less of an issue if we had deduplication on the zpool (have ESX 
write blocks of all-0 and those would be deduped down to a single block) or if 
there was some way (like the SSD TRIM command) for the VMFS filesystem to tell 
the block device that a block is no longer used.
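
A hedged illustration (names and sizes hypothetical): even a sparse zvol only
grows, since 'referenced' never shrinks when .vmdk files are deleted on the
VMFS side:

  zfs create -s -V 500g tank/vmfs01
  zfs get volsize,referenced tank/vmfs01   # 'referenced' only ever grows as
                                           # VMFS touches new blocks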

--Joe
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files

2009-06-19 Thread Roch Bourbonnais


Le 18 juin 09 à 20:23, Richard Elling a écrit :


Cor Beumer - Storage Solution Architect wrote:

Hi Jose,

Well it depends on the total size of your Zpool and how often these  
files are changed.


...and the average size of the files.  For small files, it is likely  
that the default
recordsize will not be optimal, for several reasons.  Are these  
small files?

-- richard


Hey Richard, I have to correct that. For small files and big files there is 
no need to tune the recordsize
(files are stored as single, perfectly sized records up to the 
dataset recordsize property).


Only for big files accessed and updated in aligned application records 
(e.g. an RDBMS) does it help to tune the ZFS recordsize.
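
A hedged example of that one case (names hypothetical): match the dataset
recordsize to the database block size before loading data:

  zfs create tank/oradata
  zfs set recordsize=8k tank/oradata    # e.g. for an 8 KB database block size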


-r





I was at a customer, a huge internet provider, who had 40 X4500s running 
standard Solaris and using ZFS.
All the machines were equipped with 48x 1TB disks. The machines were 
used to provide the email platform, so all the user email accounts were 
on the system. That also meant millions of files in one zpool.


What they noticed on the X4500 systems was that when the zpool 
became about 50-60% full, the performance of the system 
dropped enormously.
They claim this has to do with fragmentation of the ZFS 
filesystem. So we tried putting in an S7410 system over there 
with about the same disk config, 44x 1TB SATA but with 4x 18GB 
Writezillas (in a stripe), and we were able to get much, much more 
I/O out of that system than out of the comparable X4500. They put 
it in production for a couple of weeks, however, and as soon as the 
ZFS filesystem came into the range of about 50-60% full they saw 
the same problem.
The performance dropped enormously. NetApp has the same problem 
with their WAFL filesystem (they also tested this); however, they 
do provide a defragmentation tool for it. That is also NOT a nice 
solution, because you have to run it, manually or scheduled, and it 
takes a lot of system resources - but it helps.


I did hear Sun is denying that we have this problem in ZFS, and 
therefore that we don't need any kind of defragmentation mechanism; 
however, our customer experiences are different.

Maybe it would be good for the ZFS group to look at this (potential) 
problem.


The customer I am talking about is willing to share their 
experiences with Sun engineering.


greetings,

Cor Beumer


Jose Martins wrote:


Hello experts,

IHAC that wants to put more than 250 Million files on a single
mountpoint (in a directory tree with no more than 100 files on each
directory).

He wants to share such filesystem by NFS and mount it through
many Linux Debian clients

We are proposing a 7410 Openstore appliance...

He is claiming that certain operations like find, even when run from
the Linux clients on such an NFS mountpoint, take significantly more
time than if the NFS share were provided by other NAS vendors
like NetApp...

Can someone confirm if this is really a problem for ZFS  
filesystems?...


Is there any way to tune it?...

We thank any input

Best regards

Jose





--
*Cor Beumer *
 Data Management & Storage

 *Sun Microsystems Nederland BV*
 Saturnus 1
 3824 ME Amersfoort The Netherlands
 Phone +31 33 451 5172
 Mobile +31 6 51 603 142
 Email cor.beu...@sun.com







___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs send of a cloned zvol

2009-06-19 Thread Maurilio Longo
Hi,

I'd like to understand a thing or two ... :)

I have a zpool on which I've created a zvol, then I've snapshotted the zvol and 
I've created a clone out of that snapshot.

Now, what happens if I do a 

zfs send mycl...@mysnap > myfile?

I mean, is this stream enough to recover the clone (does it contain a promoted 
zvol?) or do I have to have a stream for the zvol (from which the clone was 
created) as well? And if I do a zfs recv of the clone's stream, does it 'find' 
by itself the zvol from which it stemmed?

In  other words, will I be able to recover the clone, and the zvol it depends 
on, in some way having a zfs stream of them?
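
For concreteness, a hedged sketch of the two approaches I'm wondering about
(names hypothetical): sending the clone on its own, versus sending the zvol
first and then the clone incrementally from its origin snapshot:

  # (a) clone alone
  zfs send tank/myclone@mysnap > /backup/clone-full.zfs
  # (b) origin zvol first, then the clone as an incremental from that snapshot
  zfs send tank/myvol@mysnap                       > /backup/vol.zfs
  zfs send -i tank/myvol@mysnap tank/myclone@snap2 > /backup/clone-incr.zfs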

Best regards.

Maurilio.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Lots of metadata overhead on filesystems with 100M files

2009-06-19 Thread Rainer Orth
Richard Elling  writes:

> George would probably have the latest info, but there were a number of
> things which circled around the notorious "Stop looking and start ganging"
> bug report,
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6596237

Indeed: we were seriously bitten by this one, taking three Solaris 10
fileservers down for about a week until the problem was diagnosed by Sun
Service and an IDR provided.  Unfortunately, this issue (seriously
fragmented pools, or pools beyond ca. 90% full, causing file servers to grind
to a halt) was only announced/acknowledged publicly after our incident,
although the problem seems to have been reported almost two years ago.
While a fix has been integrated into snv_114, there's still no patch for
S10, only various IDRs.

It's unclear what the state of the related CR 4854312 (need to defragment
storage pool, submitted in 2003!) is.  I suppose this might be dealt with
by the vdev removal code, but overall it's scary that dealing with such
fundamental issues takes so long.

Rainer

-- 
-
Rainer Orth, Faculty of Technology, Bielefeld University
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Linux and OS 2009

2009-06-19 Thread Fajar A. Nugraha
On Thu, Jun 18, 2009 at 8:01 AM, Cesar Augusto Suarez wrote:
> I have Ubuntu Jaunty already installed on my PC; on the second HD, I've
> installed OS2009.
> Now, I can't share info between these 2 OSes.
> I downloaded and installed ZFS-FUSE on Jaunty, but its version is 6, while in
> OS2009 the ZFS version is 14 or something like that.
> Of course, they are different versions.
> How can I share info between these 2 OSes?

The best option IMHO is to use a dedicated pool to share data (not
rpool), and create this pool as v14 (zpool create -o version=14).
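
A hedged sketch of that (pool/device names hypothetical; the version chosen
has to be one that the zfs-fuse build on the Ubuntu side can also import):

  zpool create -o version=N shared c1t1d0   # N = the highest pool version
                                            # both implementations support
  zfs create shared/exchange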

Another option :
http://groups.google.com/group/zfs-fuse/browse_frm/thread/e1ca406cfee57c03

-- 
Fajar
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss