Re: [zfs-discuss] x4500 vs AVS ?

2008-09-05 Thread Ralf Ramge
[EMAIL PROTECTED] wrote:

   War wounds?  Could you please expand on the why a bit more?



- ZFS is not aware of AVS. On the secondary node, you'll always have to 
force the import (`zpool import -f`) because of the unnoticed metadata 
changes (the zpool is reported as in use). No mechanism to prevent data loss 
exists, e.g. zpools can be imported while the replicator is *not* in 
logging mode.
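
As a minimal illustration of the manual safety check this implies on the 
secondary (the pool name 'tank' is hypothetical; check sndradm(1M) on your 
release for the exact output format):

   # confirm the SNDR set is in logging mode before touching the pool
   sndradm -P
   # only then force the import; ZFS itself will not stop you otherwise
   zpool import -f tank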

- AVS is not ZFS aware. For instance, if ZFS resilvers a mirrored disk, 
e.g. after replacing a drive, the complete disk is sent over the network 
to the secondary node, even though the replicated data on the secondary 
is intact.
That's a lot of fun with today's disk sizes of 750 GB and 1 TB drives, 
resulting in usually 10+ hours without real redundancy (customers who 
use Thumpers to store important data usually don't have the budget to
connect their data centers with 10 Gbit/s, so expect 10+ hours *per disk*).

- ZFS + AVS + X4500 leads to bad error handling. The zpool cannot be 
imported on the secondary node while replication is running. The X4500 does 
not have a RAID controller which signals (and handles) drive faults. 
Drive failures on the secondary node may go unnoticed until the 
primary node goes down and you want to import the zpool on the 
secondary node with the broken drive. Since ZFS doesn't offer a recovery 
mechanism like fsck, data loss of up to 20 TB may occur.
If you use AVS with ZFS, make sure you have storage which handles 
drive failures without OS interaction.

- 5 hours for scrubbing a 1 TB drive. If you're lucky. Up to 48 drives 
in total.

- An X4500 has no battery-backed write cache. ZFS uses the server's 
RAM as a cache, 15 GB+. I don't want to find out how much time a 
resilver over the network after a power outage may take (a full reverse 
replication would take up to 2 weeks and is not a valid option in a serious 
production environment). But the underlying question I asked myself is 
why I should want to replicate data in such an expensive way, when I 
consider the 48 TB of data itself not important enough to be protected by 
a battery.


- I gave AVS a set of 6 drives just for the bitmaps (using SVM soft 
partitions). That wasn't enough; the replication was still very slow, 
probably because of an insane amount of head movement, and it scales
badly. Putting the bitmap of a drive on the drive itself (if I remember 
correctly, this is recommended in one of the most referenced how-to blog 
articles) is a bad idea. Always use ZFS on whole disks if performance 
and caching matter to you.

- AVS seems to require additional shared storage when building 
failover clusters with 48 TB of internal storage. That may be hard to 
explain to the customer. But I'm not 100% sure about this; I just 
didn't find a way, and I didn't ask on a mailing list for help.


If you want a fail-over solution for important data, use external 
JBODs. Use AVS only to mirror complete clusters; don't use it to 
replicate single boxes with local drives. And, in case OpenSolaris is 
not an option for you due to your company policies or support contracts, 
building a real cluster is also a lot cheaper.


-- 

Ralf Ramge
Senior Solaris Administrator, SCNA, SCSA

Tel. +49-721-91374-3963
[EMAIL PROTECTED] - http://web.de/

11 Internet AG
Brauerstraße 48
76135 Karlsruhe

Amtsgericht Montabaur HRB 6484

Vorstand: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Thomas 
Gottschlich, Matthias Greve, Robert Hoffmann, Markus Huhn, Oliver Mauss, 
Achim Weiss
Aufsichtsratsvorsitzender: Michael Scheeren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] [install-discuss] Will OpenSolaris and Nevada co-exist in peace on the same root zpool

2008-09-05 Thread Johan Hartzenberg
Well, I want to give OpenSolaris a try, but have not yet worked up the
confidence to just try it.  So a few questions:

When I start the OpenSolaris installer, will it install into my existing
root zpool?
Which is called RPOOL, not rpool?
Without destroying my existing Nevada installations?
Or killing my existing Grub menu?
And will it be intelligent about my existing Live Upgrade BEs?
And other existing shareable ZFS datasets (e.g. /export and /var/shared)?

Related to this:
Can I have the same directory used for my home directory under both Nevada
and OpenSolaris?

I am guessing the answer is YMMV depending on the differences in versions
of, for example, Firefox, Gnome, Thunderbird, etc., and on how well
these cope with settings that were changed by another, potentially newer,
version of itself.

-- 

ZFS snapshots are your friend.  ZFS = LiveUpgrade: A match made in heaven.

Any sufficiently advanced technology is indistinguishable from magic.
Arthur C. Clarke
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Terabyte scrub

2008-09-05 Thread Marcelo Leal
You are right! Seeing the numbers, I could not think very well ;-)
 What matters is the used size, not the storage capacity! My fault...
 Thanks a lot for the answers.

 Leal.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] send/receive statistics

2008-09-05 Thread Marcelo Leal
Thanks a lot for the answers!
 Relling said something about checksums; I asked him for a more 
detailed explanation, because I did not understand what checksum 
the receive part has to check, since the send stream can be redirected to a file on a 
disk or tape... 
 In the end, I think that if we can import (receive) the snapshot and that 
procedure finishes fine, we are in good shape.
 Leal.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] A question about recordsize...

2008-09-05 Thread Marcelo Leal
Hello!
 Assuming the default recordsize (FSB) in ZFS is 128k:
 1 - If I have a file of 10k, ZFS will allocate an FSB of 10k. Right? Since 
ZFS block sizes are not static like in other filesystems, I don't have that old 
internal fragmentation...

 2 - If the above is right, I don't need to adjust the recordsize (FSB) if I 
will handle a lot of tiny files. Right?

 3 - If the two above are right, then tuning the recordsize only matters for 
files greater than the FSB. Let's say 129k... but then another 
question: if the file is 129k, will ZFS allocate one filesystem block of 
128k and another of... 1k? Or two of 128k?

 4 - The last one... ;-)
  For the FSB allocation, how does ZFS know the file size, i.e. whether the 
file is smaller than the FSB? Is it something related to the txg? When the write goes 
to disk, does ZFS know (somehow) whether that write is a whole file or a piece 
of it?
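
 For reference, a quick way to observe the allocation behaviour, assuming a 
hypothetical pool named tank (names and sizes are just for illustration):

   zfs create tank/small
   zfs get recordsize tank/small              # 128K by default
   dd if=/dev/urandom of=/tank/small/f bs=1k count=10
   sync
   ls -l /tank/small/f ; du -k /tank/small/f  # logical size vs. blocks actually allocated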

 Thanks a lot!

 Leal.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [install-discuss] Will OpenSolaris and Nevada co-exist in peace on the same root zpool

2008-09-05 Thread James Carlson
Johan Hartzenberg writes:
 I am guessing the answer is YMMV depending on the differences in versions
 of, for example, Firefox, Gnome, Thunderbird, etc., and on how well
 these cope with settings that were changed by another, potentially newer,
 version of itself.

The answers to your questions are basically all no.  The new
installer wants a primary partition or a whole disk.

However, there are helpful blogs from folks who've made the
transition.  Poor Ed seems to have a broken 'shift' key, but he gives
great details here:

  http://blogs.sun.com/edp/entry/moving_from_nevada_and_live

-- 
James Carlson, Solaris Networking  [EMAIL PROTECTED]
Sun Microsystems / 35 Network Drive71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500 vs AVS ?

2008-09-05 Thread Richard Elling
[jumping ahead and quoting myself]
AVS is not a mirroring technology, it is a remote replication technology.
So, yes, I agree 100% that people should not expect AVS to be a mirror.


Ralf Ramge wrote:
 [EMAIL PROTECTED] wrote:

   
   War wounds?  Could you please expand on the why a bit more?
 



 - ZFS is not aware of AVS. On the secondary node, you'll always have to 
 force the import (`zpool import -f`) because of the unnoticed metadata 
 changes (the zpool is reported as in use). No mechanism to prevent data loss 
 exists, e.g. zpools can be imported while the replicator is *not* in 
 logging mode.
   

ZFS isn't special in this regard, AFAIK all file systems, databases and
other data stores suffer from the same issue with remote replication.

 - AVS is not ZFS aware. For instance, if ZFS resilvers a mirrored disk, 
 e.g. after replacing a drive, the complete disk is sent over the network 
 to the secondary node, even though the replicated data on the secondary 
 is intact.
 That's a lot of fun with today's disk sizes of 750 GB and 1 TB drives, 
 resulting in usually 10+ hours without real redundancy (customers who 
 use Thumpers to store important data usually don't have the budget to
 connect their data centers with 10 Gbit/s, so expect 10+ hours *per disk*).
   

ZFS only resilvers data.  Other LVMs, like SVM, will resilver the entire 
disk,
though.
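
For example (pool and device names are illustrative), replacing a drive only
resilvers the allocated blocks, and the progress is visible in zpool status:

   zpool replace tank c2t3d0 c2t4d0
   zpool status tank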

 - ZFS + AVS + X4500 leads to bad error handling. The zpool cannot be 
 imported on the secondary node while replication is running. The X4500 does 
 not have a RAID controller which signals (and handles) drive faults. 
 Drive failures on the secondary node may go unnoticed until the 
 primary node goes down and you want to import the zpool on the 
 secondary node with the broken drive. Since ZFS doesn't offer a recovery 
 mechanism like fsck, data loss of up to 20 TB may occur.
 If you use AVS with ZFS, make sure you have storage which handles 
 drive failures without OS interaction.
   

If this is the case, then array-based replication would also be similarly
affected by this architectural problem.  In other words, if you say that
a software RAID system cannot be replicated by a software replicator,
then TrueCopy, SRDF, and other RAID array-based (also software)
replicators also do not work.  I think there is enough empirical evidence
that they do work.  I can see where there might be a best practice here,
but I see no fundamental issue.

fsck does not recover data, it only recovers metadata.

 - 5 hours for scrubbing a 1 TB drive. If you're lucky. Up to 48 drives 
 in total.
   

ZFS only scrubs data.  But it is not unusual for a lot of data scrubbing to
take a long time.  ZFS only performs read scrubs, so there is no replication
required during a ZFS scrub, unless data is repaired.
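
A scrub is started and monitored with (pool name illustrative):

   zpool scrub tank
   zpool status -v tank     # shows scrub progress and any errors found or repaired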

 - An X4500 has no battery-backed write cache. ZFS uses the server's 
 RAM as a cache, 15 GB+. I don't want to find out how much time a 
 resilver over the network after a power outage may take (a full reverse 
 replication would take up to 2 weeks and is not a valid option in a serious 
 production environment). But the underlying question I asked myself is 
 why I should want to replicate data in such an expensive way, when I 
 consider the 48 TB of data itself not important enough to be protected by 
 a battery.
   

ZFS will not be storing 15 GBytes of unflushed data on any system I can
imagine today.  While we can all agree that 48 TBytes will be painful to
replicate, that is not caused by ZFS -- though it is enabled by ZFS, because
some other file systems (UFS) cannot be as large as 48 TBytes.

 - I gave AVS a set of 6 drives just for the bitmaps (using SVM soft 
 partitions). That wasn't enough; the replication was still very slow, 
 probably because of an insane amount of head movement, and it scales
 badly. Putting the bitmap of a drive on the drive itself (if I remember 
 correctly, this is recommended in one of the most referenced how-to blog 
 articles) is a bad idea. Always use ZFS on whole disks if performance 
 and caching matter to you.
   

I think there are opportunities for performance improvement, but I don't
know who is currently actively working on this.

Actually, the cases where ZFS for whole disks is a big win are small.
And, of course, you can enable disk write caches by hand.
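
On Solaris this is usually done interactively through format's expert mode; a
rough sketch (menu entries may vary by disk driver and release):

   format -e
   # select the disk, then walk the menus: cache -> write_cache -> enable
   # 'display' in the same submenu shows the current setting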

 - AVS seems to require additional shared storage when building 
 failover clusters with 48 TB of internal storage. That may be hard to 
 explain to the customer. But I'm not 100% sure about this; I just 
 didn't find a way, and I didn't ask on a mailing list for help.


 If you want a fail-over solution for important data, use external 
 JBODs. Use AVS only to mirror complete clusters; don't use it to 
 replicate single boxes with local drives. And, in case OpenSolaris is 
 not an option for you due to your company policies or support contracts, 
 building a real cluster is also a lot cheaper.
   

AVS is not a mirroring technology, it is a remote replication technology.
So, yes, I agree 100% that people should not expect AVS to be a mirror.

Re: [zfs-discuss] zfs metadata corrupted

2008-09-05 Thread Richard Elling
LyeBeng Ong wrote:
 I made a bad judgment and  now my raidz pool is corrupted. I have a raidz 
 pool running on Opensolaris b85.  I wanted to try out freenas 0.7 and tried 
 to add my pool to freenas.
  
 After adding the zfs disk, vdev and pool, I decided to back out and went 
 back to opensolaris. Now my raidz pool will not mount and I get the following 
 errors. I hope some expert can help me recover from this error.
   

The symptoms are consistent with the disk having been repartitioned.
First check that the disks are now labeled as they were
originally.

When you run zdb -l, make sure you use the same devices as
before.  For example,
zdb -l /dev/rdsk/c2d0 
(see below) is not the same as:
zdb -l /dev/dsk/c2d0s0
which is where ZFS thinks the data should be.

Also, /dev/ad10 is something I don't recognize... what is it?
-- richard

 [EMAIL PROTECTED]:/dev/rdsk# zpool status
   pool: syspool
  state: ONLINE
  scrub: none requested
 config:

 NAME      STATE     READ WRITE CKSUM
 syspool   ONLINE       0     0     0
   c1d0s0  ONLINE       0     0     0

 errors: No known data errors

   pool: tank
  state: FAULTED
 status: The pool metadata is corrupted and the pool cannot be opened.
 action: Destroy and re-create the pool from a backup source.
see: http://www.sun.com/msg/ZFS-8000-72
  scrub: none requested
 config:

 NAME      STATE     READ WRITE CKSUM
 tank      FAULTED      0     0     4  corrupted data
   raidz1  ONLINE       0     0     4
     c2d0  ONLINE       0     0     0
     c2d1  ONLINE       0     0     0
     c3d0  ONLINE       0     0     0
     c3d1  ONLINE       0     0     0
 [EMAIL PROTECTED]:/dev/rdsk# 

 [EMAIL PROTECTED]:/dev/rdsk# zdb -vvv
 syspool
 version=10
 name='syspool'
 state=0
 txg=13
 pool_guid=7417064082496892875
 hostname='elatte_installcd'
 vdev_tree
 type='root'
 id=0
 guid=7417064082496892875
 children[0]
 type='disk'
 id=0
 guid=16996723219710622372
 path='/dev/dsk/c1d0s0'
 devid='id1,[EMAIL PROTECTED]/a'
 phys_path='/[EMAIL PROTECTED],0/[EMAIL PROTECTED]/[EMAIL 
 PROTECTED]/[EMAIL PROTECTED],0:a'
 whole_disk=0
 metaslab_array=14
 metaslab_shift=30
 ashift=9
 asize=158882856960
 is_log=0
 tank
 version=10
 name='tank'
 state=0
 txg=9305484
 pool_guid=6165551123815947851
 hostname='cempedak'
 vdev_tree
 type='root'
 id=0
 guid=6165551123815947851
 children[0]
 type='raidz'
 id=0
 guid=18029757455913565148
 nparity=1
 metaslab_array=14
 metaslab_shift=33
 ashift=9
 asize=1280228458496
 is_log=0
 children[0]
 type='disk'
 id=0
 guid=14740261559114907785
 path='/dev/dsk/c2d0s0'
 devid='id1,[EMAIL PROTECTED]/a'
 phys_path='/[EMAIL PROTECTED],0/pci10de,[EMAIL 
 PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a'
 whole_disk=1
 children[1]
 type='disk'
 id=1
 guid=7618479640615121644
 path='/dev/dsk/c2d1s0'
 devid='id1,[EMAIL PROTECTED]/a'
 phys_path='/[EMAIL PROTECTED],0/pci10de,[EMAIL 
 PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a'
 whole_disk=1
 children[2]
 type='disk'
 id=2
 guid=1801493855297946488
 path='/dev/dsk/c3d0s0'
 devid='id1,[EMAIL PROTECTED]/a'
 phys_path='/[EMAIL PROTECTED],0/pci10de,[EMAIL 
 PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a'
 whole_disk=1
 children[3]
 type='disk'
 id=3
 guid=15710901655082836445
 path='/dev/dsk/c3d1s0'
 devid='id1,[EMAIL PROTECTED]/a'
 phys_path='/[EMAIL PROTECTED],0/pci10de,[EMAIL 
 PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a'
 whole_disk=1
 [EMAIL PROTECTED]:/dev/rdsk# 

 [EMAIL PROTECTED]:/dev/rdsk# zdb -l /dev/rdsk/c2d0
 
 LABEL 0
 
 version=6
 name='tank'
 

[zfs-discuss] resilver speed.

2008-09-05 Thread Chris Gerhard
Is there any way to control the resilver speed?  Having attached a third disk 
to a mirror (so I can replace the other disks with larger ones), the resilver 
goes at a fraction of the speed of the same operation using DiskSuite. However, 
it still renders the system pretty much unusable for anything else.

So I would like to control the rate of the resilver.  Either slow it down a lot 
so that the system is still usable, or tell it to go as fast as possible to get 
it over with.

Also does the resilver deliberately pause?  Running iostat I see that it will 
pause for five to ten seconds where no IO is done at all, then it continues on 
at a more reasonable pace.
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs metadata corrupted

2008-09-05 Thread Karl Pielorz


--On 05 September 2008 07:37 -0700 Richard Elling [EMAIL PROTECTED] 
wrote:

 Also, /dev/ad10 is something I don't recognize... what is it?
 -- richard

'/dev/ad10' is a FreeBSD disk device, which would kind of be fitting, as:

LyeBeng Ong wrote:
 I made a bad judgment and  now my raidz pool is corrupted. I have a raidz
 pool running on Opensolaris b85.  I wanted to try out freenas 0.7 and
 tried to add my pool to freenas.

FreeNAS is FreeBSD based...

-Kp
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs metadata corrupted

2008-09-05 Thread Richard Elling
Karl Pielorz wrote:
 --On 05 September 2008 07:37 -0700 Richard Elling [EMAIL PROTECTED] 
 wrote:

   
 Also, /dev/ad10 is something I don't recognize... what is it?
 -- richard
 

 '/dev/ad10' is a FreeBSD disk device, which would kind of be fitting, as:

 LyeBeng Ong wrote:
   
 I made a bad judgment and  now my raidz pool is corrupted. I have a raidz
 pool running on Opensolaris b85.  I wanted to try out freenas 0.7 and
 tried to add my pool to freenas.
 

 FreeNAS is FreeBSD based...
   

Ah, ok, so perhaps an export and import would clear the cobwebs?
What happens when you try to import?
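Something along these lines, using the pool name from the earlier output
(the -d option just points the device search at the Solaris device nodes):

   zpool export tank
   zpool import -d /dev/dsk tank
   # if the pool still thinks it is in use by the FreeBSD host, force it:
   zpool import -f tank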
 -- richard

 -Kp
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS in Solaris 10 5/08

2008-09-05 Thread Kenny
All,

I realize this is an OpenSolaris forum, but I need help.  I'm getting 
conflicting information about whether ZFS in the Solaris 10 5/08 release supports gzip 
compression.  However, when I run `zpool upgrade -v` it reports version 
4, and that doesn't include gzip... yes??

Is there a way to get gzip compression (gzip-9) enabled in ZFS on Solaris 10 
5/08??

Thanks in advance for the discussion.

---Kenny
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver speed.

2008-09-05 Thread Richard Elling
Chris Gerhard wrote:
 Is there any way to control the resilver speed?  Having attached a third disk 
 to a mirror (so I can replace the other disks with larger ones), the resilver 
 goes at a fraction of the speed of the same operation using DiskSuite. 
 However, it still renders the system pretty much unusable for anything else.
   

Resilvers work at low priority in the ZFS scheduler.  In general, they 
work at
the media speed of the disk being resilvered.  However, anecdotal evidence
suggests that this may be impacted by the number and extent of snapshots.
I have a lot of characterization data for resilvers, but without varying the
scope and number of snapshots (which is a hard thing to identify).

ZFS resilvers in time sequence, not by disk block location, so there are
many more variables at play here than might be immediately obvious.
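
Progress can be watched without getting in the way of the resilver itself
(pool name illustrative):

   zpool status tank | grep 'in progress'
   iostat -xn 5      # per-device throughput while the resilver runs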

 So I would like to control the rate of the resilver.  Either slow it down a 
 lot so that the system is still usable, or tell it to go as fast as possible 
 to get it over with.
   

There are two competing RFEs for this:
http://bugs.opensolaris.org/view_bug.do?bug_id=6592835
http://bugs.opensolaris.org/view_bug.do?bug_id=6494473

 Also does the resilver deliberately pause?  Running iostat I see that it will 
 pause for five to ten seconds where no IO is done at all, then it continues 
 on at a more reasonable pace.
   

I have not seen such behaviour during resilver characterization.
Which OS release are you using?

Also, are you using IDE disks or disks which do not handle multiple
outstanding operations?

You may also be seeing
http://bugs.opensolaris.org/view_bug.do?bug_id=6729696
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS in Solaris 10 5/08

2008-09-05 Thread Richard Elling
Kenny wrote:
 All,

 I realize this is an OpenSolaris forum, but I need help.  I'm getting 
 conflicting information about whether ZFS in the Solaris 10 5/08 release supports gzip 
 compression.  However, when I run `zpool upgrade -v` it reports 
 version 4, and that doesn't include gzip... yes??
   

Correct.  gzip arrives with zpool version 5.

 Is there a way to get gzip compression (gzip-9) enabled in zfs on Solaris 10 
 5/08??
   

Not today, AFAIK.  This will appear as a patch for Solaris 10 5/08
(aka Solaris 10 update 5) and should be in Solaris 10 update 6.  But
I do not know the schedule beyond later this year or real soon now.
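
Once you are on a release whose pools support version 5 or later, enabling it
would look roughly like this (pool and dataset names are illustrative):

   zpool upgrade -v                  # lists supported versions; gzip needs 5+
   zpool upgrade mypool
   zfs set compression=gzip-9 mypool/data
   zfs get compression mypool/data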
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver speed.

2008-09-05 Thread Chris Gerhard

Thanks

Richard Elling wrote:


Also, are you using IDE disks or disks which do not handle multiple
outstanding operations?


SATA with the cmdk driver, which only sends two commands at a time.



You may also be seeing
http://bugs.opensolaris.org/view_bug.do?bug_id=6729696


That could well be the case.  Fortunately, none of the users would know 
how to run the sync command.




--
Chris Gerhard. __o __o __o
Systems TSC Chief Technologist_`\,`\,`\,_
Sun Microsystems Limited (*)/---/---/ (*)
Phone: +44 (0) 1252 426033 (ext 26033) http://blogs.sun.com/chrisg


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ?: any effort for snapshot management

2008-09-05 Thread Steffen Weiberle
I have seen Tim Foster's auto-snapshot and it looks interesting.

Is there a bug ID or an effort to deliver a snapshot policy and space 
management framework? I am not looking for a GUI, although a CLI-based UI 
might be helpful. The customer needs something that allows the use of 
snapshots on hundreds of systems and minimizes the administration needed to 
handle disks filling up.

I imagine one component is a time- or condition-based auto-delete of older 
snapshot(s).
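
A minimal sketch of that auto-delete piece, assuming a hypothetical dataset 
tank/data and snapshots named with an auto- prefix:

   #!/bin/sh
   # keep the newest $KEEP auto snapshots of $FS, destroy the rest
   FS=tank/data
   KEEP=14
   SNAPS=`zfs list -H -t snapshot -o name -s creation | grep "^$FS@auto-"`
   [ -z "$SNAPS" ] && exit 0
   COUNT=`echo "$SNAPS" | wc -l`
   EXCESS=`expr $COUNT - $KEEP`
   [ "$EXCESS" -gt 0 ] && echo "$SNAPS" | head -$EXCESS | xargs -n 1 zfs destroy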

Thanks
Steffen
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Snapshots during a scrub

2008-09-05 Thread mike
I have a weekly scrub set up, and I've seen at least once now where it
says 'don't snapshot while scrubbing'.

Is this a data integrity issue, or will it make one or both of the
processes take longer?

Thanks
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Snapshots during a scrub

2008-09-05 Thread Mark Shellenbaum
mike wrote:
 I have a weekly scrub set up, and I've seen at least once now where it
 says 'don't snapshot while scrubbing'.
 
 Is this a data integrity issue, or will it make one or both of the
 processes take longer?
 
 Thanks



That problem has been fixed in build 94.

Here is the bug that people have been referring to:

6343667 scrub/resilver has to start over when a snapshot is taken

   -Mark

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Snapshots during a scrub

2008-09-05 Thread Richard Elling
mike wrote:
 I have a weekly scrub set up, and I've seen at least once now where it
 says 'don't snapshot while scrubbing'.

 Is this a data integrity issue, or will it make one or both of the
 processes take longer?
   

The problem prior to NV b94 is that a snapshot would restart a scrub.
This has been fixed with:
http://bugs.opensolaris.org/view_bug.do?bug_id=6343667
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ?: any effort for snapshot management

2008-09-05 Thread Erast Benson
Steffen,

The most complete and serious ZFS snapshot management (integrated ZFS
send/recv replication over RSYNC, a CLI, integrated AVS, a GUI, and a
management server which provides a rich API for C/C++/Perl/Python/Ruby
integrators) is available here:

http://www.nexenta.com/nexentastor-overview

It's ZFS plus a lot of reliability fixes. An enterprise-quality, production-ready
solution.

Demos of advanced CLI usage are here:

http://www.nexenta.com/demos/automated-snapshots.html 
http://www.nexenta.com/demos/auto-tier-basic.html

As a side note, I think that the disintegrated general-purpose scripting
available on the Internet simply cannot provide production
quality and ease of use.

On Fri, 2008-09-05 at 13:14 -0400, Steffen Weiberle wrote:
 I have seen Tim Foster's auto-snapshot and it looks interesting.
 
 Is there a bug ID or an effort to deliver a snapshot policy and space 
 management framework? I am not looking for a GUI, although a CLI-based UI 
 might be helpful. The customer needs something that allows the use of 
 snapshots on hundreds of systems and minimizes the administration needed to 
 handle disks filling up.
 
 I imagine one component is a time- or condition-based auto-delete of older 
 snapshot(s).
 
 Thanks
 Steffen
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Snapshots during a scrub

2008-09-05 Thread mike
Okay, well I am running snv_94 already. So I guess I'm good :)

On Fri, Sep 5, 2008 at 10:23 AM, Mark Shellenbaum
[EMAIL PROTECTED] wrote:
 mike wrote:

 I have a weekly scrub set up, and I've seen at least once now where it
 says 'don't snapshot while scrubbing'.

 Is this a data integrity issue, or will it make one or both of the
 processes take longer?

 Thanks



 That problem has been fixed in build 94.

 Here is the bug that people have been referring to:

 6343667 scrub/resilver has to start over when a snapshot is taken

  -Mark


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Error: value too large for defined data type

2008-09-05 Thread Paul Raines

I am having a very odd problem on one of our ZFS filesystems

On certain files, when accessed on the Solaris server itself locally
where the zfs fs sits, we get an error like the following:

[EMAIL PROTECTED] # ls -l
./README: Value too large for defined data type
total 36
-rw-r-   1 mreuter  mreuter 1019 Sep 25  2006 Makefile
-rw-r-   1 mreuter  mreuter 3185 Feb 22  2000 lcompgre.cc
-rw-r-   1 mreuter  mreuter 3238 Feb 22  2000 lcompgsh.cc
-rw-r-   1 mreuter  mreuter 2485 Feb 22  2000 lcompreg.cc
-rw-r-   1 mreuter  mreuter 2774 Feb 22  2000 lcompshf.cc

The odd thing is that when the filesystem is accessed from our
Linux boxes over NFS, there is no error accessing the same file:


vader:complex[84] ls -l
total 24
drwxr-x---+ 2 mreuter mreuter8 Sep 25  2006 .
drwxr-x---+ 5 mreuter mreuter5 Mar 31  1997 ..
-rw-r-+ 1 mreuter mreuter 3185 Feb 22  2000 lcompgre.cc
-rw-r-+ 1 mreuter mreuter 3238 Feb 22  2000 lcompgsh.cc
-rw-r-+ 1 mreuter mreuter 2485 Feb 22  2000 lcompreg.cc
-rw-r-+ 1 mreuter mreuter 2774 Feb 22  2000 lcompshf.cc
-rw-r-+ 1 mreuter mreuter 1019 Sep 25  2006 Makefile
-rw-r-+ 1 mreuter mreuter 1435 Jan  4  1945 README
vader:mreuter:complex[85] wc README
   40  181 1435 README

The file is obviously small, so this is not a large-file problem.

Anyone have an idea what gives?


-- 
---
Paul Rainesemail: raines at nmr.mgh.harvard.edu
MGH/MIT/HMS Athinoula A. Martinos Center for Biomedical Imaging
149 (2301) 13th Street Charlestown, MA 02129USA


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ?: any effort for snapshot management

2008-09-05 Thread Ross
I think I can answer those questions.

Firstly, Tim is working with some of Sun's desktop guys to put together a GUI 
for his stuff.  I don't know how long it will be, but I'd guess that you'll see 
that as a full part of OpenSolaris sometime this year.

Regarding snapshot management, I believe there are two types of filesystem 
quota in ZFS now.  The original quotas include snapshot space, and you can also 
create a quota that applies to the main filesystem only.  I don't know the 
details of which settings they are, I'm afraid; I just remember reading that the 
two types of quota have been implemented.
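
If the two properties in question are quota and refquota (that is an assumption 
on my part; check the zfs(1M) man page on a recent build), the difference looks 
like this for a hypothetical dataset:

   # quota counts the dataset plus its snapshots and descendants;
   # refquota limits only the space the dataset itself references
   zfs set quota=50G tank/home/alice
   zfs set refquota=40G tank/home/alice
   zfs get quota,refquota tank/home/alice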

I would have thought those quotas, plus the auto-delete ability of Tim's 
snapshot tools, should fulfil most needs.

Ross
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZIL NVRAM partitioning?

2008-09-05 Thread Narayan Venkat
I understand that if you want to use a ZIL, then the requirement is one or more 
ZILs per pool. 
With an SSD, you can partition the disk to allow usage of a single disk for 
multiple ZILs.  Can we do the same thing with a PCIe-based NVRAM card (like 
http://www.vmetro.com/category4304.html)?

Thanks

Narayan
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs cksum errors

2008-09-05 Thread Seymour Krebs
I am having trouble with ZFS: if I scrub the pool, I get cksum errors.

If I scrub, run `zpool clear rpool`, and then re-scrub, the cksum errors remain.

This appears to be a systematic error and not hardware-related.

I am also having trouble with beadm create. please see:

http://www.opensolaris.org/jive/thread.jspa?threadID=71960&tstart=0 


# zpool status -v
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver in progress for 0h13m, 93.11% done, 0h0m to go
config:

NAME        STATE     READ WRITE CKSUM
rpool       ONLINE       0     0     2
  mirror    ONLINE       0     0     2
    c6d0s0  ONLINE       0     0     4
    c7d0s2  ONLINE       0     0     4

errors: Permanent errors have been detected in the following files:

  metadata:  0x0 (note had to delete leading carets as this forum stripped 
the whole line otherwise)
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Error: value too large for defined data type

2008-09-05 Thread A Darren Dunham
On Fri, Sep 05, 2008 at 03:17:44PM -0400, Paul Raines wrote:
 [EMAIL PROTECTED] # ls -l
 ./README: Value too large for defined data type
 total 36
 -rw-r-   1 mreuter  mreuter 1019 Sep 25  2006 Makefile
 -rw-r-   1 mreuter  mreuter 3185 Feb 22  2000 lcompgre.cc
 -rw-r-   1 mreuter  mreuter 3238 Feb 22  2000 lcompgsh.cc
 -rw-r-   1 mreuter  mreuter 2485 Feb 22  2000 lcompreg.cc
 -rw-r-   1 mreuter  mreuter 2774 Feb 22  2000 lcompshf.cc

 -rw-r-+ 1 mreuter mreuter 1435 Jan  4  1945 README
 vader:mreuter:complex[85] wc README
40  181 1435 README
 
 The file is obvious small so this is not a large file problem.

Probably the date.

I don't think 'ls' is isaexec-wrapped by default.  You might try running
the 64-bit version of ls.
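
Roughly like this (the path assumes an x86 box whose release ships the 64-bit
ls; on SPARC it would be /usr/bin/sparcv9/ls):

   isainfo -k                     # confirm the kernel is running 64-bit
   /usr/bin/amd64/ls -l README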

-- 
Darren
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver speed.

2008-09-05 Thread Bill Sommerfeld
On Fri, 2008-09-05 at 09:41 -0700, Richard Elling wrote:
  Also does the resilver deliberately pause?  Running iostat I see
 that it will pause for five to ten seconds where no IO is done at all,
 then it continues on at a more reasonable pace.

 I have not seen such behaviour during resilver characterization.

I have, post nv_94, and I filed a bug:

6729696 sync causes scrub or resilver to pause for up to 30s


- Bill


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Error: value too large for defined data type

2008-09-05 Thread Kyle McDonald
Paul Raines wrote:
 I am having a very odd problem on one of our ZFS filesystems

 On certain files, when accessed on the Solaris server itself locally
 where the zfs fs sits, we get an error like the following:

 [EMAIL PROTECTED] # ls -l
 ./README: Value too large for defined data type
 total 36
 -rw-r-   1 mreuter  mreuter 1019 Sep 25  2006 Makefile
 -rw-r-   1 mreuter  mreuter 3185 Feb 22  2000 lcompgre.cc
 -rw-r-   1 mreuter  mreuter 3238 Feb 22  2000 lcompgsh.cc
 -rw-r-   1 mreuter  mreuter 2485 Feb 22  2000 lcompreg.cc
 -rw-r-   1 mreuter  mreuter 2774 Feb 22  2000 lcompshf.cc

   
Do you by chance have /usr/gnu/bin, or any directory with a Gnu 'ls' in 
your path before /usr/bin?
(what does 'which ls' show?)

I've seen this with GNU ls that I compiled myself as far back as 
Solaris 9, maybe earlier. By default, GNU ls compiled on Solaris doesn't 
know how to handle large files (and therefore probably not 64-bit dates either).

When I've seen this, explicitly running /usr/bin/ls -l worked fine, and 
I suspect it will for you too.

   -Kyle

 The odd thing is that when the filesystem is accessed from our
 Linux boxes over NFS, there is no error accessing the same file:


 vader:complex[84] ls -l
 total 24
 drwxr-x---+ 2 mreuter mreuter8 Sep 25  2006 .
 drwxr-x---+ 5 mreuter mreuter5 Mar 31  1997 ..
 -rw-r-+ 1 mreuter mreuter 3185 Feb 22  2000 lcompgre.cc
 -rw-r-+ 1 mreuter mreuter 3238 Feb 22  2000 lcompgsh.cc
 -rw-r-+ 1 mreuter mreuter 2485 Feb 22  2000 lcompreg.cc
 -rw-r-+ 1 mreuter mreuter 2774 Feb 22  2000 lcompshf.cc
 -rw-r-+ 1 mreuter mreuter 1019 Sep 25  2006 Makefile
 -rw-r-+ 1 mreuter mreuter 1435 Jan  4  1945 README
 vader:mreuter:complex[85] wc README
40  181 1435 README

 The file is obviously small, so this is not a large-file problem.

 Anyone have an idea what gives?


   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL NVRAM partitioning?

2008-09-05 Thread Neil Perrin
On 09/05/08 14:42, Narayan Venkat wrote:
 I understand that if you want to use a ZIL, then the requirement is one or more 
 ZILs per pool.

A little clarification of ZFS terms may help here. The term ZIL is somewhat
overloaded. I think what you mean here is a separate log device (slog), because
intent logs are always present in ZFS. Without a slog, the logs live in the
main pool. There is one log per file system, and it allocates blocks in the
main pool to form a chain. When a slog is defined, it can be made up of
multiple devices (in which case the writes are striped across the devices), or
it can take the form of an N-way mirror, to provide redundancy.
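
For example, adding a mirrored slog to an existing pool looks like this
(device names are illustrative):

   zpool add tank log mirror c4t0d0 c4t1d0
   zpool status tank      # the devices show up under a separate 'logs' section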
 
 With an SSD, you can partition the disk to allow usage of a single disk for 
 multiple ZILs.
 Can we do the same thing with a PCIe-based NVRAM card
 (like http://www.vmetro.com/category4304.html)?

I don't think there's a Solaris-supported driver for that device.
However, any Solaris device, whether a partition or not, will work
with ZFS provided it's at least 64 MB. Its performance is another matter.

 
 Thanks 
 Narayan
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZIL NVRAM partitioning?

2008-09-05 Thread Narayan Venkat
Thanks Neil for the clarification.  

Regards,

Narayan
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss