Re: [zfs-discuss] ZFS on Dell with FreeBSD

2011-10-19 Thread Fajar A. Nugraha
On Thu, Oct 20, 2011 at 7:56 AM, Dave Pooser  wrote:
> On 10/19/11 9:14 AM, "Albert Shih"  wrote:
>
>> When we buy an MD1200 we need a PERC H800 RAID card on the server
>
> No, you need a card that includes 2 external x4 SFF8088 SAS connectors.
> I'd recommend an LSI SAS 9200-8e HBA flashed with the IT firmware-- then
> it presents the individual disks and ZFS can handle redundancy and
> recovery.

Exactly, thanks for suggesting an exact controller model that can
present disks as JBOD.

With hardware RAID, you'd pretty much rely on the controller to behave
nicely, which is why I suggested simply creating one big volume for
ZFS to use (so you pretty much only get features like snapshots,
clones, etc., but not ZFS's self-healing). Again, others might (and do)
disagree and suggest creating one volume per individual disk (even when
you're still relying on the hardware RAID controller). But ultimately
there's no question that the best possible setup is to present the
disks as JBOD and let ZFS handle them directly.
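
To illustrate the difference (a rough sketch only; the FreeBSD device
names below are hypothetical): with a JBOD-capable HBA you hand ZFS the
raw disks and it owns the redundancy,

# zpool create tank raidz2 da0 da1 da2 da3 da4 da5

whereas with one big hardware-RAID volume ZFS only ever sees a single
device, so it can detect corruption but has no second copy to repair from:

# zpool create tank da0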

-- 
Fajar
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Dell with FreeBSD

2011-10-19 Thread Rocky Shek
I also recommend the LSI 9200-8E or the newer 9205-8E with the IT
firmware, based on past experience.

Also, LSI normally releases firmware for its original HBAs earlier than
the OEM versions.

Plus, most users in the community use LSI HBAs.

Rocky 

-Original Message-
From: zfs-discuss-boun...@opensolaris.org
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Dave Pooser
Sent: Wednesday, October 19, 2011 5:56 PM
To: freebsd-questi...@freebsd.org; zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] ZFS on Dell with FreeBSD

On 10/19/11 9:14 AM, "Albert Shih"  wrote:

> When we buy an MD1200 we need a PERC H800 RAID card on the server

No, you need a card that includes 2 external x4 SFF8088 SAS connectors.
I'd recommend an LSI SAS 9200-8e HBA flashed with the IT firmware-- then
it presents the individual disks and ZFS can handle redundancy and
recovery.
-- 
Dave Pooser
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Dell with FreeBSD

2011-10-19 Thread Dave Pooser
On 10/19/11 9:14 AM, "Albert Shih"  wrote:

> When we buy an MD1200 we need a PERC H800 RAID card on the server

No, you need a card that includes 2 external x4 SFF8088 SAS connectors.
I'd recommend an LSI SAS 9200-8e HBA flashed with the IT firmware-- then
it presents the individual disks and ZFS can handle redundancy and
recovery.
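
If it helps, here is a rough sketch of checking and flashing with LSI's
sas2flash utility (the firmware/BIOS file names are placeholders -- use
the ones from the 9200-8e IT package you actually download):

# sas2flash -listall                          # list adapters and the firmware type currently flashed
# sas2flash -o -f 2118it.bin -b mptsas2.rom   # example only: flash IT firmware plus boot ROM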
-- 
Dave Pooser
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] commercial zfs-based storage replication software?

2011-10-19 Thread Jim Klimov

2011-10-19 17:54, Fajar A. Nugraha wrote:
> On Wed, Oct 19, 2011 at 7:52 PM, Jim Klimov  wrote:
>> Well, just for the sake of completeness: most of our systems are
>> using the zfs-auto-snap service, including Solaris 10 systems dating
>> from Sol10u6. Installation of the relevant packages from SXCE
>> (ranging snv_117-snv_130) was trivial, but some script patching was
>> in order - I think, replacing the ksh interpreter with ksh93.
>
> Yes, I remembered reading about that.

Actually, I revised the systems: the scripts are kept in their original
form, but those Sol10 servers where ksh93 was absent got a symlink:

/usr/bin/ksh93 -> ../dt/bin/dtksh
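
(created with something along these lines, if memory serves:)

# cd /usr/bin && ln -s ../dt/bin/dtksh ksh93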

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] about btrfs and zfs

2011-10-19 Thread Pawel Jakub Dawidek
On Wed, Oct 19, 2011 at 10:13:56AM -0400, David Magda wrote:
> On Wed, October 19, 2011 08:15, Pawel Jakub Dawidek wrote:
> 
> > Fsck can only fix known file system inconsistencies in file system
> > structures. Because there is no atomicity of operations in UFS and other
> > file systems, it is possible that when you remove a file, your system can
> > crash between removing the directory entry and freeing the inode or
> > blocks. This is expected with UFS; that's why there is fsck to verify
> > that no such thing happened.
> 
> Slightly OT, but this non-atomic delay between meta-data updates and
> writes to the disk is exploited by "soft updates" with FreeBSD's UFS:
> 
> http://www.freebsd.org/doc/en/books/handbook/configtuning-disk.html#SOFT-UPDATES
> 
> It may be of some interest to the file system geeks on the list.

Well, soft updates, thanks to careful ordering of operations, allow the
file system to be mounted even in an inconsistent state and fsck to be
run in the background, since the only possible inconsistencies are
resource leaks - a directory entry will never point at an unallocated
inode, and an inode will never point at an unallocated block, etc. This
is still not atomic.

With recent versions of FreeBSD, soft-updates were extended to journal
those resource leaks, so background fsck is not needed anymore.
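
For reference, on recent FreeBSD both variants are toggled per file
system with tunefs; a sketch (the device name is made up, and the file
system should be unmounted or mounted read-only first):

# tunefs -n enable /dev/ada0p2    # classic soft updates
# tunefs -j enable /dev/ada0p2    # soft updates journaling (SU+J), no background fsck needed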

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] about btrfs and zfs

2011-10-19 Thread Nico Williams
On Wed, Oct 19, 2011 at 7:24 AM, Garrett D'Amore
 wrote:
> I'd argue that from a *developer* point of view, an fsck tool for ZFS might 
> well be useful.  Isn't that what zdb is for? :-)
>
> But ordinary administrative users should never need something like this, 
> unless they have encountered a bug in ZFS itself.  (And bugs are as likely to 
> exist in the checker tool as in the filesystem. ;-)

zdb can be useful for admins -- say, to gather stats not reported by
the system, to explore the fs/vol layout, for educational purposes,
and so on.
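
A few read-only examples of that kind of use (a sketch; the pool name is
assumed):

# zdb -C tank     # dump the cached pool configuration
# zdb -d tank     # list datasets and per-dataset object summaries
# zdb -bb tank    # walk the block tree and print space/block statistics (slow on big pools)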

Nico
--
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Dell with FreeBSD

2011-10-19 Thread Krunal Desai
On Wed, Oct 19, 2011 at 10:14 AM, Albert Shih  wrote:
> When we buy an MD1200 we need a PERC H800 RAID card on the server, so we
> have two options:
>
>        1/ create one LV on the PERC H800 so the server sees one volume,
>        put the zpool on this single volume, and let the hardware manage
>        the RAID.
>
>        2/ create 12 LVs on the PERC H800 (so without RAID) and let
>        FreeBSD and ZFS manage the RAID.
>
> Which one is the best solution?
>
> Any advice about the RAM I need on the server (currently one MD1200, so
> 12x2TB disks)?

I know the PERC H200 can be flashed with IT firmware, making it in
effect a "dumb" HBA perfect for ZFS usage. Perhaps the H800 supports the
same? (If not, can you get the machine configured with an H200?)

If that's not an option, I think option 2 will work. My first ZFS
server ran on a PERC 5/i, and I was forced to make 8 single-drive
RAID 0s in the PERC option ROM, but Solaris did not seem to mind that.

--khd
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Dell with FreeBSD

2011-10-19 Thread Jorge Medina
On Wed, Oct 19, 2011 at 11:14 AM, Albert Shih  wrote:
> Hi
>
> Sorry for cross-posting. I don't know which mailing list I should post
> this message to.
>
> I would like to use FreeBSD with ZFS on some Dell servers with some
> MD1200 (classic DAS).
>
> When we buy an MD1200 we need a PERC H800 RAID card on the server, so we
> have two options:
>
>        1/ create one LV on the PERC H800 so the server sees one volume,
>        put the zpool on this single volume, and let the hardware manage
>        the RAID.
>
>        2/ create 12 LVs on the PERC H800 (so without RAID) and let
>        FreeBSD and ZFS manage the RAID.
>
> Which one is the best solution?
>
> Any advice about the RAM I need on the server (currently one MD1200, so
> 12x2TB disks)?
>
> Regards.

For the ZFS approach, the second option is better in my opinion.

> JAS
> --
> Albert SHIH
> DIO batiment 15
> Observatoire de Paris
> 5 Place Jules Janssen
> 92195 Meudon Cedex
> Téléphone : 01 45 07 76 26/06 86 69 95 71
> Heure local/Local time:
> mer 19 oct 2011 16:11:40 CEST
> ___
> freebsd-questi...@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"
>



-- 
Jorge Andrés Medina Oliva.
Computer engineer.
IT consultant
http://www.bsdchile.cl
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] repair [was: about btrfs and zfs]

2011-10-19 Thread Garrett D'Amore

On Oct 19, 2011, at 1:52 PM, Richard Elling wrote:

> On Oct 18, 2011, at 5:21 PM, Edward Ned Harvey wrote:
> 
>>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>>> boun...@opensolaris.org] On Behalf Of Tim Cook
>>> 
>>> I had and have redundant storage, it has *NEVER* automatically fixed
>>> it.  You're the first person I've heard that has had it automatically
>>> fix it.
>> 
>> That's probably just because it's normal and expected behavior to
>> automatically fix it - I always have redundancy, and every cksum error I
>> ever find is always automatically fixed.  I never tell anyone here because
>> it's normal and expected.
> 
> Yes, and in fact the automated tests for ZFS developers intentionally
> corrupt data so that the repair code can be tested. Also, the same
> checksum code is used to calculate the checksum when writing and reading.
> 
>> If you have redundancy, and cksum errors, and it's not automatically fixed,
>> then you should report the bug.
> 
> For modern Solaris-based implementations, each checksum mismatch that is
> repaired reports the bitmap of the corrupted vs expected data. Obviously,
> if the data cannot be repaired, you cannot know the expected data, so the
> error is reported without identification of the broken bits.
> 
> In the archives, you can find reports of recoverable and unrecoverable
> errors attributed to:
>   1. ZFS software (rare, but a bug a few years ago mishandled a raidz case)
>   2. SAN switch firmware
>   3. "Hardware" RAID array firmware
>   4. Power supplies
>   5. RAM
>   6. HBA
>   7. PCI-X bus
>   8. BIOS settings
>   9. CPU and chipset errata
> 
> Personally, I've seen all of the above except #7, because PCI-X hardware is
> hard to find now.

I've seen #7.  I have some PCI-X hardware that is flaky in my home lab. ;-)

There was a case of #1 not very long ago, but it was a difficult-to-trigger
race and is fixed in illumos and, I presume, other derivatives (including
NexentaStor).

- Garrett
> 
> If you consistently see unrecoverable data from a system that has
> protected data, then there may be an issue with a part of the system that
> is a single point of failure. Very, very, very few x86 systems are
> designed with no SPOF.
> -- richard
> 
> -- 
> 
> ZFS and performance consulting
> http://www.RichardElling.com
> VMworld Copenhagen, October 17-20
> OpenStorage Summit, San Jose, CA, October 24-27
> LISA '11, Boston, MA, December 4-9 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] about btrfs and zfs

2011-10-19 Thread Garrett D'Amore
I'd argue that from a *developer* point of view, an fsck tool for ZFS might 
well be useful.  Isn't that what zdb is for? :-)

But ordinary administrative users should never need something like this, unless 
they have encountered a bug in ZFS itself.  (And bugs are as likely to exist in 
the checker tool as in the filesystem. ;-)

- Garrett


On Oct 19, 2011, at 2:15 PM, Pawel Jakub Dawidek wrote:

> On Wed, Oct 19, 2011 at 08:40:59AM +1100, Peter Jeremy wrote:
>> fsck verifies the logical consistency of a filesystem.  For UFS, this
>> includes: used data blocks are allocated to exactly one file,
>> directory entries point to valid inodes, allocated inodes have at
>> least one link, the number of links in an inode exactly matches the
>> number of directory entries pointing to that inode, directories form a
>> single tree without loops, file sizes are consistent with the number
>> of allocated blocks, unallocated data/inodes blocks are in the
>> relevant free bitmaps, redundant superblock data is consistent.  It
>> can't verify data.
> 
> Well said. I'd add that people who insist on ZFS having an fsck are
> missing the whole point of the ZFS transactional model and copy-on-write
> design.
> 
> Fsck can only fix known file system inconsistencies in file system
> structures. Because there is no atomicity of operations in UFS and other
> file systems, it is possible that when you remove a file, your system can
> crash between removing the directory entry and freeing the inode or
> blocks. This is expected with UFS; that's why there is fsck to verify
> that no such thing happened.
> 
> In ZFS, on the other hand, there are no inconsistencies like that. If all
> blocks match their checksums and you find a directory loop or something
> like that, it is a bug in ZFS, not an expected inconsistency. It should
> be fixed in ZFS and not worked around with some fsck for ZFS.
> 
> -- 
> Pawel Jakub Dawidek   http://www.wheelsystems.com
> FreeBSD committer http://www.FreeBSD.org
> Am I Evil? Yes, I Am! http://yomoli.com
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Dell with FreeBSD

2011-10-19 Thread Darren J Moffat

On 10/19/11 15:30, Fajar A. Nugraha wrote:

> On Wed, Oct 19, 2011 at 9:14 PM, Albert Shih  wrote:
>> Hi
>>
>> Sorry for cross-posting. I don't know which mailing list I should post
>> this message to.
>>
>> I would like to use FreeBSD with ZFS on some Dell servers with some
>> MD1200 (classic DAS).
>>
>> When we buy an MD1200 we need a PERC H800 RAID card on the server, so
>> we have two options:
>>
>> 1/ create one LV on the PERC H800 so the server sees one volume, put
>> the zpool on this single volume, and let the hardware manage the RAID.
>>
>> 2/ create 12 LVs on the PERC H800 (so without RAID) and let FreeBSD
>> and ZFS manage the RAID.
>>
>> Which one is the best solution?


> Neither.
>
> The best solution is to find a controller which can pass the disks
> through as JBOD (not encapsulated as virtual disks). Failing that, I'd
> go with (1) (though others might disagree).


No, go with 2.  ALWAYS let ZFS manage the redundancy, otherwise it can't
self-heal.
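
In other words, roughly (a sketch; the mfid device names assume
FreeBSD's mfi(4) driver for the PERC, and raidz2 is just one reasonable
layout for 12 disks):

# zpool create tank raidz2 mfid0 mfid1 mfid2 mfid3 mfid4 mfid5 \
                           mfid6 mfid7 mfid8 mfid9 mfid10 mfid11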


--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] about btrfs and zfs

2011-10-19 Thread Brian Wilson

On 10/18/11 03:31 PM, Tim Cook wrote:



On Tue, Oct 18, 2011 at 3:27 PM, Peter Tribble <peter.trib...@gmail.com> wrote:


On Tue, Oct 18, 2011 at 9:12 PM, Tim Cook <t...@cook.ms> wrote:
>
>
> On Tue, Oct 18, 2011 at 3:06 PM, Peter Tribble <peter.trib...@gmail.com>
> wrote:
>>
>> On Tue, Oct 18, 2011 at 8:52 PM, Tim Cook <t...@cook.ms> wrote:
>> >
>> > Every scrub I've ever done that has found an error required manual
>> > fixing.  Every pool I've ever created has been raid-z or raid-z2, so
>> > the silent healing, while a great story, has never actually happened
>> > in practice in any environment I've used ZFS in.
>>
>> You have, of course, reported each such failure, because if that
>> was indeed the case then it's a clear and obvious bug?
>>
>> For what it's worth, I've had ZFS repair data corruption on
>> several occasions - both during normal operation and as a
>> result of a scrub, and I've never had to intervene manually.
>>
>> --
>> -Peter Tribble
>> http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
>
>
> Given that there are guides on how to manually fix the corruption, I
> don't see any need to report it.  It's considered acceptable and
> expected behavior from everyone I've talked to at Sun...
> http://dlc.sun.com/osol/docs/content/ZFSADMIN/gbbwl.html

If you have adequate redundancy, ZFS will - and does -
repair errors. The document you quote is for the case
where you don't actually have adequate redundancy: ZFS
will refuse to make up data for you, and report back where
the problem was. Exactly as designed.

(And yes, I've come across systems without redundant
storage, or had multiple simultaneous failures. The original
statement was that if you have redundant copies of the data
or, in the case of raidz, enough information to reconstruct
it, then ZFS will repair it for you. Which has been exactly in
accord with my experience.)

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/




I had and have redundant storage, and it has *NEVER* automatically fixed
it.  You're the first person I've heard of who has had it automatically
fix it.
Per the page, "or an unlikely series of events conspired to corrupt
multiple copies of a piece of data."

That unlikely series of events, which goes unnamed, is not that
unlikely in my experience.


--Tim

Just another 2 cents towards a euro/dollar/yen.  I've only had data
redundancy in ZFS via mirrors (not that it should matter as long as
there's redundancy), and in every case I've had it repair data
automatically via a scrub.  The one case where it didn't was when the
disk controller that both drives happened to share (bad design, yes)
started erroring and corrupting writes to both disks in parallel, so
there was no good data to fix it with.  I was still happy to be using
ZFS, as a filesystem without a scrub/scan of some sort wouldn't even
have noticed in my experience - I suspect btrfs would have if its scrub
works similarly.
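
(For what it's worth, scheduling that detection is just a periodic
scrub - e.g. a weekly cron entry along these lines, with the pool name
and zpool path assumed:)

# crontab entry: scrub the pool every Sunday at 03:00
0 3 * * 0 /sbin/zpool scrub tank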


cheers,
Brian




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--
---
Brian Wilson, Solaris SE, UW-Madison DoIT
Room 3114 CS&S    608-263-8047
brian.wilson(a)doit.wisc.edu
'I try to save a life a day. Usually it's my own.' - John Crichton
---

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Dell with FreeBSD

2011-10-19 Thread Fajar A. Nugraha
On Wed, Oct 19, 2011 at 9:14 PM, Albert Shih  wrote:
> Hi
>
> Sorry for cross-posting. I don't know which mailing list I should post
> this message to.
>
> I would like to use FreeBSD with ZFS on some Dell servers with some
> MD1200 (classic DAS).
>
> When we buy an MD1200 we need a PERC H800 RAID card on the server, so we
> have two options:
>
>        1/ create one LV on the PERC H800 so the server sees one volume,
>        put the zpool on this single volume, and let the hardware manage
>        the RAID.
>
>        2/ create 12 LVs on the PERC H800 (so without RAID) and let
>        FreeBSD and ZFS manage the RAID.
>
> Which one is the best solution?

Neither.

The best solution is to find a controller which can pass the disks
through as JBOD (not encapsulated as virtual disks). Failing that, I'd
go with (1) (though others might disagree).

>
> Any advice about the RAM I need on the server (currently one MD1200, so
> 12x2TB disks)?

The more the better :)

Just make sure you do NOT use dedup until you REALLY know what you're
doing (which usually means buying lots of RAM and SSDs for L2ARC).
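
(If you do want to experiment, a couple of sketches that help with
sizing - pool and device names assumed: zdb can simulate the dedup table
before you enable anything, and L2ARC is just a cache vdev added to the
pool.)

# zdb -S tank               # simulate dedup and print the would-be DDT size and ratio
# zpool add tank cache da6  # add an SSD as L2ARC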

-- 
Fajar
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS on Dell with FreeBSD

2011-10-19 Thread Albert Shih
Hi 

Sorry for cross-posting. I don't know which mailing list I should post
this message to.

I would like to use FreeBSD with ZFS on some Dell servers with some
MD1200 (classic DAS).

When we buy an MD1200 we need a PERC H800 RAID card on the server, so we
have two options:

1/ create one LV on the PERC H800 so the server sees one volume, put
the zpool on this single volume, and let the hardware manage the RAID.

2/ create 12 LVs on the PERC H800 (so without RAID) and let FreeBSD
and ZFS manage the RAID.

Which one is the best solution?

Any advice about the RAM I need on the server (currently one MD1200, so
12x2TB disks)?

Regards.

JAS
-- 
Albert SHIH
DIO batiment 15
Observatoire de Paris
5 Place Jules Janssen
92195 Meudon Cedex
Téléphone : 01 45 07 76 26/06 86 69 95 71
Heure local/Local time:
mer 19 oct 2011 16:11:40 CEST
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] about btrfs and zfs

2011-10-19 Thread David Magda
On Wed, October 19, 2011 08:15, Pawel Jakub Dawidek wrote:

> Fsck can only fix known file system inconsistencies in file system
> structures. Because there is no atomicity of operations in UFS and other
> file systems, it is possible that when you remove a file, your system can
> crash between removing the directory entry and freeing the inode or
> blocks. This is expected with UFS; that's why there is fsck to verify
> that no such thing happened.

Slightly OT, but this non-atomic delay between meta-data updates and
writes to the disk is exploited by "soft updates" with FreeBSD's UFS:

http://www.freebsd.org/doc/en/books/handbook/configtuning-disk.html#SOFT-UPDATES

It may be of some interest to the file system geeks on the list.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] commercial zfs-based storage replication software?

2011-10-19 Thread Fajar A. Nugraha
On Wed, Oct 19, 2011 at 7:52 PM, Jim Klimov  wrote:
> 2011-10-13 13:27, Darren J Moffat wrote:
>>
>> On 10/13/11 09:27, Fajar A. Nugraha wrote:
>>>
>>> On Tue, Oct 11, 2011 at 5:26 PM, Darren J Moffat
>>>  wrote:

>>>> Have you looked at the time-slider functionality that is already
>>>> in Solaris?
>>>
>>> Hi Darren. Is it available for Solaris 10? I just installed Solaris 10
>>> u10 and couldn't find it.
>>
>> No it is not.
>>>
>>> Is there a reference on how to get/install this functionality on Solaris
>>> 10?
>>
>> No because it doesn't exist on Solaris 10.
>>
>
> Well, just for the sake of completeness: most of our systems are using
> the zfs-auto-snap service, including Solaris 10 systems dating from
> Sol10u6. Installation of the relevant packages from SXCE (ranging
> snv_117-snv_130) was trivial, but some script patching was in order -
> I think, replacing the ksh interpreter with ksh93.

Yes, I remembered reading about that.

>
> I haven't used the GUI part, and I guess my experience relates to the
> script-based zfs-auto-snap (before it was remade into its current binary
> form, or so I read). We kind of got stuck with SXCE systems which
> still "just work" fine ;)
>
> The point is, even if unsupported (which may be a problem in the OP's
> case), it is likely that one or another version of zfs-auto-snap or
> TimeSlider can be made to work on Sol10 with little effort.

To be honest, if it were just about getting it to work, I'd just make my
own. Or run SE with a Solaris 10 zone inside it, with SE managing
time-slider/replication and the S10 zone running the application.

But for this particular case support is essential. That's why I
mentioned earlier that if I can't get a supported solution for this
setup (at a reasonable price), storage-based replication will have to do.

-- 
Fajar
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] bootadm hang WAS tuning zfs_arc_min

2011-10-19 Thread Jim Klimov

2011-10-12 11:56, Frank Van Damme wrote:
> The root of the problem seems to be that that process never completes.
>
>   9 /lib/svc/bin/svc.startd
> 332 /sbin/sh /lib/svc/method/boot-archive-update
> 347 /sbin/bootadm update-archive
>
> Can't kill it and run from the cmdline either, it simply ignores
> SIGKILL. (Which shouldn't even be possible.)




I guess it is possible when things lock up in kernel calls, waiting for
them to complete.
It has happened to me a number of times, usually related to a ZFS pool
being too busy working or repairing to do anything else, and this per se
often led to the system crashing (see e.g. my adventures this spring
reported on the forums). I have hit a number of problems generally
leading to the whole ZFS subsystem "running away to a happy place".


As an indication of this you can try running something as simple as 
"zpool list" in the background (otherwise your shell locks up too) and 
see if it ever completes:


# zpool list &

Earlier there were bugs related to inaccessible snapshots (marked for 
deletion, but not actually deletable until you mount and unmount the 
parent dataset) - these mostly fired in zfs-auto-snap auto-deletions, 
but also happened to influence bootadm.


I am not sure in what way bootadm relies on zfs/zpool, but empirically - 
it does.

You might work around the problem by:
* exporting "data" zfs pools before updating the boot archive (bootadm
update-archive); if you're rebooting the system anyway - stop the zones
and services manually, and give this a try.
* booting from other media such as a Failsafe Boot (SXCE, Sol10) or
LiveCD (Indiana), importing your rootpool at "/a", and then running

# bootadm update-archive -R /a

* booting into single-user mode, making the root RW if needed, and
updating the archive.
** You're likely to go this way anyway if your boot is interrupted due
to an outdated boot archive (SMF failure - requires a repair shell
interaction). When the archive is updated, you need to clear the service
(svcadm clear boot-archive) and exit the repair shell in order to
continue booting the OS.
* brute force - updating the boot archive (/platform/i86pc/boot_archive
and /platform/i86pc/amd64/boot_archive) manually as an FS image, with
the files listed in /boot/solaris/filelist.ramdisk. Usually failure on
boot is related to the updating of some config files in /etc...


//Jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] about btrfs and zfs

2011-10-19 Thread Paul Kraus
Thank you. The following is the best "layman's" explanation as to
_why_ ZFS does not have an fsck equivalent (or why it doesn't even need
one). On the other hand, there are situations where you really do need
to force ZFS to do something that may not be a "good idea", but is the
best of a bad set of choices. Hence zpool import -F (and other such
tools available via zdb). While the ZFS data may not be corrupt, it is
possible to corrupt the ZFS metadata, uberblocks, and labels in such a
way that force is necessary.
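
(For example - a sketch with an assumed pool name - the recovery-mode
import can discard the last few transactions to reach an importable
state, and -n shows what it would do first:)

# zpool import -Fn tank   # dry run: report whether rewinding recent txgs would make the pool importable
# zpool import -F tank    # perform the actual recovery import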

On Wed, Oct 19, 2011 at 8:15 AM, Pawel Jakub Dawidek  wrote:

> Well said. I'd add that people who insist on ZFS having an fsck are
> missing the whole point of the ZFS transactional model and copy-on-write
> design.
>
> Fsck can only fix known file system inconsistencies in file system
> structures. Because there is no atomicity of operations in UFS and other
> file systems, it is possible that when you remove a file, your system can
> crash between removing the directory entry and freeing the inode or
> blocks. This is expected with UFS; that's why there is fsck to verify
> that no such thing happened.
>
> In ZFS, on the other hand, there are no inconsistencies like that. If all
> blocks match their checksums and you find a directory loop or something
> like that, it is a bug in ZFS, not an expected inconsistency. It should
> be fixed in ZFS and not worked around with some fsck for ZFS.
>
> --
> Pawel Jakub Dawidek                       http://www.wheelsystems.com
> FreeBSD committer                         http://www.FreeBSD.org
> Am I Evil? Yes, I Am!                     http://yomoli.com
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
>



-- 
{1-2-3-4-5-6-7-}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] commercial zfs-based storage replication software?

2011-10-19 Thread Jim Klimov

2011-10-13 13:27, Darren J Moffat wrote:
> On 10/13/11 09:27, Fajar A. Nugraha wrote:
>> On Tue, Oct 11, 2011 at 5:26 PM, Darren J Moffat
>>  wrote:
>>> Have you looked at the time-slider functionality that is already in
>>> Solaris?
>>
>> Hi Darren. Is it available for Solaris 10? I just installed Solaris 10
>> u10 and couldn't find it.
>
> No it is not.
>
>> Is there a reference on how to get/install this functionality on
>> Solaris 10?
>
> No because it doesn't exist on Solaris 10.



Well, just for the sake of completeness: most of our systems are using
the zfs-auto-snap service, including Solaris 10 systems dating from
Sol10u6. Installation of the relevant packages from SXCE (ranging
snv_117-snv_130) was trivial, but some script patching was in order -
I think, replacing the ksh interpreter with ksh93.

I haven't used the GUI part, and I guess my experience relates to the
script-based zfs-auto-snap (before it was remade into its current binary
form, or so I read). We kind of got stuck with SXCE systems which
still "just work" fine ;)

The point is, even if unsupported (which may be a problem in the OP's
case), it is likely that one or another version of zfs-auto-snap or
TimeSlider can be made to work on Sol10 with little effort.

HTH,
//Jim


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Growing CKSUM errors with no READ/WRITE errors

2011-10-19 Thread Jim Klimov

2011-10-19 16:01, Richard Elling wrote:
> On Oct 18, 2011, at 6:35 PM, David Magda wrote:
>> If we've found one bad disk, what are our options?
>
> Live with it or replace it :-)
>  -- richard


Similar question: an HDD went awry last week in an snv_117 box
(the controller no longer sees the drive - so I guess there is either
a dead drive, or dead power/data ports on the backplane), and
a hot spare replaced it okay. However, there are a number of
CKSUM errors on the replacement disk, growing by about 100
daily (according to "zpool status"). I tried scrubbing the pool and
zeroing the counter with "zpool clear", but new CKSUM errors
keep being found. There are zero READ or WRITE error counts,
though.

Should we be worried about replacing the ex-hotspare drive
ASAP as well?

There are no errors in dmesg regarding the ex-hotspare drive,
only those regarding the dead one, occasionally:

=== dmesg:
Oct 19 16:28:23 thumper scsi: [ID 107833 kern.warning] WARNING: 
/pci@1,0/pci1022,7458@4/pci11ab,11ab@1/disk@6,0 (sd40):

Oct 19 16:28:23 thumper Command failed to complete...Device is gone
Oct 19 16:28:23 thumper scsi: [ID 107833 kern.warning] WARNING: 
/pci@1,0/pci1022,7458@4/pci11ab,11ab@1/disk@6,0 (sd40):

Oct 19 16:28:23 thumper SYNCHRONIZE CACHE command failed (5)

=== format:

30. c5t6d0 
/pci@1,0/pci1022,7458@4/pci11ab,11ab@1/disk@6,0
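
(In case it helps narrow it down - a sketch, device name taken from the
format output above - the per-device error counters and the FMA error
log sometimes show transport or firmware trouble that never surfaces as
READ/WRITE errors in zpool status:)

# iostat -En c5t6d0       # soft/hard/transport error counters for that device
# fmdump -eV | tail -40   # recent FMA error telemetry (checksum, transport, device events)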

--


Klimov Evgeniy, Jim Klimov
Technical Director / CTO
JSC "COS&HT"
+7-903-7705859 (cellular)  mailto:jimkli...@cos.ru
CC: ad...@cos.ru, jimkli...@mail.ru
()  ascii ribbon campaign - against html mail
/\                         - against microsoft attachments



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] repair [was: about btrfs and zfs]

2011-10-19 Thread Jim Klimov

2011-10-19 15:52, Richard Elling wrote:
> In the archives, you can find reports of recoverable and unrecoverable
> errors attributed to:
> ...

Ah, yes, and
11. Faulty disk cabling (e.g. plastic connectors that soften with heat
and fall off) - that has happened to cause strange behavior as well ;)
Even if the connectors don't fall off, an unreliable physical connection
(including oxidation of the metal plugs) leads to all sorts of noise on
the wire which may be misinterpreted as random bits. These can often be
fixed (and diagnosed) by pulling the connectors and plugging them back
in - the oxide film is scratched off, and the cable works again, for a
few months more...


//Jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] repair [was: about btrfs and zfs]

2011-10-19 Thread Jim Klimov

2011-10-19 15:52, Richard Elling wrote:
> In the archives, you can find reports of recoverable and unrecoverable
> errors attributed to:
> 1. ZFS software (rare, but a bug a few years ago mishandled a raidz case)
> 2. SAN switch firmware
> 3. "Hardware" RAID array firmware
> 4. Power supplies
> 5. RAM
> 6. HBA
> 7. PCI-X bus
> 8. BIOS settings
> 9. CPU and chipset errata

10. Broken HDDs ;)

For weird inexplicable bugs, insufficient or faulty power supplies
and cooling are often the core cause, at least in "enthusiast PCs".
Perhaps the PSU is okay most of the time but fails under some peak
loads, and that leads to random bits being generated in RAM or on
connection buses...

Also some interference can be caused by motors, etc. in the HDDs
and cooling fans - with older audio cards you could actually hear
your HDD or CDROM spin up - by a characteristic buzz in the
headphones or on the loudspeakers. Whether other components
would fail or not under such EMI - that depends.

//Jim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] about btrfs and zfs

2011-10-19 Thread Pawel Jakub Dawidek
On Wed, Oct 19, 2011 at 08:40:59AM +1100, Peter Jeremy wrote:
> fsck verifies the logical consistency of a filesystem.  For UFS, this
> includes: used data blocks are allocated to exactly one file,
> directory entries point to valid inodes, allocated inodes have at
> least one link, the number of links in an inode exactly matches the
> number of directory entries pointing to that inode, directories form a
> single tree without loops, file sizes are consistent with the number
> of allocated blocks, unallocated data/inodes blocks are in the
> relevant free bitmaps, redundant superblock data is consistent.  It
> can't verify data.

Well said. I'd add that people who insist on ZFS having an fsck are
missing the whole point of the ZFS transactional model and copy-on-write
design.

Fsck can only fix known file system inconsistencies in file system
structures. Because there is no atomicity of operations in UFS and other
file systems, it is possible that when you remove a file, your system can
crash between removing the directory entry and freeing the inode or
blocks. This is expected with UFS; that's why there is fsck to verify
that no such thing happened.

In ZFS, on the other hand, there are no inconsistencies like that. If all
blocks match their checksums and you find a directory loop or something
like that, it is a bug in ZFS, not an expected inconsistency. It should
be fixed in ZFS and not worked around with some fsck for ZFS.
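
(The operational equivalent, for completeness - a sketch with an assumed
pool name - is a scrub plus the status report: it verifies every
checksum and repairs from redundancy where it can, which is the job
people usually expect fsck to do:)

# zpool scrub tank
# zpool status -v tank    # progress, per-vdev error counts, and any files with unrecoverable errors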

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] about btrfs and zfs

2011-10-19 Thread Richard Elling
On Oct 18, 2011, at 6:35 PM, David Magda wrote:

> If we've found one bad disk, what are our options?

Live with it or replace it :-)
 -- richard

-- 

ZFS and performance consulting
http://www.RichardElling.com
VMworld Copenhagen, October 17-20
OpenStorage Summit, San Jose, CA, October 24-27
LISA '11, Boston, MA, December 4-9
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] repair [was: about btrfs and zfs]

2011-10-19 Thread Richard Elling
On Oct 18, 2011, at 5:21 PM, Edward Ned Harvey wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Tim Cook
>> 
>> I had and have redundant storage, it has *NEVER* automatically fixed
>> it.  You're the first person I've heard that has had it automatically
>> fix it.
> 
> That's probably just because it's normal and expected behavior to
> automatically fix it - I always have redundancy, and every cksum error I
> ever find is always automatically fixed.  I never tell anyone here because
> it's normal and expected.

Yes, and in fact the automated tests for ZFS developers intentionally
corrupt data so that the repair code can be tested. Also, the same
checksum code is used to calculate the checksum when writing and reading.

> If you have redundancy, and cksum errors, and it's not automatically fixed,
> then you should report the bug.

For modern Solaris-based implementations, each checksum mismatch that is
repaired reports the bitmap of the corrupted vs expected data. Obviously, if the
data cannot be repaired, you cannot know the expected data, so the error is 
reported without identification of the broken bits.

In the archives, you can find reports of recoverable and unrecoverable
errors attributed to:
1. ZFS software (rare, but a bug a few years ago mishandled a raidz case)
2. SAN switch firmware
3. "Hardware" RAID array firmware
4. Power supplies
5. RAM
6. HBA
7. PCI-X bus
8. BIOS settings
9. CPU and chipset errata

Personally, I've seen all of the above except #7, because PCI-X hardware is
hard to find now.

If you consistently see unrecoverable data from a system that has
protected data, then there may be an issue with a part of the system
that is a single point of failure. Very, very, very few x86 systems are
designed with no SPOF.
 -- richard

-- 

ZFS and performance consulting
http://www.RichardElling.com
VMworld Copenhagen, October 17-20
OpenStorage Summit, San Jose, CA, October 24-27
LISA '11, Boston, MA, December 4-9
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Stream versions in Solaris 10.

2011-10-19 Thread Ian Collins
I just tried sending from an oi151a system to a Solaris 10 backup
server, and the server barfed with


zfs_receive: stream is unsupported version 17

I can't find any documentation linking stream versions to releases, so
does anyone know the Update 10 stream version?


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss