Re: [zfs-discuss] non-ECC Systems and ZFS for home users (was: Please warn a home user against OpenSolaris under VirtualBox under WinXP ; ))

2010-09-23 Thread R.G. Keen
> On 2010-Sep-24 00:58:47 +0800, "R.G. Keen"
>  wrote:
> > But for me, the likelihood of
> >making a setup or operating mistake in a virtual machine 
> >setup server far outweighs the hardware cost to put
> >another physical machine on the ground. 
> 
> The downsides are generally that it'll be slower and less power-
> efficient than a current-generation server 
My comment was about a physical machine versus virtual machine, 
and my likelihood of futzing up the setup, not new machine versus 
old machine. There are many upsides and downsides on the new 
versus old questions too. 

>and the I/O interfaces will
> also be last generation (so you are more likely to be stuck with
> parallel SCSI and PCI or PCIx rather than SAS/SATA and PCIe).  And
> when something fails (fan, PSU, ...), it's more likely to be customised
> in some way that makes it more difficult/expensive to repair/replace.
Presuming what you did was buy a last generation server after you 
decided to go for a physical machine. That's not what I did, as I mentioned
later in the posting. Server hardware in general is more expensive than
desktop, and even last generation server hardware will cost more
to repair than desktop. To a hardware manufacturer, "server" is 
synonymous with "these guys can be convinced to pay more if we make
something a little different". And there is a cottage industry of people 
who sell repair parts for older servers at exorbitant prices because there
is some non-techie businessman who will pay.

> Not quite.  When Intel moved the memory controllers from the 
> northbridge into the CPU, they made a conscious  decision to separate
> server and desktop CPUs and chipsets.  The desktop CPUs do not support
> ECC whereas the server ones do 
So, from the lower-cost new hardware view, newer Intel chipsets emphatically
do not support ECC. The (expensive) server-class hardware/chipsets, etc., do. 
A lower-end home class server is unlikely to be built from these much more
expensive - by plan/design - parts. 

> That said, the low-end Xeons aren't outrageously expensive 
They aren't. I considered using Xeons in my servers, but it would have added
about another $200 in the end. I bought disks with the $200 instead. 

>and you
> generally wind up with support for registered RAM; registered ECC
> RAM is often easier to find than unregistered ECC RAM.
I had no difficulty at all finding unregistered ECC RAM. 
Newegg has steady stock of DDR2 and DDR3 unregistered and registered
ECC. For instance: 
2GB 240-Pin DDR3 ECC Registered KVR1333D3D8R9S/2GHT is $59.99.
2GB 240-Pin DDR3 1333 ECC Unbuffered Server Memory Model KVR1333D3E9S/2G
is $43.99 plus $2.00 shipping. 
2GB 240-pin DDR3 NON-ECC is available for $35 per stick and up. The Kingston
brand I used in the ECC examples is $40.
These are representative, and there are multiple choices, in stock, for all three
categories. "Intel certified" costs more if you get registered.

> > AMDs do, in general.
> AMD chose to leave ECC support in almost all their higher-end memory
> controllers, rather than use it as a market differentiator. AFAIK,
> all non-mobile Athlon, Phenom and Opteron CPUs support ECC, whereas
> the lower-end Sempron, Neo, Turion and Geode CPUs don't.  
I guess I should have looked at the lower-end CPUs - and chipsets - before
I made my "in general" claim. I didn't, but every chipset I did look at had ECC
support. My lowest-end CPU was the Athlon II X2 240e, and every chipset
I found for that and above supports ECC. 

> In the case of AMD motherboards, it's really just laziness on the
> manufacturer's part to not bother routing the additional tracks.
And implementing the support in the BIOS. I did research these issues a fair amount.
For the same chipset, ASUS MBs seem to have BIOS settings for ECC while
Gigabyte boards, for instance, do not. I determined this by downloading the 
user manuals for the mobos and reading them. I didn't find a brand other
than ASUS that had clear support for ECC in the BIOS. 

But my search was not exhaustive. 

> On older Intel motherboards, it was a chipset issue rather than a
> CPU issue (and even if the chipset supported ECC, the motherboard
> manufacturer might have decided to not bother running
> the ECC tracks).
I think that's generically true.

> Asus appears to have made a conscious decision to support ECC on
> all AMD motherboards whereas other vendors support it sporadically
> and determining whether a particular motherboard supports ECC can
> be quite difficult since it's never one of the options in their
> motherboard selection tools.
Yep. I resorted to selecting mobos that I'd otherwise want, then 
downloaded the user manuals and read the BIOS configuration 
pages. If a manual didn't specifically say how to configure ECC, then it 
MIGHT be supported somehow, but also might not. Gigabyte boards,
for instance, were reputed to support ECC in an undocumented way, 
but I figured that if they didn't want to say how to configure it, they sure
wouldn't have worried about testing whether it worked.

Re: [zfs-discuss] Kernel panic on ZFS import - how do I recover?

2010-09-23 Thread David Blasingame Oracle
Have you tried setting zfs_recover & aok in /etc/system or setting them 
with mdb?


Read how to set via /etc/system
http://opensolaris.org/jive/thread.jspa?threadID=114906

mdb debugger
http://www.listware.net/201009/opensolaris-zfs/46706-re-zfs-discuss-how-to-set-zfszfsrecover1-and-aok1-in-grub-at-startup.html

After you get the variables set and the system booted, try importing, then 
running a scrub.
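
For reference, the settings discussed in those threads usually look like the
following (a sketch for the build-134 era; please double-check against the
links above before relying on it):

  In /etc/system (takes effect at the next boot):
      set aok=1
      set zfs:zfs_recover=1

  Or on the live system with mdb:
      echo "aok/W 1" | mdb -kw
      echo "zfs_recover/W 1" | mdb -kw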


Dave

On 09/23/10 19:48, Scott Meilicke wrote:
I posted this on the www.nexentastor.org forums, but no answer so far, so I apologize if you are seeing this twice. I am also engaged with nexenta support, but was hoping to get some additional insights here. 


I am running nexenta 3.0.3 community edition, based on 134. The box crashed 
yesterday, and goes into a reboot loop (kernel panic) when trying to import my 
data pool, screenshot attached. What I have tried thus far:

Boot off of DVD, both 3.0.3 and 3.0.4 beta 8. 'zpool import -f data01' causes 
the panic in both cases.
Boot off of 3.0.4 beta 8, ran zpool import -fF data01
That gives me a message like "Pool data01 returned to its state as of ...", and 
then panics.

The import -fF does seem to import the pool, but then it immediately panics. So after booting off of the DVD, I can boot from my hard disks, but the system will not import the pool because it was last imported from another system. 


I have moved /etc/zfs/zfs.cache out of the way, but no luck after a reboot and 
import.

zpool import shows all of my disks are OK, and the pool itself is online.

Is it time to start working with zdb? Any suggestions?

This box is hosting development VMs, so I have some people idling their thumbs 
at the moment.

Thanks everyone,

-Scott
  



--


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Dedup relationship between pool and filesystem

2010-09-23 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Peter Taps
> 
> The dedup property is set on a filesystem, not on the pool.
> 
> However, the dedup ratio is reported on the pool and not on the
> filesystem.

As with most other ZFS concepts, the core functionality of ZFS is
implemented in zpool.  Hence, zpool is up to what ... version 25 or so now?
Think of ZFS (the posix filesystem) as just an interface which tightly
integrates the zpool features.  ZFS is only up to what, version 4 now?

Perfect example:  

If you create a zvol in Linux, without formatting it ZFS, and format it
ext3/4, then you can snapshot it, and I believe you can even "zfs send" and
receive it.  And so on.  The core functionality is mostly present.  But if you
want to access the snapshot, you have to create some mountpoint, and mount
the snapshot zvol read-only on that mountpoint.  It's not automatic.  It's
barely any better than the crappy "snapshot" concept Linux has in LVM.  If
you want good automatic snapshot creation, seamless mounting, and automatic
mounting, then you need the ZFS filesystem on top of the zpool.  Cuz the ZFS
filesystem knows about that underlying zpool feature, and makes it a
convenient and easy experience.  ;-)
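
To make that concrete, a rough sketch of the manual workflow described above
(names are placeholders; device paths vary by platform, and an ext3-formatted
zvol or snapshot naturally has to be mounted from a host that understands ext3):

    zfs create -V 20G tank/vm1             # zvol, formatted ext3 by the client
    zfs snapshot tank/vm1@before-upgrade   # snapshots still work
    # ...but to read the snapshot you must locate its read-only block device
    # (e.g. under /dev/zvol/dsk/ on Solaris), hand it to the client, and then:
    mkdir /mnt/vm1-before
    mount -o ro /dev/<snapshot-device> /mnt/vm1-before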

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver of older root pool disk

2010-09-23 Thread Richard Elling
Timing is everything.  Lori's is the authoritative answer, and it makes sense
given the limitations at boot.  Thanks Lori! :-)
 -- richard

-- 
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com

Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver of older root pool disk

2010-09-23 Thread Richard Elling
On Sep 23, 2010, at 3:40 PM, Frank Middleton wrote:

> Bumping this because no one responded. Could this be because
> it's such a stupid question no one wants to stoop to answering it,
> or because no one knows the answer? Trying to picture, say, what
> could happen in /var (say /var/adm/messages), let alone a swap
> zvol, is giving me a headache...

The metadata contains the latest transaction group number, so it is
easy to detect which side of the mirror wins.

That said, I have not tested this for boot, which is a little bit more 
complicated because of the mini-ZFS version in grub -- and grub's
menu.lst is in the root pool.  For non-root pools, ZFS does the right 
thing.
 -- richard

> On 07/09/10 17:00, Frank Middleton wrote:
>> This is a hypothetical question that could actually happen:
>> 
>> Suppose a root pool is a mirror of c0t0d0s0 and c0t1d0s0
>> and for some reason c0t0d0s0 goes off line, but comes back
>> on line after a shutdown. The primary boot disk would then
>> be c0t0d0s0 which would have much older data than c0t1d0s0.
>> 
>> Under normal circumstances ZFS would know that c0t0d0s0
>> needs to be resilvered. But in this case c0t0d0s0 is the boot
>> disk. Would ZFS still be able to correctly resilver the correct
>> disk under these circumstances? I suppose it might depend
>> on which files, if any, had actually changed...
>> 
>> Thanks -- Frank
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com

Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver of older root pool disk

2010-09-23 Thread Lori Alt

 On 09/23/10 04:40 PM, Frank Middleton wrote:

Bumping this because no one responded. Could this be because
it's such a stupid question no one wants to stoop to answering it,
or because no one knows the answer? Trying to picture, say, what
could happen in /var (say /var/adm/messages), let alone a swap
zvol, is giving me a headache...

On 07/09/10 17:00, Frank Middleton wrote:

This is a hypothetical question that could actually happen:

Suppose a root pool is a mirror of c0t0d0s0 and c0t1d0s0
and for some reason c0t0d0s0 goes off line, but comes back
on line after a shutdown. The primary boot disk would then
be c0t0d0s0 which would have much older data than c0t1d0s0.

Under normal circumstances ZFS would know that c0t0d0s0
needs to be resilvered. But in this case c0t0d0s0 is the boot
disk. Would ZFS still be able to correctly resilver the correct
disk under these circumstances? I suppose it might depend
on which files, if any, had actually changed...


Booting from the out-of-date disk will fail with the message:

"The boot device is out of date. Please try booting from '%s'"

where a different device is suggested.

The reason for this is that once the kernel has gotten far enough to 
import the root pool and read all the labels (and thereby determine that 
the device being booted does not have the latest contents), it's too 
late to switch over to a different disk for booting.  So the next best 
action is to fail and let the user explicitly boot from another disk in 
the pool (using whatever means the BIOS provides for this, or by 
selecting a different disk using the OBP interface on sparc platforms).  
Once the system is up, the out-of-date disk will be resilvered.


Ideally, the boot loader should examine all the disks in the root pool 
and boot off one with the most recent contents.  I believe that there is 
a bug filed requesting that this be implemented for sparc platforms.  
It's more difficult with x86 platforms (there are issues with accessing 
more than one disk from within the current boot loader), and I don't 
know what the prospects are for implementing that logic. But in any 
case, the boot logic prevents you from unwittingly booting off an 
out-of-date disk when a more up-to-date disk is online.
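
Once the system is up on the good disk, the resilver can be watched and, if
needed, kicked off by hand (device names follow the example above):

    zpool status rpool              # should show the stale disk resilvering
    zpool online rpool c0t0d0s0     # only if it was left offline/faulted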


Lori





Thanks -- Frank

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intermittent ZFS hang

2010-09-23 Thread Richard Elling
Hi Charles,
There are quite a few bugs in b134 that can lead to this. Alas, due to the new
regime, there was a period of time where the distributions were not being
delivered. If I were in your shoes, I would upgrade to OpenIndiana b147 which
has 26 weeks of maturity and bug fixes over b134.

http://www.openindiana.org
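
In the meantime, one low-overhead way to see where the writers are stuck during
a hang is to dump the kernel thread stacks for the ZFS module (a sketch; dcmd
details vary a bit by build):

    echo "::stacks -m zfs" | mdb -k     # live system
    echo "::spa -v" | mdb -k            # pool/vdev state as the kernel sees it

The same dcmds can be run against the crash dumps you already collected with
"mdb unix.N vmcore.N".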
 -- richard



On Sep 23, 2010, at 2:48 PM, Charles J. Knipe wrote:

> So, I'm still having problems with intermittent hangs on write with my ZFS 
> pool.  Details from my original post are below.  Since posting that, I've 
> gone back and forth with a number of you, and gotten a lot of useful advice, 
> but I'm still trying to get to the root of the problem so I can correct it.  
> Since the original post I have:
> 
> -Gathered a great deal of information in the form of kernel thread dumps, 
> zio_state dumps, and live crash dumps while the problem is happening.
> -Been advised that my ruling out of dedupe was probably premature, as I still 
> likely have a good deal of deduplicated data on-disk.
> -Checked just about every log and counter that might indicate a hardware 
> error, without finding one.
> 
> I was wondering at this point if someone could give me some pointers on the 
> following:
> 1. Given the dumps and diagnostic data I've gathered so far, is there a way I 
> can determine for certain where in the ZFS driver I'm spending so much time 
> hanging?  At the very least I'd like to try to determine whether it is, 
> in-fact a deduplication issue.
> 2. If it is, in fact, a deduplication issue, would my only recourse be a new 
> pool and a send/receive operation?  The data we're storing is VMFS volumes 
> for ESX.  We're tossing around the idea of creating new volumes in the same 
> pool (now that dedupe is off) and migrating VMs over in small batches.  The 
> theory is that we would be writing non-deduped data this way, and when we 
> were done we could remove the deduplicated volumes.  Is this sound?
> 
> Thanks again for all the help!
> 
> -Charles
> 
>> Howdy,
>> 
>> We're having a ZFS performance issue over here that I
>> was hoping you guys could help me troubleshoot.  We
>> have a ZFS pool made up of 24 disks, arranged into 7
>> raid-z devices of 4 disks each.  We're using it as an
>> iSCSI back-end for VMWare and some Oracle RAC
>> clusters.
>> 
>> Under normal circumstances performance is very good
>> both in benchmarks and under real-world use.  Every
>> couple days, however, I/O seems to hang for anywhere
>> between several seconds and several minutes.  The
>> hang seems to be a complete stop of all write I/O.
>> The following zpool iostat illustrates:
>> 
>> pool0   2.47T  5.13T    120      0   293K      0
>> pool0   2.47T  5.13T    127      0   308K      0
>> pool0   2.47T  5.13T    131      0   322K      0
>> pool0   2.47T  5.13T    144      0   347K      0
>> pool0   2.47T  5.13T    135      0   331K      0
>> pool0   2.47T  5.13T    122      0   295K      0
>> pool0   2.47T  5.13T    135      0   330K      0
>> 
>> While this is going on our VMs all hang, as do any
>> "zfs create" commands or attempts to touch/create
>> files in the zfs pool from the local system.  After
>> several minutes the system "un-hangs" and we see very
>> high write rates before things return to normal
>> across the board.
>> 
>> Some more information about our configuration:  We're
>> running OpenSolaris svn-134.  ZFS is at version 22.
>> Our disks are 15kRPM 300gb Seagate Cheetahs, mounted
>> in Promise J610S Dual enclosures, hanging off a Dell
>> SAS 5/e controller.  We'd tried out most of this
>> configuration previously on OpenSolaris 2009.06
>> without running into this problem.  The only thing
>> that's new, aside from the newer OpenSolaris/ZFS is
>> a set of four SSDs configured as log disks.
>> 
>> At first we blamed de-dupe, but we've disabled that.
>> Next we suspected the SSD log disks, but we've seen
>> the problem with those removed, as well.
>> 
>> Has anyone seen anything like this before?  Are there
>> any tools we can use to gather information during the
>> hang which might be useful in determining what's
>> going wrong?
>> 
>> Thanks for any insights you may have.
>> 
>> -Charles
> -- 
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
ZFS and performance consulting
http://www.RichardElling.com












___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Dedup relationship between pool and filesystem

2010-09-23 Thread Scott Meilicke
Hi Peter,

dedupe is pool wide. File systems can opt in or out of dedupe. So if multiple 
file systems are set to dedupe, then they all benefit from using the same pool 
of deduped blocks. In this way, if two files share some of the same blocks, 
even if they are in different file systems, they will dedupe.

I am not sure why reporting is not done at the file system level. It may be an 
accounting issue, i.e. which file system owns the dedupe blocks. But it seems 
some fair estimate could be made. Maybe the overhead to keep a file system 
updated with these stats is too high?
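
For what it's worth, you can see both views from the command line (pool and
dataset names here are just placeholders):

zpool get dedupratio tank    # pool-wide ratio, also the DEDUP column of 'zpool list'
zfs get dedup tank/myfs      # the per-dataset on/off switch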

-Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] non-ECC Systems and ZFS for home users (was: Please warn a home user against OpenSolaris under VirtualBox under WinXP ; ))

2010-09-23 Thread Peter Jeremy
On 2010-Sep-24 00:58:47 +0800, "R.G. Keen"  wrote:
>That may not be the best of all possible things to do
>on a number of levels. But for me, the likelihood of 
>making a setup or operating mistake in a virtual machine 
>setup server far outweighs the hardware cost to put
>another physical machine on the ground. 

The downsides are generally that it'll be slower and less power-
efficient than a current-generation server, and the I/O interfaces will
also be last generation (so you are more likely to be stuck with
parallel SCSI and PCI or PCIx rather than SAS/SATA and PCIe).  And
when something fails (fan, PSU, ...), it's more likely to be customised
in some way that makes it more difficult/expensive to repair/replace.

>In fact, the issue goes further. Processor chipsets from both
>Intel and AMD used to support ECC on an ad-hoc basis. It may
>have been there, but may or may not have been supported
>by the motherboard. Intel's recent chipsets emphatically do 
>not support ECC.

Not quite.  When Intel moved the memory controllers from the
northbridge into the CPU, they made a conscious decision to separate
server and desktop CPUs and chipsets.  The desktop CPUs do not support
ECC whereas the server ones do - this way they can continue to charge
a premium for "server-grade" parts and prevent the server
manufacturers from using lower-margin desktop parts.  This means that
if you want an Intel-based solution, you need to look at a Xeon CPU.
That said, the low-end Xeons aren't outrageously expensive, and you
generally wind up with support for registered RAM; registered ECC
RAM is often easier to find than unregistered ECC RAM.

> AMDs do, in general.

AMD chose to leave ECC support in almost all their higher-end memory
controllers, rather than use it as a market differentiator.  AFAIK,
all non-mobile Athlon, Phenom and Opteron CPUs support ECC, whereas
the lower-end Sempron, Neo, Turion and Geode CPUs don't.  Note that
Athlon and Phenom CPUs normally need unbuffered RAM whereas Opteron
CPUs normally want buffered/registered RAM.

> However, the motherboard
>must still support the ECC reporting in hardware and BIOS for
>ECC to actually work, and you have to buy the ECC memory. 

In the case of AMD motherboards, it's really just laziness on the
manufacturer's part to not bother routing the additional tracks.

>The newer the Intel motherboard, the less likely and more
>expensive ECC is. Older Intel motherboards sometimes
>did support ECC, as a side note. 

On older Intel motherboards, it was a chipset issue rather than a
CPU issue (and even if the chipset supported ECC, the motherboard
manufacturer might have decided to not bother running the ECC tracks).

>There's about sixteen more pages of typing to cover the issue 
>even modestly correctly. The bottom line is this: for 
>current-generation hardware, buy an AMD AM3 socket CPU,
>ASUS motherboard, and ECC memory. DDR2 and DDR3 ECC
>memory is only moderately more expensive than non-ECC.

Asus appears to have made a conscious decision to support ECC on
all AMD motherboards whereas other vendors support it sporadically
and determining whether a particular motherboard supports ECC can
be quite difficult since it's never one of the options in their
motherboard selection tools.

And when picking the RAM, make sure it's compatible with your
motherboard - motherboards are virtually never compatible with
both unbuffered and buffered RAM.

>hardware going into wearout. I also bought new, high quality
>power supplies for $40-$60 per machine because the power
>supply is a single point of failure, and wears out - that's a 
>fact that many people ignore until the machine doesn't come
>up one day.

"Doesn't come up one day" is at least a clear failure.  With a
cheap (or under-dimensioned) PSU, things are more likely to go
out of tolerance under heavy load so you wind up with unrepeatable
strange glitches.

>Think about what happens if you find a silent bit corruption in 
>a file system that includes encrypted files. 

Or compressed files.

-- 
Peter Jeremy


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Dedup relationship between pool and filesystem

2010-09-23 Thread zfs user

I believe it goes a something like this -

ZFS filesystems with dedupe turned on can be thought of as hippie/socialist 
filesystems, wanting to "share", etc.  Filesystems with dedupe turned off are 
a grey Randian landscape where sharing blocks between files is seen as a 
weakness/defect. They all live together in a zpool, let's call it "San 
Francisco"...


The hippies store their shared blocks together in a communal store at the pool 
level and everything works pretty well until one of the hippie filesystems 
wants to pull a large number of their blocks out of the communal store; then 
all hell breaks loose and the grey Randians laugh at the hippies and their 
chaos but it is a joyless laughter.


That is the technical explanation, someone else may have a better explanation 
in layman's terms.


On 9/23/10 3:36 PM, Peter Taps wrote:

Folks,

I am a bit confused on the dedup relationship between the filesystem and its 
pool.

The dedup property is set on a filesystem, not on the pool.

However, the dedup ratio is reported on the pool and not on the filesystem.

Why is it this way?

Thank you in advance for your help.

Regards,
Peter

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Dedup relationship between pool and filesystem

2010-09-23 Thread Darren J Moffat

On 09/23/10 15:36, Peter Taps wrote:

I am a bit confused on the dedup relationship between the filesystem and its 
pool.

The dedup property is set on a filesystem, not on the pool.


Dedup is a pool-wide concept; blocks from multiple filesystems
may be deduplicated.


However, the dedup ratio is reported on the pool and not on the filesystem.


The dedup property is on the dataset (filesystem | ZVOL) so that
you can opt in/out on a per dataset basis.  For example if you have
one or two datasets you know will never have duplicate data then don't
enable dedup on those.  For example:

zpool create tank 

zfs set dedup=on tank
zfs create tank/1
zfs create tank/1/1
zfs create tank/2
zfs create -o dedup=off tank/2/2
zfs create tank/2/2/3

In this case all datasets in the pool will participate in deduplication
with the exception of tank/2/2 and its descendants.
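
Since the ratio lives at the pool level, that is also where you inspect it,
e.g. (output details vary by build):

zpool list tank      # the DEDUP column is the pool-wide ratio
zdb -DD tank         # histogram of the dedup table (DDT)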

--
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver of older root pool disk

2010-09-23 Thread Frank Middleton

Bumping this because no one responded. Could this be because
it's such a stupid question no one wants to stoop to answering it,
or because no one knows the answer? Trying to picture, say, what
could happen in /var (say /var/adm/messages), let alone a swap
zvol, is giving me a headache...

On 07/09/10 17:00, Frank Middleton wrote:

This is a hypothetical question that could actually happen:

Suppose a root pool is a mirror of c0t0d0s0 and c0t1d0s0
and for some reason c0t0d0s0 goes off line, but comes back
on line after a shutdown. The primary boot disk would then
be c0t0d0s0 which would have much older data than c0t1d0s0.

Under normal circumstances ZFS would know that c0t0d0s0
needs to be resilvered. But in this case c0t0d0s0 is the boot
disk. Would ZFS still be able to correctly resilver the correct
disk under these circumstances? I suppose it might depend
on which files, if any, had actually changed...

Thanks -- Frank

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Dedup relationship between pool and filesystem

2010-09-23 Thread Peter Taps
Folks,

I am a bit confused on the dedup relationship between the filesystem and its 
pool.

The dedup property is set on a filesystem, not on the pool.

However, the dedup ratio is reported on the pool and not on the filesystem.

Why is it this way?

Thank you in advance for your help.

Regards,
Peter
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sliced iSCSI device for doing RAIDZ?

2010-09-23 Thread Alexander Skwar
Hi!

2010/9/23 Gary Mills 
>
> On Tue, Sep 21, 2010 at 05:48:09PM +0200, Alexander Skwar wrote:
> >
> > We're using ZFS via iSCSI on a S10U8 system. As the ZFS Best
> > Practices Guide http://j.mp/zfs-bp states, it's advisable to use
> > redundancy (ie. RAIDZ, mirroring or whatnot), even if the underlying
> > storage does its own RAID thing.
> >
> > Now, our storage does RaID and the storage people say, it is
> > impossible to have it export iSCSI devices which have no redundancy/
> > RAID.
>
> If you have a reliable Iscsi SAN and a reliable storage device, you
> don't need the additional redundancy provided by ZFS.

Okay. This contradicts the ZFS Best Practices Guide, which states:

# For production environments, configure ZFS so that
# it can repair data inconsistencies. Use ZFS redundancy,
# such as RAIDZ, RAIDZ-2, RAIDZ-3, mirror, or copies > 1,
# regardless of the RAID level implemented on the
# underlying storage device. With such redundancy, faults in the
# underlying storage device or its connections to the host can
# be discovered and repaired by ZFS.

>
> > Actually, were would there be a difference? I mean, those iSCSI
> > devices anyway don't represent real disks/spindles, but it's just
> > some sort of abstractation. So, if they'd give me 3x400 GB compared
> > to 1200 GB in one huge lump like they do now, it could be, that
> > those would use the same spots on the real hard drives.
>
> Suppose they gave you two huge lumps of storage from the SAN, and you
> mirrored them with ZFS.  What would you do if ZFS reported that one of
> its two disks had failed and needed to be replaced?  You can't do disk
> management with ZFS in this situation anyway because those aren't real
> disks.  Disk management all has to be done on the SAN storage device.

Yes. I was rather thinking about RAIDZ instead of mirroring.

Anyway. Without redundancy, ZFS cannot do recovery, can
it? As far as I understand, it could detect block level corruption,
even if there's no redundancy. But it could not correct such a
corruption.

Or is that a wrong understanding?

If I got the gist of what you wrote, it boils down to how reliable
the SAN is? But SANs could also have "block level" corruption,
no? I'm a bit confused because of the (perceived?) contradiction
to the Best Practices Guide… :)
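
One middle ground I'm looking at is the "copies > 1" option from the quoted
guide, which can be set even on a single big LUN (the dataset name is just an
example):

    zfs set copies=2 tank/data

As I understand it, that keeps two copies of each data block, so ZFS could
repair some block-level corruption without RAIDZ or mirroring, though it costs
capacity and doesn't help if the whole LUN goes away.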

Best regards,

Alexander
--
↯    Lifestream (Twitter, Blog, …) ↣ http://alexs77.soup.io/     ↯
↯ Chat (Jabber/Google Talk) ↣ a.sk...@gmail.com , AIM: alexws77  ↯
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intermittent ZFS hang

2010-09-23 Thread Charles J. Knipe
So, I'm still having problems with intermittent hangs on write with my ZFS 
pool.  Details from my original post are below.  Since posting that, I've gone 
back and forth with a number of you, and gotten a lot of useful advice, but I'm 
still trying to get to the root of the problem so I can correct it.  Since the 
original post I have:

-Gathered a great deal of information in the form of kernel thread dumps, 
zio_state dumps, and live crash dumps while the problem is happening.
-Been advised that my ruling out of dedupe was probably premature, as I still 
likely have a good deal of deduplicated data on-disk.
-Checked just about every log and counter that might indicate a hardware error, 
without finding one.

I was wondering at this point if someone could give me some pointers on the 
following:
1. Given the dumps and diagnostic data I've gathered so far, is there a way I 
can determine for certain where in the ZFS driver I'm spending so much time 
hanging?  At the very least I'd like to try to determine whether it is, in-fact 
a deduplication issue.
2. If it is, in fact, a deduplication issue, would my only recourse be a new 
pool and a send/receive operation?  The data we're storing is VMFS volumes for 
ESX.  We're tossing around the idea of creating new volumes in the same pool 
(now that dedupe is off) and migrating VMs over in small batches.  The theory 
is that we would be writing non-deduped data this way, and when we were done we 
could remove the deduplicated volumes.  Is this sound?

Thanks again for all the help!

-Charles

> Howdy,
> 
> We're having a ZFS performance issue over here that I
> was hoping you guys could help me troubleshoot.  We
> have a ZFS pool made up of 24 disks, arranged into 7
> raid-z devices of 4 disks each.  We're using it as an
> iSCSI back-end for VMWare and some Oracle RAC
> clusters.
> 
> Under normal circumstances performance is very good
> both in benchmarks and under real-world use.  Every
> couple days, however, I/O seems to hang for anywhere
> between several seconds and several minutes.  The
> hang seems to be a complete stop of all write I/O.
>  The following zpool iostat illustrates:
> 
> pool0   2.47T  5.13T    120      0   293K      0
> pool0   2.47T  5.13T    127      0   308K      0
> pool0   2.47T  5.13T    131      0   322K      0
> pool0   2.47T  5.13T    144      0   347K      0
> pool0   2.47T  5.13T    135      0   331K      0
> pool0   2.47T  5.13T    122      0   295K      0
> pool0   2.47T  5.13T    135      0   330K      0
> 
> While this is going on our VMs all hang, as do any
> "zfs create" commands or attempts to touch/create
> files in the zfs pool from the local system.  After
> several minutes the system "un-hangs" and we see very
> high write rates before things return to normal
> across the board.
> 
> Some more information about our configuration:  We're
> running OpenSolaris svn-134.  ZFS is at version 22.
> Our disks are 15kRPM 300gb Seagate Cheetahs, mounted
> in Promise J610S Dual enclosures, hanging off a Dell
> SAS 5/e controller.  We'd tried out most of this
> configuration previously on OpenSolaris 2009.06
> without running into this problem.  The only thing
> that's new, aside from the newer OpenSolaris/ZFS is
>  a set of four SSDs configured as log disks.
> 
> At first we blamed de-dupe, but we've disabled that.
> Next we suspected the SSD log disks, but we've seen
>  the problem with those removed, as well.
> 
> Has anyone seen anything like this before?  Are there
> any tools we can use to gather information during the
> hang which might be useful in determining what's
> going wrong?
> 
> Thanks for any insights you may have.
> 
> -Charles
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] non-ECC Systems and ZFS for home users (was: Please warn a home user against OpenSolaris under VirtualBox under WinXP ; ))

2010-09-23 Thread Mike.


On 9/23/2010 at 12:38 PM Erik Trimble wrote:

| [snip]
|If you don't really care about ultra-low-power, then there's
absolutely 
|no excuse not to buy a USED server-class machine which is 1- or 2- 
|generations back.  They're dirt cheap, readily available, 
| [snip]



Anyone have a link or two to a place where I can buy some dirt-cheap,
readily available last gen servers?



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Pools inside pools

2010-09-23 Thread Nicolas Williams
On Thu, Sep 23, 2010 at 06:58:29AM +, Markus Kovero wrote:
> > What is an example of where a checksummed outside pool would not be able 
> > to protect a non-checksummed inside pool?  Would an intermittent 
> > RAM/motherboard/CPU failure that only corrupted the inner pool's block 
> > before it was passed to the outer pool (and did not corrupt the outer 
> > pool's block) be a valid example?
> 
> > If checksums are desirable in this scenario, then redundancy would also 
> > be needed to recover from checksum failures.
> 
> That is excellent point also, what is the point for checksumming if
> you cannot recover from it? At this kind of configuration one would
> benefit performance-wise not having to calculate checksums again.

The benefit of checksumming in the "inner tunnel", as it were (the inner
pool), is to provide one more layer of protection relative to iSCSI.
But without redundancy in the inner pool you cannot recover from
failures, as you point out.  And you must have checksumming in the outer
pool, so that it can be scrubbed.

It's tempting to say that the inner pool should not checksum at all, and
that iSCSI and IPsec should be configured correctly to provide
sufficient protection to the inner pool.  Another possibility is to have
a remote ZFS protocol of sorts, but then you begin to wonder if
something like Lustre (married to ZFS) isn't better.

> Checksums in outer pools effectively protect from disk issues, if
> hardware fails so data is corrupted isn't outer pools redundancy going
> to handle it for inner pool also.

Yes.

Nico
-- 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] non-ECC Systems and ZFS for home users (was: Please warn a home user against OpenSolaris under VirtualBox under WinXP ; ))

2010-09-23 Thread Erik Trimble
 [I'm deleting the whole thread, since this is a rehash of several 
discussions on this list previously - check out the archives, and search 
for "ECC RAM"]



These days, for a "home" server, you really have only one choice to make:

"How much power do I care that this thing uses?"



If you are sensitive to the power (and possibly cooling) budget that 
your home server might use, then there are a myriad of compromises 
you're going to have to make - and lack of ECC support is almost 
certainly going to be the first one.  Very, very, very few low-power 
(i.e. under 25W) CPUs support ECC.  A couple of the very-low-voltage EE 
Opterons, and some of the laptop-series Core2 chips are about the best 
hope you have of getting a CPU which is both low-power and supports ECC.



If you don't really care about ultra-low-power, then there's absolutely 
no excuse not to buy a USED server-class machine which is 1- or 2- 
generations back.  They're dirt cheap, readily available, and support 
all those nice features you'll have problems replicating in trying to do 
a build-it-yourself current-gen box.



For instance, an IBM x3500 tower machine, with dual-core Xeon 
5100-series CPUs, on-board ILOM/BMC, redundant power supply, ECC RAM 
support, and 8 hot-swap SAS/SATA 3.5" bays (and the nice SAS/SATA 
controller supported by Solaris) is about $500, minus the drives.  The 
Sun Ultra 40 is similar.  The ultra-cheapo Dell 400SC works fine, too.



And, frankly, buying a used brand-name server machine will almost 
certainly give you a big advantage over building-it-yourself in one 
crucial (and generally overlooked) area:  the power supply.   These 
machines have significantly better power supplies (and, most have 
redundant ones) than what you can buy for a PC.  Indeed, figure you need 
to spend at least $100 on the PS alone if you build it yourself.



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Growing a root ZFS mirror on b134?

2010-09-23 Thread Cindy Swearingen
> On 23/09/2010 11:06 PM, casper@sun.com wrote:
> >
> >> Ok, that doesn't seem to have worked so well ...
> >>
> >> I took one of the drives offline, rebooted and it
> just hangs at the
> >> splash screen after prompting for which BE to boot
> into.
> >> It gets to
> >> hostname: blah
> >> and just sits there.
> >
> >
> > When you say "offline", did you:
> >
> > - remove the drive physically?
> > - or did you zfs detach it?
> > - or both?
> 
> zpool offline rpool 
> 
> It's plugged back in now (I'm trying all sorts of
> things!)
> 
> 
> > In order to remove half of the mirror I suggest
> that you:
> >
> >
> > split the mirror (if your ZFS is recent enough;
> seems to be
> >  supported since 131)
> > [ make sure you remove /etc/zfs/zpool.cache
> from the
> >split half of the mirror. ]
> > or
> > detach
> >
> >
> > only then remove the disk.
> >
> > Depending on the hardware it may try to find the
> missing disk and this
> > may take some time.
> 
> "some time" being a minute, an hour?  How long should
> I wait before giving in and trying something else?
> 
> >
> > You can boot with the debugger and/or -v to find
> out what is going on.
> 
> How is this done on a PC?  On SPARC I'd just have
> said 'boot -s" or whatever the arguments are for that
> these days, but x86 PC's?
> 
> Thanks Casper (I remember you from the 1990's and
> your early Solaris 2.x FAQ!)

Hi--

I would re-connect the original disk and re-attach it to your rpool. After it
resilvers, re-apply the boot blocks.

Do you really need such a large root pool (2 TBs)?  You might consider creating
a separate data pool with the 2 TB disks and move some of your non-OS root pool
data over to the data pool to reduce your root pool space consumption.

If you still want to increase your root pool by using the 2 TB disks, start
over with the replacement process. In general, you would replace a smaller
root pool disk by using one of the following options:

1. Attach the larger disk, let it resilver, apply the bootblocks, confirm that
you can boot from it. Then, detach the smaller disk.

2. Replace the smaller disk with the larger disk, one disk at a time, by using
the zpool replace command. This process is a bit riskier if anything is wrong
with the replacement disk.

Do you have enough slots to connect all 4 disks at once (the two smaller ones
and the two larger ones)? If so, I would recommend a combination of the above
options because your root pool is already mirrored. Something like this:

disk1 = 750 GB
disk2 = 750 GB
disk3 = 2 TB
disk4 = 2 TB

# zpool replace rpool disk2 disk3
/* Use zpool status to check resilvering status
/* Apply bootblocks to disk3
/* Test booting from disk3
/* If disk3 boot succeeds, continue:
# zpool attach rpool disk3 disk4
/* Use zpool status to check resilvering status
/* Apply bootblocks to disk4
/* Test booting from disk4
# zpool detach rpool disk1
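
For the "apply bootblocks" steps, the usual commands are as follows (device
names are placeholders; point them at the disk you just attached or replaced):

/* On x86:
# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t3d0s0
/* On SPARC:
# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t3d0s0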

You might review some of the steps here:

http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide

Replacing/Relabeling the Root Pool Disk 

Thanks,

Cindy
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] non-ECC Systems and ZFS for home users (was: Please warn a home user against OpenSolaris under VirtualBox under WinXP ; ))

2010-09-23 Thread R.G. Keen
I should clarify. I was addressing just the issue of 
virtualizing, not what the complete set of things to
do to prevent data loss is. 

> 2010/9/19 R.G. Keen 
> > and last-generation hardware is very, very cheap.
> Yes, of course, it is. But, actually, is that a true
> statement? 
Yes, it is. Last-generation hardware is, in general, 
very cheap. But there is no implication either way 
about ECC in that. And in fact, there is a buyer's
market for last-generation *servers* with ECC that 
is very cheap too. I can get a single-unit rackmount
server setup for under $100 here in Austin that includes
ECC memory. 

That may not be the best of all possible things to do
on a number of levels. But for me, the likelihood of 
making a setup or operating mistake in a virtual machine 
setup server far outweighs the hardware cost to put
another physical machine on the ground. 

>I've read that it's *NOT* advisable to run ZFS on systems 
>which do NOT have ECC  RAM. And those cheapo last-gen 
>hardware boxes quite often don't have ECC, do they?
Most of them, the ex-desktop boxes, do not. However, 
as I noted above, removed-from-service servers are also
quite cheap. They *do* have ECC. I say this just to 
illustrate the point that a statement about last generation
hardware says nothing about ECC, either positive or negative.

In fact, the issue goes further. Processor chipsets from both
Intel and AMD used to support ECC on an ad-hoc basis. It may
have been there, but may or may not have been supported
by the motherboard. Intel's recent chipsets emphatically do 
not support ECC. AMDs do, in general. However, the motherboard
must still support the ECC reporting in hardware and BIOS for
ECC to actually work, and you have to buy the ECC memory. 
The newer the Intel motherboard, the less likely and more
expensive ECC is. Older Intel motherboards sometimes
did support ECC, as a side note. 

There's about sixteen more pages of typing to cover the issue 
even modestly correctly. The bottom line is this: for 
current-generation hardware, buy an AMD AM3 socket CPU,
ASUS motherboard, and ECC memory. DDR2 and DDR3 ECC
memory is only moderately more expensive than non-ECC.

I have this year built two Opensolaris servers from scratch.
They use the Athlon II processors, 4GB of ECC memory and
ASUS motherboards. This setup runs ECC, and supports ECC 
reporting and scrubbing. The cost of this is about $65 for
the CPU, $110 for memory, and $70-$120 for the motherboard. 
$300 more or less gets you new hardware that runs a 64bit
OS, ECC, and zfs, and does not give you worries about the 
hardware going into wearout. I also bought new, high quality
power supplies for $40-$60 per machine because the power
supply is a single point of failure, and wears out - that's a 
fact that many people ignore until the machine doesn't come
up one day.

> So, I wonder - what's the recommendation, or rather,
> experience as far as home users are concerned? Is it "safe 
>enough" now do use ZFS on non-ECC-RAM systems (if backups 
>are around)?
That's more a question about how much you trust your backups
than a question about ECC. 

ZFS is a layer of checking and recovery on disk writes. If your
memory/CPU tell it to carefully save and recover corrupted
data, it will. Memory corruption is something zfs does not 
address in any way, positive or negative. 

The correct question is this: given how much value you put on
not losing your data to hardware or software errors, how much
time and money are you willing to spend to make sure you don't
lose your data?
ZFS prevents or mitigates many of the issues involved with disk
errors and bit rot. ECC prevents or mitigates many of the issues
involved with memory corruption. 

My recommendation is this: if you are playing around, fine, use
virtual machines for your data backup. If you want some amount
of real data backup security, address the issues of data corruption
on as many levels as you can. "Safe enough" is something only 
you can answer. My answer, for me and my data, is a separate
machine which does only data backup, which runs both ECC and
zfs, on new (and burnt-in) hardware, which runs only the data
management tasks to simplify the software interactions being run,
and that being two levels deep on different hardware setups, 
finally flushing out to offline DVDs which are themselves protected
by ECC (look up DVDisaster) and periodically scanned for errors
and recopied. 

That probably seems excessive. But I've been burned with subtle
data loss before. It only takes one or two flipped bits in the wrong
places to make a really ugly scenario. Losing an entire file is in 
many ways easier to live with than a quiet error that gets 
propagated silently into your backup stream. When that happens, 
you can't trust **any** file until you have manually checked it, if
that is even possible. Want a really paranoia-inducing situation?
Think about what happens if you find a silent bit corruption in
a file system that includes encrypted files.

Re: [zfs-discuss] non-ECC Systems and ZFS for home users (was: Please warn a home user against OpenSolaris under VirtualBox under WinXP ; ))

2010-09-23 Thread David Dyer-Bennet

On Thu, September 23, 2010 01:33, Alexander Skwar wrote:
> Hi.
>
> 2010/9/19 R.G. Keen 
>
>> and last-generation hardware is very, very cheap.
>
> Yes, of course, it is. But, actually, is that a true statement? I've read
> that it's *NOT* advisable to run ZFS on systems which do NOT have ECC
> RAM. And those cheapo last-gen hardware boxes quite often don't have
> ECC, do they?

Last-generation server hardware supports ECC, and was usually populated
with ECC.  Last-generation desktop hardware rarely supports ECC, and was
even more rarely populated with ECC.

The thing is, last-generation server hardware is, um, marvelously adequate
for most home setups (the problem *I* see with it, for many home setups,
is that it's *noisy*).  So, if you can get it cheap in a sound-level that
fits your needs, that's not at all a bad choice.

I'm running a box I bought new as a home server, but it's NOW at least
last-generation hardware (2006), and it's still running fine; in
particular the CPU load remains trivial compared to what the box supports
(not doing compression or dedup on the main data pool, though I do
compress the backup pools on external USB disks).  (It does have ECC; even
before some of the cases leading to that recommendation were explained on
that list, I just didn't see the percentage in not protecting the memory.)

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] non-ECC Systems and ZFS for home users

2010-09-23 Thread Richard Elling
On Sep 23, 2010, at 9:08 AM, Dick Hoogendijk wrote:
> On 23-9-2010 16:34, Frank Middleton wrote:
> 
> > For home use, used Suns are available at ridiculously low prices and
> > they seem to be much better engineered than your typical PC. Memory
> > failures are much more likely than winning the pick 6 lotto...
> 
> And about what SUN systems are you thinking for 'home use' ?

At one time, due to market pricing pressure, Sun actually sold a server
without ECC. Bad idea, didn't last long.  Unfortunately, the PeeCee market is
just too cheap to value ECC.  So they take the risk and hope for the best.

> The likelihood of memory failures might be much higher than becoming a 
> millionaire, but in the years past I have never had one. And my home systems 
> are rather cheap. Mind you, not the cheapest, but rather cheap. I do buy good 
> memory though. So, to me, with a good backup I feel rather safe using ZFS. I 
> also had it running for quite some time on a 32bits machine and that also 
> worked out fine.

Part of the difference is the expected use.  For PCs which are only used
8 hours per day, 40 hours per week, rebooting regularly, the risk of transient
main memory errors is low.  For servers running 24x7, rebooting once a year,
the risk is much higher.

> The fact that a perfectly good file can not be read because of a bad checksum 
> is a design failure imho. There should be an option to overrule this 
> behaviour of ZFS.

It isn't a perfectly good file once it has been corrupted. But there are some
ways to get at the file contents.  Remember, the blocks are checksummed, not
the file. So if a bad block is in the file, you can skip over it.
http://blogs.sun.com/relling/entry/holy_smokes_a_holey_file
http://blogs.sun.com/relling/entry/dd_tricks_for_holey_files
http://blogs.sun.com/relling/entry/more_on_holey_files
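
One common way to do that skipping with plain dd, as a rough sketch (the block
size is a guess; match it to the dataset's recordsize, and see the posts above
for the details):

    # copy what is readable; unreadable blocks come out as zeros
    dd if=/tank/data/badfile of=/var/tmp/badfile.recovered bs=128k conv=noerror,sync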

 -- richard

-- 
OpenStorage Summit, October 25-27, Palo Alto, CA
http://nexenta-summit2010.eventbrite.com
ZFS and performance consulting
http://www.RichardElling.com












___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] non-ECC Systems and ZFS for home users

2010-09-23 Thread Dick Hoogendijk

 On 23-9-2010 16:34, Frank Middleton wrote:


 For home use, used Suns are available at ridiculously low prices and
 they seem to be much better engineered than your typical PC. Memory
 failures are much more likely than winning the pick 6 lotto...


And about what SUN systems are you thinking for 'home use' ?
The likelihood of memory failures might be much higher than becoming a 
millionaire, but in the years past I have never had one. And my home 
systems are rather cheap. Mind you, not the cheapest, but rather cheap. I 
do buy good memory though. So, to me, with a good backup I feel rather 
safe using ZFS. I also had it running for quite some time on a 32bits 
machine and that also worked out fine.


The fact that a perfectly good file can not be read because of a bad 
checksum is a design failure imho. There should be an option to overrule 
this behaviour of ZFS.


My 2çt

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Please warn a home user against OpenSolaris under VirtualBox under WinXP ; )

2010-09-23 Thread Nils
@ kebabber:

> There was a guy doing that: Windows as host and
> OpenSolaris as guest with raw access to his disks. He
> lost his 12 TB data. It turned out that VirtualBox
> dont honor the write flush flag (or something
> similar).

That story is in the link I provided, and as has been pointed out here, it is 
solvable.

I am more worried that the hard drives I wanted to reserve for server use will 
be taken over by the host. "Initialized", in MS-speak, might be dangerous?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sliced iSCSI device for doing RAIDZ?

2010-09-23 Thread Gary Mills
On Tue, Sep 21, 2010 at 05:48:09PM +0200, Alexander Skwar wrote:
> 
> We're using ZFS via iSCSI on a S10U8 system. As the ZFS Best
> Practices Guide http://j.mp/zfs-bp states, it's advisable to use
> redundancy (ie. RAIDZ, mirroring or whatnot), even if the underlying
> storage does its own RAID thing.
> 
> Now, our storage does RaID and the storage people say, it is
> impossible to have it export iSCSI devices which have no redundancy/
> RAID.

If you have a reliable iSCSI SAN and a reliable storage device, you
don't need the additional redundancy provided by ZFS.

> Actually, were would there be a difference? I mean, those iSCSI
> devices anyway don't represent real disks/spindles, but it's just
> some sort of abstractation. So, if they'd give me 3x400 GB compared
> to 1200 GB in one huge lump like they do now, it could be, that
> those would use the same spots on the real hard drives.

Suppose they gave you two huge lumps of storage from the SAN, and you
mirrored them with ZFS.  What would you do if ZFS reported that one of
its two disks had failed and needed to be replaced?  You can't do disk
management with ZFS in this situation anyway because those aren't real
disks.  Disk management all has to be done on the SAN storage device.

-- 
-Gary Mills--Unix Group--Computer and Network Services-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] scrub: resilver in progress for 0h38m, 0.00% done, 1131207h51m to go

2010-09-23 Thread LIC mesh
On Wed, Sep 22, 2010 at 8:13 PM, Richard Elling  wrote:

> On Sep 22, 2010, at 1:46 PM, LIC mesh wrote:
>
> Something else is probably causing the slow I/O.  What is the output of
> "iostat -en" ?  The best answer is "all balls"  (balls == zeros)
>
>  Found a number of LUNs with errors this way, looks like it has to do with
network problems more so than the hardware, so we're going to try turning off
LACP and using just 1 NIC.

>
>
> For SATA drives, we find that zfs_vdev_max_pending = 2 can be needed in
> certain recovery cases.
>
> We've played around with this on the individual shelves (it was originally
set at 1 for quite a long time), but left the head at the default for
build 134.

>
>
> Yes.  But some are not inexpensive.
>  -- richard
>
> What price range would we be looking at?
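
For the archives, setting zfs_vdev_max_pending is roughly the usual recipe
(syntax may need checking against your release):

    set zfs:zfs_vdev_max_pending = 2       # in /etc/system, applied at boot

or, on the live system:

    echo zfs_vdev_max_pending/W0t2 | mdb -kw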

 - Michael
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] non-ECC Systems and ZFS for home users

2010-09-23 Thread Frank Middleton

On 09/23/10 03:01, Ian Collins wrote:


So, I wonder - what's the recommendation, or rather, experience as far
as home users are concerned? Is it "safe enough" now do use ZFS on
non-ECC-RAM systems (if backups are around)?


It's as safe as running any other OS.

The big difference is ZFS will tell you when there's a corruption. Most
users of other systems are blissfully unaware of data corruption!


This runs you into the possibility of perfectly good files becoming inaccessible
due to bad checksums being written to all the mirrors. Richard Elling made this
point some time ago in "[zfs-discuss] You really do need ECC RAM"; see
http://www.cs.toronto.edu/%7Ebianca/papers/sigmetrics09.pdf. There
were a couple of zfs-discuss threads quite recently about memory problems
causing serious issues. Personally, I wouldn't trust any valuable data to any
system without ECC, regardless of OS and file system. For home use, used
Suns are available at ridiculously low prices and they seem to be much better
engineered than your typical PC. A memory failure is much more likely than
winning the pick-6 lotto...

FWIW Richard helped me diagnose a problem with checksum failures on
mirrored drives a while back and it turned out to be the CPU itself getting
the actual checksum wrong /only on one particular file/, and even then only
when the ambient temperature was high. So ZFS is good at ferreting out
obscure hardware problems :-).

Cheers -- Frank


Re: [zfs-discuss] Growing a root ZFS mirror on b134?

2010-09-23 Thread Carl Brewer
On 23/09/2010 11:06 PM, casper@sun.com wrote:
>
>> Ok, that doesn't seem to have worked so well ...
>>
>> I took one of the drives offline, rebooted and it just hangs at the
>> splash screen after prompting for which BE to boot into.
>> It gets to
>> hostname: blah
>> and just sits there.
>
>
> When you say "offline", did you:
>
> - remove the drive physically?
> - or did you zfs detach it?
> - or both?

zpool offline rpool 

It's plugged back in now (I'm trying all sorts of things!)


> In order to remove half of the mirror I suggest that you:
>
>
> split the mirror (if your ZFS is recent enough; seems to be
>  supported since 131)
> [ make sure you remove /etc/zfs/zpool.cache from the
>split half of the mirror. ]
> or
> detach
>
>
> only then remove the disk.
>
> Depending on the hardware it may try to find the missing disk and this
> may take some time.

"some time" being a minute, an hour?  How long should I wait before giving in 
and trying something else?

>
> You can boot with the debugger and/or -v to find out what is going on.

How is this done on a PC?  On SPARC I'd just have said "boot -s" or whatever 
the arguments are for that these days, but what about x86 PCs?
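
(For what it's worth, a hedged sketch of the usual x86 route, assuming the 
stock b134 GRUB menu: interrupt GRUB, press "e" to edit the boot entry, "e" 
again on the kernel$ line, append the flags, then "b" to boot it:)

  kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS -v -k -m verbose

(-v gives verbose boot messages, -k loads the kernel debugger, and -m verbose 
shows the milestones as services come up.)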

Thanks Casper (I remember you from the 1990s and your early Solaris 2.x FAQ!)


Re: [zfs-discuss] Growing a root ZFS mirror on b134?

2010-09-23 Thread Casper . Dik

>Ok, that doesn't seem to have worked so well ...
>
>I took one of the drives offline, rebooted and it just hangs at the
>splash screen after prompting for which BE to boot into.   
>It gets to 
>hostname: blah
>and just sits there.


When you say "offline", did you:

- remove the drive physically?
- or did you zfs detach it?
- or both?


In order to remove half of the mirror I suggest that you:


split the mirror (if your ZFS is recent enough; seems to be 
supported since 131) 
[ make sure you remove /etc/zfs/zpool.cache from the
  split half of the mirror. ]
or
detach 


only then remove the disk.

Depending on the hardware it may try to find the missing disk and this
may take some time.

You can boot with the debugger and/or -v to find out what is going on.

Casper



Re: [zfs-discuss] Growing a root ZFS mirror on b134?

2010-09-23 Thread Carl Brewer
Swapping the boot order in the PC's BIOS doesn't help.


Re: [zfs-discuss] Growing a root ZFS mirror on b134?

2010-09-23 Thread Carl Brewer
It is responding to pings, BTW, so *something's* running.  Not ssh, though.


Re: [zfs-discuss] Growing a root ZFS mirror on b134?

2010-09-23 Thread Carl Brewer
Ok, that doesn't seem to have worked so well ...

I took one of the drives offline, rebooted and it just hangs at the splash 
screen after prompting for which BE to boot into.   
It gets to 
hostname: blah
and just sits there.

Um ...

I read some doco that says:

The boot process can be slow if the boot archive is updated or a dump device 
has changed. Be patient.

(from 
http://www.solarisinternals.com/wiki/index.php/ZFS_Troubleshooting_Guide#Disk_Replacement_Example)
Do I need to just wait and it'll come up eventually, or do I need to do 
something more constructive?  There's no drive activity after an initial 
flurry of ~1-2 minutes or so. It's just hanging.

I kinda need this box up; nothing on it is irreplaceable, but it'll be a major 
PITA if I lose it.  Can the root pool be mounted in some form on another box 
and the data from it be retrieved?  It's running a more current zpool version 
than 2009.06 (whatever's current in b134) - maybe if I crank up an openIllumos 
or whatever it's called and see if that can mount the drives somehow?
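
(For what it's worth, a hedged sketch of the usual recovery route, assuming the 
drives get attached to a live system whose ZFS version is at least as new as 
the pool's - a recent OpenIndiana/Illumos live CD would qualify; "oldrpool" is 
just a made-up name:)

  # import the foreign root pool under an alternate root so it doesn't clash
  # with the rescue system's own rpool; rename it on the way in
  zpool import -f -R /a rpool oldrpool
  zfs list -r oldrpool
  # copy the data out from under /a, then release the disks again
  zpool export oldrpool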


Re: [zfs-discuss] non-ECC Systems and ZFS for home users (was: Please warn a home user against OpenSolaris under VirtualBox under WinXP ; ))

2010-09-23 Thread Casper . Dik

>  On 23-9-2010 10:25, casper@sun.com wrote:
>> I'm using ZFS on a system w/o ECC; it works (it's an Atom 230).
>
>I've been using ZFS on a non-ECC machine for years now without any issues. 
>I've never had errors. Plus, like others said, other OSes have the same 
>problems and also run quite well; if not, you simply don't know it. With ZFS 
>you will know.
>I would say - just go for it. You will never want to go back.


Indeed.  While I mirror stuff on the same system, I'm now also making
backups using a USB-connected disk (eSATA would be better, but the box
only has USB).

My backup consists of:



# $pools lists the pools to back up; $lastsnapshot and $newsnapshot are the
# previous and the freshly taken snapshot names (comments added for clarity)
for pool in $pools
do
        zfs snapshot -r $pool@$newsnapshot
        zfs send -R -I $pool@$lastsnapshot $pool@$newsnapshot |
                zfs receive -v -u -d -F portable/$pool
done

Then I export the portable pool and store it somewhere else.

I run a scrub on all the pools once every two weeks, just in case.
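
(A hedged sketch of those two housekeeping steps, assuming the backup pool is 
named "portable" as above and that root's crontab drives the scrubs; the 
schedule is only an example:)

  # detach the backup pool so the USB disk can be stored off-site
  zpool export portable
  # with the disk reattached, bring it back before the next backup run
  zpool import portable

  # root crontab entry: scrub every imported pool on the 1st and 15th at 03:00
  0 3 1,15 * * for p in `/usr/sbin/zpool list -H -o name`; do /usr/sbin/zpool scrub $p; done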

Casper



Re: [zfs-discuss] non-ECC Systems and ZFS for home users (was: Please warn a home user against OpenSolaris under VirtualBox under WinXP ; ))

2010-09-23 Thread Dick Hoogendijk

 On 23-9-2010 10:25, casper@sun.com wrote:

I'm using ZFS on a system w/o ECC; it works (it's an Atom 230).


I've been using ZFS on a non-ECC machine for years now without any issues. 
I've never had errors. Plus, like others said, other OSes have the same 
problems and also run quite well; if not, you simply don't know it. With ZFS 
you will know.

I would say - just go for it. You will never want to go back.


Re: [zfs-discuss] non-ECC Systems and ZFS for home users (was: Please warn a home user against OpenSolaris under VirtualBox under WinXP ; ))

2010-09-23 Thread Casper . Dik


I'm using ZFS on a system w/o ECC; it works (it's an Atom 230).

Note that this is no different from using another OS; the difference is 
that ZFS will complain when bad memory leads to disk corruption. Without ZFS 
you will still have the memory corruption, but you won't know about it.

Is it helpful not knowing that you have memory corruption?  I don't think 
so.

I'd love to have a small (<40 W) system with ECC, but it is difficult to 
find one.

Casper



Re: [zfs-discuss] Pools inside pools

2010-09-23 Thread Mattias Pantzare
On Thu, Sep 23, 2010 at 08:48, Haudy Kazemi  wrote:
> Mattias Pantzare wrote:
>>
>> ZFS needs free memory for writes. If you fill your memory with dirty
>> data zfs has to flush that data to disk. If that disk is a virtual
>> disk in zfs on the same computer those writes need more memory from
>> the same memory pool and you have a deadlock.
>> If you write to a zvol on a different host (via iSCSI) those writes
>> use memory in a different memory pool (on the other computer). No
>> deadlock.
>
> Isn't this a matter of not keeping enough free memory as a workspace?  By
> free memory, I am referring to unallocated memory and also recoverable main
> memory used for shrinkable read caches (shrinkable by discarding cached
> data).  If the system keeps enough free and recoverable memory around for
> workspace, why should the deadlock case ever arise?  Slowness and page
> swapping might be expected to arise (as a result of a shrinking read cache
> and high memory pressure), but deadlocks too?

Yes. But what is enough reserved free memory? If you need 1 MB for a normal
configuration, you might need 2 MB when you are doing ZFS on ZFS. (I am just
guessing.)

This is the same problem as mounting an NFS server on itself via NFS. That is
also not supported.

The system has shrinkable caches and so on, but that space will sometimes
run out. All of it. There is also swap to use, but if that is on ZFS...

These things are also very hard to test.
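
(For concreteness, a hedged sketch of the locally looped-back configuration 
being discussed, with made-up pool and volume names - an inner pool built on a 
zvol carved out of the outer pool on the same host:)

  # carve a zvol out of the outer pool and build a second pool on top of it
  zfs create -V 10g outer/vol1
  zpool create inner /dev/zvol/dsk/outer/vol1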


Re: [zfs-discuss] Pools inside pools

2010-09-23 Thread Haudy Kazemi

Markus Kovero wrote:
>> What is an example of where a checksummed outside pool would not be able
>> to protect a non-checksummed inside pool?  Would an intermittent
>> RAM/motherboard/CPU failure that only corrupted the inner pool's block
>> before it was passed to the outer pool (and did not corrupt the outer
>> pool's block) be a valid example?
>>
>> If checksums are desirable in this scenario, then redundancy would also
>> be needed to recover from checksum failures.
>
> That is an excellent point also; what is the point of checksumming if you
> cannot recover from it?

Checksum errors can tell you there is probably a problem worthy of 
attention.  They can prevent you from making things worse by stopping 
you in your tracks until whatever triggered them is resolved, or enough 
redundancy is available to overcome the errors.  This is why operating 
system kernels panic/abend/BSOD when they detect that the system state 
has been changed in an unknown way which could have unpredictable (and 
likely bad) results on further operations.

Redundancy is useful when you can't recover the data by simply asking 
for it to be re-sent or by getting it from another source.  
Communications buses and protocols use checksums to detect 
corruption and resends/retries to recover from checksum failures.  That 
strategy doesn't work when you are talking about your end storage media.

> With this kind of configuration one would benefit performance-wise from
> not having to calculate checksums again.
> Checksums in outer pools effectively protect from disk issues; if hardware
> fails so that data is corrupted, isn't the outer pool's redundancy going to
> handle it for the inner pool also?
> The only thing that comes to mind is that IF something happens to the outer
> pool, the inner pool is no longer aware of possibly broken data, which can
> lead to issues.
>
> Yours
> Markus Kovero




Re: [zfs-discuss] Growing a root ZFS mirror on b134?

2010-09-23 Thread Ian Collins

On 09/23/10 05:00 PM, Carl Brewer wrote:

G'day,
My OpenSolaris (b134) box is low on space and has a ZFS mirror for root :

  uname -a
SunOS wattage 5.11 snv_134 i86pc i386 i86pc


NAME   SIZE  ALLOC   FREE  CAP  DEDUP  HEALTH  ALTROOT
rpool  696G   639G  56.7G  91%  1.09x  ONLINE  -

It's currently a pair of 750GB drives.  In my bag I have a pair of brand 
spanking new 2TB seagates that I plan to replace the 750's with.

Can anyone here confirm that the following will (should!) work :

take one drive offline
shutdown
replace physical drive & restart
add it to the mirror, wait for the mirror to resilver
installboot on the new drive
boot
check everything
take other drive offline
shutdown
replace physical drive & restart
add to the mirror, wait for resilver
installboot on new drive

run zpool list/df etc and see LOTS more disk space in rpool!

Anything I may have missed?
   

Make a cup of tea while the mirrors resilver!

Your steps look to be complete.
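
(For anyone following along later, a hedged sketch of one pass of the steps 
above, with made-up device names - c0t0d0s0 being the old half of the mirror 
and the new 2 TB disk going into the same bay; on b134 the pool also needs 
autoexpand, or a final "zpool online -e", before the extra space shows up:)

  zpool offline rpool c0t0d0s0     # drop the old drive out of the mirror
  # shut down, swap in the new 2 TB drive (SMI label, slice 0), boot again
  zpool replace rpool c0t0d0s0     # resilver onto the new disk in the same slot
  zpool status rpool               # wait for the resilver to finish
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t0d0s0
  # repeat for the second drive, then let the pool grow into the new space
  zpool set autoexpand=on rpool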

--
Ian.



Re: [zfs-discuss] non-ECC Systems and ZFS for home users (was: Please warn a home user against OpenSolaris under VirtualBox under WinXP ; ))

2010-09-23 Thread Ian Collins

On 09/23/10 06:33 PM, Alexander Skwar wrote:

> Hi.
>
> 2010/9/19 R.G. Keen
>> and last-generation hardware is very, very cheap.
>
> Yes, of course, it is. But, actually, is that a true statement? I've read
> that it's *NOT* advisable to run ZFS on systems which do NOT have ECC
> RAM. And those cheapo last-gen hardware boxes quite often don't have
> ECC, do they?
>
> So, I wonder - what's the recommendation, or rather, experience as far
> as home users are concerned? Is it "safe enough" now to use ZFS on
> non-ECC-RAM systems (if backups are around)?

It's as safe as running any other OS.

The big difference is ZFS will tell you when there's a corruption.  Most 
users of other systems are blissfully unaware of data corruption!

All my desktops use ZFS; none have ECC.

--
Ian.



Re: [zfs-discuss] Pools inside pools

2010-09-23 Thread Markus Kovero
> What is an example of where a checksummed outside pool would not be able 
> to protect a non-checksummed inside pool?  Would an intermittent 
> RAM/motherboard/CPU failure that only corrupted the inner pool's block 
> before it was passed to the outer pool (and did not corrupt the outer 
> pool's block) be a valid example?

> If checksums are desirable in this scenario, then redundancy would also 
> be needed to recover from checksum failures.


That is an excellent point also; what is the point of checksumming if you 
cannot recover from it? With this kind of configuration one would benefit 
performance-wise from not having to calculate checksums again.
Checksums in outer pools effectively protect from disk issues; if hardware 
fails so that data is corrupted, isn't the outer pool's redundancy going to 
handle it for the inner pool also?
The only thing that comes to mind is that IF something happens to the outer 
pool, the inner pool is no longer aware of possibly broken data, which can 
lead to issues.
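
(Purely to illustrate the trade-off being described, a hedged sketch with a 
made-up inner pool name; note that checksums can only be turned off per 
dataset - ZFS always checksums its own metadata:)

  # rely on the outer pool's checksums and skip data checksumming inside
  zfs set checksum=off inner
  zfs get checksum inner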

Yours
Markus Kovero