Re: [zfs-discuss] Cores vs. Speed?
> I am leaning towards AMD because of ECC support

Well, let's look at Intel's offerings... RAM is faster than AMD's at 1333MHz DDR3, and one gets ECC and a thermal sensor for $10 over non-ECC: http://www.newegg.com/Product/Product.aspx?Item=N82E16820139040

This MB has two Intel ethernets, and for an extra $30 an Ethernet KVM (LOM): http://www.newegg.com/Product/Product.aspx?Item=N82E16813182212

One needs a Xeon 34xx for ECC; the 45W version isn't on Newegg, and ignoring the one without Hyper-Threading leaves us: http://www.newegg.com/Product/Product.aspx?Item=N82E16819117225

Yes, at 95W it isn't exactly low power, but 4 cores @ 2533MHz plus another 4 Hyper-Thread cores is nice. If you only need one core, the marketing paperwork claims it will push to 2.93GHz too. But the RAM bandwidth is the big win for Intel. Avoid the temptation, but at 2.8GHz without ECC, this one is close in price: http://www.newegg.com/Product/Product.aspx?Item=N82E16819115214

Now, this gets one to 8GB ECC easily... AMD's unfair advantage is all those RAM slots on their multi-die MBs... A slow AMD CPU with 64GB RAM might be better depending on your working set / dedup requirements.

Rob
Re: [zfs-discuss] Cores vs. Speed?
On 02/05/2010 03:21 AM, Edward Ned Harvey wrote:
> FWIW ... 5 disks in raidz2 will have capacity of 3 disks. But if you bought
> 6 disks in mirrored configuration, you have a small extra cost, and much
> better performance.

But the raidz2 can survive the loss of ANY two disks, while the 6-disk mirror configuration will be destroyed if the two disks lost are from the SAME pair.

-- Jesus Cea Avion, j...@jcea.es - http://www.jcea.es/ - jabber / xmpp:j...@jabber.org
Re: [zfs-discuss] Cores vs. Speed?
Brian wrote:
> Interesting comments.. But I am confused. Performance for my backups (compression/deduplication) would most likely not be #1 priority. I want my VMs to run fast - so is it deduplication that really slows things down?

Dedup requires a fair amount of CPU, but it really wants a big L2ARC and RAM. I'd seriously consider no less than 8GB of RAM, and look at getting a smaller-sized (~40GB) SSD, something on the order of an Intel X25-M. Also, iSCSI-served VMs tend to do mostly random I/O, which is better handled by a striped mirror than RaidZ.

> Are you saying raidz2 would overwhelm current I/O controllers to where I could not saturate a 1 Gb network link?

No.

> Is the CPU I am looking at not capable of doing dedup and compression? Or are no CPUs capable of doing that currently? If I only enable it for the backup filesystem will all my filesystems suffer performance-wise?

All the CPUs you indicate can handle the job; it's a matter of getting enough data to them.

> Where are the bottlenecks in a raidz2 system that I will only access over a single gigabit link? Are they insurmountable?

RaidZ is good for streaming writes of large size, where you should get performance roughly equal to the number of data drives. Likewise for streaming reads. Small writes generally limit performance to the level of about 1 disk, regardless of the number of data drives in the RaidZ. Small reads are in-between in terms of performance.

Personally, I'd look into having 2 different zpools - a striped mirror for your iSCSI-shared VMs, and a raidz2 for your main storage. In any case, for dedup, you really should have an SSD for L2ARC, if at all possible. Being able to store all the metadata for the entire zpool in the L2ARC really, really helps speed up dedup.

Also, about your CPU choices, look here for a good summary of the current AMD processor features: http://en.wikipedia.org/wiki/List_of_AMD_Phenom_microprocessors (this covers the Phenom, Phenom II, and Athlon II families). The main difference between the various models comes down to the amount of L3 cache and the HT speed. I'd be interested in doing some benchmarking to see exactly how the variations make a difference.

--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
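For reference, a minimal sketch of the two-pool layout plus an SSD cache device described above might look like the following. The device names and dataset names are purely hypothetical placeholders, not a tuned recommendation:

  # striped mirror (two 2-way mirrors) for the iSCSI-served VMs
  zpool create vmpool mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0

  # raidz2 across five disks for the bulk/backup data
  zpool create tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0

  # add the SSD as an L2ARC cache device to the pool that will be deduplicated
  zpool add tank cache c3t0d0

  # enable dedup/compression only on the backup filesystem
  zfs create -o dedup=on -o compression=on tank/backups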
Re: [zfs-discuss] Cores vs. Speed?
> I want my VMs to run fast - so is it deduplication that really slows > things down? > > Are you saying raidz2 would overwhelm current I/O controllers to where > I could not saturate 1 GB network link? > > Is the CPU I am looking at not capable of doing dedup and compression? > Or are no CPUs capable of doing that currently? If I only enable it > for the backup filesystem will all my filesystems suffer performance > wise? > > Where are the bottlenecks in a raidz2 system that I will only access > over a single gigabit link? Are the insurmountable? I'm not sure if anybody can answer your questions. I will suggest you just try things out, and see for yourself. Everybody would have different techniques to tweak performance... If you want to use fast compression and dedup, lots of cpu and ram. (You said 4G, but I don't think that's a lot. I never buy a laptop with less than 4G nowadays. I think a lot of ram is 16G and higher.) As for raidz2, and Ethernet ... I don't know. If you've got 5 disks in a raidz2 configuration ... Assuming each disk can sustain 500Mbits, then theoretically these disks might be able to achieve 1.5Gbit or 2.5Gbit with perfect efficiency ... So maybe they can max out your Ethernet. I don't know. But I do know, if you had a stripe of 3 mirrors, they would have absolutely no trouble maxing out the Ethernet. Even a single mirror could just barely do that. For 2 or more mirrors, it's cake. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cores vs. Speed?
> Data in raidz2 is striped so that it is split across multiple disks. Partial truth. Yes, the data is on more than one disk, but it's a parity hash, requiring computation overhead and a write operation on each and every disk. It's not simply striped. Whenever you read or write, you need to access all the disks (or a bunch of 'em) and use compute cycles to generate the actual data stream. I don't know enough about the underlying methods of calculating and distributing everything to say intelligently *why*, but I know this: > In this (sequential) sense it is faster than a single disk. Whenever I benchmark raid5 versus a mirror, the mirror is always faster. Noticeably and measurably faster, as in 50% to 4x faster. (50% for a single disk mirror versus a 6-disk raid5, and 4x faster for a stripe of mirrors, 6 disks with the capacity of 3, versus a 6-disk raid5.) Granted, I'm talking about raid5 and not raidz. There is possibly a difference there, but I don't think so. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance
On Thu, Feb 4, 2010 at 10:35 PM, Bob Friesenhahn < bfrie...@simple.dallas.tx.us> wrote: > On Thu, 4 Feb 2010, Marc Nicholas wrote: > >> >> The write IOPS between the X25-M and the X25-E are different since with >> the X25-M, much >> more of your data gets completely lost. Most of us prefer not to lose our >> data. >> >> Would you like to qualify your statement further? >> > > Google is your friend. And check earlier on this list/forum as well. > > While I understand the difference between MLC and SLC parts, I'm pretty >> sure Intel didn't >> design the M version to make "data get completely lost". ;) >> > > It loses the most recently written data, even after a cache sync request. > A number of people have verified this for themselves and posted results. > Even the X25-E has been shown to lose some transactions. > > The devices have some DRAM (16MB) that is used for write amplification levelling. The sudden loss of power means that this DRAM doesn't get flushed to Flash. This is the very reason the STEC devices have a supercap. -marc ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance
On Thu, 4 Feb 2010, Marc Nicholas wrote: The write IOPS between the X25-M and the X25-E are different since with the X25-M, much more of your data gets completely lost. Most of us prefer not to lose our data. Would you like to qualify your statement further? Google is your friend. And check earlier on this list/forum as well. While I understand the difference between MLC and SLC parts, I'm pretty sure Intel didn't design the M version to make "data get completely lost". ;) It loses the most recently written data, even after a cache sync request. A number of people have verified this for themselves and posted results. Even the X25-E has been shown to lose some transactions. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance
On Thu, Feb 4, 2010 at 10:18 PM, Bob Friesenhahn < bfrie...@simple.dallas.tx.us> wrote: > On Thu, 4 Feb 2010, Marc Nicholas wrote: > > Very interesting stats -- thanks for taking the time and trouble to share >> them! >> >> One thing I found interesting is that the Gen 2 X25-M has higher write >> IOPS than the >> X25-E according to Intel's documentation (6,600 IOPS for 4K writes versus >> 3,300 IOPS for >> 4K writes on the "E"). I wonder if it'd perform better as a ZIL? (The >> write latency on >> both drives is the same). >> > > The write IOPS between the X25-M and the X25-E are different since with the > X25-M, much more of your data gets completely lost. Most of us prefer not > to lose our data. > > Would you like to qualify your statement further? While I understand the difference between MLC and SLC parts, I'm pretty sure Intel didn't design the M version to make "data get completely lost". ;) -marc ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance
On Thu, 4 Feb 2010, Marc Nicholas wrote: Very interesting stats -- thanks for taking the time and trouble to share them! One thing I found interesting is that the Gen 2 X25-M has higher write IOPS than the X25-E according to Intel's documentation (6,600 IOPS for 4K writes versus 3,300 IOPS for 4K writes on the "E"). I wonder if it'd perform better as a ZIL? (The write latency on both drives is the same). The write IOPS between the X25-M and the X25-E are different since with the X25-M, much more of your data gets completely lost. Most of us prefer not to lose our data. The X25-M is about as valuable as a paper weight for use as a zfs slog. Toilet paper would be a step up. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cores vs. Speed?
On Thu, 4 Feb 2010, Brian wrote: Was my raidz2 performance comment above correct? That the write speed is that of the slowest disk? That is what I believe I have read. Data in raidz2 is striped so that it is split across multiple disks. In this (sequential) sense it is faster than a single disk. For random access, the stripe performance can not be faster than the slowest disk though. Bob -- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer,http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cores vs. Speed?
Interesting comments.. But I am confused. Performance for my backups (compression/deduplication) would most likely not be the #1 priority.

I want my VMs to run fast - so is it deduplication that really slows things down?

Are you saying raidz2 would overwhelm current I/O controllers to the point where I could not saturate a 1 Gb network link?

Is the CPU I am looking at not capable of doing dedup and compression? Or are no CPUs capable of doing that currently? If I only enable it for the backup filesystem, will all my filesystems suffer performance-wise?

Where are the bottlenecks in a raidz2 system that I will only access over a single gigabit link? Are they insurmountable?

> > I plan to start with 5 1.5 TB drives in a raidz2 configuration and 2
> > mirrored boot drives.
>
> You want to use compression and deduplication and raidz2. I hope you didn't
> want to get any performance out of this system, because all of those are
> compute or IO intensive.
>
> FWIW ... 5 disks in raidz2 will have capacity of 3 disks. But if you bought
> 6 disks in mirrored configuration, you have a small extra cost, and much
> better performance.
Re: [zfs-discuss] Cores vs. Speed?
> I plan to start with 5 1.5 TB drives in a raidz2 configuration and 2 > mirrored boot drives. You want to use compression and deduplication and raidz2. I hope you didn't want to get any performance out of this system, because all of those are compute or IO intensive. FWIW ... 5 disks in raidz2 will have capacity of 3 disks. But if you bought 6 disks in mirrored configuration, you have a small extra cost, and much better performance. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cores vs. Speed?
On Thu, Feb 4, 2010 at 7:54 PM, Brian wrote:
> It sounds like the consensus is more cores over clock speed. Surprising to me since the difference in clock speed was over 1Ghz. So, I will go with a quad core.

Four cores @ 1.8Ghz = 7.2Ghz of threaded performance ([Open]Solaris is relatively decent in terms of threading). Two cores @ 3.1Ghz = 6.2Ghz :) Although you may find single-threaded operations slower, as someone pointed out, but even those might wash out as sometimes it's I/O that's the problem.

> I was leaning towards 4GB of ram - which hopefully should be enough for dedup as I am only planning on dedupping my smaller file systems (backups and VMs)

4GB is a good start.

> Was my raidz2 performance comment above correct? That the write speed is that of the slowest disk? That is what I believe I have read.

You are sort-of-correct that it's the write speed of the slowest disk. Mirrored drives will be faster, especially for random I/O. But you sacrifice storage for that performance boost. That said, I have a similar setup as far as number of spindles and can push 200MB/sec+ through it and saturate GigE for iSCSI, so maybe I'm being harsh on raidz2 :)

> Now on to the hard part of picking a motherboard that is supported and has enough SATA ports!

I used an ASUS board (M4A785-M) which has six (6) SATA2 ports onboard and pretty decent HyperTransport throughput.

Hope that helps.

-marc
Re: [zfs-discuss] Cores vs. Speed?
It sounds like the consensus is more cores over clock speed. Surprising to me since the difference in clock speed was over 1GHz. So, I will go with a quad core.

I was leaning towards 4GB of RAM - which hopefully should be enough for dedup, as I am only planning on dedupping my smaller filesystems (backups and VMs).

Was my raidz2 performance comment above correct? That the write speed is that of the slowest disk? That is what I believe I have read.

Now on to the hard part of picking a motherboard that is supported and has enough SATA ports!
Re: [zfs-discuss] Cores vs. Speed?
Hi Brian, If you are considering testing dedup, particularly on large datasets, see the list of known issues, here: http://hub.opensolaris.org/bin/view/Community+Group+zfs/dedup Start with build 132. Thanks, Cindy On 02/04/10 16:19, Brian wrote: I am Starting to put together a home NAS server that will have the following roles: (1) Store TV recordings from SageTV over either iSCSI or CIFS. Up to 4 or 5 HD streams at a time. These will be streamed live to the NAS box during recording. (2) Playback TV (could be stream being recorded, could be others) to 3 or more extenders (3) Hold a music repository (4) Hold backups from windows machines, mac (time machine), linux. (5) Be an iSCSI target for several different Virtual Boxes. Function 4 will use compression and deduplication. Function 5 will use deduplication. I plan to start with 5 1.5 TB drives in a raidz2 configuration and 2 mirrored boot drives. I have been reading these forums off and on for about 6 months trying to figure out how to best piece together this system. I am first trying to select the CPU. I am leaning towards AMD because of ECC support and power consumption. For items such as de-dupliciation, compression, checksums etc. Is it better to get a faster clock speed or should I consider more cores? I know certain functions such as compression may run on multiple cores. I have so far narrowed it down to: AMD Phenom II X2 550 Black Edition Callisto 3.1GHz and AMD Phenom X4 9150e Agena 1.8GHz Socket AM2+ 65W Quad-Core As they are roughly the same price. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cores vs. Speed?
On 05/02/10 01:00, Brian wrote:
> Thanks for the reply. Are cores better because of the compression/deduplication being multi-threaded or because of multiple streams? It is a pretty big difference in clock speed - so curious as to why cores would be better. Glad to see your 4 core system is working well for you - so it seems like I won't really have a bad choice.
> Why avoid large drives? Reliability reasons? My main thought on that is that there is a 3 year warranty and I am building raidz2 because I expect failure. Or are there other reasons to avoid large drives?
> I thought I understood the overhead.. The write and read speeds should be roughly that of the slowest disk? Thanks.

From what I saw, ZFS scales terribly well with multiple cores. If you want to send/receive your filesystems through ssh to another machine, speed matters, since ssh only uses one core (but then you can always use netcat). On a Xeon E5520 running at 2.27 GHz we achieve around 70/80 MB/s ssh throughput.

For dedup, you want lots of RAM and, if possible, a large and fast SSD for L2ARC. Someone on this list was asking about estimates of RAM/cache needs based on block sizes / fs size / estimated dedup ratio. Either I missed the answer or there was no really simple answer (other than more is better, which always stays true for RAM and L2ARC). Anyway, we tested it and were surprised by the quantity of reads that ensue.

Arnaud
[zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows
I have a single ZFS volume, shared out using COMSTAR and connected to a Windows VM. I am taking snapshots of the volume regularly. I now want to mount a previous snapshot, but when I go through the process, Windows sees the new volume but thinks it is blank and wants to initialize it. Any ideas how to get Windows to see that it has data on it?

Steps I took after the snap:

  zfs clone data01/san/gallardo/g-recovery
  sbdadm create-lu /dev/zvol/rdsk/data01/san/gallardo/g-recovery
  stmfadm add-view -h HG-Gallardo -t TG-Gallardo -n 1 600144F0EAE40A004B6B59090003

At this point, my server Gallardo can see the LUN, but like I said, it looks blank to the OS. I suspect the 'sbdadm create-lu' phase. Any help to get Windows to see it as a LUN with NTFS data would be appreciated.

Thanks,
Scott
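For reference, the usual clone-and-share sequence looks roughly like the following. The snapshot name, clone name, and GUID below are hypothetical placeholders; this is only a sketch of the typical steps, not a confirmed fix for the blank-disk symptom described above:

  # clone a snapshot of the zvol into a new zvol, then expose the clone as a LUN
  zfs clone data01/san/gallardo/g-recovery@snap1 data01/san/gallardo/g-recovery-clone
  sbdadm create-lu /dev/zvol/rdsk/data01/san/gallardo/g-recovery-clone
  stmfadm add-view -h HG-Gallardo -t TG-Gallardo -n 1 <GUID printed by sbdadm create-lu>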
Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance
Peter Radig wrote:
> I was interested in the impact the type of an SSD has on the performance of the ZIL. So I did some benchmarking and just want to share the results. My test case is simply untarring the latest ON source (528 MB, 53k files) on a Linux system that has a ZFS file system mounted via NFS over gigabit ethernet. I got the following results:
> - remotely with no dedicated ZIL device: 36 min 37 sec (factor 73 compared to local)
> - remotely with an Intel X25-E 32 GB as ZIL device: 3 min 11 sec (factor 6.4 compared to local)

That's about the same ratio I get when I demonstrate this on the SSD/Flash/Turbocharge Discovery Days I run in the UK from time to time (the name changes over time ;-).

-- Andrew Gabriel
Re: [zfs-discuss] Cores vs. Speed?
Put your money into RAM, especially for dedup. -- richard On Feb 4, 2010, at 3:19 PM, Brian wrote: > I am Starting to put together a home NAS server that will have the following > roles: > > (1) Store TV recordings from SageTV over either iSCSI or CIFS. Up to 4 or 5 > HD streams at a time. These will be streamed live to the NAS box during > recording. > (2) Playback TV (could be stream being recorded, could be others) to 3 or > more extenders > (3) Hold a music repository > (4) Hold backups from windows machines, mac (time machine), linux. > (5) Be an iSCSI target for several different Virtual Boxes. > > Function 4 will use compression and deduplication. > Function 5 will use deduplication. > > I plan to start with 5 1.5 TB drives in a raidz2 configuration and 2 mirrored > boot drives. > > I have been reading these forums off and on for about 6 months trying to > figure out how to best piece together this system. > > I am first trying to select the CPU. I am leaning towards AMD because of ECC > support and power consumption. > > For items such as de-dupliciation, compression, checksums etc. Is it better > to get a faster clock speed or should I consider more cores? I know certain > functions such as compression may run on multiple cores. > > I have so far narrowed it down to: > > AMD Phenom II X2 550 Black Edition Callisto 3.1GHz > and > AMD Phenom X4 9150e Agena 1.8GHz Socket AM2+ 65W Quad-Core > > As they are roughly the same price. > -- > This message posted from opensolaris.org > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Impact of an enterprise class SSD on ZIL performance
Very interesting stats -- thanks for taking the time and trouble to share them! One thing I found interesting is that the Gen 2 X25-M has higher write IOPS than the X25-E according to Intel's documentation (6,600 IOPS for 4K writes versus 3,300 IOPS for 4K writes on the "E"). I wonder if it'd perform better as a ZIL? (The write latency on both drives is the same). -marc On Thu, Feb 4, 2010 at 6:43 PM, Peter Radig wrote: > I was interested in the impact the type of an SSD has on the performance of > the ZIL. So I did some benchmarking and just want to share the results. > > My test case is simply untarring the latest ON source (528 MB, 53k files) > on an Linux system that has a ZFS file system mounted via NFS over gigabit > ethernet. > > I got the following results: > - locally on the Solaris box: 30 sec > - remotely with no dedicated ZIL device: 36 min 37 sec (factor 73 compared > to local) > - remotely with ZIL disabled: 1 min 54 sec (factor 3.8 compared to local) > - remotely with a OCZ VERTEX SATA II 120 GB as ZIL device: 14 min 40 sec > (factor 29.3 compared to local) > - remotely with an Intel X25-E 32 GB as ZIL device: 3 min 11 sec (factor > 6.4 compared to local) > > So it really makes a difference what type of SSD you use for your ZIL > device. I was expecting a good performance from the X25-E, but was really > suprised that it is that good (only 1.7 times slower than it takes with ZIL > completely disabled). So I will use the X25-E as ZIL device on my box and > will not consider disabling ZIL at all to improve NFS performance. > > -- Peter > -- > This message posted from opensolaris.org > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Cores vs. Speed?
Thanks for the reply. Are cores better because of the compression/deduplication being multi-threaded, or because of multiple streams? It is a pretty big difference in clock speed - so I'm curious as to why cores would be better. Glad to see your 4-core system is working well for you - so it seems like I won't really have a bad choice.

Why avoid large drives? Reliability reasons? My main thought on that is that there is a 3-year warranty and I am building raidz2 because I expect failure. Or are there other reasons to avoid large drives?

I thought I understood the overhead.. The write and read speeds should be roughly that of the slowest disk?

Thanks.
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On 04/02/10 20:26, Tonmaus wrote:
> Hi again, thanks for the answer. Another thing that came to my mind is that you mentioned that you mixed the disks among the controllers. Does that mean you mixed them as well among pools? Unsurprisingly, the WD20EADS is slower than the Hitachi, which is a fixed 7200 rpm drive. I wonder what impact that would have if you use them as vdevs of the same pool.
> Cheers, Tonmaus

Yes, we mixed them among controllers and pools. We've done something that's not recommended: a 15-disk raidz3 pool. Disks are as follows:

c3 (LSI SAS) has:
- 1 x 64 GB Intel X25-E
- 3 x 2TB WD20EADS
- 4 x 2TB Hitachi

c2 (LSI SAS) has:
- 4 x 2TB WD20EADS
- 4 x 2TB Hitachi

c5 (motherboard ICH10, if I remember well) has:
- 1 x 160GB 2.5'' WD
- DVD

All the 2TB drives are in the raidz3 zpool named tank (we've been very innovative here ;-). The X25-E is sliced into 20GB for the system, 1GB for the ZIL for tank, and the rest as cache for tank. The 2.5'' 160GB WD was not initially part of the setup, since we were planning to slice the 2TB drives into 32GB for the system (mirrored across all drives) and the rest for the big zpool, while the X25-E was just there for the ZIL and the cache. But two things we've read on lists and forums made us change our minds:
- the disk write cache is disabled when you're not using the whole drive
- some reports on this list about the X25-E losing up to 256 cache flushes in case of power failure.

So we bought this 160GB disk (it was really the last thing that could fit in the chassis) and sliced it the same way as the X25-E. The system and the ZIL are mirrored between the X25-E and the WD160. We do not use the WD160 for the cache: we thought it would be better to save IOPS on this disk for the ZIL mirror. I don't know whether it's a good idea to mirror the ZIL on such a disk, but we prefer a slower setup to losing that many cache flushes on power failure.

Regarding the performance obtained by using only Hitachi disks, I can't tell; I haven't tested it, and can't do it right now as the system is in preproduction testing.

Also, I should have mentioned in my previous post that some WD20EADS (the 32SB0) have shorter response times (as reported by iostat). They're even "faster" than the Hitachi: I've seen them quite a few times in the range 0.3 to 1.5 ms, which seems far too short for this kind of drive. I suspect they're sort of dropping flush requests. Add to that that 2 out of 3 failed WD20EADS were 32SB0 and you get the picture... Note they might also be hybrid drives with some flash memory, which would allow quick acknowledgment of writes, but I think we would have heard of such a feature on this list.

Arnaud
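A rough sketch of how a layout like that is attached to a pool follows. The slice names are hypothetical placeholders, not the exact devices described above (note that log devices can be mirrored, but cache devices cannot):

  # mirrored ZIL (slog) across a slice of the X25-E and a slice of the 160GB disk
  zpool add tank log mirror c3t0d0s1 c5t0d0s1

  # remaining SSD space as L2ARC
  zpool add tank cache c3t0d0s2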
Re: [zfs-discuss] Cores vs. Speed?
I would go with cores (threads) rather than clock speed here. My home system is a 4-core AMD @ 1.8Ghz and performs well. I wouldn't use drives that big and you should be aware of the overheads of RaidZ[x]. -marc On Thu, Feb 4, 2010 at 6:19 PM, Brian wrote: > I am Starting to put together a home NAS server that will have the > following roles: > > (1) Store TV recordings from SageTV over either iSCSI or CIFS. Up to 4 or > 5 HD streams at a time. These will be streamed live to the NAS box during > recording. > (2) Playback TV (could be stream being recorded, could be others) to 3 or > more extenders > (3) Hold a music repository > (4) Hold backups from windows machines, mac (time machine), linux. > (5) Be an iSCSI target for several different Virtual Boxes. > > Function 4 will use compression and deduplication. > Function 5 will use deduplication. > > I plan to start with 5 1.5 TB drives in a raidz2 configuration and 2 > mirrored boot drives. > > I have been reading these forums off and on for about 6 months trying to > figure out how to best piece together this system. > > I am first trying to select the CPU. I am leaning towards AMD because of > ECC support and power consumption. > > For items such as de-dupliciation, compression, checksums etc. Is it > better to get a faster clock speed or should I consider more cores? I know > certain functions such as compression may run on multiple cores. > > I have so far narrowed it down to: > > AMD Phenom II X2 550 Black Edition Callisto 3.1GHz > and > AMD Phenom X4 9150e Agena 1.8GHz Socket AM2+ 65W Quad-Core > > As they are roughly the same price. > -- > This message posted from opensolaris.org > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Impact of an enterprise class SSD on ZIL performance
I was interested in the impact the type of an SSD has on the performance of the ZIL. So I did some benchmarking and just want to share the results.

My test case is simply untarring the latest ON source (528 MB, 53k files) on a Linux system that has a ZFS file system mounted via NFS over gigabit ethernet.

I got the following results:
- locally on the Solaris box: 30 sec
- remotely with no dedicated ZIL device: 36 min 37 sec (factor 73 compared to local)
- remotely with ZIL disabled: 1 min 54 sec (factor 3.8 compared to local)
- remotely with an OCZ VERTEX SATA II 120 GB as ZIL device: 14 min 40 sec (factor 29.3 compared to local)
- remotely with an Intel X25-E 32 GB as ZIL device: 3 min 11 sec (factor 6.4 compared to local)

So it really makes a difference what type of SSD you use for your ZIL device. I was expecting good performance from the X25-E, but was really surprised that it is that good (only 1.7 times slower than with the ZIL completely disabled). So I will use the X25-E as the ZIL device on my box and will not consider disabling the ZIL at all to improve NFS performance.

-- Peter
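For anyone wanting to repeat the comparison, attaching and detaching a dedicated log device is straightforward. The pool and device names below are hypothetical, and removing a log device requires a reasonably recent build:

  zpool add tank log c4t0d0     # use the SSD as a dedicated ZIL (slog)
  zpool status tank             # the device shows up under a separate "logs" section
  zpool remove tank c4t0d0      # detach the slog again before testing the next device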
Re: [zfs-discuss] Cores vs. Speed?
* Brian (broco...@vt.edu) wrote: > I am Starting to put together a home NAS server that will have the > following roles: > > (1) Store TV recordings from SageTV over either iSCSI or CIFS. Up to > 4 or 5 HD streams at a time. These will be streamed live to the NAS > box during recording. (2) Playback TV (could be stream being > recorded, could be others) to 3 or more extenders (3) Hold a music > repository (4) Hold backups from windows machines, mac (time machine), > linux. (5) Be an iSCSI target for several different Virtual Boxes. > > Function 4 will use compression and deduplication. Function 5 will > use deduplication. > > I plan to start with 5 1.5 TB drives in a raidz2 configuration and 2 > mirrored boot drives. > > I have been reading these forums off and on for about 6 months trying > to figure out how to best piece together this system. > > I am first trying to select the CPU. I am leaning towards AMD because > of ECC support and power consumption. I can't comment on most of your question, but I will point you at: http://blogs.sun.com/mhaywood/entry/powernow_for_solaris I *think* the cpu's you're looking at won't be an issue but just something to be aware of when looking at AMD kit (especially if you want to manage the processor speed). Cheers, -- Glenn ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Cores vs. Speed?
I am starting to put together a home NAS server that will have the following roles:

(1) Store TV recordings from SageTV over either iSCSI or CIFS. Up to 4 or 5 HD streams at a time. These will be streamed live to the NAS box during recording.
(2) Playback TV (could be the stream being recorded, could be others) to 3 or more extenders.
(3) Hold a music repository.
(4) Hold backups from Windows machines, Mac (Time Machine), Linux.
(5) Be an iSCSI target for several different VirtualBoxes.

Function 4 will use compression and deduplication.
Function 5 will use deduplication.

I plan to start with 5 1.5 TB drives in a raidz2 configuration and 2 mirrored boot drives.

I have been reading these forums off and on for about 6 months trying to figure out how to best piece together this system.

I am first trying to select the CPU. I am leaning towards AMD because of ECC support and power consumption.

For items such as de-duplication, compression, checksums etc., is it better to get a faster clock speed or should I consider more cores? I know certain functions such as compression may run on multiple cores.

I have so far narrowed it down to:

AMD Phenom II X2 550 Black Edition Callisto 3.1GHz
and
AMD Phenom X4 9150e Agena 1.8GHz Socket AM2+ 65W Quad-Core

as they are roughly the same price.
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
Hi Ross,

Yes - zdb - is dumping out info in the form of:

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
        19    1    16K    512    512    512  100.00  ZFS plain file
                                        264   bonus  ZFS znode
        dnode flags: USED_BYTES USERUSED_ACCOUNTED
        dnode maxblkid: 0
        path    /snapshot.sh
        uid     0
        gid     0
        atime   Thu Feb  4 23:04:50 2010
        mtime   Thu Feb  4 23:04:50 2010
        ctime   Thu Feb  4 23:04:50 2010
        crtime  Thu Feb  4 23:04:50 2010
        gen     529806
        mode    100755
        size    174
        parent  3
        links
        xattr   0
        rdev    0x

for all objects referenced in the snap. If you wanted to script this, you could parse the above output for timestamps that are after the previous snapshot. Deleted files (and of course new files) can be diffed against the list for the snapshot you want to compare with, but I assume you also want files that have been modified, hence the requirement to parse the above outputs.

Unfortunately time does not permit me to come up with a working solution until then (really snowed under until mid next week - did someone say there is meant to be a weekend in there too?). But I am sure there is enough info here for someone to hack together a script.

Cheers,
Darren Mackay
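A very rough sketch of the kind of script hinted at above - untested, and with hypothetical pool/filesystem/snapshot names; the exact zdb output format also varies between builds:

  # dump path and mtime lines for every object in each snapshot, then compare
  zdb -dddd tank/fs@snap1 | egrep 'path|mtime' > /tmp/snap1.lst
  zdb -dddd tank/fs@snap2 | egrep 'path|mtime' > /tmp/snap2.lst

  # entries present on only one side, or whose mtime differs, are the
  # added/removed/modified files
  diff /tmp/snap1.lst /tmp/snap2.lst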
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
Supermicro USAS-L8i controllers. I agree with you, I'd much rather have the drives respond properly and promptly than save a little power if that means I'm going to get strange errors from the array. And these are the "green" drives, they just don't seem to cause me any problems. The issues people have noted with WD have made me stay away from them as just about every drive I own lives in some kind of RAID sometime in its life. I have a couple laptop drives that are single, all desktops have at least a mirror. I'm a little nuts and would probably install mirrors in the laptops if there were somewhere to put them. :) -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Pool disk replacing fails
Hi all,

I'm trying to replace a broken LUN in a pool using zpool replace -f, but it fails. The physical disk is already replaced, and the new LUN has the same address as the broken one. But zpool detach/attach works. This is a simple configuration:

  pool: mypool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h0m with 0 errors on Thu Feb 4 23:16:21 2010
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            c1t4d0  DEGRADED     0     0    28  too many errors
            c1t5d0  ONLINE       0     0     0

c1t4d0 is the physically replaced LUN. Then I'm trying to replace it in the pool:

  r...@myhost:~# zpool replace -f mypool c1t4d0
  invalid vdev specification
  the following errors must be manually repaired:
  /dev/dsk/c1t4d0s0 is part of active ZFS pool mypool. Please see zpool(1M).

The zpool manual says: "-f  Forces use of new_device, even if it appears to be in use. Not all devices can be overridden in this manner."

c1t4d0 is in use only in mypool. What is the problem with "zpool replace" in my case? According to the zpool manual it should work.

Thank you
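For the archives, the detach/attach sequence mentioned above (which does work here) would be roughly:

  zpool detach mypool c1t4d0          # drop the replaced LUN from the mirror
  zpool attach mypool c1t5d0 c1t4d0   # re-attach it as a new mirror of c1t5d0
  zpool status mypool                 # watch the resilver complete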
Re: [zfs-discuss] unionfs help
On Thu, Feb 04, 2010 at 04:03:19PM -0500, Frank Cusack wrote: > On 2/4/10 2:46 PM -0600 Nicolas Williams wrote: > >In Frank's case, IIUC, the better solution is to avoid the need for > >unionfs in the first place by not placing pkg content in directories > >that one might want to be writable from zones. If there's anything > >about Perl5 (or anything else) that causes this need to arise, then I > >suggest filing a bug. > > Right, and thanks for chiming in. Problem is that perl wants to install > add-on packages in places that the coincide with the system install. > Most stuff is limited to the site_perl directory, which is easily > redirected, but it also has some other locations it likes to meddle with. Maybe we need a zone_perl location. Judicious use of the search paths will get you out of this bind, I think. Nico -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] unionfs help
On 2/4/10 2:46 PM -0600 Nicolas Williams wrote: In Frank's case, IIUC, the better solution is to avoid the need for unionfs in the first place by not placing pkg content in directories that one might want to be writable from zones. If there's anything about Perl5 (or anything else) that causes this need to arise, then I suggest filing a bug. Right, and thanks for chiming in. Problem is that perl wants to install add-on packages in places that the coincide with the system install. Most stuff is limited to the site_perl directory, which is easily redirected, but it also has some other locations it likes to meddle with. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] unionfs help
On Thu, Feb 04, 2010 at 03:19:15PM -0500, Frank Cusack wrote: > BTW, I could just install everything in the global zone and use the > default "inheritance" of /usr into each local zone to see the data. > But then my zones are not independent portable entities; they would > depend on some non-default software installed in the global zone. > > Just wanted to explain why this is valuable to me and not just some > crazy way to do something simple. There's no unionfs for Solaris. (For those of you who don't know, unionfs is a BSDism and is a pseudo-filesystem which presents the union of two underlying filesystems, but with all changes being made only to one of the two filesystems. The idea is that one of the underlying filesystems cannot be modified through the union, with all changes made through the union being recorded in an overlay fs. Think, for example, of unionfs- mounting read-only media containing sources: you could cd to the mount point and build the sources, with all intermediate files and results placed in the overlay.) In Frank's case, IIUC, the better solution is to avoid the need for unionfs in the first place by not placing pkg content in directories that one might want to be writable from zones. If there's anything about Perl5 (or anything else) that causes this need to arise, then I suggest filing a bug. Nico -- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
On 2/4/10 8:21 AM -0500 Ross Walker wrote: Find -newer doesn't catch files added or removed it assumes identical trees. This may be redundant in light of my earlier post, but yes it does. Directory mtimes are updated when a file is added or removed, and find -newer will detect that. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
On 2/4/10 8:00 AM +0100 Tomas Ögren wrote: rsync by default compares metadata first, and only checks through every byte if you add the -c (checksum) flag. I would say rsync is the best tool here. ah, i didn't know that was the default. no wonder recently when i was incremental-rsyncing a few TB of data between 2 hosts (not using zfs) i didn't get any speedup from --size-only or whatever the flag is. The "find -newer blah" suggested in other posts won't catch newer files with an old timestamp (which could happen for various reasons, like being copied with kept timestamps from somewhere else). good point. that is definitely a restriction with find -newer. but if you meet that restriction, and don't need to find added or deleted files, it will be faster since only 1 directory tree has to be walked. but in the general case it does sound like rsync is the best. unless bart can find added and missing files. in which case bart is better because it only has to walk 1 dir tree -- assuming you have a saved manifest from a previous walk over the original dir tree. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] unionfs help
BTW, I could just install everything in the global zone and use the default "inheritance" of /usr into each local zone to see the data. But then my zones are not independent portable entities; they would depend on some non-default software installed in the global zone. Just wanted to explain why this is valuable to me and not just some crazy way to do something simple. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
On 2/4/10 12:39 AM -0500 Ross Walker wrote: On Feb 3, 2010, at 8:59 PM, Frank Cusack wrote: I think you misread the thread. Either find or ddiff will do it and either will be better than rsync. Find can find files that have been added or removed between two directory trees? How? When a file is added or removed in a directory, the directory's mtime is updated. So find -newer will locate those directories. Then of course you need to do a little bit more work to locate the files. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
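A minimal illustration of that approach, assuming a reference file whose mtime marks when the old snapshot was taken (the filesystem path and timestamp below are hypothetical):

  # mark the moment the 'old' snapshot was taken, then look for anything newer
  touch -t 201002030000 /tmp/snap-old-time
  find /tank/fs -newer /tmp/snap-old-time -print
  # directories listed here had entries added or removed; files listed were modified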
Re: [zfs-discuss] unionfs help
On February 4, 2010 12:12:04 PM +0100 dick hoogendijk wrote: Why don't you just export that directory with NFS (rw) to your sparse zone and mount it on /usr/perl5/mumble ? Or is this too simple a thought? On February 4, 2010 1:41:20 PM +0100 Thomas Maier-Komor wrote: What about lofs? I thinks lofs is the equivalent for unionfs on Solaris. The problem with both of those solutions is a) writes will overwrite the original filesystem data and b) writes will be visible to everyone else. Neither suggestion provides unionfs capability. On February 4, 2010 12:12:18 PM + Peter Tribble wrote: The way I normally do this is to (in the global zone) symlink /usr/perl5/mumble to somewhere that would be writable such as /opt, and then put what you need into that location in the zone. Leaves a dangling symlink in the global zone and other zones, but that's relatively harmless. The problem with that is you don't see the underlying data that exists in the global zone. I do use that technique for other data (e.g. the entire /usr/local hierarchy), but it doesn't meet my desired needs in this case. I looked into clones (and at least now I understand them much better than before) and they *almost* provide the functionality I want. I could mount a clone in the zoned version of /foo and it would see the original /foo, and changes would go to the clone only, just like a real unionfs. What it's lacking though is that when the underlying filesystem changes (in the global zone), those changes don't percolate up to the clone. The clone's base view of files is from the snapshot it was generated from, which cannot change. It would be great if you could re-target (or re-base?) a clone from a different snapshot than the one it was originally generated from. Since I don't need realtime updates, for my purposes that would be a great equivalent to a true unionfs. So the thread on zfs diff gave me an idea; I will use clones and will write a 'zfs diff'-like tool. When the original /usr/perl5/mumble changes I will use that to pick out files that are different in the clone and populate a new clone with them. -frank ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
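A bare-bones sketch of that clone arrangement, assuming the directory in question is backed by its own dataset (all names below are made up for illustration):

  # snapshot the global zone's perl tree and hang a writable clone off it
  zfs snapshot rpool/usr-perl5@base
  zfs clone rpool/usr-perl5@base rpool/zone1-perl5
  zfs set mountpoint=/zones/zone1/root/usr/perl5 rpool/zone1-perl5

Changes made inside the zone land in the clone only; the catch, as noted above, is that the clone keeps seeing the @base snapshot even after the global zone's copy moves on.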
Re: [zfs-discuss] ZFS compression on Clearcase
On 04/02/2010 12:42, Darren J Moffat wrote: On 04/02/2010 12:13, Roshan Perera wrote: Hi Darren, Thanks - IBM basically haven't test clearcase with ZFS compression therefore, they don't support currently. Future may change, as such my customer cannot use compression. I have asked IBM for roadmap info to find whether/when it will be supported. That is FUD generation in my opinion and being overly cautious. The whole point of the POSIX interfaces to a filesystem is that applications don't actually care how the filesystem stores their data. I agree (*). It is very similar to what EMC did some years ago by officially stating that while ZFS is supported on their disk arrays ZFS snapshots are not. Even more "funny". (*) - however compression is not entirely transparent in such a sense that a reported disk space usage might not be exactly what application expects. But I'm not saying it is an issue here - I honestly don't know. -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
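On the space-reporting point, the effect of compression is at least easy to observe from the filesystem itself (the dataset name below is just an example):

  zfs get compression,compressratio,used,available tank/clearcase
  du -sk /tank/clearcase            # du reports the compressed (allocated) size
  ls -l /tank/clearcase/somefile    # ls reports the logical file size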
Re: [zfs-discuss] ZFS compression on Clearcase
On 4 Feb 2010, at 16:35, Bob Friesenhahn wrote:
> On Thu, 4 Feb 2010, Darren J Moffat wrote:
>>> Thanks - IBM basically haven't tested clearcase with ZFS compression, therefore they don't support it currently. Future may change; as such my customer cannot use compression. I have asked IBM for roadmap info to find out whether/when it will be supported.
>>
>> That is FUD generation in my opinion and being overly cautious. The whole point of the POSIX interfaces to a filesystem is that applications don't actually care how the filesystem stores their data.
>
> Clearcase itself implements a versioning filesystem so perhaps it is not being overly cautious. Compression could change aspects such as how free space is reported.

I'd also like to echo Bob's observations here. Darren's claim of FUD is based on limited experience of ClearCase, I expect ...

On the client side, ClearCase actually presents itself as a mounted filesystem, regardless of what the OS has under the covers. In other words, a ClearCase directory will never be 'ZFS' because it's not ZFS, it's ClearCaseFS.

On the server side (which might be the case here), the way ClearCase works is to represent the files and contents in a way more akin to a database (e.g. Oracle) than traditional file-system approaches to data (e.g. CVS, SVN). In much the same way there are app-specific issues with ZFS (e.g. matching block sizes, dealing with ZFS snapshots on a VM image and so forth), there may well be some with ClearCase.

At the very least, though, IBM may just be unable or unwilling to test it at the time and put their stamp of approval on it. In many cases for IBM products there are supported platforms (often with specific patch levels), much like there are officially supported Solaris platforms and hot-fixes for certain applications. They may well just be being cautious until they've had time to test it out for themselves - or more likely, until the first set of paying customers wants to get invoiced for the investigation. But to claim it's FUD without any real data to back it up is just FUD^2.

Alex
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
Hi again, thanks for the answer. Another thing that came to my mind is that you mentioned that you mixed the disks among the controllers. Does that mean you mixed them as well among pools? Unsurprisingly, the WD20EADS is slower than the Hitachi that is a fixed 7200 rpm drive. I wonder what impact that would have if you use them as vdevs of the same pool. Cheers, Tonmaus -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to get a list of changed files between twosnapshots?
>>> Richard Elling 2/3/2010 6:06 PM >>> On Feb 3, 2010, at 3:46 PM, Ross Walker wrote: > On Feb 3, 2010, at 12:35 PM, Frank Cusack > wrote: > > So was there a final consensus on the best way to find the difference between > two snapshots (files/directories added, files/directories deleted and > file/directories changed)? > > Find won't do it, ddiff won't do it, I think the only real option is rsync. > Of course you can zfs send the snap to another system and do the rsync there > against a local previous version. bart(1m) is designed to do this. -- richard Unless something has changed in the past couple months, bart(1m) does not work on large filesystems (2TB limit, I think). http://opensolaris.org/jive/message.jspa?messageID=433896#433896 My solution to this was rsync in dry-run mode between two snapshot directories, which runs in a few seconds and lists both added/changed files and deleted files. http://opensolaris.org/jive/message.jspa?messageID=434176#434176 -Andrew ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
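Concretely, the dry-run comparison looks something like this (snapshot names are illustrative):

  # compare two snapshots without copying anything; files added or changed in 'new'
  # are listed as transfers, and --delete makes files removed in 'new' show up too
  rsync -avn --delete /tank/fs/.zfs/snapshot/new/ /tank/fs/.zfs/snapshot/old/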
Re: [zfs-discuss] What happens when: file-corrupted and no-redundancy?
On 03/02/2010 21:45, Aleksandr Levchuk wrote: Hardware RAID6 + hot spare, worked well for us. So, I wanted to stick our SAN for data protection. I understand that the end-to-end checks of ZFS make it better at detecting corruptions. In my case, I can imagine that ZFS would FREEZ the whole volume when a single block or file is found to be corrupted. Ideally, I would not like this to happen and instead get a log with names of corrupted files. What exactly does happens when zfs detects a corrupted block/file and does not have redundancy to correct it? Alex I will repeat myself (as I sent below email just yesterday...) ZFS won't freeze a pool if a single block is corrupted even if no redundancy is configured on zfs level. zpool status -v should provide you with list of affected files which you should be able to delete. In case of corrupted block containg meta-data zfs should actually be able to fix it on the fly for you as all meta-data related blocks are kept in at least two copies even if no redundancy is configured at pool level. Let's test it: mi...@r600:~# mkfile 128m file1 mi...@r600:~# zpool create test `pwd`/file1 mi...@r600:~# zpool status test pool: test state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM testONLINE 0 0 0 /export/home/milek/file1 ONLINE 0 0 0 errors: No known data errors mi...@r600:~# mi...@r600:~# cp /bin/bash /test/file1 mi...@r600:~# cp /bin/bash /test/file2 mi...@r600:~# cp /bin/bash /test/file3 mi...@r600:~# cp /bin/bash /test/file4 mi...@r600:~# cp /bin/bash /test/file5 mi...@r600:~# cp /bin/bash /test/file6 mi...@r600:~# cp /bin/bash /test/file7 mi...@r600:~# cp /bin/bash /test/file8 mi...@r600:~# cp /bin/bash /test/file9 mi...@r600:~# sync mi...@r600:~# dd if=/dev/zero of=file1 seek=50 count=1 conv=notrunc 1+0 records in 1+0 records out 512 bytes (5.1 MB) copied, 0.179617 s, 28.5 MB/s mi...@r600:~# sync mi...@r600:~# zpool scrub test mi...@r600:~# zpool status -v test pool: test state: DEGRADED status: One or more devices has experienced an error resulting in data corruption. Applications may be affected. action: Restore the file in question if possible. Otherwise restore the entire pool from backup. see: http://www.sun.com/msg/ZFS-8000-8A scrub: scrub completed after 0h0m with 7 errors on Thu Feb 4 00:18:40 2010 config: NAMESTATE READ WRITE CKSUM testDEGRADED 0 0 7 /export/home/milek/file1 DEGRADED 0 029 too many errors errors: Permanent errors have been detected in the following files: /test/file1 mi...@r600:~# mi...@r600:~# rm /test/file1 mi...@r600:~# sync mi...@r600:~# zpool scrub test mi...@r600:~# zpool status -v test pool: test state: DEGRADED status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'. 
see: http://www.sun.com/msg/ZFS-8000-9P scrub: scrub completed after 0h0m with 0 errors on Thu Feb 4 00:19:55 2010 config: NAMESTATE READ WRITE CKSUM testDEGRADED 0 0 7 /export/home/milek/file1 DEGRADED 0 029 too many errors errors: No known data errors mi...@r600:~# zpool clear test mi...@r600:~# zpool scrub test mi...@r600:~# zpool status -v test pool: test state: ONLINE scrub: scrub completed after 0h0m with 0 errors on Thu Feb 4 00:20:12 2010 config: NAMESTATE READ WRITE CKSUM testONLINE 0 0 0 /export/home/milek/file1 ONLINE 0 0 0 errors: No known data errors mi...@r600:~# mi...@r600:~# ls -la /test/ total 7191 drwxr-xr-x 2 root root 10 2010-02-04 00:19 . drwxr-xr-x 28 root root 30 2010-02-04 00:17 .. -r-xr-xr-x 1 root root 799040 2010-02-04 00:17 file2 -r-xr-xr-x 1 root root 799040 2010-02-04 00:17 file3 -r-xr-xr-x 1 root root 799040 2010-02-04 00:17 file4 -r-xr-xr-x 1 root root 799040 2010-02-04 00:17 file5 -r-xr-xr-x 1 root root 799040 2010-02-04 00:17 file6 -r-xr-xr-x 1 root root 799040 2010-02-04 00:17 file7 -r-xr-xr-x 1 root root 799040 2010-02-04 00:18 file8 -r-xr-xr-x 1 root root 799040 2010-02-04 00:18 file9 mi...@r600:~# -- Robert Milkowski htpp://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] What would happen with a zpool if you 'mirrored' a disk...
On 04/02/2010 13:45, Karl Pielorz wrote:
> --On 04 February 2010 11:31 + Karl Pielorz wrote:
>> What would happen when I tried to 'online' ad2 again?
> A reply to my own post... I tried this out, when you make 'ad2' online again, ZFS immediately logs a 'vdev corrupt' failure, and marks 'ad2' (which at this point is a byte-for-byte copy of 'ad1' as it was being written to in background) as 'FAULTED' with 'corrupted data'. You can't "replace" it with itself at that point, but a detach on ad2, and then attaching ad2 back to ad1 results in a resilver, and recovery. So to answer my own question - from my tests it looks like you can do this, and "get away with it". It's probably not ideal, but it does work.

It is actually fine - ZFS is designed to detect and fix exactly the kind of corruption you induced.

> A safer bet would be to detach the drive from the pool, and then re-attach it (at which point ZFS assumes it's a new drive and probably ignores the 'mirror image' data that's on it).

Yes, it should, and if you want to force a full resynchronization that's probably the best way to do it. Another point: if you only suspect that some of your data is corrupted on one half of the mirror, you might instead run 'zpool scrub', as it will repair only the corrupted blocks rather than resynchronizing the entire mirror, which might be a faster and safer approach.

-- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
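[Editor's aside] To make the two recovery paths Robert describes concrete, here is a minimal hedged sketch; the pool and device names (vol, ad1, ad2) are the ones from Karl's example elsewhere in this thread, not a transcript of his session:

# Option 1: scrub - repairs only blocks whose checksums fail, using the good half
zpool scrub vol
zpool status -v vol        # watch the CKSUM counts and "scrub completed" line

# Option 2: force a full resync of the suspect half
zpool detach vol ad2       # drop the stale/overwritten disk from the mirror
zpool attach vol ad1 ad2   # re-attach it; ZFS treats it as new and resilvers
zpool status vol           # shows "resilver in progress"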
Re: [zfs-discuss] [ha-clusters-discuss] data corruption
putting storage-discuss@ and zfs-discuss@ as well. On 04/02/2010 16:33, Robert Milkowski wrote: Hi, S10, SC3.2 + patches, Generic_142900-03, 2x T5220 with QLE2462 connected to 6540s. We started to observe below messages yesterday at both nodes at the same time after several weeks of running: XXX cl_runtime: [ID 856360 kern.warning] WARNING: QUORUM_GENERIC: quorum_read_keys error: Reading the registration keys failed on quorum device /dev/did/rdsk/d7s2 with error 22. XXX cl_runtime: [ID 868277 kern.warning] WARNING: CMM: Erstwhile online quorum device /dev/did/rdsk/d7s2 (qid 1) is inaccessible now. d7 is a quorum device and it was marked by cluster as offline: # clq status === Cluster Quorum === --- Quorum Votes Summary from latest node reconfiguration --- Needed Present Possible -- --- 23 3 --- Quorum Votes by Node (current status) --- Node Name Present Possible Status - --- -- XXX 1 1Online YYY 1 1Online --- Quorum Votes by Device (current status) --- Device Name Present Possible Status --- --- -- d701 Offline By looking at the source code I found that the above message is printed from within quorum_device_generic_impl::quorum_read_keys() and it will only happen if quorum_pgre_key_read() returns with return code 22 (actually any other than 0 or EACCESS but we already know that the rc is 22 from the syslog message). Now quorum_pgre_key_read() calls quorum_scsi_sector_read() and passes its return code as its own. The quorum_scsi_sector_read() can possibly return with error if quorum_ioctl_with_retries() return with error or if there is a checksum mismatch. This is the relevant source code: 406 int 407 quorum_scsi_sector_read( [...] 449error = quorum_ioctl_with_retries(vnode_ptr, USCSICMD, (intptr_t)&ucmd, 450&retval); 451if (error != 0) { 452CMM_TRACE(("quorum_scsi_sector_read: ioctl USCSICMD " 453"returned error (%d).\n", error)); 454kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH); 455return (error); 456} 457 458// 459// Calculate and compare the checksum if check_data is true. 460// Also, validate the pgres_id string at the beg of the sector. 461// 462if (check_data) { 463PGRE_CALCCHKSUM(chksum, sector, iptr); 464 465// Compare the checksum. 466if (PGRE_GETCHKSUM(sector) != chksum) { 467CMM_TRACE(("quorum_scsi_sector_read: " 468"checksum mismatch.\n")); 469kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH); 470return (EINVAL); 471} 472 473// 474// Validate the PGRE string at the beg of the sector. 475// It should contain PGRE_ID_LEAD_STRING[1|2]. 476// 477if ((os::strncmp((char *)sector->pgres_id, PGRE_ID_LEAD_STRING1, 478strlen(PGRE_ID_LEAD_STRING1)) != 0)&& 479(os::strncmp((char *)sector->pgres_id, PGRE_ID_LEAD_STRING2, 480strlen(PGRE_ID_LEAD_STRING2)) != 0)) { 481CMM_TRACE(("quorum_scsi_sector_read: pgre id " 482"mismatch. 
The sector id is %s.\n", 483sector->pgres_id)); 484kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH); 485return (EINVAL); 486} 487 488} 489kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH); 490 491return (error); 492 } 56 -> __1cXquorum_scsi_sector_read6FpnFvnode_LpnLpgre_sector_b_i_ 6308555744942019 enter 56-> __1cZquorum_ioctl_with_retries6FpnFvnode_ilpi_i_ 6308555744957176 enter 56<- __1cZquorum_ioctl_with_retries6FpnFvnode_ilpi_i_ 6308555745089857 rc: 0 56-> __1cNdbg_print_bufIdbprintf6MpcE_v_ 6308555745108310 enter 56 -> __1cNdbg_print_bufLdbprintf_va6Mbpcrpv_v_ 6308555745120941 enter 56-> __1cCosHsprintf6FpcpkcE_v_ 6308555745134231 enter 56<- __1cCosHsprintf6FpcpkcE_v_ 6308555745148729 rc: 2890607504684 56<- __1cNdbg_print_bufLdbprintf_va6Mbpcrpv_v_ 6308555745162898 rc: 1886718112 56<- __1cNdbg_print_bufIdbprintf6MpcE_v_ 6308555745175529 rc: 1886718112 56<- __1cXquorum_scsi_sector_read6FpnFvnode_LpnLpgre_sector_b_i_ 6308555745188599 rc:
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
On 04/02/10 16:57, Tonmaus wrote:
> Hi Arnaud, which type of controller is this? Regards, Tonmaus

I use two LSI SAS3081E-R in each server (16 hard disk trays, passive backplane AFAICT, no expander). Works very well. Arnaud ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS compression on Clearcase
On Thu, 4 Feb 2010, Darren J Moffat wrote:
>> Thanks - IBM basically haven't tested ClearCase with ZFS compression and therefore don't currently support it. That may change in future, but as it stands my customer cannot use compression. I have asked IBM for roadmap info to find out whether/when it will be supported.
> That is FUD generation in my opinion and being overly cautious. The whole point of the POSIX interfaces to a filesystem is that applications don't actually care how the filesystem stores their data.

Clearcase itself implements a versioning filesystem, so perhaps it is not being overly cautious. Compression could change aspects such as how free space is reported. As I recall, Clearcase maintains a database (on top of a filesystem) on a central server to store the actual data. When a user checks out a view of the files, the user views the files via a versioning filesystem, which stores a cache of those files on the local system. Clearcase instruments access to its versioning filesystem so it knows all of the actions which resulted in a built object. This means that there are two places (server and client) where ZFS may be involved. Bob

-- Bob Friesenhahn bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/ GraphicsMagick Maintainer, http://www.GraphicsMagick.org/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] What would happen with a zpool if you 'mirrored' a disk...
--On 04 February 2010 08:58 -0500 Jacob Ritorto wrote:
> Seems your controller is actually doing only harm here, or am I missing something?

The RAID controller presents the drives as both a mirrored pair and as JBOD - *at the same time*... The machine boots off the partition on the 'mirrored' pair - and ZFS uses the JBOD devices (a different area of the disks, of course). It's a little weird to say the least - and I wouldn't recommend it, but it does work 'for me' - and is a way of getting the system to boot off a mirror and still be able to use ZFS with only 2 drives available in the chassis. -Karl ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
Hi Arnaud, which type of controller is this? Regards, Tonmaus -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs/sol10u8 less stable than in sol10u5?
Hi all, it might not be a ZFS issue (and thus be on the wrong list), but maybe there's someone here who can give us a good hint: we are operating 13 x4500s and started to play with non-Sun-blessed SSDs in them. As we were running Solaris 10u5 before and wanted to use the SSDs as log devices, we upgraded to the latest and greatest 10u8 and changed the zpool layout[1]. However, on the first machine we found many, many problems with various disks "failing" in different vdevs (I wrote about this in December on this list, IIRC). After going through this with Sun they gave us hints but mostly blamed (maybe rightfully) the Intel X25-E in there; we considered the 2.5" to 3.5" converter to be at fault as well. Thus we did the next test by placing the SSD into the tray without a conversion unit, but that box (a different one) failed with the same problems.

Now, we "learned" from this experience and did the same to another box but without the SSD, i.e. jumpstarted the box, installed 10u8, redid the zpool and started to fill data in. In today's scrub suddenly this happened:

s09:~# zpool status
  pool: atlashome
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 0h9m, 3.89% done, 4h2m to go
config:

        NAME          STATE     READ WRITE CKSUM
        atlashome     DEGRADED     0     0     0
          raidz1      ONLINE       0     0     0
            c0t0d0    ONLINE       0     0     0
            c1t0d0    ONLINE       0     0     0
            c4t0d0    ONLINE       0     0     0
            c6t0d0    ONLINE       0     0     0
            c7t0d0    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c0t1d0    ONLINE       0     0     0
            c1t1d0    ONLINE       0     0     0
            c4t1d0    ONLINE       0     0     0
            c5t1d0    ONLINE       0     0     0
            c6t1d0    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c7t1d0    ONLINE       0     0     1
            c0t2d0    ONLINE       0     0     0
            c1t2d0    ONLINE       0     0     2
            c4t2d0    ONLINE       0     0     0
            c5t2d0    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c6t2d0    ONLINE       0     0     0
            c7t2d0    ONLINE       0     0     0
            c0t3d0    ONLINE       0     0     0
            c1t3d0    ONLINE       0     0     0
            c4t3d0    ONLINE       0     0     0
          raidz1      DEGRADED     0     0     0
            c5t3d0    ONLINE       0     0     0
            c6t3d0    ONLINE       0     0     0
            c7t3d0    ONLINE       0     0     0
            c1t4d0    ONLINE       0     0     1
            spare     DEGRADED     0     0     0
              c4t4d0  DEGRADED     5     0    11  too many errors
              c0t4d0  ONLINE       0     0     0  5.38G resilvered
          raidz1      ONLINE       0     0     0
            c5t4d0    ONLINE       0     0     0
            c6t4d0    ONLINE       0     0     0
            c7t4d0    ONLINE       0     0     0
            c0t5d0    ONLINE       0     0     0
            c1t5d0    ONLINE       0     0     0
          raidz1      ONLINE       0     0     0
            c4t5d0    ONLINE       0     0     0
            c5t5d0    ONLINE       0     0     0
            c6t5d0    ONLINE       0     0     0
            c7t5d0    ONLINE       0     0     0
            c0t6d0    ONLINE       0     0     1
          raidz1      ONLINE       0     0     0
            c1t6d0    ONLINE       0     0     0
            c4t6d0    ONLINE       0     0     0
            c5t6d0    ONLINE       0     0     0
            c6t6d0    ONLINE       0     0     0
            c7t6d0    ONLINE       0     0     1
          raidz1      ONLINE       0     0     0
            c0t7d0    ONLINE       0     0     0
            c1t7d0    ONLINE       0     0     0
            c4t7d0    ONLINE       0     0     0
            c5t7d0    ONLINE       0     0     0
            c6t7d0    ONLINE       0     0     0
        spares
          c0t4d0      INUSE     currently in use
          c7t7d0      AVAIL

Also similar to the other hosts are the much, much higher soft/hard error counts in iostat:

s09:~# iostat -En|grep Soft
c2t0d0 Soft Errors: 1 Hard Errors: 2 Transport
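[Editor's aside] Not part of the original report, but for readers following along, a hedged sketch of the usual clean-up once the resilver onto the hot spare finishes; the pool and device names are the ones shown in the status output above:

# Either keep the spare permanently in place of the bad disk...
zpool detach atlashome c4t4d0       # c0t4d0 stops being a spare and joins the raidz1
# ...or physically replace c4t4d0 and let the spare return to AVAIL:
zpool replace atlashome c4t4d0      # resilver onto the new disk in the same slot
zpool clear atlashome               # reset the error counters afterwards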
Re: [zfs-discuss] What would happen with a zpool if you 'mirrored' a disk...
On Thu, 4 Feb 2010, Karl Pielorz wrote: The reason for testing this is because of a weird RAID setup I have where if 'ad2' fails, and gets replaced - the RAID controller is going to mirror 'ad1' over to 'ad2' - and cannot be stopped. Does the raid controller not support a JBOD mode? Regards, markm ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?
I think you'll do just fine then. And I think the extra platter will work to your advantage. -marc On 2/3/10, Simon Breden wrote: > Probably 6 in a RAID-Z2 vdev. > > Cheers, > Simon > -- > This message posted from opensolaris.org > ___ > zfs-discuss mailing list > zfs-discuss@opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/zfs-discuss > -- Sent from my mobile device ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] What would happen with a zpool if you 'mirrored' a disk...
Seems your controller is actually doing only harm here, or am I missing something? On Feb 4, 2010 8:46 AM, "Karl Pielorz" wrote: --On 04 February 2010 11:31 + Karl Pielorz wrote: > What would happen... A reply to my own post... I tried this out, when you make 'ad2' online again, ZFS immediately logs a 'vdev corrupt' failure, and marks 'ad2' (which at this point is a byte-for-byte copy of 'ad1' as it was being written to in background) as 'FAULTED' with 'corrupted data'. You can't "replace" it with itself at that point, but a detach on ad2, and then attaching ad2 back to ad1 results in a resilver, and recovery. So to answer my own question - from my tests it looks like you can do this, and "get away with it". It's probably not ideal, but it does work. A safer bet would be to detach the drive from the pool, and then re-attach it (at which point ZFS assumes it's a new drive and probably ignores the 'mirror image' data that's on it). -Karl (The reason for testing this is because of a weird RAID setup I have where if 'ad2' fails, and gets replaced - the RAID controller is going to mirror 'ad1' over to 'ad2' - and cannot be stopped. However, once the re-mirroring is complete the RAID controller steps out the way, and allows raw access to each disk in the mirror. Strange, a long story - but true). ___ zfs-discuss mailing list zfs-disc...@opensolaris.or... ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Booting OpenSolaris on ZFS root on Sun Netra 240
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi, I'm kind of stuck trying to get my aging Netra 240 machine to boot OpenSolaris. The live CD and installation worked perfectly, but when I reboot and try to boot from the installed disk, I get:

Rebooting with command: boot disk0
Boot device: /p...@1c,60/s...@2/d...@0,0  File and args:
| The file just loaded does not appear to be executable.

I suspect it's due to the fact that my OBP can't boot a ZFS root (OpenBoot 4.22.19). Is there a way to work around this? Regards, - -- Saso -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.9 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEARECAAYFAktqz7kACgkQRO8UcfzpOHCqhgCgl8I+5zCTBLb0MUVq9cz5zrqz 9LgAoIurhee3/+nfXtUBwVczkjKxQVaj =7dXF -END PGP SIGNATURE- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
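[Editor's aside] A hedged first step, not from Saso's message: before hunting for a firmware patch it is worth confirming exactly which OBP revision the box runs, since ZFS root boot may require a newer PROM than the 4.22.x reported above. These are standard commands; the output format is illustrative:

# From a running Solaris/OpenSolaris instance; on SPARC this prints the firmware release
prtconf -V
# (the same information is available at the OpenBoot "ok" prompt via the .version word)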
Re: [zfs-discuss] What would happen with a zpool if you 'mirrored' a disk...
--On 04 February 2010 11:31 + Karl Pielorz wrote: What would happen when I tried to 'online' ad2 again? A reply to my own post... I tried this out, when you make 'ad2' online again, ZFS immediately logs a 'vdev corrupt' failure, and marks 'ad2' (which at this point is a byte-for-byte copy of 'ad1' as it was being written to in background) as 'FAULTED' with 'corrupted data'. You can't "replace" it with itself at that point, but a detach on ad2, and then attaching ad2 back to ad1 results in a resilver, and recovery. So to answer my own question - from my tests it looks like you can do this, and "get away with it". It's probably not ideal, but it does work. A safer bet would be to detach the drive from the pool, and then re-attach it (at which point ZFS assumes it's a new drive and probably ignores the 'mirror image' data that's on it). -Karl (The reason for testing this is because of a weird RAID setup I have where if 'ad2' fails, and gets replaced - the RAID controller is going to mirror 'ad1' over to 'ad2' - and cannot be stopped. However, once the re-mirroring is complete the RAID controller steps out the way, and allows raw access to each disk in the mirror. Strange, a long story - but true). ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
The delete queue and related blocks need further investigation...

r...@osol-dev:/data/zdb-test# zdb -dd data/zdb-test | more
Dataset data/zdb-test [ZPL], ID 641, cr_txg 529804, 24.5K, 6 objects

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  15.0K    16K   18.75  DMU dnode
        -1    1    16K    512     1K    512  100.00  ZFS user/group used
        -2    1    16K    512     1K    512  100.00  ZFS user/group used
         1    1    16K    512     1K    512  100.00  ZFS master node
         2    1    16K    512     1K    512  100.00  ZFS delete queue
         3    1    16K  1.50K     1K  1.50K  100.00  ZFS directory
         4    1    16K    512     1K    512  100.00  ZFS directory
        19    1    16K    512    512    512  100.00  ZFS plain file
        22    1    16K     2K     2K     2K  100.00  ZFS plain file

All the info seems to be there (otherwise, we would not be able to store files at all!!). A *spare time* project for the coming couple of weeks...

Darren -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
Interesting, can you explain what zdb is dumping exactly? I suppose you would be looking for blocks referenced in the snapshot that have a single reference and print out the associated file/ directory name? -Ross On Feb 4, 2010, at 7:29 AM, Darren Mackay wrote: Hi Ross, zdb - f...@snapshot | grep "path" | nawk '{print $2}' Enjoy! Darren Mackay -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
On Feb 4, 2010, at 2:00 AM, Tomas Ögren wrote:
> On 03 February, 2010 - Frank Cusack sent me these 0,7K bytes:
>> On February 3, 2010 12:04:07 PM +0200 Henu wrote:
>>> Is there a possibility to get a list of changed files between two snapshots? Currently I do this manually, using basic file system functions offered by OS. I scan every byte in every file manually and it ^^^
>> On February 3, 2010 10:11:01 AM -0500 Ross Walker wrote:
>>> Not a ZFS method, but you could use rsync with the dry run option to list all changed files between two file systems.
>> That's exactly what the OP is already doing ...
> rsync by default compares metadata first, and only checks through every byte if you add the -c (checksum) flag. I would say rsync is the best tool here. The "find -newer blah" suggested in other posts won't catch newer files with an old timestamp (which could happen for various reasons, like being copied with kept timestamps from somewhere else).

find -newer also doesn't catch files added or removed; it assumes identical trees. I would be interested in comparing ddiff, bart and rsync (local comparison only) to see empirically how they match up.

-Ross ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
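[Editor's aside] A hedged illustration of the rsync dry-run approach discussed above, run locally against the hidden .zfs snapshot directories; the dataset and snapshot names are hypothetical:

# List what changed going from snapshot snap1 to snap2 (rsync's default size/mtime quick-check)
rsync -ain --delete /tank/home/.zfs/snapshot/snap2/ /tank/home/.zfs/snapshot/snap1/
# Add -c to compare file contents byte-for-byte instead (much slower, as noted above)
rsync -acin --delete /tank/home/.zfs/snapshot/snap2/ /tank/home/.zfs/snapshot/snap1/

With -n nothing is copied; the itemized (-i) output simply lists changed files, additions, and (via --delete) removals between the two snapshots.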
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
Looking through some more code... I was a bit premature in my last post - been a long day. Extracting the GUIDs and querying the metadata seems to be the logical route -> I think running a zfs send just to parse the data stream is a lot of overhead when you really only need to traverse the metadata directly. The zdb sources have most of the bits there - just need to unwind the deadlist (this seems to match the number of blocks that have been deleted since the last snap)... Might look into this in the next week or 2 if I have time -> seems like a worthwhile project ;-)

Darren Mackay -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] What happens when: file-corrupted and no-redundancy?
Hardware RAID6 + hot spare worked well for us, so I wanted to stick with our SAN for data protection. I understand that the end-to-end checks of ZFS make it better at detecting corruption. In my case, I can imagine that ZFS would FREEZE the whole volume when a single block or file is found to be corrupted. Ideally, I would not like this to happen and would instead get a log with the names of the corrupted files. What exactly happens when ZFS detects a corrupted block/file and does not have redundancy to correct it? Alex -- --- Aleksandr Levchuk Homepage: http://biocluster.ucr.edu/~alevchuk/ Cell Phone: (951) 368-0004 Bioinformatic Systems and Databases Lab Phone: (951) 905-5232 Institute for Integrative Genome Biology University of California, Riverside --- ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS compression on Clearcase
Hi Darren, I totally agree with you and have raised some of the points mentioned but you have given even more items to pass on. I will update the alias when I hear further. Many Thanks Roshan - Original Message - From: Darren J Moffat Date: Thursday, February 4, 2010 12:42 pm Subject: Re: [zfs-discuss] ZFS compression on Clearcase To: Roshan Perera Cc: zfs-discuss@opensolaris.org > On 04/02/2010 12:13, Roshan Perera wrote: > >Hi Darren, > > > >Thanks - IBM basically haven't test clearcase with ZFS compression > therefore, they don't support currently. Future may change, as such my > customer cannot use compression. I have asked IBM for roadmap info to > find whether/when it will be supported. > > That is FUD generation in my opinion and being overly cautious. The > whole point of the POSIX interfaces to a filesystem is that > applications don't actually care how the filesystem stores their data. > > UFS never had checksums before but ZFS adds those, but that didn't > mean that applications had to be checked because checksums were now > done on the data. > > What if it was the disk drive that was doing the compression ? There > would be similarly no way for the application to actually know that it > is happening. > > What about every other feature we add to ZFS ? Like dedup (which is > a type of compression) - again they app can't tell. Or snapshots - > the app can't tell. > > Thats my opinion though and I know that ISVs can be very cautious > about new features sometimes and overly so when it is far below their > parts of the stack. > > Taking another example it would be like an ISV that supports their > application running over NFS saying they don't support a certain type > of vendors switch in the network because they haven't tested it. > > -- > Darren J Moffat > ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives
On Wed, Feb 03, 2010 at 03:02:21PM -0800, Brandon High wrote:
> Another solution, for a true DIY x4500: BackBlaze has schematics for the 45 drive chassis that they designed available on their website. http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/
> Someone brought it up on the list a few months ago (which is how I know about it) and there was some interesting discussion at that time.

IIRC the consensus was that the vibration dampening was inadequate, the interfaces were oversubscribed, and the disks, not being nearline parts, were too unreliable - but I might be misremembering. I'm still happy with my 16x WD RE4 drives (Linux mdraid RAID 10, CentOS, Oracle, no ZFS). Supermicro does a 36x drive chassis now http://www.supermicro.com/products/chassis/4U/?chs=847 so budget DIY for ZFS is about 72 TByte raw storage with 2 TByte nearline SATA drives. I've had trouble finding internal 2x 2.5" in one 3.5" SSD mounts from Supermicro for hybrid ZFS, but no doubt one could improvise something from the usual ricer supplies. On a smaller scale http://www.supermicro.com/products/chassis/2U/?chs=216 works well with 2.5" Intel SSDs and VelociRaptors. I hope to be able to use one for a hybrid ZFS iSCSI target for VMware, probably with 10 GBit Ethernet.

> There's no way I would use something like this for most installs, but there is definitely some use. Now that opensolaris supports sata pmp, you could use a similar chassis for a zfs pool.

-- Eugen* Leitl leitl http://leitl.org __ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS compression on Clearcase
On 04/02/2010 12:13, Roshan Perera wrote:
> Hi Darren, Thanks - IBM basically haven't tested ClearCase with ZFS compression and therefore don't currently support it. That may change in future, but as it stands my customer cannot use compression. I have asked IBM for roadmap info to find out whether/when it will be supported.

That is FUD generation in my opinion and being overly cautious. The whole point of the POSIX interfaces to a filesystem is that applications don't actually care how the filesystem stores their data.

UFS never had checksums before but ZFS adds those, and that didn't mean that applications had to be checked because checksums were now done on the data.

What if it was the disk drive that was doing the compression? There would similarly be no way for the application to actually know that it is happening.

What about every other feature we add to ZFS? Like dedup (which is a type of compression) - again the app can't tell. Or snapshots - the app can't tell.

That's my opinion, though, and I know that ISVs can be very cautious about new features sometimes, and overly so when it is far below their part of the stack.

Taking another example, it would be like an ISV that supports their application running over NFS saying they don't support a certain vendor's network switch because they haven't tested it.

-- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
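[Editor's aside] To illustrate Darren's point that compression sits entirely below the POSIX layer, a minimal hedged sketch; the dataset name is hypothetical and this is not a ClearCase-specific recipe:

# Enable compression on the dataset holding the VOB storage
zfs set compression=on tank/vobstore
# Applications keep reading and writing ordinary files; only the ZFS accounting reveals the effect
zfs get compression,compressratio tank/vobstore
du -h /tank/vobstore     # allocated space shrinks, the file APIs the application sees do not change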
Re: [zfs-discuss] unionfs help
On 04.02.2010 12:12, dick hoogendijk wrote:
> Frank Cusack wrote:
>> Is it possible to emulate a unionfs with zfs and zones somehow? My zones are sparse zones and I want to make part of /usr writable within a zone. (/usr/perl5/mumble to be exact)
> Why don't you just export that directory with NFS (rw) to your sparse zone and mount it on /usr/perl5/mumble? Or is this too simple a thought?

What about lofs? I think lofs is the equivalent of unionfs on Solaris. E.g.

mount -F lofs /original/path /my/alternate/mount/point

- Thomas ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
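[Editor's aside] A hedged sketch of how Thomas's lofs idea is usually wired into a sparse zone's configuration instead of mounted by hand; the zone name and the writable source directory are hypothetical:

# In the global zone: carve out a writable directory and loop it into the zone
mkdir -p /export/zones/myzone-perl5-mumble
zonecfg -z myzone
  add fs
    set dir=/usr/perl5/mumble                     # where it appears inside the zone
    set special=/export/zones/myzone-perl5-mumble
    set type=lofs
    add options [rw]                              # /usr is otherwise read-only in a sparse zone
  end
exit
zoneadm -z myzone reboot                          # reboot the zone so the mount takes effect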
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
Hi Ross, zdb - f...@snapshot | grep "path" | nawk '{print $2}' Enjoy! Darren Mackay -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
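[Editor's aside] The flag and dataset name in Darren's one-liner were mangled by the list archiver; a hedged reconstruction, assuming a hypothetical dataset tank/fs and zdb's verbose dataset dump, which prints a 'path' line for each ZPL file object:

zdb -dddd tank/fs@snapshot | grep -w path | nawk '{print $2}'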
Re: [zfs-discuss] ZFS compression on Clearcase
Hi Darren, Thanks - IBM basically haven't tested ClearCase with ZFS compression and therefore don't currently support it. That may change in future, but as it stands my customer cannot use compression. I have asked IBM for roadmap info to find out whether/when it will be supported.

Thanks Roshan

- Original Message - From: Darren J Moffat Date: Thursday, February 4, 2010 11:59 am Subject: Re: [zfs-discuss] ZFS compression on Clearcase To: Roshan Perera Cc: zfs-discuss@opensolaris.org

> On 04/02/2010 11:54, Roshan Perera wrote:
> > Anyone in the group using ZFS compression on clearcase vobs? If so any issues, gotchas?
>
> There shouldn't be any issues and I'd be very surprised if there were.
>
> > IBM support informs that ZFS compression is not supported. Any views on this?
>
> Need more data on why they claim it isn't supported - what issue have they seen, or do they think there could be one? I see no reason that ZFS compression wouldn't be supported; in fact Clearcase shouldn't even be able to tell.
>
> Compression in ZFS is completely below the POSIX filesystem layer and completely out of the control of any application or even kernel service like NFS or CIFS that just uses POSIX interfaces. The same is true of deduplication and will be true of encryption when it integrates as well.
>
> -- Darren J Moffat

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] unionfs help
On Thu, Feb 4, 2010 at 2:09 AM, Frank Cusack wrote: > Is it possible to emulate a unionfs with zfs and zones somehow? My zones > are sparse zones and I want to make part of /usr writable within a zone. > (/usr/perl5/mumble to be exact) > > I can't just mount a writable directory on top of /usr/perl5 because then > it hides all the stuff in the global zone. I could repopulate it in the > local zone but ugh that is unattractive. I'm hoping for a better way. > Creating a full zone is not an option for me. > > I don't think this is possible but maybe someone else knows better. I > was thinking something with snapshots and clones? The way I normally do this is to (in the global zone) symlink /usr/perl5/mumble to somewhere that would be writable such as /opt, and then put what you need into that location in the zone. Leaves a dangling symlink in the global zone and other zones, but that's relatively harmless. -- -Peter Tribble http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
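[Editor's aside] A hedged sketch of Peter's symlink approach; the target location under /opt is hypothetical (move any existing /usr/perl5/mumble aside first):

# In the global zone: point the path at a writable location (target need not exist here)
ln -s /opt/perl5-mumble /usr/perl5/mumble   # dangling symlink in the global zone - harmless
# Inside each sparse zone that needs it: create and populate the target
mkdir -p /opt/perl5-mumble
cp -r /path/to/your/modules/. /opt/perl5-mumble/   # hypothetical payload
# /usr/perl5/mumble now resolves, inside that zone, to the zone's own /opt copy;
# zones that never create the target simply keep the dangling link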
Re: [zfs-discuss] ZFS compression on Clearcase
On 04/02/2010 11:54, Roshan Perera wrote:
> Anyone in the group using ZFS compression on clearcase vobs? If so any issues, gotchas?

There shouldn't be any issues and I'd be very surprised if there were.

> IBM support informs that ZFS compression is not supported. Any views on this?

Need more data on why they claim it isn't supported - what issue have they seen, or do they think there could be one? I see no reason that ZFS compression wouldn't be supported; in fact Clearcase shouldn't even be able to tell.

Compression in ZFS is completely below the POSIX filesystem layer and completely out of the control of any application or even kernel service like NFS or CIFS that just uses POSIX interfaces. The same is true of deduplication and will be true of encryption when it integrates as well.

-- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS compression on Clearcase
Hi All, Anyone in the group using ZFS compression on clearcase vobs? If so any issues, gotchas? IBM support informs that ZFS compression is not supported. Any views on this? Rgds Roshan ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] What would happen with a zpool if you 'mirrored' a disk...
Hi All, I've been using ZFS for a while now - and everything's been going well. I use it under FreeBSD - but the answer to this question should almost certainly be the same whether it's FreeBSD or Solaris (I think/hope :)...

Imagine I have a zpool with 2 disks in it, that are mirrored:

        NAME      STATE     READ WRITE CKSUM
        vol       ONLINE       0     0     0
          mirror  ONLINE       0     0     0
            ad1   ONLINE       0     0     0
            ad2   ONLINE       0     0     0

(The device names are FreeBSD disks.) If I offline 'ad2' - and then did:

dd if=/dev/ad1 of=/dev/ad2

(i.e. make a mirror copy of ad1 to ad2 - on a *running* system), what would happen when I tried to 'online' ad2 again?

I fully expect it might not be pleasant... I'm just curious as to what's going to happen. When I 'online' ad2, will ZFS look at it and be clever enough to figure out the disk is obviously corrupt/unusable/has bad metadata on it - and resilver accordingly? Or is it going to see what it thinks is another 'ad1' and get a little upset?

I'm trying to set up something here so I can test what happens - I just thought I'd ask around a bit to see if anyone knows what'll happen from past experience. Thanks, -Karl ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] unionfs help
Frank Cusack wrote: > Is it possible to emulate a unionfs with zfs and zones somehow? My zones > are sparse zones and I want to make part of /usr writable within a zone. > (/usr/perl5/mumble to be exact) Why don't you just export that directory with NFS (rw) to your sparse zone and mount it on /usr/perl5/mumble ? Or is this too simple a thought? -- Dick Hoogendijk -- PGP/GnuPG key: F86289CE ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
Henu wrote: So do you mean I cannot gather the names and locations of changed/created/removed files just by analyzing a stream of (incremental) zfs_send? That's correct, you can't. Snapshots do not work at the file level. -- Ian. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
Whoa! That is exactly what I've been looking for. Is there any development version publicly available for testing?

Regards, Henrik Heino

Quoting Matthew Ahrens :
> This is RFE 6425091 "want 'zfs diff' to list files that have changed between snapshots", which covers both file & directory changes, and file removal/creation/renaming. We actually have a prototype of zfs diff. Hopefully someday we will finish it up... --matt

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
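[Editor's aside] For readers of the archive: zfs diff did eventually ship in later OpenSolaris/Illumos builds. A hedged sketch of its usage there (the prototype discussed above may have differed, and the dataset names are hypothetical):

# List files changed between two snapshots of a dataset
zfs diff tank/home@monday tank/home@tuesday
# Output is one line per change, prefixed M (modified), + (added), - (removed) or R (renamed)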
Re: [zfs-discuss] How to get a list of changed files between two snapshots?
So do you mean I cannot gather the names and locations of changed/created/removed files just by analyzing a stream of (incremental) zfs_send? Quoting Andrey Kuzmin : On Wed, Feb 3, 2010 at 6:11 PM, Ross Walker wrote: On Feb 3, 2010, at 9:53 AM, Henu wrote: Okay, so first of all, it's true that send is always fast and 100% reliable because it uses blocks to see differences. Good, and thanks for this information. If everything else fails, I can parse the information I want from send stream :) But am I right, that there is no other methods to get the list of changed files other than the send command? At zfs_send level there are no files, just DMU objects (modified in some txg which is the basis for changed/unchanged decision). ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Workaround for mpt timeouts in snv_127
Hi Simon

> I.e. you'll have to manually intervene if a consumer drive causes the system to hang, and replace it, whereas the RAID edition drives will probably report the error quickly and then ZFS will rewrite the data elsewhere, and thus maybe not kick the drive.

IMHO the relevant aspects are whether ZFS is able to give an accurate account of cache flush status and whether it even realises when a drive is not responsive. That being said, I have not seen a specific report that ZFS kicks green drives at random or in a pattern, the way the poor SoHo storage enclosure users report all the time.

> So it sounds preferable to have TLER in operation, if one can find a consumer-priced drive that allows it, or just take the hit and go with whatever non-TLER drive you choose and expect to have to manually intervene if a drive plays up. OK for a home user who is not too affected, but not good for businesses which need to have something recovered quickly.

One point about TLER is that two error-correction schemes compete when you run a consumer drive on an active RAID controller that has its own mechanisms. When you run ZFS on a RAID controller, contrary to the best-practice recommendations, an analogous question arises. On the other hand, if you run a green consumer drive on a dumb HBA, I wouldn't know what is wrong with that in the first place. As for manual interventions, the only one I am aware of would be to re-attach a single drive. Not an option if you are really affected like those miserable Thecus N7000 users who see an entire array of only a handful of drives drop out within hours - over and over again - or don't even get to finish formatting the stripe set. The dire consequences of the gossiped TLER problems lead me to believe that there would be many more, and quite specific, reports in this place if this were a systematic issue with ZFS. Other than that, we are operating outside supported specs when running consumer-level drives in large arrays. That, at least, is the perspective of Seagate and WD so far.

>> That all rather points to singular issues with firmware bugs or similar than to a systematic issue, doesn't it?
> I'm not sure. Some people in the WDC threads seem to report problems with pauses during media streaming etc.

This was again for SoHo storage enclosures - not for ZFS, right?

> when the 32MB+ cache is empty, then it loads another 32MB into cache etc and so on?

I am not sure any current disk has cache management so simplistic that it relies on completely cycling the buffer content, let alone for reads that belong to a single file (a disk is basically agnostic of files). Moreover, such buffer management would be completely useless for a striped array. I don't know much better what a disk cache does either, but I am afraid that direction is probably not helpful for understanding certain phenomena people have reported.

I think that at this time we are seeing quite a large number of changes in disk storage, where many established assumptions are being abandoned while backwards compatibility is not always taken care of. SAS 6G (will my controller really work in a PCIe 1.1 slot?) and 4k sectors are certainly only prominent examples. It probably makes more sense than ever to fall back on established technologies in such times, including biting the bullet of a cost premium on occasion.
Best regards Tonmaus -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Large scale ZFS deployments out there (>200 disks)
We got 50+ X4500/X4540s running happily with ZFS in the same DC. Approximately 2500 drives and growing every day...

Br Mertol

Mertol Ozyoney Storage Practice - Sales Manager Sun Microsystems, TR Istanbul TR Phone +902123352200 Mobile +905339310752 Fax +90212335 Email mertol.ozyo...@sun.com

-Original Message- From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Henrik Johansen Sent: Friday, January 29, 2010 10:45 AM To: zfs-discuss@opensolaris.org Subject: Re: [zfs-discuss] Large scale ZFS deployments out there (>200 disks)

On 01/28/10 11:13 PM, Lutz Schumann wrote:
> While thinking about ZFS as the next generation filesystem without limits I am wondering if the real world is ready for this kind of incredible technology ...
> I'm actually speaking of hardware :)
> ZFS can handle a lot of devices. Once the import bug (http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6761786) is fixed it should be able to handle a lot of disks.

That was fixed in build 125.

> I want to ask the ZFS community and users what large scale deployments are out there. How many disks? How much capacity? Single pool or many pools on a server? How does resilver work in those environments? How do you back up? What is the experience so far? Major headaches?
> It would be great if large scale users would share their setups and experiences with ZFS.

The largest ZFS deployment that we have is currently comprised of 22 Dell MD1000 enclosures (330 x 750 GB Nearline SAS disks). We have 3 head nodes and use one zpool per node, comprised of rather narrow (5+2) RAIDZ2 vdevs. This setup is exclusively used for storing backup data.

Resilver times could be better - I am sure that this will improve once we upgrade from S10u9 to 2010.03. One of the things that I am missing in ZFS is the ability to prioritize background operations like scrub and resilver. All our disks are idle during daytime and I would love to be able to take advantage of this, especially during resilver operations.

This setup has been running for about a year with no major issues so far. The only hiccups we've had were all HW related (no fun in firmware-upgrading 200+ disks).

> Will you ? :) Thanks, Robert

-- Med venlig hilsen / Best Regards Henrik Johansen hen...@scannet.dk Tlf. 75 53 35 00 ScanNet Group A/S ScanNet ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Dedup memory overhead
Sorry for the late answer. Approximately it's 150 bytes per individual block, so increasing the blocksize is a good idea. Also, when the L1 and L2 ARC are not enough the system will start issuing disk IOPS, and RAID-Z is not very effective for random IOPS, so it's likely that when your DRAM is not enough your performance will suffer. You may choose to use RAID 10, which is a lot better under random loads.

Mertol

Mertol Ozyoney Storage Practice - Sales Manager Sun Microsystems, TR Istanbul TR Phone +902123352200 Mobile +905339310752 Fax +90212335 Email mertol.ozyo...@sun.com

-Original Message- From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of erik.ableson Sent: Thursday, January 21, 2010 6:05 PM To: zfs-discuss Subject: [zfs-discuss] Dedup memory overhead

Hi all, I'm going to be trying out some tests using b130 for dedup on a server with about 1,7 TB of useable storage (14x146 in two raidz vdevs of 7 disks). What I'm trying to get a handle on is how to estimate the memory overhead required for dedup on that amount of storage. From what I gather, the dedup hash keys are held in ARC and L2ARC and as such are in competition for the available memory. So the question is how much memory or L2ARC would be necessary to ensure that I'm never going back to disk to read out the hash keys. Better yet would be some kind of algorithm for calculating the overhead. E.g. an averaged block size of 4K = a hash key for every 4K stored, and a hash occupies 256 bits. An associated question is then how does the ARC handle competition between hash keys and regular ARC functions?

Based on these estimations, I think that I should be able to calculate the following:

1,7 TB
1740,8 GB
1782579,2 MB
1825361100,8 KB
4             average block size
456340275,2   blocks
256           hash key size - bits
1,16823E+11   hash key overhead - bits
1460206,4     hash key size - bytes
14260633,6    hash key size - KB
13926,4       hash key size - MB
13,6          hash key overhead - GB

Of course the big question on this will be the average block size - or better yet - to be able to analyze an existing datastore to see just how many blocks it uses and what the current distribution of different block sizes is. I'm currently playing around with zdb with mixed success on extracting this kind of data. That's also a worst case scenario since it's counting really small blocks and using 100% of available storage - highly unlikely.

# zdb -ddbb siovale/iphone
Dataset siovale/iphone [ZPL], ID 2381, cr_txg 3764691, 44.6G, 99 objects

ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  57.0K    64K   77.34  DMU dnode
         1    1    16K     1K  1.50K     1K  100.00  ZFS master node
         2    1    16K    512  1.50K    512  100.00  ZFS delete queue
         3    2    16K    16K  18.0K    32K  100.00  ZFS directory
         4    3    16K   128K   408M   408M  100.00  ZFS plain file
         5    1    16K    16K  3.00K    16K  100.00  FUID table
         6    1    16K     4K  4.50K     4K  100.00  ZFS plain file
         7    1    16K  6.50K  6.50K  6.50K  100.00  ZFS plain file
         8    3    16K   128K   952M   952M  100.00  ZFS plain file
         9    3    16K   128K   912M   912M  100.00  ZFS plain file
        10    3    16K   128K   695M   695M  100.00  ZFS plain file
        11    3    16K   128K   914M   914M  100.00  ZFS plain file

Now, if I'm understanding this output properly, object 4 is composed of 128KB blocks with a total size of 408MB, meaning that it uses 3264 blocks. Can someone confirm (or correct) that assumption? Also, I note that each object (as far as my limited testing has shown) has a single block size with no internal variation.

Interestingly, all of my zvols seem to use fixed-size blocks - that is, there is no variation in the block sizes - they're all the size defined on creation with no dynamic block sizes being used. I previously thought that the -b option set the maximum size, rather than fixing all blocks. Learned something today :-)

# zdb -ddbb siovale/testvol
Dataset siovale/testvol [ZVOL], ID 45, cr_txg 4717890, 23.9K, 2 objects

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  21.0K    16K    6.25  DMU dnode
         1    1    16K    64K      0    64K    0.00  zvol object
         2    1    16K    512  1.50K    512  100.00  zvol prop

# zdb -ddbb siovale/tm-media
Dataset siovale/tm-media [ZVOL], ID 706, cr_txg 4426997, 240G, 2 objects

ZIL header: claim_txg 0, claim_blk_seq 0, claim_lr_seq 0 replay_seq 0, flags 0x0

    Object  lvl   iblk   dblk  dsize  lsize   %full  type
         0    7    16K    16K  21.0K    16K    6.25  DMU dnode
         1    5    16K     8K   240G   250G   97.33  zvol object
         2    1    16K    512  1.50K    512  100.00  zvol prop

___
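[Editor's aside] Pulling the two halves of this thread together, a hedged back-of-the-envelope calculator in the spirit of the table above; the 150 bytes-per-unique-block figure is the one Mertol quotes, the pool size and candidate block sizes are just this thread's assumptions, and none of it replaces measuring the real block distribution with zdb:

# Rough dedup-table overhead estimate (assumed figures, not measured ones)
nawk 'BEGIN {
    pool_gb = 1740;            # ~1.7 TB of usable space, as above
    entry_b = 150;             # approx. bytes of DDT entry per unique block
    split("4 64 128", bs, " ");
    for (i = 1; i <= 3; i++) {
        blocks = pool_gb * 1024 * 1024 / bs[i];      # pool size in KB / block size in KB
        printf("%3sK blocks: %12.0f blocks -> ~%6.1f GB of DDT\n",
               bs[i], blocks, blocks * entry_b / 1024 / 1024 / 1024);
    }
}'

With a 4K average block size this lands in the tens of GB, which is why the earlier advice in this thread was to keep the DDT resident in RAM plus an SSD L2ARC rather than letting it spill to spinning disks.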