Re: [zfs-discuss] Large scale performance query

2011-08-06 Thread Rob Cohen
I may have RAIDZ reading wrong here.  Perhaps someone could clarify.

For a read-only workload, does each RAIDZ drive act like a stripe, similar to 
RAID5/6?  Do they have independent queues?

It would seem that there is no escaping read/modify/write operations for 
sub-block writes, forcing the RAIDZ group to act like a single stripe.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large scale performance query

2011-08-06 Thread Rob Cohen
RAIDZ has to rebuild data by reading all drives in the group, and 
reconstructing from parity.  Mirrors simply copy a drive.

Compare 3tb mirrors vs. 9x3tb RAIDZ2.

Mirrors:
Read 3tb
Write 3tb

RAIDZ2:
Read 24tb
Reconstruct data on CPU
Write 3tb

In this case, RAIDZ is at least 8x slower to resilver (assuming parity 
reconstruction on the CPU and writing happen in parallel).  In the meantime, 
performance for the array is severely degraded for RAIDZ, but not for mirrors.

Aside from resilvering, for many workloads, I have seen over 10x (!) better 
performance from mirrors.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large scale performance query

2011-08-06 Thread Rob Cohen
 I may have RAIDZ reading wrong here.  Perhaps someone
 could clarify.
 
 For a read-only workload, does each RAIDZ drive act
 like a stripe, similar to RAID5/6?  Do they have
 independent queues?
 
 It would seem that there is no escaping
 read/modify/write operations for sub-block writes,
 forcing the RAIDZ group to act like a single stripe.

Can RAIDZ even do a partial block read?  Perhaps it needs to read the full 
block (from all drives) in order to verify the checksum.  If so, then RAIDZ 
groups would always act like one stripe, unlike RAID5/6.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large scale performance query

2011-08-06 Thread Rob Cohen
Thanks for clarifying.

If a block is spread across all drives in a RAIDZ group, and there are no 
partial block reads, how can each drive in the group act like a stripe?  Many 
RAID5/6 implementations can do partial block reads, allowing for parallel 
random reads across drives (as long as there are no writes in the queue).

Perhaps you are saying that they act like stripes for bandwidth purposes, but 
not for read ops/sec?
-Rob

-Original Message-
From: Bob Friesenhahn [mailto:bfrie...@simple.dallas.tx.us] 
Sent: Saturday, August 06, 2011 11:41 AM
To: Rob Cohen
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Large scale performance query

On Sat, 6 Aug 2011, Rob Cohen wrote:

 Can RAIDZ even do a partial block read?  Perhaps it needs to read the 
 full block (from all drives) in order to verify the checksum.
 If so, then RAIDZ groups would always act like one stripe, unlike 
 RAID5/6.

ZFS does not do partial block reads/writes.  It must read the whole block in 
order to validate the checksum.  If there is a checksum failure, then RAID5 
type algorithms are used to produce a corrected block.

For this reason, it is wise to make sure that the zfs filesystem blocksize is 
appropriate for the task, and make sure that the system has sufficient RAM that 
the zfs ARC can cache enough data that it does not need to re-read from disk 
for recently accessed files.
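
For example (pool and dataset names here are hypothetical), a dataset backing a 
database doing 8K I/O might be tuned along these lines:

zfs set recordsize=8k tank/db            # match the block size to the application's I/O size
zfs get recordsize,compression tank/db   # verify the current settings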

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large scale performance query

2011-08-06 Thread Rob Cohen
 If I'm not mistaken, a 3-way mirror is not
 implemented behind the scenes in
 the same way as a 3-disk raidz3.  You should use a
 3-way mirror instead of a
 3-disk raidz3.

RAIDZ2 requires at least 4 drives, and RAIDZ3 requires at least 5 drives.  But, 
yes, a 3-way mirror is implemented totally differently.  Mirrored drives have 
identical copies of the data.  RAIDZ drives store the data once, plus parity 
data.  A 3-way mirror gives improved redundancy and read performance, but at a 
high capacity cost, and slower writes than a 2-way mirror.

It's more common to do 2-way mirrors + hot spare.  This gives comparable 
protection to RAIDZ2, but with MUCH better performance.

Of course, mirrors cost more capacity, but it helps that ZFS's compression and 
thin provisioning can often offset the loss in capacity, without sacrificing 
performance (especially when used in combination with L2ARC).
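
As a rough sketch of that layout (all device names made up), a pool of 2-way 
mirrors with a hot spare, compression, and an L2ARC device might look like:

zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 spare c1t4d0
zfs set compression=on tank       # claw back some of the capacity lost to mirroring
zpool add tank cache c2t0d0       # SSD L2ARC to help absorb the read working set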
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large scale performance query

2011-08-05 Thread Rob Cohen
Generally, mirrors resilver MUCH faster than RAIDZ, and you only lose 
redundancy on that stripe, so combined, you're much closer to RAIDZ2 odds than 
you might think, especially with hot spare(s), which I'd recommend.

When you're talking about IOPS, each stripe can support 1 simultaneous user.

Writing:
Each RAIDZ group = 1 stripe.
Each mirror group = 1 stripe.
So, 216 drives can be 24 stripes or 108 stripes.

Reading:
Each RAIDZ group = 1 stripe.
Each mirror group = 1 stripe per drive.
So, 216 drives can be 24 stripes or 216 stripes.

Actually, reads from mirrors are even more efficient than reads from stripes, 
because the software can optimally load balance across mirrors.

So, back to the original poster's question, 9 stripes might be enough to 
support 5 clients, but 216 stripes could support many more.

Actually, this is an area where RAID5/6 has an advantage over RAIDZ, if I 
understand correctly, because for RAID5/6 on read-only workloads, each drive 
acts like a stripe.  For workloads with writing, though, RAIDZ is significantly 
faster than RAID5/6, but mirrors/RAID10 give the best performance for all 
workloads.
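
To make the stripe counting concrete, here is the same set of nine drives laid 
out both ways (device names made up):

# one RAIDZ2 group = one stripe for random I/O
zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c1t8d0

# or four 2-way mirrors plus a spare = four write stripes, eight read stripes
zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 \
    mirror c1t4d0 c1t5d0 mirror c1t6d0 c1t7d0 spare c1t8d0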
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Large scale performance query

2011-08-04 Thread Rob Cohen
Try mirrors.  You will get much better multi-user performance, and you can 
easily split the mirrors across enclosures.

If your priority is performance over capacity, you could experiment with n-way 
mirrors, since more mirrors will load balance reads better than more stripes.
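
For example (device names made up), 3-way mirrors with each side of a mirror 
drawn from a different enclosure might look like:

zpool create tank \
    mirror c1t0d0 c2t0d0 c3t0d0 \
    mirror c1t1d0 c2t1d0 c3t1d0 \
    mirror c1t2d0 c2t2d0 c3t2d0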
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] latest zpool version in solaris 11 express

2011-07-20 Thread Rob Logan


plus virtualbox 4.1 with network in a box would like snv_159

from http://www.virtualbox.org/wiki/Changelog

Solaris hosts: New Crossbow based bridged networking driver for Solaris 11 
build 159 and above

Rob

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


RE: ZFS crypto source

2011-05-11 Thread Rob O'Leary
I guessed you wouldn't be able to say, even if...

The only shortfall in capability that I'm aware of is the secure boot/FDE,
which we discussed previously.

I am mostly interested in the source to see how features have been
implemented and to understand the system structure. I certainly wouldn't
presume to make changes!

On the slightly more general topic of source on opensolaris, are the designs
for subsystems/features available? I've found PSARC cases for some things
but I expect that more detailed design, system interaction and use cases are
documented as part of the development process. Are any of these types of
document made public to assist in understanding at a higher level than the
source code? Again, this is really to help me understand the system, rather
than to attempt any modification.

Regards
Rob

-Original Message-
From: Darren J Moffat [mailto:darr...@opensolaris.org]
Sent: 10 May 2011 11:17
To: Rob O'Leary
Cc: zfs-crypto-discuss@opensolaris.org
Subject: Re: ZFS crypto source


On 07/05/2011 10:57, Rob O'Leary wrote:
 Is the source for ZFS crypto likely to be released on opensolaris.org?

Older versions of the source are available from the zfs-crypto
project gates:  /zfs-crypto/gate/  However, in some important areas these
differ quite a bit from what was finally integrated and are not on-disk
compatible.

 I searched in /onnv/onnv-gate/usr/src/uts/common/fs/zfs, which may have
been
 the wrong place, for aes and crypt and got no results so I assume that the
 zfs encryption has not been released to date.

Correct, the source has not been released.

I do not know anything about future plans nor would I be able to comment
here at this time even if I did.  Please bring this up with your Oracle
account/support team representative if it is important to your business.

Is there something in particular you want to do with the source if you
had it available to you ?  Are there changes you want to make ?

--
Darren J Moffat

___
zfs-crypto-discuss mailing list
zfs-crypto-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-crypto-discuss


RE: Booting from encrypted ZFS WAS: RE: How to mount encryptedfile system at boot? Why no pass phraserequesed

2011-04-28 Thread Rob O'Leary
Hi Dan,

Your first two interpretations are correct.

I like the idea of netbooting but unfortunately, although a good idea, it
doesn't fit with the details of our use case - we temporarily take our
system to a trusted location, use it and then remove it, so we do not have a
permanent presence at the trusted locations (other than our base location).
This means that providing the netboot environment is effectively the same
problem, as anything on the same network as the data becomes subject to the
same rules regarding protection.

Putting the boot system on the key media isn't quite the same as
transporting the key on media alone - the key media can be read-only/only
used at boot to authenticate, whereas the boot system is on writable media.
(I have already considered read-only boot images on DVD but due to the low
numbers of systems and the need to make permanent changes to the system, I
do not consider this approach operable.)

Regarding tampering and tamper detection, when the disks are transported, we
do not rely on an IT approach to these issues.

Regards,
Rob

-Original Message-
From: Daniel Carosone [mailto:d...@geek.com.au]
Sent: 28 April 2011 03:21
To: Rob O'Leary
Cc: Troels Nørgaard Nielsen; zfs-crypto-discuss@opensolaris.org
Subject: Re: Booting from encrypted ZFS WAS: RE: How to mount
encryptedfile system at boot? Why no pass phraserequesed


If I understood correctly:

 - there is no requirement for the system to boot (or be bootable)
   outside of your secure locations.

 - you are willing to accept separate tracking and tagging of removable
   media, e.g. for key distribution.

Consider, at least for purposes of learning from the comparison:

 - having the machines netboot only, and provide the netboot
   environment only within the secure locations.

 - having the system disks on the removable media that is handled
   separately, not just the keys.

Both of these share the property that the physical chassis being
transported contains only encrypted disks, leaving you to make other
tradeoffs with respect to risks and handling of the bootstrapping data
(including keys).

My primary interest in encrypted zfs boot for the OS is more around
the integrity of the boot media, for devices that may be exposed to
tampering of various kinds.  This is a complex issue that can only be
partly addressed by ZFS, even with such additional features.

Do these sorts of concerns apply to your environment?  If someone was
to intercept one of these machines in transit, and tamper with OS and
system executables in such a way as to disclose information/keys or
otherwise alter their operation when next booted in the secure
environment, would that be a concern?

--
Dan.

___
zfs-crypto-discuss mailing list
zfs-crypto-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-crypto-discuss


RE: Booting from encrypted ZFS WAS: RE: How to mount encrypted filesystem at boot? Why no pass phraserequesed

2011-04-27 Thread Rob O'Leary

Hi Michel,

I had noticed these drives in the past, but your email reminded me and I
followed your link, thanks.

A bit of googling showed that not everyone is having a great experience and
I couldn't find the barracuda fde promised in the press release. I also need
SAS because of read while writing issues and these are momentus sata disks
(despite link names below). Can I mix sas and sata in the same controller?

Reliable, SAS, FDE does not seem to be available...

Regards,
Rob

http://forums.seagate.com/t5/Barracuda-XT-Barracuda-and/Issues-with-ST9320322AS-FDE-3-drives/m-p/29247#M12876

http://forums.seagate.com/t5/Barracuda-XT-Barracuda-and/Recovering-Formatting-FDE-Drives/td-p/7412

-Original Message-
From: michel.bell...@malaiwah.com [mailto:michel.bell...@malaiwah.com]
Sent: 27 April 2011 12:08
To: Rob O'Leary
Cc: zfs-crypto-discuss@opensolaris.org
Subject: Re: Booting from encrypted ZFS WAS: RE: How to mount encrypted
filesystem at boot? Why no pass phraserequesed


Hi,

I think the best solution for your OS drives is to have a look at disks that
offer built-in full disk encryption (FDE) just like the ones offered by
Seagate (example:
http://www.google.com/url?q=http://www.seagate.com/ww/v/index.jsp%3Flocale%3Den-US%26name%3Ddn_sec_intro_fde%26vgnextoid%3D1831bb5f5ed93110VgnVCM10f5ee0a0aRCRDsa=Uei=8_e3TfvVIInBtgemsdjeBAved=0CAgQFjAAusg=AFQjCNGt_c3Vokq4D6hL8k25rfUcIrB2Bw). While it does not offer the flexibility of ZFS
encrypted datasets, I think it would be appropriate in your situation.

I would rely on that encryption for the OS with a static passphrase asked at
boot-time, but still point sensitive informations to the ZFS pool for better
management of the keys, if your auditor asks them to be rolled once in a
while (for data, at least).

My 2 cents,

Michel
Sent from my BlackBerry mobile device via the Rogers Wireless network

-Original Message-
From: Rob O'Leary <raole...@btinternet.com>
Sender: zfs-crypto-discuss-boun...@opensolaris.org
Date: Wed, 27 Apr 2011 11:46:02
To: Troels Nørgaard Nielsen <tro...@norgaard.co>
Cc: zfs-crypto-discuss@opensolaris.org
Subject: RE: Booting from encrypted ZFS WAS: RE: How to mount encrypted file
system at boot? Why no pass phraserequesed

Hi Troels,

There are two things here. First, I don't want to learn another set of
administration tasks (I've just had a quick look at Trusted Extensions and
am shuddering at the thought) and second, the problem isn't when the system
is running but when it is stopped. I believe the problem is called data at
rest. Also, notice the line where I said the auditors like a simple story.
They really do.

I still want to be able to print and use the network without incurring lots
of admin, re-programming or performance overhead. (Our applications are very
network heavy.) But, when I shutdown I want the data on the disks to be
un-intelligible.

In terms of management/learning overhead, we are very familiar with tracking
and accounting for documents and keys, so having a few extra keys and usb
sticks to look after is no problem.

Unfortunately, I don't know enough about grub and zfs booting. So, I shall
resist the temptation of "can't it just...". Almost. I'm sure there's a way.
Chain from authentication phase and getting key to main boot...? (Sorry, I
had to.)

Best regards,
Rob

-Original Message-
From: Troels Nørgaard Nielsen [mailto:tro...@norgaard.co]
Sent: 27 April 2011 11:13
To: Rob O'Leary
Cc: zfs-crypto-discuss@opensolaris.org
Subject: Re: Booting from encrypted ZFS WAS: RE: How to mount encrypted
file system at boot? Why no pass phraserequesed


Hi Rob,

Wouldn't the use of Solaris Trusted Extensions by placing all 'secure'
operations inside a label that can only write to the filesystem (that is
encrypted) with the same label, do for you what the auditors are seeking?
The base idea of Trusted Extensions is that no data can escape its label
(guarded by syscall checks); to secure traffic to the label, one can use
IPsec with labeling, etc.

I think Darren is dragging along here, because implementing zfs-crypto on
rpool requires grub to be aware of zfs-crypto, which is kinda hard (e.g.
grub doesn't support multiple vdev or raidz-n yet).

Best regards
Troels Nørgaard
Nørgaard Consultancy

On 27/04/2011 at 09.54, Rob O'Leary wrote:


 Requirements
 The main requirement is to convince our security auditors that all the data
 on our systems is encrypted. The systems are moved between multiple trusted
 locations and the principal need is to ensure that, if lost or stolen while
 on the move, no data can be accessed. The systems are not required to
 operate except in a trusted location.

 Storing the data on encrypted zfs filesystems seems like it should be
 sufficient for this. But the counter argument is that you cannot _guarantee_
 that no data will be accidentally copied onto un-encrypted parts of the
 system, say as part of the print spooling of a data report (by the system

Re: [zfs-discuss] [?] - What is the recommended number of disks for a consumer PC with ZFS

2011-02-07 Thread Rob Clark
References:

Thread: ZFS effective short-stroking and connection to thin provisioning? 
http://opensolaris.org/jive/thread.jspa?threadID=127608

Confused about consumer drives and zfs can someone help?
http://opensolaris.org/jive/thread.jspa?threadID=132253

Recommended RAM for ZFS on various platforms
http://opensolaris.org/jive/thread.jspa?threadID=132072

Performance advantages of spool with 2x raidz2 vdevs vs. Single vdev - Spindles
http://opensolaris.org/jive/thread.jspa?threadID=132127
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] problem adding second MD1000 enclosure to LSI 9200-16e

2011-01-10 Thread Rob Cohen
As a follow-up, I tried a SuperMicro enclosure (SC847E26-RJBOD1).  I have 3 
sets of 15 drives.  I got the same results when I loaded the second set of 
drives (15 to 30).

Then, I tried changing the LSI 9200's BIOS setting for max INT 13 drives from 
24 (the default) to 15.  From then on, the SuperMicro enclosure worked fine, 
even with all 45 drives, and no kernel hangs.

I suspect that the BIOS setting would have worked with 1 MD1000 enclosure, but 
I never tested the MD1000s, after I had the SuperMicro enclosure running.

I'm not sure if the kernel hang with max int13=24 was a hardware problem, or a 
Solaris bug.
  - Rob

 I have 15x SAS drives in a Dell MD1000 enclosure,
 attached to an LSI 9200-16e.  This has been working
 well.  The system is booting off of internal drives,
 on a Dell SAS 6ir.
 
 I just tried to add a second storage enclosure, with
 15 more SAS drives, and I got a lockup during Loading
 Kernel.  I got the same results, whether I daisy
 chained the enclosures, or plugged them both directly
 into the LSI 9200.  When I removed the second
 enclosure, it booted up fine.
 
 I also have an LSI MegaRAID 9280-8e I could use, but
 I don't know if there is a way to pass the drives
 through, without creating RAID0 virtual drives for
 each drive, which would complicate replacing disks.
 The 9280 boots up fine, and the systems can see new
  virtual drives.
 
 Any suggestions?  Is there some sort of boot
 procedure, in order to get the system to recognize
 the second enclosure without locking up?  Is there a
 special way to configure one of these LSI boards?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] l2arc_noprefetch

2010-11-21 Thread Rob Cohen
When running real data, as opposed to benchmarks, I notice that my l2arc stops 
filling, even though the majority of my reads are still going to primary 
storage.  I'm using 5 SSDs for L2ARC, so I'd expect to get good throughput, 
even with sequential reads.

I'd like to experiment with disabling the l2arc_noprefetch feature, to see how 
the performance compares by caching more data.  How exactly do I do that?

Right now, I added the following line to /etc/system, but it doesn't seem to 
have made a difference.  I'm still seeing most of my reads go to primary 
storage, even though my cache should be warm by now, and my SSDs are far from 
full.

set zfs:l2arc_noprefetch = 0

Am I setting this wrong?  Am I misunderstanding this option?
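
I was also thinking of poking the value at runtime with mdb, assuming the
tunable is exported under this name, along these lines:

echo "l2arc_noprefetch/D" | mdb -k       # show the current value
echo "l2arc_noprefetch/W 0" | mdb -kw    # set it to 0 in the running kernel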

Thanks,
  Rob
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] problem adding second MD1000 enclosure to LSI 9200-16e

2010-11-21 Thread Rob Cohen
I have 15x SAS drives in a Dell MD1000 enclosure, attached to an LSI 9200-16e.  
This has been working well.  The system is booting off of internal drives, on 
a Dell SAS 6ir.

I just tried to add a second storage enclosure, with 15 more SAS drives, and I 
got a lockup during Loading Kernel.  I got the same results, whether I daisy 
chained the enclosures, or plugged them both directly into the LSI 9200.  When 
I removed the second enclosure, it booted up fine.

I also have an LSI MegaRAID 9280-8e I could use, but I don't know if there is a 
way to pass the drives through, without creating RAID0 virtual drives for each 
drive, which would complicate replacing disks.  The 9280 boots up fine, and the 
systems can see new virtual drives.

Any suggestions?  Is there some sort of boot procedure, in order to get the 
system to recognize the second enclosure without locking up?  Is there a 
special way to configure one of these LSI boards?

Thanks,
   Rob
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] problem adding second MD1000 enclosure to LSI 9200-16e

2010-11-21 Thread Rob Cohen
Markus,
I'm pretty sure that I have the MD1000 plugged in properly, especially since 
the same connection works on the 9280 and Perc 6/e.  It's not in split mode.

Thanks for the suggestion, though.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] WarpDrive SLP-300

2010-11-17 Thread Rob Logan

 BTW, any new storage-controller-related drivers introduced in snv151a?

the 64bit driver in 147
-rwxr-xr-x   1 root sys   401200 Sep 14 08:44 mpt
-rwxr-xr-x   1 root sys   398144 Sep 14 09:23 mpt_sas
is a different size than 151a
-rwxr-xr-x   1 root sys   400936 Nov 15 23:05 /kernel/drv/amd64/mpt
-rwxr-xr-x   1 root sys   399952 Nov 15 23:06 /kernel/drv/amd64/mpt_sas

and mpt_sas has a new printf:
reset was running, this event can not be handled this time

Rob

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs record size implications

2010-11-10 Thread Rob Cohen
Thanks, Richard.  Your answers were very helpful.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs record size implications

2010-11-04 Thread Rob Cohen
I have read some conflicting things regarding the ZFS record size setting.  
Could you guys verify/correct these statements:

(These reflect my understanding, not necessarily the facts!)

1) The ZFS record size in a zvol is the unit that dedup happens at.  So, for a 
volume that is shared to an NTFS machine, if the NTFS cluster size is smaller 
than the zvol record size, dedup will get dramatically worse, since it won't 
dedup clusters that are positioned differently in zvol records.

2) For shared folders, the record size is the allocation unit size, so large 
records can waste a substantial amount of space, in cases with lots of very 
small files.  This is different than a HW raid stripe size, which only affects 
performance, not space usage.

3) Although small record sizes have a large RAM overhead for dedup tables, as 
long as the dedup table working set fits in RAM, and the rest fits in L2ARC, 
performance will be good.
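
For context, the sort of settings these statements are about (names and sizes 
are made up; for zvols the property is volblocksize rather than recordsize, if 
I have that right):

zfs create -V 500G -o volblocksize=4k tank/ntfs-lun   # zvol block size chosen to line up with a 4K NTFS cluster
zfs set recordsize=16k tank/share                     # record size for a shared folder dataset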

Thanks,
   Rob
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] stripes of different size mirror groups

2010-10-28 Thread Rob Cohen
I have a couple drive enclosures:
15x 450gb 15krpm SAS
15x 600gb 15krpm SAS

I'd like to set them up like RAID10.  Previously, I was using two hardware 
RAID10 volumes, with the 15th drive as a hot spare, in each enclosure.

Using ZFS, it could be nice to make them a single volume, so that I could share 
L2ARC and ZIL devices, rather than buy two sets.

It appears possible to set up 7x450gb mirrored sets and 7x600gb mirrored sets 
in the same volume, without losing capacity.  Is that a bad idea?  Is there a 
problem with having different stripe sizes, like this?
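
Roughly what I have in mind (device names made up, and only a few of the 14 
mirror pairs shown):

# 450gb pairs from the first enclosure, 600gb pairs from the second
zpool create tank \
    mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 \
    mirror c2t0d0 c2t1d0 mirror c2t2d0 c2t3d0
# one shared set of L2ARC and ZIL devices for the whole pool
zpool add tank cache c3t0d0
zpool add tank log mirror c3t1d0 c3t2d0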

Thanks,
Rob
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] stripes of different size mirror groups

2010-10-28 Thread Rob Cohen
Thanks, Ian.

If I understand correctly, the performance would then drop to the same level as 
if I set them up as separate volumes in the first place.

So, I get double the performance for 75% of my data, and equal performance for 
25% of my data, and my L2ARC will adapt to my working set across both 
enclosures.

That sounds like all upside, and no downside, unless I'm missing something.

Are there any other problems?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance advantages of spool with 2x raidz2 vdevs vs. Single vdev

2010-07-22 Thread Rob Clark
 Hi guys, I am about to reshape my data spool and am wondering what 
 performance diff. I can expect from the new config vs. the old.
 
 The old config is a pool of a single vdev of 8 disks raidz2.
 The new pool config is 2 vdevs of 7-disk raidz2 in a single pool.
 
 I understand it should be better with higher io throughput and 
 better read/write rates... but interested to hear the science behind it.
 
 ...
 
 FYI, it's just a home server, but I like it.

Some answers (and questions) are here: 
http://www.opensolaris.org/jive/thread.jspa?threadID=102368&tstart=0


*** We need this explained in the ZFS FAQ by a Panel of Experts ***

Q: I (we) have a Home Computer and desire to use ZFS with a few large, cheap, 
(consumer-grade) Drives. What can I expect 
from 3 Drives; would I be better off with 4 or 5? Please note: I doubt I can 
afford as many as 10 Drives nor could I stuff them 
into my Box, so please suggest options that use less than that many (most 
preferably less than 7).

A: ?


Thanks,
Rob
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [?] - What is the recommended number of disks for a consumer PC with ZFS

2010-07-22 Thread Rob Clark
 I'm building my new storage server, all the parts should come in this week.
 ...
Another answer is here: 
http://eonstorage.blogspot.com/2010/03/whats-best-pool-to-build-with-3-or-4.html

Rob
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused about consumer drives and zfs can someone help?

2010-07-22 Thread Rob Clark
 I wanted to build a small back up (maybe also NAS) server using 
A common question that I am trying to get answered (and have a few) here: 
http://www.opensolaris.org/jive/thread.jspa?threadID=102368&tstart=0

Rob
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Recommended RAM for ZFS on various platforms

2010-07-22 Thread Rob Clark
 I'm currently planning on running FreeBSD with ZFS, but I wanted to 
 double-check how much memory I'd need for it to be stable. The ZFS 
 wiki currently says you can go as low as 1 GB, but recommends 2 GB; 
 however, elsewhere I've seen someone claim that you need at least 4 GB.
 ...
 How about other OpenSolaris-based OSs, like NexentaStor?  
 ...
 If it matters, I'm currently planning on RAID-Z2 with 4x500GB 
 consumer-grade SATA drives.  ...  This is on an AMD64 system, 
 and the OS in question will be running inside of VirtualBox ...
 Thanks,
 Michael
 

Buy the biggest Chips you can afford and if you need to pair them (for 
performance) 
do so. You want to keep as many Memory Slots open as you can so you can add 
more 
memory later. I think you (or I) would be unhappy with a measly 4GB in a new 
System 
but in reality it would be OK.

If it is not OK (for you) then you have open Memory Slots in which to add more
Chips (which you are certain to want to do in the future).

Rob
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [?] - What is the recommended number of disks for a consumer PC with ZFS

2010-07-18 Thread Rob Clark
 I'm building my new storage server, all the parts should come in this week...

How did it turn out ? Did 8x1TB Drives seem to be the correct number or a 
couple too many (based on 
the assumption that you did not run out of space; I mean solely from a 
performance / 'ZFS usability' 
standpoint - as opposed to over three dozen tiny Drives).

Thanks for your reply,
Rob
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] reconstruct recovery of rpool zpool and zfs file system with bad sectors

2010-05-25 Thread Rob Levy
Roy,

Thanks for your reply. 

I did get a new drive and attempted the approach (as you suggested, prior to 
your reply); however, once booted off the OpenSolaris Live CD (or the rebuilt new 
drive), I was not able to import the rpool (which I had established had sector 
errors). I expect I should have had some success if the vdev labels were intact 
(I currently suspect some critical boot files are impacted by bad sectors 
resulting in failed boot attempts from that partition slice). Unfortunately, I 
didn't keep a copy of the messages (if any - I have tried many permutations 
since).

At my last attempt ... I installed knoppix (debian) on one of the partitions 
(which also allowed access to smartctl and hdparm - I was hoping to reduce the 
read timeout to speed up the exercise), then added zfs-fuse (to access the 
space I will use to stage the recovery file) and added the dd_rescue and gnu 
ddrescue packages. smartctl appears not to be able to manage the disk while 
attached to usb (but I am guessing, because I don't have much experience with it).

At this point I attempted dd_rescue to create an image of the partition with 
bad sectors (hoping there were efficiencies beyond normal dd) but it was at 
5.6GB in 36 hours, so again I needed to abort; however, it does log the blocks 
attempted so far, so hopefully I can skip past them when I next get an 
opportunity. It does now appear that gnu ddrescue is the preferred of the two 
utilities, so I may opt to use it to create an image of the partition before 
attempting recovery of the slice (rpool).
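
If it helps anyone following along, the sort of gnu ddrescue invocation I am 
planning for the next window (device and file names are placeholders), using 
the log file so interrupted runs can resume and skip known-bad areas:

ddrescue -n /dev/sda2 p2image.dd p2image.log      # first pass: grab the easy sectors, skip bad areas
ddrescue -d -r3 /dev/sda2 p2image.dd p2image.log  # second pass: retry the bad areas a few times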

As an aside, I noticed that the knoppix 'dmesg | grep sd' command, which 
reflects the primary partition devices, no longer appears to reflect the 
solaris partition (p2) slice devices (as it does the logical partition devices 
configured in the extended p4 partition). I suspect that due to this, the rpool 
(one of the solaris partition slices) appears not to be detected by the knoppix 
zfs-fuse 'zpool import' (although I can access the zpool which exists on 
partition p3). I wonder if this is related to the transition from ufs to zfs?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] reconstruct recovery of rpool zpool and zfs file system with bad sectors

2010-05-20 Thread Rob Levy
Folks, I posted this question on (OpenSolaris - Help) without any replies 
http://opensolaris.org/jive/thread.jspa?threadID=129436&tstart=0 and am 
re-posting here in the hope someone can help ... I have updated the wording a 
little too (in an attempt to clarify).

I currently use OpenSolaris on a Toshiba M10 laptop.

One morning the system wouldn't boot OpenSolaris 2009.06 (it was simply unable 
to progress to the second-stage grub). On further investigation I discovered 
the hdd partition slice with rpool appeared to have bad sectors.

Faced with either a rebuild or an attempt at recovery, I first made an attempt 
to recover the slice before rebuilding.

The c7t0d0 HDD (p0) was divided into p1 (NTFS 24GB), p2 (OpenSolaris 24GB), p3 
(OpenSolaris zfs pool for data 160GB) and p4 (50GB extended with 32GB pcfs, 
12GB linux and linux swap) partitions (or something close to that). On the 
first Solaris partition (p2), slice 0 was the OpenSolaris rpool zpool.

To attempt recovery I booted the OpenSolaris 2009.06 live CD and was able to 
import the ZFS pool which was configured on p3. On the p2 device (the Solaris 
boot partition which wouldn't boot) I then ran dd if=/dev/rdsk/c7t0d0s2 bs=512 
conv=sync,noerror of=/p0/s2image.dd.

Due to sector read error timeouts, this took longer than my maintenance window 
allowed and I ended up aborting the attempt with a significant amount of 
sectors already captured. 

On block examination of this (so far) captured image.dd, I noticed the first 
two s0 vdev labels appeared to be intact. I then skipped the expected number of 
s2 sectors to get to the s0 start and copied blocks to attempt to reconstruct 
the s0 rpool. Against this I ran zdb -l, which reported the first two labels 
and gave me the encouragement necessary to continue the exercise.

At the next opportunity I ran the command again using the skip directive to 
capture the balance of the slice. The result was that I had two files (images) 
comprising the good c7t0d0s0 sectors (with, I expect, the bad ones padded), 
i.e. an s0image_start.dd and an s0image_end.dd.

As mentioned, at this stage I was able to run 'zdb -l s0image_start.dd' and see 
the first two vdev labels, and 'zdb -l s0image_end.dd' and see the last two 
vdev labels.

I then combined the two files (I tried various approaches, e.g. cat and dd with 
the append directive); however, only the first two vdev labels appear to be 
readable in the resulting s0image_s0.dd. The resulting file size, which I 
expect is largely good sectors with padding for bad sectors, matches the 
prtvtoc s0 sector count multiplied by 512.
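
One thing I plan to double-check is whether the two images were joined at 
exactly the right offset; a sketch of splicing them with dd instead of cat 
(NNNN is a placeholder for the actual sector offset I used with skip):

dd if=s0image_start.dd of=s0image_s0.dd bs=512 conv=notrunc          # first part at the start of the image
dd if=s0image_end.dd of=s0image_s0.dd bs=512 seek=NNNN conv=notrunc  # second part at its original sector offset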

Can anyone advise why I am unable to read the third and fourth vdev labels 
once the start and end files are combined?

Is there another approach that may prove more fruitful?

Once I have the file (with labels being in the correct places) I was intending 
to attempt to import the vdev zpool as rpool2 or attempt any repair procedures 
I could locate (as far as was possible anyway) to see what data could be 
recovered (besides it was an opportunity to get another close look at ZFS).

Incidentally *only* the c7t0d0s0 slice appeared to have bad sectors (I do 
wonder what the significance is of this?).
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Does ZFS use large memory pages?

2010-05-06 Thread Rob
Hi Gary,
I would not remove this line in /etc/system.
We have been combatting this bug for a while now on our ZFS file system running 
JES Commsuite 7. 

I would be interested in finding out how you were able to pinpoint the 
problem. 

We seem to have no worries with the system currently, but when the file system 
gets above 80% we seem to have quite a number of issues, much the same as 
you've had in the past: ps and prstat hanging.

Are you able to tell me the IDR number that you applied?

Thanks,
Rob
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Snv_126 Kernel PF Panic

2010-04-09 Thread Rob Cherveny
Hey All,

I'm having some issues with a snv_126 file server running on an HP ML370 G6
server with an Adaptec RAID card (31605). The server has the rpool, plus two
raidz2 data pools (1.5TB and 1.0TB respectively). I have been using e-sata to
back up the pools to a pool that contains 3x 1.5TB drives every week.  This
has all worked great for the last 4 or so months.

Starting last week, the machine would panic and reboot when attempting to
perform a backup. This week, the machine has been randomly rebooting every
3-15 hours (with or without backup pool attached), complaining of:

(#pf Page fault) rp=ff0010568eb0 addr=30 occurred in module zfs due to
a NULL pointer dereference

I use cron to perform a scrub of all pools every night, and there have been
no errors whatsoever.

Below is the output from mdb $C on the core dump:

rcher...@stubborn2:/var/crash/Stubborn2$ mdb 0
Loading modules: [ unix genunix specfs dtrace mac cpu.generic uppc pcplusmp
rootnex scsi_vhci zfs sd sockfs ip hook neti sctp arp usba uhci fctl md lofs
fcip fcp cpc random crypto smbsrv nfs logindmux ptm ufs nsmb sppp ipc ]
 $C
ff000f4ef3b0 vdev_is_dead+0xc(0)
ff000f4ef3d0 vdev_readable+0x16(0)
ff000f4ef410 vdev_mirror_child_select+0x61(ff02fa41da10)
ff000f4ef450 vdev_mirror_io_start+0xda(ff02fa41da10)
ff000f4ef490 zio_vdev_io_start+0x1ba(ff02fa41da10)
ff000f4ef4c0 zio_execute+0xa0(ff02fa41da10)
ff000f4ef4e0 zio_nowait+0x42(ff02fa41da10)
ff000f4ef580 arc_read_nolock+0x82d(0, ff02d716b000,
ff02e3fdc000, 0, 0, 6, 3, ff000f4ef65c, ff000f4ef670)
ff000f4ef620 arc_read+0x75(0, ff02d716b000, ff02e3fdc000,
ff02e3a7f928, 0, 0, 6, 3, ff000f4ef65c, ff000f4ef670)
ff000f4ef6c0 dbuf_prefetch+0x131(ff02e3a80018, 20)
ff000f4ef710 dmu_zfetch_fetch+0xa8(ff02e3a80018, 20, 1)
ff000f4ef750 dmu_zfetch_dofetch+0xb8(ff02e3a80278, ff02f4c52868)
ff000f4ef7b0 dmu_zfetch_find+0x436(ff02e3a80278, ff000f4ef7c0,
1)
ff000f4ef870 dmu_zfetch+0xac(ff02e3a80278, 2b, 4000, 1)
ff000f4ef8d0 dbuf_read+0x170(ff02f3d8ea00, 0, 2)
ff000f4ef950 dnode_hold_impl+0xed(ff02e2a2f040, 1591, 1,
ff02e4e71478, ff000f4ef998)
ff000f4ef980 dnode_hold+0x2b(ff02e2a2f040, 1591, ff02e4e71478,
ff000f4ef998)
ff000f4ef9e0 dmu_tx_hold_object_impl+0x4a(ff02e4e71478,
ff02e2a2f040, 1591, 2, 0, 0)
ff000f4efa00 dmu_tx_hold_bonus+0x2a(ff02e4e71478, 1591)
ff000f4efa50 zfs_inactive+0x99(ff030213ae80, ff02d4ed6d88, 0)
ff000f4efaa0 fop_inactive+0xaf(ff030213ae80, ff02d4ed6d88, 0)
ff000f4efac0 vn_rele+0x5f(ff030213ae80)
ff000f4efae0 smb_node_free+0x7d(ff02f098b2a0)
ff000f4efb10 smb_node_release+0x9a(ff02f098b2a0)
ff000f4efb30 smb_ofile_delete+0x76(ff03026d5d18)
ff000f4efb60 smb_ofile_release+0x84(ff03026d5d18)
ff000f4efb80 smb_request_free+0x23(ff02fa4b0058)
ff000f4efbb0 smb_session_worker+0x6e(ff02fa4b0058)
ff000f4efc40 taskq_d_thread+0xb1(ff02e51b9e90)
ff000f4efc50 thread_start+8()
 


I can provide any other info that may be needed. Thank you in advance for your
help!

Rob
-- 
Rob Cherveny 

Manager of Information Technology
American Junior Golf Association
770.868.4200 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sharing a ssd between rpool and l2arc

2010-03-30 Thread Rob Logan


 you can't use anything but a block device for the L2ARC device.

sure you can... 
http://mail.opensolaris.org/pipermail/zfs-discuss/2010-March/039228.html
it even lives through a reboot (rpool is mounted before other pools)

zpool create -f test c9t3d0s0 c9t4d0s0
zfs create -V 3G rpool/cache
zpool add test cache /dev/zvol/dsk/rpool/cache
reboot

if you're asking for an L2ARC on rpool, well, yeah, it's not mounted soon 
enough, but the
point is to put rpool, swap, and L2ARC for your storage pool all on a single
SSD..

Rob

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-03-30 Thread Rob Logan

 if you disable the ZIL altogether, and you have a power interruption, failed 
 cpu, 
 or kernel halt, then you're likely to have a corrupt unusable zpool

the pool will always be fine, no matter what.

 or at least data corruption. 

yeah, it's a good bet that data sent to your file or zvol will not be there
when the box comes back, even though your program had finished seconds 
before the crash.

Rob

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD As ARC

2010-03-28 Thread Rob Logan
 Can't you slice the SSD in two, and then give each slice to the two zpools?
 This is exactly what I do ... use 15-20 GB for root and the rest for an L2ARC.

I like the idea of swapping on SSD too, but why not make a zvol for the L2ARC
so you're not limited by the hard partitioning?

Rob

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD As ARC

2010-03-28 Thread Rob Logan

 I like the idea of swapping on SSD too, but why not make a zvol for the L2ARC
 so you're not limited by the hard partitioning?

it lives through a reboot.. 

zpool create -f test c9t3d0s0 c9t4d0s0
zfs create -V 3G rpool/cache
zpool add test cache /dev/zvol/dsk/rpool/cache
reboot 
zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
rpool ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c9t1d0s0  ONLINE   0 0 0
c9t2d0s0  ONLINE   0 0 0

errors: No known data errors

  pool: test
 state: ONLINE
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
test ONLINE   0 0 0
  c9t3d0s0   ONLINE   0 0 0
  c9t4d0s0   ONLINE   0 0 0
cache
  /dev/zvol/dsk/rpool/cache  ONLINE   0 0 0

errors: No known data errors

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS send and receive corruption across a WAN link?

2010-03-18 Thread Rob
Can a ZFS send stream become corrupt when piped between two hosts across a WAN 
link using 'ssh'?

For example a host in Australia sends a stream to a host in the UK as follows:

# zfs send tank/f...@now | ssh host.uk zfs receive tank/bar
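
One way I thought of checking for corruption in transit (dataset, snapshot and 
path names here are just examples) is to land the stream in a file and compare 
checksums on both ends before receiving it:

# sending side: save the stream and print its checksum
zfs send tank/foo@now | tee /var/tmp/foo.zfs | digest -a sha256
# copy foo.zfs to the UK host (e.g. with scp), then on the receiving side:
digest -a sha256 /var/tmp/foo.zfs        # should match the sender's checksum
zfs receive tank/bar < /var/tmp/foo.zfs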
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Poor ZIL SLC SSD performance

2010-02-19 Thread Rob Logan

 An UPS plus disabling zil, or disabling synchronization, could possibly
 achieve the same result (or maybe better) iops wise.
Even with the fastest slog, disabling the zil will always be faster... 
(fewer bytes to move)

 This would probably work given that your computer never crashes
 in an uncontrolled manner. If it does, some data may be lost
 (and possibly the entire pool lost, if you are unlucky).
the pool would never be at risk, but when your server
reboots, its clients will be confused because data they
sent, and the server promised it had saved, is gone.
For some clients, this small loss might be the loss of their 
entire dataset.
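
(For reference, the knob being discussed here is, if I remember right, the 
zil_disable tunable, set in /etc/system like other zfs tunables; not something 
I'd recommend for a server with NFS or iSCSI clients:)

set zfs:zil_disable = 1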

Rob

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Reading ZFS config for an extended period

2010-02-15 Thread Rob Logan

  RFE open to allow you to store [DDT] on a separate top level VDEV

hmm, add to this spare, log and cache vdevs; it's to the point of making
another pool and thinly provisioning volumes to maintain partitioning
flexibility.

taemun: hey, thanks for closing the loop!

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cores vs. Speed?

2010-02-06 Thread Rob Logan
 I like the original Phenom X3 or X4 

we all agree ram is the key to happiness. The debate is what offers the most ECC
ram for the least $. I failed to realize the AM3 cpus accept unbuffered ECC
DDR3-1333 like Lynnfield. To use Intel's 6 slots vs AMD's 4 slots, one must use
Registered ECC.
So the low-cost mission is something like

AMD Phenom II X4 955 Black Edition Deneb 3.2GHz Socket AM3 125W 
$150 http://www.newegg.com/Product/Product.aspx?Item=N82E16819103808  
$ 85 http://www.newegg.com/Product/Product.aspx?Item=N82E16813131609  
$ 60 http://www.newegg.com/Product/Product.aspx?Item=N82E16820139050

But we are still stuck at 8G without going to expensive ram or
a more expensive CPU.

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cores vs. Speed?

2010-02-05 Thread Rob Logan


 if zfs overlaps mirror reads across devices.

it does... I have one very old disk in this mirror and
when I attach another element one can see more reads going
to the faster disks... this paste isn't from right after the attach
but since the reboot, but one can still see the reads are
load balanced depending on the response of elements
in the vdev.

13 % zpool iostat -v
   capacity operationsbandwidth
pool used  avail   read  write   read  write
--  -  -  -  -  -  -
rpool   7.01G   142G  0  0  1.60K  1.44K
  mirror7.01G   142G  0  0  1.60K  1.44K
c9t1d0s0  -  -  0  0674  1.46K
c9t2d0s0  -  -  0  0687  1.46K
c9t3d0s0  -  -  0  0720  1.46K
c9t4d0s0  -  -  0  0750  1.46K


but I also support your conclusions.

Rob

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cores vs. Speed?

2010-02-05 Thread Rob Logan

 Intel's RAM is faster because it needs to be.
I'm confused how AMD's dual channel, two way interleaved 
128-bit DDR2-667 into an on-cpu controller is faster than
Intel's Lynnfield dual channel, Rank and Channel interleaved 
DDR3-1333 into an on-cpu controller. 
http://www.anandtech.com/printarticle.aspx?i=3634

 With the AMD CPU, the memory will run cooler and be cheaper. 
cooler yes, but only $2 more per gig for 2x bandwidth?

http://www.newegg.com/Product/Product.aspx?Item=N82E16820139050
http://www.newegg.com/Product/Product.aspx?Item=N82E16820134652

and if one uses all 16 slots, that 667MHz simm runs at 533MHz
with AMD. The same is true for Lynnfield: if one uses Registered
DDR3, one only gets 800MHz with all 6 slots (single or dual rank).

 Regardless, for zfs, memory is more important than raw CPU 
agreed! but everything must be balanced.

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Cores vs. Speed?

2010-02-04 Thread Rob Logan

  I am leaning towards AMD because of ECC support 

well, let's look at Intel's offerings... RAM is faster than AMD's
at 1333MHz DDR3, and one gets ECC and a thermal sensor for $10 over non-ECC 
http://www.newegg.com/Product/Product.aspx?Item=N82E16820139040

This MB has two Intel ethernets and for an extra $30 an ether KVM (LOM)
http://www.newegg.com/Product/Product.aspx?Item=N82E16813182212

One needs a Xeon 34xx for ECC; the 45W version isn't on newegg, and ignoring
the one without Hyper-Threading leaves us 
http://www.newegg.com/Product/Product.aspx?Item=N82E16819117225

Yea @ 95W it isn't exactly low power, but 4 cores @ 2533MHz and another
4 Hyper-Thread cores is nice.. If you only need one core, the marketing
paperwork claims it will push to 2.93GHz too. But the ram bandwidth is the 
big win for Intel. 

Avoid the temptation, but @ 2.8GHz without ECC, this one is close in $$
http://www.newegg.com/Product/Product.aspx?Item=N82E16819115214

Now, this gets one to 8G ECC easily...AMD's unfair advantage is all those
ram slots on their multi-die MBs... A slow AMD cpu with 64G ram
might be better depending on your working set / dedup requirements.

Rob



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] verging OT: how to buy J4500 w/o overpriced drives

2010-02-02 Thread Rob Logan

 true. but I buy a Ferrari for the engine and bodywork and chassis
 engineering. It is totally criminal what Sun/EMC/Dell/Netapp do charging

its interesting to read this with another thread containing:

 timeout issue is definitely the WD10EARS disks.
 replaced 24 of them with ST32000542AS (f/w CC34), and the problem departed 
with the WD disks.

everyone needs to eat, if Ferrari spreads their NRE over
the wheels, it might be because they are light and have
been tested to not melt from the heat. Sun/EMC/Dell/Netapp
tests each of their components and sells the total car.

I'm thankful Sun shares their research and we can build on it.
(btw, netapp ontap 8 is freebsd, and runs on std hardware
after a little bios work :-)

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best 1.5TB drives for consumer RAID?

2010-01-24 Thread Rob Logan

 a 1U or 2U JBOD chassis for 2.5" drives,
from http://supermicro.com/products/nfo/chassis_storage.cfm 
the E1 (single) or E2 (dual) options have a SAS expander so
http://supermicro.com/products/chassis/2U/?chs=216
fits your build or build it your self with
http://supermicro.com/products/accessories/mobilerack/CSE-M28E2.cfm


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 4 Internal Disk Configuration

2010-01-14 Thread Rob Logan


 By partitioning the first two drives, you can arrange to have a small
 zfs-boot mirrored pool on the first two drives, and then create a second
 pool as two mirror pairs, or four drives in a raidz to support your data.

agreed..
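
roughly how the z pool below was laid out, from memory (the boot pool r on 
slice 0 of the first two disks came from the install):

zpool create z mirror c5t0d0s7 c5t1d0s7 \
    mirror c5t2d0 c5t3d0 mirror c5t4d0 c5t5d0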

2 % zpool iostat -v
 capacity operationsbandwidth
pool   used  avail   read  write   read  write
  -  -  -  -  -  -
r 8.34G  21.9G  0  5  1.62K  17.0K
  mirror  8.34G  21.9G  0  5  1.62K  17.0K
c5t0d0s0  -  -  0  2  3.30K  17.2K
c5t1d0s0  -  -  0  2  3.66K  17.2K
  -  -  -  -  -  -
z  375G   355G  6 32  67.2K   202K
  mirror   133G   133G  2 14  24.7K  84.2K
c5t0d0s7  -  -  0  3  53.3K  84.3K
c5t1d0s7  -  -  0  3  53.2K  84.3K
  mirror   120G   112G  1  9  21.3K  59.6K
c5t2d0-  -  0  2  38.4K  59.7K
c5t3d0-  -  0  2  38.2K  59.7K
  mirror   123G   109G  1  8  21.3K  58.6K
c5t4d0-  -  0  2  36.4K  58.7K
c5t5d0-  -  0  2  37.2K  58.7K
  -  -  -  -  -  -

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs, raidz, spare and jbod

2010-01-10 Thread Rob
Hello Arnaud,

Thanks for your reply.

We have a system (2 x Xeon 5410, Intel S5000PSL mobo and 8 GB memory) with 
12 x 500 GB SATA disks on an Areca 1130 controller. rpool is a mirror over 2 
disks, 8 disks in raidz2, 1 spare. We have 2 aggr links.

Our goal is an ESX storage system; I am using iSCSI and NFS to serve space to 
our ESX 4.0 servers.

We can remove a disk with no problem. I can do a replace and the disk is being 
resilvered. That works fine here.

Our problem comes when we push the server a little bit harder! When we give 
the server a hard time (copy 60G+ of data or do some other stuff to put some 
load on the system), it hangs. This happens after 5 minutes, or after 30 
minutes or later, but it hangs. Then we get the problems shown in the attached 
pictures.

I have also emailed Areca. I hope they can fix it.

Regards,

Rob
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] unable to zfs destroy

2010-01-08 Thread Rob Logan

this one has me a little confused. Ideas?

j...@opensolaris:~# zpool import z
cannot mount 'z/nukeme': mountpoint or dataset is busy
cannot share 'z/cle2003-1': smb add share failed
j...@opensolaris:~# zfs destroy z/nukeme
internal error: Bad exchange descriptor
Abort (core dumped)
j...@opensolaris:~# adb core
core file = core -- program ``/sbin/zfs'' on platform i86pc
SIGABRT: Abort
$c
libc_hwcap1.so.1`_lwp_kill+0x15(1, 6, 80462a8, fee9bb5e)
libc_hwcap1.so.1`raise+0x22(6, 0, 80462f8, fee7255a)
libc_hwcap1.so.1`abort+0xf2(8046328, fedd, 8046328, 8086570, 8086970, 400)
libzfs.so.1`zfs_verror+0xd5(8086548, 813, fedc5178, 804635c)
libzfs.so.1`zfs_standard_error_fmt+0x225(8086548, 32, fedc5178, 808acd0)
libzfs.so.1`zfs_destroy+0x10e(808acc8, 0, 0, 80479c8)
destroy_callback+0x69(808acc8, 8047910, 80555ec, 8047910)
zfs_do_destroy+0x31f(2, 80479c8, 80479c4, 80718dc)
main+0x26a(3, 80479c4, 80479d4, 8053fdf)
_start+0x7d(3, 8047ae4, 8047ae8, 8047af0, 0, 8047af9)
^d
j...@opensolaris:~# uname -a
SunOS opensolaris 5.11 snv_130 i86pc i386 i86pc
j...@opensolaris:~# zpool status -v z
  pool: z
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h39m, 19.15% done, 2h46m to go
config:

NAMESTATE READ WRITE CKSUM
z   ONLINE   0 0 2
  c3t0d0s7  ONLINE   0 0 4
  c3t1d0s7  ONLINE   0 0 0
  c2d0  ONLINE   0 0 4

errors: Permanent errors have been detected in the following files:

z/nukeme:0x0

j...@opensolaris:~# zfs list z/nukeme
NAME   USED  AVAIL  REFER  MOUNTPOINT
z/nukeme  49.0G   496G  49.0G  /z/nukeme
j...@opensolaris:~# zdb -d z/nukeme 0x0
zdb: can't open 'z/nukeme': Device busy

there is also no mount point /z/nukeme

any ideas how to nuke /z/nukeme?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Update - mpt errors on snv 101b

2009-12-08 Thread Rob Nelson
I can report IO errors with Chenbro-based LSI SASx36 IC based  
expanders tested with 111b/121/128a/129.  The HBA was LSI 1068 based.   
If I bypass the expander by adding more HBA controllers, mpt does not have  
IO errors.


-nola


On Dec 8, 2009, at 6:48 AM, Bruno Sousa wrote:


Hi James,

Thank you for your feedback; I will send the prtconf -v output to
your email.
I also have another system where I can test something if that's the
case, and if you need extra information or even access to the system,
please let me know.

Thank you,
Bruno

James C. McPherson wrote:

Bruno Sousa wrote:

Hi all,

During this problem I did a power-off/power-on of the server and the
bus reset/scsi timeout issue persisted. After that I decided to
power off/power on the jbod array, and after that everything became
normal.
No scsi timeouts, normal performance, everything is okay now.
With this, is it safe to assume that the problem may be caused by the
SAS expander (one single LSI SASX36 Expander Chip) used by the
supermicro jbod chassis, and not by the hba/mpt driver?


Hi Bruno,
that is indeed what I, personally, suspect is the case. Tracking
that down and conclusively proving so is, however, another thing
entirely.

Could you send the output from prtconf -v for your host please,
so that we can have a look at the vital information for the
enclosure services and SMP nodes that the SAS Expander presents?



thankyou,
James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp    http://www.jmcp.homeunix.com/blog



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How can we help fix MPT driver post build 129

2009-12-05 Thread Rob Nelson
How can we help with what is outlined below?  I can reproduce these at will, so 
if anyone at Sun would like an environment to test this situation, let me know.

What is the best info to grab for you folks to help here?

Thanks - nola



This is in regard to these threads:

http://www.opensolaris.org/jive/thread.jspa?messageID=421400#421400
http://www.opensolaris.org/jive/thread.jspa?threadID=118947&tstart=0
http://www.opensolaris.org/jive/thread.jspa?threadID=117702&tstart=1
http://www.opensolaris.org/jive/thread.jspa?messageID=437031&tstart=0

And bug IDs: 

6894775 mpt driver timeouts and bus resets under load
6900767 Server hang with LSI 1068E based SAS controller under load

Exec Summary:  Those using the LSI 1068 chipset with the LSI SAS2x IC expander 
have I/O errors under load from about build 118 to 129 (the last build I tested).

At build 111b it worked.  Take the same hardware and load-test scripts: run under 
111b and you're OK; run under ~118 and later and you suffer from, for example:

Dec  5 08:17:04 gb2000-007 scsi: [ID 365881 kern.info] 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:17:04 gb2000-007  Log info 0x3000 received for target 79.
Dec  5 08:17:04 gb2000-007  scsi_status=0x0, ioc_status=0x804b, 
scsi_state=0xc
Dec  5 08:17:07 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:17:07 gb2000-007  SAS Discovery Error on port 4. DiscoveryStatus 
is DiscoveryStatus is |Unaddressable device found|
Dec  5 08:18:09 gb2000-007 scsi: [ID 107833 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:09 gb2000-007  Disconnected command timeout for Target 79
Dec  5 08:18:14 gb2000-007 scsi: [ID 365881 kern.info] 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:14 gb2000-007  Log info 0x3113 received for target 79.
Dec  5 08:18:14 gb2000-007  scsi_status=0x0, ioc_status=0x8048, 
scsi_state=0xc
Dec  5 08:18:17 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:17 gb2000-007  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3000
Dec  5 08:18:17 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:17 gb2000-007  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3000
Dec  5 08:18:19 gb2000-007 scsi: [ID 365881 kern.info] 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:19 gb2000-007  Log info 0x3000 received for target 79.
Dec  5 08:18:19 gb2000-007  scsi_status=0x0, ioc_status=0x804b, 
scsi_state=0xc
Dec  5 08:18:22 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:18:22 gb2000-007  SAS Discovery Error on port 4. DiscoveryStatus 
is DiscoveryStatus is |Unaddressable device found|
Dec  5 08:19:24 gb2000-007 scsi: [ID 107833 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:24 gb2000-007  Disconnected command timeout for Target 79
Dec  5 08:19:29 gb2000-007 scsi: [ID 365881 kern.info] 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:29 gb2000-007  Log info 0x3113 received for target 79.
Dec  5 08:19:29 gb2000-007  scsi_status=0x0, ioc_status=0x8048, 
scsi_state=0xc
Dec  5 08:19:32 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:32 gb2000-007  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3000
Dec  5 08:19:32 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:32 gb2000-007  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3000
Dec  5 08:19:34 gb2000-007 scsi: [ID 365881 kern.info] 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:34 gb2000-007  Log info 0x3000 received for target 79.
Dec  5 08:19:34 gb2000-007  scsi_status=0x0, ioc_status=0x804b, 
scsi_state=0xc
Dec  5 08:19:37 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:19:37 gb2000-007  SAS Discovery Error on port 4. DiscoveryStatus 
is DiscoveryStatus is |Unaddressable device found|
Dec  5 08:20:39 gb2000-007 scsi: [ID 107833 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:20:39 gb2000-007  Disconnected command timeout for Target 79
Dec  5 08:20:39 gb2000-007 scsi: [ID 243001 kern.warning] WARNING: 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:20:39 gb2000-007  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x31112000
Dec  5 08:20:44 gb2000-007 scsi: [ID 365881 kern.info] 
/p...@7a,0/pci8086,3...@7/pci1000,3...@0 (mpt1):
Dec  5 08:20:44 gb2000-007  Log info 0x3113 received for target 79.
Dec  5 08:20:44 gb2000-007  scsi_status=0x0, ioc_status=0x8048, 
scsi_state=0xc
Dec  5 08:20:44 gb2000-007 scsi: 

Re: [zfs-discuss] Separate Zil on HDD ?

2009-12-02 Thread Rob Logan


 2 x 500GB mirrored root pool
 6 x 1TB raidz2 data pool
 I happen to have 2 x 250GB Western Digital RE3 7200rpm
 be better than having the ZIL 'inside' the zpool.

Listing two log devices (a stripe) would give you more spindles
than your single raidz2 vdev.  But for low-cost fun one might make
a tiny slice on each disk of the raidz2, list six log devices
(a 6-way stripe), and not bother adding the other two disks;
a sketch of that follows.
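
A minimal sketch of that layout, assuming the pool is named tank, the six
raidz2 disks are c0t0d0 through c0t5d0, and slice 3 on each was set aside
as the small log slice (all names hypothetical):

zpool add tank log c0t0d0s3 c0t1d0s3 c0t2d0s3 c0t3d0s3 c0t4d0s3 c0t5d0s3
zpool status tank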

Rob


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Workaround for mpt timeouts in snv_127

2009-11-30 Thread Rob Logan

 Chenbro 16 hotswap bay case.  It has 4 mini backplanes that each connect via 
 an SFF-8087 cable
 StarTech HSB430SATBK 

Hmm, both are passive backplanes with one SATA tunnel per link... 
no SAS expanders (LSISASx36) like those found in SuperMicro or J4x00 chassis with 4 
links per connection. 
I wonder if there is an LSI issue with too many links in HBA mode?

Rob

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] scrub differs in execute time?

2009-11-14 Thread Rob Logan

 P45 Gigabyte EP45-DS3P. I put the AOC card into a PCI slot

I'm not sure which half of your disks is where, or how your vdevs 
are configured, but the ICH10 has 6 SATA ports at 300MB/s and 
one PCI port at 266MB/s (which is also shared with the IT8213 IDE chip). 

So in an ideal world your scrub bandwidth would be: 

300*6 MB/s with 6 disks on ICH10, in a stripe
300*1 MB/s with 6 disks on ICH10, in a raidz
300*3+(266/3) MB/s with 3 disks on ICH10, and 3 on shared PCI, in a stripe
266/3 MB/s with 3 disks on ICH10, and 3 on shared PCI, in a raidz
266/6 MB/s with 6 disks on shared PCI, in a stripe
266/6 MB/s with 6 disks on shared PCI, in a raidz

We know disks don't go that fast anyway, but going from an 8h to a 15h 
scrub is very reasonable depending on vdev config.

Rob

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] scrub differs in execute time?

2009-11-14 Thread Rob Logan

 The ICH10 has a 32-bit/33MHz PCI bus which provides 133MB/s at half duplex.

You are correct; I thought the ICH10 used a 66MHz bus, when in fact it's 33MHz. The
AOC card works fine in a PCI-X 64-bit/133MHz slot, good for 1,067 MB/s, 
even if the motherboard uses a PXH chip via 8-lane PCIe.

Rob

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raidz-1 vs mirror

2009-11-11 Thread Rob Logan

 from a two disk (10krpm) mirror layout to a three disk raidz-1. 

Writes will be unnoticeably slower for raidz1 because of the parity calculation
and the latency of a third spindle, but reads will be 1/2 the speed
of the mirror, because the mirror can split reads between two disks.

another way to say the same thing:

a raidz will run at the speed of the slowest disk in the array, while a
mirror will be x(number of mirrors) times faster for reads and
the speed of the slowest disk for writes.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] PSARC recover files?

2009-11-09 Thread Rob Logan


frequent snapshots offer outstanding oops protection.
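
A minimal sketch of one way to get them, assuming a dataset named tank/home
and ignoring pruning (the zfs-auto-snapshot service is the polished route):

# root crontab entry: hourly snapshot of tank/home
0 * * * * /usr/sbin/zfs snapshot tank/home@auto-`date +\%Y\%m\%d-\%H\%M`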

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] PSARC recover files?

2009-11-09 Thread Rob Logan


 Maybe to create snapshots after the fact

how does one quiesce a drive after the fact?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS + fsck

2009-11-04 Thread Rob Warner
ZFS scrub will detect many types of error in your data or the filesystem 
metadata.

If you have sufficient redundancy in your pool and the errors were not due to 
dropped or misordered writes, then they can often be automatically corrected 
during the scrub.

If ZFS detects an error from which it cannot automatically recover, it will 
often instantly lock your entire pool to prevent any read or write access, 
informing you only that you must destroy it and restore from backups to get 
your data back.

Your only recourse in such situations is to do exactly that, or enlist the help 
of Victor Latushkin to attempt to recover your pool using painstaking manual 
manipulation.

Recent putbacks seem to indicate that future releases will provide a mechanism 
to allow mere mortals to recover from some of the errors caused by dropped 
writes.

cheers,

Rob
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sub-optimal ZFS performance

2009-10-29 Thread Rob Logan


 So the solution is to never get more than 90% full disk space

While that's true, it's not Henrik's main discovery. Henrik points
out that 1/4 of the ARC is used for metadata, and sometimes
that's not enough.

If
echo ::arc | mdb -k | egrep ^size
isn't reaching
echo ::arc | mdb -k | egrep ^c 
and you are maxing out your metadata space, check:
echo ::arc | mdb -k | grep meta_

One can set the metadata space (1G in this case) with:
echo arc_meta_limit/Z 0x40000000 | mdb -kw

So while Henrik's FS had some fragmentation, 1/4 of c_max wasn't
enough metadata ARC space for the number of files in /var/pkg/download.

good find Henrik!

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs code and fishworks fork

2009-10-27 Thread Rob Logan


 are you going to ask NetApp to support ONTAP on Dell systems,

Well, ONTAP 5.0 is built on FreeBSD, so it wouldn't be too
hard to boot on Dell hardware. Hey, at least it can do
aggregates larger than 16T now...
http://www.netapp.com/us/library/technical-reports/tr-3786.html

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZPOOL Metadata / Data Error - Help

2009-10-04 Thread Rob Logan


Action: Restore the file in question if possible. Otherwise restore  
the

 entire pool from backup.
 metadata:0x0
 metadata:0x15


I bet it's in a snapshot that looks to have been destroyed already. Try:

zpool clear POOL01
zpool scrub POOL01


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] bigger zfs arc

2009-10-02 Thread Rob Logan

 zfs will use as much memory as is necessary but how is necessary 
calculated?

using arc_summary.pl from http://www.cuddletech.com/blog/pivot/entry.php?id=979
my tiny system shows:
 Current Size: 4206 MB (arcsize)
 Target Size (Adaptive):   4207 MB (c)
 Min Size (Hard Limit):894 MB (zfs_arc_min)
 Max Size (Hard Limit):7158 MB (zfs_arc_max)

So arcsize is close to the desired c; no pressure here. But it would be nice to 
know how c is calculated, as it's much smaller than zfs_arc_max on a system
like yours with nothing else on it.

 When an L2ARC is attached does it get used if there is no memory pressure?

My guess is no, for the same reason an L2ARC takes so long to fill.
arc_summary.pl from the same system is

  Most Recently Used Ghost:0%  9367837 (mru_ghost)  [ Return Customer 
Evicted, Now Back ]
  Most Frequently Used Ghost:  0% 11138758 (mfu_ghost)  [ Frequent Customer 
Evicted, Now Back ]

so with no ghosts, this system wouldn't benefit from an L2ARC even if added

In review (audit welcome):

if arcsize = c and is much less than zfs_arc_max,
  there is no point in adding system RAM in hopes of increasing the ARC.

if m?u_ghost is a small %, there is no point in adding an L2ARC.

if you do add an L2ARC, one must have RAM between c and zfs_arc_max for its 
pointers.
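
(The same counters are visible without the perl script; a rough check,
assuming the arcstats kstat names haven't moved between builds:)

kstat -p zfs:0:arcstats | egrep 'size|c_max|ghost_hits'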

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool create over old pool recovery

2009-08-24 Thread Rob Levy
Folks,

Need help with ZFS recovery following a zpool create ... 

We recently received new laptops (hardware refresh) and I simply transferred the 
multiboot hdd (using OpenSolaris 2008.11 as the primary production OS) from the 
old laptop to the new one (used the live DVD to do the zpool import, updated 
the boot archive and did a devfsadm) and worked away as usual.

I then wanted to use the WinXP distribution which was shipped with the new 
laptop and discovered the existing partition was too small. I bought a new hdd 
and proceeded to partition it as required (1.WinXP 24GB, 2.Solaris 24GB, 
3.Solaris 130GB, 4.extended with logical 5.FAT32 30GB, 6.Linux 20GB, 7.Linux 
swap 6GB).

I would consider myself inexperienced with ZFS (one of the reasons I opted for 
OpenSolaris was to get more familiar with it and the other features before they 
were adopted by customers).

So although I bet there would be more elegant ways to do this, I stuck with what 
I know.

I connected the hdd (with functional but inappropriate partition sizes) to the 
usb port.

I 'dd' the first Solaris partition across (OpenSolaris rpool dataset) to the 
new drive (used the live DVD to do the zpool import, update_grub and bootadm) 
then all was as it should be.

It appears I may have been able to do this from the hdd after booting 
OpenSolaris, but I wasn't aware of how to deal with two pools of the same name, 
i.e. rpool.

I subsequently copied the Linux OS across (booted Linux, created an ext3 filesystem and 
copied the OS files across using tar). Also from Linux, I created the FAT32 
filesystem and copied the data across with tar - all okay and functional.

At this point all I needed to do was copy the second 130GB Solaris partition (zfs 
filesystem) across, so I proceeded to create the new pool and zfs filesystem. My 
intention was to mount the two and simply copy the data across.

What I did do was zpool create to the device where the existing pool (of valid 
data) was and created a zfs filesystem. When I did the zpool status I 
realised what I had done and promptly disconnected the hdd from the usb 
port.

I have googled, without success, to see if anyone has recovered data following an 
inadvertent 'zpool create'. The Sun URL also says the data 
cannot be recovered and should be restored from a backup.

I don't have a recent backup (and I guess worse yet, don't know what I will 
have lost by going back to a roughly 8-month-old backup).

So I guess what I'm hoping is that, as with other filesystems, if the zfs 
'superstructures' are removed the data is still in place, and that using some 
of the cached detail it can perhaps be pieced back together?

I know it's a long shot but as I don't know ZFS well enough, I must ask the 
question.

Some documentation seemed to suggest ZFS would advise if a healthy pool existed 
before blowing it away (perhaps only if it is mounted? this one wasn't imported?). 
As there is very obviously a risk here, it would be a good time to add any 
possible checks to zpool create.

Does anyone have any recovery advice?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] The importance of ECC RAM for ZFS

2009-07-24 Thread Rob Logan

 The post I read said OpenSolaris guest crashed, and the guy clicked
 the ``power off guest'' button on the virtual machine.

I seem to recall the guest hung. 99% of Solaris hangs (without
a crash dump) are hardware in nature (my experience, backed by
an uptime of 1116 days), so the finger is still
pointed at VirtualBox's hardware implementation.

As for ZFS requiring better hardware: you could turn
off checksums and other protections so one isn't notified
of issues, making it act like the others.
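
(Purely to illustrate the point, and emphatically not recommended; a sketch
assuming a pool named tank:)

zfs set checksum=off tank
zfs get checksum tank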

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40 days work

2009-07-20 Thread Rob Logan

  the machine hung and I had to power it off.

kinda getting off the zpool import --tgx -3 request, but
hangs are exceptionally rare and usually ram or other
hardware issue, solairs usually abends on software faults.

r...@pdm #  uptime
  9:33am  up 1116 day(s), 21:12,  1 user,  load average: 0.07, 0.05, 0.05
r...@pdm #  date
Mon Jul 20 09:33:07 EDT 2009
r...@pdm #  uname -a
SunOS pdm 5.9 Generic_112233-12 sun4u sparc SUNW,Ultra-250

Rob

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Understanding SAS/SATA Backplanes and Connectivity

2009-07-16 Thread Rob Logan

 c4 scsi-bus connectedconfigured   unknown
 c4::dsk/c4t15d0disk connectedconfigured   unknown
 :
 c4::dsk/c4t33d0disk connectedconfigured   unknown
 c4::es/ses0ESI  connectedconfigured   unknown

Thanks! So SATA disks show up as JBOD in IT mode. Is there some magic that
load-balances the 4 SAS ports, as this shows up as one scsi-bus?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Rob Logan

 CPU is smoothed out quite a lot
Yes, but the area under the CPU graph is less, so the
rate of real work performed is less, and the entire
job took longer (albeit smoother).

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS and Dinamic Stripe

2009-06-29 Thread Rob Logan

 try to be spread across different vdevs.

% zpool iostat -v
   capacity operationsbandwidth
pool used  avail   read  write   read  write
--  -  -  -  -  -  -
z686G   434G 40  5  2.46M   271K
  c1t0d0s7   250G   194G 14  1   877K  94.2K
  c1t1d0s7   244G   200G 15  2   948K  96.5K
  c0d0   193G  39.1G 10  1   689K  80.2K


Note that c0d0 is basically full, but is still serving 10 reads for every
15 on the other disks, and roughly 82% as many writes.

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] BugID formally known as 6746456

2009-06-26 Thread Rob Healey
This appears to be the fix related to the ACLs, under which they seem to file all of 
the ASSERT panics in zfs_fuid.c even when they have nothing to do with 
ACLs; my case being one of those.

Thanks for the pointer though!

-Rob
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] BugID formally known as 6746456

2009-06-24 Thread Rob Healey
Does anyone know if the problems related to the panics dismissed as duplicates of 
6746456 ever resulted in Solaris 10 patches? It sounds like they were actually 
solved in OpenSolaris, but S10 is still panicking predictably when Linux NFS 
clients try to change a nobody UID/GID on a ZFS-exported filesystem.

Specifically, the NFS-induced panics related to the nobody ID not mapping 
correctly; or, more precisely, attempts to change user/group ID nobody causing 
S10u7 to blow chunks in zfs_fuid_table_load's ASSERT in zfs_fuid.c.

While the workaround of changing the IDs on the server is possible, it pretty 
much torpedoes management's view of Solaris' stability and sends fileserver 
duty back to Linux... :( Anybody could create a nobody-owned file and put the system 
into endless boot loops until this is patched.

I'm hoping further work on this issue was done on the S10 side of the house and 
there is a stealthy patch ID that can fix the issue.

Thanks,

-Rob
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] problems with l2arc in 2009.06

2009-06-18 Thread Rob Logan

 correct ratio of arc to l2arc?

from http://blogs.sun.com/brendan/entry/l2arc_screenshots

It costs some DRAM to reference the L2ARC, at a rate proportional to record 
size.
For example, it currently takes about 15 Gbytes of DRAM to reference 600 Gbytes 
of
L2ARC - at an 8 Kbyte ZFS record size. If you use a 16 Kbyte record size, that 
cost
would be halved - 7.5 Gbytes. This means you shouldn't, for example, configure a
system with only 8 Gbytes of DRAM, 600 Gbytes of L2ARC, and an 8 Kbyte record 
size -
if you did, the L2ARC would never fully populate.
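
(A quick sanity check of those numbers, assuming the cost is one fixed-size
header per L2ARC record: 600 Gbytes / 8 Kbytes is about 75 million records, and
15 Gbytes / 75 million is about 200 bytes per header, so doubling the record
size halves the record count and therefore the DRAM.)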


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Replacing HDD with larger HDD..

2009-05-22 Thread Rob Logan

 zpool offline grow /var/tmp/disk01
 zpool replace grow /var/tmp/disk01 /var/tmp/bigger_disk01

One doesn't need to offline before the replace, so as long as you
have one free disk interface you can cfgadm -c configure sata0/6
each disk as you go... or you can offline and cfgadm each
disk in the same port as you go.

 It is still the same size. I would expect it to go to 9G.

a reboot or export/import would have fixed this.

 cannot import 'grow': no such pool available

you meant to type
zpool import -d /var/tmp grow
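
Putting the whole sequence together (a sketch using the file-backed vdevs
from the example; the export/import is what makes the larger size visible):

zpool replace grow /var/tmp/disk01 /var/tmp/bigger_disk01
zpool export grow
zpool import -d /var/tmp grow
zpool list grow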

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RAIDZ2: only half the read speed?

2009-05-22 Thread Rob Logan

 How does one look at the disk traffic?

iostat -xce 1

 OpenSolaris, raidz2 across 8 7200 RPM SATA disks:
 17179869184 bytes (17 GB) copied, 127.308 s, 135 MB/s

 OpenSolaris, flat pool across the same 8 disks:
 17179869184 bytes (17 GB) copied, 61.328 s, 280 MB/s

One raidz2 set of 8 disks can't be faster than the slowest
disk in the set, as it's one vdev... I would have expected
the 8-vdev set to be 8x faster than the single raidz[12]
set, but like Richard said, there is another bottleneck
in there that iostat will show.

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SAS 15K drives as L2ARC

2009-05-05 Thread Rob Logan


  use a bunch of 15K SAS drives as L2ARC cache for several TBs of SATA disks?

perhaps... depends on the workload, and if the working set
can live on the L2ARC

 used mainly as astronomical images repository

hmm, perhaps two trays of 1T SATA drives all
mirrors rather than raidz sets of one tray.

i.e., please don't discount how one arranges the vdevs
in a given configuration.

Rob

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool import crash, import degraded mirror?

2009-04-29 Thread Rob Logan

When I type `zpool import` to see what pools are out there, it gets to

/1: open(/dev/dsk/c5t2d0s0, O_RDONLY) = 6
/1: stat64(/usr/local/apache2/lib/libdevid.so.1, 0x08042758) Err#2 ENOENT
/1: stat64(/usr/lib/libdevid.so.1, 0x08042758)= 0
/1: d=0x02D90002 i=241208 m=0100755 l=1  u=0 g=2 sz=61756
/1: at = Apr 29 23:41:17 EDT 2009  [ 1241062877 ]
/1: mt = Apr 27 01:45:19 EDT 2009  [ 124089 ]
/1: ct = Apr 27 01:45:19 EDT 2009  [ 124089 ]
/1: bsz=61952 blks=122   fs=zfs
/1: resolvepath(/usr/lib/libdevid.so.1, /lib/libdevid.so.1, 1023) = 18
/1: open(/usr/lib/libdevid.so.1, O_RDONLY)= 7
/1: mmapobj(7, 0x0002, 0xFEC70640, 0x080427C4, 0x) = 0
/1: close(7)= 0
/1: memcntl(0xFEC5, 4048, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
/1: fxstat(2, 6, 0x080430C0)= 0
/1: d=0x04A0 i=5015 m=0060400 l=1  u=0 g=0 
rdev=0x01800340
/1: at = Nov 19 21:19:26 EST 2008  [ 1227147566 ]
/1: mt = Nov 19 21:19:26 EST 2008  [ 1227147566 ]
/1: ct = Apr 29 23:23:11 EDT 2009  [ 1241061791 ]
/1: bsz=8192  blks=1 fs=devfs
/1: modctl(MODSIZEOF_DEVID, 0x01800340, 0x080430BC, 0xFEC51239, 0xFE8E92C0) 
= 0
/1: modctl(MODGETDEVID, 0x01800340, 0x0038, 0x080D5A48, 0xFE8E92C0) = 0
/1: fxstat(2, 6, 0x080430C0)= 0
/1: d=0x04A0 i=5015 m=0060400 l=1  u=0 g=0 
rdev=0x01800340
/1: at = Nov 19 21:19:26 EST 2008  [ 1227147566 ]
/1: mt = Nov 19 21:19:26 EST 2008  [ 1227147566 ]
/1: ct = Apr 29 23:23:11 EDT 2009  [ 1241061791 ]
/1: bsz=8192  blks=1 fs=devfs
/1: modctl(MODSIZEOF_MINORNAME, 0x01800340, 0x6000, 0x080430BC, 
0xFE8E92C0) = 0
/1: modctl(MODGETMINORNAME, 0x01800340, 0x6000, 0x0002, 0x0808FFC8) 
= 0
/1: close(6)= 0
/1: ioctl(3, ZFS_IOC_POOL_STATS, 0x08042220)= 0

and then the machine dies consistently with:

panic[cpu1]/thread=ff01d045a3a0:
BAD TRAP: type=e (#pf Page fault) rp=ff000857f4f0 addr=260 occurred in module 
unix due to a NULL pointer dereference

zpool:
#pf Page fault
Bad kernel fault at addr=0x260
pid=576, pc=0xfb854e8b, sp=0xff000857f5e8, eflags=0x10246
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 260
cr3: 12b69
cr8: c

rdi:  260 rsi:4 rdx: ff01d045a3a0
rcx:0  r8:   40  r9:21ead
rax:0 rbx:0 rbp: ff000857f640
r10:  bf88840 r11: ff01d041e000 r12:0
r13:  260 r14:4 r15: ff01ce12ca28
fsb:0 gsb: ff01ce985ac0  ds:   4b
 es:   4b  fs:0  gs:  1c3
trp:e err:2 rip: fb854e8b
 cs:   30 rfl:10246 rsp: ff000857f5e8
 ss:   38

ff000857f3d0 unix:die+dd ()
ff000857f4e0 unix:trap+1752 ()
ff000857f4f0 unix:cmntrap+e9 ()
ff000857f640 unix:mutex_enter+b ()
ff000857f660 zfs:zio_buf_alloc+2c ()
ff000857f6a0 zfs:arc_get_data_buf+173 ()
ff000857f6f0 zfs:arc_buf_alloc+a2 ()
ff000857f770 zfs:dbuf_read_impl+1b0 ()
ff000857f7d0 zfs:dbuf_read+fe ()
ff000857f850 zfs:dnode_hold_impl+d9 ()
ff000857f880 zfs:dnode_hold+2b ()
ff000857f8f0 zfs:dmu_buf_hold+43 ()
ff000857f990 zfs:zap_lockdir+67 ()
ff000857fa20 zfs:zap_lookup_norm+55 ()
ff000857fa80 zfs:zap_lookup+2d ()
ff000857faf0 zfs:dsl_pool_open+91 ()
ff000857fbb0 zfs:spa_load+696 ()
ff000857fc00 zfs:spa_tryimport+95 ()
ff000857fc40 zfs:zfs_ioc_pool_tryimport+3e ()
ff000857fcc0 zfs:zfsdev_ioctl+10b ()
ff000857fd00 genunix:cdev_ioctl+45 ()
ff000857fd40 specfs:spec_ioctl+83 ()
ff000857fdc0 genunix:fop_ioctl+7b ()
ff000857fec0 genunix:ioctl+18e ()
ff000857ff10 unix:brand_sys_sysenter+1e6 ()

The offending disk, c5t2d0s0, is part of a mirror; if it is removed I can
see the results (from the other mirror half) and the machine does not crash.
All 8 labels diff perfectly:

version=13
name='r'
state=0
txg=2110897
pool_guid=10861732602511278403
hostid=13384243
hostname='nas'
top_guid=6092190056527819247
guid=16682108003687674581
vdev_tree
type='mirror'
id=0
guid=6092190056527819247
whole_disk=0
metaslab_array=23
metaslab_shift=31
ashift=9
asize=320032473088
is_log=0
children[0]
type='disk'
id=0
guid=16682108003687674581
path='/dev/dsk/c5t2d0s0'

Re: [zfs-discuss] Motherboard for home zfs/solaris file server

2009-02-24 Thread Rob Logan


Not. Intel decided we don't need ECC memory on the Core i7 


I thought that was Core i7 vs. Xeon E55xx for socket
LGA-1366, and that's why this X58 MB claims ECC support:
http://supermicro.com/products/motherboard/Xeon3000/X58/X8SAX.cfm


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is Disabling ARC on SolarisU4 possible?

2009-02-12 Thread Rob Brown
Thanks Nathan,

I want to test the underlying performance; of course the problem is that I want
to test the 16 or so disks in the stripe, rather than individual devices.

Thanks

Rob



On 28/01/2009 22:23, Nathan Kroenert nathan.kroen...@sun.com wrote:

 Also - My experience with a very small ARC is that your performance will
 stink. ZFS is an advanced filesystem that IMO makes some assumptions
 about capability and capacity of current hardware. If you don't give
 what it's expecting, your results may be equally unexpected.
 
 If you are keen to test the *actual* disk performance, you should just
 use the underlying disk device like /dev/rdsk/c0t0d0s0
 
 Beware, however, that any writes to these devices will indeed result in
 the loss of the data on those devices, zpools or other.
 
 Cheers.
 
 Nathan.
 
 Richard Elling wrote:
  Rob Brown wrote:
  Afternoon,
 
  In order to test my storage I want to stop the cacheing effect of the
  ARC on a ZFS filesystem. I can do similar on UFS by mounting it with
  the directio flag.
 
  No, not really the same concept, which is why Roch wrote
  http://blogs.sun.com/roch/entry/zfs_and_directio
 
  I saw the following two options on a nevada box which presumably
  control it:
 
  primarycache
  secondarycache
 
  Yes, to some degree this offers some capability. But I don't believe
  they are in any release of Solaris 10.
  -- richard
 
   But I'm running Solaris 10U4 which doesn't have them - can I disable it?
 
  Many thanks
 
  Rob
 
 
 
 
   | Robert Brown - ioko Professional Services |
   | Mobile: +44 (0)7769 711 885 |
   
 
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   
 
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 --
 //
 // Nathan Kroenert  nathan.kroen...@sun.com //
 // Systems Engineer Phone:  +61 3 9869-6255 //
 // Sun Microsystems Fax:+61 3 9869-6288 //
 // Level 7, 476 St. Kilda Road  Mobile: 0419 305 456//
 // Melbourne 3004   VictoriaAustralia   //
 //
 




| Robert Brown - ioko Professional Services |
| Mobile:  +44 (0)7769 711 885 |


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Is Disabling ARC on SolarisU4 possible?

2009-01-28 Thread Rob Brown
Afternoon,

In order to test my storage I want to stop the caching effect of the ARC on
a ZFS filesystem.  I can do something similar on UFS by mounting it with the directio
flag.  I saw the following two options on a nevada box which presumably
control it:

primarycache
secondarycache
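
Presumably one would use them something like this on nevada (a sketch,
assuming a dataset named tank/test; the property names above are the only
part I have actually seen):

zfs set primarycache=none tank/test
zfs set secondarycache=none tank/test
zfs get primarycache,secondarycache tank/test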

But I'm running Solaris 10U4 which doesn't have them - can I disable it?

Many thanks

Rob




| Robert Brown - ioko Professional Services |
| Mobile:  +44 (0)7769 711 885 |


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Practical Application of ZFS

2009-01-06 Thread Rob
ZFS is the bomb. It's a great file system. What are its real-world 
applications besides Solaris userspace? What I'd really like is to utilize the 
benefits of ZFS across all the platforms we use. For instance, we use Microsoft 
Windows servers as our primary platform here. How might I utilize ZFS to 
protect that data? 

The only way I can visualize doing so would be to virtualize the Windows server 
and store its image in a ZFS pool. That would add additional overhead but 
protect the data at the disk level. It would also allow snapshots of the 
Windows machine's virtual disk file. However, none of these benefits would protect 
Windows from hurting its own data, if you catch my meaning.

Obviously ZFS is ideal for large databases served out via application-level or 
web servers. But what other practical ways are there to integrate the use of 
ZFS into existing setups to experience its benefits?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Practical Application of ZFS

2009-01-06 Thread Rob
I am not experienced with iSCSI. I understand it's block-level disk access via 
TCP/IP. However, I don't see how using it eliminates the need for virtualization.

Are you saying that a Windows server can access a ZFS-backed volume via iSCSI and store 
NTFS files on it?
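
(For what it's worth, the recipe people describe looks roughly like this on the 
OpenSolaris side; a sketch assuming a pool named tank, with the Windows 
initiator then formatting the LUN as NTFS:)

zfs create -V 100G tank/winvol
zfs set shareiscsi=on tank/winvol
iscsitadm list target -v
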
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Practical Application of ZFS

2009-01-06 Thread Rob
Wow. I will read further into this. That seems like it could have great 
applications. I assume the same is true of FCoE?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SMART data

2008-12-08 Thread Rob Logan

The SATA framework uses the sd driver, so it's:

4 % smartctl -d scsi -a /dev/rdsk/c4t2d0s0
smartctl version 5.36 [i386-pc-solaris2.8] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Device: ATA  WDC WD1001FALS-0 Version: 0K05
Serial number:
Device type: disk
Local Time is: Mon Dec  8 15:14:22 2008 EST
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Current Drive Temperature: 45 C

Error Counter logging not supported
No self-tests have been logged

5 % /opt/SUNWhd/hd/bin/hd -e c4t2
Revision: 16
Offline status 132
Selftest status 0
Seconds to collect 19200
Time in minutes to run short selftest 2
Time in minutes to run extended selftest 221
Offline capability 123
SMART capability 3
Error logging capability 1
Checksum 0x86
Identification                      Status  Current  Worst  Raw data
   1 Raw read error rate            0x2f    200      200    0
   3 Spin up time                   0x27    253      253    6216
   4 Start/Stop count               0x32    100      100    11
   5 Reallocated sector count       0x33    200      200    0
   7 Seek error rate                0x2e    100      253    0
   9 Power on hours count           0x32    100      100    446
  10 Spin retry count               0x32    100      253    0
  11 Recalibration Retries count    0x32    100      253    0
  12 Device power cycle count       0x32    100      100    11
 192 Power off retract count        0x32    200      200    10
 193 Load cycle count               0x32    200      200    11
 194 Temperature                    0x22    105      103    45/  0/  0 (degrees C cur/min/max)
 196 Reallocation event count       0x32    200      200    0
 197 Current pending sector count   0x32    200      200    0
 198 Scan uncorrected sector count  0x30    200      200    0
 199 Ultra DMA CRC error count      0x32    200      200    0
 200 Write/Multi-Zone Error Rate    0x8     200      200    0


http://www.opensolaris.org/jive/thread.jspa?threadID=84296
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs iscsi sustained write performance

2008-12-08 Thread Rob

  (with iostat -xtc 1)

it sure would be nice to know if actv  0 so
we would know if the lun was busy because
its queue is full or just slow (svc_t  200)

for tracking errors try `iostat -xcen 1`
and `iostat -E`


Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is SUNWhd for Thumper only?

2008-12-01 Thread Rob
  (http://cuddletech.com/blog/pivot/entry.php?id=993). Will the SUNWhd

can't dump all SMART data, but get some temps on a generic box..

4 % hd -a
Device     Serial  Vendor    Model             Rev   Temperature    fdisk Type
---------  ------  --------  ----------------  ----  -------------  ----------
c3t0d0p0           ATA       ST3750640AS       K     255 C (491 F)  EFI
c3t1d0p0           ATA       ST3750640AS       K     255 C (491 F)  EFI
c3t2d0p0           ATA       ST3750640AS       K     255 C (491 F)  EFI
c3t4d0p0           ATA       ST3750640AS       K     255 C (491 F)  EFI
c3t5d0p0           ATA       ST3750640AS       K     255 C (491 F)  EFI
c4t0d0p0           ATA       WDC WD1001FALS-0  0K05  43 C (109 F)   EFI
c4t1d0p0           ATA       WDC WD1001FALS-0  0K05  43 C (109 F)   EFI
c4t2d0p0           ATA       WDC WD1001FALS-0  0K05  43 C (109 F)   EFI
c4t4d0p0           ATA       WDC WD1001FALS-0  0K05  42 C (107 F)   EFI
c4t5d0p0           ATA       WDC WD1001FALS-0  0K05  43 C (109 F)   EFI
c5t0d0p0           TSSTcorp  CD/DVDW SH-S162A  TS02  None           None
c5t1d0p0           ATA       WDC WD3200JD-00K  5J08  0 C (32 F)     Solaris2
c5t2d0p0           ATA       WDC WD3200JD-00K  5J08  0 C (32 F)     Solaris2
c5t3d0p0           ATA       WDC WD3200JD-00K  5J08  0 C (32 F)     Solaris2
c5t4d0p0           ATA       WDC WD3200JD-00K  5J08  0 C (32 F)     Solaris2
c5t5d0p0           ATA       WDC WD3200JD-00K  5J08  0 C (32 F)     Solaris2

Do you know of a solaris tool to get SMART data?

Rob

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ? SX:CE snv_91 - ZFS - raid and mirror - drive sizes don't add correctl

2008-11-29 Thread Rob Clark
Bump.

Some of the threads on this were last posted to over a year ago. I checked
6485689 and it is not fixed yet, is there any work being done in this area?

Thanks,
Rob

 There may be some work being done to fix this:
 
 zpool should support raidz of mirrors
 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6485689
 
 Discussed in this thread:
 Mirrored Raidz ( Posted: Oct 19, 2006 9:02 PM )
 http://opensolaris.org/jive/thread.jspa?threadID=15854&tstart=0
 
 
 The suggested solution (by jone
 http://opensolaris.org/jive/thread.jspa?messageID=6627
 9 ) is:
 
 # zpool create a1pool raidz c0t0d0 c0t1d0 c0t2d0 ..
 # zpool create a2pool raidz c1t0d0 c1t1d0 c1t2d0 ..
 # zfs create -V a1pool/vol
 # zfs create -V a2pool/vol
 # zpool create mzdata mirror /dev/zvol/dsk/a1pool/vol
 /dev/zvol/dsk/a2pool/vol
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Still more questions WRT selecting a mobo for small ZFS RAID

2008-11-14 Thread Rob
  WD Caviar Black drive [...] Intel E7200 2.53GHz 3MB L2
  The P45 based boards are a no-brainer

16G of DDR2-1066 with P45 or
  8G of ECC DDR2-800 with 3210 based boards

That is the question.

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Inexpensive ZFS home server

2008-11-12 Thread Rob Logan

  I don't think the Pentium E2180 has the lanes to use ECC RAM.

Look at the north bridge, not the CPU. The PowerEdge SC440
uses the Intel 3000 MCH, which supports up to 8GB unbuffered ECC
or non-ECC DDR2 667/533 SDRAM. It's been replaced with
the Intel 32x0, which uses DDR2 800/667MHz unbuffered ECC /
non-ECC SDRAM.

http://www.intel.com/products/server/chipsets/3200-3210/3200-3210-overview.htm

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS + OpenSolaris for home NAS?

2008-10-29 Thread Rob Logan

  ECC?

$60 unbuffered 4GB 800MHz DDR2 ECC CL5 DIMM (Kit Of 2)
http://www.provantage.com/kingston-technology-kvr800d2e5k2-4g~7KIN90H4.htm

for Intel 32x0 north bridge like
http://www.provantage.com/supermicro-x7sbe~7SUPM11K.htm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs-auto-snapshot 0.11 work (was Re: zfs-auto-snapshot with at scheduling )

2008-08-06 Thread Rob
 The other changes that will appear in 0.11 (which is
 nearly done) are:

Still looking forward to seeing .11 :)
Think we can expect a release soon? (or at least svn access so that others can 
check out the trunk?)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] force a reset/reinheit zfs acls?

2008-08-05 Thread Rob
Hello All!

Is there a command to force a re-inheritance/reset of ACLs? E.g., if I have a 
directory full of folders that have been created with inherited ACLs, and I 
want to change the ACLs on the parent folder, how can I force a reapply of all 
ACLs?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] force a reset/reinheit zfs acls?

2008-08-05 Thread Rob
 Rob wrote:
  Hello All!
  
  Is there a command to force a re-inheritance/reset
 of ACLs? e.g., if i have a directory full of folders
 that have been created with inherited ACLs, and i
 want to change the ACLs on the parent folder, how can
 i force a reapply of all ACLs?
   
   
 
 
 There isn't an easy way to do exactly what you want.

That's unfortunate :(
How do I go about requesting a feature like this?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ? SX:CE snv_91 - ZFS - raid and mirror - drive sizes don't add correctl

2008-07-29 Thread Rob Clark
There may be some work being done to fix this:

zpool should support raidz of mirrors
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6485689

Discussed in this thread:
Mirrored Raidz ( Posted: Oct 19, 2006 9:02 PM )
http://opensolaris.org/jive/thread.jspa?threadID=15854&tstart=0
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Rob Clark
 Hi All 
Is there any hope for deduplication on ZFS ? 
Mertol Ozyoney
Storage Practice - Sales Manager
Sun Microsystems
 Email [EMAIL PROTECTED]

There is always hope.

Seriously though, looking at 
http://en.wikipedia.org/wiki/Comparison_of_revision_control_software there are 
a lot of choices for how we could implement this.

SVN/K, Mercurial and Sun Teamware all come to mind. Simply ;) merge one of 
those with ZFS. 

It _could_ be as simple (with SVN as an example) as using directory listings to 
produce files which were then 'diffed'. You could then view the diffs as though 
they were changes made to lines of source code. 

Just add a tree subroutine to let you grab all the diffs that reference 
changes to file 'xyz' and you would have easy access to all the changes of a 
particular file (or directory).

With a speed-optimized ability to use ZFS snapshots with that tree 
subroutine to roll back a single file (or directory), you could undo/redo your 
way through the filesystem.

Using an LKCD (http://www.faqs.org/docs/Linux-HOWTO/Linux-Crash-HOWTO.html) you 
could sit out the play and watch from the sidelines -- returning to the OS 
when you thought you were 'safe' (and if not, jumping back out).

Thus, Mertol, it is possible (and could work very well).

Rob
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ? SX:CE snv_91 - ZFS - raid and mirror - drive

2008-07-22 Thread Rob Clark
 Though possible, I don't think we would classify it as a best practice.
  -- richard

Looking at http://opensolaris.org/os/community/volume_manager/ I see:
Supports RAID-0, RAID-1, RAID-5, Root mirroring and Seamless upgrades and 
live upgrades (that would go nicely with my ZFS root mirror - right).

I also don't see that there is a nice GUI for those that desire one ...

Looking at http://evms.sourceforge.net/gui_screen/ I see some great screenshots 
and page http://evms.sourceforge.net/ says it supports: Ext2/3, JFS, ReiserFS, 
XFS, Swap, OCFS2, NTFS, FAT -- so it might be better to suggest adding ZFS 
there instead of focusing on non-ZFS solutions in this ZFS discussion group.

Rob
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS deduplication

2008-07-22 Thread Rob Clark
 On Tue, 22 Jul 2008, Miles Nordin wrote:
  scrubs making pools uselessly slow?  Or should it be scrub-like so
  that already-written filesystems can be thrown into the dedup bag and
  slowly squeezed, or so that dedup can run slowly during the business
  day over data written quickly at night (fast outside-business-hours
  backup)?
 
 I think that the scrub-like model makes the most sense since ZFS write 
 performance should not be penalized.  It is useful to implement 
 score-boarding so that a block is not considered for de-duplication 
 until it has been duplicated a certain number of times.  In order to 
 decrease resource consumption, it is useful to perform de-duplication 
 over a span of multiple days or multiple weeks doing just part of the 
 job each time around. Deduping a petabyte of data seems quite 
 challenging yet ZFS needs to be scalable to these levels.
 Bob Friesenhahn

In case anyone (other than Bob) missed it, this is why I suggested File-Level 
Dedup:

... using directory listings to produce files which were then 'diffed'. You 
could then view the diffs as though they were changes made ...


We could have:
Block-Level (if we wanted to restore an exact copy of the drive - duplicate  
the 'dd' command) or 
Byte-Level (if we wanted to use compression - duplicate the 'zfs set 
compression=on rpool' _or_ 'bzip' commands) ...
etc... 
assuming we wanted to duplicate commands which already implement those 
features, and provide more than we (the filesystem) needs at a very high cost 
(performance).

So I agree with your comment about the need to be mindful of resource 
consumption, the ability to do this over a period of days is also useful.

Indeed the Plan9 filesystem simply snapshots to WORM and has no delete - nor 
are they able to fill their drives faster than they can afford to buy new ones:

Venti Filesystem
http://www.cs.bell-labs.com/who/seanq/p9trace.html

Rob
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ? SX:CE snv_91 - ZFS - raid and mirror - drive

2008-07-21 Thread Rob Clark
 Solaris will allow you to do this, but you'll need to use SVM instead of ZFS. 
  
 Or, I suppose, you could use SVM for RAID-5 and ZFS to mirror those.
  -- richard
Or run Linux ...


Richard, The ZFS Best Practices Guide says not.

Do not use the same disk or slice in both an SVM and ZFS configuration.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Adding my own compression to zfs

2008-07-20 Thread Rob Clark
 Robert Milkowski wrote:
 During Christmas I managed to add my own compression to zfs - it was quite 
 easy. 

Great to see innovation, but unless your personal compression method is somehow 
better (very fast with excellent 
compression), would it not be a better idea to use an existing (leading-edge) 
compression method?

7-Zip's (http://www.7-zip.org/) 'newest' methods are LZMA and PPMD 
(http://www.7-zip.org/7z.html). 

There is a proprietary license for LZMA that _might_ interest Sun, but PPMD has 
no explicit license; see this link:

Using PPMD for compression
http://www.codeproject.com/KB/recipes/ppmd.aspx

Rob
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to delete hundreds of emtpy snapshots

2008-07-20 Thread Rob Clark
 I got overzealous with snapshot creation. Every 5 mins is a bad idea. Way too 
 many.
 What's the easiest way to delete the empty ones?
 zfs list takes FOREVER

You might enjoy reading:

ZFS snapshot massacre
http://blogs.sun.com/chrisg/entry/zfs_snapshot_massacre.

(Yes, the . is part of the URL (NMF) - so add it or you'll 404).

Rob
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ? SX:CE snv_91 - ZFS - raid and mirror - drive

2008-07-20 Thread Rob Clark
 -Peter Tribble wrote:

 On Sun, Jul 6, 2008 at 8:48 AM, Rob Clark wrote:
 I have eight 10GB drives.
 ...
 I have 6 remaining 10 GB drives and I desire to
 raid 3 of them and mirror them to the other 3 to
 give me raid security and integrity with mirrored
 drive performance. I then want to move my /export
 directory to the new drive.
 ...

 You can't do that. You can't layer raidz and mirroring.
 You'll either have to use raidz for the lot, or just use mirroring:
 zpool create temparray mirror c1t2d0 c1t4d0 mirror c1t5d0 c1t3d0 mirror 
 c1t6d0 c1t8d0
 -Peter Tribble


Solaris may not allow me to do that but the concept is not unheard of:


Quoting: 
Proceedings of the Third USENIX Conference on File and Storage Technologies
http://www.usenix.org/publications/library/proceedings/fast04/tech/corbett/corbett.pdf

Mirrored RAID-4 and RAID-5 protect against higher order failures [4]. However, 
the efficiency of the array as measured by its data capacity divided by its 
total disk space is reduced.

[4] Qin Xin, E. Miller, T. Schwarz, D. Long, S. Brandt, W. Litwin, "Reliability 
mechanisms for very large storage systems", 20th IEEE/11th NASA Goddard 
Conference on Mass Storage Systems and Technologies, San Diego, CA, pgs. 
146-156, Apr. 2003.

Rob
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Raid-Z with N^2+1 disks

2008-07-19 Thread Rob Clark
 On July 14, 2008 7:49:58 PM -0500 Bob Friesenhahn 
 [EMAIL PROTECTED] wrote:
  With ZFS and modern CPUs, the parity calculation is
 surely in the noise to the point of being unmeasurable.
 
 I would agree with that.  The parity calculation has *never* been a 
 factor in and of itself.  The problem is having to read the rest of
 the stripe and then having to wait for a disk revolution before writing.
 -frank

And this is where a HW RAID controller comes in. We hope it has a microprocessor for
the calculations, full knowledge of the head positions, and a list of free 
blocks -- then it simply chooses one of the drives that suits the criteria 
for the RAID level used and writes immediately to a free block under 
one of the heads. If only ...

Maybe in a few years Sun will make a HW RAID controller using ZFS, once 
we all get the bugs out. With flash updates this should work wonderfully.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ? SX:CE snv_91 - ZFS - raid and mirror - drive sizes don't add correc

2008-07-06 Thread Rob Clark
 Peter Tribble wrote:
 Because what you've created is a pool containing two
 components:
 - a 3-drive raidz
 - a 3-drive mirror
 concatenated together.
 

OK. It seems odd that ZFS would allow that (would people want that configuration
instead of what I am attempting to do?).


 I think that what you're trying to do based on your description is to create
 one raidz and mirror that to another raidz. (Or create a raidz out of mirrored
 drives.) You can't do that. You can't layer raidz and mirroring.
 You'll either have to use raidz for the lot, or just use mirroring:
 zpool create temparray mirror c1t2d0 c1t4d0 mirror c1t5d0 c1t3d0 mirror 
 c1t6d0 c1t8d0

Bummer.


Curiously I can get that same odd size with either of these two commands (the 
second attempt sort of looks like it is raid + mirroring):


# zpool create temparray1 mirror c1t2d0 c1t4d0 mirror c1t3d0 c1t5d0 mirror 
c1t6d0 c1t8d0

# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
rpool ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c1t0d0s0  ONLINE   0 0 0
c1t1d0s0  ONLINE   0 0 0

errors: No known data errors

  pool: temparray1
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
temparray1  ONLINE   0 0 0
  mirrorONLINE   0 0 0
c1t2d0  ONLINE   0 0 0
c1t4d0  ONLINE   0 0 0
  mirrorONLINE   0 0 0
c1t3d0  ONLINE   0 0 0
c1t5d0  ONLINE   0 0 0
  mirrorONLINE   0 0 0
c1t6d0  ONLINE   0 0 0
c1t8d0  ONLINE   0 0 0

errors: No known data errors

# zfs list
NAMEUSED  AVAIL  REFER  MOUNTPOINT
rpool  4.36G  5.42G35K  /rpool
rpool/ROOT 3.09G  5.42G18K  legacy
rpool/ROOT/snv_91  3.09G  5.42G  3.01G  /
rpool/ROOT/snv_91/var  84.5M  5.42G  84.5M  /var
rpool/dump  640M  5.42G   640M  -
rpool/export   14.0M  5.42G19K  /export
rpool/export/home  14.0M  5.42G  14.0M  /export/home
rpool/swap  640M  6.05G16K  -
temparray1 92.5K  29.3G 1K  /temparray1
# zpool destroy temparray1


And the pretty one:


# zpool create temparray raidz c1t2d0 c1t4d0 raidz c1t3d0 c1t5d0 raidz c1t6d0 
c1t8d0

# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
rpool ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c1t0d0s0  ONLINE   0 0 0
c1t1d0s0  ONLINE   0 0 0

errors: No known data errors

  pool: temparray
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
temparray   ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c1t2d0  ONLINE   0 0 0
c1t4d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c1t3d0  ONLINE   0 0 0
c1t5d0  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c1t6d0  ONLINE   0 0 0
c1t8d0  ONLINE   0 0 0

errors: No known data errors

# zfs list
NAMEUSED  AVAIL  REFER  MOUNTPOINT
rpool  4.36G  5.42G35K  /rpool
rpool/ROOT 3.09G  5.42G18K  legacy
rpool/ROOT/snv_91  3.09G  5.42G  3.01G  /
rpool/ROOT/snv_91/var  84.6M  5.42G  84.6M  /var
rpool/dump  640M  5.42G   640M  -
rpool/export   14.0M  5.42G19K  /export
rpool/export/home  14.0M  5.42G  14.0M  /export/home
rpool/swap  640M  6.05G16K  -
temparray94K  29.3G 1K  /temparray
# zpool destroy temparray


That second attempt leads this newcomer to imagine that they have 3 raid 
drives mirrored to 3 raid drives.


Is there a way to get mirror performance (double speed) with raid integrity 
(one drive can fail and you are OK)? I can't imagine that there exists no one 
who would want that configuration.


Thanks for your comment Peter.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs equivalent of ufsdump and ufsrestore

2008-05-30 Thread Rob Logan
  I'd like to take a backup of a live filesystem without modifying
  the last  accessed time.

why not take a snapshot?
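
A minimal sketch, assuming a dataset named tank/home and a scratch area under
/backup; reading from the snapshot never touches the live atimes:

zfs snapshot tank/home@backup
zfs send tank/home@backup > /backup/home-backup.zfs
zfs destroy tank/home@backup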

Rob
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


  1   2   >