Re: [zfs-discuss] Can the ZFS "copies" attribute substitute HW disk redundancy?

2012-07-30 Thread John Martin

On 07/29/12 14:52, Bob Friesenhahn wrote:


My opinion is that complete hard drive failure and block-level media
failure are two totally different things.


That would depend on the recovery behavior of the drive for
block-level media failure.  A drive whose firmware performs excessive
retries on a bad sector (reports of up to 2 minutes) may be
indistinguishable from a failed drive.  See previous discussions
of the firmware differences between desktop and enterprise drives.
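As an illustration, if smartmontools is available and the drive supports SCT
Error Recovery Control, the retry budget can be inspected and, on drives that
allow it, capped; the device path below is only an example:

  # Show how long the drive will retry a bad sector before giving up;
  # many desktop drives answer "SCT Error Recovery Control command not supported".
  $ smartctl -l scterc /dev/rdsk/c0t2d0

  # Where supported, cap read/write recovery at 7.0 seconds (values are tenths of a second)
  $ smartctl -l scterc,70,70 /dev/rdsk/c0t2d0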


Re: [zfs-discuss] Very poor small-block random write performance

2012-07-19 Thread John Martin

On 07/19/12 19:27, Jim Klimov wrote:


However, if the test file was written in 128K blocks and then
is rewritten with 64K blocks, then Bob's answer is probably
valid - the block would have to be re-read once for the first
rewrite of its half; it might be taken from cache for the
second half's rewrite (if that comes soon enough), and may be
spooled to disk as a couple of 64K blocks or one 128K block
(if both changes come soon after each other - within one TXG).


What are the values for zfs_txg_synctime_ms and zfs_txg_timeout
on this system (FreeBSD, IIRC)?
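For reference, a hedged way to check those values (names and availability vary
by release):

  # Solaris/illumos: read the live values from the kernel (as root)
  $ echo zfs_txg_timeout/D | mdb -k
  $ echo zfs_txg_synctime_ms/D | mdb -k

  # FreeBSD exposes them as sysctls on releases that have them
  $ sysctl vfs.zfs.txg.timeout
  $ sysctl vfs.zfs.txg.synctime_ms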



Re: [zfs-discuss] New fast hash algorithm - is it needed?

2012-07-10 Thread John Martin

On 07/10/12 19:56, Sašo Kiselkov wrote:

Hi guys,

I'm contemplating implementing a new fast hash algorithm in Illumos' ZFS
implementation to supplant the currently utilized sha256. On modern
64-bit CPUs SHA-256 is actually much slower than SHA-512 and indeed much
slower than many of the SHA-3 candidates, so I went out and did some
testing (details attached) on a possible new hash algorithm that might
improve on this situation.


Is the intent to store the 512-bit hash or truncate it to 256 bits?
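(Context: the checksum field in a ZFS block pointer holds 256 bits, so a
512-bit digest would have to be truncated to fit.  As a rough illustration
only - plain truncation is not the same construction as the standardized
SHA-512/256, which uses different initial values - with a placeholder file
name:)

  # a SHA-512 digest is 128 hex characters; the first 64 hex characters are 256 bits
  $ openssl dgst -sha512 -r somefile | cut -c1-64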



Re: [zfs-discuss] Interaction between ZFS intent log and mmap'd files

2012-07-04 Thread John Martin

On 07/04/12 16:47, Nico Williams wrote:


I don't see that the munmap definition assures that anything is written to
"disk".  The system is free to buffer the data in RAM as long as it likes
without writing anything at all.


Oddly enough the manpages at the Open Group don't make this clear.  So
I think it may well be advisable to use msync(3C) before munmap() on
MAP_SHARED mappings.  However, I think all implementors should, and
probably all do (Linux even documents that it does) have an implied
msync(2) when doing a munmap(2).  It really makes no sense at all to
have munmap(2) not imply msync(3C).


This assumes msync() has the behavior you expect.  See:

  http://pubs.opengroup.org/onlinepubs/009695399/functions/msync.html

In particular, the paragraph starting with "For mappings to files, ...".


Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks

2012-06-16 Thread John Martin

On 06/16/12 12:23, Richard Elling wrote:

On Jun 15, 2012, at 7:37 AM, Hung-Sheng Tsao Ph.D. wrote:


by the way
when you format, start with cylinder 1, do not use 0


There is no requirement for skipping cylinder 0 for root on Solaris, and there
never has been.


Maybe not for core Solaris, but it is still wise advice
if you plan to use Oracle ASM.  See section 3.3.1.4, 2c:


http://docs.oracle.com/cd/E11882_01/install.112/e24616/storage.htm#CACHGBAH



Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks

2012-06-15 Thread John Martin

On 06/15/12 15:52, Cindy Swearingen wrote:


It's important to identify your OS release to determine if
booting from a 4k disk is supported.


In addition, it matters whether the drive is native 4096p or 512e/4096p.
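For example, with smartmontools installed (device path is hypothetical), a
512e/4096p drive reports two different sizes, while prtvtoc only shows the
logical size the drive advertises:

  $ smartctl -i /dev/rdsk/c0t2d0 | grep -i 'sector size'
  Sector Sizes:     512 bytes logical, 4096 bytes physical

  $ prtvtoc /dev/rdsk/c0t2d0s2 | grep bytes/sector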


Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread John Martin

On 05/29/12 07:26, bofh wrote:


ashift:9  is that standard?


Depends on what the drive reports as physical sector size.




Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread John Martin

On 05/29/12 08:35, Nathan Kroenert wrote:

Hi John,

Actually, last time I tried the whole AF (4k) thing, its performance
was worse than woeful.

But admittedly, that was a little while ago.

The drives were the seagate green barracuda IIRC, and performance for
just about everything was 20MB/s per spindle or worse, when it should
have been closer to 100MB/s when streaming. Things were worse still when
doing random...

I'm actually looking to put in something larger than the 3*2TB drives
(triple mirror for read perf) this pool has in it - preferably 3 * 4TB
drives. (I don't want to put in more spindles - just replace the current
ones...)

I might just have to bite the bullet and try something with current SW. :).



Raw read from one of the mirrors:

#  timex dd if=/dev/rdsk/c0t2d0s2 of=/dev/null bs=1024000 count=1
1+0 records in
1+0 records out

real  49.26
user   0.01
sys0.27


filebench filemicro_seqread reports an impossibly high number (4GB/s)
so the ARC is likely handling all reads.
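A longer raw-device read (same example device path as above) gives a more
meaningful streaming number than a single 1MB transfer, and reading the raw
device keeps the ARC out of the picture:

  $ timex dd if=/dev/rdsk/c0t2d0s2 of=/dev/null bs=1024k count=1024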

The labels on the boxes I bought say:

  1TB 32MB INTERNAL KIT 7200
  ST310005N1A1AS-RK
  S/N: ...
  PN:9BX1A8-573

The drives in the boxes were really
ST1000DM003-9YN162 drives with 64MB of cache.
I have multiple pools on each disk, so the
disk write cache should be disabled.  The drives report
512 byte logical sectors and 4096 byte physical sectors.


Re: [zfs-discuss] Advanced Format HDD's - are we there yet? (or - how to buy a drive that won't be teh sux0rs on zfs)

2012-05-29 Thread John Martin

On 05/28/12 08:48, Nathan Kroenert wrote:


Looking to get some larger drives for one of my boxes. It runs
exclusively ZFS and has been using Seagate 2TB units up until now (which
are 512 byte sector).

Anyone offer up suggestions of either 3 or preferably 4TB drives that
actually work well with ZFS out of the box? (And not perform like
rubbish)...

I'm using Oracle Solaris 11 , and would prefer not to have to use a
hacked up zpool to create something with ashift=12.


Are you replacing a failed drive or creating a new pool?

I had a drive in a mirrored pool recently fail.  Both
drives were 1TB Seagate ST310005N1A1AS-RK with 512 byte sectors.
All the 1TB Seagate boxed drives I could find with the same
part number on the box (with factory seals in place)
were really ST1000DM003-9YN1 with 512e/4096p.  Just being
cautious, I ended up migrating the pools over to a pair
of the new drives.  The pools were created with ashift=12
automatically:

  $ zdb -C | grep ashift
  ashift: 12
  ashift: 12
  ashift: 12

Resilvering the three pools concurrently went fairly quickly:

  $ zpool status
scan: resilvered 223G in 2h14m with 0 errors on Tue May 22 21:02:32 2012
scan: resilvered 145G in 4h13m with 0 errors on Tue May 22 23:02:38 2012
scan: resilvered 153G in 3h44m with 0 errors on Tue May 22 22:30:51 2012


What performance problem do you expect?


Re: [zfs-discuss] What is your data error rate?

2012-01-25 Thread John Martin

On 01/25/12 09:08, Edward Ned Harvey wrote:


Assuming the failure rate of drives is not linear, but skewed toward higher 
failure rate after some period of time (say, 3 yrs)  ...


See section 3.1 of the Google study:

  http://research.google.com/archive/disk_failures.pdf

although section 4.2 of the Carnegie Mellon study
is much more supportive of the assumption.

  http://www.usenix.org/events/fast07/tech/schroeder/schroeder.pdf


Re: [zfs-discuss] What is your data error rate?

2012-01-24 Thread John Martin

On 01/24/12 17:06, Gregg Wonderly wrote:

What I've noticed is that when my drives have little
airflow, and hence hotter operating temperatures, they fail
quite quickly.


While I *believe* the same thing and thus have over-provisioned
airflow in my cases (for both drives and memory), there
are studies which failed to find a strong correlation between
drive temperature and failure rates:

  http://research.google.com/archive/disk_failures.pdf

  http://www.usenix.org/events/fast07/tech/schroeder.html



Re: [zfs-discuss] Data loss by memory corruption?

2012-01-16 Thread John Martin

On 01/16/12 11:08, David Magda wrote:


The conclusions are hardly unreasonable:


While the reliability mechanisms in ZFS are able to provide reasonable
robustness against disk corruptions, memory corruptions still remain a
serious problem to data integrity.


I've heard the same thing said ("use ECC!") on this list many times over
the years.


I believe the whole paragraph quoted from the USENIX paper above is
important:

  While the reliability mechanisms in ZFS are able to
  provide reasonable robustness against disk corruptions,
  memory corruptions still remain a serious problem to
  data integrity. Our results for memory corruptions
  indicate cases where bad data is returned to the user,
  operations silently fail, and the whole system crashes.
  Our probability analysis shows that one single bit flip
  has small but non-negligible chances to cause failures
  such as reading/writing corrupt data and system crashing.

The authors provide probability calculations in section 6.3
for single bit flips.  ECC provides detection and correction
of single bit flips.


Re: [zfs-discuss] zfs read-ahead and L2ARC

2012-01-09 Thread John Martin

On 01/08/12 10:15, John Martin wrote:


I believe Joerg Moellenkamp published a discussion
several years ago on how the L1 ARC attempts to deal with the pollution
of the cache by large streaming reads, but I don't have
a bookmark handy (nor the knowledge of whether the
behavior is still accurate).


http://www.c0t0d0s0.org/archives/5329-Some-insight-into-the-read-cache-of-ZFS-or-The-ARC.html


Re: [zfs-discuss] zfs read-ahead and L2ARC

2012-01-09 Thread John Martin

On 01/08/12 20:10, Jim Klimov wrote:


Is it true or false that: ZFS might skip the cache and
go to disks for "streaming" reads?


I don't believe this was ever suggested.  Instead, the
question is: if data is not already in the file system
cache and a large read is made from disk, should the file
system put this data into the cache?

BTW, I chose the term streaming to mean a subset
of sequential access where the pattern is sequential but
arrives at what appear to be artificial time intervals.
The suggested pre-read of the entire file would
be a simple sequential read done as quickly
as the hardware allows.


Re: [zfs-discuss] zfs read-ahead and L2ARC

2012-01-08 Thread John Martin

On 01/08/12 11:30, Jim Klimov wrote:


However for smaller servers, such as home NASes which have
about one user overall, pre-reading and caching files even
for a single use might be an objective per se - just to let
the hard-disks spin down. Say, if I sit down to watch a
movie from my NAS, it is likely that for 90 or 120 minutes
there will be no other IO initiated by me. The movie file
can be pre-read in a few seconds, and then most of the
storage system can go to sleep.


Isn't this just a more extreme case of prediction?
In addition to the file system knowing there will only
be one client reading 90-120 minutes of (HD?) video
that will fit in the memory of a small(er) server,
now the hard drive power management code also knows there
won't be another access for 90-120 minutes so it is OK
to spin down the hard drive(s).
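A rough sketch of that idea on Solaris (file name and device path are
placeholders; see power.conf(4) for the exact device-thresholds syntax):

  # pre-read the file once so subsequent reads can come from memory
  $ dd if=/tank/media/movie.mkv of=/dev/null bs=1024k

  # and allow the data disks to spin down after 30 minutes idle, in /etc/power.conf:
  #   device-thresholds   <physical-device-path>   30m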


Re: [zfs-discuss] zfs read-ahead and L2ARC

2012-01-08 Thread John Martin

On 01/08/12 09:30, Edward Ned Harvey wrote:


In the case of your MP3 collection...  Probably the only thing you can do is
to write a script which will simply go read all the files you predict will
be read soon.  The key here is the prediction - There's no way ZFS or
solaris, or any other OS in the present day is going to intelligently
predict which files you'll be requesting soon.



The other prediction is whether the blocks will be reused.
If the blocks of a streaming read are only used once, then
it may be wasteful for a file system to allow these blocks
to be placed in the cache.  If a file system purposely
chooses to not cache streaming reads, manually scheduling a
"pre-read" of particular files may simply cause the file to be read
from disk twice: on the manual pre-read and when it is read again
by the actual application.
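For completeness, the manual "prediction" would be nothing more than a sketch
like the following (the path is a placeholder); whether the blocks then stay
cached is still up to the ARC's own policy, as discussed above:

  $ find /tank/music/next_playlist -type f -exec cat {} + > /dev/null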

I believe Joerg Moellenkamp published a discussion
several years ago on how the L1 ARC attempts to deal with the pollution
of the cache by large streaming reads, but I don't have
a bookmark handy (nor the knowledge of whether the
behavior is still accurate).


Re: [zfs-discuss] bad seagate drive?

2011-09-12 Thread John Martin

On 09/12/11 10:33, Jens Elkner wrote:


Hmmm, at least if S11x, a ZFS mirror, ICH10 and the cmdk (IDE) driver are involved,
I'm 99.9% confident that "a while" turns out to be only some days or weeks
- no matter what platinum enterprise HDDs you use ;-)


On Solaris 11 Express with a dual drive mirror, ICH10 and the AHCI
driver (not sure why you would purposely choose to run in IDE mode),
resilvering a 1TB drive (Seagate ST310005N1A1AS-RK) went at a rate of
3.2GB/min.  Deduplication was not enabled.  Only hours for a 55% full
mirror (roughly 550GB at 3.2GB/min is just under three hours), not days
or weeks.



Re: [zfs-discuss] BAD WD drives - defective by design?

2011-09-06 Thread John Martin

http://wdc.custhelp.com/app/answers/detail/a_id/1397/~/difference-between-desktop-edition-and-raid-%28enterprise%29-edition-drives


[zfs-discuss] matching zpool versions to development builds

2011-08-08 Thread John Martin

Is there a list of zpool versions for development builds?

I found:

  http://blogs.oracle.com/stw/entry/zfs_zpool_and_file_system

where it says Solaris 11 Express is zpool version 31.  My
system has BEs back to build 139 and I have not done a zpool upgrade
since installing this system, but on the current development build it
reports:

  # zpool upgrade -v
  This system is currently running ZFS pool version 33.



Re: [zfs-discuss] ZFS ... open source moving forward?

2010-12-24 Thread Martin Matuska
Tim Cook writes:

> You are not a court of law, and that statement has not been tested.  It is
your opinion and nothing more.  I'd appreciate if every time you repeated that
statement, you'd preface it with "in my opinion" so you don't have people
running around believing what they're doing is safe.  I'd hope they'd be smart
enough to consult with a lawyer, but it's probably better to just not spread
unsubstantiated rumor in the first place.  
> 
> --Tim

Hi guys, I am one of the ZFS porting folks at FreeBSD.

You might want to look at this site: http://www.sun.com/lawsuit/zfs/

There are three main threatening Netapp patents mentioned:
5,819,292 - "copy on write"
7,174,352 - "filesystem snapshot"
6,857,001 - "writable snapshots"

You can examine the documents at: http://www.sun.com/lawsuit/zfs/documents.jsp

5,819,292:
This one has a final action by the U.S. Patent Office dated 16.06.2009. In this
action almost all claims subject to reexamination were rejected by the Office
(due to anticipation); only claims 1, 21 and 22 were confirmed as patentable.
These claims are not significant for copy-on-write, so you can consider the
copy-on-write patent by Netapp rejected. With this document in your hands they
cannot expect to win a lawsuit against you on copy-on-write anymore, as there
is not much of the patent left over.

7,174,352:
This patent has a non-final action rejecting all the claims due to
anticipation. There may exist a final action that confirms this, but it's not
among the documents. If there is a final action, you can use any filesystem
that does snapshots without risking a lawsuit from Netapp. The non-final
action document is a very strong asset in your hands, anyway :-)

6,857,001:
No documents for this patent at the site.

So you can use copy-on-write - according to the documents, all relevant parts
of the patent are rejected.
Snapshots - the non-final action document is a good asset, but I don't know if
there is a final action document. This patent can be considered "almost"
rejected.
Clones - no idea.

But remember, this goes for ANY filesystem, this isn't only about ZFS.
So every filesystem doing snapshots or clones (btrfs?) would actually
need permission from Netapp, as they involve their patents ;-)




[zfs-discuss] Guide to COMSTAR iSCSI?

2010-12-13 Thread Martin Mundschenk

Hi!

I have configured two LUs following this guide:

http://thegreyblog.blogspot.com/2010/02/setting-up-solaris-comstar-and.html

Now I want each LU to be available to only one distinct client on the network.
I found no easy guide anywhere on the internet for how to accomplish this. Any hints?

Martin
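A hedged sketch of one way to do this with COMSTAR host groups (the GUID and
IQN below are placeholders): create a per-client host group, add that client's
initiator to it, and bind the LU's view to the group.

  $ stmfadm create-hg clientA
  $ stmfadm add-hg-member -g clientA iqn.2010-12.com.example:client-a
  # drop an existing all-hosts view for the LU, if one was created earlier
  $ stmfadm remove-view -l 600144F0... -a
  # expose the LU only to members of the clientA host group, as LUN 0
  $ stmfadm add-view -h clientA -n 0 600144F0...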




Re: [zfs-discuss] zfs-discuss Digest, Vol 59, Issue 13

2010-09-08 Thread Dr. Martin Mundschenk

Am 09.09.2010 um 07:00 schrieb zfs-discuss-requ...@opensolaris.org:

> What's the write workload like?  You could try disabling the ZIL to see
> if that makes a difference.  If it does, the addition of an SSD-based
> ZIL / slog device would most certainly help.
> 
> Maybe you could describe the makeup of your zpool as well?
> 
> Ray


The zpool is a mirrored root pool (2 SATA 250GB devices). The box is a Dell PE
T710. When I copy via NFS, zpool iostat reports 4MB/sec throughout the copy.
When I copy via scp I get a network performance of about 50 MB/sec, and zpool
iostat reports 105 MB/sec for a short interval about 5 seconds after scp
completes.

As far as I can tell, the problem is the NFS commit, which forces the
filesystem to write data directly to disk instead of caching the data stream,
as happens in the scp case.

NFS was around long before SSD-based drives were. I cannot imagine that NFS
performance was never more than 1/3 of the speed of a 10BaseT connection
before...

Martin
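Two hedged ways to test that theory (pool, dataset and device names below are
placeholders): on builds that have the per-dataset sync property, temporarily
disable synchronous semantics for the shared filesystem and rerun the copy
(testing only - this trades away NFS crash consistency); if that makes the NFS
copy fast, the durable fix is a separate log device.

  $ zfs set sync=disabled tank/export/nfs
  # ... rerun the NFS copy, then restore the default ...
  $ zfs set sync=standard tank/export/nfs

  $ zpool add tank log c3t0d0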


[zfs-discuss] NFS performance issue

2010-09-08 Thread Dr. Martin Mundschenk
Hi!

I searched the web for hours, trying to solve the NFS/ZFS low performance issue
on my freshly set up OSOL box (snv134). The problem is discussed in many threads
but I've found no solution.

On an NFS-shared volume I get write performance of 3.5 MB/sec (!!); read
performance is about 50 MB/sec, which is OK, but on a GBit network more should
be possible, since the server's disk performance reaches up to 120 MB/sec.

Does anyone have a solution how I can at least speed up the writes?

Martin


Re: [zfs-discuss] ZFS Storage server hardwae

2010-08-25 Thread Dr. Martin Mundschenk

Am 26.08.2010 um 04:38 schrieb Edward Ned Harvey:

> There is no such thing as reliable external disks.  Not unless you want to
> pay $1000 each, which is dumb.  You have to scrap your mini, and use
> internal (or hotswappable) disks.
> 
> Never expect a mini to be reliable.  They're designed to be small and cute.
> Not reliable.


The MacMini and the disks themselves are just fine. The problem seems to be the
SATA-to-USB/FireWire bridges. They just stall when the load gets heavy.

Martin


[zfs-discuss] ZFS Storage server hardwae

2010-08-25 Thread Dr. Martin Mundschenk
Hi!

I've been running an OSOL box for quite a while and I think ZFS is an amazing
filesystem. As a computer I use an Apple MacMini with USB and FireWire devices
attached. Unfortunately the USB and sometimes the FW devices just die, causing
the whole system to stall and forcing me to do a hard reboot.

I had the worst experience with a USB-SATA bridge running an Oxford chipset:
the four external devices stalled randomly within a day or so. I
switched to a four-slot RAID box, also with a USB bridge, but with better
reliability.

Well, I wonder what components it takes to build a stable system short of
an enterprise solution: eSATA, USB, FireWire, FibreChannel?

Martin


Re: [zfs-discuss] Cant't detach spare device from pool

2010-08-21 Thread Martin Mundschenk
After about 62 hours and 90%, the resilvering process got stuck. For the last
12 hours nothing has happened, so I cannot detach the spare device. Is there a
way to get the resilvering process running again?

Martin



Am 18.08.2010 um 20:11 schrieb Mark Musante:

> You need to let the resilver complete before you can detach the spare.  This 
> is a known problem, CR 6909724.
> 
> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6909724



[zfs-discuss] Cant't detach spare device from pool

2010-08-18 Thread Dr. Martin Mundschenk
Hi!

I had trouble with my raidz the other day in that some of the block devices
were not found by the OSOL box, so the spare device was attached
automatically.

After fixing the problem, the missing device came back online, but I am unable 
to detach the spare device, even though all devices are online and functional.

m...@iunis:~# zpool status tank
  pool: tank
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 1h5m, 1,76% done, 61h12m to go
config:

NAME   STATE READ WRITE CKSUM
tank   ONLINE   0 0 0
  raidz1-0 ONLINE   0 0 0
c9t0d1 ONLINE   0 0 0
c9t0d3 ONLINE   0 0 0  15K resilvered
c9t0d0 ONLINE   0 0 0
spare-3ONLINE   0 0 0
  c9t0d2   ONLINE   0 0 0  37,5K resilvered
  c16t0d0  ONLINE   0 0 0  14,1G resilvered
cache
  c18t0d0  ONLINE   0 0 0
spares
  c16t0d0  INUSE currently in use

errors: No known data errors

m...@iunis:~# zpool detach tank c16t0d0
cannot detach c16t0d0: no valid replicas

How can I solve the problem?

Martin




Re: [zfs-discuss] Drive failure causes system to be unusable

2010-02-09 Thread Dr. Martin Mundschenk
Am 08.02.2010 um 20:03 schrieb Richard Elling:

> Are you sure there is not another fault here?  What does "svcs -xv" show?

Well, I don't have the result of svcs -xv, since the fault has been recovered
by now, but it turned out not to be a hardware failure but unstable USB
connectivity. But still: why does the system get stuck? Even when a USB plug
is unhooked, why does the spare not go online?

Martin


[zfs-discuss] Drive failure causes system to be unusable

2010-02-08 Thread Martin Mundschenk
Hi!

I have an OSOL box as a home file server. It has four 1TB USB drives and a 1TB
FW drive attached. The USB devices are combined into a raidz pool and the FW
drive acts as a hot spare.

This night, one USB drive faulted and the following happened:

1. The zpool was not accessible anymore
2. changing to a directory on the pool causes the tty to get stuck
3. no reboot was possible
4. the system had to be rebooted ungracefully by pushing the power button

After reboot:

1. The zpool ran in a degraded state
2. the spare device did NOT automatically go online
3. the system did not boot to the usual run level, no auto-boot zones were
started, and GDM did not start either


NAME STATE READ WRITE CKSUM
tank DEGRADED 0 0 0
  raidz1-0   DEGRADED 0 0 0
c21t0d0  ONLINE   0 0 0
c22t0d0  ONLINE   0 0 0
c20t0d0  FAULTED  0 0 0  corrupted data
c23t0d0  ONLINE   0 0 0
cache
  c18t0d0ONLINE   0 0 0
spares
  c16t0d0AVAIL  



My questions:

1. Why does the system get stuck when a device faults?
2. Why does the hot spare not go online? (The manual says that going online
automatically is the default behavior.)
3. Why does the system not boot to the usual run level when a zpool is in a
degraded state at boot time?


Regards,
Martin




Re: [zfs-discuss] Verify NCQ status

2010-02-01 Thread Martin Faltesek
The 4 disks attached to the ahci driver should be using NCQ.  The two
cmdk disks will not have NCQ capability as they are under control  of
the legacy ata driver.  What does your pool topology look like?  Can you
try removing the cmdk disks from your pool?

You can also verify if your disks are NCQ capable:

grep sata /var/adm/messages*


On Fri, 2010-01-29 at 16:10 -0500, Christo Kutrovsky wrote:
> If I am reading this right, I have both IDE (root) and AHCI (data)
> pools. So they are using AHCI.
> 
> pci-ide, instance #0 (driver name: pci-ide)
> ide, instance #0 (driver name: ata)
> cmdk, instance #0 (driver name: cmdk)
> cmdk, instance #2 (driver name: cmdk)
> ide (driver name: ata)
> pci15d9,7980, instance #0 (driver name: ahci)
> disk, instance #5 (driver name: sd)
> disk, instance #6 (driver name: sd)
> disk, instance #7 (driver name: sd)
> disk, instance #8 (driver name: sd)
> 
> 
> On Fri, Jan 29, 2010 at 4:04 PM, Richard Elling wrote:
> On Jan 29, 2010, at 12:01 PM, Christo Kutrovsky wrote:
> > Hello,
> >
> > I have a PDSMi board
> > (http://www.supermicro.com/products/motherboard/PD/E7230/PDSMi.cfm)
> > with an Intel® ICH7R SATA2 (3 Gbps) controller built-in.
> >
> > I suspect NCQ is not working as I never see "actv" bigger than 1.0
> > in iostat, even though I have requests in "wait".
>
> This can happen if the BIOS represents the disk as being in IDE mode
> instead of AHCI mode. "prtconf -D" will show the drivers loaded and you
> can see if the disk is using an ATA or IDE driver.
>  -- richard
>
> > How can I verify the status of NCQ, and if not enabled, how to enable
> > it? There are reports that ICH7R supports NCQ
> > (http://www.overclock.net/intel-motherboards/269993-how-enable-ahci-ncq-windows-2k.html).




[zfs-discuss] Boot from external degraded zpool

2009-12-30 Thread Dr. Martin Mundschenk
Hi!

I wonder if the following scenario works:

I have a mac mini running as an OSOL box. The OS is installed on the internal
hard drive in the zpool rpool. On rpool there is no redundancy.

If I add an external block device (USB / FireWire) to rpool to mirror the
internal hard drive, and the internal hard drive then fails, can I reboot the
system from the degraded mirror half on the external drive, with the internal
drive detached?

The mac is definitely capable of booting from all kinds of devices. But does 
OSOL support it in such a way, described above?

Regards,
Martin
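The ZFS side of that is straightforward in principle; a sketch with placeholder
device names (the external disk needs an SMI label with the space in slice 0):

  $ zpool attach rpool c7d0s0 c10t0d0s0
  # wait for the resilver to finish, then make the second disk bootable (x86):
  $ installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c10t0d0s0

Whether the Mac firmware will actually boot OSOL from the external device is a
separate question.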


Re: [zfs-discuss] Something wrong with zfs mount

2009-12-15 Thread Martin Uhl
> We import the pool with the -R parameter, might that contribute to the 
> problem? Perhaps a zfs mount -a bug in correspondence with the -R parameter?

This bug report seems to confirm this:
http://bugs.opensolaris.org/view_bug.do?bug_id=6612218

Note that the /zz directory mentioned in the bug report does not exist before
the zfs set mountpoint command.

Greetings, Martin


Re: [zfs-discuss] Something wrong with zfs mount

2009-12-15 Thread Martin Uhl
>>The dirs blocking the mount are created at import/mount time.
>how you know that??

In the previous example I could reconstruct that using zfs mount.  Just look at 
the last post.
I doubt ZFS removes mount directories.

>If you're correct you should been able to reproduce 
>the problem by doing a "clean" shutdown (or an export/import), can you 
>reproduce it this way??

The server is in a production environment and we cannot afford the necessary 
downtime for that.
Unfortunately the server has lots of datasets which cause import/export times 
of 45 mins.

We import the pool with the -R parameter, might that contribute to the problem? 
Perhaps a zfs mount -a bug in correspondence with the -R parameter?

Greetings, Martin


Re: [zfs-discuss] Something wrong with zfs mount

2009-12-14 Thread Martin Uhl
> If you umount a ZFS FS that has some other FS's underneath it, then the 
> mount points for the "child" FS needs to be created to have those 
> mounted; that way if you don't export the pool the dirs won't be deleted 
> and next time you import the pool the FS will fail to mount because your 
> mount point is not empty. IMHO this is not a bug.

As far as I see it, the dirs will be created when the filesystem is mounted in 
the wrong order. That has nothing to do with removing dirs on export.

e.g. 

sunsystem9:[private] > zfs list -r -o name,mountpoint,mounted,canmount private/vmware
NAME                                MOUNTPOINT                            MOUNTED  CANMOUNT
private/vmware                      //private/vmware                      no       on
private/vmware/datastores           //private/vmware/datastores           no       on
private/vmware/datastores/cdimages  //private/vmware/datastores/cdimages  yes      on
sunsystem9:[private] > sudo zfs umount private/vmware/datastores/cdimages
sunsystem9:[private] > cd vmware
sunsystem9:[vmware] > find .
.
./datastores
./datastores/cdimages
sunsystem9:[vmware] > cd ..
sunsystem9:[private] > sudo zfs mount private/vmware
cannot mount '//private/vmware': directory is not empty

---> /private/vmware contains subdirs which prevent mounting of private/vmware

sunsystem9:[vmware] > sudo rm -rf datastores
sunsystem9:[private] > sudo zfs mount private/vmware

> now I removed the dirs, therefore it is working

sunsystem9:[private] > sudo zfs umount private/vmware
sunsystem9:[private] > sudo zfs mount private/vmware/datastores/cdimages

> that recreates the dir hierarchy /private/vmware/datastores/cdimages

sunsystem9:[private] > sudo zfs umount private/vmware/datastores/cdimages

> that does not remove these aforementioned dirs

sunsystem9:[private] > sudo zfs mount private/vmware
cannot mount '//private/vmware': directory is not empty

> obviously that will fail.

So AFAIK those directories will be created on mount but not removed on unmount.

The problem is not that exporting does not remove dirs (which I doubt it
should), but that mounting datasets in the wrong order creates too many dirs,
which then inhibits mounting of some datasets.

The dirs blocking the mount are created at import/mount time.

> That's what should be investigated in your case; i know there are some 
> fixes in progress specially for the sharing part but more data is needed 
> to see what's going on here.

what data do you need?

Greetings, Martin


Re: [zfs-discuss] Something wrong with zfs mount

2009-12-14 Thread Martin Uhl
We are also running into this bug.

Our system is a Solaris 10u4
SunOS sunsystem9 5.10 Generic_127112-10 i86pc i386 i86pc
ZFS version 4

We opened a Support Case (Case ID 71912304) which after some discussion came to 
the "conclusion" that we should not use /etc/reboot for rebooting.

This leads me to the conclusion that /etc/reboot is not supported on ZFS? I
cannot believe that.
Is there a better solution to this problem? What if the machine crashes?

Unfortunately, since the pool contains so many datasets (around 5000),
correcting the mount order by hand would incur serious downtime. (Importing,
and therefore also sharing, this pool takes around 45 min - which is also why
we are using /etc/reboot, to avoid the additional downtime of unsharing,
unmounting and exporting.)

Is there a backport of this fix for S10 in progress?

Why does ZFS confuse the mount order? After all, the datasets are ordered 
hierarchically.


[zfs-discuss] Messed up zpool (double device label)

2009-12-12 Thread Dr. Martin Mundschenk
Hi!

I tried to add another FireWire drive to my existing four devices, but it
turned out that the OpenSolaris IEEE1394 support doesn't seem to be
well-engineered.

After the new device was not recognized, and after exporting and importing the
existing zpool, I get this zpool status:

  pool: tank
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid.  Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-4J
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
tank DEGRADED 0 0 0
  raidz1 DEGRADED 0 0 0
c12t0d0  ONLINE   0 0 0
c12t0d0  FAULTED  0 0 0  corrupted data
c14t0d0  ONLINE   0 0 0
c15t0d0  ONLINE   0 0 0

The device c12t0d0 appears two times!?

'format' returns these devices:

AVAILABLE DISK SELECTIONS:
   0. c7d0 
  /p...@0,0/pci-...@b/i...@0/c...@0,0
   1. c12t0d0 
  
/p...@0,0/pci10de,a...@16/pci11c1,5...@0/u...@00303c02e014fc66/d...@0,0
   2. c13t0d0 
  
/p...@0,0/pci10de,a...@16/pci11c1,5...@0/u...@00303c02e014fc32/d...@0,0
   3. c14t0d0 
  
/p...@0,0/pci10de,a...@16/pci11c1,5...@0/u...@00303c02e014fc61/d...@0,0
   4. c15t0d0 
  
/p...@0,0/pci10de,a...@16/pci11c1,5...@0/u...@00303c02e014fc9d/d...@0,0


When I scrub data, the devices c12t0d0, c13t0d0 and c14t0d0 are accessed and
c15t0d0 sleeps. I don't get it! How can such a mess happen and how do I get it
back straight?

Regards,
Martin



[zfs-discuss] ZFS Kernel Panic

2009-12-12 Thread Dr. Martin Mundschenk
Hi!

My OpenSolaris 2009.06 box runs into kernel panics almost every day. There are
4 FireWire drives attached to a MacMini as a raidz pool. The panic seems to be
related to this known bug:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6835533

Since there are no known workarounds, is my hardware configuration worthless? 

Regards,
Martin


Re: [zfs-discuss] hung pool on iscsi

2009-11-16 Thread Martin Vool
I already got my files back actually, and the disk already contains new pools,
so I have no idea how it was set.

I have to make a VirtualBox installation and test it.
Can you please tell me how to set the failmode?
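For reference, failmode is a pool property; the default is 'wait' (I/O hangs
until the device comes back), and the alternatives are 'continue' and 'panic'.
The pool name below is a placeholder:

  $ zpool get failmode tank
  $ zpool set failmode=continue tank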


Re: [zfs-discuss] ZFS forensics/revert/restore shellscript and how-to.

2009-11-16 Thread Martin Vool
I have no idea why this forum just makes files disappear??? I will put a link
tomorrow... a file was attached before...


Re: [zfs-discuss] hung pool on iscsi

2009-11-16 Thread Martin Vool
I encountered the same problem... like I said in the first post, the zpool
command freezes. Does anyone know how to make it respond again?


Re: [zfs-discuss] Need Help Invalidating Uberblock

2009-11-16 Thread Martin Vool
You might want to check out this thread:

http://opensolaris.org/jive/thread.jspa?messageID=435420


Re: [zfs-discuss] ZFS forensics/revert/restore shellscript and how-to.

2009-11-16 Thread Martin Vool
The links work fine if you take the * off the end... sorry about that.


Re: [zfs-discuss] ZFS forensics/revert/restore shellscript and how-to.

2009-11-16 Thread Martin Vool
I forgot to add the script

zfs_revert.py
Description: Binary data


[zfs-discuss] ZFS forensics/revert/restore shellscript and how-to.

2009-11-16 Thread Martin Vool
I have written a Python script that makes it possible to get back already
deleted files and pools/partitions. This is highly experimental, but I managed
to get back a month's work when all the partitions were deleted by accident
(and of course backups are for the weak ;-)

I hope someone can pass this information to the ZFS forensics project or
wherever it should go.

First the basics; the HOW-TO comes after that.

I am not a Solaris or ZFS expert, and I am sure there are many things to
improve; I hope you can help me out with some problems this still has.

[b]Basics:[/b]
Basically this script finds all the uberblocks, reads their metadata and orders
them by time, then lets you destroy all the uberblocks that were created
after the event that you want to roll back past. Then destroy the cache and
make the machine boot up again.
This will only work if the disks are not very full and there was not much
activity after the bad event. I managed to get back files from a ZFS partition
after it had been deleted (several of them, actually) and new ones created.


I got this far with the help of these materials; the ones with * are the key parts:
*http://mbruning.blogspot.com/2008/08/recovering-removed-file-on-zfs-disk.html*
http://blogs.sun.com/blogfinger/entry/zfs_and_the_uberblock
*http://www.opensolaris.org/jive/thread.jspa?threadID=85794&tstart=0*
http://opensolaris.org/os/project/forensics/ZFS-Forensics/
http://docs.huihoo.com/opensolaris/solaris-zfs-administration-guide/html/ch04s06.html
http://www.lildude.co.uk/zfs-cheatsheet/

[b]How-to[/b]
This is the scenario I had...

First check the pool status:
$ zpool status zones

From there you will get the disk name, e.g. c2t60060E800457AB0057AB0146d0

Now we look up the history of the pool so we can find the timeline and some
uberblocks (their TXGs) to roll back to:
$ zpool history -il zones
Save this output for later use.

You will definitely want to back up the disk before you continue from this point:
e.g. ssh r...@host "dd if=/dev/dsk/c..." | dd of=Desktop/zones.dd

Now take the script that I have attached, zfs_revert.py.
It has two options:
-bs  is the block size, by default 512 (never tested with other values)
-tb  is the total number of blocks [this is mandatory; maybe someone could automate this]

To find the total number of blocks (sectors) in Solaris you can use
prtvtoc /dev/dsk/c2t60060E800457AB0057AB0146d0 | grep sectors
From there look at the "sectors" row.
If you have a file/loop device it is just size in bytes / block size = total blocks
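For example, for a 512-byte block size (the file name is a placeholder):

  $ ls -l zones.dd | awk '{print int($5/512)}'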

Now run the script, for example:
./zfs_revert.py -bs=512 -tb=41944319 /dev/dsk/c2t60060E800457AB0057AB0146d0

This will use dd, od and grep (GNU) to find the required information. This
script should work on Linux and on Solaris.

It should give you a listing of the uberblocks found (I tested it with a
20GB pool; it did not take very long since the uberblocks are only at the
beginning and end of the disk).

Something like this, but probably much more:
TXG     time-stamp           unixtime    addresses (there are 4 copies of each uberblock)
411579  05 Oct 2009 14:39:51 1254742791  [630, 1142, 41926774, 41927286]
411580  05 Oct 2009 14:40:21 1254742821  [632, 1144, 41926776, 41927288]
411586  05 Oct 2009 14:43:21 1254743001  [644, 1156, 41926788, 41927300]
411590  05 Oct 2009 14:45:21 1254743121  [652, 1164, 41926796, 41927308]

Now comes the FUN part: take a wild guess which block might be the one. It took
me about 10 tries to get it right, and I have no idea what the "good" blocks
are or how to check this. You will see later what I mean by that.

Enter the last TXG you want to KEEP.

Now the script writes zeroes over all of the uberblocks after the TXG you entered.

Now clear the ZFS cache and reboot (better solution, anyone???):
rm -rf /etc/zfs/zpool.cache && reboot

After the box comes up you have to hurry; you don't have much time, if any at
all, since ZFS will realize in about a minute or two that something is fishy.

First try to import the pool if it is not imported yet:
zpool import -f zones

Now see if it can import it or fails miserably. There is a good chance that you
will hit corrupt data and be unable to import, but as I said earlier it took me
about 10 tries to get it right. I did not have to restore the whole thing every
time; I just took baby steps and each time deleted some more blocks until I
found something stable (not quite - it will still crash after a few minutes,
but this is enough time to get back conf files or some code).


Problems and unknown factors:
1) After the machine boots up you have limited time before ZFS realizes that it
has been corrupted (checksums? I tried to turn them off, but as soon as I turn
checksumming off it crashes, and when I could turn it off the data might be
corrupted).
2) If you copy files and one of them is corrupted, the whole thing halts/crashes
and you have to start over with the zfs_revert.py script and reboot again.
3) It might be that reverting to a TXG where the pool was exported then there 
is a better

Re: [zfs-discuss] Live resize/grow of iscsi shared ZVOL

2009-08-11 Thread Martin Wheatley
Did anyone reply to this question?

We have the same issue, and our Windows admins don't see why the iSCSI target
should have to be disconnected when the underlying storage is extended.


Re: [zfs-discuss] Shrinking a zpool?

2009-08-05 Thread Martin
Bob wrote:

> Perhaps the problem is one of educating the customer
> so that they can 
> ammend their accounting practices.  Different
> business groups can 
> share the same pool if necessary.

Bob, while I don't mean to pick on you, that statement captures a major 
thinking flaw in IT when it comes to sales.

Yes, Brian should do everything possible to shape the customer's expectations; 
that's his job.

At the same time, let's face it.  If the customer thinks he needs X (whether or 
not he really does) and Brian can't get him to move away from it, Brian is 
sunk.  Here Brian sits with a potential multi-million dollar sale which is 
stuck on a missing feature, and probably other obstacles.  The truth is that 
the other obstacles are irrelevant as long as the customer can't get past 
feature X, valid or not.

So millions of dollars to Sun hang in the balance and these discussions revolve 
around whether or not the customer is planning optimally.  Imagine how much 
rapport Brian will gain when he tells this guy, "You know, if you guys just 
planned better, you wouldn't need feature X."  Brian would probably not get his 
phone calls returned after that.

You can rest assured that when the customer meets with IBM the next day, the 
IBM rep won't let the customer get away from feature X that JFS has.  The 
conversation might go like this.

Customer: You know, we are really looking at Sun and ZFS.

IBM: Of course you are, because that's a wise thing to do.  ZFS has a lot of 
exciting potential.

Customer: Huh?

IBM: ZFS has a solid base and Sun is adding features which will make it quite 
effective for your applications.

Customer: So you like ZFS?

IBM: Absolutely.  At some point it will have the features you need.  You 
mentioned you use feature X to provide the flexibility you have to continue to 
outperform your competition during this recession.  I understand Sun is working 
hard to integrate that feature, even as we speak.

Customer: Maybe we don't need feature X.

IBM: You would know more than I.  When did you last use feature X?

Customer: We used X last quarter when we scrambled to add FOO to our product 
mix so that we could beat our competition to market.

IBM: How would it have been different if feature X was unavailable?

Customer (mind racing): We would have found a way.

IBM: Of course, as innovative as your company is, you would have found a way.  
How much of a delay?

Customer (thinking through the scenarios): I don't know.

IBM: It wouldn't have impacted the rollout, would it?

Customer: I don't know.

IBM: Even if it did delay things, the delay wouldn't blow back on you, right?

Customer (sweating): I don't think so.

Imagine the land mine Brian now has to overcome when he tries to convince the 
customer that they don't need feature X, and even if they do, Sun will have it 
"real soon now."

Does anyone really think that Oracle made their money lecturing customers on 
how Table Partitions are stupid and if the customer would have planned their 
schema better, they wouldn't need them anyway?  Of course not.  People wanted 
partitions (valid or not) and Oracle delivered.

Marty


Re: [zfs-discuss] Shrinking a zpool?

2009-08-05 Thread Martin
richard wrote:
> Preface: yes, shrink will be cool.  But we've been
> running highly  
> available,
> mission critical datacenters for more than 50 years
> without shrink being
> widely available.

I would debate that.  I remember batch windows and downtime delaying one's 
career movement.  Today we are 24x7 where an outage can kill an entire business.

> Do it exactly the same way you do it for UFS.  You've
> been using UFS
> for years without shrink, right?  Surely you have
> procedures in  
> place :-)

While I haven't taken a formal survey, everywhere I look I see JFS on AIX and 
VxFS on Solaris.  I haven't been in a production UFS shop this decade.

> Backout plans are not always simple reversals.  A
> well managed site will
> have procedures for rolling upgrades.

I agree with everything you wrote.  Today other technologies allow live changes 
to the pool, so companies use those technologies instead of ZFS.

> There is more than one way to skin a cat.

Which entirely misses the point.


Re: [zfs-discuss] Shrinking a zpool?

2009-08-05 Thread Martin
C,

I appreciate the feedback and like you, do not wish to start a side rant, but 
rather understand this, because it is completely counter to my experience.

Allow me to respond based on my anecdotal experience.

> What's wrong with make a new pool.. safely copy the data. verify data
> and then delete the old pool..

You missed a few steps.  The actual process would be more like the following.
1. Write up the steps and get approval from all affected parties
-- In truth, the change would not make it past step 1.
2. Make a new pool
3. Quiesce the pool and cause a TOTAL outage during steps 4 through 9
4. Safely make a copy of the data
5. Verify the data
6. Export old pool
7. Import new pool
8. Restart server
9. Confirm all services are functioning correctly
10. Announce the outage has finished
11. Delete the old pool

Note step 3 and let me know which 24x7 operation would tolerate an extended 
outage (because it would last for hours or days) on a critical production 
server.
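For reference, the copy in steps 4-5 would typically be a recursive snapshot
plus send/receive, sketched below with placeholder pool names; it does not
change the fact that the pool has to be quiesced for the final pass:

  $ zfs snapshot -r tank@migrate
  $ zfs send -R tank@migrate | zfs recv -F -d newtank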

One solution is not to do this on a critical enterprise storage, and that's the 
point I am trying to make.

> Who in the enterprise just allocates a
> massive pool

Everyone.

> and then one day [months or years later] wants to shrink it...

Business needs change.  Technology changes.  The project was a pilot and 
canceled.  The extended pool didn't meet verification requirements, e,g, 
performance and the change must be backed out.  Business growth estimates are 
grossly too high and the pool needs migration to a cheaper frame in order to 
keep costs in line with revenue.  The pool was made of 40 of the largest disks 
at the time and now, 4 years later, only 10 disks are needed to accomplish the 
same thing while the 40 original disks are at EOL and no longer supported.

The list goes on and on.

> I'd have to concur there's more useful things out there. OTOH... 

That's probably true and I have not seen the priority list.  I was merely 
amazed at the number of "Enterprises don't need this functionality" posts.

Thanks again,
Marty


Re: [zfs-discuss] Shrinking a zpool?

2009-08-05 Thread Martin
> You are the 2nd customer I've ever heard of to use shrink.

This attitude seems to be a common theme in ZFS discussions: "No enterprise 
uses shrink, only grow."

Maybe.  The enterprise I work for requires that every change be reversible and 
repeatable.  Every change requires a backout plan and that plan better be fast 
and nondisruptive.

Who are these enterprise admins who can honestly state that they have no 
requirement to reverse operations?  Who runs a 24x7 storage system and will 
look you in the eye and state, "The storage decisions (parity count, number of 
devices in a stripe, etc.) that I make today will be valid until the end of 
time and will NEVER need nondisruptive adjustment.  Every storage decision I 
made in 1993 when we first installed RAID is still correct and has needed no 
changes despite changes in our business models."

My experience is that this attitude about enterprise storage borders on insane.

Something does not compute.


Re: [zfs-discuss] triple-parity: RAID-Z3

2009-08-04 Thread Martin
> With RAID-Z stripes can be of variable width meaning that, say, a
> single row
> in a 4+2 configuration might have two stripes of 1+2. In other words,
> there
> might not be enough space in the new parity device.

Wow -- I totally missed that scenario.  Excellent point.

>  I did write up the
> steps
> that would be needed to support RAID-Z expansion

Good write up.  If I understand it, the basic approach is to add the device to 
each row and leave the unusable fragments there.  New stripes will take 
advantage of the wider row but old stripes will not.

It would seem that the mythical bp_rewrite() that I see mentioned here and 
there could relocate a stripe to another set of rows without altering the 
transaction_id (or whatever it's called), critical for tracking snapshots.  I 
suspect this function would allow background defrag/coalesce (a needed feature 
IMHO) and deduplication.  With background defrag, the extra space on existing 
stripes would not immediately be usable, but would appear over time.

Many thanks for the insight and thoughts.

Bluntly, how can I help?  I have cut a lifetime of C code in a past life.

Cheers,
Marty


Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-20 Thread Martin
> Enterprises will not care about ease so much as they
> have dedicated professionals to pamper their arrays.

Enterprises can afford the professionals.  I work for a fairly large bank which 
can, and does, afford a dedicated storage team.

On the other hand, no enterprise can afford downtime.  Where I work, a planned 
outage is a major event and any solution which allows flexibility without an 
outage is most welcome.

While I am unfamiliar with the innards of VxFS, I have seen several critical
production VxFS mount points expanded with little or no interruption.

ZFS is so close on so many levels.


Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-19 Thread Martin
> I don't see much similarity between mirroring and raidz other than
> that they both support redundancy.

A single parity device against a single data device is, in essence, mirroring.  
For all intents and purposes, raid and mirroring with this configuration are 
one and the same.

> A RAID system with distributed parity (like raidz) does not have a
> "parity device". Instead, all disks are treated as equal. Without
> distributed parity you have a bottleneck and it becomes difficult to
> scale the array to different stripe sizes.

Agreed.  Distributed parity is the way to go.  Nonetheless, if I have an array 
with a single parity, then I still have one device dedicated to parity, even if 
the actual device which holds the parity information will vary from stripe to 
stripe.

The point simply was that it might be straightforward to add a device and 
convert a raidz array into a raidz2 array, which effectively would be adding a 
parity device.  An extension of that is to convert a raidz2 array back into a 
raidz array and increase its size without adding a device.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] triple-parity: RAID-Z3

2009-07-18 Thread Martin
> Don't hear about triple-parity RAID that often:

I agree completely.  In fact, I have wondered (probably in these forums), why 
we don't bite the bullet and make a generic raidzN, where N is any number >=0.

In fact, get rid of mirroring, because it clearly is a variant of raidz with 
two devices.  Want three way mirroring?  Call that raidz2 with three devices.  
The truth is that a generic raidzN would roll up everything: striping, 
mirroring, parity raid, double parity, etc. into a single format with one 
parameter.
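
Purely as an illustration of what "one format, one parameter" could look like 
(this is a sketch of hypothetical syntax, not anything that exists today, and 
the device names are placeholders):

zpool create tank raidz0 c0d0 c1d0 c2d0    # N=0: plain stripe, no redundancy
zpool create tank raidz1 c0d0 c1d0         # N=1 on two devices: effectively a 2-way mirror
zpool create tank raidz2 c0d0 c1d0 c2d0    # N=2 on three devices: effectively a 3-way mirror
zpool create tank raidz4 c0d0 c1d0 c2d0 c3d0 c4d0 c5d0 c6d0 c7d0   # N=4: four devices' worth of parity in one group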

If memory serves, the second parity is calculated using Reed-Solomon which 
implies that any number of parity devices is possible.

Let's not stop there, though.  Once we have any number of parity devices, why 
can't I add a parity device to an array?  That should be simple enough with a 
scrub to set the parity.  In fact, what is to stop me from removing a parity 
device?  Once again, I think the code would make this rather easy.

Once we can add and remove parity devices at will, it might not be a stretch to 
convert a parity device to data and vice versa.  If you have four data drives 
and two parity drives but need more space, in a pinch just convert one parity 
drive to data and get more storage.

The flip side would work as well.  If I have six data drives and a single 
parity drive but have, over the years, replaced them all with vastly larger 
drives and have space to burn, I might want to covert a data drive to parity.  
I may sleep better at night.

If we had a generic raidzN, the ability to add/remove parity devices and the 
ability to convert a data device from/to a parity device, then what happens?  
Total freedom.  Add devices to the array, or take them away.  Choose the blend 
of performance and redundancy that meets YOUR needs, then change it later when 
the technology and business needs change, all without interruption.

Ok, back to the real world.  The one downside to triple parity is that I recall 
the code discovered the corrupt block by excluding it from the stripe, 
reconstructing the stripe and comparing that with the checksum.  In other 
words, for a given cost of X to compute a stripe and a number P of corrupt 
blocks, the cost of reading a stripe is approximately X^P.  More corrupt blocks 
would radically slow down the system.  With raidz2, the maximum number of 
corrupt blocks would be two, putting a cap on how costly the read can be.

Standard disclaimers apply: I could be wrong, I am often wrong, etc.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Sans Digital Tower Raid TR8M

2009-07-03 Thread Martin Englund
I'm wondering if someone has tried using Sans Digital's Tower Raid TR8M[1] with 
ZFS (I'm especially curious about the bundled 2-port eSATA PCIe Host Bus 
Adapter)

It seems like an very good expansion tower as it holds up to 8 SATA disks, but 
before I dish out $395 I'd like to know that it works ok :)

[1] http://www.sansdigital.com/towerraid/tr8mb.html

cheers,
/Martin
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Adding a SDcard as zfs cache (L2ARC?) on a laptop?

2009-06-13 Thread Martin
> Did anyone ever have success with this?

> I'm trying to add a usb flash device as rpool cache, and am hitting the same 
> problem,
> even after working through the SMI/EFI label and other issues above.

I played with adding a USB stick as L2ARC a few versions ago of SXCE, pre 104.

At the time, I got the same error message, but if I booted to "failsafe" it 
would allow me to add the USB device as rpool cache.  Unfortunately, on normal 
boot up, the device name had changed on the USB device so ZFS complained.  I 
don't recall how I got around that problem.
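
(For reference, the add itself is just the usual cache-vdev command, something 
like:

zpool add rpool cache c5t0d0p0   # device name illustrative -- it kept changing between boots

which is what the failsafe boot let me run.)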

I found that the USB implementation on Solaris was weak enough that the USB 
drive, which consistently read 20 MB/s or so under Linux was hovering around 5 
MB/s in Solaris.  Only under extremely random reads did the flash drive help.  
In all other cases, it actually slowed down the system.  I assume that a better 
USB implementation or a different storage interface would have made it a much 
better experience.

Good luck,
Marty
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mirror and RaidZ on only 3 disks

2008-11-18 Thread Martin Blom
Miles Nordin wrote:
> mb> if I'm risking it more than usual when the procedure is done?
>
> yeah, that is my opinion: when the procedure is done, using ZFS
> without a backup is risking the data more than using UFS or ext3
> without a backup.  Is that a clear statement?
>
>
> I can ramble on, but maybe that's all you care to hear.
>   

Thanks all for your input. I guess basically the idea is sound for my
needs; however, Miles' words made me do my homework and read the mail
archives. So I don't know ... The idea of losing a complete zpool just
because the power goes out (which does happen once or twice a year
here), or simply because I will be running on toy hardware, is really
not comfortable. I'm quite confident ext3 will never do that to me.

In a mail from last month Jeff Bonwick wrote on this list that he's
working on better recovery from inconsistent filesystems. I guess that's
something I should wait for.

-- 
 Martin Blom --- [EMAIL PROTECTED] 
Eccl 1:18 http://martin.blom.org/



smime.p7s
Description: S/MIME Cryptographic Signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mirror and RaidZ on only 3 disks

2008-11-16 Thread Martin Blom
Miles Nordin wrote:
>
> mb> 5) Given that this is all cheap PC hardware ... can I move a
> mb> disk from a broken controller to another
>
> zpool export, zpool import.
>   
I was testing with the rpool, but "zpool import -f" when booting from the
CD did the trick. Thanks for the hint.
> If the pool is only DEGRADED it would be nice to do it online, but I
> don't know a way to do that.
>
> mb> How does this idea sound to you?
>
> I think you need a backup in a separate pool or a non-ZFS filesystem.
> The backup could be .tar files or an extracted rsync copy, but somehow
> I think you need protection against losing the whole pool to software
> bugs or operator error.  There are other cases where you might want to
> destroy and recreate the pool, like wanting to remove a slog or change
> the raidz/raidz2/mirror level, but I think that's not why you need it.
> You need it for protection.  losing the pool is really possible.
>   
I do intend to keep backups both before and after, but are you referring
to the actual migration or when everything is completed? I know the data
is at risk while transferring the old content and when attaching the
third drive; what I'm worried about is if I'm risking it more than usual
when the procedure is done?

-- 
 Martin Blom --- [EMAIL PROTECTED] 
Eccl 1:18 http://martin.blom.org/



smime.p7s
Description: S/MIME Cryptographic Signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Mirror and RaidZ on only 3 disks

2008-11-15 Thread Martin Blom

Hi,

I have a small Linux server PC at home (Intel Core2 Q9300, 4 GB RAM),
and I'm seriously considering switching to OpenSolaris (Indiana,
2008.11) in the near future, mainly because of ZFS. The idea is to run
the existing CentOS 4.7 system inside a VM and let it NFS mount home
directories and other filesystems from OpenSolaris. I might migrate more
services from Linux over time, but for now, the filesystems are priority
one.

Since most of my questions are actually about ZFS, I thought I'd ask
here directly.

First of all, I'm on a budget and while "cheap" is important, "value for
the money" is critical. So the less number of disk, the better (not only
because of price but also because of power consumption and the lack of
cooling). This also means that I prefer RAID-Z to mirrors for less
critical data.

1) My plan is to install 2008.11 on three 1 TB disks (Samsung SpinPoint
F1, two new and one currently containing Linux data). I will start with
a single, new disk and install the OS on a zpool inside an fdisk
partition, say 100 GB large. This is where the OS and /home will live.
The rest of the disk will later be used for less critical data (music,
video, VMs, backups, etc), inside an fdisk partition of type "Unix".

Once installed, I'll attach the second new disk, using identical
partition layout, and attach the 100 GB partition as mirror to the root
pool. I'll then create a sparse ZFS volume and use this together with
the unused fdisk partition on each disk to create a 3-way RAID-Z pool,
and finally degrade it by taking the sparse file offline.
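
Roughly, I picture that step like this (pool, volume and device names are
just placeholders, and the sizes are illustrative):

zfs create -s -V 900g rpool/fake                 # sparse volume, no real space used yet
zpool create -f store raidz c4t0d0p2 c4t1d0p2 /dev/zvol/dsk/rpool/fake
zpool offline store /dev/zvol/dsk/rpool/fake     # run the raid-z degraded until the third disk is free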

I'll then start migrating the Linux files, probably using a VM directly
mounting the ext3 filesystems from the old 1 TB disk and copying home
directories to the mirror pool and media files to the raid-z pool.
Finally, I'll reformat and attach the third disk to the pools.

I thus hope to end up with three disks and two pools: one small 3-way 
mirror for critical data and one large 3-way raid-z pool for the rest.
How does this idea sound to you? Will I be able to enable the write
cache in this setup (not that write speed matters much to me, but still)?

2) Given the perhaps unusual disk layout, do you think I'll run into
trouble concerning OS upgrades or if/when one of the disks fails?

I've tried the procedure in VMware and found a few gotchas, but nothing
too serious (like "zpool import -f rpool" to make grub work after
installation, and trying to create the second zpool on c4t0d0p1 -- which
happened to be the same as c4t0d0s0, where rpool lives -- instead of
c4t0d0p2; funny "zpool create" didn't complain?)

3) One problem I don't understand why I got is this: When I attach a new
virgin disk to the system, I first run format->fdisk to make two fdisk
partitions and then use prtvtoc/fmthard to create the splices on the
solaris partition. When I then try to attach the new c4t1d0s0 to the
existing c4t1d0s0, zpool complains that only complete disks can be
attached. However, after a reboot, the slice attaches without problems
(except that I have to use -f since c4t1d0s0 overlaps with c4t1d0s2,
which is also something I don't understand -- of course it does?). How come?
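
(For what it's worth, the prtvtoc/fmthard step I mean is the usual VTOC copy,
along the lines of:

prtvtoc /dev/rdsk/c4t0d0s2 | fmthard -s - /dev/rdsk/c4t1d0s2   # clone the label from the first disk to the new one

with the first disk as the template.)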

4) I plan to use a sparse ZFS volume for the Linux VM root disk,
probably from the mirror pool. Objections?

5) Given that this is all cheap PC hardware ... can I move a disk from a
broken controller to another, and if so, how? I tried this in VMware,
but could not figure out how to re-attach the moved disk. zpool
complains that the moved disk is part of an active zpool and -f didn't
help at all.

Any input would be greatly appreciated!

-- 
 Martin Blom --- [EMAIL PROTECTED] 
Eccl 1:18 http://martin.blom.org/



smime.p7s
Description: S/MIME Cryptographic Signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sata on sparc

2008-10-24 Thread Martin Winkelman
On Fri, 24 Oct 2008, Francois Dion wrote:

> On this page:
>
> http://www.sun.com/io_technologies/sata/SATA0.html
>
> I see:
> -USB,SATA,IEEE1394 SIIG, Inc. USB 2.0 + FireWire + SATA Combo (SC-UNS012) 
> Verified (Solaris 10)
>
> Indicating that if I install an SC-UNS012 in a Solaris 10 Sparc server I 
> would get a few USB 2 ports, a few IEEE1394 ports but more interestingly, a 
> few SATA ports working under sparc.
>
> But that same page states:
> "Native SATA is not yet fully supported. There is no current support on 
> SPARC-based systems. "
>
> Which is it? It works or it doesn't?


Solaris 10 for Sparc doesn't have a driver for the SATA chipset on this card. 
It is listed as verified for Sparc Solaris because the USB and FireWire ports 
will work on Sparc systems.


--
Martin Winkelman  -  [EMAIL PROTECTED]  -  303-272-3122
http://www.sun.com/solarisready/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] one step forward - pinging Lukas pool: ztankKarwacki (kangurek)

2008-10-02 Thread Martin Uhl
> When I attempt again to import using zdb -e ztank
> I still get zdb: can't open ztank: I/O error
> and zpool import -f, whilst it starts and seems to
> access the disks sequentially, it stops al the 3rd
> one (no sure which precisely - it spins it up and the
> process stops right there, and the system will not
> reboot when asked to (shutdown -g0 -y -i5)
> so there's some slight progress here.

How about just removing that disk and try importing?
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Joshua P Martin is out of the office

2008-09-06 Thread Joshua . P . Martin

I will be out of the office starting  09/05/2008 and will not return until
09/08/2008.

I will respond to your message when I return.

CONFIDENTIALITY NOTICE:  This electronic message contains
information which may be legally confidential and/or privileged and
does not in any case represent a firm ENERGY COMMODITY bid or offer
relating thereto which binds the sender without an additional
express written confirmation to that effect.  The information is
intended solely for the individual or entity named above and access
by anyone else is unauthorized.  If you are not the intended
recipient, any disclosure, copying, distribution, or use of the
contents of this information is prohibited and may be unlawful.  If
you have received this electronic transmission in error, please
reply immediately to the sender that you have received the message
in error, and delete it.  Thank you.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Live resize/grow of iscsi shared ZVOL

2008-08-14 Thread Martin Svensson
I have created a zvol. My client computer (Windows) has the volume connected 
fine.
But when I resize the zvol using:
zfs set volsize=20G pool/volumes/v1
... it disconnects the client. Is this by design?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] marvell88sx patch

2008-08-14 Thread Martin Gasthuber

Hi,

  in which OpenSolaris (Nevada) version is this fix included?

thanks,
   Martin

On 13 Aug, 2008, at 18:52, Bob Friesenhahn wrote:


I see that a driver patch has now been released for marvell88sx
hardware.  I expect that this is the patch that Thumper owners have
been anxiously waiting for.  The patch ID is 138053-02.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




smime.p7s
Description: S/MIME cryptographic signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS configuration and performance questions.

2008-08-11 Thread Martin Svensson
I read this (http://blogs.sun.com/roch/entry/when_to_and_not_to) blog regarding 
when and when not to use raidz. There is an example of a plain striped 
configuration and a mirror configuration. (See below)

M refers to a 2-way mirror and S to a simple dynamic stripe.

Config          Blocks Available    Random FS Blocks /sec
------          ----------------    ---------------------
M  2 x (50)     5000 GB             2
S  1 x (100)    1 GB                2

Granted, the simple striped configuration is fast, and of course with no 
redundancy. But I don't understand how a mirrored configuration can perform as 
well when you sacrifice half of your disks for redundancy. Doesn't a mirror 
perform as one device? Can someone please clarify the example above? I 
think I am missing something.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS configuration and performance questions.

2008-08-10 Thread Martin Svensson
Hello! I'm new to ZFS and have some configuration questions.

What's the difference, performance-wise, between the configurations below?
(zpool command sketches for both layouts follow the listings.)
* In the first configuration, can I lose 1 disk? And, are the disks striped to 
gain performance, as they act as one vdev?

* In the second configuration, can I lose 2 disks (one per raidz group)? Since 
dynamic striping is done across vdevs, will this configuration have better 
performance than configuration 1?

[b]Configuration 1[/b]
 raidz
  - dev1
  - dev2
  - dev3
  - dev4
  - dev5
  - dev6
  - dev7
  - dev8
  - dev9
  - dev10

[b]Configuration 2[/b]
 raidz
  - dev1
  - dev2
  - dev3
  - dev4
  - dev5
 raidz
  - dev6
  - dev7
  - dev8
  - dev9
  - dev10
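
For concreteness, I assume the two layouts would be created roughly like this
(device names are placeholders):

zpool create tank raidz dev1 dev2 dev3 dev4 dev5 dev6 dev7 dev8 dev9 dev10
# configuration 1: a single 10-wide raidz vdev

zpool create tank raidz dev1 dev2 dev3 dev4 dev5 raidz dev6 dev7 dev8 dev9 dev10
# configuration 2: two 5-wide raidz vdevs, dynamically striped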
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS fails to mount all datasets on boot

2008-07-21 Thread Martin Uhl
I have a server with a huge number of datasets (around 9000)

When the pool containing the datasets is imported on boot up, a few (<10) 
datasets are not mounted and thus not exported via nfs. Which dataset is not 
mounted is random.

All datasets are exported via nfs. A zpool import takes around 30 mins. 

The system is running s10u4 

sunsystem9:[~] > uname -a
SunOS sunsystem9 5.10 Generic_127112-10 i86pc i386 i86pc

The zpool configuration is as follows:

sunsystem9:[~] > zpool status
  pool: private
 state: ONLINE
 scrub: none requested
config:

NAME   STATE READ WRITE CKSUM
privateONLINE   0 0 0
  raidz1   ONLINE   0 0 0
c6t600A0B800029A24007204753ACF1d0  ONLINE   0 0 0
c6t600A0B800029A24007224753AE8Fd0  ONLINE   0 0 0
c6t600A0B800029F7C407304753B2A2d0  ONLINE   0 0 0
c6t600A0B800029A24007284753B6D3d0  ONLINE   0 0 0
  raidz1   ONLINE   0 0 0
c6t600A0B800029F7C4072A4753ACB4d0  ONLINE   0 0 0
c6t600A0B800029F7C4072E4753B126d0  ONLINE   0 0 0
c6t600A0B800029A24007264753B341d0  ONLINE   0 0 0
c6t600A0B800029F7C407344753BF84d0  ONLINE   0 0 0
  raidz1   ONLINE   0 0 0
c6t600A0B800029F7C4072C4753AE42d0  ONLINE   0 0 0
c6t600A0B800029A24007244753B20Dd0  ONLINE   0 0 0
c6t600A0B800029F7C407324753B6BAd0  ONLINE   0 0 0
c6t600A0B800029A240072A4753C06Dd0  ONLINE   0 0 0

errors: No known data errors

The disks are on a ST6140 attached to the system via redundant fc 
(multipathing) (Each of the disk devices are volumes with 5 500gb disks in a 
raid5)

Any ideas?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] NFS shared subtree not visible

2008-07-13 Thread Martin Schuster
According to PeterB in #opensolaris, I'd need NFSv4 mirror-mounts for that.

I decided to instead just setup the automounter on the clients and put the 
directories in the automount-map :)
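
For anyone else doing the same, a minimal sketch of what I mean (hostname and
paths from my setup, mount options illustrative):

# /etc/auto_master on the client
/data   auto_data

# /etc/auto_data (indirect map; one key per sub-filesystem)
video   -rw   sheeana:/data/video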
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] NFS shared subtree not visible

2008-07-13 Thread Martin Schuster
Hi everyone,
after using Linux for 12 years, I now decided to give OpenSolaris a try, using 
it as OS for my new home-filer.

I've created a zpool, and multiple zfs on there, two of those are
NAMEUSED  AVAIL  REFER  
MOUNTPOINT
tank/data   838G   993G  42.6K  
/data
tank/data/video 389G   993G   389G  
/data/video

and I've shared tank/data:
[EMAIL PROTECTED]:~# zfs get sharenfs tank/data
NAME   PROPERTY  VALUE  SOURCE
tank/data  sharenfs  rw,nosuid,[EMAIL PROTECTED]  local
[EMAIL PROTECTED]:~# zfs get sharenfs tank/data/video
NAME PROPERTY  VALUE  SOURCE
tank/data/video  sharenfs  rw,nosuid,[EMAIL PROTECTED]  inherited from tank/data

Now I'd like to mount the whole /data tree with all its sub-zfs via NFS, but if 
I do a
mount -t nfs sheeana:/data /mnt
I can only see an empty /mnt/video; I have to additionally do a
mount -t nfs sheeana:/data/video /mnt/video

Is there some way to mount everything using just the first mount-command?

tia
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] sharenfs=off, but still being shared?

2008-07-12 Thread Martin Gisch
Hi Mark,

Sharemgr output:

-bash-3.2# sharemgr show -vp
default nfs=()
smb smb=()
zfs
zfs/rpool/export smb=()
  export=/export
zfs/store/movies smb=()
  Movies=/store/movies
zfs/store/overlord2 nfs=() smb=()
  overlord2=/store/overlord2
zfs/store/tv smb=()
  TV=/store/tv
-bash-3.2#

Full zfs list:
-bash-3.2# zfs list -o name,sharenfs,sharesmb
NAMESHARENFS  SHARESMB
rpool   off   off
[EMAIL PROTECTED]   - -
rpool/ROOT  off   off
rpool/[EMAIL PROTECTED]  - -
rpool/ROOT/opensolaris  off   off
rpool/ROOT/[EMAIL PROTECTED]  - -
rpool/ROOT/opensolaris/opt  off   off
rpool/ROOT/opensolaris/[EMAIL PROTECTED]  - -
rpool/exportoff   name=export
rpool/[EMAIL PROTECTED]- -
rpool/export/home   off   off
rpool/export/[EMAIL PROTECTED]   - -
store   off   off
store/backupoff   off
store/moviesoff   name=Movies
store/overlord2 rwname=overlord2
store/tvoff   name=TV
-bash-3.2#

Thanks,
-M.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] sharenfs=off, but still being shared?

2008-07-12 Thread Martin Gisch
I noticed an oddity on my 2008.05 box today.
Created a new zfs file system that I was planning to nfs share out to an old 
FreeBSD box, after I put sharenfs=on for it, I noticed there was a bunch of 
others shared too:

-bash-3.2# dfshares -F nfs
RESOURCE  SERVER ACCESSTRANSPORT
reaver:/store/movies  reaver  - -
reaver:/exportreaver  - -
reaver:/store/tv  reaver  - -

Which is strange because I never turned it on for them...I don't know if they 
were shared before that point already...hadn't checked before. They are all smb 
shared though.

-bash-3.2# zfs list -o name,sharenfs,sharesmb
NAMESHARENFS  SHARESMB
...
rpool/exportoff   name=export
store/moviesoff   name=Movies
store/tvoff   name=TV

I tested from a different server on my lan, and I can definitely mount and read 
from those via nfs too.
/etc/dfs/dfstab has no entries.

If I tell them to turn it off again, they disappear from the share list:
-bash-3.2# zfs set sharenfs=off store/tv
-bash-3.2# zfs set sharenfs=off store/movies
-bash-3.2# zfs set sharenfs=off rpool/export
-bash-3.2# dfshares
-bash-3.2#

Then re-share the one I actually want to nfs export:
-bash-3.2# zfs set sharenfs=on store/overlord2

For a moment it's correct:
-bash-3.2# dfshares
RESOURCE  SERVER ACCESSTRANSPORT
reaver:/store/overlord2   reaver  - -

But if I kill mountd (letting smf restart it)
-bash-3.2# ps -ef | grep mount
root  1343 1   0 19:54:38 ?   0:00 /usr/lib/autofs/automountd
root  1345  1343   0 19:54:38 ?   0:00 /usr/lib/autofs/automountd
root  1402  1081   0 20:12:56 pts/3   0:00 grep mount
root  1335 1   0 19:54:33 ?   0:00 /usr/lib/nfs/mountd
-bash-3.2# kill 1335
-bash-3.2# dfshares
RESOURCE  SERVER ACCESSTRANSPORT
reaver:/store/overlord2   reaver  - -
reaver:/store/movies  reaver  - -
reaver:/exportreaver  - -
reaver:/store/tv  reaver  - -

Any ideas what's happening here? Am I doing something stupid somewhere?
Are sharesmb & sharenfs tied together somehow or can they be separated?

Cheers,
-Martin.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] USB hard to ZFS

2008-06-16 Thread Martin Winkelman
On Mon, 16 Jun 2008, Andrius wrote:

> # eject /rmdisk/unnamed_rmdisk
> No such file or directory
> # eject /dev/rdsk/c5t0d0s0
> /dev/rdsk/c5t0d0s0 is busy (try 'eject floppy' or 'eject cdrom'?)
> # eject rmdisk
> /vol/dev/rdsk/c5t0d0/unnamed_rmdisk: Inappropriate ioctl for device
> # eject /vol/dev/rdsk/c5t0d0/unnamed_rmdisk
> /vol/dev/rdsk/c5t0d0/unnamed_rmdisk: No such file or directory

# mount |grep rmdisk
/rmdisk/unnamed_rmdisk on /vol/dev/dsk/c2t0d0/unnamed_rmdisk:c 
read/write/setuid/devices/nohidden/nofoldcase/dev=16c1003 on Mon Jun 16 
12:51:07 2008
# eject unnamed_rmdisk
# mount |grep rmdisk
#


--
Martin Winkelman  -  [EMAIL PROTECTED]  -  303-272-3122
http://www.sun.com/solarisready/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] USB hard to ZFS

2008-06-16 Thread Martin Winkelman
On Mon, 16 Jun 2008, Andrius wrote:

> dick hoogendijk wrote:
>> On Mon, 16 Jun 2008 19:10:18 +0100
>> Andrius <[EMAIL PROTECTED]> wrote:
>> 
>>> /rmdisk/unnamed_rmdisk
>> umount /rmdisk/unnamed_rmdisk should do the trick
>> 
>> It's probably also mounted on /media depending on your solaris version.
>> If so, umount /media/unnamed_rmdisk unmounts the disk too.
>
> It is mounted on /rmdisk/unnamed_rmdisk. It is Solaris 10.
>
> #umount /rmdisk/unnamed_rmdisk
> umount: warning: /rmdisk/unnamed_rmdisk not in mnttab
> umount: /rmdisk/unnamed_rmdisk not mounted

This disk is probably under volume manager control. Try running "eject 
unnamed_rmdisk".

--
Martin Winkelman  -  [EMAIL PROTECTED]  -  303-272-3122
http://www.sun.com/solarisready/


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] What to do about retryable write errors?

2008-04-02 Thread Martin Englund
Oh, it should say retryable and normal write errors - I have permanent  
errors too

/Martin

On 2 apr 2008, at 00:55, Richard Elling wrote:
> Martin Englund wrote:
>> I've got a newly created zpool where I know (from the previous UFS)  
>> that one of the disks has retryable write errors.
>>
>> What should I do about it now? Just leave zfs to deal with it?  
>> Repair it?
>>
>
> Retryable write errors are not fatal, they are retried.
> What do you think you can do to "repair" them?
> I'd raise an eyebrow, but otherwise not worry unless
> there are fatal errors.
> -- richard
>
>> If I should repair, if this procedure ok?
>>
>> zpool offline z2 c5t4d0
>> format -d c5t4d0
>> repair ...
>> zpool online z2 c5t4d0
>>
>> cheers,
>> /Martin
>>  This message posted from opensolaris.org
>> ___
>> zfs-discuss mailing list
>> zfs-discuss@opensolaris.org
>> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>>
>

cheers,
/Martin
-- 
Martin Englund, Security Engineer, .Sun Engineering, Sun Microsystems  
Inc.
Email: [EMAIL PROTECTED] Time Zone: GMT-3 PGP: 1024D/AA514677
"The question is not if you are paranoid, it is if you are paranoid  
enough."


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] What to do about retryable write errors?

2008-04-01 Thread Martin Englund
I've got a newly created zpool where I know (from the previous UFS) that one of 
the disks has retryable write errors.

What should I do about it now? Just leave zfs to deal with it? Repair it?

If I should repair, is this procedure ok?

zpool offline z2 c5t4d0
format -d c5t4d0
repair ...
zpool online z2 c5t4d0

cheers,
/Martin
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool DEGRADED after resilver

2008-03-07 Thread Martin Englund
Answering my own post :)

I ran zpool scrub which solved it:
weblogs # zpool status
  pool: storage
 state: ONLINE
 scrub: resilver completed with 0 errors on Fri Mar  7 16:01:08 2008
config:

NAMESTATE READ WRITE CKSUM
storage ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c0t6d0  ONLINE   0 0 0
c1t6d0  ONLINE   0 0 0
c4t6d0  ONLINE   0 0 0
c5t6d0  ONLINE   0 0 0
c6t6d0  ONLINE   0 0 0
c7t6d0  ONLINE   0 0 0

cheers,
/Martin
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool DEGRADED after resilver

2008-03-07 Thread Martin Englund
I replaced a failed disk today, and while the resilvering was running the 
system crashed. Once the server was back up the resilvering continued, but 
after it completed it is still in degraded mode:

weblogs # zpool status
  pool: storage
 state: DEGRADED
 scrub: resilver completed with 0 errors on Fri Mar  7 15:15:39 2008
config:

NAME  STATE READ WRITE CKSUM
storage   DEGRADED 0 0 0
  raidz1  DEGRADED 0 0 0
c0t6d0ONLINE   0 0 0
replacing DEGRADED 0 0 0
  c1t6d0s0/o  UNAVAIL  0 0 0  cannot open
  c1t6d0  ONLINE   0 0 0
c4t6d0ONLINE   0 0 0
c5t6d0ONLINE   0 0 0
c6t6d0ONLINE   0 0 0
c7t6d0ONLINE   0 0 0

How do I get this back to normal?

cheers,
/Martin
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [dtrace-discuss] periodic ZFS disk accesses: try er_kernel

2008-03-01 Thread Martin . Itzkowitz
Bill Shannon wrote:
> Marty Itzkowitz wrote:
>> Interesting problem.  I've used disk rattle as a measurement of io 
>> activity before
>> there were such tools for measurement.  It's crude, but effective.
>>
>> To answer your question: you could try er_kernel.  It uses DTrace to
>> do statistical callstack sampling, and is described on our  kernel 
>> profiling page .
>> That page is woefully out of date, but is basically correct w.r.t usage.
>> We test it on Nevada and S10, and S9.
>>
>> Roch Bourbonnais (PAE), who did most of the development, used it to 
>> track
>> down the root cause of a similar-sounding problem in VxFS.  See the 
>> slides
>> towards the end of the presentation on Kernel Profiling 
>>  dated June, 2002.
>>
>> The best version to use on a late snv is the one from our 
>> nightly-build 
>>
>> If you have any problems, please contact Roch and/or me;
>> if you give it a try, please report your experience back to our interest
>> alias,  [EMAIL PROTECTED]  .
>
> I looked through the page and the presentation and it wasn't clear to me
> that this was going to give me information on which file was being
> accessed or which process was doing the accessing.  I really want 
> "tracing",
> not "profiling".  Statistically, the disk isn't being accessed at all.

It should give you information on what callstacks in the kernel are 
triggering
the disk usage.  You should see something with the same periodicity as the
rattle you hear.

Marty
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Multiple ZFS partitions on overlapping region of USB stick

2007-11-29 Thread Martin
and when I re-created it, the duplicate disappeared...

# zpool destroy black
# zpool create -f newblack c5t0d0
# zpool export newblack
# zpool import
  pool: newblack
id: 5325813934475784040
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

newblackONLINE
  c5t0d0ONLINE
#
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Multiple ZFS partitions on overlapping region of USB stick

2007-11-29 Thread Martin
I used a usb stick, and the first time I used it, I used something similar to

zpool create black c5t0d0p0 # ie with the "p0" pseudo partition
and used it happily for some while.

Some weeks later, I wanted to use the stick again, starting afresh, but this 
time used
zpool create black c5t0d0 # ie *without* the "p0" pseudo partition

and later, when attempting to import it, I got offered *both* the black pools, 
the original (ie old and overwritten) one from c5t0d0p0, and the newer good one 
from c5t0d0

Should zfs protect against this "user error"?  (I'm not even sure why it 
occurred, since I had assumed that both pseudo devices would map to a similar 
region)



# uname -a
SunOS mouse 5.11 snv_77 i86pc i386 i86pc
# 
# rmformat
Looking for devices...
 1. Logical Node: /dev/rdsk/c5t0d0p0
Physical Node: /[EMAIL PROTECTED],0/pci1043,[EMAIL PROTECTED],1/[EMAIL 
PROTECTED]/[EMAIL PROTECTED],0
Connected Device: SanDisk  U3 Cruzer Micro  3.27
Device Type: Removable
Bus: USB
Size: 3.9 GB
Label: 
Access permissions: Medium is not write protected.
# zpool import
  pool: black
id: 13810954658225353291
 state: ONLINE
status: The pool is formatted using an older on-disk version.
action: The pool can be imported using its name or numeric identifier, though
some features will not be available without an explicit 'zpool upgrade'.
config:

black   ONLINE
  c5t0d0ONLINE

  pool: black
id: 4667414672969078773
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

black   ONLINE
  c5t0d0p0  ONLINE

### first the newer (ie) good one is imported and used ok
# zpool import 13810954658225353291
# ls /black
November
# zpool status black
  pool: black
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
black   ONLINE   0 0 0
  c5t0d0ONLINE   0 0 0

errors: No known data errors
# 
# find black -depth -print | cpio -pmd /var/tmp
788016 blocks

# zpool scrub black
# zpool status black
  pool: black
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
pool will no longer be accessible on older software versions.
 scrub: scrub completed with 0 errors on Thu Nov 29 12:51:08 2007
config:

NAMESTATE READ WRITE CKSUM
black   ONLINE   0 0 0
  c5t0d0ONLINE   0 0 0

errors: No known data errors
# 

# 
## ...and now the older one that's most likely corrupt is used...
# zpool export black 

# zpool import -f 4667414672969078773
# ls /black
October
# 
# zpool scrub black

...some time passes...

# zpool status black
  pool: black
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub completed with 7224 errors on Thu Nov 29 12:56:47 2007
config:

NAMESTATE READ WRITE CKSUM
black   DEGRADED 0 0 26.6K
  c5t0d0p0  DEGRADED 0 0 26.6K  too many errors

errors: 7073 data errors, use '-v' for a list
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [xen-discuss] xVm blockers!

2007-11-29 Thread Martin
Regarding the following that I also hit, see 
http://www.opensolaris.org/jive/thread.jspa?messageID=180995
and if any further details or tests are required, I would be happy to assist.

> 3/ Problem with DMA under Xen ... e.g. my areca raid cards works
> perfect on a 8GB box without xen but because of the way xen allocates
> memory... I am forced to allocate only 1 or 2 gig for the dom0 or the
> areca drivers will fail miserably trying to do DMA above the first 4G
> address space. This very same problem affected xen under linux over a
> year ago and seems to have been addressed. Several persons on the ZFS
> discuss list who complain about poor ZFS IO performance are affected
> by this issue.

This should be relatively easy to fix assuming I can get
access to similar H/W.

Do you get any error messages? We do have a bug in contig alloc
(allocs too much memory) which was recently found which is
affecting nv_sata based systems. It may be related to that
or something that the driver could be doing better.

Can you send me more details around your setup (card your
using, what's connected to it, where you got the driver
and what version you have), behavior and perf on metal,
behavior and perf on xVM.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS very slow under xVM

2007-11-29 Thread Martin
I set /etc/system's zfs:zfs_arc_max = 0x1000 and it seems better now.

I had previously tried setting it to 2Gb rather than 256Mb as above without 
success... I should have tried much lower!
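
For anyone following along, the tunable goes into /etc/system as a single line
such as (the value here is just an example 256MB cap, use whatever works for
your setup):

set zfs:zfs_arc_max = 0x10000000

and needs a reboot to take effect.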

It "seems" that when I perform I/O though a WindowsXP hvm, I get a "reasonable" 
I/O rate, but I'm not sure at this point in time.  When a write is made from 
within the hvm VM, would I expect for the same DMA issue to arise? (I can't 
really tell either way aty the moment because it's not super fast anyway)
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS very slow under xVM

2007-11-12 Thread Martin
In this PC, I'm using the PCI card 
http://www.intel.com/network/connectivity/products/pro1000gt_desktop_adapter.htm
, but, more recently, I'm using the PCI Express card 
http://www.intel.com/network/connectivity/products/pro1000pt_desktop_adapter.htm

Note that the latter didn't have PXE and the boot ROM enabled (for JumpStart), 
contrary to the documentation, and I had to download the DOS program from the 
Intel site to enable it.  (please ask if anyone needs the URL) 

...so, for an easy life, I recommend the Intel PRO/ 1000 GT Desktop
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS very slow under xVM

2007-11-08 Thread Martin
Well, I've tried the latest OpenSolaris snv_76 release, and it displays the 
same symptoms.
(so b66-0624-xen, 75a and 76 all have the same problem)

But, the good news is that it behaves well if there is only 2Gb of memory in 
the system.

So, in summary

The command time dd if=/dev/zero of=myfile.dat bs=16k count=15

...takes around 30 seconds if running on "bare metal" (ie when the Grub menu 
does *not* select xVM/Xen ... ie when not running under Dom0)

...takes around 30 seconds in Dom0 when the Grub boot selected Xen (but only if 
2Gb memory)

...takes "forever", with the IO rate dropping from an initial 70Mb/s to around 
1M/s, if booted under Xen, and executed within Dom0, and there is either 4Gb (2 
DIMMs, single channel), 4Gb (2 DIMMs, dual channel), or 8Gb (dual channel).

Anyone else using an IP35 based board and >2Gb memory?

Anyone using 8Gb memory with Xen on a "retail" based motherboard?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS very slow under xVM

2007-11-06 Thread Martin
kugutsum

  I tried with just 4Gb in the system, and the same issue.  I'll try 2Gb 
tomorrow and see if it's any better. (PS: how did you determine that was the 
problem in your case?)

cheers

Martin
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS very slow under xVM

2007-11-04 Thread Martin
Mitchell

The problem seems to occur with various IO patterns.  I first noticed it after 
using ZFS-based storage for a disk image for an xVM/Xen virtual domain, and 
then, while tracking it down, observed that a "cp" of a large .iso disk 
image would reproduce the problem, and later, a single "dd if=/dev/zero 
of=myfile bs=16k count=15" would too.  So I guess this latter case is a 
mostly-write pattern to the disk, especially after it is noted that the command 
returns after around 5 seconds, leaving the rest buffered in memory.

best regards

Martin
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS very slow under xVM

2007-11-03 Thread Martin
> The behaviour of ZFS might vary between invocations, but I don't think that
> is related to xVM. Can you get the results to vary when just booting under
> "bare metal"?

It pretty consistently displays the behavior of good IO (approx 60Mb/s - 
80Mb/s) for about 10-20 seconds, then always drops to approx 2.5 Mb/s for 
virtually all of the rest of the output. It always displays this when running 
under xVM/Xen with Dom0, and never on bare metal when xVM/Xen isn't booted.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS very slow under xVM

2007-11-02 Thread Martin
I've removed half the memory, leaving 4Gb, and rebooted into "Solaris xVM", and 
re-tried under Dom0.  Sadly, I still get a similar problem.  With "dd 
if=/dev/zero of=myfile bs=16k count=15" I get the command returning in 15 
seconds, and "zpool iostat 1 1000" shows 22 records with an IO rate of around 
80M, then 209 records of 2.5M (pretty consistent), then the final 11 records 
climbing to 2.82, 3.29, 3.05, 3.32, 3.17, 3.20, 3.33, 4.41, 5.44, 8.11

regards

Martin
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS very slow under xVM

2007-11-01 Thread Martin
Hello

I've got Solaris Express Community Edition build 75 (75a) installed on an Asus 
P5K-E/WiFI-AP (ip35/ICH9R based) board.  CPU=Q6700, RAM=8Gb, disk=Samsung 
HD501LJ and (older) Maxtor 6H500F0.

When the O/S is running on bare metal, ie no xVM/Xen hypervisor, then 
everything is fine.

When it's booted up running xVM and the hypervisor, then unlike plain disk I/O, 
and unlike svm volumes, zfs is around 20 times slower.

Specifically, with either a plain ufs on a raw/block disk device, or ufs on a 
svm metadevice, a command such as dd if=/dev/zero of=2g.5ish.dat bs=16k 
count=15 takes less than a minute, with an I/O rate of around 30-50Mb/s.

Similarly, when running on bare metal, output to a zfs volume, as reported by 
zpool iostat, shows a similar high output rate. (also takes less than a minute 
to complete).

But, when running under xVM and a hypervisor, although the ufs rates are still 
good, the zfs rate drops after around 500Mb.

For instance, if a window is left running zpool iostat 1 1000, then after the 
"dd" command above has been run, there are about 7 lines showing a rate of 
70Mbs, then the rate drops to around 2.5Mb/s until the entire file is written.  
Since the dd command initially completes and returns control back to the shell 
in around 5 seconds, the 2 gig of data is cached and is being written out.  
It's similar with either the Samsung or Maxtor disks (though the Samsung are 
slightly faster).

Although previous releases running on bare metal (with xVM/Xen) have been fine, 
the same problem exists with the earlier b66-0624-xen drop of Open Solaris
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS pool on USB flash disk

2007-07-12 Thread Martin Man
[EMAIL PROTECTED] wrote:
> [EMAIL PROTECTED] wrote:
>> it might be a faq or known problem, but it's rather dangerous, is this
>> being worked ON? usb stick removal should not panic the kernel, should
> it?
> 
> I think the default behavior is that if the pool is unprotected (or at an
> unprotected state via redundancy failure on mirror or raidz(2)) and you
> lose a device the system panics. This is a known issue/bug/feature (pick
> one depending on your view) that has been discussed multiple times on the
> list.

discussed yes, I think I remember that, reported? being worked on?

> -Wade

thanx,
Martin
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to list pools that are not imported

2007-07-12 Thread Martin Man
Menno Lageman wrote:
> Martin Man wrote:
>>
>> I insert the stick, and how can I figure out what poools are available 
>> for 'zpool import' without knowing their name?
>>
>> zpool list does not seem to be listing those,
>>
> 
> A plain 'zpool import' should do the trick.

yep, works like a charm, that one was easy, thanx... :-)

> 
> Menno

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How to list pools that are not imported

2007-07-12 Thread Martin Man
Hi all,

again might be a FAQ, but imagine that I have a pool on USB stick,

I insert the stick, and how can I figure out what pools are available 
for 'zpool import' without knowing their name?

zpool list does not seem to be listing those,

thanx,
Martin
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS pool on USB flash disk

2007-07-12 Thread Martin Man
Hi all,

Nevada build 67, USB flash Voyager, ...

created zpool on one of the FDISK partitions on the flash drive, zpool 
import/export works fine,

tried to take the USB stick out of the system while the pool is mounted, 
..., 3 seconds, bang, kernel down, core dumped, friendly reboot on the 
way...

it might be a faq or known problem, but it's rather dangerous, is this 
being worked ON? usb stick removal should not panic the kernel, should it?

thanx,
Martin
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Trying to understand zfs RAID-Z

2007-05-19 Thread Martin
> Quoth Steven Sim on Thu, May 17, 2007 at 09:55:37AM
> +0800:
> >Gurus;
> >I am exceedingly impressed by the ZFS although
> it is my humble opinion
> >that Sun is not doing enough evangelizing for
> it.
> 
> What else do you think we should be doing?
> 
> 
> David

I'll jump in here.  I am a huge fan of ZFS.  At the same time, I know about 
some of its warts.

ZFS hints at adding agility to data management and is a wonderful system.  At 
the same time, it operates on some assumptions which are antithetical to data 
agility, including:
* inability to online restripe: add/remove data/parity disks
* inability to make effective use of varying sized disks

In one breath ZFS says, "Look how well you can dynamically alter filesystem 
storage."

In another breath ZFS says, "Make sure that your pools have identical spindles 
and you have accurately predicted future bandwidth, access time, vdev size, and 
parity disks.  Because you can't change any of that later."

I know, down the road you can tack new vdevs onto the pool, but that really 
misses the point.  Even so, if I accidentally add a vdev to a pool and then 
realize my mistake, I am sunk.  Once a vdev is added to a pool, it is attached 
to the pool forever.

Ideally I could provision a vdev, later decide that I need a disk/LUN from that 
vdev and simply remove the disk/LUN, decreasing the vdev capacity.  I should 
have the ability to decide that current redundancy needs are insufficient and 
allocate [b]any[/b] number of new parity disks.  I should be able to have a 
pool from a rack of 15x250GB disks and then later add a rack of 11x750GB disks 
[b]to the vdev[/b], not by making another vdev.

I should have the luxury of deciding to put hot Oracle indexes on their own 
vdev, deallocate spindles from an existing vdev and put those indexes on the 
new vdev.  I should be able to change my mind later and put it all back.

Most important is the access time issue.  Since there are no partial-stripe 
reads in ZFS, access time for a RAIDZ vdev is the same as single-disk 
access time, no matter how wide the stripe is.

How to evangelize better?

Get rid of the glaring "you can't change it later" problems.

Another thought is that flash storage has all of the indicators of being a 
disruptive technology described in [i]The Innovator's Dilemma[/i].  What this 
means is that flash storage [b]will[/b] take over hard disks.  It is 
inevitable.  ZFS has a weakness with access times but handles single-block 
corruption very nicely.  ZFS also has the ability to do very wide RAIDZ 
stripes, up to 256(?) devices, providing mind-numbing throughput.

Flash has near-zero access times and relatively low throughput.  Flash is also 
prone to single-block failures once the erase-limit has been reached for a 
block.

ZFS + Flash = near-zero access time, very high throughput and high data 
integrity.

To answer the question: get rid of the limitations and build a Thumper-like 
device using flash.  Market it for Oracle redo logs, temp space, swap space 
(flash is now cheaper than RAM), anything that needs massive throughput and 
ridiculous iops numbers, but not necessarily huge storage.

Each month, the cost of flash will fall 4% anyway, so get ahead of the curve 
now.

My 2 cents, at least.

Marty
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Drobo

2007-04-17 Thread Martin Englund
Here's another product which has taken the hassle out of disk  
management:

<http://www.drobo.com/products_demo.aspx>

I wonder if they (Data Robotics) will make the Drobo work with ZFS  
once Leopard is out (since it supports HFS+)?


---8<---
Data Robotics has just introduced Drobo, the world’s first storage  
robot.   Drobo is a direct attached storage array that provides fully  
automated storage that is very easy to use.  Drobo attaches via USB  
2.0, with no host software required.


Drobo combines up to 4 drives (SATA I or III) into a pool of  
protected storage (i.e. with the protection levels of RAID 5 but with  
none of the hassles of RAID).   Drobo’s capacity can be upgraded on  
the fly (hot-swappable) with drives of different capacities and  
speeds, and from different manufacturers.


There is a video demonstration of Drobo on www.drobo.com and there  
are multiple postings about Drobo on the web, including:

http://www.engadget.com/2007/04/09/drobo-the-worlds-first-storage-robot/
---8<---

cheers,
/Martin
--
Martin Englund, Java Security Engineer, Java SE, Sun Microsystems Inc.
Email: [EMAIL PROTECTED] Time Zone: GMT+2 PGP: 1024D/AA514677
"The question is not if you are paranoid, it is if you are paranoid  
enough."



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Add mirror to an existing Zpool

2007-04-10 Thread Martin Girard
Hi,

I have a zpool with only one disk. No mirror.
I have some data in the file system.

Is it possible to make my zpool redundant by adding a new disk in the pool
and making it a mirror with the initial disk?
If yes, how?
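
(To be concrete, I assume the operation I'm after is something like

zpool attach mypool c0t0d0 c0t1d0   # attach c0t1d0 alongside the existing c0t0d0, making a 2-way mirror

with placeholder device names, but I'd like confirmation that this is the
right way.)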

Thanks

Martin
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: How much do we really want zpool remove?

2007-01-18 Thread Martin
> Jeremy Teo wrote:
> > On the issue of the ability to remove a device from
> a zpool, how
> > useful/pressing is this feature? Or is this more
> along the line of
> > "nice to have"?
> 
> This is a pretty high priority.  We are working on
> it.

Good news!  Where is the discussion on the best approach to take?

> On 18/01/2007, at 9:55 PM, Jeremy Teo wrote:
> The most common reason is migration of data to new
> storage  
> infrastructure. The experience is often that the
> growth in disk size  
> allows the new storage to consist of fewer disks/LUNs
> than the old.

I agree completely.  No matter how wonderful your current FC/SAS/whatever 
cabinet is, at some point in the future you will want to migrate to another 
newer/faster array with a better/faster interface, probably on fewer disks.  
The "just add another top level vdev" approach to growing a RAIDZ pool seems a 
bit myopic.

> On Thu, 2007-01-18 at 10:51 -0800, Matthew Ahrens
> wrote:
> I'd consider it a lower priority than say, adding a
> drive to a RAIDZ
> vdev, but yes, being able to reduce a zpool's size by
> removing devices
> is quite useful, as it adds a considerable degree of
> flexibility that
> (we) admins crave.

These two items (removing a vdev and restriping an array) are probably closely 
related.  Either operation would likely center around some 
metaslab_evacuate() routine which empties a metaslab and puts the data onto 
another metaslab.

Evacuating a vdev could be no more than evacuating all of the metaslabs in the 
vdev.

Restriping (adding/removing a data/parity disk) could be no more than 
progressively evacuating metaslabs with the old stripe geometry and writing the 
data to metaslabs with the new stripe geometry.  The biggest challenge while 
restriping might be getting the read routine to figure out on-the-fly which 
geometry is in use for any particular stripe.  Even so, this shouldn't be too 
big of a challenge: one geometry will checksum correctly and the other will not.

Marty
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Re[2]: Re: Adding disk to a RAID-Z?

2007-01-10 Thread Martin
> Hello Kyle,
> 
> Wednesday, January 10, 2007, 5:33:12 PM, you wrote:
> 
> KM> Remember though that it's been mathematically
> figured that the 
> KM> disadvantages to RaidZ start to show up after 9
> or 10 drives. (That's 
> 
> Well, nothing like this was proved and definitely not
> mathematically.
> 
> It's just a common sense advise - for many users
> keeping raidz groups
> below 9 disks should give good enough performance.
> However if someone
> creates raidz group of 48 disks he/she probable
> expects also
> performance and in general raid-z wouldn't offer one.

Wow, lots of good discussion here.  I started the idea of allowing a RAIDZ 
group to grow to an arbitrary number of drives because I was unaware of the downsides to 
massive pools.  From my RAID5 experience, a perfect world would be large 
numbers of data spindles and a sufficient number of parity spindles, e.g. 99+17 
(99 data drives and 17 parity drives).  In RAID5 this would give massive iops 
and redundancy.

After studying the code and reading the blogs, a few things have jumped out, 
with some interesting (and sometimes goofy) implications.  Since I am still 
learning, I could be wrong on any of the following.

RAIDZ pools operate with a storage granularity of one stripe.  If you request a 
read of a block within the stripe, you get the whole stripe.  If you modify a 
block within the stripe, the whole stripe is written to a different location 
(a la COW).

This implies that ANY read requires the whole stripe, and therefore every 
spindle must seek and read a sector.  All drives return their sectors (mostly) 
simultaneously.  For performance purposes, a RAIDZ pool seeks like a single 
drive but has the throughput of multiple drives.  Unlike traditional RAID5, 
adding more spindles does NOT increase read IOPS.
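
For a rough sense of scale (my numbers, not a benchmark): take ten drives, each 
good for about 100 random reads per second.  A ten-drive RAID5 set serving 
small random reads can in the best case approach 10 x 100 = 1000 reads per 
second, because each read touches a single spindle.  A ten-drive RAIDZ group on 
the same workload tops out near 100 stripe reads per second, since every read 
engages every spindle; each read moves more data, but the IOPS ceiling is that 
of a single drive.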

Another implication is that ZFS checksums the stripe, not the component 
sectors.  If a drive silently returns a bad sector, all ZFS knows is that the 
whole stripe is bad (which could probably also be inferred from a bogus parity 
sector).  ZFS has no clue which drive produced bad data, only that the whole 
stripe failed the checksum.  ZFS finds the offending sector by process of 
elimination: going through the sectors one at a time, throwing away the data 
actually read for that sector, reconstructing it from parity and the remaining 
sectors, then checking whether the stripe now passes the checksum.

Two parity drives make this a bigger problem still, roughly squaring the number 
of combinations to try.  With enough parity drives, the cost of pinpointing N 
silently-bad data sectors in a stripe grows roughly as O(k^N), where k is on 
the order of the number of sectors in the stripe, since each additional suspect 
multiplies the number of combinations to test.
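
Here is a toy, compilable illustration of that elimination loop for the 
single-parity case.  The checksum is a stand-in fletcher-style sum and none of 
the names are ZFS APIs; the double-parity case would wrap the same test in a 
loop over pairs of suspects, which is where the combinatorial cost comes from.

/*
 * Toy sketch: find a silently-corrupted sector in a single-parity stripe
 * by process of elimination.  Not ZFS code.
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define NDATA   4                 /* data sectors per stripe */
#define SECTOR  512               /* bytes per sector */

static uint64_t stripe_cksum(uint8_t d[NDATA][SECTOR]) {
    uint64_t a = 0, b = 0;
    for (int i = 0; i < NDATA; i++)
        for (int j = 0; j < SECTOR; j++) {
            a += d[i][j];
            b += a;
        }
    return (b << 32) ^ a;
}

int main(void) {
    uint8_t data[NDATA][SECTOR], parity[SECTOR];

    /* write side: fill the stripe, compute XOR parity, record the checksum */
    for (int i = 0; i < NDATA; i++)
        for (int j = 0; j < SECTOR; j++)
            data[i][j] = (uint8_t)(i * 37 + j);
    memset(parity, 0, SECTOR);
    for (int i = 0; i < NDATA; i++)
        for (int j = 0; j < SECTOR; j++)
            parity[j] ^= data[i][j];
    uint64_t stored = stripe_cksum(data);

    data[2][100] ^= 0x5a;         /* drive 2 silently returns bad data */

    /*
     * Read side: the stripe fails its checksum, but nothing says which
     * drive lied.  Assume each sector in turn is the liar, rebuild it
     * from parity plus the others, and keep the guess that checksums.
     */
    for (int i = 0; i < NDATA; i++) {
        uint8_t trial[NDATA][SECTOR];
        memcpy(trial, data, sizeof (trial));
        memcpy(trial[i], parity, SECTOR);
        for (int k = 0; k < NDATA; k++)
            if (k != i)
                for (int j = 0; j < SECTOR; j++)
                    trial[i][j] ^= data[k][j];
        if (stripe_cksum(trial) == stored) {
            printf("the bad sector came from drive %d\n", i);
            break;
        }
    }
    return 0;
}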

Another implication is that there is no RAID5 "write penalty."  More 
accurately, the cost usually called the write penalty is paid on reads instead, 
since every read pulls in an entire stripe.

Finally, there is no need to rotate parity.  Rotating parity was introduced in 
RAID5 because every write of a single sector in a stripe also forced a read and 
subsequent rewrite of that stripe's parity sector, which would turn a dedicated 
parity disk into a hot spot.  Since there are no partial-stripe writes in ZFS, 
parity is never read-modify-written, so there is nothing to gain by rotating it.
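
To put rough numbers on that (again, assuming I have the mechanics right): a 
small write into an existing RAID5 stripe costs four I/Os (read old data, read 
old parity, write new data, write new parity), and without rotation every one 
of those parity touches lands on the same disk.  The equivalent ZFS write 
gathers a full stripe in memory, computes parity there, and issues only writes 
to a fresh location: the two "penalty" reads never happen, and because parity 
is always written new alongside its stripe, there is nothing left for rotation 
to balance.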

For those in the know, where am I off base here?

Thanks!
Marty
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Adding disk to a RAID-Z?

2007-01-09 Thread Martin
> I agree for non enterprise users the expansion of
> raidz vdevs is a critical missing feature.

Now you've got me curious.  I'm not trying to be inflammatory here, but how is 
online expansion a non-enterprise feature?  From my perspective, enterprise 
users are the ones most likely to keep legacy filesystems for extended lengths 
of time, well past any rational usage plan.  Enterprise users are also the ones 
most likely to need 24/7 availability.  Any hacker-in-a-basement can take a 
storage pool offline to expand or contract it, while enterprise users lack this 
luxury.

Experience taught me that enterprise users most need future flexibility and 
zero downtime.

Again, I'm not arguing here, only interested in your contrasting viewpoint.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Re: Adding disk to a RAID-Z?

2007-01-08 Thread Martin
> I want to setup a ZFS server with RAID-Z.  Right now
> I have 3 disks.  In 6 months, I want to add a 4th
> drive and still have everything under RAID-Z without
> a backup/wipe/restore scenario.  Is this possible?

I am trying to figure out how to code this right now, as I see it as one of the 
most needed and ignored features of ZFS.  Unfortunately, there is precious 
little documentation of how the stripes are laid out, so I find myself studying 
the code.

In addition to having the ability to add/remove a data drive, I can see use 
cases for:
* Adding/removing arbitrary numbers of parity drives.
Raidz2 uses Reed-Solomon codes for the 2nd parity, which implies that there is 
no practical limit on the number of parity drives (a sketch of the arithmetic 
follows at the end of this message).
* Maximizing the use of different disk sizes
Allowing the stripe geometry to vary throughout the vdev would allow maximal 
use of space for different size devices, while preserving the desired fault 
tolerance.

If such capabilities exist, you could start with a single disk vdev and grow it 
to consume a large disk farm with any number of parity drives, all while the 
system is fully available.
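
To illustrate the "no practical limit" claim, here is a toy GF(2^8) parity 
generator that produces several parity columns from the same data.  This is my 
own sketch, not the ZFS implementation: parity row 0 degenerates to plain XOR 
(the familiar P), row 1 is a raidz2-style Q, and nothing in the arithmetic 
stops at two or three rows, though coefficient choice needs real care once the 
parity count grows.

/*
 * Toy sketch: generate an arbitrary number of parity columns over GF(2^8).
 * Parity row j weights data column i by (2^j)^i; row 0 is plain XOR.
 */
#include <stdio.h>
#include <stdint.h>

#define NDATA 5          /* data drives */
#define NPAR  3          /* parity drives; any small value works here */

/* multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11d) */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        uint8_t hi = a & 0x80;
        a <<= 1;
        if (hi)
            a ^= 0x1d;
        b >>= 1;
    }
    return p;
}

static uint8_t gf_pow(uint8_t a, int n) {
    uint8_t r = 1;
    while (n-- > 0)
        r = gf_mul(r, a);
    return r;
}

int main(void) {
    uint8_t data[NDATA] = { 0x11, 0x22, 0x33, 0x44, 0x55 };
    uint8_t par[NPAR];

    for (int j = 0; j < NPAR; j++) {
        par[j] = 0;
        for (int i = 0; i < NDATA; i++)
            par[j] ^= gf_mul(gf_pow(gf_pow(2, j), i), data[i]);
        printf("parity column %d = 0x%02x\n", j, par[j]);
    }
    return 0;
}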
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

