Re: [zfs-discuss] Fwd: ZFS for consumers WAS:Yager on ZFS

2007-11-19 Thread Ian Collins
Paul Kraus wrote:

 I also like being able to see how much space I am using for
 each with a simple df rather than a du (that takes a while to run). I
 can also tune compression on a data type basis (no real point in
 trying to compress media files that are already compressed MPEG and
 JPEGs).

   
That's a very good point.  I do the same and as a side effect, my data
has never been better organised.
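Something along these lines, for example (pool and dataset names are purely illustrative):

  # one filesystem per data type, compressed only where it pays off
  zfs create tank/documents
  zfs set compression=on tank/documents
  zfs create tank/media
  zfs set compression=off tank/media
  # per-dataset usage is visible immediately, no du required
  zfs list -o name,used,avail,compressratio
  df -h /tank/documents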

For a home user, data integrity is probably as important as, if not more
important than, it is for a corporate user.  How many home users do regular backups?

Ian.


Re: [zfs-discuss] pls discontinue troll bait was: Yager on ZFS and

2007-11-19 Thread can you guess?
 OTOH, when someone whom I don't know comes across as
 a pushover, he loses 
 credibility.

It may come as a shock to you, but some people couldn't care less about those 
who assess 'credibility' on the basis of form rather than on the basis of 
content - which means that you can either lose out on potentially useful 
knowledge by ignoring them due to their form, or change your own attitude.

 I'd expect a senior engineer to show not only
 technical expertise but also 
 the ability to handle difficult situations, *not*
 adding to the 
 difficulties by his comments.

Another surprise for you, I'm afraid:  some people just don't meet your 
expectations in this area.  In particular, I only 'show my ability to handle 
difficult situations' in the manner that you suggest when I have some actual 
interest in the outcome - otherwise, I simply do what I consider appropriate 
and let the chips fall where they may.

Deal with that in whatever manner you see fit (just as I do).

- bill
 
 


Re: [zfs-discuss] Fwd: ZFS for consumers WAS:Yager on ZFS

2007-11-19 Thread Mario Goebbels
 For a home user, data integrity is probably as important as, if not more
 important than, it is for a corporate user.  How many home users do regular backups?

I'm a heavy computer user and probably passed the 500GB mark way before
most other home users, did various stunts like running a RAID0 on IBM
Deathstars, and I never back up.

And I've only been running a ZFS mirror for a month or two, as insurance
against disk failure (I suddenly felt I needed to do this).

What ZFS can give home users is safety for certain parts of their data,
via checksums and ditto blocks. It doesn't prevent disk failure, but it sure
helps keep important personal documents uncorrupted.
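For instance, extra ditto copies of user data can be requested per dataset (the dataset name is just an example):

  # keep two copies of everything under tank/documents, even on a
  # single-disk pool; applies only to data written after the change
  zfs set copies=2 tank/documents
  zfs get copies,checksum tank/documents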

-mg


[zfs-discuss] Recommended settings for dom0_mem when using zfs

2007-11-19 Thread K

I have an xVM b75 server and use ZFS for storage (a ZFS root mirror and a
raid-z2 data pool).

I see everywhere that it is recommended to have a lot of memory on a  
zfs file server... but I also need to relinquish a lot of my memory to  
be used by the domUs.

What would be a good value for dom0_mem on a box with 4 GB of RAM?



Re: [zfs-discuss] [xen-discuss] Recommended settings for dom0_mem when using zfs

2007-11-19 Thread Mark Johnson


K wrote:
 I have an xVM b75 server and use ZFS for storage (a ZFS root mirror and a
 raid-z2 data pool).
 
 I see everywhere that it is recommended to have a lot of memory on a  
 zfs file server... but I also need to relinquish a lot of my memory to  
 be used by the domUs.
 
 What would be a good value for dom0_mem on a box with 4 GB of RAM?

I have found that setting dom0_mem to 2G (with ZFS in dom0)
works pretty well as long as you don't balloon down.

If you want to set dom0_mem to 1G, you should limit
the amount of memory ZFS uses, e.g. by setting the following
in /etc/system:

  set zfs:zfs_arc_max = 0x1000
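
As an illustration only (the right cap depends on what your domUs need), a 512 MB limit would look like this, and the resulting ARC size can be sanity-checked with kstat:

  * /etc/system -- example only: cap the ZFS ARC at 512 MB
  set zfs:zfs_arc_max = 0x20000000

  # after a reboot, check the current ARC size (bytes)
  kstat -p zfs:0:arcstats:size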




MRJ



Re: [zfs-discuss] Fwd: ZFS for consumers WAS:Yager on ZFS

2007-11-19 Thread Paul Kraus
On 11/19/07, Ian Collins [EMAIL PROTECTED] wrote:

 For a home user, data integrity is probably as important as, if not more
 important than, it is for a corporate user.  How many home users do regular backups?

Let me correct a point I made badly the first time around: I
value the data integrity provided by mirroring (I have always used
mirrored drives for data and OS on my home servers), but I don't know how
much the end-to-end checksumming buys me, and it is not a compelling
feature. In other words, I didn't choose ZFS because of the end-to-end
checksumming; I chose it for the ease of management and flexibility in
configuration. The checksummed data is just a bonus that came along
for the ride :-)

Remember, this thread was essentially "Why would a home user
choose ZFS over other options?"... I tried using software mirrors under
Linux ... maybe I was spoiled by DiskSuite / Solaris Volume Manager,
but I found the Linux software mirrors clunky and unreliable (when
installing the OS, the metadevices came up in one order; after booting
off the hard disk they came up in another order, leaving my
mirrored root unmountable). I'm not a big fan of hardware RAID, as I
have seen terrible performance out of HW RAID cards, and from the OS
layer you need additional hardware vendor drivers to really manage and
monitor the drives (if you even can from the OS layer; I hate
rebooting, even home servers).

Just one geek's opinion.

-- 
Paul Kraus
Albacon 2008 Facilities


Re: [zfs-discuss] zfs on a raid box

2007-11-19 Thread Dan Pritts
On Mon, Nov 19, 2007 at 11:10:32AM +0100, Paul Boven wrote:
 Any suggestions on how to further investigate / fix this would be very
 much welcomed. I'm trying to determine whether this is a zfs bug or one
 with the Transtec raidbox, and whether to file a bug with either
 Transtec (Promise) or zfs.

The way I'd try to do this would be to use the same box under Solaris
software RAID, or better yet Linux or Windows software RAID (to make
sure it's not a Solaris device driver problem).

Does pulling the disk then get noticed?  If so, it's a ZFS bug.
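
A quick sanity check on the Solaris side, independent of ZFS, might be to see whether the OS itself notices the pull (just a suggestion):

  # before and after pulling the disk: does its attachment point change state?
  cfgadm -al
  # have the device error counters moved?
  iostat -En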

danno
--
Dan Pritts, System Administrator
Internet2
office: +1-734-352-4953 | mobile: +1-734-834-7224


Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-19 Thread Roch - PAE
Neil Perrin writes:
  
  
  Joe Little wrote:
   On Nov 16, 2007 9:13 PM, Neil Perrin [EMAIL PROTECTED] wrote:
   Joe,
  
   I don't think adding a slog helped in this case. In fact I
   believe it made performance worse. Previously the ZIL would be
   spread out over all devices but now all synchronous traffic
   is directed at one device (and everything is synchronous in NFS).
    Mind you 15MB/s seems a bit on the slow side - especially if
    cache flushing is disabled.
  
    It would be interesting to see what all the threads are waiting
    on. I think the problem may be that everything is backed
   up waiting to start a transaction because the txg train is
   slow due to NFS requiring the ZIL to push everything synchronously.
  
   
   I agree completely. The log (even though slow) was an attempt to
   isolate writes away from the pool. I guess the question is how to
   provide for async access for NFS. We may have 16, 32 or whatever
    threads, but if a single writer keeps the ZIL pegged and prohibits
    reads, it's all for nought. Is there any way to tune/configure the
    ZFS/NFS combination to balance reads/writes so as not to starve one for the
    other? It's either feast or famine, or so tests have shown.
   
  No, there's no way currently to give reads preference over writes.
  All transactions get equal priority to enter a transaction group.
  Three txgs can be outstanding as we use a 3 phase commit model:
  open; quiescing; and syncing.

That makes me wonder if this is not just the lack-of-write-throttling
issue. If one txg is syncing and the other is
quiesced out, I think it means we have let in too many
writes. We do need a better balance.

Neil, is it correct that reads never hit txg_wait_open(), but
just need an I/O scheduler slot?

If so, it seems to me just a matter of

6429205 each zpool needs to monitor its throughput and throttle heavy
writers

However, if this is it, disabling the ZIL would not solve the
issue (it might even make it worse). So I am lost as to
what could be blocking the reads other than lack of I/O
slots. As another way to improve the I/O scheduler we have:


6471212 need reserved I/O scheduler slots to improve I/O latency of
critical ops
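
(A crude way to check the txg_wait_open question empirically might be an fbt probe, assuming fbt exposes that symbol on the build in question:)

  dtrace -n 'fbt::txg_wait_open:entry { @[execname, stack()] = count(); }'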



-r

  
  Neil.
  


Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-19 Thread Neil Perrin


Roch - PAE wrote:
 Neil Perrin writes:
   
   
   Joe Little wrote:
On Nov 16, 2007 9:13 PM, Neil Perrin [EMAIL PROTECTED] wrote:
Joe,
   
I don't think adding a slog helped in this case. In fact I
believe it made performance worse. Previously the ZIL would be
spread out over all devices but now all synchronous traffic
is directed at one device (and everything is synchronous in NFS).
Mind you 15MB/s seems a bit on the slow side - especially if
cache flushing is disabled.
   
It would be interesting to see what all the threads are waiting
on. I think the problem may be that everything is backed
up waiting to start a transaction because the txg train is
slow due to NFS requiring the ZIL to push everything synchronously.
   

I agree completely. The log (even though slow) was an attempt to
isolate writes away from the pool. I guess the question is how to
provide for async access for NFS. We may have 16, 32 or whatever
threads, but if a single writer keeps the ZIL pegged and prohibits
reads, it's all for nought. Is there any way to tune/configure the
ZFS/NFS combination to balance reads/writes so as not to starve one for the
other? It's either feast or famine, or so tests have shown.
   
   No, there's no way currently to give reads preference over writes.
   All transactions get equal priority to enter a transaction group.
   Three txgs can be outstanding as we use a 3 phase commit model:
   open; quiescing; and syncing.
 
 That makes me wonder if this is not just the lack-of-write-throttling
 issue. If one txg is syncing and the other is
 quiesced out, I think it means we have let in too many
 writes. We do need a better balance.
 
 Neil, is it correct that reads never hit txg_wait_open(), but
 just need an I/O scheduler slot?

Yes, they don't modify any metadata (except access time, which is
handled separately). I'm less clear about what happens further
down in the DMU and SPA.




[zfs-discuss] raidz2

2007-11-19 Thread Brian Lionberger
If I yank out a disk in a 4-disk raidz2 array, shouldn't the other disks 
pick up without any errors?
I have a 3120 JBOD, and I went and yanked out a disk and everything 
got hosed. It's okay, because I'm just testing stuff and wanted to see
raidz2 in action when a disk goes down.

Am I missing a step?

I set up the disks with this command:

zpool create apool raidz2 c#t#d0 c#t#d0 c#t#d0 c#t#d0.
zfs create apool/export_home
zfs create apool/export_backup
zfs set mountpoint=/export/home apool/export_home
zfs set mountpoint=/export/backup apool/export_backup
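
As an aside, before physically yanking a drive it can be worth simulating the failure first, e.g. (device name is only an example):

zpool offline apool c1t3d0
zpool status apool
zpool online apool c1t3d0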

Thanks for your help/advice.
Brian.


Re: [zfs-discuss] raidz2

2007-11-19 Thread Eric Schrock
On Mon, Nov 19, 2007 at 04:33:26PM -0700, Brian Lionberger wrote:
 If I yank out a disk in a 4-disk raidz2 array, shouldn't the other disks 
 pick up without any errors?
 I have a 3120 JBOD, and I went and yanked out a disk and everything 
 got hosed. It's okay, because I'm just testing stuff and wanted to see
 raidz2 in action when a disk goes down.
 
 Am I missing a step?

What version of Solaris are you running?  What does "got hosed" mean?

There have been many improvements in proactively detecting failure,
culminating in build 77 of Nevada.  Earlier builds:

- Were unable to distinguish device removal from devices misbehaving,
  depending on the driver and hardware.

- Did not diagnose a series of I/O failures as disk failure.

- Allowed several (painful) SCSI retries and continued to queue up I/O,
  even if the disk was fatally damaged.

Most classes of hardware would behave reasonably well on device removal,
but certain classes caused cascading failures in ZFS, all of which should
be resolved in build 77 or later.
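
On a recent enough build, the FMA side of that diagnosis can be inspected with something like (exact output will vary):

  # what, if anything, has FMA concluded about the pulled disk?
  fmadm faulty
  # the raw error reports that fed the diagnosis
  fmdump -eV | tail -40
  # and the pool's own view of its health
  zpool status -x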

- Eric

--
Eric Schrock, FishWorks    http://blogs.sun.com/eschrock


Re: [zfs-discuss] pls discontinue troll bait was: Yager on ZFS and

2007-11-19 Thread can you guess?
 :  Big talk from someone who seems so intent on hiding
 :  their credentials.
 
 : Say, what?  Not that credentials mean much to me since I evaluate people
 : on their actual merit, but I've not been shy about who I am (when I
 : responded 'can you guess?' in registering after giving billtodd as my
 : member name I was being facetious).
 
 You're using a web-based interface to a mailing list and the 'billtodd'
 bit doesn't appear to any users (such as me) subscribed via that
 mechanism.

Then perhaps Sun should make more of a point of this in their Web-based 
registration procedure.

  So yes, 'can you guess?' is unhelpful and makes you look as if
 you're being deliberately unhelpful.

Appearances can be deceiving, in large part because they're so subjective.  
That's why sensible people dig beneath them before forming any significant 
impressions.

...

 : If you're still referring to your incompetent alleged research, [...]
 : [...] right out of the
 : same orifice from which you've pulled the rest of your crap.
 
 It's language like that that is causing the problem.

No, it's ignorant loudmouths like cook and al who are causing the problem:  I'm 
simply responding to them as I see fit.

  IMHO you're being a
 tad rude.

I'm being rude as hell to people who truly earned it, and intend to continue 
until they shape up or shut up.  So if you feel that there's a problem here 
that you'd like to help fix, I suggest that you try tackling it at its source.

- bill
 
 


[zfs-discuss] Oops (accidentally deleted replaced drive)

2007-11-19 Thread Albert Chin
Running ON b66 and had a drive fail. Ran 'zpool replace' and resilvering
began. But I accidentally deleted the replacement drive on the array
via CAM.

# zpool status -v
...
  raidz2                                   DEGRADED     0     0     0
    c0t600A0B800029996605964668CB39d0      ONLINE       0     0     0
    spare                                  DEGRADED     0     0     0
      replacing                            UNAVAIL      0 79.14     0  insufficient replicas
        c0t600A0B8000299966059E4668CBD3d0  UNAVAIL     27   370     0  cannot open
        c0t600A0B800029996606584741C7C3d0  UNAVAIL      0 82.32     0  cannot open
      c0t600A0B8000299CCC05D84668F448d0    ONLINE       0     0     0
    c0t600A0B8000299CCC05B44668CC6Ad0      ONLINE       0     0     0
    c0t600A0B800029996605A44668CC3Fd0      ONLINE       0     0     0
    c0t600A0B8000299CCC05BA4668CD2Ed0      ONLINE       0     0     0


Is there a way to recover from this?
  # zpool replace tww c0t600A0B8000299966059E4668CBD3d0 \
  c0t600A0B8000299CCC06734741CD4Ed0
  cannot replace c0t600A0B8000299966059E4668CBD3d0 with
  c0t600A0B8000299CCC06734741CD4Ed0: cannot replace a replacing device

-- 
albert chin ([EMAIL PROTECTED])


Re: [zfs-discuss] Oops (accidentally deleted replaced drive)

2007-11-19 Thread Eric Schrock
You should be able to do a 'zpool detach' of the replacement and then
try again.
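
Roughly, using the device names from your status output (sketch only):

  # drop the accidentally-deleted replacement from the 'replacing' vdev
  zpool detach tww c0t600A0B800029996606584741C7C3d0
  # then retry the replace onto the new LUN
  zpool replace tww c0t600A0B8000299966059E4668CBD3d0 \
      c0t600A0B8000299CCC06734741CD4Ed0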

- Eric

On Mon, Nov 19, 2007 at 08:20:04PM -0600, Albert Chin wrote:
 Running ON b66 and had a drive fail. Ran 'zpool replace' and resilvering
 began. But I accidentally deleted the replacement drive on the array
 via CAM.
 
 # zpool status -v
 ...
  raidz2                                   DEGRADED     0     0     0
    c0t600A0B800029996605964668CB39d0      ONLINE       0     0     0
    spare                                  DEGRADED     0     0     0
      replacing                            UNAVAIL      0 79.14     0  insufficient replicas
        c0t600A0B8000299966059E4668CBD3d0  UNAVAIL     27   370     0  cannot open
        c0t600A0B800029996606584741C7C3d0  UNAVAIL      0 82.32     0  cannot open
      c0t600A0B8000299CCC05D84668F448d0    ONLINE       0     0     0
    c0t600A0B8000299CCC05B44668CC6Ad0      ONLINE       0     0     0
    c0t600A0B800029996605A44668CC3Fd0      ONLINE       0     0     0
    c0t600A0B8000299CCC05BA4668CD2Ed0      ONLINE       0     0     0
 
 
 Is there a way to recover from this?
   # zpool replace tww c0t600A0B8000299966059E4668CBD3d0 \
   c0t600A0B8000299CCC06734741CD4Ed0
   cannot replace c0t600A0B8000299966059E4668CBD3d0 with
   c0t600A0B8000299CCC06734741CD4Ed0: cannot replace a replacing device
 
 -- 
 albert chin ([EMAIL PROTECTED])

--
Eric Schrock, FishWorks    http://blogs.sun.com/eschrock


Re: [zfs-discuss] Recommended many-port SATA controllers for budget ZFS

2007-11-19 Thread Brian Hechinger
On Sun, Nov 18, 2007 at 02:18:21PM +0100, Peter Schuller wrote:
  Right now I have noticed that LSI has recently began offering some
  lower-budget stuff; specifically I am looking at the MegaRAID SAS
  8208ELP/XLP, which are very reasonably priced.

I looked up the 8204XLP, which is really quite expensive compared to
the Supermicro MV based card.

That being said, for a small 1U box that is only going to have two SATA
disks, the Supermicro card is way overkill/overpriced for my needs.

Does anyone know if there are any PCI-X cards based on the MV88SX6041?

I'm not having much luck finding any.

Thanks,

-brian
-- 
Perl can be fast and elegant as much as J2EE can be fast and elegant.
In the hands of a skilled artisan, it can and does happen; it's just
that most of the shit out there is built by people who'd be better
suited to making sure that my burger is cooked thoroughly.  -- Jonathan 
Patschke


[zfs-discuss] reslivering

2007-11-19 Thread Tim Cook
So... issues with resilvering yet again.  This is a ~3TB pool.  I have one raid-z 
of 5 500GB disks, and a second raid-z of 3 300GB disks.  One of the 300GB disks 
failed, so I have replaced the drive.  After starting the resilver, it takes 
approximately 5 minutes for it to complete 68.05% of the resilvering... then it 
appears to just hang.  It's been this way for 30 hours now.

If I do a zpool status, the command does not finish; it just hangs after 
printing "scrub: resilver in progress, 68.05% done".

If I do a zpool iostat, it shows zero disk activity.  If I do a zpool iostat 
-v, the command hangs as well.

There's 0 activity to this pool, as I suspended all shares while doing the 
resilvering in hopes it would speed things up.  I've seen previous threads 
about how slow this can be, but this is a bit ridiculous.  At this point I'm 
afraid to stick any more data into a zpool if one disk failing will take weeks 
to rebuild.
 
 


Re: [zfs-discuss] Oops (accidentally deleted replaced drive)

2007-11-19 Thread Albert Chin
On Mon, Nov 19, 2007 at 06:23:01PM -0800, Eric Schrock wrote:
 You should be able to do a 'zpool detach' of the replacement and then
 try again.

Thanks. That worked.

 - Eric
 
 On Mon, Nov 19, 2007 at 08:20:04PM -0600, Albert Chin wrote:
  Running ON b66 and had a drive fail. Ran 'zpool replace' and resilvering
  began. But I accidentally deleted the replacement drive on the array
  via CAM.
  
  # zpool status -v
  ...
    raidz2                                   DEGRADED     0     0     0
      c0t600A0B800029996605964668CB39d0      ONLINE       0     0     0
      spare                                  DEGRADED     0     0     0
        replacing                            UNAVAIL      0 79.14     0  insufficient replicas
          c0t600A0B8000299966059E4668CBD3d0  UNAVAIL     27   370     0  cannot open
          c0t600A0B800029996606584741C7C3d0  UNAVAIL      0 82.32     0  cannot open
        c0t600A0B8000299CCC05D84668F448d0    ONLINE       0     0     0
      c0t600A0B8000299CCC05B44668CC6Ad0      ONLINE       0     0     0
      c0t600A0B800029996605A44668CC3Fd0      ONLINE       0     0     0
      c0t600A0B8000299CCC05BA4668CD2Ed0      ONLINE       0     0     0
  
  
  Is there a way to recover from this?
# zpool replace tww c0t600A0B8000299966059E4668CBD3d0 \
c0t600A0B8000299CCC06734741CD4Ed0
cannot replace c0t600A0B8000299966059E4668CBD3d0 with
c0t600A0B8000299CCC06734741CD4Ed0: cannot replace a replacing device
  
  -- 
  albert chin ([EMAIL PROTECTED])
 
 --
 Eric Schrock, FishWorks    http://blogs.sun.com/eschrock
 
 

-- 
albert chin ([EMAIL PROTECTED])


[zfs-discuss] Indexing other than hash tables

2007-11-19 Thread James Cone
Hello All,

Mike Speirs at Sun in New Zealand pointed me toward you-all.  I have 
several sets of questions, so I plan to group them and send several emails.

This question is about the name/attribute mapping layer in ZFS.  In the 
last version of the source-code that I read, it provides hash-tables. 
They are a good way of finding an exact match for a name.

I'm from a database background, so I am thinking of the hash-tables as 
if they were an index to some data that could have been stored in a 
different form.

Are there any plans to provide the following searches efficiently:

   - matching prefixes
     - e.g. I give it a telephone number, and it gives me a list of file
       descriptors for the:
         - suburb
         - city
         - province/state/etc
         - country
         - hemisphere
       each of which matches fewer digits at the beginning of the number

     [[ general tries work for this ]]

   - upper and lower bounds
     - e.g. imagine fixed-length dates and times that are all digits;
       I give it a date and time, and it gives me the file descriptor
       for the latest date and time <= the one I give it

     [[ tries can be made to do this, but B-variant-trees are better ]]

Regards,
James.


Re: [zfs-discuss] reslivering

2007-11-19 Thread Tim Cook
After messing around... who knows what's going on with it now.  Finally 
rebooted because I was sick of it hanging.  After that, this is what it came 
back with:


root:= zpool status
  pool: fserv
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 0.00% done, 87h56m to go
config:

NAME                        STATE     READ WRITE CKSUM
fserv                       DEGRADED     0     0     0
  raidz1                    ONLINE       0     0     0
    c4t0d0                  ONLINE       0     0     0
    c4t1d0                  ONLINE       0     0     0
    c4t2d0                  ONLINE       0     0     0
    c4t3d0                  ONLINE       0     0     0
    c4t4d0                  ONLINE       0     0     0
  raidz1                    DEGRADED     0     0     0
    c4t6d0                  ONLINE       0     0     0
    c4t7d0                  ONLINE       0     0     0
    replacing               DEGRADED     0     0     0
      12544952246745011915  FAULTED      0     0     0  was /dev/dsk/c4t5d0s0/old
      c4t5d0                ONLINE       0     0     0





root:= zpool iostat -v
                           capacity     operations    bandwidth
pool                     used  avail   read  write   read  write
-----------------------  -----  -----  -----  -----  -----  -----
fserv                     990G  2.11T    397     25  25.3M   101K
  raidz1                  866G  1.42T    201      1   804K  5.29K
    c4t0d0                   -      -    133      1   533K  9.86K
    c4t1d0                   -      -    133      1   544K  9.89K
    c4t2d0                   -      -    133      1   541K  9.88K
    c4t3d0                   -      -    133      1   535K  9.82K
    c4t4d0                   -      -    132      1   525K  9.86K
  raidz1                  124G   708G    196     23  24.5M  95.6K
    c4t6d0                   -      -    102     31  12.3M  84.1K
    c4t7d0                   -      -    102     31  12.3M  83.8K
    replacing                -      -      0     42      0  1.48M
      12544952246745011915   -      -      0      0  1.67K      0
      c4t5d0                 -      -      0     25  2.04K  1.50M
-----------------------  -----  -----  -----  -----  -----  -----
 
 


Re: [zfs-discuss] ZFS + DB + fragments

2007-11-19 Thread Richard Elling
James Cone wrote:
 Hello All,

 Here's a possibly-silly proposal from a non-expert.

 Summarising the problem:
- there's a conflict between small ZFS record size, for good random 
 update performance, and large ZFS record size for good sequential read 
 performance
   

Poor sequential read performance has not been quantified.

- COW probably makes that conflict worse

   

This needs to be proven with a reproducible, real-world workload before it
makes sense to try to solve it.  After all, if we cannot measure where we are,
how can we prove that we've improved?

Note: some block devices will not exhibit the phenomenon which people
seem to be worried about in this thread.  There are more options than just
re-architecting ZFS.

I'm not saying there aren't situations where there may be a problem; I'm just
observing that nobody has brought data to this party.
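
For anyone who wants to bring such data, a minimal experiment might look like this (sizes and names are arbitrary; a real test would use a proper load generator such as filebench):

  # small recordsize, as a database-style workload would use
  zfs create -o recordsize=8k tank/dbtest
  # write a large file sequentially
  dd if=/dev/zero of=/tank/dbtest/bigfile bs=8k count=1310720
  # ... randomly overwrite a good fraction of its 8k blocks with your tool of choice ...
  # then time a cold sequential read and compare against a freshly written copy
  dd if=/tank/dbtest/bigfile of=/dev/null bs=128k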
 -- richard

- re-packing (~= defragmentation) would make it better, but cause 
 problems with the snapshot mechanism

 Proposed solution:
- keep COW

- create a new operation that combines snapshots and cloning

- when you're cloning, always write a tidy, re-packed layout of the data

- if you're using the new operation, keep the existing layout as the 
 clone, and give the new layout to the running file-system

 Things that have to be done to make this work:

- sort out the semantics, because the clone will be in the existing 
 zpool, and the file-system will move to a new zpool (not sure if I have 
 the terminology right)

- sort out the transactional properties; the changes made since the 
 start of the operation will have to be copied across into the new layout

 Regards,
 James.


Re: [zfs-discuss] reslivering

2007-11-19 Thread Tim Cook
That locked up pretty quickly as well, one more reboot and this is what I'm 
seeing now:

root:= zpool status
  pool: fserv
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 1.81% done, 0h19m to go
config:

NAME  STATE READ WRITE CKSUM
fserv DEGRADED 0 0 0
  raidz1  ONLINE   0 0 0
c4t0d0ONLINE   0 0 0
c4t1d0ONLINE   0 0 0
c4t2d0ONLINE   0 0 0
c4t3d0ONLINE   0 0 0
c4t4d0ONLINE   0 0 0
  raidz1  DEGRADED 0 0 0
c4t6d0ONLINE   0 0 0
c4t7d0ONLINE   0 0 0
replacing DEGRADED 0 0 0
  c4t5d0s0/o  FAULTED  0 0 0  corrupted data
  c4t5d0  ONLINE   0 0 0

errors: No known data errors


The "corrupted data" seems a bit scary to me...
 
 


Re: [zfs-discuss] zpool io to 6140 is really slow

2007-11-19 Thread Richard Elling
Asif Iqbal wrote:
 I have the following layout

 A 490 with 8 1.8Ghz CPUs and 16G mem. 6 6140s with 2 FC controllers, using the
 A1 and B1 controller ports at 4Gbps speed.
 Each controller has 2G NVRAM

 On the 6140s I set up a raid0 LUN per SAS disk with a 16K segment size.

 On the 490 I created a zpool with 8 4+1 raidz1s

 I am getting zpool IO of only 125MB/s with zfs:zfs_nocacheflush = 1 in
 /etc/system

 Is there a way I can improve the performance? I'd like to get 1GB/sec IO.
   

I don't believe a V490 is capable of driving 1 GByte/s of I/O.
The V490 has two schizos and the schizo is not a full speed
bridge.  For more information see Section 1.2 of:
http://www.sun.com/processors/manuals/External_Schizo_PRM.pdf

 -- richard
 Currently each lun is set up with A1 as primary and B1 as secondary, or vice versa

 I also have write cache enabled according to CAM

   



Re: [zfs-discuss] ZFS + DB + fragments

2007-11-19 Thread can you guess?
Regardless of the merit of the rest of your proposal, I think you have put your 
finger on the core of the problem:  aside from some apparent reluctance on the 
part of some of the ZFS developers to believe that any problem exists here at 
all (and leaving aside the additional monkey wrench that using RAID-Z here 
would introduce, because one could argue that files used in this manner are 
poor candidates for RAID-Z anyway hence that there's no need to consider 
reorganizing RAID-Z files), the *only* down-side (other than a small matter of 
coding) to defragmenting files in the background in ZFS is the impact that 
would have on run-time performance (which should be minimal if the 
defragmentation is performed at lower priority) and the impact it would have on 
the space consumed by a snapshot that existed while the defragmentation was 
being done.

One way to eliminate the latter would be simply not to reorganize while any 
snapshot (or clone) existed:  no worse than the situation today, and better 
whenever no snapshot or clone is present.  That would change the perceived 
'expense' of a snapshot, though, since you'd know you were potentially giving 
up some run-time performance whenever one existed - and it's easy to imagine 
installations which might otherwise like to run things such that a snapshot was 
*always* present.

Another approach would be just to accept any increased snapshot space overhead. 
 So many sequentially-accessed files are just written once and read-only 
thereafter that a lot of installations might not see any increased snapshot 
overhead at all.  Some files are never accessed sequentially (or done so only 
in situations where performance is unimportant), and if they could be marked 
"don't reorganize" then they wouldn't contribute any increased snapshot 
overhead either.

One could introduce controls to limit the times when reorganization was done, 
though my inclination is to suspect that such additional knobs ought to be 
unnecessary.

One way to eliminate almost completely the overhead of the additional disk 
accesses consumed by background defragmentation would be to do it as part of 
the existing background scrubbing activity, but for actively-updated files one 
might want to defragment more often than one needed to scrub.

In any event, background defragmentation should be a relatively easy feature to 
introduce and try out if suitable multi-block contiguous allocation mechanisms 
already exist to support ZFS's existing batch writes.  Use of ZIL to perform 
opportunistic defragmentation while updated data was still present in the cache 
might be a bit more complex, but could still be worth investigating.

- bill
 
 


Re: [zfs-discuss] [storage-discuss] zpool io to 6140 is really slow

2007-11-19 Thread Asif Iqbal
On Nov 19, 2007 1:43 AM, Louwtjie Burger [EMAIL PROTECTED] wrote:
 On Nov 17, 2007 9:40 PM, Asif Iqbal [EMAIL PROTECTED] wrote:
  (Including storage-discuss)
 
  I have 6 6140s with 96 disks. Out of which 64 of them are Seagate
  ST337FC (300GB - 1 RPM FC-AL)

 Those disks are 2Gb disks, so the tray will operate at 2Gb.


That is still 256MB/s. I am getting about 194MB/s.
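
(For anyone comparing notes, one simple way to sample that kind of number is along these lines -- the path is illustrative:)

  # in one window: pool-level throughput, 5-second samples
  zpool iostat 5
  # in another: a streaming write
  dd if=/dev/zero of=/tank/fs/testfile bs=1024k count=16384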


  I created 16k seg size raid0 luns using single fcal disks. Then

 You showed the single disks as LUN's to the host... if I understand 
 correctly.

Yes


 Q: Why 16K?

To avoid segment crossing. It will mainly be used for an Oracle DB whose
block size is 16K.


  created a zpool with 8 4+1 raidz1 using those luns, out of single

 What is the layout here? Inside 1 tray, over multiple trays?

Over multiple trays


  disks. Also set the zfs nocache flush to `1' to
  take advantage of the 2G NVRAM cache of the controllers.
 
  I am using one port per controller. Rest of them are down (not in
  use). Each controller port
  speed is 4Gbps.
 

 The 6140 is asymmetric and as such the second controller will be
 available in fail-over mode; it is not actively used for load
 balancing.

So there is no way to create an aggregated channel off of both controllers?


 You need to hook up more FC links to the primary controller that has
 the active LUN's assigned, that is the only way to easily get more
 IOP's.

If I add a second loop by hooking up another (currently inactive) port, I may
have to rebuild the FS, no?

  All luns have one controller as primary and second one as secondary
 
  I am getting only 125MB/s according to the zpool IO.
 

 Seems a tad low, how are you testing?

  I should get ~ 512MB/s per IO.

 Hmmm, how did you get to this total? Keeping in mind that your tray is
 sitting at 2Gb and your extensions to the CSM trays are all single
 channel... you will get a 2Gb ceiling. Also have a look at

Even for the OS IO? So the controller nvram does not help increase the
IO for OS?

 http://en.wikipedia.org/wiki/Fibre_Channel#History

 At first glance and not knowing the exact setup I would say that you
 will not get more than 200MB/s (if that much).

I am getting 194MB/s. Hmm, my 490 has 16G of memory. I really wish I could benefit
some from the OS and controller RAM, at least for Oracle IO.


 Any reason why you are not using the RAID controller to do the work for you?

They are raid0 LUNs, so the RAID controller is in use. I get higher IO
from a zpool on top of raid0 LUNs
of single disks than from a raid5-type LUN, or from raid0 across multiple disks as
one LUN with a zpool
on top.


  Also is it possible to get 2GB/s IO by using the leftover ports of the
  controllers?
 
  Is it also possible to get 4GB/s IO by aggregating the controllers (w/
  8 ports totat)?
 
 
 
  On Nov 16, 2007 5:30 PM, Asif Iqbal [EMAIL PROTECTED] wrote:
   I have the following layout
  
    A 490 with 8 1.8Ghz CPUs and 16G mem. 6 6140s with 2 FC controllers, using the
    A1 and B1 controller ports at 4Gbps speed.
    Each controller has 2G NVRAM
   
    On the 6140s I set up a raid0 LUN per SAS disk with a 16K segment size.
   
    On the 490 I created a zpool with 8 4+1 raidz1s
   
    I am getting zpool IO of only 125MB/s with zfs:zfs_nocacheflush = 1 in
    /etc/system
   
    Is there a way I can improve the performance? I'd like to get 1GB/sec IO.
   
    Currently each lun is set up with A1 as primary and B1 as secondary, or vice versa
   
    I also have write cache enabled according to CAM
  
   --
   Asif Iqbal
   PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
  
 
 
 
  --
  Asif Iqbal
  PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
 




-- 
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu